Tuesday, 21 May 2019

Google Vision for PDF

I need to send a PDF file to Google Vision to extract and return text. From documentation I understood that DPF file must be located on Google Storage, so I am putting the file to my Google Storage bucket like this:

require '../vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient([
    'keyFilePath' => '/my-keyfile.json',
    'projectId' => PROJECT_ID
]);

$bucket = $storage->bucket(BUCKET_NAME);

$bucket->upload(
    fopen($_SESSION['local_pdf_url'], 'r')
);

It works. After I redirect to another page that is suppose to get that file to Vision, and that's where it fails. I found an example function. Here's the code:

require '../vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Vision\V1\AnnotateFileResponse;
use Google\Cloud\Vision\V1\AsyncAnnotateFileRequest;
use Google\Cloud\Vision\V1\Feature;
use Google\Cloud\Vision\V1\Feature\Type;
use Google\Cloud\Vision\V1\GcsDestination;
use Google\Cloud\Vision\V1\GcsSource;
use Google\Cloud\Vision\V1\ImageAnnotatorClient;
use Google\Cloud\Vision\V1\InputConfig;
use Google\Cloud\Vision\V1\OutputConfig;

$storage = new StorageClient([
    'keyFilePath' => '/my-keyfile.json',
    'projectId' => PROJECT_ID
]);

$path = 'gs://my-bucket/'.$_SESSION['pdf_file_name'];

function detect_pdf_gcs($path, $output)
{
    # select ocr feature
    $feature = (new Feature())
        ->setType(Type::DOCUMENT_TEXT_DETECTION);

    # set $path (file to OCR) as source
    $gcsSource = (new GcsSource())
        ->setUri($path);
    # supported mime_types are: 'application/pdf' and 'image/tiff'
    $mimeType = 'application/pdf';
    $inputConfig = (new InputConfig())
        ->setGcsSource($gcsSource)
        ->setMimeType($mimeType);

    # set $output as destination
    $gcsDestination = (new GcsDestination())
        ->setUri($output);
    # how many pages should be grouped into each json output file.
    $batchSize = 2;
    $outputConfig = (new OutputConfig())
        ->setGcsDestination($gcsDestination)
        ->setBatchSize($batchSize);

    # prepare request using configs set above
    $request = (new AsyncAnnotateFileRequest())
        ->setFeatures([$feature])
        ->setInputConfig($inputConfig)
        ->setOutputConfig($outputConfig);
    $requests = [$request];

    # make request
    $imageAnnotator = new ImageAnnotatorClient();
    $operation = $imageAnnotator->asyncBatchAnnotateFiles($requests);
    print('Waiting for operation to finish.' . PHP_EOL);
    $operation->pollUntilComplete();

    # once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    preg_match('/^gs:\/\/([a-z0-9\._\-]+)\/(\S+)$/', $output, $match);
    $bucketName = $match[1];
    $prefix = $match[2];

    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $options = ['prefix' => $prefix];
    $objects = $bucket->objects($options);

    # save first object for sample below
    $objects->next();
    $firstObject = $objects->current();

    # list objects with the given prefix.
    print('Output files:' . PHP_EOL);
    foreach ($objects as $object) {
        print($object->name() . PHP_EOL);
    }

    # process the first output file from GCS.
    # since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    $jsonString = $firstObject->downloadAsString();
    $firstBatch = new AnnotateFileResponse();
    $firstBatch->mergeFromJsonString($jsonString);

    # get annotation and print text
    foreach ($firstBatch->getResponses() as $response) {
        $annotation = $response->getFullTextAnnotation();
        print($annotation->getText());
    }

    $imageAnnotator->close();
}

When I run the second script I get the following errors:

Fatal error: Uncaught DomainException: Could not load the default credentials. Browse to https://developers.google.com/accounts/docs/application-default-credentials for more information in /home/domain/vendor/google/auth/src/ApplicationDefaultCredentials.php:168 Stack trace: #0 /home/domain/vendor/google/gax/src/CredentialsWrapper.php(197): Google\Auth\ApplicationDefaultCredentials::getCredentials(Array, Object(Google\Auth\HttpHandler\Guzzle6HttpHandler), NULL, NULL) #1 /home/domain/vendor/google/gax/src/CredentialsWrapper.php(114): Google\ApiCore\CredentialsWrapper::buildApplicationDefaultCredentials(Array, Object(Google\Auth\HttpHandler\Guzzle6HttpHandler)) #2 /home/domain/vendor/google/gax/src/GapicClientTrait.php(326): Google\ApiCore\CredentialsWrapper::build(Array) #3 /home/domain/vendor/google/gax/src/GapicClientTrait.php(308): Google\Cloud\Vision\V1\Gapic\ImageAnnotatorGapicClient->createCredentialsWrapper(NULL, Array) #4 /home/domain/vendor/google/cloud/Vision/src/V1/Gapic/ImageAnnotatorGapicClient.php(216): Google\Clou in /home/domain/vendor/google/gax/src/CredentialsWrapper.php on line 200

How do I authenticate for this service? What am I missing?



from Google Vision for PDF

No comments:

Post a Comment