I need to send a PDF file to Google Vision to extract and return its text. From the documentation I understood that the PDF file must be located in Google Cloud Storage, so I upload the file to my bucket like this:
require '../vendor/autoload.php';
use Google\Cloud\Storage\StorageClient;
$storage = new StorageClient([
    'keyFilePath' => '/my-keyfile.json',
    'projectId' => PROJECT_ID
]);
$bucket = $storage->bucket(BUCKET_NAME);
$bucket->upload(
    fopen($_SESSION['local_pdf_url'], 'r')
);
This works. I then redirect to another page that is supposed to pass that file to Vision, and that is where it fails. I found an example function; here is the code:
require '../vendor/autoload.php';
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Vision\V1\AnnotateFileResponse;
use Google\Cloud\Vision\V1\AsyncAnnotateFileRequest;
use Google\Cloud\Vision\V1\Feature;
use Google\Cloud\Vision\V1\Feature\Type;
use Google\Cloud\Vision\V1\GcsDestination;
use Google\Cloud\Vision\V1\GcsSource;
use Google\Cloud\Vision\V1\ImageAnnotatorClient;
use Google\Cloud\Vision\V1\InputConfig;
use Google\Cloud\Vision\V1\OutputConfig;
$storage = new StorageClient([
    'keyFilePath' => '/my-keyfile.json',
    'projectId' => PROJECT_ID
]);
$path = 'gs://my-bucket/'.$_SESSION['pdf_file_name'];
function detect_pdf_gcs($path, $output)
{
    # select ocr feature
    $feature = (new Feature())
        ->setType(Type::DOCUMENT_TEXT_DETECTION);

    # set $path (file to OCR) as source
    $gcsSource = (new GcsSource())
        ->setUri($path);
    # supported mime_types are: 'application/pdf' and 'image/tiff'
    $mimeType = 'application/pdf';
    $inputConfig = (new InputConfig())
        ->setGcsSource($gcsSource)
        ->setMimeType($mimeType);

    # set $output as destination
    $gcsDestination = (new GcsDestination())
        ->setUri($output);
    # how many pages should be grouped into each json output file.
    $batchSize = 2;
    $outputConfig = (new OutputConfig())
        ->setGcsDestination($gcsDestination)
        ->setBatchSize($batchSize);

    # prepare request using configs set above
    $request = (new AsyncAnnotateFileRequest())
        ->setFeatures([$feature])
        ->setInputConfig($inputConfig)
        ->setOutputConfig($outputConfig);
    $requests = [$request];

    # make request
    $imageAnnotator = new ImageAnnotatorClient();
    $operation = $imageAnnotator->asyncBatchAnnotateFiles($requests);
    print('Waiting for operation to finish.' . PHP_EOL);
    $operation->pollUntilComplete();

    # once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    preg_match('/^gs:\/\/([a-z0-9\._\-]+)\/(\S+)$/', $output, $match);
    $bucketName = $match[1];
    $prefix = $match[2];

    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $options = ['prefix' => $prefix];
    $objects = $bucket->objects($options);

    # save first object for sample below
    $objects->next();
    $firstObject = $objects->current();

    # list objects with the given prefix.
    print('Output files:' . PHP_EOL);
    foreach ($objects as $object) {
        print($object->name() . PHP_EOL);
    }

    # process the first output file from GCS.
    # since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    $jsonString = $firstObject->downloadAsString();
    $firstBatch = new AnnotateFileResponse();
    $firstBatch->mergeFromJsonString($jsonString);

    # get annotation and print text
    foreach ($firstBatch->getResponses() as $response) {
        $annotation = $response->getFullTextAnnotation();
        print($annotation->getText());
    }

    $imageAnnotator->close();
}
When I run the second script I get the following error:
Fatal error: Uncaught DomainException: Could not load the default credentials. Browse to https://developers.google.com/accounts/docs/application-default-credentials for more information in /home/domain/vendor/google/auth/src/ApplicationDefaultCredentials.php:168 Stack trace: #0 /home/domain/vendor/google/gax/src/CredentialsWrapper.php(197): Google\Auth\ApplicationDefaultCredentials::getCredentials(Array, Object(Google\Auth\HttpHandler\Guzzle6HttpHandler), NULL, NULL) #1 /home/domain/vendor/google/gax/src/CredentialsWrapper.php(114): Google\ApiCore\CredentialsWrapper::buildApplicationDefaultCredentials(Array, Object(Google\Auth\HttpHandler\Guzzle6HttpHandler)) #2 /home/domain/vendor/google/gax/src/GapicClientTrait.php(326): Google\ApiCore\CredentialsWrapper::build(Array) #3 /home/domain/vendor/google/gax/src/GapicClientTrait.php(308): Google\Cloud\Vision\V1\Gapic\ImageAnnotatorGapicClient->createCredentialsWrapper(NULL, Array) #4 /home/domain/vendor/google/cloud/Vision/src/V1/Gapic/ImageAnnotatorGapicClient.php(216): Google\Clou in /home/domain/vendor/google/gax/src/CredentialsWrapper.php on line 200
How do I authenticate for this service? What am I missing?
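For reference, a minimal sketch of what might be missing, under assumptions: in the second script `ImageAnnotatorClient` is created without any credentials, so it falls back to Application Default Credentials, which is where the exception is thrown. The sketch assumes the same `/my-keyfile.json` service account used for `StorageClient` is also allowed to call the Vision API; that has not been verified here.

```php
<?php
require '../vendor/autoload.php';

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// Assumption: the key file already used for StorageClient also grants Vision access.
$keyFile = '/my-keyfile.json';

// Option 1: hand the key file to the Vision client through its 'credentials' option.
$imageAnnotator = new ImageAnnotatorClient([
    'credentials' => $keyFile,
]);

// Option 2: point Application Default Credentials at the key file instead,
// so that clients created without arguments can find it.
putenv('GOOGLE_APPLICATION_CREDENTIALS=' . $keyFile);
$imageAnnotator = new ImageAnnotatorClient();
```

Option 2 would probably also cover the bare `new StorageClient()` call inside `detect_pdf_gcs`, which is created without a `keyFilePath` and so likely depends on the same default credentials.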