I want to extract text from an image using Python. (The Tesseract library does not work for me because it requires a separate installation.)
I have found the boto3 library and Amazon Textract, but I'm having trouble working with them. I'm still new to this. Can you tell me what I need to do to run my script correctly? This is my code:
import cv2
import boto3

#img = cv2.imread('slika2.jpg')  # this is a jpg file
with open('slika2.pdf', 'rb') as document:
    img = bytearray(document.read())

textract = boto3.client('textract', region_name='us-west-2')
response = textract.detect_document_text(Document={'Bytes': img})  # gives me an error
print(response)
When I load the image with cv2.imread (the commented-out line) and pass that to Textract, I get:
Invalid type for parameter Document.Bytes, value: [very long array], type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
I have also tried this:
import boto3

# Document
documentName = "slika2.jpg"
# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())
# Amazon Textract client
textract = boto3.client('textract', region_name='us-west-2')
# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes})  # ERROR
#print(response)
# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print('\033[94m' + item["Text"] + '\033[0m')
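In the response, "Blocks" is a flat list in which each block has a "BlockType" such as PAGE, LINE, or WORD, and LINE and WORD blocks carry "Text" plus a "Confidence" score from 0 to 100. Moving the parsing into a small function lets you check it without calling AWS at all. A sketch; the function name is hypothetical and the sample response is hand-written for illustration, not real Textract output:

```python
from typing import List, Tuple


def lines_with_confidence(response: dict) -> List[Tuple[str, float]]:
    """Collect (text, confidence) for each LINE block of a DetectDocumentText response."""
    return [
        (block["Text"], block["Confidence"])
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


if __name__ == "__main__":
    # Hand-written stand-in for a real response, trimmed to the fields used here
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Hello world", "Confidence": 99.1},
            {"BlockType": "WORD", "Text": "Hello", "Confidence": 99.5},
        ]
    }
    for text, conf in lines_with_confidence(sample):
        print(f"{text} ({conf:.1f}%)")
```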
But I get this error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I'm new to this, so any help would be appreciated. How can I read text from my image or PDF file?
I have also added this block of code, but the error is still "Unable to locate credentials":
session = boto3.Session(
    aws_access_key_id='xxxxxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)