Wednesday, 27 June 2018

Can't execute the following script successfully

I've written a Python script that uses PyPDF2, PIL and pytesseract to extract the text from the first page of a scanned PDF file. However, when I run the script below, it throws the error shown further down as soon as it reaches the line containing img = Image.open(pdfReader.getPage(0)).convert('L').

Script I have tried so far:

import PyPDF2
import pytesseract
from PIL import Image

pdfFileObj = open(r'C:\Users\WCS\Desktop\Scan project\Scanned.pdf', 'rb')

pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
img = Image.open(pdfReader.getPage(0)).convert('L')
imagetext = pytesseract.image_to_string(img)
print(imagetext)
pdfFileObj.close()

Error I'm having:

Traceback (most recent call last):
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\SO.py", line 8, in <module>
    img = Image.open(pdfReader.getPage(0)).convert('L')
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\lib\site-packages\PIL\Image.py", line 2554, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'PageObject' object has no attribute 'read'

How can I make it run successfully?
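
For context, the traceback points at the cause: Image.open() expects a filename or a file-like object with a read() method, and the PageObject returned by pdfReader.getPage(0) is neither. One possible way around it, sketched here and not taken from the original post, is to render the page to an image first and hand that image to pytesseract. The sketch assumes the third-party pdf2image package (which needs Poppler installed) in addition to pytesseract and the Tesseract binary; the function name ocr_first_page and the dpi default are hypothetical choices for illustration.

```python
def ocr_first_page(pdf_path, dpi=300):
    """Render page 1 of a scanned PDF to an image and run OCR on it."""
    # Assumption: pdf2image (plus Poppler) and pytesseract (plus the
    # Tesseract binary) are installed. Neither pdf2image nor this helper
    # appears in the original script.
    from pdf2image import convert_from_path
    import pytesseract

    # convert_from_path returns a list of PIL images, one per rendered page.
    # Page numbers are 1-based, so first_page=1, last_page=1 renders only
    # the first page.
    pages = convert_from_path(pdf_path, dpi=dpi, first_page=1, last_page=1)
    if not pages:
        raise ValueError("no pages rendered from %r" % pdf_path)

    # Convert to grayscale, as the original script intended with convert('L').
    img = pages[0].convert('L')
    return pytesseract.image_to_string(img)
```

With those packages in place, calling print(ocr_first_page(r'C:\Users\WCS\Desktop\Scan project\Scanned.pdf')) would stand in for the PyPDF2 and Image.open lines of the script. Whether the OCR output is usable depends on the scan quality and the chosen dpi.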



