Friday 13 November 2020

Error while trying to extract text from pdf file using pdfminer.six

I am trying to extract text from a PDF using the pdfminer.six library (like here). I have already installed it in my virtual environment. Here is my code:

import pdfminer as miner

text = miner.high_level.extract_text('file.pdf')
print(text)

But when I execute the code with python pdfreader.py, I get the following error:

Traceback (most recent call last):
  File ".\pdfreader.py", line 9, in <module>
    text = miner.high_level.extract_text('pdfBulletins/corona1.pdf')
AttributeError: module 'pdfminer' has no attribute 'high_level'  

I suspect it has something to do with the Python path. I installed pdfminer inside my virtual environment, but I see that this also installed pdf2txt.py outside of it, in my system Python install. Is this behaviour normal? I would expect that something installed inside my venv should not alter my system Python installation. I can successfully extract the text with the pdf2txt.py utility that comes with the pdfminer.six library (from the command line, using the system Python install), but not from the code inside my venv project. My pdfminer.six version is 20201018.
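To check the Python path suspicion, a small diagnostic along these lines could be run from inside the venv. It is only a sketch: the helper name describe_environment is made up for illustration, and it uses nothing but the standard library to report which interpreter is running, whether a virtual environment is active, and which file the pdfminer package would be loaded from.

```python
import importlib.util
import sys


def describe_environment(package="pdfminer"):
    """Report the running interpreter and where `package` would be imported from.

    `describe_environment` is a hypothetical helper, not part of pdfminer.six.
    """
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        "executable": sys.executable,              # interpreter actually running the script
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv-style environment
        "package_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If package_origin points inside the venv's site-packages directory, the import is resolving to the venv copy, and the path is probably not the cause of the AttributeError.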

What could be the problem with my code?
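One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that importing a package does not automatically import its submodules unless the package's own __init__.py does so. The sketch below shows that general Python behaviour with a throwaway package; demo_pkg and submodule_visibility are made-up names, and only the standard library is used.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def submodule_visibility(pkg_name="demo_pkg", sub_name="high_level"):
    """Create a temporary package and report whether its submodule is reachable
    as an attribute before and after an explicit submodule import."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / pkg_name
        pkg_dir.mkdir()
        # An empty __init__.py: the package imports none of its submodules.
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / f"{sub_name}.py").write_text(
            "def extract_text(path):\n    return 'text from ' + path\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module(pkg_name)            # like: import demo_pkg
            before = hasattr(pkg, sub_name)
            importlib.import_module(f"{pkg_name}.{sub_name}")  # like: import demo_pkg.high_level
            after = hasattr(pkg, sub_name)
        finally:
            # Undo the changes made to the import state.
            sys.path.remove(tmp)
            sys.modules.pop(pkg_name, None)
            sys.modules.pop(f"{pkg_name}.{sub_name}", None)
    return before, after


print(submodule_visibility())  # (False, True)
```

If the same thing is happening with pdfminer.six, the explicit form from pdfminer.high_level import extract_text, followed by extract_text('file.pdf'), would be the thing to try.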


