I'm dealing with a problem trying to develop a web-app, part of which converts uploaded docx files to pdf files (after some processing). With python-docx
and other methods, I do not require a windows machine with word installed, or even libreoffice on linux, for most of the processing (my web server is pythonanywhere - linux but without libreoffice and without sudo
or apt install
permissions). But converting to pdf seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:
import subprocess
try:
from comtypes import client
except ImportError:
client = None
def doc2pdf(doc):
"""
convert a doc/docx document to pdf format
:param doc: path to document
"""
doc = os.path.abspath(doc) # bugfix - searching files in windows/system32
if client is None:
return doc2pdf_linux(doc)
name, ext = os.path.splitext(doc)
try:
word = client.CreateObject('Word.Application')
worddoc = word.Documents.Open(doc)
worddoc.SaveAs(name + '.pdf', FileFormat=17)
except Exception:
raise
finally:
worddoc.Close()
word.Quit()
def doc2pdf_linux(doc):
"""
convert a doc/docx document to pdf format (linux only, requires libreoffice)
:param doc: path to document
"""
cmd = 'libreoffice --convert-to pdf'.split() + [doc]
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
p.wait(timeout=10)
stdout, stderr = p.communicate()
if stderr:
raise subprocess.SubprocessError(stderr)
As you can see, one method requires comtypes
, another requires libreoffice
as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?
from Converting docx to pdf with pure python (on linux, without libreoffice)
No comments:
Post a Comment