I am using wkhtmltopdf to render a (Django-templated) HTML document to a single-page PDF file. I would like to either render it immediately with the correct height (which I've failed to do so far) or render it incorrectly and trim it. I'm using Python.
Attempt type 1
- Use wkhtmltopdf to render to a very, very long single-page PDF with a lot of extra space using `--page-height`
- Use pdfCropMargins to trim: `crop(["-p4", "100", "0", "100", "100", "-a4", "0", "-28", "0", "0", "input.pdf"])`
The PDF is rendered perfectly with 28 units of margin at the bottom, but I had to use the filesystem to execute the crop command. It seems that the tool expects an input file and an output file, and also creates temporary files midway through, so I can't use it.
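For reference, the crop itself is only a change to the page's box. In PDF coordinates y increases upward by default, so trimming the bottom means raising the lower y value of the box. The following is a minimal sketch of that arithmetic, assuming the lowest y coordinate of the rendered content is already known (finding it is the hard part and is not shown). `trimmed_box` is a hypothetical helper, not part of pdfCropMargins, and the default margin of 28 simply mirrors the value used above.

```python
from typing import Tuple

# (left, bottom, right, top) in PDF units; y grows upward.
Box = Tuple[float, float, float, float]


def trimmed_box(media_box: Box, content_bottom: float, margin: float = 28.0) -> Box:
    """Return a box whose bottom edge sits `margin` units below the content.

    `content_bottom` is the lowest y coordinate that still contains content
    (an assumption: the caller has to determine it somehow).
    """
    left, bottom, right, top = media_box
    # Never extend the page below its original bottom edge.
    new_bottom = max(bottom, content_bottom - margin)
    if new_bottom >= top:
        raise ValueError("content_bottom lies at or above the top of the page")
    return (left, new_bottom, right, top)
```

The resulting tuple could then be written back as the page's media box with whichever PDF library is in use, without touching the file system.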
Attempt type 2
- Use wkhtmltopdf to render to a multi-page PDF with default parameters
- Use PyPDF4 (or PyPDF2) to read the file and combine the pages into a long, single page
The PDF is rendered fine-ish in most cases. However, sometimes a lot of extra white space appears at the bottom if the last PDF page happened to have very little content.
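The extra white space comes from stacking the last page at its full height. A rough sketch of the layout arithmetic that would avoid this, assuming the used height of the last page is known (how to measure it is not shown): `stack_offsets` is a hypothetical helper that returns the height of the combined page and the vertical translation to apply to each source page.

```python
from typing import List, Tuple


def stack_offsets(page_heights: List[float], last_page_used: float) -> Tuple[float, List[float]]:
    """Compute the combined page height and each page's y translation.

    Pages are stacked top to bottom. Only `last_page_used` units of the
    last page (measured from its top) count toward the total height, so
    its blank remainder ends up below y = 0 and outside the combined page.
    """
    if not page_heights:
        raise ValueError("at least one page is required")
    if not 0 <= last_page_used <= page_heights[-1]:
        raise ValueError("last_page_used must fit within the last page")

    total = sum(page_heights[:-1]) + last_page_used
    offsets = []
    consumed = 0.0
    for height in page_heights:
        consumed += height
        # The bottom edge of this page, in combined-page coordinates.
        offsets.append(total - consumed)
    return total, offsets
```

For three pages of height 842 where the last one uses only 100 units, the combined page is 1784 units tall and the last page is shifted down by 742 units, so only its top 100 units remain visible.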
Ideal scenario
The ideal scenario would involve a function that takes HTML and renders it into a single-page PDF with the expected amount of white space at the bottom. I would be happy with rendering the PDF using wkhtmltopdf, since it returns bytes, and later processing these bytes to remove any extra white space. But I don't want to involve the file system; I want to perform all operations in memory. Perhaps I can somehow inspect the PDF directly and remove the white space manually, or do some HTML magic to determine the render height beforehand?
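The rendering half of this can stay in memory, since wkhtmltopdf accepts `-` as both input and output to read HTML from stdin and write the PDF to stdout. Below is a minimal sketch, assuming the `wkhtmltopdf` binary is on the PATH and that the caller already knows a suitable page height (which is exactly the open question here). `build_command` and `render_single_page` are hypothetical names used for illustration.

```python
import subprocess
from typing import List


def build_command(page_width_mm: int, page_height_mm: int) -> List[str]:
    """Build a wkhtmltopdf command that reads stdin and writes stdout."""
    return [
        "wkhtmltopdf",
        "--quiet",
        "--page-width", f"{page_width_mm}mm",
        "--page-height", f"{page_height_mm}mm",
        "-",  # read HTML from stdin
        "-",  # write PDF to stdout
    ]


def render_single_page(html: str, page_width_mm: int = 210, page_height_mm: int = 297) -> bytes:
    """Render HTML to PDF bytes without creating input or output files."""
    result = subprocess.run(
        build_command(page_width_mm, page_height_mm),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if wkhtmltopdf fails
    )
    return result.stdout
```

The returned bytes can be wrapped in `io.BytesIO` and handed to a PDF library for any further trimming.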
from How to trim (crop) bottom whitespace of a PDF document, in memory