Hemant Vishwakarma: Combine a bunch of PDFs converted from TIFF files as they're read in thru a loop

Tuesday, 17 November 2020

I have a Python web scraper that crawls through a series of TIFF pages online and converts each one to PDF, but I can't figure out how to combine all the converted PDFs into a single file and write it to disk.

import img2pdf, requests
outPDF = []

for pgNum in range(1,20):
    tiff = requests.get("http://url-to-tiff-file.com/page="+str(pgNum)).content
    pdf = img2pdf.convert(tiff)
    outPDF.append(pdf)

with open("file","wb") as f:
    f.write(''.join(outPDF))

I get the following error when I run it:

f.write(''.join(outPDF))
TypeError: sequence item 0: expected str instance, bytes found
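
The TypeError arises because img2pdf.convert returns bytes, while ''.join only accepts str items. Switching to b''.join would silence the error, but concatenating complete PDF files does not generally produce a valid merged document. One possible approach, sketched here, is to collect the raw TIFF bytes and hand the whole list to a single img2pdf.convert call, which can emit one PDF with one page per image. The helper names (fetch_tiffs, tiffs_to_pdf, save_combined_pdf), the {page} placeholder, the 30-second timeout, and the output path are assumptions for illustration; the fetch and convert parameters exist only so the helpers can be tried without network access.

```python
def fetch_tiffs(url_template, page_numbers, fetch=None):
    """Download each TIFF page and return the raw bytes in page order."""
    if fetch is None:
        import requests  # third-party; imported lazily so a stub can replace it

        def fetch(url):
            response = requests.get(url, timeout=30)  # timeout value is an assumption
            response.raise_for_status()  # avoid converting an HTML error page
            return response.content

    return [fetch(url_template.format(page=n)) for n in page_numbers]


def tiffs_to_pdf(tiff_pages, convert=None):
    """Turn a list of TIFF byte strings into the bytes of one multi-page PDF."""
    if convert is None:
        import img2pdf  # third-party; imported lazily for the same reason

        convert = img2pdf.convert
    # A single convert() call over the whole list yields one page per image.
    return convert(list(tiff_pages))


def save_combined_pdf(url_template, page_numbers, out_path, fetch=None, convert=None):
    """Fetch the pages, build one PDF, write it in binary mode, return the page count."""
    pages = fetch_tiffs(url_template, page_numbers, fetch=fetch)
    with open(out_path, "wb") as f:
        f.write(tiffs_to_pdf(pages, convert=convert))
    return len(pages)
```

With the placeholder URL from the question, a call such as save_combined_pdf("http://url-to-tiff-file.com/page={page}", range(1, 20), "combined.pdf") would cover the same 19 pages as the original loop.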

Update

If you go to http://oris.co.palm-beach.fl.us/or_web1/details_img.asp?doc_id=23543456&pg_num=1 and open your browser's developer console, you can see a form tag containing a number of hidden input tags that hold ".tif" URLs.
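
A minimal sketch of how those hidden inputs might be collected, using only the standard library. It assumes each ".tif" URL sits in the value attribute of an input with type="hidden", which is a guess from the description above rather than something confirmed against the page. The names HiddenTiffCollector and find_tiff_urls are hypothetical, and base_url is used to resolve any relative paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HiddenTiffCollector(HTMLParser):
    """Collect the values of hidden <input> tags that end in .tif."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.tiff_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        value = attributes.get("value") or ""
        # Assumption: the TIFF location is stored in the "value" attribute.
        if input_type == "hidden" and value.lower().endswith(".tif"):
            self.tiff_urls.append(urljoin(self.base_url, value))


def find_tiff_urls(html, base_url):
    """Return absolute .tif URLs found in hidden inputs, in document order."""
    collector = HiddenTiffCollector(base_url)
    collector.feed(html)
    return collector.tiff_urls
```

The resulting list could then be downloaded in order and the bytes passed to img2pdf.convert in a single call.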
