Extracting text from a PDF — Data driven nerd out


I recently found a PDF document that lists food additives by name and number. I want this data in a pandas DataFrame, which means I first need to extract it from the PDF. Here I’ll use two Python packages: textract, to pull the raw text out of the PDF, and re (regular expressions), to parse that text into rows.
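
As a rough sketch of that workflow (not the exact code from the original post), the steps might look like the following. The line layout is an assumption: each additive is taken to sit on its own line as an E number followed by its name, e.g. "E100 Curcumin". The function names, the column names and the file name `additives.pdf` are hypothetical, and the regular expression would need adjusting to the real document.

```python
import re

# Assumed layout (hypothetical): one additive per line, an E number
# followed by the additive name, e.g. "E100 Curcumin" or "E160a Carotenes".
ADDITIVE_PATTERN = re.compile(
    r"^[ \t]*(E\d{3,4}[a-z]?)[ \t]+(.+?)[ \t]*$",
    re.MULTILINE,
)


def pdf_to_text(path):
    """Return the text content of the PDF at `path` as a str."""
    import textract  # third-party; imported here so parsing works without it

    # textract.process returns bytes, so decode before applying the regex
    return textract.process(path).decode("utf-8")


def parse_additives(text):
    """Return a list of (number, name) tuples for lines matching the pattern."""
    return [(m.group(1), m.group(2)) for m in ADDITIVE_PATTERN.finditer(text)]


def to_dataframe(rows):
    """Build a DataFrame with hypothetical column names from parsed rows."""
    import pandas as pd  # third-party

    return pd.DataFrame(rows, columns=["number", "name"])


if __name__ == "__main__":
    # Stand-in for the text textract would return; with a real file this
    # would be: sample = pdf_to_text("additives.pdf")
    sample = "Food additives\nE100 Curcumin\nE160a Carotenes\n\nPage 1\n"
    print(parse_additives(sample))
```

Keeping the parsing step separate from the PDF extraction makes it easy to tune the regular expression against a saved copy of the extracted text before building the DataFrame.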

