Extracting text from a PDF — Data driven nerd out


I recently found a PDF document that lists food additives by name and number. I want this data in a pandas DataFrame, which means I first need to extract it from the PDF. Here I’ll use the Python package textract together with re (for regular expressions) to get the data out of the PDF document.
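As a rough idea of what that workflow can look like, here is a minimal sketch. It is not the code from the original post. The line layout it assumes (a three- or four-digit number, optionally followed by a lowercase letter, then the additive name on the same line) is a guess about the PDF, and the function names and column names are hypothetical. The only textract call used is textract.process, which returns the extracted text as bytes.

```python
import re

# ASSUMPTION: each additive sits on its own line as "<number> <name>",
# e.g. "100 Curcumin" or "150a Plain caramel". Adjust to the real PDF layout.
ADDITIVE_LINE = re.compile(
    r"^[ \t]*(\d{3,4}[a-z]?)[ \t]+(\S.*?)[ \t]*$",
    re.MULTILINE,
)


def extract_text(pdf_path):
    """Return the text of a PDF as a str (hypothetical helper)."""
    import textract  # third-party; imported here so parsing works without it

    # textract.process returns bytes, so decode before applying regexes.
    return textract.process(pdf_path).decode("utf-8")


def parse_additives(text):
    """Return (number, name) tuples for every line matching the assumed layout."""
    return [(m.group(1), m.group(2)) for m in ADDITIVE_LINE.finditer(text)]


def to_dataframe(rows):
    """Build a DataFrame from the parsed rows (column names are illustrative)."""
    import pandas as pd  # third-party

    return pd.DataFrame(rows, columns=["number", "name"])
```

With a file on disk, the pieces would chain together as to_dataframe(parse_additives(extract_text("additives.pdf"))), where the file name is a placeholder. Keeping the regex step separate from the extraction step makes it easy to test the pattern on a pasted snippet of text before running it against the whole document.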

via Extracting text from a PDF — Data driven nerd out
