Parsing PDF files

It's difficult unless the PDF is generated in a consistent way by some system (say, bank statement PDFs from your bank).

See "Lib for pdf processing", and experiment with parsing the output from pdftotext. For example, LinkedIn URIs and emails should be easy to pick out; things like phone numbers and addresses will be more difficult.
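As a rough sketch of the "easy" part, something like the following could pull emails and LinkedIn profile URIs out of pdftotext output. The function names and both regexes are my own assumptions for illustration; the patterns are deliberately simplified and will miss unusual addresses. It assumes the `pdftotext` command-line tool is installed and on your PATH.

```python
import re
import subprocess

# Simplified patterns (assumptions for illustration, not complete validators).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINKEDIN_RE = re.compile(
    r"(?:https?://)?(?:www\.)?linkedin\.com/in/[A-Za-z0-9_%-]+",
    re.IGNORECASE,
)


def pdf_to_text(path):
    """Run pdftotext on `path` and return the extracted text.

    The trailing "-" asks pdftotext to write to stdout instead of a file;
    "-layout" tries to preserve the original physical layout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_contacts(text):
    """Return the unique emails and LinkedIn profile URIs found in `text`."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "linkedin": sorted(set(LINKEDIN_RE.findall(text))),
    }


if __name__ == "__main__":
    # Inline sample so the sketch runs without a PDF; with a real file you
    # would call extract_contacts(pdf_to_text("some_file.pdf")) instead.
    sample = "Jane Doe\njane.doe@example.com | linkedin.com/in/jane-doe\n"
    print(extract_contacts(sample))
```

Phone numbers and addresses don't fit a single pattern like this, which is why they are the harder part.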
