Tables in PDF files
I have come across so many public data repositories that hold data in PDF format. Other websites have tables within documents such as annual reports etc., also in PDF format. A data source for PDFs or tables from PDFs would be awesome!
Thanks everyone for your feedback/votes!
We’re actively working on this new connector. You can find an early demo in the recording of the Power Query session at the Microsoft Business Applications Summit: https://www.microsoft.com/en-us/businessapplicationssummit/video/BAS2018-2167
Please stay tuned to the Power BI blog for further announcements.
Please get this done ASAP.
We need this so our client can get a PDF of the Table Sorter visual.
Any update on this?
It is important for us to be able to export this to PDF for our clients.
[Deleted User] commented
Any updates on this? It is important to enable a PDF export option in the Table Sorter visual.
Damn, we need this badly.
Being able to export to PDF will help many departments in my organization a great deal.
Please make this a default feature in the final Table Sorter visual.
I have the same need as everyone else, but I'm not sure this functionality should be built into Power Query. I have an overall need to get data from web pages behind logins (aka web scraping) and also to download and parse PDF tables. I currently use a third-party web scraping and PDF extraction service to do this, and it works. I think a service from Microsoft with Power Query and Microsoft Flow integration would be beneficial.
Peter Schmidt commented
I have a customer who wishes to analyse their phone bill, but the "electronic" version their provider sends them is a 400-page PDF document! You can use Excel as an intermediate step, but columns get transposed, so manual data wrangling is still required. This feature cannot come soon enough!
thomas jackson commented
Much needed to keep on top of industry publications, which are more commonly released as PDFs.
Colin Miles commented
Many individual businesses are now sending PDF receipts via email; the ability to parse that data would be amazing for granular project visibility.
Any update on this?
Max Gregson commented
This is huge for professional services firms too, even if the result had to be cleaned up afterwards. The biggest ask we get is to be able to extract the data held in tables within PDFs, which feels like it should be an easy (or at least easier) problem to solve.
Brian Spiller commented
Exactly as Ken Puls states...
I am looking at a bank statement that is 186 pages long (and sometimes statements are much, much bigger). I can use NitroPro to convert the document to a plain txt file and then bring it into Excel, either directly or through PQ.
But isn't cutting out that conversion step to txt one of the big reasons PQ exists?
Considering that PDFs are sometimes the only form in which certain data is disseminated, I am really surprised that PDF is not yet a valid source for PQ.
Ken Puls commented
Since this got merged from a different thread, I just want to clarify something as the topic is not quite the same...
What I'm looking for is the ability to read from a PDF. While extracting tables would be nice, my priority would be to read the PDF as a text file so that I can do my own parsing of any of the data inside. I.e. I don't want this restricted to only pulling in data that looks like a table.
Anthony Newell commented
Here's my input on this idea:
1) Ability to extract from a document (PDF or Word). If you receive a data source on a regular basis in document format, and it contains a consistently structured embedded table of data, you could extract it using PQ.
2) Convert a set of reports in PBI to a PDF document so that you can produce and distribute a hard-copy report pack by email or upload it to SharePoint. Sometimes the requirement is to have reports consumed in this way, so this would give greater flexibility and open up more usage possibilities for PBI.