Tables in PDF files
I have come across so many public data repositories that hold data in PDF format. Other websites publish tables inside documents such as annual reports, also as PDFs. A data source for PDFs, or for tables from PDFs, would be awesome!
The PDF connector is now generally available in the April release of Power BI Desktop. Learn more here: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-april-2019-feature-summary/#pdf
Is the PDF connector limited to 19 columns? It seems to be in my tests, but I wanted to confirm.
Ricardo Diaz commented
Waiting for this to be announced for Excel. Vote for it here: https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/13611924-allow-powerquery-to-extract-data-from-pdf-tables
Super for Power BI. When will it be available in Excel Get & Transform?
Kawabata Yoshihiro commented
Completed, nice! Great to see the PDF connector in Power BI Desktop.
AdminEhren (Admin, Microsoft Power BI) commented
To all those who have feedback to share about this feature, please send me a private message on the Power BI Community. We want to hear from you!
Mike Honey commented
A heads-up: while this feature kind of worked in the November 2018 release, it was totally wrecked in the December 2018 update. The table and row detection broke to the point where it was unusable.
The good news is the February 2019 update has fixed those problems and improved the detection of similar tables on subsequent pages - they are combined into one table. An odd new "feature" is that after detecting a table they leave you to manually add a Promoted Headers step - ideally that would be generated.
I've worked on many projects with several similar technologies over the years (Tabula was my previous fav). It's always a little messy due to the limitations of the PDF format, but I feel more confident tackling them in Power Query, with all its glorious data transformation power at my fingertips.
Thanks Amanda and team!
We would like sharing features to be available to non-designers, so that report consumers/viewers can use them when reports are embedded, regardless of the embedding method.
This feature would be most useful to us if it also worked outside the Power BI app and the Power BI service.
Likewise, export to PDF is most useful when it is available in embedded reports, i.e. beyond the Power BI app and the Power BI service.
When will this be available in Get & Transform in Excel?
Mattia Russo commented
The PDF connector only works in Power BI Desktop. When I try to use a gateway on a dataset that uses the PDF connector, the gateway doesn't work! Are you working on it? When will this bug be fixed?
Thanks in advance!
This is great, but I've come across cases where the data becomes corrupted and produces errors in the editor.
My source is a folder, and two PDFs are sent to me daily that I drop into that folder. The PDFs are identical except for the dollar amounts. Intermittently, Power BI corrupts one or more of the documents when I refresh the dashboard.
Feature is working fine in my applications.
When will this ship with Excel Get & Transform? I prepare my data in Excel and have to log the imported data.
It would be good if this could also read the data from form fields within the PDF. I believe they may be in FDF format. Since they are named fields, it should be relatively easy to show a list of fields as columns and let you import them like other files. Currently the data is not imported at all.
Sharon Maxon commented
Beyond just importing a chart from a single PDF, we need to be able to import a chart from a collection of consistently formatted PDFs in a folder. For example, we receive a report in a standardized format every week. We need to be able to save the PDFs to a SharePoint Online folder and then let Power BI find each chart and append them together. This powerful feature already works for multiple Excel files in a folder, so please replicate it for PDFs.
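The append step requested here (the one the folder combine already does for Excel files) can be sketched in Python, assuming each PDF has already been reduced to a single table whose first row is the header. `append_tables` and the `source_file` column are hypothetical names for illustration, not a Power BI API. Since a consistent format is the whole premise, the sketch refuses to append a file whose header differs from the first one.

```python
from typing import Dict, List, Optional

Row = List[str]


def append_tables(tables_by_file: Dict[str, List[Row]]) -> List[Row]:
    """Append same-layout tables from several files into one table.

    Each input table is a list of rows whose first row is the header.
    The result starts with that header plus a leading "source_file" column
    (hypothetical name), followed by every data row tagged with its file.
    """
    combined: List[Row] = []
    expected_header: Optional[Row] = None
    # Sort file names so the output order does not depend on dict insertion order.
    for file_name in sorted(tables_by_file):
        table = tables_by_file[file_name]
        if not table:
            raise ValueError(f"{file_name}: no header row found")
        header, rows = table[0], table[1:]
        if expected_header is None:
            # The first file defines the layout every other file must match.
            expected_header = header
            combined.append(["source_file", *header])
        elif header != expected_header:
            raise ValueError(
                f"{file_name}: header {header} does not match {expected_header}"
            )
        # Tag each data row with the file it came from.
        combined.extend([file_name, *row] for row in rows)
    return combined
```

Because the header check is strict, a file whose table was detected with a different layout (for example, with the header row missing) fails loudly instead of being appended silently.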
Niko Suomi commented
Can this read handwritten tables that have been scanned and saved as a PDF file?
SEPT 2018 UPDATE: I am testing the PDF import/connector and have already found minor issues. How and where should I report them?
IN BRIEF: I have a 51-page "sample data" G/L report. PBI is not bringing in the column headers, which is not a big deal, but in skipping the headers it merges any data where there is only one space between columns. EXAMPLE: the PERIOD and SOURCE values 1 PJ became 1PJ, and the ACCOUNT_NUM and ACCOUNT_DESC values 21200 TRADE COLLECTORS became 21200TRADE COLLECTORS. These two are easy enough to fix with "split columns".
HOWEVER, AMOUNT and DESCRIPTION were also merged, so instead of -409.09 Pre-conversion purchase, I have -409.09Pre-conversion purchase. Not every amount has a decimal point and the number of digits varies, so there is no fixed position to split on. While it is highly unlikely that our company will connect to PDFs on a regular basis, we feel this is an important feature for PBI.
Our own software has no problems with this sample file. Tableau merges fields the same way as PBI, but at least it keeps the space so that the fields can be split.
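For the merged AMOUNT/DESCRIPTION case above, where the split position varies, one post-import workaround is to split on the boundary between the leading number and the text instead of on a fixed width or delimiter. A minimal Python sketch, assuming the number is an optional minus sign, digits with optional comma thousands separators and an optional decimal part, and that the text does not itself begin with a digit. `split_leading_number` is a hypothetical helper, not part of Power BI.

```python
import re
from typing import Tuple

# Assumed shape of a glued cell: optional "-", digits, optional ",ddd" groups,
# optional ".d+" decimals, then optional whitespace and non-digit-led text.
_MERGED = re.compile(r"^\s*(-?\d+(?:,\d{3})*(?:\.\d+)?)\s*(\D.*)?$")


def split_leading_number(cell: str) -> Tuple[str, str]:
    """Return (number, text) for a cell like '-409.09Pre-conversion purchase'.

    text is '' when the cell is purely numeric; a cell that does not start
    with a number is returned unchanged as ('', cell).
    """
    match = _MERGED.match(cell)
    if match is None:
        return "", cell
    number, text = match.group(1), match.group(2) or ""
    return number, text.strip()


for cell in ["1PJ", "21200TRADE COLLECTORS", "-409.09Pre-conversion purchase"]:
    print(split_leading_number(cell))
```

On the three examples from the report this prints ('1', 'PJ'), ('21200', 'TRADE COLLECTORS') and ('-409.09', 'Pre-conversion purchase'). The underlying ambiguity remains, though: a description that starts with a digit would be absorbed into the number, so the results should still be spot-checked.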
Kawabata Yoshihiro commented
Nice, 'STARTED' status 😁
The summit is over and it is mid-2018. When is the PDF connector scheduled?
This is obviously a much-needed data source; any update on its release would be appreciated.
Any updates or potential release date?
Any news on when this feature will be released?