Tables in PDF files
I have come across so many public data repositories that hold data in PDF format. Other websites have tables within documents such as annual reports etc., also in PDF format. A data source for PDFs or tables from PDFs would be awesome!
Updating the status to be more accurate. We have now shipped a preview of this feature in our September release (https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-september-2018-feature-summary/#pdf) and an update to it in our November release (https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-2018-feature-summary/#pdf). Make sure to try out the preview and give us feedback if you haven’t already.
I routinely import PDF data into Excel's Power Query and edit it there. In my experience, the easiest way is to convert the file to HTML with Adobe Acrobat DC's export function and import it through the Web connector. The problem is that the editing after import is difficult to automate. The rest of this comment assumes you are editing a table that came from a PDF. I do not know exactly what triggers it, but fields get joined and split during conversion, so the joined columns have to be decomposed again. The pattern of joining and splitting also varies, so you end up taking the data apart while inspecting each case. I presume this depends on how the table conversion is specified when the PDF is exported, before the import. I think it is necessary to ask Adobe to output table data without unintended joining or splitting that differs from what is displayed.
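For anyone who wants to script the step after the HTML export, here is a minimal sketch in Python (standard library only) that pulls table rows out of an HTML file. It is not what Power Query does internally. The names `TableExtractor` and `extract_tables` are made up for illustration, and it assumes the exported file uses plain, non-nested `<table>`, `<tr>`, `<td>` and `<th>` markup, which may not hold for every export.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in an HTML document.

    Assumption: tables are not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html_text):
    """Return all tables found in html_text as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables
```

The rows come back as plain lists, so they can be written to CSV and loaded from there. Cells that the export joined or split would still need to be repaired by hand or with rules specific to each file.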
This would be a great feature indeed. The amount of data held in formats like this that are not directly readable is huge, and often the data itself is pretty well structured!
Your Office 'Word' program can already do it, apparently. It opens PDF documents that have been produced with different PDF software, just by applying the PDF standard in reverse.
This facilitates a workaround by the way:
- Download the pdf
- Open it with MS Word (not all files are readable, e.g. optically scanned/printed as image docs, etc.)
- Copy the table and paste it into Excel
- Reshape it as needed and point the query to the Excel file
Needless to say, refreshing is manual and painful! For large tables that aren't updated too often it may be worth the work, though.
Unfortunately some database holders are jealous about you querying their data without using their user interface and are reluctant to offer plain CSV or another directly readable format. This is typically done by governmental 'open data' websites!
Looking forward to having this in place :)
You rock PBI guys!
Takahiko Doi commented
Finally, Tableau supports this capability.
Please add this ASAP!!
Basic & very important feature - please add it ASAP
get this ASAP
We need this so our client can get PDF of Table Sorter visual
Any update on this ?
It is important to be able to export this to PDF for our clients.
[Deleted User] commented
Any updates on this? It is important to enable a PDF export option in the Table Sorter visual.
Damn, we need this badly.
Being able to export to PDF would help many departments in my organization a great deal.
Please make this a default feature in the final Table Sorter visual.
I have the same need as everyone else, but I'm not sure this functionality should be built into Power Query. I have an overall need to get data from web pages behind logins (aka web scraping) and also to download and parse PDF tables. I currently use a third-party web scraping and PDF extraction service to do this, and it works. I think a service from Microsoft with Power Query and Microsoft Flow integration would be beneficial.
Peter Schmidt commented
I have a customer who wishes to analyse their phone bill, but the "electronic" version their provider sends them is a 400-page PDF document!! You can use Excel as an intermediate step, but columns get transposed, so manual data wrangling is still required. This feature cannot come soon enough!
thomas jackson commented
much needed to keep on top of industry publications that are more commonly released as pdf.
Colin Miles commented
Many individual businesses are now sending PDF receipts via email. The ability to parse that data would be amazing for granular project visibility.
Any update on this, start