How can we improve Power BI?

Tables in PDF files

I have come across many public data repositories that hold their data in PDF format. Other websites publish tables inside documents such as annual reports, also as PDFs. A data source for PDFs, or for tables extracted from PDFs, would be awesome!

3,042 votes

    Gogula Aryalingam shared this idea

    Updating the status to be more accurate. We have now shipped a preview of this feature in our September release (https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-september-2018-feature-summary/#pdf) and an update to it in our November release (https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-2018-feature-summary/#pdf). Make sure to try out the preview and give us feedback if you haven’t already.

    274 comments

      • kigayo083 commented

        I routinely import PDF data into Excel's Power Query and then edit it. In my experience, the easiest approach is to convert the file to HTML with the export function of Adobe Acrobat DC and import it through the web connector. The problem is that editing after import is hard to automate. The rest of this comment assumes a PDF table is being edited. I do not clearly know what triggers it, but fields get joined and split, so the joined columns have to be broken apart again. Furthermore, the pattern in which columns are joined and split varies, so you end up taking the data apart by hand while inspecting it. I presume this depends on how the PDF's table conversion is specified before import. I think Adobe needs to be asked to output table data without unintended joining or splitting that differs from what is displayed.
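The HTML-export route described above can be partly scripted outside the GUI. Below is a minimal sketch in Python, assuming the Acrobat export is a plain, non-nested HTML `<table>`. It uses only the standard-library `html.parser` to pull cell text out row by row. The helper names `extract_rows` and `irregular_rows` are hypothetical, chosen for illustration. `irregular_rows` flags rows whose cell count differs from the header row, which is one way the joined or split fields mentioned above can show up. It does not repair those rows; that still needs rules specific to each document.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Sketch only: assumes a simple, non-nested HTML table such as an
    Acrobat HTML export might contain; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html_text):
    """Return the table's rows as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def irregular_rows(rows):
    """Return (index, row) pairs whose cell count differs from the header row."""
    if not rows:
        return []
    width = len(rows[0])
    return [(i, row) for i, row in enumerate(rows) if len(row) != width]


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Item</th><th>Qty</th></tr>"
        "<tr><td>Apples</td><td>3</td></tr>"
        "<tr><td>Pears 5</td></tr></table>"
    )
    rows = extract_rows(sample)
    print(rows)                  # [['Item', 'Qty'], ['Apples', '3'], ['Pears 5']]
    print(irregular_rows(rows))  # [(2, ['Pears 5'])]
```

In the sample, the third row's two fields have been joined into one cell ("Pears 5"), so it is reported as irregular; deciding how to split it back is left to document-specific logic.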

      • Anonymous commented

        This would be a great feature indeed. The amount of data held in these not-directly-readable formats is huge, and the data they contain is often quite well structured!

        Your Office 'Word' program can apparently already do it: it opens PDF documents produced with different PDF software simply by applying the PDF standard in reverse.

        This enables a workaround, by the way:

        - Download the PDF
        - Open it with MS Word (not all files are readable, e.g. documents that were scanned or printed as images)
        - Copy the table and paste it into Excel
        - Reshape it appropriately and point the query at the Excel file

        Needless to say, refreshing is rather manual and painful! Still, for large tables that aren't updated very often, I guess it may be worth the work.

        Unfortunately, some data holders are protective about letting you query their data without going through their user interface, and are reluctant to offer plain CSV or another directly readable format. This is typical of governmental 'open data' websites!

        Looking forward to having this in place :)
        You rock, PBI guys!

      • Anonymous commented

        This will help a great deal with exporting PDFs to many departments in my organization.

      • Anonymous commented

        I have the same need as everyone else, but I'm not sure this functionality should be built into Power Query. Overall, I need to get data from web pages behind logins (aka web scraping) and also to download and parse PDF tables. I currently use a third-party web-scraping and PDF-extraction service for this, and it works. I think a Microsoft service with Power Query and Microsoft Flow integration would be beneficial.

        Just imagine the mountains of data locked up in web pages and PDFs. But most tables are not that simple to parse: there are JavaScript postbacks, badly coded websites, and PDFs whose tables cannot simply be extracted by magic. A visual tool that helps write the Power Query code in debug mode as you step through a website or PDF is needed to make this a strong offering instead of one that is merely good enough.

      • Peter Schmidt commented

        I have a customer that wishes to analyse their phone bill, but the "electronic" version their provider sends them is a 400 page PDF document!! You can use Excel as an intermediate step, but columns get transposed so manual data wrangling is still required. This feature cannot come soon enough!

      • Colin Miles commented

        Many individual businesses now send PDF receipts via email; the ability to parse that data would be amazing for granular project visibility.
