Can a PDF-to-Excel converter extract data from a scanned PDF?

Not directly. Scanned PDFs are images of pages and contain no real text data to extract. They would need OCR (optical character recognition) processing first.

Will formulas from the original document be preserved?

No. PDFs only contain the visible result of any calculation, not the underlying formula, since PDF is a presentation format, not a data format.

Why are some of my rows split into two in the extracted spreadsheet?

This typically happens when a line of text in the original PDF wraps onto two visual lines. The extraction tool assigns rows by vertical text position, and each wrapped line sits at its own height, so the tool may interpret one logical row as two separate rows.
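If you end up fixing many split rows, the repair can be scripted. The following is a minimal sketch, not a feature of any particular converter: it assumes a wrapped line shows up as a row whose first cell is empty, and it folds that row back into the one above. The function name merge_wrapped_rows and the sample data are hypothetical.

```python
def merge_wrapped_rows(rows: list[list[str]]) -> list[list[str]]:
    """Fold rows with an empty first cell into the row above (a heuristic)."""
    merged: list[list[str]] = []
    for row in rows:
        # Assumption: a continuation line has no value in its first column.
        is_continuation = bool(merged) and bool(row) and row[0].strip() == ""
        if not is_continuation:
            merged.append(list(row))
            continue
        previous = merged[-1]
        for i, cell in enumerate(row):
            cell = cell.strip()
            if not cell:
                continue
            # Pad the previous row if the continuation is wider than it.
            while len(previous) <= i:
                previous.append("")
            previous[i] = f"{previous[i]} {cell}".strip()
    return merged


rows = [
    ["2024-01-05", "Office supplies, printer", "84.10"],
    ["", "paper and toner", ""],  # wrapped second line of the description
    ["2024-01-09", "Courier", "12.00"],
]
fixed = merge_wrapped_rows(rows)
```

The heuristic only works when the first column is always filled in for real rows (a date or an invoice number, for example), so check the result against the original PDF.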

Is there a way to extract just one specific table from a multi-page PDF?

Most tools extract by page rather than by detecting individual tables. If you only need one table, note which page it is on, or isolate that page before converting, so there is less to clean up after extraction.

How to Extract Data From a PDF Into Excel

Why this comes up so often

A huge amount of business data lives in PDF form: bank statements, invoices from vendors, government filings, exported reports. The information itself is structured and tabular, but it arrives in a format built for reading, not recalculating. Getting that data into Excel means reconstructing structure from what is, technically, just positioned text on a flat page.

How PDF-to-Excel extraction actually works

A typical PDF doesn't store an internal concept of "this is a table." It stores only where each piece of text sits on the page. A PDF-to-Excel tool reads the position of every text element and groups items that share a similar vertical position into the same row. In effect, it reconstructs a table by inferring structure from layout rather than by reading an actual table object.

This means extraction accuracy depends heavily on how clean and consistent the original PDF's layout is.
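To make the row-grouping idea concrete, here is a minimal sketch in Python. It is not how any specific converter is implemented. The TextItem shape, the y_tolerance parameter, and the left-to-right ordering of cells within a row are all assumptions made for illustration.

```python
from typing import NamedTuple


class TextItem(NamedTuple):
    """One piece of positioned text (hypothetical shape for this sketch)."""
    text: str
    x: float  # horizontal position of the item's left edge
    y: float  # vertical position of the item


def group_into_rows(items: list[TextItem], y_tolerance: float = 2.0) -> list[list[str]]:
    """Put items with a similar vertical position into the same row."""
    rows: list[list[TextItem]] = []
    row_y: list[float] = []  # y of the first item placed in each row

    # Walk the items in order of vertical position (assumes y grows downward).
    for item in sorted(items, key=lambda it: it.y):
        if rows and abs(item.y - row_y[-1]) <= y_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
            row_y.append(item.y)

    # Within each row, order the cells left to right.
    return [[it.text for it in sorted(row, key=lambda it: it.x)] for row in rows]


items = [
    TextItem("Total", 50, 120.4),
    TextItem("Date", 50, 100.0),
    TextItem("Amount", 200, 100.3),
    TextItem("42.00", 200, 120.0),
]
table = group_into_rows(items)
# table is [["Date", "Amount"], ["Total", "42.00"]]
```

The tolerance is where layout quality matters: if it is too small, slightly misaligned text splits into extra rows, and if it is too large, neighbouring rows merge.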

What extracts well

Simple, single tables with consistent row heights and clear column alignment
Invoices and statements with a standard line-item format
Reports generated directly from a database or spreadsheet (rather than designed manually)

What's harder to extract accurately

Tables with merged or spanning cells
Multiple tables positioned close together on the same page
Documents with inconsistent spacing or unusual fonts
Scanned (image-based) tables, which need OCR before any extraction is possible at all

Step-by-step: extracting PDF data to Excel

Confirm your PDF has real, selectable text (not a scanned image). To check, try highlighting text in your PDF viewer.
Upload the file to a PDF-to-Excel converter.
Download the resulting spreadsheet, which typically places each PDF page on its own sheet.
Review the extracted rows against the original PDF, checking especially for any rows that may have been split or merged incorrectly.
Clean up column headers and remove any stray rows from page headers or footers that got pulled in alongside the actual data.
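For a long table, the review step can be partly automated. The sketch below is one possible heuristic and not part of any converter: it assumes you have loaded the extracted rows as lists of strings, treats the first row as the header, and flags rows that have a different number of cells or an empty first cell. The name flag_suspect_rows is hypothetical.

```python
def flag_suspect_rows(rows: list[list[str]]) -> list[int]:
    """Return indexes of rows that look split or merged, judged against the header."""
    if not rows:
        return []
    expected = len(rows[0])  # assumption: the first row is the header
    suspects = []
    for index, row in enumerate(rows[1:], start=1):
        wrong_width = len(row) != expected
        missing_key = not row or row[0].strip() == ""
        if wrong_width or missing_key:
            suspects.append(index)
    return suspects


rows = [
    ["Date", "Description", "Amount"],
    ["2024-01-05", "Office supplies", "84.10"],
    ["", "paper and toner", ""],      # likely a wrapped line
    ["2024-01-09", "Courier 12.00"],  # likely two cells merged into one
]
suspects = flag_suspect_rows(rows)
# suspects is [2, 3]
```

A flagged row is only a candidate for review. Tables with legitimately blank first cells will produce false alarms, so compare each flagged row with the original PDF.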

Tips to improve extraction accuracy

Extract one table at a time when possible. If a PDF contains several distinct tables, isolating and converting just the relevant page can reduce confusion in the output compared to extracting everything at once.

Expect to do some manual cleanup. Even well-extracted data often needs a final pass to fix column headers, remove repeated header rows from multi-page tables, and correct any row that got split across two lines in the original PDF.

For one-off, simple data, manual copy-paste might be faster. If you only need a handful of numbers from a short PDF, it is sometimes quicker to copy the selectable text straight out of the PDF viewer, paste it into Excel, and split it with Excel's Text to Columns feature than to run a full extraction.

Cleaning up extracted data efficiently in Excel

Once your data is in Excel, a few built-in features handle most of the common cleanup work quickly. Excel's Text to Columns wizard can split a single extracted column into multiple proper columns if your data came through merged together. The Remove Duplicates feature handles any repeated header rows pulled in from a multi-page table. Be aware that it also removes data rows that are genuinely identical, such as two matching transactions on the same day, so check the count of removed rows that Excel reports. And a quick Find & Replace pass can clean up stray characters or extra whitespace that sometimes comes through in extracted text.
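If you convert the same kind of PDF regularly, the same cleanup can be scripted before the data reaches Excel. This is a minimal sketch under stated assumptions: the rows are already loaded as lists of strings, the first row is the header, and clean_rows is a hypothetical helper name. It collapses extra whitespace in every cell and drops later rows that repeat the header, while leaving identical data rows alone.

```python
import re


def clean_rows(rows: list[list[str]]) -> list[list[str]]:
    """Normalize whitespace and drop repeats of the header row."""
    if not rows:
        return []
    # Collapse runs of whitespace inside each cell and trim the ends.
    normalized = [[re.sub(r"\s+", " ", cell).strip() for cell in row] for row in rows]
    header = normalized[0]  # assumption: the first row is the header
    # Keep the first header, drop any later row that matches it exactly.
    return [header] + [row for row in normalized[1:] if row != header]


rows = [
    ["Date", "Description ", "Amount"],
    ["2024-01-05", "Office   supplies", " 84.10"],
    ["Date", "Description", "Amount"],  # header repeated from page 2
    ["2024-01-05", "Office supplies", "84.10"],  # a genuine duplicate entry
]
cleaned = clean_rows(rows)
```

Matching only against the header is a deliberate choice here: it removes the page-break repeats without touching two data rows that happen to be the same.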

If you just need the text, not a spreadsheet

If your real goal is editable text rather than spreadsheet rows and columns, a PDF-to-Word conversion will likely give cleaner, more readable results for prose-heavy documents.