Table Detection


Our Table Detection service finds tables in financial documents such as PDF, Word, Excel or email files, localises them to specific pages and text regions, and digests them into rows and columns.
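To make "pages, text regions, rows and columns" concrete, here is a minimal sketch of what a detected table could look like as data. The class names (`Cell`, `DetectedTable`), their fields and the sample values are hypothetical and chosen for illustration only; they do not describe the service's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    row: int           # zero-based row index
    col: int           # zero-based column index
    text: str = ""     # an empty string stands for an empty cell
    row_span: int = 1  # number of rows the cell covers
    col_span: int = 1  # number of columns the cell covers


@dataclass
class DetectedTable:
    # All fields are illustrative assumptions, not the service's schema.
    page: int                                # page the table was found on
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) region on the page
    cells: list[Cell] = field(default_factory=list)

    def to_rows(self) -> list[list[str]]:
        """Lay the cells out as a rectangular grid of strings."""
        n_rows = max((c.row + c.row_span for c in self.cells), default=0)
        n_cols = max((c.col + c.col_span for c in self.cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in self.cells:
            # A spanning cell keeps its text in its top-left grid slot.
            grid[c.row][c.col] = c.text
        return grid


# Made-up sample values, for illustration only.
table = DetectedTable(
    page=3,
    bbox=(72.0, 140.5, 523.0, 310.0),
    cells=[
        Cell(0, 0, "Item"), Cell(0, 1, "Q1"), Cell(0, 2, "Q2"),
        Cell(1, 0, "Revenue"), Cell(1, 1, "120"), Cell(1, 2, "135"),
    ],
)
rows = table.to_rows()
```

Keeping the page and bounding box next to the cell grid means every extracted value can be traced back to where it appeared in the original document.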

Why is it important?

Most financial information in a document is presented in tables. For example, a financial statement might contain a company's profits and losses, the same figures over three-month and six-month views, and how they changed year on year. It is extremely important that all of the data is captured correctly, as a tiny error might completely change how investors view this information.

Our solution is resilient to changes in document layout, text and graphical features.

How does it work?

We use a combination of Computer Vision, Natural Language Processing and hand-crafted features to detect tables and recognise their internal structure.

Artificial Intelligence allows computers to think, Computer Vision allows them to see, and Natural Language Processing allows them to read. Hand-crafted features, informed by domain knowledge, solve edge cases that are difficult to learn from data.

Our approach allows the system to learn over time and to generalise to unseen examples in the wild.

We benchmark our solution on a wide range of datasets and metrics. The datasets range from open-source data (ICDAR, PubTabNet) to much bigger, pristine-quality in-house tagged use cases broken down by document type: financial statements, annual reports, quarterly reports, solvency, bonds and more.

We know exactly how well our model performs on various aspects of table detection: rows, columns, spanning cells, empty cells, totals, headers and table localisation. Our custom metrics, together with quality data, provide insights like no other.
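As an illustration of measuring performance per aspect, the sketch below computes precision, recall and F1 over exactly matching cells, then repeats the calculation for spanning cells and empty cells only. This is a generic, simplified metric written for this example, not our custom metrics; the cell tuple layout and the function names are assumptions.

```python
def cell_scores(predicted, expected):
    """Precision, recall and F1 over exactly matching cells.

    Each cell is a (row, col, row_span, col_span, text) tuple
    (an assumed layout, for illustration only).
    """
    pred, gold = set(predicted), set(expected)
    matched = len(pred & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


def is_spanning(cell):
    return cell[2] > 1 or cell[3] > 1


def is_empty(cell):
    return cell[4] == ""


def scores_for(aspect, predicted, expected):
    """Score only the cells that satisfy the given aspect predicate."""
    return cell_scores(filter(aspect, predicted), filter(aspect, expected))


# Made-up example: the prediction misses the column span of the header cell.
gold = [(0, 0, 1, 2, "2023"), (1, 0, 1, 1, "Revenue"), (1, 1, 1, 1, "120"),
        (2, 0, 1, 1, "Total"), (2, 1, 1, 1, "")]
pred = [(0, 0, 1, 1, "2023"), (1, 0, 1, 1, "Revenue"), (1, 1, 1, 1, "120"),
        (2, 0, 1, 1, "Total"), (2, 1, 1, 1, "")]

overall = cell_scores(pred, gold)
spanning = scores_for(is_spanning, pred, gold)
empty = scores_for(is_empty, pred, gold)
```

Breaking the score down this way shows that a model can look good overall while failing completely on one aspect, here the spanning header cell.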