How to extract tabular data from PDFs using Semantic Table Extract
In this video, Semantic Evolution discusses how Semantic Table Extract parses unstructured documents by locating and extracting tabular data.
Semantic Table Extract is based on deep learning, combining state-of-the-art computer vision and natural language processing to accurately localise tables and recognise their internal structure.
Why is it challenging to extract tabular data from PDF files?
Tables next to charts or graphs
Tables with a title or paragraph between them
Tables with a lot of text in them
Tables that are side-by-side
Tables with different page layouts
Tables with different colours
Tables that are large or small
Tables with big lines or no lines
Advantages of Semantic Table Extract
Handles a wide range of complex tables.
Detects internal table structure.
Supports empty and complex spanning cells.
Handles complex cell merges and splitting.
Supports major and minor headers.
Supports totals and sub-totals.
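To make terms such as spanning cells, empty cells and headers concrete, here is a minimal Python sketch of one way an extracted table could be represented. It is an illustration only and not Semantic Table Extract's API or output format: the `Cell` class, its fields and the `to_grid` helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical model of one extracted table cell (not the product's format)."""
    row: int                 # row index of the cell's top-left corner
    col: int                 # column index of the cell's top-left corner
    text: str = ""           # an empty string models an empty cell
    row_span: int = 1        # number of rows the cell covers
    col_span: int = 1        # number of columns the cell covers
    is_header: bool = False  # True for header cells


def to_grid(cells: List[Cell], n_rows: int, n_cols: int) -> List[List[Optional[str]]]:
    """Expand spanning cells into a dense grid by repeating their text."""
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.row_span):
            for c in range(cell.col, cell.col + cell.col_span):
                grid[r][c] = cell.text
    return grid


# A small table with a two-row header, a spanning header cell,
# two empty cells and a totals row.
cells = [
    Cell(0, 0, "Region", row_span=2, is_header=True),
    Cell(0, 1, "Sales", col_span=2, is_header=True),
    Cell(1, 1, "Q1", is_header=True),
    Cell(1, 2, "Q2", is_header=True),
    Cell(2, 0, "North"), Cell(2, 1, "10"), Cell(2, 2, ""),
    Cell(3, 0, "Total"), Cell(3, 1, "10"), Cell(3, 2, ""),
]

grid = to_grid(cells, n_rows=4, n_cols=3)
for row in grid:
    print(row)
```

Recording spans explicitly, instead of storing only a flat grid, keeps the distinction between a cell that is genuinely empty and a position that is covered by a merged neighbour.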
About Semantic Evolution
We are a fast-growing technology firm with offices in London and Manhattan, dealing with great clients across the globe.
Our product uses artificial intelligence techniques to capture data from unstructured documents such as PDFs, spreadsheets and emails. Our parsing technology brings efficiency to repetitive tasks that would normally require time-consuming manual data extraction.
Our unique scientific approach, industry leadership and total transparency bring intelligence to our clients’ data.