Table detection in business document images by message passing networks. (July 2022)
- Record Type:
- Journal Article
- Title:
- Table detection in business document images by message passing networks. (July 2022)
- Main Title:
- Table detection in business document images by message passing networks
- Authors:
- Riba, Pau
Goldmann, Lutz
Terrades, Oriol Ramos
Rusticus, Diede
Fornés, Alicia
Lladós, Josep
- Abstract:
- Highlights: A table detection approach with heterogeneous formats for business documents working on anonymized data. A new graph neural network architecture that poses the table detection problems in terms of node and edge classification. A final consensus layer based on the belief propagation algorithm to marginalize the edge probability. Extensive experimentation on three document datasets.
Abstract: Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches. …
- Is Part Of:
- Pattern recognition. Volume 127(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 127(2022)
- Issue Display:
- Volume 127, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 127
- Issue:
- 2022
- Issue Sort Value:
- 2022-0127-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-07
- Subjects:
- Business document processing -- Anonymized document processing -- Table detection -- Graph neural networks -- Node and edge classification
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.108641
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22270.xml