Multi-modal graph reasoning for structured video text extraction. (April 2023)
- Record Type:
- Journal Article
- Title:
- Multi-modal graph reasoning for structured video text extraction. (April 2023)
- Main Title:
- Multi-modal graph reasoning for structured video text extraction
- Authors:
- Shi, Weitao
Wang, Han
Lou, Xin
- Abstract:
- Structured video text information extraction is a crucial part of video understanding: it recovers structured text fields from category-specific videos, such as the scores in basketball games or the identities of people in news broadcasts. Recent natural language models and text detectors have achieved state-of-the-art performance in video text detection and recognition. However, understanding text in unstructured video frames remains challenging in practice because of the variety of video text and frequent changes in text layout. Little work has focused on efficiently extracting structured information from video text. In this paper, we address this task by modeling a multi-modal attention graph over the video text. Specifically, we encode both the visual and textual features of detected text regions as the nodes of the graph, and we model the spatial layout relationships between text regions as its edges. Structured information extraction is then solved by iteratively propagating messages between text regions along the graph edges and inferring the structured category of each node. To improve the representation capacity of the graph, we further introduce a self-supervised contrastive loss on the visual embeddings of the text regions. To evaluate the proposed method thoroughly and to support future research, we release a new dataset collected and annotated from videos of several standard NBA regular-season and playoff matches. Experimental results demonstrate the superior performance of the proposed method over several state-of-the-art methods.
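The graph formulation described in the abstract (text regions as nodes, spatial layout as edges, iterative message passing, per-node category inference) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `TextRegion` type, the k-nearest-neighbour layout edges, concatenation as the fusion step, dot-product attention, and the fixed category prototypes (standing in for a learned classifier) are all hypothetical choices made for illustration; the contrastive loss is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class TextRegion:
    # Hypothetical container for one detected text region.
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
    visual: list[float]   # assumed visual embedding of the region crop
    textual: list[float]  # assumed embedding of the recognised string


def centre(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def spatial_neighbours(regions, k=2):
    """Undirected k-nearest-neighbour graph over box centres (layout edges)."""
    centres = [centre(r.box) for r in regions]
    nbrs = {i: set() for i in range(len(regions))}
    for i, ci in enumerate(centres):
        ranked = sorted((math.dist(ci, cj), j) for j, cj in enumerate(centres) if j != i)
        for _, j in ranked[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)  # keep the graph symmetric
    return nbrs


def node_features(regions):
    """Fuse the two modalities by concatenation (one simple, assumed choice)."""
    return [r.visual + r.textual for r in regions]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def propagate(feats, nbrs, steps=2):
    """Attention-weighted message passing: each node averages itself with a
    softmax(dot-product)-weighted mix of its neighbours, repeated `steps` times."""
    h = [list(f) for f in feats]
    for _ in range(steps):
        new_h = []
        for i, hi in enumerate(h):
            js = sorted(nbrs[i])
            if not js:  # isolated node: nothing to aggregate
                new_h.append(hi)
                continue
            scores = [dot(hi, h[j]) for j in js]
            m = max(scores)  # subtract the max for a numerically stable softmax
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            msg = [sum(w[t] / z * h[j][d] for t, j in enumerate(js)) for d in range(len(hi))]
            new_h.append([(a + b) / 2.0 for a, b in zip(hi, msg)])
        h = new_h
    return h


def classify(h, prototypes):
    """Label each node with the category whose (hypothetical) prototype scores highest."""
    return [max(prototypes, key=lambda c: dot(hi, prototypes[c])) for hi in h]


if __name__ == "__main__":
    # Toy scoreboard: team name, score, and game clock laid out left to right.
    regions = [
        TextRegion((0, 0, 40, 20), [1.0, 0.0], [0.0, 0.0]),
        TextRegion((50, 0, 90, 20), [0.0, 1.0], [0.0, 0.0]),
        TextRegion((200, 0, 240, 20), [0.0, 0.0], [1.0, 0.0]),
    ]
    protos = {"team": [1, 0, 0, 0], "score": [0, 1, 0, 0], "clock": [0, 0, 1, 0]}
    graph = spatial_neighbours(regions, k=1)
    print(classify(propagate(node_features(regions), graph), protos))
```

Building edges from box-centre distance keeps the layout signal explicit and independent of the features; a trained model would replace the fixed prototypes with a learned classification layer and learn the attention scoring as well.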
- Is Part Of:
- Computers & electrical engineering. Volume 107(2023)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 107(2023)
- Issue Display:
- Volume 107, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 107
- Issue:
- 2023
- Issue Sort Value:
- 2023-0107-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Structured text extraction -- Video text -- Multi-modal -- Graph Neural Network
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2023.108641
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26153.xml