VSRNet: End-to-end video segment retrieval with text query. (November 2021)
- Record Type:
- Journal Article
- Title:
- VSRNet: End-to-end video segment retrieval with text query. (November 2021)
- Main Title:
- VSRNet: End-to-end video segment retrieval with text query
- Authors:
- Sun, Xiao
Long, Xiang
He, Dongliang
Wen, Shilei
Lian, Zhouhui
- Abstract:
- Highlights: We propose a novel framework that combines video retrieval and segment localization in one network; joint training improves the performance of each task. We introduce a text-aligned attention mechanism to efficiently generate temporal proposals and a collaborative ranking strategy to improve the performance of video segment retrieval. Extensive experiments conducted on DiDeMo and ActivityNet Captions demonstrate the superiority of our method on the VSR task.
Abstract: Users are sometimes interested in specific segments of an untrimmed video when using a video search engine. To address this demand, we explore a novel research topic: text-query-based video segment retrieval (VSR). Unlike the conventional video retrieval task or the task of localizing text descriptions in a single video, it requires retrieving the most relevant video from a large collection as well as localizing the start and end timestamps of the segment in that video that best matches the text query. A direct solution is to perform video-level matching first and then apply description localization to the resulting video candidates. Such two-stage methods cannot exploit the complementary information of each stage and are time-consuming at inference. In this paper, we propose VSRNet, an end-to-end framework with two branches that efficiently retrieves video at segment granularity. In the first branch, individual videos and texts are mapped to a common space for stand-alone ranking. In the second branch, we propose a supervised text-aligned attention mechanism and calculate the response of every frame to the text query; frames with high scores are then aggregated into segment proposals. Extensive experiments conducted on ActivityNet Captions and DiDeMo verify the effectiveness of our method and show that our solution significantly outperforms the state of the art.
- Is Part Of:
- Pattern recognition. Volume 119(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 119(2021)
- Issue Display:
- Volume 119, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 119
- Issue:
- 2021
- Issue Sort Value:
- 2021-0119-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Video segment retrieval -- Video retrieval -- Description localization
00-01 -- 99-00
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108027
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17786.xml