Timed-image based deep learning for action recognition in video sequences. (August 2020)
- Record Type:
- Journal Article
- Title:
- Timed-image based deep learning for action recognition in video sequences. (August 2020)
- Main Title:
- Timed-image based deep learning for action recognition in video sequences
- Authors:
- Atto, Abdourrahmane Mahamane
Benoit, Alexandre
Lambert, Patrick
- Abstract:
- Highlights: Image data conditioning issue: the paper first highlights that referring 2D spatial convolution to its 1D Hilbert-based instance is highly accurate for information compressibility upon image frames associated with a wide class of video files.
Video library conditioning issue: because of the above compressibility, the paper proposes converting a 2D + X data volume into a single meta-image file format called timed-image, prior to machine learning frameworks. This conversion is such that any 2D frame of the 2D + X data is reshaped as a 1D array indexed by a Hilbert space-filling curve and the third variable X of the initial file format becomes the second variable in the meta-image format.
Sensitive action recognition benchmark: the paper provides two datasets having respectively 2 and 3 violence video categories. The datasets involve visual non-violent, moderate and extreme violence actions.
Sensitive action recognition issue: outstanding 2-level and 3-level violence classification results are obtained from deep convolutional neural networks trained from scratch and operating on meta-image databases.
Abstract: The paper addresses two issues relative to machine learning on 2D + X data volumes, where 2D refers to image observation and X denotes a variable that can be associated with time, depth, wavelength, etc. The first issue addressed is conditioning these structured volumes for compatibility with respect to convolutional neural networks operating on 2D image file formats. The second issue is associated with sensitive action detection in the "2D + Time" case (video clips and image time series). For the data conditioning issue, the paper first highlights that referring 2D spatial convolution to its 1D Hilbert-based instance is highly accurate for information compressibility upon tight frames of convolutional networks. As a consequence of this compressibility, the paper proposes converting the 2D + X data volume into a single meta-image file format, prior to machine learning frameworks. This conversion is such that any 2D frame of the 2D + X data is reshaped as a 1D array indexed by a Hilbert space-filling curve and the third variable X of the initial file format becomes the second variable in the meta-image format.
For the sensitive action recognition issue, the paper provides: (i) a 3-category video database involving non-violent, moderate and extreme violence actions; (ii) the conversion of this database into a timed meta-image database from the 2D + Time to 2D conditioning stage described above; and (iii) outstanding 2-level and 3-level violence classification results from deep convolutional neural networks operating on meta-image databases.
- Is Part Of:
- Pattern recognition. Volume 104(2020:Aug.)
- Journal:
- Pattern recognition
- Issue:
- Volume 104(2020:Aug.)
- Issue Display:
- Volume 104 (2020)
- Year:
- 2020
- Volume:
- 104
- Issue Sort Value:
- 2020-0104-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-08
- Subjects:
- Data conditioning -- Video analysis -- Deep learning -- Convolution frames -- Hilbert space-filling curve -- Action recognition -- Violence detection
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107353
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13672.xml
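
Illustrative note on the abstract: the 2D + X to meta-image conversion it describes (each 2D frame reshaped as a 1D array indexed by a Hilbert space-filling curve, with X becoming the second variable) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes square, single-channel frames whose side is a power of two, uses the standard iterative distance-to-coordinate Hilbert mapping, and the function names (`hilbert_d2xy`, `hilbert_indices`, `to_timed_image`) are hypothetical. The record does not say how the paper handles curve orientation, non-square frames or colour channels.

```python
import numpy as np


def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. Standard iterative construction.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_indices(n):
    """Return (rows, cols) index arrays visiting an n x n grid in Hilbert order."""
    if n < 1 or n & (n - 1):
        raise ValueError("frame side must be a power of two (assumption of this sketch)")
    coords = [hilbert_d2xy(n, d) for d in range(n * n)]
    xs, ys = zip(*coords)
    return np.array(ys), np.array(xs)


def to_timed_image(volume):
    """Convert a (T, n, n) volume into an (n*n, T) meta-image.

    Axis 0 of the result follows the Hilbert curve over each frame;
    axis 1 is the original third variable X (e.g. time).
    """
    t, h, w = volume.shape
    if h != w:
        raise ValueError("square frames assumed in this sketch")
    rows, cols = hilbert_indices(h)
    # Fancy indexing gathers every frame along the curve: shape (T, n*n).
    return volume[:, rows, cols].T


# Hypothetical toy input: 2 frames of 4 x 4 pixels.
volume = np.arange(2 * 4 * 4).reshape(2, 4, 4)
timed = to_timed_image(volume)
```

Each column of `timed` is one frame unrolled along the curve, so consecutive entries in a column are always neighbouring pixels in the original frame; that locality is the usual reason for preferring a Hilbert ordering over a row-by-row raster scan.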