Adversarial gesture generation with realistic gesture phasing. (June 2020)
- Record Type:
- Journal Article
- Title:
- Adversarial gesture generation with realistic gesture phasing. (June 2020)
- Main Title:
- Adversarial gesture generation with realistic gesture phasing
- Authors:
- Ferstl, Ylva
Neff, Michael
McDonnell, Rachel
- Abstract:
- Highlights: We explore generative adversarial networks (GANs) for automatic speech gesture generation for virtual humans. We define our training objective as a series of smaller sub-problems, including believable gesture dynamics with periods of both energy and acceleration, as well as pause. To assess gesture dynamics, we train a classifier to automatically detect gesture phases, and we validate our results on a second speaker.
Abstract: Conversational virtual agents are increasingly common and popular, but modeling their non-verbal behavior is a complex problem that remains unsolved. Gesture is a key component of speech-accompanying behavior but is difficult to model due to its non-deterministic and variable nature. We explore the use of a generative adversarial training paradigm to map speech to 3D gesture motion. We define the gesture generation problem as a series of smaller sub-problems, including plausible gesture dynamics, realistic joint configurations, and diverse and smooth motion. Each sub-problem is monitored by separate adversaries. For the problem of enforcing realistic gesture dynamics in our output, we train three classifiers with different levels of detail to automatically detect gesture phases. We hand-annotate and evaluate over 3.8 hours of gesture data for this purpose, including samples of a second speaker for comparing and validating our results. We find adversarial training to be superior to the use of a standard regression loss and discuss the benefit of each of our training objectives. We recorded a dataset of over 6 hours of natural, unrehearsed speech with high-quality motion capture, as well as audio and video recording. …
- Is Part Of:
- Computers & graphics. Volume 89(2020)
- Journal:
- Computers & graphics
- Issue:
- Volume 89(2020)
- Issue Display:
- Volume 89, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 89
- Issue:
- 2020
- Issue Sort Value:
- 2020-0089-2020-0000
- Page Start:
- 117
- Page End:
- 130
- Publication Date:
- 2020-06
- Subjects:
- Gesture generation -- Machine learning -- Generative adversarial networks -- Gesture segmentation -- Virtual humans
Computer graphics -- Periodicals
006.6
- Journal URLs:
- http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cag.2020.04.007
- Languages:
- English
- ISSNs:
- 0097-8493
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.700000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13523.xml