A simple generative model of incremental reference resolution for situated dialogue. (January 2017)
- Record Type:
- Journal Article
- Title:
- A simple generative model of incremental reference resolution for situated dialogue. (January 2017)
- Main Title:
- A simple generative model of incremental reference resolution for situated dialogue
- Authors:
- Kennington, Casey
Schlangen, David
- Abstract:
- Highlights: We present a simple generative model of reference resolution. The model resolves references incrementally. The model learns a grounded semantics from data. The model is robust to noise in the speech and object representations. The model has been rigorously evaluated in three languages and through varying aspects of the model.
Abstract: Referring to visually perceivable objects is a very common occurrence in everyday language use. In order to produce expressions that refer, the speaker needs to be able to pick out visual properties that the referred object has and determine the words that name those properties, such that the expression can direct a listener's attention to the intended object. The speaker can aid the listener by looking in the direction of the object and by providing a pointing gesture to indicate it. In order to resolve the reference, the listener has a difficult job to do: simultaneously use all of the linguistic and non-linguistic information; the words of the referring expression that denote properties of the object, such as its colour or shape, need to already be known, and the non-linguistic gaze direction and pointing gesture of the speaker need to be incorporated. Crucially, the listener does not wait until the end of the referring expression before she begins to resolve it; rather, she is interpreting it as it unfolds. A model that resolves referring expressions as the listener must be able to do all of these things. In this paper, we present such a generative model of reference resolution. We explain our model and show empirically through a series of experiments that the model can work incrementally (i.e., word for word) as referring expressions unfold, can incorporate multimodal information such as gaze and pointing gestures in two ways, can learn a grounded meaning of words in the referring expression, can incorporate contextual (i.e., saliency) information, and is robust to noisy input such as automatic speech recognition transcriptions, as well as uncertainty in the representation of the candidate objects.
- Is Part Of:
- Computer speech & language. Volume 41(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 41(2016)
- Issue Display:
- Volume 41, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 41
- Issue:
- 2016
- Issue Sort Value:
- 2016-0041-2016-0000
- Page Start:
- 43
- Page End:
- 67
- Publication Date:
- 2017-01
- Subjects:
- Dialogue -- Situated -- Incremental -- Stochastic -- Reference resolution
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.04.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2481.xml