Investigating topics, audio representations and attention for multimodal scene-aware dialog. (November 2020)