Dynamic self-attention with vision synchronization networks for video question answering. (December 2022)