A multimodal attention fusion network with a dynamic vocabulary for TextVQA. (February 2022)