Towards local visual modeling for image captioning. (June 2023)