A core region captioning framework for automatic video understanding in story video contents. (25th February 2022)