A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. (2019)