TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering. (22nd February 2023)