Joint speaker diarization and speech recognition based on region proposal networks. (March 2022)