Unsupervised speech representation learning for behavior modeling using triplet enhanced contextualized networks. (November 2021)