Representation learning using step-based deep multi-modal autoencoders. (November 2019)