A dynamic penalty approach to state constraint handling in deep reinforcement learning. (July 2022)