A deep deterministic policy gradient algorithm based on averaged state-action estimation. (July 2022)