A smoothed Q‐learning algorithm for estimating optimal dynamic treatment regimes. (26th December 2018)