mpo max

We introduce a new algorithm for reinforcement learning called Maximum a-posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative-entropy ...

People named Max Mpo; Bebe Mo Maxine; Max Mpofu; Lampard Max Moeketsi; Max Monasta Mosta Lagell; Max Lim (morcana).