Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning