Actor-critic multi-objective reinforcement learning for non-linear utility functions