Actor-Critic Multi-Objective Reinforcement Learning for Non-Linear Utility Functions