chanmuzi
<Reward> Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation (2024.01)