chanmuzi
<Alignment> RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback