chanmuzi
<Alignment> Fine-Grained Human Feedback Gives Better Rewards for Language Model Training