| Name | Last commit | Last update |
|---|---|---|
| full_hh_rlhf.py | | |
| geo3k.py | | |
| gsm8k.py | | |
| hellaswag.py | | |
| math_dataset.py | | |

## What does this PR do?

1. Separate the prompt part and the response part in the reward manager so that the format reward is computed on the response only, which avoids reward leakage from the prompt.
2. Update the reward score function for the Geometry3k dataset.
3. Update the content of the README file.

## Who can review?

@vermouth1992 @PeterSH6
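
## Example: format reward leakage

The first change above can be illustrated with a minimal sketch. It assumes a format reward that checks for a `\boxed{...}` answer. The pattern, the function names (`format_reward`, `leaky_score`, `separated_score`), and the sample strings are all hypothetical and are not taken from this PR's code. The point is only that scoring the concatenated prompt and response can grant the format reward because of text in the prompt, while scoring the response alone cannot.

```python
import re

# Hypothetical format pattern (an assumption for illustration):
# the text must contain a \boxed{...} answer.
FORMAT_PATTERN = re.compile(r"\\boxed\{.+?\}")


def format_reward(text: str) -> float:
    """Return 1.0 if the text contains a boxed answer, else 0.0."""
    return 1.0 if FORMAT_PATTERN.search(text) else 0.0


def leaky_score(prompt: str, response: str) -> float:
    # Scores prompt + response together, so format markers that appear
    # in the prompt (e.g. in the instructions) also earn the reward.
    return format_reward(prompt + response)


def separated_score(prompt: str, response: str) -> float:
    # Scores only the response; the prompt is deliberately ignored
    # and cannot contribute to the format reward.
    return format_reward(response)


# Hypothetical sample: the prompt itself mentions the required format.
prompt = "Put the final answer in \\boxed{answer}. What is 2 + 2?"
response = "The answer is 4."

print(leaky_score(prompt, response))      # 1.0: reward leaked from the prompt
print(separated_score(prompt, response))  # 0.0: response has no boxed answer
```

A response that does follow the format, such as `"So the result is \\boxed{4}."`, still receives the full format reward from `separated_score`.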