Name | Last commit | Last update |
---|---|---|
.github/workflows | ||
docker | ||
docs | ||
examples | ||
patches | ||
scripts | ||
tests | ||
verl | ||
.gitignore | ||
.readthedocs.yaml | ||
.style.yapf | ||
LICENSE | ||
Notice.txt | ||
README.md | ||
pyproject.toml | ||
requirements.txt | ||
setup.py | ||
## Description

Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax is a simple, efficient, and stable RL algorithm customized for LLM training, with theoretical guarantees for variance reduction. The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper experimented with ReMax, but verl did not provide an implementation, so this change adds one.

## Changes

- Added RayReMaxTrainer implementation
- Added example scripts for ReMax training
- Added documentation for the ReMax algorithm

## Testing

- Tested ReMax example scripts with Qwen models

Validation reward when optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:

<img width="501" alt="Screenshot 2025-02-09 20 51 14" src="https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add" />

The curve shows that ReMax training is effective in this setup, though performance could likely be improved further with hyperparameter tuning.

## Documentation

- Added ReMax documentation
- Updated example configurations

## Checklist

- [x] Code follows project's style guidelines (yapf formatted)
- [x] Tests added/updated and passing
- [x] Documentation updated
- [x] Example scripts added
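
For readers unfamiliar with ReMax, the variance reduction mentioned in the description comes from a baseline: as described in the ReMax paper, the reward of the greedy-decoded response to a prompt is subtracted from the reward of the sampled response. The following is a minimal sketch of that idea in plain Python. The function names `remax_advantages` and `broadcast_to_tokens` are hypothetical and chosen for illustration; they are not taken from verl or from the `RayReMaxTrainer` added here.

```python
from typing import List, Sequence


def remax_advantages(
    sampled_rewards: Sequence[float],
    greedy_rewards: Sequence[float],
) -> List[float]:
    """Sequence-level advantage: sampled reward minus the greedy-response baseline.

    Hypothetical helper for illustration; not part of verl's API.
    """
    if len(sampled_rewards) != len(greedy_rewards):
        raise ValueError("expected one greedy baseline reward per sampled reward")
    return [r - b for r, b in zip(sampled_rewards, greedy_rewards)]


def broadcast_to_tokens(
    advantages: Sequence[float],
    response_masks: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Copy each sequence-level advantage onto its response tokens.

    A mask entry of 1 marks a real response token, 0 marks padding.
    """
    return [
        [adv * m for m in mask]
        for adv, mask in zip(advantages, response_masks)
    ]


if __name__ == "__main__":
    # Toy rewards for three prompts (e.g. 1.0 = correct answer, 0.0 = wrong).
    sampled = [1.0, 0.0, 1.0]
    greedy = [0.0, 0.0, 1.0]
    adv = remax_advantages(sampled, greedy)
    masks = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
    print(adv)
    print(broadcast_to_tokens(adv, masks))
```

Only the first prompt yields a nonzero advantage in this toy input, because it is the only one where the sampled response scored differently from the greedy one. The trade-off of this baseline is one extra greedy generation per prompt, in exchange for not training a separate value model.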