Feature/add remax support (#234) · 769b8d04
## Description

Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax is a simple, efficient, and stable RL algorithm customized for LLM training, with theoretical guarantees for variance reduction. The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper experimented with ReMax, but verl did not provide an implementation, so this change adds one.

## Changes

- Added RayReMaxTrainer implementation
- Added example scripts for ReMax training
- Added documentation for ReMax algorithm

## Testing

- Tested ReMax example scripts with Qwen models

Validation reward when optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:

<img width="501" alt="Screenshot 2025-02-09 20 51 14" src="https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add" />

The curve demonstrates the effectiveness of ReMax, though its performance can be further improved through hyperparameter tuning.

## Documentation

- Added ReMax documentation
- Updated example configurations

## Checklist

- [x] Code follows project's style guidelines (yapf formatted)
- [x] Tests added/updated and passing
- [x] Documentation updated
- [x] Example scripts added
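
## Illustrative sketch

The variance-reduction idea mentioned in the description can be sketched roughly as follows. In the ReMax paper, the baseline for each prompt is the reward of the greedily decoded response, which is subtracted from the reward of the sampled response before the REINFORCE update. The code below is a minimal, framework-free sketch under that assumption. It is not the `RayReMaxTrainer` code from this commit, and the names `remax_advantages` and `reinforce_loss` are hypothetical helpers introduced only for illustration.

```python
from typing import List


def remax_advantages(sample_rewards: List[float],
                     greedy_rewards: List[float]) -> List[float]:
    """Per-prompt advantage: sampled-response reward minus greedy-response reward.

    Assumption: one sampled and one greedy response per prompt, each with a
    scalar sequence-level reward.
    """
    if len(sample_rewards) != len(greedy_rewards):
        raise ValueError("expected one greedy reward per sampled reward")
    return [r - b for r, b in zip(sample_rewards, greedy_rewards)]


def reinforce_loss(logprobs: List[float], advantages: List[float]) -> float:
    """Sequence-level REINFORCE surrogate: mean of -(advantage * log-prob)."""
    if len(logprobs) != len(advantages):
        raise ValueError("expected one log-prob per advantage")
    total = sum(a * lp for a, lp in zip(advantages, logprobs))
    return -total / len(logprobs)


# Toy numbers, made up for illustration only.
sample_rewards = [1.0, 0.0, 1.0]   # rewards of sampled responses
greedy_rewards = [1.0, 1.0, 0.0]   # rewards of greedy responses (baseline)
logprobs = [-2.0, -1.0, -3.0]      # summed log-probs of sampled responses

advantages = remax_advantages(sample_rewards, greedy_rewards)
loss = reinforce_loss(logprobs, advantages)
print(advantages, loss)
```

When the sampled response scores the same as the greedy one, its advantage is zero and it contributes nothing to the gradient, so only responses that differ from the greedy baseline drive the update.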
Ziniu Li committed
| Name |
|---|
| data_preprocess |
| generation |
| grpo_trainer |
| ppo_trainer |
| ray |
| remax_trainer |
| sft/gsm8k |
| split_placement |