examples · v0.2 · ZhangXiaoyun / verl

Feature/add remax support (#234) · 769b8d04

## Description
Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax
is a simple, efficient, and stable RL algorithm customized for LLM
training, with theoretical guarantees for variance reduction.
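
As a rough illustration of the variance-reduction idea: ReMax is a REINFORCE-style method that subtracts the reward of the greedy-decoded response from the reward of the sampled response for the same prompt. The sketch below shows only that baseline subtraction. The function name, argument names, and list-based data layout are hypothetical and are not taken from the `RayReMaxTrainer` code in this change.

```python
from typing import List


def remax_advantages(
    sampled_rewards: List[float],
    greedy_rewards: List[float],
    response_masks: List[List[int]],
) -> List[List[float]]:
    """Per-token ReMax-style advantages for a batch of prompts (illustrative).

    For each prompt, the scalar advantage is the reward of the sampled
    response minus the reward of the greedy response (the baseline).
    That scalar is copied to every real token and zeroed on padding.
    """
    if not (len(sampled_rewards) == len(greedy_rewards) == len(response_masks)):
        raise ValueError("all inputs must have one entry per prompt")

    advantages = []
    for r_sample, r_greedy, mask in zip(sampled_rewards, greedy_rewards, response_masks):
        scalar_adv = r_sample - r_greedy  # greedy reward acts as the baseline
        advantages.append([scalar_adv * m for m in mask])  # mask: 1 = token, 0 = padding
    return advantages


# Two prompts: the first sampled response beats its greedy baseline,
# the second ties with it and therefore contributes no gradient signal.
example = remax_advantages(
    sampled_rewards=[1.0, 0.0],
    greedy_rewards=[0.0, 0.0],
    response_masks=[[1, 1, 0], [1, 1, 1]],
)
```

Because the baseline comes from one extra greedy generation per prompt, this approach needs no learned value model, which is the main simplification relative to PPO.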

The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper reports
experiments with ReMax, but verl did not provide an implementation. This
change adds one.


## Changes
- Added RayReMaxTrainer implementation
- Added example scripts for ReMax training
- Added documentation for ReMax algorithm

## Testing
- Tested ReMax example scripts with Qwen models

Validation reward when training Qwen2.5-3B-Instruct on the GSM8K
dataset:

<img width="501" alt="Screenshot 2025-02-09 20 51 14"
src="https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add"
/>

The curve indicates that ReMax trains effectively, though performance
could likely be improved further with hyperparameter tuning.

## Documentation
- Added ReMax documentation
- Updated example configurations

## Checklist
- [x] Code follows project's style guidelines (yapf formatted)
- [x] Tests added/updated and passing
- [x] Documentation updated
- [x] Example scripts added

committed Feb 10, 2025


Contents of `examples`:

- data_preprocess
- generation
- grpo_trainer
- ppo_trainer
- ray
- remax_trainer
- sft/gsm8k
- split_placement