Unverified commit b14299c8 by Zefan Wang, committed by GitHub

update README.md (#534)

1. Add [PRIME](https://arxiv.org/abs/2502.01456) to README.md.
2. Slightly adjust the example script to align with the paper (lower the actor learning rate and the per-GPU PPO micro batch size).
parent f0e7f9fc
@@ -44,7 +44,7 @@ verl is fast with:
 - **vLLM** and **HF Transformers** for rollout generation, **SGLang** support coming soon.
 - Compatible with Hugging Face Transformers and Modelscope Hub.
 - Supervised fine-tuning.
-- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [Reinforce++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), etc.
+- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [Reinforce++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc.
 - Support model-based reward and function-based reward (verifiable reward)
 - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh)
 - Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh).
...
@@ -22,10 +22,10 @@ python3 -m recipe.prime.main_prime \
     data.accuracy_upper_bound=0.8 \
     data.oversample_factor=4 \
     actor_rollout_ref.model.path=$model_path \
-    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.actor.optim.lr=5e-7 \
     actor_rollout_ref.model.use_remove_padding=True \
     actor_rollout_ref.actor.ppo_mini_batch_size=64 \
-    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
+    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=True \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
...
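
For context, the hunk above moves the PRIME example toward the paper's settings: actor learning rate 1e-6 → 5e-7 and per-GPU PPO micro batch size 8 → 1. The sketch below assembles those updated values into a minimal launch command; it is illustrative only. The model path is a placeholder, and the arguments truncated from this diff (data files, rollout and trainer settings) still need to be supplied from the full script in recipe/prime/.

```bash
# Sketch only: launch the PRIME recipe with the hyperparameters changed in
# this commit. $model_path is a placeholder, and arguments elided by the
# diff (data files, rollout/trainer config) are omitted here.
set -euo pipefail

model_path=Qwen/Qwen2.5-Math-7B-Instruct   # placeholder base model, not from this commit

python3 -m recipe.prime.main_prime \
    data.accuracy_upper_bound=0.8 \
    data.oversample_factor=4 \
    actor_rollout_ref.model.path=$model_path \
    actor_rollout_ref.actor.optim.lr=5e-7 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True
```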