Unverified commit fbc8fe82 by HL, committed by GitHub

docs: update readme with links to examples

parent 6e003c0b
@@ -42,7 +42,7 @@ veRL is fast with:
 - Supervised fine-tuning
 - Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer) and [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer)
 - Support model-based reward and function-based reward (verifiable reward)
-- flash-attention integration, sequence packing, and long context support via DeepSpeed Ulysses
+- flash-attention, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [long context](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh)
 - scales up to 70B models and hundreds of GPUs
 - experiment tracking with wandb and mlflow
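
The feature list above mentions function-based (verifiable) rewards. As a rough illustration of what such a reward can look like for math-style data, here is a minimal sketch; the `compute_score` signature mirrors verl's documented custom reward-function interface, but the function body and all names here are illustrative assumptions, not code from this commit.

```python
# Hypothetical sketch of a function-based (verifiable) reward for GSM8K-style
# data. The (data_source, solution_str, ground_truth, extra_info) signature
# mirrors verl's custom reward-function interface, but verify it against the
# verl version you run; this is an illustration, not verl's exact code.
import re


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the last number in the model's solution matches the
    ground-truth answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution_str)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == str(ground_truth) else 0.0
```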
@@ -60,7 +60,7 @@ Checkout this [Jupyter Notebook](https://github.com/volcengine/verl/tree/main/ex
 **Running a PPO example step-by-step:**
 - Data and Reward Preparation
-  - [Prepare Data (Parquet) for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
+  - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
   - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
 - Understanding the PPO Example
   - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
...
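
The "Prepare Data for Post-Training" step linked above expects Parquet files. A minimal, hypothetical sketch of producing one is shown below, loosely modeled on verl's GSM8K preprocessing examples; field names such as `data_source`, `prompt`, and `reward_model` follow those examples but should be verified against the linked docs for your verl version.

```python
# Hypothetical sketch of preparing a Parquet dataset for post-training,
# loosely modeled on verl's GSM8K preprocessing examples. Field names
# ("data_source", "prompt", "reward_model", ...) are assumptions to check
# against the prepare-data docs linked above.
import pandas as pd

rows = [
    {
        "data_source": "openai/gsm8k",
        # Chat-style prompt; the trainer applies the tokenizer's chat template.
        "prompt": [{"role": "user", "content": "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?"}],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": "72"},
        "extra_info": {"split": "train", "index": 0},
    }
]

pd.DataFrame(rows).to_parquet("gsm8k_train.parquet")
```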