docs: update readme with links to examples

Commit fbc8fe82 (unverified), authored Feb 02, 2025 by HL; committed by GitHub Feb 02, 2025

Showing 1 changed file with 2 additions and 2 deletions

README.md
+2 -2

--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ veRL is fast with:
 - Supervised fine-tuning
 - Reinforcement learning from human feedback with [PPO](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer) and [GRPO](https://github.com/volcengine/verl/tree/main/examples/grpo_trainer)
   - Support model-based reward and function-based reward (verifiable reward)
-- flash-attention integration, sequence packing, and long context support via DeepSpeed Ulysses
+- flash-attention, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [long context](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh)
 - scales up to 70B models and hundreds of GPUs
 - experiment tracking with wandb and mlflow
@@ -60,7 +60,7 @@ Checkout this [Jupyter Notebook](https://github.com/volcengine/verl/tree/main/ex
 **Running a PPO example step-by-step:**
 - Data and Reward Preparation
-  - [Prepare Data (Parquet) for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
+  - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
   - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
 - Understanding the PPO Example
   - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)