docs: add programming model guide (#230)

Unverified Commit e842b73d authored Feb 09, 2025 by HL Committed by GitHub Feb 09, 2025
8 changed files
--- a/docs/advance/fsdp_extension.rst
+++ b/docs/advance/fsdp_extension.rst
@@ -6,7 +6,7 @@ Model
 --------------------------
 In principle, our FSDP backend can support any HF model and we can
-synchronize the actor model weights with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py>`_.
+synchronize the actor model weights with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py>`_.
 However, ``hf_weight_loader`` gathers the full state_dict of a
 model during synchronization, which may cause OOM. We suggest using
 ``dtensor_weight_loader``, which gathers the full model parameters layer by
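
The memory trade-off described in this hunk can be illustrated with a minimal sketch. It does not use verl or vLLM code: plain Python lists stand in for sharded tensors, and every name below (``gather_full_state_dict``, ``gather_layer_by_layer``, ``sharded_model``) is hypothetical. It only shows why gathering one layer at a time keeps the peak size of the gathered buffer at one layer rather than the whole model.

```python
# Illustrative only: plain Python lists stand in for sharded tensors.
# None of these names come from verl or vLLM; they are assumptions.

def gather_full_state_dict(sharded_model):
    """Gather every layer at once; the gathered buffer holds the whole model."""
    full = {
        name: [x for shard in shards for x in shard]
        for name, shards in sharded_model.items()
    }
    peak = sum(len(values) for values in full.values())
    return full, peak


def gather_layer_by_layer(sharded_model, consume):
    """Gather one layer, hand it to `consume`, then drop it before the next."""
    peak = 0
    for name, shards in sharded_model.items():
        layer = [x for shard in shards for x in shard]
        peak = max(peak, len(layer))
        consume(name, layer)
        del layer  # the gathered copy is released before the next layer
    return peak


# Hypothetical model: 3 layers, each split across 2 ranks.
sharded_model = {
    "layer0": [[1, 2], [3, 4]],
    "layer1": [[5, 6], [7, 8]],
    "layer2": [[9, 10], [11, 12]],
}

# `loaded` plays the role of the inference engine receiving the weights.
loaded = {}
_, full_peak = gather_full_state_dict(sharded_model)
layer_peak = gather_layer_by_layer(
    sharded_model, lambda name, weights: loaded.__setitem__(name, list(weights))
)
print(full_peak, layer_peak)
```

In this toy setup the full gather holds all 12 elements at once, while the layer-by-layer gather never holds more than 4 in its temporary buffer.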

--- a/docs/hybrid_flow.rst
+++ b/docs/hybrid_flow.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -29,29 +29,34 @@ verl is fast with:
 .. toctree::
   :maxdepth: 5
   :caption: Quickstart
-   :titlesonly:
-   :numbered:
   start/install
   start/quickstart
 .. toctree::
+   :maxdepth: 4
+   :caption: Programming guide
+   hybrid_flow
+.. toctree::
   :maxdepth: 5
   :caption: Data Preparation
-   :titlesonly:
-   :numbered:
   preparation/prepare_data
   preparation/reward_function
 .. toctree::
+   :maxdepth: 5
+   :caption: Configurations
+   examples/config
+.. toctree::
   :maxdepth: 2
   :caption: PPO Example
-   :titlesonly:
-   :numbered:
   examples/ppo_code_architecture
-   examples/config
   examples/gsm8k_example
 .. toctree:: 

--- a/docs/preparation/prepare_data.rst
+++ b/docs/preparation/prepare_data.rst
-Prepare Data (Parquet) for Post-Training
+Prepare Data for Post-Training
 ========================================
 Before starting the post-training job, we need to prepare the data for

--- a/docs/start/install.rst
+++ b/docs/start/install.rst
@@ -15,9 +15,9 @@ verl supports various backends. Currently, the following configurations are avai
 Training backends
 ------------------
-We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in `PyTorch FSDP Backend <https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html>`_.
+We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
-For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 with some internal patches (soon be updated to latest version directly relying on upstream Megatron-LM). The guide for using Megatron-LM backend can be found in `Megatron-LM Backend <https://verl.readthedocs.io/en/latest/workers/megatron_workers.html>`_.
+For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM v0.4 [1]_. The guide for using Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
 Install from docker image
@@ -25,7 +25,7 @@ Install from docker image
 We provide pre-built Docker images for quick setup.
-Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` if you want to build your own image.
+Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See the files under ``docker/`` for an NGC-based image or if you want to build your own.
 1. Launch the desired Docker image:
@@ -85,53 +85,14 @@ own post-training jobs.
   cd verl
   pip3 install -e .
-You can also install verl using ``pip3 install``
-.. code:: bash
+Megatron is optional. Its dependencies can be set up as below:
-   # directly install from pypi
-   pip3 install verl
-Dependencies
------------
-verl requires Python >= 3.9 and CUDA >= 12.1.
-verl support various backend, we currently release FSDP and Megatron-LM
-for actor training and vLLM for rollout generation.
-The following dependencies are required for all backends, PyTorch FSDP and Megatron-LM.
-The pros, cons and extension guide for using PyTorch FSDP backend can be
-found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
-.. code:: bash
-   # install torch [or you can skip this step and let vllm to install the correct version for you]
-   pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
-   # install vllm
-   pip3 install ray vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
-   # flash attention 2
-   pip3 install flash-attn --no-build-isolation
-For users who pursue better scalability, we recommend using Megatron-LM
-backend. Please install the above dependencies first.
-Currently, we support Megatron-LM\@core_v0.4.0 and we fix some internal
-issues of Megatron-LM. Here's the additional installation guide (optional).
-The pros, cons and extension guide for using Megatron-LM backend can be
-found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
 .. code:: bash
-   # Megatron-LM Backend (optional)
   # apex
-   pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
+   pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
-            --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
+       git+https://github.com/NVIDIA/apex
-            git+https://github.com/NVIDIA/apex
   # transformer engine
   pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
@@ -145,4 +106,7 @@ found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
   cp ../verl/patches/megatron_v4.patch .
   git apply megatron_v4.patch
   pip3 install -e .
   export PYTHONPATH=$PYTHONPATH:$(pwd)
\ No newline at end of file
+.. [1] Megatron v0.4 is supported with verl's patches to fix issues such as the virtual pipeline hang. It will soon be updated to the latest version of upstream Megatron-LM without patches.
\ No newline at end of file
--- a/docs/start/quickstart.rst
+++ b/docs/start/quickstart.rst
 .. _quickstart:
 =========================================================
-Quickstart: Post-train a LLM using PPO with GSM8K dataset
+Quickstart: PPO training on GSM8K dataset
 =========================================================
-Post-train a LLM using GSM8K dataset
+Post-train an LLM using the GSM8K dataset.
-===================================================================
 Introduction
 ------------
@@ -52,9 +51,9 @@ We preprocess the dataset in parquet format so that (1) it contains necessary fi
 Step 2: Download a model for post-training
 -------------------------------------------
-Usually we recommend starting with an "instruct" model variant so that the model follows instructions. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
+In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
-If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
+If you want to perform SFT before RL, refer to the :doc:`Complete GSM8K Example<../examples/gsm8k_example>`, the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
 .. code-block:: bash

--- a/examples/split_placement/README.md
+++ b/examples/split_placement/README.md
@@ -58,4 +58,4 @@ actor_output = actor_output.get()
 ```
 bash run_deepseek7b_llm.sh
 ```
\ No newline at end of file
--- a/verl/workers/sharding_manager/fsdp_ulysses.py
+++ b/verl/workers/sharding_manager/fsdp_ulysses.py
@@ -14,10 +14,8 @@
 """
 Contains a resharding manager that binds weights from FSDP zero3 to XPerfGPT
 """
-from typing import Optional
 from .base import BaseShardingManager
-import random
 from torch.distributed.device_mesh import DeviceMesh
 from verl.utils.torch_functional import allgather_dict_tensors