docs: add programming model guide (#230)

Unverified Commit e842b73d authored Feb 09, 2025 by HL Committed by GitHub Feb 09, 2025
8 changed files
--- a/docs/advance/fsdp_extension.rst
+++ b/docs/advance/fsdp_extension.rst
@@ -6,7 +6,7 @@ Model
 --------------------------

 In principle, our FSDP backend can support any HF model and we can
-sychronoize the actor model weight with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py>`_.
+synchronize the actor model weights with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py>`_.
 However, ``hf_weight_loader`` will gather the full state_dict of a
 model during synchronization, which may cause OOM. We suggest using
 ``dtensor_weight_loader`` which gathers the full model parameter layer by

--- a/docs/hybrid_flow.rst
+++ b/docs/hybrid_flow.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -29,29 +29,34 @@ verl is fast with:
 .. toctree::
   :maxdepth: 5
   :caption: Quickstart
-   :titlesonly:
-   :numbered:

   start/install
   start/quickstart

 .. toctree::
+   :maxdepth: 4
+   :caption: Programming guide
+
+   hybrid_flow
+
+.. toctree::
   :maxdepth: 5
   :caption: Data Preparation
-   :titlesonly:
-   :numbered:

   preparation/prepare_data
   preparation/reward_function

 .. toctree::
+   :maxdepth: 5
+   :caption: Configurations
+
+   examples/config
+
+.. toctree::
   :maxdepth: 2
   :caption: PPO Example
-   :titlesonly:
-   :numbered:

   examples/ppo_code_architecture
-   examples/config
   examples/gsm8k_example

 .. toctree:: 

--- a/docs/preparation/prepare_data.rst
+++ b/docs/preparation/prepare_data.rst
-Prepare Data (Parquet) for Post-Training
+Prepare Data for Post-Training
 ========================================

 Before starting the post-training job, we need to prepare the data for

--- a/docs/start/install.rst
+++ b/docs/start/install.rst
@@ -15,9 +15,9 @@ verl supports various backends. Currently, the following configurations are avai
 Training backends
 ------------------

-We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in `PyTorch FSDP Backend <https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html>`_.
+We recommend using the **FSDP** backend to investigate, research, and prototype different models, datasets, and RL algorithms. The guide for using the FSDP backend can be found in :doc:`FSDP Workers<../workers/fsdp_workers>`.

-For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 with some internal patches (soon be updated to latest version directly relying on upstream Megatron-LM). The guide for using Megatron-LM backend can be found in `Megatron-LM Backend <https://verl.readthedocs.io/en/latest/workers/megatron_workers.html>`_.
+For users who pursue better scalability, we recommend using the **Megatron-LM** backend. Currently, we support Megatron-LM v0.4 [1]_. The guide for using the Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.


 Install from docker image
@@ -25,7 +25,7 @@ Install from docker image

 We provide pre-built Docker images for quick setup.

-Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` if you want to build your own image.
+Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See the files under ``docker/`` for an NGC-based image, or if you want to build your own image.

 1. Launch the desired Docker image:

@@ -85,53 +85,14 @@ own post-training jobs.
   cd verl
   pip3 install -e .

-You can also install verl using ``pip3 install``

-.. code:: bash
-
-   # directly install from pypi
-   pip3 install verl
-
-Dependencies
------------
-
-verl requires Python >= 3.9 and CUDA >= 12.1.
-
-verl support various backend, we currently release FSDP and Megatron-LM
-for actor training and vLLM for rollout generation.
-
-The following dependencies are required for all backends, PyTorch FSDP and Megatron-LM.
-
-The pros, cons and extension guide for using PyTorch FSDP backend can be
-found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
-
-.. code:: bash
-
-   # install torch [or you can skip this step and let vllm to install the correct version for you]
-   pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
-
-   # install vllm
-   pip3 install ray vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
-
-   # flash attention 2
-   pip3 install flash-attn --no-build-isolation
-
-For users who pursue better scalability, we recommend using Megatron-LM
-backend. Please install the above dependencies first.
-
-Currently, we support Megatron-LM\@core_v0.4.0 and we fix some internal
-issues of Megatron-LM. Here's the additional installation guide (optional).
-
-The pros, cons and extension guide for using Megatron-LM backend can be
-found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
+Megatron is optional. Its dependencies can be set up as follows:

 .. code:: bash

-   # Megatron-LM Backend (optional)
   # apex
-   pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
-            --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
-            git+https://github.com/NVIDIA/apex
+   pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
+       git+https://github.com/NVIDIA/apex

   # transformer engine
   pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
@@ -145,4 +106,7 @@ found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
   cp ../verl/patches/megatron_v4.patch .
   git apply megatron_v4.patch
   pip3 install -e .
-   export PYTHONPATH=$PYTHONPATH:$(pwd)
\ No newline at end of file
+   export PYTHONPATH=$PYTHONPATH:$(pwd)
+
+
+.. [1] Megatron v0.4 is supported with verl's patches, which fix issues such as the virtual pipeline hang. Support will soon be updated to the latest version of upstream Megatron-LM, without patches.
\ No newline at end of file
--- a/docs/start/quickstart.rst
+++ b/docs/start/quickstart.rst
 .. _quickstart:

 =========================================================
-Quickstart: Post-train a LLM using PPO with GSM8K dataset
+Quickstart: PPO training on GSM8K dataset
 =========================================================

-Post-train a LLM using GSM8K dataset
-===================================================================
+Post-train an LLM using the GSM8K dataset.

 Introduction
 ------------
@@ -52,9 +51,9 @@ We preprocess the dataset in parquet format so that (1) it contains necessary fi
 Step 2: Download a model for post-training
 -------------------------------------------

-Usually we recommend starting with an "instruct" model variant so that the model follows instructions. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
+In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.

-If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
+If you want to perform SFT before RL, refer to the :doc:`Complete GSM8K Example<../examples/gsm8k_example>`, the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.

 .. code-block:: bash


--- a/examples/split_placement/README.md
+++ b/examples/split_placement/README.md
@@ -58,4 +58,4 @@ actor_output = actor_output.get()

 ```
 bash run_deepseek7b_llm.sh
-```
\ No newline at end of file
+```
--- a/verl/workers/sharding_manager/fsdp_ulysses.py
+++ b/verl/workers/sharding_manager/fsdp_ulysses.py
@@ -14,10 +14,8 @@
 """
 Contains a resharding manager that binds weights from FSDP zero3 to XPerfGPT
 """
-from typing import Optional
 from .base import BaseShardingManager

-import random
 from torch.distributed.device_mesh import DeviceMesh

 from verl.utils.torch_functional import allgather_dict_tensors