Unverified Commit e842b73d by HL Committed by GitHub

docs: add programming model guide (#230)

parent bdb50ac3
...@@ -6,7 +6,7 @@ Model ...@@ -6,7 +6,7 @@ Model
-------------------------- --------------------------
In principle, our FSDP backend can support any HF model and we can In principle, our FSDP backend can support any HF model and we can
synchronize the actor model weight with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py>`_. synchronize the actor model weight with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py>`_.
However, ``hf_weight_loader`` will gather the full state_dict of a However, ``hf_weight_loader`` will gather the full state_dict of a
model during synchronization, which may cause OOM. We suggest using model during synchronization, which may cause OOM. We suggest using
``dtensor_weight_loader`` which gathers the full model parameter layer by ``dtensor_weight_loader`` which gathers the full model parameter layer by
......
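The memory argument in the hunk above (gathering the full state_dict at once versus gathering parameters layer by layer) can be illustrated with a toy sketch in plain Python. It does not call verl, vLLM, or PyTorch; the function names and the list-based "shards" are hypothetical stand-ins for sharded tensors, and element counts stand in for bytes.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for a sharded model: parameter name -> one shard per rank.
Shards = Dict[str, List[List[float]]]


def gather_full(shards: Shards) -> Tuple[Dict[str, List[float]], int]:
    """Materialize every parameter at once, as a full state_dict would.

    Returns the gathered dict and the peak number of elements held at once.
    """
    full = {name: [x for part in parts for x in part] for name, parts in shards.items()}
    peak = sum(len(values) for values in full.values())
    return full, peak


def gather_layerwise(shards: Shards) -> Iterator[Tuple[str, List[float]]]:
    """Yield one fully gathered parameter at a time."""
    for name, parts in shards.items():
        yield name, [x for part in parts for x in part]


def peak_elements_layerwise(shards: Shards) -> int:
    """Peak elements held when each gathered parameter is dropped after use."""
    peak = 0
    for _name, values in gather_layerwise(shards):
        peak = max(peak, len(values))
    return peak


# Two parameters, each split across two ranks.
toy_shards: Shards = {
    "layer0.weight": [[1.0, 2.0], [3.0, 4.0]],
    "layer1.weight": [[5.0], [6.0]],
}
full_state, full_peak = gather_full(toy_shards)
layer_peak = peak_elements_layerwise(toy_shards)
```

In this toy setup the full gather holds all six elements at once, while the layer-wise path never holds more than the largest single parameter (four elements). The real loaders work on distributed tensors, so the actual savings depend on the model and sharding layout.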
...@@ -29,29 +29,34 @@ verl is fast with: ...@@ -29,29 +29,34 @@ verl is fast with:
.. toctree:: .. toctree::
:maxdepth: 5 :maxdepth: 5
:caption: Quickstart :caption: Quickstart
:titlesonly:
:numbered:
start/install start/install
start/quickstart start/quickstart
.. toctree:: .. toctree::
:maxdepth: 4
:caption: Programming guide
hybrid_flow
.. toctree::
:maxdepth: 5 :maxdepth: 5
:caption: Data Preparation :caption: Data Preparation
:titlesonly:
:numbered:
preparation/prepare_data preparation/prepare_data
preparation/reward_function preparation/reward_function
.. toctree:: .. toctree::
:maxdepth: 5
:caption: Configurations
examples/config
.. toctree::
:maxdepth: 2 :maxdepth: 2
:caption: PPO Example :caption: PPO Example
:titlesonly:
:numbered:
examples/ppo_code_architecture examples/ppo_code_architecture
examples/config
examples/gsm8k_example examples/gsm8k_example
.. toctree:: .. toctree::
......
Prepare Data (Parquet) for Post-Training Prepare Data for Post-Training
======================================== ========================================
Before starting the post-training job, we need to prepare the data for Before starting the post-training job, we need to prepare the data for
......
...@@ -15,9 +15,9 @@ verl supports various backends. Currently, the following configurations are avai ...@@ -15,9 +15,9 @@ verl supports various backends. Currently, the following configurations are avai
Training backends Training backends
------------------ ------------------
We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in `PyTorch FSDP Backend <https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html>`_. We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 with some internal patches (soon be updated to latest version directly relying on upstream Megatron-LM). The guide for using Megatron-LM backend can be found in `Megatron-LM Backend <https://verl.readthedocs.io/en/latest/workers/megatron_workers.html>`_. For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM v0.4 [1]_. The guide for using Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
Install from docker image Install from docker image
...@@ -25,7 +25,7 @@ Install from docker image ...@@ -25,7 +25,7 @@ Install from docker image
We provide pre-built Docker images for quick setup. We provide pre-built Docker images for quick setup.
Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` if you want to build your own image. Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` for NGC-based image or if you want to build your own.
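The image tag packs several component versions into one string. As a rough aid for reading it, the sketch below splits the tag on ``-`` and extracts ``name`` + ``version`` pairs. The helper ``parse_image_tag`` is hypothetical and not part of verl; reading ``th`` as PyTorch, ``cu`` as CUDA, and ``te`` as Transformer Engine is an assumption based on the versions mentioned elsewhere in this guide.

```python
import re
from typing import Dict


def parse_image_tag(tag: str) -> Dict[str, str]:
    """Split a tag such as 'th2.4.0-cu124' into {'th': '2.4.0', 'cu': '124'}.

    Segments that are not 'letters followed by a version' are skipped.
    """
    components: Dict[str, str] = {}
    for segment in tag.split("-"):
        match = re.fullmatch(r"([a-z]+)(\d[\d.]*)", segment)
        if match:
            components[match.group(1)] = match.group(2)
    return components


# Tag copied from the text above; the key meanings are assumptions.
versions = parse_image_tag("vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3")
```

Here ``versions["vllm"]`` is ``"0.6.3"`` and ``versions["ray"]`` is ``"2.10"``; the leading ``vemlp`` segment carries no version and is ignored.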
1. Launch the desired Docker image: 1. Launch the desired Docker image:
...@@ -85,53 +85,14 @@ own post-training jobs. ...@@ -85,53 +85,14 @@ own post-training jobs.
cd verl cd verl
pip3 install -e . pip3 install -e .
You can also install verl using ``pip3 install``
.. code:: bash Megatron is optional. Its dependencies can be set up as below:
# directly install from pypi
pip3 install verl
Dependencies
------------
verl requires Python >= 3.9 and CUDA >= 12.1.
verl supports various backends; we currently release FSDP and Megatron-LM
for actor training and vLLM for rollout generation.
The following dependencies are required for all backends, PyTorch FSDP and Megatron-LM.
The pros, cons and extension guide for using PyTorch FSDP backend can be
found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
.. code:: bash
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install ray vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
# flash attention 2
pip3 install flash-attn --no-build-isolation
For users who pursue better scalability, we recommend using Megatron-LM
backend. Please install the above dependencies first.
Currently, we support Megatron-LM\@core_v0.4.0 and we fix some internal
issues of Megatron-LM. Here's the additional installation guide (optional).
The pros, cons and extension guide for using Megatron-LM backend can be
found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
.. code:: bash .. code:: bash
# Megatron-LM Backend (optional)
# apex # apex
pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \ pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \ git+https://github.com/NVIDIA/apex
git+https://github.com/NVIDIA/apex
# transformer engine # transformer engine
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7 pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
...@@ -145,4 +106,7 @@ found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`. ...@@ -145,4 +106,7 @@ found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
cp ../verl/patches/megatron_v4.patch . cp ../verl/patches/megatron_v4.patch .
git apply megatron_v4.patch git apply megatron_v4.patch
pip3 install -e . pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd) export PYTHONPATH=$PYTHONPATH:$(pwd)
\ No newline at end of file
.. [1] Megatron v0.4 is supported with verl's patches to fix issues such as virtual pipeline hang. It will soon be updated to the latest version of upstream Megatron-LM without patches.
\ No newline at end of file
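After the optional Megatron-LM setup, a quick sanity check is to confirm that the extra packages are importable. The sketch below uses only the standard library. The module names ``apex``, ``transformer_engine``, and ``megatron`` are assumed to be the import names of the packages installed above, and ``check_importable`` is a hypothetical helper rather than a verl utility.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names for the optional Megatron-LM backend dependencies.
OPTIONAL_MODULES = ("apex", "transformer_engine", "megatron")


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_importable(OPTIONAL_MODULES).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A ``missing`` entry for ``megatron`` usually means the ``PYTHONPATH`` export was not applied in the current shell. This check only locates the packages and does not verify that their CUDA extensions were built.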
.. _quickstart: .. _quickstart:
========================================================= =========================================================
Quickstart: Post-train a LLM using PPO with GSM8K dataset Quickstart: PPO training on GSM8K dataset
========================================================= =========================================================
Post-train an LLM using GSM8K dataset Post-train an LLM using GSM8K dataset.
===================================================================
Introduction Introduction
------------ ------------
...@@ -52,9 +51,9 @@ We preprocess the dataset in parquet format so that (1) it contains necessary fi ...@@ -52,9 +51,9 @@ We preprocess the dataset in parquet format so that (1) it contains necessary fi
Step 2: Download a model for post-training Step 2: Download a model for post-training
------------------------------------------- -------------------------------------------
Usually we recommend starting with an "instruct" model variant so that the model follows instructions. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details. If you want to perform SFT before RL, refer to the :doc:`Complete GSM8K Example<../examples/gsm8k_example>`, the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
.. code-block:: bash .. code-block:: bash
......
...@@ -58,4 +58,4 @@ actor_output = actor_output.get() ...@@ -58,4 +58,4 @@ actor_output = actor_output.get()
``` ```
bash run_deepseek7b_llm.sh bash run_deepseek7b_llm.sh
``` ```
\ No newline at end of file
...@@ -14,10 +14,8 @@ ...@@ -14,10 +14,8 @@
""" """
Contains a resharding manager that binds weights from FSDP zero3 to XPerfGPT Contains a resharding manager that binds weights from FSDP zero3 to XPerfGPT
""" """
from typing import Optional
from .base import BaseShardingManager from .base import BaseShardingManager
import random
from torch.distributed.device_mesh import DeviceMesh from torch.distributed.device_mesh import DeviceMesh
from verl.utils.torch_functional import allgather_dict_tensors from verl.utils.torch_functional import allgather_dict_tensors
......