Unverified Commit e842b73d by HL Committed by GitHub

docs: add programming model guide (#230)

parent bdb50ac3
......@@ -6,7 +6,7 @@ Model
--------------------------
In principle, our FSDP backend can support any HF model and we can
synchronize the actor model weight with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py>`_.
synchronize the actor model weight with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py>`_.
However, ``hf_weight_loader`` will gather the full state_dict of a
model during synchronization, which may cause OOM. We suggest using
``dtensor_weight_loader`` which gathers the full model parameter layer by
......
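As a rough illustration of why a layer-by-layer gather lowers peak memory, here is a minimal sketch in plain Python. It is not the code of ``hf_weight_loader`` or ``dtensor_weight_loader``: the layer names, sizes and helper functions are hypothetical, and memory is modelled simply as the number of gathered parameters held at one time.

```python
# Hypothetical per-layer parameter counts (illustrative only, not real verl data).
LAYER_SIZES = {"embed": 4, "layer0": 10, "layer1": 10, "lm_head": 4}


def peak_full_gather(layer_sizes):
    """Gather every layer before loading: all parameters are resident at once."""
    return sum(layer_sizes.values())


def peak_layerwise_gather(layer_sizes):
    """Gather, load and release one layer at a time: only one layer is resident."""
    peak = 0
    for size in layer_sizes.values():
        resident = size  # gathered copy of the current layer only
        peak = max(peak, resident)
        # The gathered copy is released here, before the next layer is gathered.
    return peak


full_peak = peak_full_gather(LAYER_SIZES)
layerwise_peak = peak_layerwise_gather(LAYER_SIZES)
print(full_peak, layerwise_peak)
```

Under this simplified model, the full gather peaks at the total parameter count, while the layer-wise gather peaks at the size of the largest single layer.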
......@@ -29,29 +29,34 @@ verl is fast with:
.. toctree::
:maxdepth: 5
:caption: Quickstart
:titlesonly:
:numbered:
start/install
start/quickstart
.. toctree::
:maxdepth: 4
:caption: Programming guide
hybrid_flow
.. toctree::
:maxdepth: 5
:caption: Data Preparation
:titlesonly:
:numbered:
preparation/prepare_data
preparation/reward_function
.. toctree::
:maxdepth: 5
:caption: Configurations
examples/config
.. toctree::
:maxdepth: 2
:caption: PPO Example
:titlesonly:
:numbered:
examples/ppo_code_architecture
examples/config
examples/gsm8k_example
.. toctree::
......
Prepare Data (Parquet) for Post-Training
Prepare Data for Post-Training
========================================
Before starting the post-training job, we need to prepare the data for
......
......@@ -15,9 +15,9 @@ verl supports various backends. Currently, the following configurations are avai
Training backends
------------------
We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in `PyTorch FSDP Backend <https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html>`_.
We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 with some internal patches (soon to be updated to the latest version, relying directly on upstream Megatron-LM). The guide for using Megatron-LM backend can be found in `Megatron-LM Backend <https://verl.readthedocs.io/en/latest/workers/megatron_workers.html>`_.
For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support Megatron-LM v0.4 [1]_. The guide for using Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
Install from docker image
......@@ -25,7 +25,7 @@ Install from docker image
We provide pre-built Docker images for quick setup.
Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` if you want to build your own image.
Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` for NGC-based image or if you want to build your own.
1. Launch the desired Docker image:
......@@ -85,53 +85,14 @@ own post-training jobs.
cd verl
pip3 install -e .
You can also install verl using ``pip3 install``
.. code:: bash
# directly install from pypi
pip3 install verl
Dependencies
------------
verl requires Python >= 3.9 and CUDA >= 12.1.
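The version requirement above can be checked before installation with a short script. This is a minimal sketch, not part of verl: ``parse_version``, ``python_ok`` and ``cuda_ok`` are hypothetical helper names, and the CUDA version is passed in as a string so that the check does not itself depend on any package being installed.

```python
import sys

MIN_PYTHON = (3, 9)   # from "Python >= 3.9"
MIN_CUDA = (12, 1)    # from "CUDA >= 12.1"


def parse_version(text):
    """Turn a dotted version string such as '12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def cuda_ok(cuda_version):
    """Return True if a CUDA version string such as '12.4' meets the minimum.

    None (no CUDA available) is treated as not meeting the requirement.
    """
    if cuda_version is None:
        return False
    return parse_version(cuda_version) >= MIN_CUDA


print("python ok:", python_ok())
print("cuda 12.4 ok:", cuda_ok("12.4"))
```

If PyTorch is already installed, ``torch.version.cuda`` reports the CUDA version PyTorch was built with (or ``None`` for CPU-only builds) and can be passed to ``cuda_ok``.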
verl supports various backends; we currently release FSDP and Megatron-LM
for actor training and vLLM for rollout generation.
The following dependencies are required for all backends, PyTorch FSDP and Megatron-LM.
The pros, cons and extension guide for using PyTorch FSDP backend can be
found in :doc:`FSDP Workers<../workers/fsdp_workers>`.
.. code:: bash
# install torch [or skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install ray vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
# flash attention 2
pip3 install flash-attn --no-build-isolation
For users who pursue better scalability, we recommend using Megatron-LM
backend. Please install the above dependencies first.
Currently, we support Megatron-LM\@core_v0.4.0 and we fix some internal
issues of Megatron-LM. Here's the additional installation guide (optional).
The pros, cons and extension guide for using Megatron-LM backend can be
found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
Megatron is optional. Its dependencies can be set up as below:
.. code:: bash
# Megatron-LM Backend (optional)
# apex
pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex
pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex
# transformer engine
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
......@@ -145,4 +106,7 @@ found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
cp ../verl/patches/megatron_v4.patch .
git apply megatron_v4.patch
pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
\ No newline at end of file
export PYTHONPATH=$PYTHONPATH:$(pwd)
.. [1] Megatron v0.4 is supported with verl's patches to fix issues such as virtual pipeline hang. It will soon be updated to the latest version of upstream Megatron-LM without patches.
\ No newline at end of file
.. _quickstart:
=========================================================
Quickstart: Post-train a LLM using PPO with GSM8K dataset
Quickstart: PPO training on GSM8K dataset
=========================================================
Post-train an LLM using GSM8K dataset
===================================================================
Post-train an LLM using GSM8K dataset.
Introduction
------------
......@@ -52,9 +51,9 @@ We preprocess the dataset in parquet format so that (1) it contains necessary fi
Step 2: Download a model for post-training
-------------------------------------------
Usually we recommend starting with an "instruct" model variant so that the model follows instructions. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.
If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
If you want to perform SFT before RL, refer to the :doc:`Complete GSM8K Example<../examples/gsm8k_example>`, the `sft directory <https://github.com/volcengine/verl/blob/main/examples/sft/gsm8k>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.
.. code-block:: bash
......
......@@ -58,4 +58,4 @@ actor_output = actor_output.get()
```
bash run_deepseek7b_llm.sh
```
\ No newline at end of file
```
......@@ -14,10 +14,8 @@
"""
Contains a resharding manager that binds weights from FSDP zero3 to XPerfGPT
"""
from typing import Optional
from .base import BaseShardingManager
import random
from torch.distributed.device_mesh import DeviceMesh
from verl.utils.torch_functional import allgather_dict_tensors
......