Unverified commit ced8ecbf by HL, committed by GitHub

example: switch the default model ckpt for Megatron, add wandb logs (#210)

Use a general-purpose LLM for the math task instead of a code LLM.

---------

Co-authored-by: Your Name <you@example.com>
parent 22d56a8b
...@@ -69,7 +69,7 @@ Checkout this [Jupyter Notebook](https://github.com/volcengine/verl/tree/main/ex
- [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html)

**Reproducible algorithm baselines:**

- [PPO and GRPO](https://verl.readthedocs.io/en/latest/experiment/ppo.html)

**For code explanation and advanced usage (extension):**

- PPO Trainer and Workers
......
# verl documents

## Build the docs
......
...@@ -3,7 +3,7 @@ Extend to other RL(HF) algorithms
We already implemented the complete training pipeline of the PPO
algorithm. To extend to other algorithms, we analyze the high-level
principle of using verl and provide a tutorial for implementing the DPO
algorithm. Users can follow a similar paradigm to extend to other RL algorithms.

.. note:: **Key ideas**: Single process drives multi-process computation and data communication.
...@@ -26,7 +26,7 @@ Step 3: Utilize the encapsulated APIs to implement the control flow
Example: Online DPO
-------------------

We use verl to implement a simple online DPO algorithm. The algorithm
flow of Online DPO is as follows:

1. There is a prompt (rollout) generator which has the same weight as
...@@ -178,7 +178,7 @@ steps:
and merge them.

Frequently calling these 3 steps on the controller process greatly hurts
code readability. **In verl, we have abstracted and encapsulated these 3
steps, so that the worker's method + dispatch + collect can be
registered into the worker_group**
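
To make this concrete, below is a minimal sketch of such a registration, based on verl's single-controller ``register`` decorator (the class name and dispatch mode here are illustrative; check the current API before relying on them):

.. code:: python

   from verl.single_controller.base.decorator import register, Dispatch
   from verl.single_controller.base.worker import Worker

   class ActorWorker(Worker):

       @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
       def update_actor(self, data):
           # dispatch: the controller splits ``data`` across the DP workers;
           # collect: each worker's output is gathered back for the caller.
           # The method body only ever sees its local shard.
           ...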
......
...@@ -31,7 +31,7 @@
# -- Project information -----------------------------------------------------

project = u'verl'
# pylint: disable=W0622
copyright = u'2024 ByteDance Seed Foundation MLSys Team'
author = u'Guangming Sheng, Chi Zhang, Yanghua Peng, Haibin Lin'
......
...@@ -200,7 +200,7 @@ Define, init and run the PPO Trainer
on the allocated GPUs (in the resource pool)
- The actual PPO training will be executed in ``trainer.fit()``

verl can be easily extended to other RL algorithms by reusing the Ray
model workers, resource pool and reward functions. See :doc:`extension<../advance/dpo_extension>` for
more information.
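
For orientation, here is a condensed sketch of the define/init/run sequence (argument names follow verl's ``main_ppo`` entry point at the time of writing and are illustrative rather than exhaustive):

.. code:: python

   from verl.trainer.ppo.ray_trainer import RayPPOTrainer

   trainer = RayPPOTrainer(config=config,
                           tokenizer=tokenizer,
                           role_worker_mapping=role_worker_mapping,
                           resource_pool_manager=resource_pool_manager,
                           reward_fn=reward_fn,
                           val_reward_fn=val_reward_fn)
   trainer.init_workers()  # construct workers on the allocated GPUs
   trainer.fit()           # run the actual PPO training loop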
......
...@@ -11,22 +11,32 @@ Assuming GSM8k dataset is preprocess via ``python3 examples/data_preprocess/gsm8
Refer to the table below to reproduce PPO training from different pre-trained models.

.. _Huggingface: https://huggingface.co/google/gemma-2-2b-it#benchmark-results
.. _SFT Command and Logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/gemma-2-2b-it-sft-0.411.log
.. _SFT+PPO Command and Logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/gemma-2-2b-it-ppo-bsz512_4-prompt1024-resp-512-0.640.log
.. _wandb: https://api.wandb.ai/links/verl-team/h7ux8602
.. _Qwen Blog: https://qwenlm.github.io/blog/qwen2.5-llm/
.. _PPO Command and Logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/Qwen2.5-0.5B-bsz256_2-prompt1024-resp512-0.567.log
.. _Megatron PPO Command and Logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/deepseek-llm-7b-chat-megatron-bsz256_4-prompt512-resp512-0.695.log
.. _Qwen7b GRPO Script: https://github.com/volcengine/verl/blob/a65c9157bc0b85b64cd753de19f94e80a11bd871/examples/grpo_trainer/run_qwen2-7b_seq_balance.sh
.. _Megatron wandb: https://wandb.ai/verl-team/verl_megatron_gsm8k_examples/runs/10fetyr3

+----------------------------------+------------------------+------------+-----------------------------------------------------+
| Model                            | Method                 | Test score | Details                                             |
+==================================+========================+============+=====================================================+
| google/gemma-2-2b-it             | pretrained checkpoint  | 23.9       | `Huggingface`_                                      |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| google/gemma-2-2b-it             | SFT                    | 52.06      | `SFT Command and Logs`_                             |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| google/gemma-2-2b-it             | SFT + PPO              | 64.02      | `SFT+PPO Command and Logs`_, `wandb`_               |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| Qwen/Qwen2.5-0.5B-Instruct       | pretrained checkpoint  | 36.4       | `Qwen Blog`_                                        |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| Qwen/Qwen2.5-0.5B-Instruct       | PPO                    | 56.7       | `PPO Command and Logs`_                             |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| deepseek-ai/deepseek-llm-7b-chat | PPO                    | 69.5 [1]_  | `Megatron PPO Command and Logs`_, `Megatron wandb`_ |
+----------------------------------+------------------------+------------+-----------------------------------------------------+
| Qwen/Qwen2-7B-Instruct           | GRPO                   | 89         | `Qwen7b GRPO Script`_                               |
+----------------------------------+------------------------+------------+-----------------------------------------------------+

.. [1] During evaluation, we only extract answers that follow the "####" format. More flexible answer extraction, a longer response length, and better prompt engineering may lead to a higher score.
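
For reference, a minimal sketch of the strict extraction described in footnote [1] (illustrative only, not verl's actual reward code):

.. code:: python

   import re

   def extract_gsm8k_answer(response: str):
       """Return the last number following '####', or None if absent."""
       matches = re.findall(r"####\s*(-?[0-9][0-9,.]*)", response)
       if not matches:
           return None  # strict format: responses without '####' score 0
       return matches[-1].replace(",", "")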
Welcome to verl's documentation!
================================================

.. _hf_arxiv: https://arxiv.org/pdf/2409.19256

verl is a flexible, efficient and production-ready RL training framework designed for post-training of large language models (LLMs). It is an open-source implementation of the `HybridFlow <hf_arxiv_>`_ paper.

verl is flexible and easy to use with:

- **Easy extension of diverse RL algorithms**: The hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.
...@@ -16,9 +16,9 @@ veRL is flexible and easy to use with:
- Ready integration with popular HuggingFace models

verl is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing SOTA LLM training and inference frameworks, verl achieves high generation and training throughput.
- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.
...@@ -92,7 +92,7 @@ veRL is fast with:
Contribution
-------------

verl is free software; you can redistribute it and/or modify it under the terms
of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/volcengine/verl>`_, `Slack <https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA>`_ and `WeChat <https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG>`_ for discussions.
......
Performance Tuning Guide
=========================

In this section, we will discuss how to tune the performance of all the stages in verl, including:

1. Rollout generation throughput.
...@@ -16,7 +16,7 @@ In this section, we will discuss how to tune the performance of all the stages i
Rollout Generation Tuning
--------------------------

verl currently supports two rollout backends: vLLM and TGI (with SGLang support coming soon).

Below are key factors for tuning vLLM-based rollout. Before tuning, we recommend setting ``actor_rollout_ref.rollout.disable_log_stats=False`` so that rollout statistics are logged.
...@@ -45,7 +45,7 @@ Batch Size Tuning
To achieve higher throughput in experience preparation (i.e., model fwd) and model update (i.e., actor/critic fwd/bwd),
users may need to tune ``*micro_batch_size_per_gpu`` for different computations.

In verl, the core principle for setting batch sizes is:

- **Algorithmic metrics** (train batch size, PPO mini-batch size) are *global* (from a single-controller perspective),
  normalized in each worker; a worked example follows this list. See the `normalization code <https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py#L120-L122>`_.
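
A worked example of this normalization under assumed values (the real logic lives in the linked ``fsdp_workers.py``):

.. code:: python

   # Assumed setup: one worker group spanning 8 GPUs, i.e. DP size 8.
   dp_world_size = 8
   train_batch_size = 1024      # global, set by the algorithm
   ppo_mini_batch_size = 256    # global, set by the algorithm

   # Each worker normalizes the global sizes to its local share:
   local_train_batch_size = train_batch_size // dp_world_size        # 128
   local_ppo_mini_batch_size = ppo_mini_batch_size // dp_world_size  # 32

   # ``*micro_batch_size_per_gpu`` is already per-GPU; it chunks the local
   # mini-batch for gradient accumulation (32 // 4 = 8 steps if set to 4).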
......
...@@ -7,7 +7,7 @@ Requirements
- **Python**: Version >= 3.9
- **CUDA**: Version >= 12.1

verl supports various backends. Currently, the following configurations are available:

- **FSDP** and **Megatron-LM** (optional) for training.
- **vLLM** and **TGI** for rollout generation, with **SGLang** support coming soon.
...@@ -34,7 +34,7 @@ Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN <image:tag>

2. Inside the container, install verl:

.. code:: bash
...@@ -74,7 +74,7 @@ To manage environment, we recommend using conda:
conda create -n verl python==3.9
conda activate verl

To install the latest version of verl, the best way is to clone and
install it from source. Then you can modify our code to customize your
own post-training jobs.
...@@ -85,7 +85,7 @@ own post-training jobs.
cd verl
pip3 install -e .

You can also install verl using ``pip3 install``:

.. code:: bash
...@@ -95,9 +95,9 @@ You can also install veRL using ``pip3 install``
Dependencies
------------

verl requires Python >= 3.9 and CUDA >= 12.1.

verl supports various backends; we currently release FSDP and Megatron-LM
for actor training and vLLM for rollout generation.

The following dependencies are required for both backends, PyTorch FSDP and Megatron-LM.
......
set -x

# prepare pre-trained model ckpt
huggingface-cli download deepseek-ai/deepseek-llm-7b-chat --local-dir $HOME/models/deepseek-llm-7b-chat

# ``actor_rollout_ref.rollout.tensor_model_parallel_size`` in theory could be different from
# ``**.megatron.tensor_model_parallel_size``

# the config file used: verl/trainer/main_ppo/config/ppo_megatron_trainer.yaml
python3 -m verl.trainer.main_ppo --config-path=config \
...@@ -10,19 +16,22 @@ python3 -m verl.trainer.main_ppo --config-path=config \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=$HOME/models/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=2e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.ref.megatron.tensor_model_parallel_size=4 \
    critic.optim.lr=2e-5 \
    critic.model.path=$HOME/models/deepseek-llm-7b-chat \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size_per_gpu=4 \
    critic.megatron.tensor_model_parallel_size=4 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
......
...@@ -8,13 +8,13 @@
"source": [
"# Run Qwen PPO with [verl](https://github.com/volcengine/verl)\n",
"\n",
"This tutorial provides a step-by-step guide to using verl to run your RLHF pipeline. See our [GitHub repo](https://github.com/volcengine/verl/) and [documentation](https://verl.readthedocs.io/en/latest/index.html) for more details.\n",
"\n",
"This notebook is also published on the [Lightning Studio](https://lightning.ai/hlin-verl/studios/verl-getting-started) platform, which provides free GPU quota every month. Check out the published notebook with pre-installed dependencies using a free L4 GPU [here](https://lightning.ai/hlin-verl/studios/verl-getting-started) (no credit card required).\n",
"\n",
"### You will learn:\n",
"\n",
"- How to install verl from scratch.\n",
"- How to use existing scripts to run an RLHF pipeline with your own models and data."
]
},
......
...@@ -18,7 +18,7 @@ name = "verl"
# The actual version is specified in the [tool.setuptools.dynamic] section below.
dynamic = ["version"]
description = "verl: Volcano Engine Reinforcement Learning for LLM"
license = {file = "LICENSE"}  # or "Apache-2.0", if you prefer an SPDX identifier
readme = {file = "README.md", content-type = "text/markdown"}
requires-python = ">=3.8"
......
...@@ -43,7 +43,7 @@ setup(
license='Apache 2.0',
author='Bytedance - Seed - MLSys',
author_email='zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk',
description='verl: Volcano Engine Reinforcement Learning for LLM',
install_requires=install_requires,
extras_require=extras_require,
package_data={'': ['version/*'],
......
...@@ -206,7 +206,7 @@ def initialize_model_parallel(
backend = backend or torch.distributed.get_backend()

# NOTE(sgm) we don't assert world_size == tp * pp
# DP is not managed by vllm but by the verl WorkerGroup
num_tensor_model_parallel_groups: int = (world_size // tensor_model_parallel_size)
num_pipeline_model_parallel_groups: int = (world_size // pipeline_model_parallel_size)
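# Worked example of the grouping arithmetic above (illustrative numbers):
# with world_size = 8, tensor_model_parallel_size = 2 and
# pipeline_model_parallel_size = 1, there are 8 // 2 = 4 TP groups and
# 8 // 1 = 8 PP groups; the remaining data-parallel replication is
# organized by the verl WorkerGroup, so tp * pp need not equal world_size.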
......
...@@ -224,7 +224,7 @@ def initialize_model_parallel(
backend = backend or torch.distributed.get_backend(ps.get_world_group().device_group)

# NOTE(sgm) we don't assert world_size == tp * pp
# DP is not managed by vllm but by the verl WorkerGroup
# if (world_size !=
#         tensor_model_parallel_size * pipeline_model_parallel_size):
#     raise RuntimeError(
......