Unverified commit 1d12fe31 by Blue Space, committed by GitHub

[doc] update megatron core_r0.11.0 documentation (#562)

Urgently update megatron core_r0.11.0 documentation.
parent 0a11fc62
@@ -25,7 +25,7 @@ Install from docker image
We provide pre-built Docker images for quick setup.

Image and tag: ``whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6``. See the files under ``docker/`` for the NGC-based image, or if you want to build your own.

1. Launch the desired Docker image:
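For reference, a minimal launch command might look like the following; the flags shown (``--gpus all``, ``--shm-size``, and the volume mount) are illustrative and should be adapted to your cluster setup:

.. code-block:: bash

   # illustrative launch; adjust shared memory, mounts, and GPU flags to your environment
   docker run --gpus all -it --shm-size=16g \
       -v "$(pwd)":/workspace \
       whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6 \
       bash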
@@ -42,27 +42,36 @@ Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``
git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
# or install from pypi via `pip3 install verl`
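To confirm the install, a quick import check can be run (illustrative; it assumes the package exposes a ``__version__`` attribute):

.. code-block:: bash

   python3 -c "import verl; print(verl.__version__)"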
.. note::

   The Docker image is built with the following configurations:

   - **PyTorch**: 2.4.0+cu124
   - **CUDA**: 12.4
   - **Megatron-LM**: core_r0.11.0
   - **vLLM**: 0.6.3
   - **Ray**: 2.10.0
   - **TransformerEngine**: 2.0.0+754d2a0

verl is now **compatible with Megatron-LM core_r0.11.0**, so there is **no need to apply patches** to Megatron-LM. The image also ships with **Megatron-LM core_r0.11.0** pre-installed, located at ``/opt/nvidia/Megatron-LM``. Furthermore, because verl currently uses only the ``megatron.core`` module, there is **no need to modify** ``PYTHONPATH`` when Megatron-LM is already installed, as in this Docker image.
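Since ``megatron.core`` is the only module verl imports, a simple way to sanity-check the bundled Megatron-LM inside the container is an import check (illustrative, not part of the official setup):

.. code-block:: bash

   python3 -c "import megatron.core; print(megatron.core.__file__)"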
If you must use Megatron-LM **core_r0.4.0**, use the old Docker image ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3`` from the `Docker Hub Repo: verlai/verl <https://hub.docker.com/r/verlai/verl/tags>`_, and apply the patches in the ``verl/patches`` folder:
.. code-block:: bash

   cd ..
   git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
   cp verl/patches/megatron_v4.patch Megatron-LM/
   cd Megatron-LM && git apply megatron_v4.patch
   pip3 install -e .
   export PYTHONPATH=$PYTHONPATH:$(pwd)
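If you want to verify the patch before applying it, ``git apply --check`` performs a dry run from inside the Megatron-LM checkout (standard Git behavior; not part of the original steps):

.. code-block:: bash

   # exits non-zero if the patch cannot be applied cleanly
   git apply --check megatron_v4.patch && echo "patch applies cleanly"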
Alternatively, clone the pre-patched Megatron-LM **core_r0.4.0** directly:

.. code-block:: bash

   git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM
   export PYTHONPATH=$PYTHONPATH:$(pwd)/Megatron-LM
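After exporting ``PYTHONPATH``, you can check that Python resolves ``megatron`` from the patched checkout rather than from some other installation (an illustrative check):

.. code-block:: bash

   # the printed path should point inside the Megatron-LM checkout
   python3 -c "import megatron; print(megatron.__file__)"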
Install from custom environment
-------------------------------
@@ -97,18 +106,12 @@ Megatron is optional. Its dependencies can be set up as below:
git+https://github.com/NVIDIA/apex

# transformer engine
pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable
# megatron core v0.11.0: no patches needed
# If you must use core_v0.4.0, clone the pre-patched branch instead:
# git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM
cd ..
git clone -b core_v0.11.0 https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
pip3 install -e .
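Once these dependencies are in place, a short import sweep can catch missing pieces early (illustrative; ``apex``, ``transformer_engine``, and ``megatron.core`` are the standard module names for the packages above):

.. code-block:: bash

   python3 - <<'EOF'
   import apex                # NVIDIA apex
   import transformer_engine  # TransformerEngine
   import megatron.core       # Megatron-LM core
   print("All Megatron dependencies import correctly")
   EOF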
@@ -196,5 +196,6 @@ additional initialization for the Optimizer.
Context Parallel
----------------
Currently only the LLaMA and Qwen models implemented in verl can be used, and context parallelism is not yet supported.

We are working on supporting Megatron's GPTModel implementation with TransformerEngine support. If that integration goes well, we can support Ulysses, Ring, and AllGather context parallelism in the future.