Unverified Commit 3fe77fa7 by Xingyao Wang Committed by GitHub

docs: Add LigerKernel performance tuning documentation (#178)

This PR adds documentation for the LigerKernel option in a new
performance tuning section, addressing the comment from
volcengine/verl#173.

Changes:
- Created new performance tuning section in docs
- Documented LigerKernel option for SFT
- Added performance tuning section to documentation index

Related to volcengine/verl#173

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
parent 13762f43
@@ -11,6 +11,8 @@ In this section, we will discuss how to tune the performance of all the stages i
4. Utilize Ulysses Sequence Parallel for Long Context Training
5. LigerKernel for SFT performance optimization

Rollout Generation Tuning
-------------------------
@@ -119,3 +121,20 @@ To utilize this technique, users can set ``ulysses_sequence_parallel_size>1`` in
We support using different ``ulysses_sequence_parallel_size`` values for different models.
To train on long sequences (>32k tokens), users may need to decrease ``*micro_batch_size_per_gpu`` and ``*max_token_len_per_gpu`` to avoid OOM.
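As a sketch, enabling Ulysses sequence parallelism is a one-line config change; the exact key path below is illustrative and depends on your trainer config layout:

.. code-block:: yaml

   # Illustrative sketch: shard each sequence across 2 GPUs for the actor.
   # Set the analogous key in the ref/critic/reward model configs as needed.
   actor_rollout_ref:
     actor:
       ulysses_sequence_parallel_size: 2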

LigerKernel for SFT
-------------------

LigerKernel is a set of high-performance kernels for LLM training that can improve throughput and reduce memory usage, which makes it particularly useful for Supervised Fine-Tuning (SFT). To enable LigerKernel in your SFT training:

1. In your SFT configuration file (e.g., ``verl/trainer/config/sft_trainer.yaml``), set the ``use_liger`` parameter:

   .. code-block:: yaml

      model:
        use_liger: True  # Enable LigerKernel for SFT

2. The default value is ``False``. Enable it only when you want to use LigerKernel's optimizations.
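Alternatively, the flag can be toggled per run instead of editing the YAML. The launch sketch below assumes verl's hydra-style command-line overrides; the module path, GPU count, and model name are illustrative and may differ in your setup:

.. code-block:: shell

   # Illustrative sketch: override use_liger on the command line
   # (module path and model are assumptions, not verbatim from this doc).
   torchrun --nproc_per_node=8 -m verl.trainer.fsdp_sft_trainer \
       model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
       model.use_liger=True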