- 10 Mar, 2025 1 commit
Current bugs when enabling HSDP:

- **Incorrect division in batch sizes**: `ppo_micro_batch`, `ppo_minibatch`, etc. should be divided by `self.device_mesh.size()` instead of `self.device_mesh.shape[0]`.
- **Improper weight initialization in `get_init_weight_context_manager`**:
  - The `get_init_weight_context_manager` function must materialize real weights only on `local_rank == 0` within every FSDP mesh (and use empty/meta weights elsewhere).
  - When `sync_module_states=True`, PyTorch's FSDP first broadcasts parameters within the FSDP process group and then within the DDP process group. If weights are not initialized correctly on `local_rank == 0` of each FSDP mesh, the synchronization may fail or produce incorrect results. https://github.com/pytorch/pytorch/blob/3f069e7679588d5ee4b1d5b2492ca0e20f9320b5/torch/distributed/fsdp/_init_utils.py#L614-L621
  - Ensure initialization occurs only when `self.device_mesh.get_coordinate()[-1] == 0`, which corresponds to `local_rank == 0` within each FSDP mesh (see the sketch below).
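A minimal sketch of the proposed fix, assuming `get_init_weight_context_manager` wraps `accelerate`'s `init_empty_weights` (the signature here is illustrative, not verl's exact API):

```python
from contextlib import nullcontext
from accelerate import init_empty_weights

def get_init_weight_context_manager(use_meta_tensor=True, mesh=None):
    # Fix sketch: under HSDP the device mesh is (ddp, fsdp), and
    # sync_module_states=True broadcasts within the FSDP process group first,
    # then within the DDP process group. Real weights must therefore exist on
    # local_rank == 0 of *every* FSDP mesh, not just on global rank 0.
    if not use_meta_tensor:
        return nullcontext
    if mesh is None:
        import torch.distributed as dist
        materialize = dist.get_rank() == 0  # single-mesh fallback
    else:
        # get_coordinate()[-1] == 0  <=>  local_rank == 0 within this FSDP mesh
        materialize = mesh.get_coordinate()[-1] == 0
    return nullcontext if materialize else init_empty_weights
```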
zhr2001 committed
-
- 08 Mar, 2025 2 commits
Haosheng Zou (邹昊晟) committed
-
Lumeng Wu committed
-
- 07 Mar, 2025 8 commits
- [x] Add concurrency to workflows to cancel previous runs when a new commit is pushed to the same branch.
- [ ] Cancel all workflows/jobs from the same commit if any fails? (Not sure whether we really need this.)

Note: we leave out `secrets_scan.yml` and `scorecard.yml` to avoid any possible leakage or security risk; these workflows also cost little to run.
Shawn/Yuxuan Tong committed -
Searching for an appropriate simplification path can cause `sympy.simplify` to run indefinitely, and the `ProcessPool` may get stuck under excessive concurrency, so the timeout mechanism in `verl/verl/workers/reward_manager/prime.py` cannot catch the hang. To address this, a timeout detection mechanism for `sympy.simplify` is added directly in `verl/verl/utils/reward_score/prime_math/__init__.py`, as sketched below.
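A minimal sketch of such a mechanism, assuming a POSIX `SIGALRM`-based guard (the actual implementation in `prime_math/__init__.py` may differ):

```python
import signal
from contextlib import contextmanager

import sympy

@contextmanager
def time_limit(seconds: int):
    # POSIX-only sketch: interrupt sympy.simplify from inside the worker
    # process instead of relying on the pool-level timeout that can get stuck.
    def handler(signum, frame):
        raise TimeoutError(f"simplify exceeded {seconds}s")
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

def simplify_with_timeout(expr, seconds: int = 5):
    try:
        with time_limit(seconds):
            return sympy.simplify(expr)
    except TimeoutError:
        return expr  # fall back to the unsimplified expression
```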
Yuchen Zhang committed -
# Background

In RLHFDataset, we filter out prompts that are too long. This requires applying apply_chat_template to the whole dataset, which is not scalable when the dataset is large. https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132

Instead of performing this filtering online, we probably want to move the process offline and either add an assertion to avoid truncation or simply perform truncation. Reference: #502

# Key Changes

- Add an option `data.filter_overlong_prompts=True` to enable the above data filtering. The default value is False, but we enable it for all the example scripts.
- Add an option `data.truncation` to truncate the input_ids or prompt length if they exceed max_prompt_length. The default is 'error', which does not allow max_prompt_length to be exceeded; users should increase max_prompt_length if the error is thrown. You can also set it to 'left' or 'right' (see the sketch below).

### Suggestion for large-scale datasets

For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filter_overlong_prompts=False` and set `data.truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncation of the prompts.
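A hedged sketch of the `data.truncation` semantics described above (names are illustrative and may not match `rl_dataset.py` exactly):

```python
def truncate_prompt(input_ids, max_prompt_length: int, truncation: str = "error"):
    # Sketch of data.truncation: 'left' keeps the end of the prompt,
    # 'right' keeps the start, 'error' refuses overlong prompts.
    if len(input_ids) <= max_prompt_length:
        return input_ids
    if truncation == "left":
        return input_ids[-max_prompt_length:]
    if truncation == "right":
        return input_ids[:max_prompt_length]
    raise RuntimeError(
        f"Prompt length {len(input_ids)} exceeds max_prompt_length "
        f"{max_prompt_length}; increase data.max_prompt_length or set "
        "data.truncation to 'left' or 'right'."
    )
```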
Guangming Sheng committed -
zhou fan committed
-
close #503
Joel committed -
Verl's megatron core_r0.11.0 backend successfully tested with 3D parallelism, with multiple bugs fixed (#495). This PR combines multiple modifications.

# QWen2.5 checkpoint saver bug fix

Thanks to @uygnef for the efforts contributed in #368; we use the new saver as the model loader and saver for 3D-parallelism support.

# Megatron backend 3D-parallelism test benches

We modified the scripts in `examples/ppo_trainer` and `tests/e2e`, as well as the CI workflows; all are tested.

# Bug fixes for 3D parallelism

Including configuration bugs as well as module packing. The original TP VocabParallelEntropy can lead to CUDA OOM; we refactored the implementation with `torch.bmm` (the underlying identity is sketched below).

# Full migration to Megatron Core

Now we only use Megatron Core in verl and no longer call other Megatron components. If they are needed, please integrate them into `utils/megatron`.

---------

Co-authored-by: uygnef <admin@fengyu.org>
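For reference, the memory-friendly entropy identity behind that refactor is H = logsumexp(logits) − Σ softmax(logits)·logits. A hedged sketch that chunks the computation (this illustrates the idea only, not verl's exact `torch.bmm`-based kernel):

```python
import torch
import torch.nn.functional as F

def entropy_from_logits_chunked(logits: torch.Tensor, chunk_size: int = 1024):
    # H = logsumexp(logits) - sum(softmax(logits) * logits), computed in
    # chunks along the token dimension to avoid materializing a full softmax
    # over the whole batch at once.
    entropies = []
    for part in logits.split(chunk_size, dim=0):
        probs = F.softmax(part, dim=-1)
        ent = torch.logsumexp(part, dim=-1) - torch.sum(probs * part, dim=-1)
        entropies.append(ent)
    return torch.cat(entropies, dim=0)
```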
Blue Space committed -
Willem Jiang committed
-
Joel committed
-
- 06 Mar, 2025 6 commits
This PR solves the following 2 problems.

1. **Last step skipped**

   Executing `self.global_steps += 1` before the `if self.global_steps >= self.total_training_steps` check makes the last step skipped. We start from step 1 and expect `self.total_training_steps` steps in total. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L999-L1001

   When `self.global_steps == self.total_training_steps - 1`:
   * we have only executed `self.total_training_steps - 1` steps;
   * `self.global_steps` is updated to `self.total_training_steps`;
   * `self.global_steps >= self.total_training_steps` is satisfied, and training ends.

   Therefore, we should put `self.global_steps += 1` last (see the sketch below).

2. **Redundant validation and logging**

   If `self.total_training_steps % self.config.trainer.test_freq == 0`:
   * `self._validate()` will be executed twice:
     1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L984
     2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1005
   * logging will also be executed twice:
     1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L985 and https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L997
     2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1007
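A runnable toy of the corrected control flow (illustrative only, not verl's actual trainer loop):

```python
def train_one_step(step: int) -> None:
    print(f"training step {step}")

def validate_and_log_final_step() -> None:
    print("final validation + logging (runs exactly once)")

total_training_steps = 3
global_steps = 1  # we start from step 1
while True:
    train_one_step(global_steps)
    if global_steps >= total_training_steps:
        validate_and_log_final_step()
        break
    global_steps += 1  # incremented *after* the termination check
```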
Lumeng Wu committed -
- Add an allgather method to DataProto
- Add tests
- Replace the existing raw allgather with this function
Chi Zhang committed -
Yusheng (Ethan) Su committed
-
In this PR, a `val_generations_to_log_to_swanlab` parameter has been added. When this parameter is set to 1, the generated text from validation is logged to SwanLab. @hiyouga

---

This pull request introduces logging of validation generations to SwanLab in addition to Wandb. The changes include updates to several configuration files and the addition of a new logging method in `ray_trainer.py`. Key changes include:

### Configuration Updates:

* Added the `val_generations_to_log_to_swanlab` parameter to the `trainer` section in the following configuration files:
  * `examples/split_placement/config/ppo_trainer_split.yaml`
  * `verl/trainer/config/ppo_megatron_trainer.yaml`
  * `verl/trainer/config/ppo_trainer.yaml`

### Code Updates:

* Added a new method `_maybe_log_val_generations_to_swanlab` to log validation samples to SwanLab in `verl/trainer/ppo/ray_trainer.py` (sketched below)
* Updated the `_validate` method to call the new SwanLab logging method in `verl/trainer/ppo/ray_trainer.py`
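A minimal sketch of what the new hook might look like, assuming SwanLab's `swanlab.Text` and `swanlab.log` APIs (the actual method in `verl/trainer/ppo/ray_trainer.py` may differ in details):

```python
import swanlab

def _maybe_log_val_generations_to_swanlab(self, inputs, outputs, scores):
    """Log up to N validation samples to SwanLab as text records."""
    n = self.config.trainer.get("val_generations_to_log_to_swanlab", 0)
    if n == 0:
        return
    samples = list(zip(inputs, outputs, scores))
    texts = [
        swanlab.Text(f"input: {i}\n\noutput: {o}\n\nscore: {s}")
        for i, o, s in samples[:n]
    ]
    swanlab.log({"val/generations": texts}, step=self.global_steps)
```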
Ze-Yi LIN committed -
### What does this PR do?

In the `naive` mode, passing `extra_info` for reward function calculation is supported (https://github.com/volcengine/verl/pull/266), but this support is missing in the `prime` mode. This causes reward functions that use `extra_info` to produce incorrect results in `prime` mode. This commit fixes the issue.

### Who can review?

@PeterSH6 @vermouth1992 @hiyouga or other people who have the authority.
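Conceptually, the fix threads `extra_info` through the prime reward path the same way the naive manager already does. A toy sketch (the function body and fields are illustrative, not verl's actual reward code):

```python
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    # Toy reward fn that needs extra_info (e.g., per-sample metadata).
    bonus = 0.1 if extra_info and extra_info.get("has_think_tag") else 0.0
    return float(solution_str.strip() == ground_truth.strip()) + bonus

# The prime path now forwards extra_info, matching the naive manager:
score = compute_score(
    data_source="openai/gsm8k",
    solution_str="42",
    ground_truth="42",
    extra_info={"has_think_tag": True},  # previously dropped in prime mode
)
```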
nomadlx committed -
Set timeout in CI to avoid infinite hang. close #468
Chi Zhang committed
-
- 05 Mar, 2025 5 commits
Chi Zhang committed
-
This pull request updates `docs/examples/config.rst` to enhance the documentation for the `Trainer` configuration. The most important changes expand the documented support for various logging platforms.

Documentation updates:

* [`docs/examples/config.rst`](diffhunk://#diff-f051f6df5187cb4805be686b3d10c480877a01e9a35ed98cd63cf8da6af03772L352-R354): Updated the descriptions for `trainer.project_name`, `trainer.experiment_name`, and `trainer.logger` to include support for additional logging platforms such as swanlab, mlflow, and tensorboard.
Ze-Yi LIN committed -
Add support for downloading models from ModelScope by setting `VERL_USE_MODELSCOPE=True`. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
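A hedged sketch of how such a switch is typically wired, assuming ModelScope's `snapshot_download` (the helper name is illustrative; verl's actual call site may differ):

```python
import os

def resolve_model_path(model_name_or_path: str) -> str:
    # Illustrative helper: when VERL_USE_MODELSCOPE=True, fetch the model
    # from ModelScope instead of the HuggingFace hub.
    if os.environ.get("VERL_USE_MODELSCOPE", "False").lower() == "true":
        from modelscope import snapshot_download  # pip install modelscope
        return snapshot_download(model_name_or_path)
    return model_name_or_path  # fall back to HF hub name / local path
```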
Hong Zhang committed -
HL committed
-
Calculate MFU in `update_actor`/`update_critic` when using Megatron workers.
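For reference, MFU is the achieved FLOPs throughput divided by the aggregate peak throughput of the GPUs. A minimal sketch (the 989 TFLOPS default is an H100 BF16 assumption, and the helper name is illustrative):

```python
def compute_mfu(flops_per_step: float, step_time_s: float, num_gpus: int,
                peak_flops_per_gpu: float = 989e12) -> float:
    """MFU = achieved FLOPs/s divided by aggregate peak FLOPs/s."""
    achieved_flops_per_s = flops_per_step / step_time_s
    return achieved_flops_per_s / (num_gpus * peak_flops_per_gpu)

# e.g. 6 PFLOPs of work in 2 s on 8 GPUs -> ~0.379 MFU
print(compute_mfu(flops_per_step=6e15, step_time_s=2.0, num_gpus=8))
```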
Mingjie LIU committed
-
- 04 Mar, 2025 5 commits
hoshi-hiyouga committed
-
This PR is a continuing work of #448 , in order to support e2e CI for Ascend NPU.
Shuqiao Li committed -
Add DeepRetrieval to README. Awesome work!
Patrick Jiang committed -
## What does this PR do?

1. Separate the prompt part and the response part in the reward manager, to avoid the format reward leaking from text in the prompt (see the sketch below).
2. Update the reward score function for the Geometry3k dataset.
3. Update the content of the readme file.

## Who can review?

@vermouth1992 @PeterSH6
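A hedged sketch of the separation in item 1, assuming verl's usual layout of a fixed-width, left-padded prompt block followed by the response (tensor names here are illustrative):

```python
import torch

# Toy batch: prompts are left-padded to a fixed width, responses follow.
prompt_ids = torch.tensor([[0, 0, 101, 102]])    # width 4, left-padded
response_ids = torch.tensor([[201, 202, 0, 0]])  # right-padded
attention_mask = torch.tensor([[0, 0, 1, 1, 1, 1, 0, 0]])

prompt_length = prompt_ids.shape[-1]
valid_response_length = int(attention_mask[0, prompt_length:].sum())
response_only = response_ids[0][:valid_response_length]  # tensor([201, 202])
# Decode and score `response_only` alone, so a format reward can never match
# instructions or tags that appear in the prompt.
```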
hoshi-hiyouga committed -
Add ReSearch to README. Awesome work!
Mingyang Chen committed
-
- 03 Mar, 2025 5 commits
Shuqiao Li committed
-
## What does this PR do?

This PR migrates the RL-on-VLMs feature from our implementation in the [EasyR1](https://github.com/hiyouga/EasyR1) fork back to veRL. We have validated this feature using the Qwen2.5-VL 7B model on 8*H100 GPUs. The configuration and data processing script are provided along with this PR for easy reproduction.

## How to reproduce?

1. Download and preprocess the dataset
   ```bash
   python3 examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k
   ```
2. Start GRPO training
   ```bash
   bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
   ```

## Dependencies

- vllm>=0.7.3
- transformers>=4.49.0
- [qwen-vl-utils](https://pypi.org/project/qwen-vl-utils/)
- [mathruler](https://pypi.org/project/mathruler/)

## Major Changes

### New dataflow for multimodal RL

In this PR, we introduce two new concepts in the dataflow: `multi_modal_data` and `multi_modal_inputs`. The former holds the multi-modal features required by the **rollout** worker (such as vLLM), while the latter holds the multi-modal features required by the **actor/critic** worker (such as an HF model). They differ because the rollout and actor workers have their own data format requirements.

Taking Qwen2-VL + huggingface + vLLM as an example, the data structure should be:

- **multi_modal_data**: {"image": [PIL.Image, PIL.Image, ...]}
- **multi_modal_inputs**: {"pixel_values": torch.Tensor, "image_grid_thw": torch.Tensor}

Both are converted to numpy objects and placed in the non-tensor batch in DataProto. This design can be extended to other modalities/VLMs easily because it is model-agnostic (a hedged sketch of the two structures appears at the end of this note).

### Other changes

- Data
  - Support pre-processing the [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k) dataset.
  - Support `config.data.image_key`; the referenced field should contain **a list of Pillow images**.
- Actor/Ref/Critic
  - Support `multi_modal_inputs`.
  - Process position ids to adapt to the m-rope.
- Rollout
  - Update the dtensor weight loader to adapt to the Qwen2-VL architecture in vLLM 0.7+.
  - Support `multi_modal_data`.
  - Use `raw_prompt_ids` as the vLLM inputs to **avoid unpadding** the input ids.
- Reward Manager
  - Add **mathruler** for more accurate math scores on the Geometry3k dataset.
- Models
  - Support calculating the position ids for the m-rope in Qwen2-VL.
  - Support removing padding in flash attention2 for the m-rope (transformers itself **does not support it**).
- Sharding Manager
  - Support all-gathering the non-tensor batch.
- FSDP Workers / Checkpoint Merger
  - Support `AutoModelForVision2Seq` at model initialization.

Note: Ulysses parallelism is not completed yet. We will support it in the next update.

## Performance

We provide the estimated MFU of the language model part for H100 GPUs. These values are lower than the actual ones because **we did not compute the FLOPs of the vision tower part**.

- `remove_padding=False`: MFU ~7%
- `remove_padding=True`: MFU ~20%

The training and test reward score curves are presented as follows.



## Who can review?

@vermouth1992 @PeterSH6
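A hedged sketch of how the two structures relate for Qwen2-VL (the model id, prompt format, and image path below are illustrative; verl's actual preprocessing lives in its dataset and worker code):

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image = Image.open("example.png")  # placeholder image path

# multi_modal_data: raw features consumed by the rollout worker (vLLM).
multi_modal_data = {"image": [image]}

# multi_modal_inputs: processed features consumed by the actor/critic (HF model).
encoded = processor(
    text=["<|vision_start|><|image_pad|><|vision_end|>Describe the image."],
    images=[image],
    return_tensors="pt",
)
multi_modal_inputs = {
    "pixel_values": encoded["pixel_values"],
    "image_grid_thw": encoded["image_grid_thw"],
}
```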
hoshi-hiyouga committed -
Forgot to update params in generation.yaml (#259).
BearBiscuit committed -
# Support Megatron mcore 0.11

## Description

This PR introduces official support for Megatron mcore 0.11 with the following updates:

- Upgraded Megatron to version `core_r0.11.0`
- Applied compatibility patch `patches/mcore_r0.11.patch`
- Removed legacy version support for a cleaner implementation

Special thanks to @chendong-1998 for the original Megatron upgrade from 0.4 to 0.6 (#93f6a7e).

## Compatibility Notes

The current implementation requires careful handling due to dependency conflicts:

- `megatron-core==0.11.0` requires torch>=2.6
- `vllm==0.6.3` requires torch==2.4

Installation constraints:

1. Must use vllm's torch dependency (2.4) as the baseline
2. Do NOT run `pip install -e .` in the mcore directory (it will upgrade torch to 2.6)
3. Apply the compatibility patch manually after installation

## Testing

Tested with `verl/examples/ppo_trainer/run_deepseek_megatron.sh`:



---------

Signed-off-by: chendong-1998 <chendong136@huawei.com>
Co-authored-by: chendong-1998 <chendong136@huawei.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
Co-authored-by: Sion Gao <gaoziyuan19@mails.ucas.ac.cn>
Yan Bai committed -
HL committed
-
- 02 Mar, 2025 6 commits
Reverts volcengine/verl#314
Chi Zhang committed -
Weizhe Chen committed
-
ZSL98 committed
-
Specify the IP address when calling the bind method.
Willem Jiang committed -
Guangming Sheng committed
-
Now APIs can be displayed: 
HL committed
-
- 01 Mar, 2025 2 commits
Lumeng Wu committed
-
Owing to the ongoing updates in vLLM, veRL currently cannot integrate directly with the nightly build of vLLM. The new DP feature in the nightly version can no longer be bypassed by simply adjusting the `data_parallel_size` parameter, and resolving this requires further investigation. As a temporary workaround, I recommend a customized installation of vLLM if the V1 engine is required; I have updated the relevant documentation to reflect this guidance.
ZSL98 committed
-