- 16 Feb, 2025 2 commits
- 15 Feb, 2025 7 commits
-
-
The split placement example is outdated; I tried it and encountered some errors. To address this, this PR makes the following changes:
1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to `examples/split_placement/config/ppo_trainer_split.yaml`
2. Copied the `RayPPOTrainer.fit` method into the `fit` function in `examples/split_placement/split_monkey_patch.py` and modified it to get the futures of `critic_output` and `actor_output`
zhou fan committed -
HL committed
-
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
ZSL98 committed -
Related issue: https://github.com/volcengine/verl/issues/273
- Add `remove_previous_ckpt_in_save` and `del_local_ckpt_after_load` configuration options in `ppo_trainer.yaml`
- Update `RayPPOTrainer` to support optional checkpoint deletion during loading
- Modify `ActorRolloutRefWorker` and `CriticWorker` to pass the checkpoint removal flag
Zhiwei He committed -
- Roll back the vLLM version (vllm > 0.7.0 is only for testing)
- Add pyext as an extra requirement
Guangming Sheng committed -
Currently, checkpoints are not saved until the training step satisfies the saving frequency. This PR adds an automatic checkpoint save at the end of training.
Zhihan committed -
Add a model merger to save checkpoints in the `.safetensors` format and push them to the Hugging Face Hub (#262)

This PR introduces a new script, `scripts/model_merger.py`, which enables the conversion of model checkpoints saved in `.pt` format to the `.safetensors` format. The script also includes functionality to optionally push the converted model to the Hugging Face Hub.

### Changes:
1. Added `scripts/model_merger.py` to handle the conversion process.
2. Implemented support for the `.pt` to `.safetensors` transformation.
3. Added an option to push the converted model to the Hugging Face Hub if required.
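For reference, a minimal sketch of such a conversion, assuming a flat `{name: tensor}` checkpoint; this is illustrative only and not the actual `scripts/model_merger.py` (file names and the `repo_id` are hypothetical placeholders):

```python
from typing import Optional

import torch
from safetensors.torch import save_file
from huggingface_hub import HfApi

def convert_and_push(pt_path: str, out_path: str, repo_id: Optional[str] = None):
    # Load the .pt checkpoint on CPU and keep only the tensor entries.
    state_dict = torch.load(pt_path, map_location="cpu")
    # safetensors expects a flat dict of contiguous tensors.
    tensors = {k: v.contiguous() for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
    save_file(tensors, out_path)
    # Optionally push the converted file to the Hugging Face Hub.
    if repo_id is not None:
        HfApi().upload_file(path_or_fileobj=out_path, path_in_repo="model.safetensors", repo_id=repo_id)

# Example usage (paths are placeholders):
# convert_and_push("checkpoint.pt", "model.safetensors", repo_id=None)
```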
Zhihan committed
-
- 14 Feb, 2025 5 commits
-
-
湛露先生 committed
-
This PR aims to integrate vllm>=0.7.0 and preserve:
- **Backward compatibility**: 0.3.1, 0.4.2, 0.5.4, 0.6.3 are still supported
- **Forward compatibility**: Future versions of vllm (>= 0.7.0) will be supported without requiring manual maintenance for each new release.

The readme of this Beta version is located at docs/README_vllm0.7.md, where users can find the installation method and related features. This readme is copied below.

---

# Readme for verl (vllm>=0.7) version

## Installation

Note: This version of veRL supports **FSDP** for training and **vLLM** for rollout. (Megatron-LM is not supported yet.)

```
# Create the conda environment
conda create -n verl python==3.10
conda activate verl

# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install vLLM>=0.7
pip3 install vllm==0.7.0

# Install flash-attn
pip3 install flash-attn --no-build-isolation
```

For existing stable vllm versions (<=0.7.2), you also need to make some tiny patches manually on vllm (/path/to/site-packages/vllm after installation) after the above steps:

- vllm/distributed/parallel_state.py: Remove the assertion below:

```
if (world_size !=
        tensor_model_parallel_size * pipeline_model_parallel_size):
    raise RuntimeError(
        f"world_size ({world_size}) is not equal to "
        f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
        f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
```

- vllm/executor/uniproc_executor.py: change `local_rank = rank` to `local_rank = int(os.environ["LOCAL_RANK"])`
- vllm/model_executor/model_loader/weight_utils.py: remove the `torch.cuda.empty_cache()` in `pt_weights_iterator`

These modifications have already been merged into the main branch of vLLM. To avoid modifying these files manually, you can directly build vLLM from source.

## Features

### Use CUDA graph

After installation, examples using FSDP as the training backend can be run. By default, `enforce_eager` is set to True, which disables the CUDA graph. To enjoy CUDA graphs and the sleep mode of vLLM>=0.7, add the following lines to the bash script:

```
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
```

For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 115 seconds with vLLM 0.6.3, while it is 85 seconds with vLLM 0.7.0. By enabling the CUDA graph, the generation duration is further reduced to 62 seconds.

**Note:** Currently, if `n` is greater than 1 in `SamplingParams` in vLLM>=0.7, there is a potential performance issue with the stability of rollout generation time (some iterations see generation time bursts). We are working with the vLLM team to check this issue.

### Other features in vLLM

1. **num_scheduler_step>1:** not supported yet (weight loading has not been aligned with `MultiStepModelRunner`)
2. **Prefix caching:** not supported yet (vLLM sleep mode does not support prefix caching)
3. **Chunked prefill:** supported

---------

Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
ZSL98 committed -
The previous FileLock usage in https://github.com/volcengine/verl/blob/c46f403479db5d7afca6388800503a3bfe393bf5/verl/utils/checkpoint/checkpoint_manager.py#L75 may cause errors when the given path is too long, for instance `FileExistsError: [Errno 17] File exists` or `BlockingIOError: [Errno 11] Resource temporarily unavailable`. To fix this issue, use the hash value of the path as the lock file name instead of the original path to avoid the conflict. After modifying this part, the issue is avoided.

```
# Requires: import os, tempfile; from filelock import FileLock
@staticmethod
def local_mkdir(path):
    if not os.path.isabs(path):
        working_dir = os.getcwd()
        path = os.path.join(working_dir, path)

    # Use the hash value of the path as the lock file name to avoid long file names
    lock_filename = f"ckpt_{hash(path) & 0xFFFFFFFF:08x}.lock"
    lock_path = os.path.join(tempfile.gettempdir(), lock_filename)

    try:
        with FileLock(lock_path, timeout=60):  # add a timeout
            # make a new dir
            os.makedirs(path, exist_ok=True)
    except Exception as e:
        print(f"Warning: Failed to acquire lock for {path}: {e}")
        # Even if the lock is not acquired, try to create the directory
        os.makedirs(path, exist_ok=True)

    return path
```
Wei Liu committed -
**Fix: Compatibility Issue with Python 3.9 in FSDP Worker for LLaMA Model**

When running the LLaMA model in the FSDP worker, an ImportError occurs due to the use of the `Unpack` type from the `typing` module. This type is only available in Python 3.11 and later, but the current environment uses Python 3.9, which does not support it.

**Error Details:**
```
File "/project/Logic-RL-main/verl/models/transformers/llama.py", line 17, in <module>
    from typing import Optional, List, Union, Tuple, Unpack, Callable
ImportError: cannot import name 'Unpack' from 'typing' (/opt/miniconda3/envs/verl/lib/python3.9/typing.py)
```

**Solution:** To resolve this issue, I added conditional imports to handle different Python versions. For Python versions lower than 3.11, the code now uses a fallback or alternative approach to avoid relying on `Unpack` (see the sketch below).

Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
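A minimal sketch of one such conditional import, assuming `typing_extensions` is available as the fallback (the actual change in `verl/models/transformers/llama.py` may differ):

```python
import sys

if sys.version_info >= (3, 11):
    from typing import Unpack
else:
    # typing_extensions backports Unpack for Python < 3.11
    from typing_extensions import Unpack
```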
Yu Feng committed -
- As titled
Guangming Sheng committed
-
- 13 Feb, 2025 1 commit
-
-
- As titled
Guangming Sheng committed
-
- 12 Feb, 2025 4 commits
- 11 Feb, 2025 2 commits
-
-
HL committed
-
The original Ray controller method `execute_rank_zero_sync()` is not functional. Fixed.
ExtremeViscent committed
-
- 10 Feb, 2025 5 commits
-
-
Give `main_task` num_cpus=1 and make sure that `main_task` is not scheduled on the head node.
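A minimal, hypothetical Ray sketch of the idea (not the actual verl code); the mechanism for keeping the task off the head node is an assumption, noted in the comments:

```python
import ray

@ray.remote(num_cpus=1)  # reserve exactly one CPU for the driver-style task
def main_task():
    # Placeholder body; the real main_task drives the training loop.
    return "done"

if __name__ == "__main__":
    ray.init()
    # One way (an assumption, not necessarily verl's mechanism) to keep this task off
    # the head node: start worker nodes with `ray start --resources='{"worker": 1}'`
    # and add resources={"worker": 0.001} to the decorator above.
    print(ray.get(main_task.remote()))
```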
Chi Zhang committed -
## Description
Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax is a simple, efficient, and stable RL algorithm customized for LLM training, with theoretical guarantees for variance reduction. The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper experimented with ReMax, but verl did not provide an implementation, so ReMax has been added.

## Changes
- Added RayReMaxTrainer implementation
- Added example scripts for ReMax training
- Added documentation for the ReMax algorithm

## Testing
- Tested ReMax example scripts with Qwen models

Validation reward of optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:

<img width="501" alt="Screenshot 2025-02-09 20 51 14" src="https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add" />

The curve demonstrates the effectiveness of ReMax, though its performance can be further enhanced through hyperparameter fine-tuning.

## Documentation
- Added ReMax documentation
- Updated example configurations

## Checklist
- [x] Code follows project's style guidelines (yapf formatted)
- [x] Tests added/updated and passing
- [x] Documentation updated
- [x] Example scripts added
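For intuition, a minimal sketch of a ReMax-style advantage (illustrative only, not the RayReMaxTrainer code): each sampled response's reward is baselined by the reward of a greedy rollout for the same prompt, which reduces variance without a learned critic.

```python
import torch

def remax_advantage(sample_rewards: torch.Tensor, greedy_rewards: torch.Tensor) -> torch.Tensor:
    # sample_rewards: rewards of sampled responses, shape (batch,)
    # greedy_rewards: rewards of greedy responses for the same prompts, shape (batch,)
    return sample_rewards - greedy_rewards

adv = remax_advantage(torch.tensor([1.0, 0.0, 1.0]), torch.tensor([1.0, 1.0, 0.0]))
print(adv)  # tensor([ 0., -1.,  1.])
```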
Ziniu Li committed -
Add stronger verification support as is used in https://github.com/PRIME-RL/PRIME
- [x] Batched verification
- [x] Python interpreter
- [x] Stronger math verifier
- [x] Continuous score for code test

Re-opening https://github.com/volcengine/verl/pull/207 to trigger automatic workflows
Zefan Wang committed -
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed -
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed
-
- 09 Feb, 2025 7 commits
-
-
- As titled
Guangming Sheng committed -
Closes #227
Pan Yinxu committed -
## Motivation
Often the summary of average/max/min reward is not enough information, and it's helpful to look at some real-world generations to see how the model's actual behavior is changing over time. This can be particularly helpful for debugging issues like the generation being cut off before reasoning finishes.

## Change
This PR introduces a new `trainer.val_generations_to_log_to_wandb` config value, with a default of 0. If set to a number larger than 0, it logs that number of inputs/outputs/scores each time the validation set is generated and scored. It uses a [wandb Table](https://docs.wandb.ai/guides/track/log/log-tables/) to do so, adding a single row for each validation set run. I chose to log the data in this format because it allows a user to easily see how the outputs for a given input change over time by looking down a column vertically.

## Screenshot
<img width="1106" alt="Screenshot 2025-01-31 at 8 02 47 AM" src="https://github.com/user-attachments/assets/f2ec0079-8464-4735-ad63-d71f349f4332" />

Note: if there's already another way to accomplish this easily, let me know! I was surprised not to find a way to see sample generations because I find that quite useful, so let me know if I'm missing something.
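A generic sketch of logging generations with a wandb Table (illustrative; the column names, metric key, and one-row-per-sample layout are assumptions, not necessarily this PR's exact format):

```python
import wandb

def log_val_generations(inputs, outputs, scores, step):
    # Assumes wandb.init() has already been called by the trainer.
    table = wandb.Table(columns=["input", "output", "score"])
    for inp, out, score in zip(inputs, outputs, scores):
        table.add_data(inp, out, score)
    wandb.log({"val/generations": table}, step=step)
```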
Kyle Corbitt committed -
- As titled
Guangming Sheng committed -
add requirements to make some CI tests work
Zefan Wang committed -
HL committed
-
We have implemented the REINFORCE++ algorithm. To use it, specify the parameter `algorithm.adv_estimator=reinforce_plus_plus`. Preliminary performance evaluations were conducted within the [Unakar/Logic-RL](https://github.com/Unakar/Logic-RL) project, a reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset. Results indicate that our REINFORCE++ implementation exhibits performance and training stability comparable to, or potentially exceeding, that of PPO and GRPO. Related issue: #68
4332001876 committed
-
- 08 Feb, 2025 3 commits
-
-
**Features:**
- Save actor and critic checkpoints:
  - Model
  - Optimizer
  - lr_scheduler
  - rng_state
  - dataloader
- A complete checkpoint means that the dataloader, actor, and critic (if any) states are properly saved
- By default, we will not save the dataset but only store the dataloader (with sampler) state

**Usage:**
- Supported resume modes: auto, disable, and resume_from_path
  - auto: veRL will automatically check for the latest checkpoint in `trainer.default_local_dir` (see the sketch below)
  - disable: veRL will always train from scratch
  - resume_from_path: when setting `resume_from_path=True`, the user only needs to set the resume mode to the checkpoint path that they want to load

**TODO:**
- Support SFT resume in the next PR
- Support uploader

**Relevant issues:**
- https://github.com/volcengine/verl/issues/76
- https://github.com/volcengine/verl/issues/143
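A hypothetical sketch of what the `auto` resume mode has to do: pick the checkpoint directory with the highest global step under `trainer.default_local_dir`. The `global_step_N` directory naming is an assumption for illustration, not necessarily verl's actual layout.

```python
import os
import re
from typing import Optional

def find_latest_checkpoint(default_local_dir: str) -> Optional[str]:
    # Collect (step, path) pairs for directories that look like checkpoints.
    candidates = []
    for name in os.listdir(default_local_dir):
        match = re.fullmatch(r"global_step_(\d+)", name)
        if match:
            candidates.append((int(match.group(1)), os.path.join(default_local_dir, name)))
    # Return the path with the largest step, or None if nothing was found.
    return max(candidates)[1] if candidates else None
```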
Guangming Sheng committed -
Fix typos in the bash SFT tips.

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed -
The existing `logprobs_from_logits_v2` doesn't achieve the memory savings it claims. This is because `logsumexp` still allocates a `bs*seqlen*vocab` tensor internally to hold the element-wise application of `exp`. However, by applying a loop over `logsumexp`, we can compute the logsumexp outputs iteratively. Benchmarks show this uses significantly less memory to compute logprobs. A fix is provided, as well as a separate memory-efficient approach for the bfloat16 case.
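A rough sketch of the idea (a hypothetical helper, not the PR's exact code, and assuming a flattened 2-D layout): apply `logsumexp` over row chunks so only a chunk-sized temporary is materialized instead of the full `bs*seqlen x vocab` intermediate.

```python
import torch

def logprobs_from_logits_chunked(logits: torch.Tensor, labels: torch.Tensor,
                                 chunk_size: int = 1024) -> torch.Tensor:
    # logits: (N, vocab), labels: (N,) where N = batch_size * seq_len
    token_logits = torch.gather(logits, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    logsumexp_vals = torch.empty_like(token_logits)
    # Looping keeps the exp() temporary at chunk_size x vocab instead of N x vocab.
    for start in range(0, logits.shape[0], chunk_size):
        chunk = logits[start:start + chunk_size]
        logsumexp_vals[start:start + chunk_size] = torch.logsumexp(chunk, dim=-1)
    # log p(label) = logit_label - logsumexp(logits)
    return token_logits - logsumexp_vals
```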
Tyler Romero committed
-
- 07 Feb, 2025 3 commits
-
-
[TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218)

---

### Pull Request Description

This PR introduces **SwanLab**, a lightweight open-source experiment tracking tool, as a new logging option for the training framework. The integration provides both online and offline tracking capabilities, along with a local dashboard for visualizing results. Below is a detailed overview of the changes and usage instructions:

---

#### **Key Features of SwanLab Integration**

1. **Online and Offline Tracking**:
   - **Online Mode**: Track experiments remotely and store data on SwanLab's cloud platform.
   - **Offline Mode**: Use a local dashboard to visualize training logs without an internet connection.
2. **Hardware Monitoring**:
   - Automatically tracks GPU usage, power consumption, temperature, and other hardware metrics.
   - Supports NVIDIA GPUs and Huawei Ascend NPUs.
3. **Remote Access**:
   - View training progress remotely via the SwanLab web interface or mobile app.
4. **Local Dashboard**:
   - Includes an open-source local dashboard for offline visualization of training logs.

---

#### **Usage Instructions**

##### **Step 1: Set Up Online Tracking (Optional)**

To use SwanLab's online tracking, log in to the [SwanLab website](https://swanlab.cn) and obtain your API key from the [Settings page](https://swanlab.cn/space/~/settings). Then, authenticate using the following command:

```bash
swanlab login
```

If you prefer offline mode, skip this step.

---

##### **Step 2: Configure SwanLab as the Logger**

To enable SwanLab as the experiment tracker, add `trainer.logger=['swanlab']` to your training command. For example, using the [Post-train a LLM using PPO with GSM8K dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html) workflow:

```bash
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.val_batch_size=1312 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console','swanlab'] \
 +trainer.val_before_train=False \
 trainer.default_hdfs_dir=null \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
```

If you are not logged in, you will be prompted to choose a tracking mode:

1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them locally.
3. **Local Mode**: Save logs locally for offline tracking.

<img width="1325" alt="select" src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29" />

Alternatively, you can configure SwanLab using environment variables:

```bash
export SWANLAB_API_KEY=<your_api_key>   # Set API key for online tracking
export SWANLAB_LOG_DIR=<local_log_path> # Set local log directory
export SWANLAB_MODE=<mode>              # Set tracking mode: cloud (default), cloud-only, local, or disabled
```

---

##### **Step 3: View Training Logs**

After logging in, you will see a confirmation message:

<img width="1415" alt="track" src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56" />

- **Online Tracking**: View logs on the [SwanLab website](https://swanlab.cn).

  <img width="1900" alt="remote" src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7" />

  For more details, refer to the [SwanLab Cloud Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).

- **Offline Tracking**: Use the local dashboard to visualize logs:

  ```bash
  swanlab watch
  ```

  For advanced configurations, such as setting a custom port, refer to the [Offline Dashboard Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html) and [CLI Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).

---

#### **Impact**

- Provides a lightweight, flexible, and user-friendly experiment tracking solution.
- Supports both online and offline use cases, making it suitable for environments with restricted internet access.
- Enhances hardware monitoring capabilities for better resource utilization.

---

This PR is ready for review. Feedback and suggestions are welcome!
Shaohon Chen committed -
This PR addresses issue https://github.com/volcengine/verl/issues/212. The changes include:
- Read `eos_token_id` from `generation_config` to ensure alignment with vLLM
- Modify the `get_eos_mask` function to accept both int and list types for the `eos_token` parameter (see the sketch below)
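A hypothetical sketch of accepting either an int or a list for an EOS check; this is not the actual `get_eos_mask` implementation, whose masking semantics are not shown here.

```python
import torch

def eos_positions(response_ids: torch.Tensor, eos_token) -> torch.Tensor:
    # Accept either a single token id or a list of ids.
    if isinstance(eos_token, int):
        eos_token = [eos_token]
    is_eos = torch.zeros_like(response_ids, dtype=torch.bool)
    for tok in eos_token:
        is_eos |= response_ids.eq(tok)
    return is_eos
```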
Kinman Lei committed -
- Support FSDPCheckpointManager
- Support hdfs_io import if installed
- Add CI for FSDPCheckpointManager

TODO:
- Will integrate in the next PR
Guangming Sheng committed
-
- 06 Feb, 2025 1 commit
-
-
Install the scorecard workflow
Willem Jiang committed
-