- 12 Feb, 2025 2 commits
- 11 Feb, 2025 2 commits
-
-
HL committed
-
Fix the original Ray controller method `execute_rank_zero_sync()`, which was not functional.
ExtremeViscent committed
-
- 10 Feb, 2025 5 commits
-
-
Give `main_task` `num_cpus=1` and make sure that `main_task` is not scheduled on the head node
Chi Zhang committed -
## Description

Added [ReMax](https://arxiv.org/abs/2310.10505) support to verl. ReMax is a simple, efficient, and stable RL algorithm customized for LLM training, with theoretical guarantees for variance reduction. The [HybridFlow](https://arxiv.org/pdf/2409.19256v2) paper experimented with ReMax, but verl did not provide an implementation, so this PR adds one.

## Changes

- Added RayReMaxTrainer implementation
- Added example scripts for ReMax training
- Added documentation for ReMax algorithm

## Testing

- Tested ReMax example scripts with Qwen models

Validation reward when optimizing Qwen2.5-3B-Instruct on the GSM8K dataset:

<img width="501" alt="Screenshot 2025-02-09 20 51 14" src="https://github.com/user-attachments/assets/742c2eab-6877-4c3c-b0a2-4159bd109add" />

The curve demonstrates the effectiveness of ReMax, though its performance can be further enhanced through hyperparameter fine-tuning.

## Documentation

- Added ReMax documentation
- Updated example configurations

## Checklist

- [x] Code follows project's style guidelines (yapf formatted)
- [x] Tests added/updated and passing
- [x] Documentation updated
- [x] Example scripts added
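As a rough illustration of the variance-reduction idea (this is not code from the PR): the linked ReMax paper subtracts the reward of a greedy-decoded response from the reward of the sampled response for the same prompt. The function name and the list-based inputs below are hypothetical.

```python
from typing import List


def remax_advantages(sampled_rewards: List[float],
                     greedy_rewards: List[float]) -> List[float]:
    """Per-prompt advantage: sampled-response reward minus greedy-baseline reward.

    Hypothetical helper for illustration; not verl's RayReMaxTrainer API.
    """
    if len(sampled_rewards) != len(greedy_rewards):
        raise ValueError("need one greedy baseline reward per sampled response")
    return [r - b for r, b in zip(sampled_rewards, greedy_rewards)]


# Three prompts: rewards of the sampled responses and of the greedy responses.
advantages = remax_advantages([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])
print(advantages)
```

A sample that merely matches the greedy response gets zero advantage, so only responses that beat or trail the baseline contribute to the gradient.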
Ziniu Li committed -
Add stronger verification support, as used in https://github.com/PRIME-RL/PRIME

- [x] Batched verification
- [x] Python interpreter
- [x] Stronger math verifier
- [x] Continuous score for code test

Re-opening https://github.com/volcengine/verl/pull/207 to trigger automatic workflows
Zefan Wang committed -
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed -
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed
-
- 09 Feb, 2025 7 commits
-
-
- As titled
Guangming Sheng committed -
Closes #227
Pan Yinxu committed -
## Motivation

Often the summary of average/max/min reward is not enough information, and it's helpful to look at some real-world generations to see how the model's actual behavior is changing over time. This can be particularly helpful for debugging issues like the generation being cut off before reasoning finishes.

## Change

This PR introduces a new `trainer.val_generations_to_log_to_wandb` config value, with a default of 0. If set to a number larger than 0, it logs that number of inputs/outputs/scores each time the validation set is generated and scored. It uses a [wandb Table](https://docs.wandb.ai/guides/track/log/log-tables/) to do so, adding a single row for each validation set run. I chose to log the data in this format because it allows a user to easily see how the outputs for a given input change over time by looking down a column vertically.

## Screenshot

<img width="1106" alt="Screenshot 2025-01-31 at 8 02 47 AM" src="https://github.com/user-attachments/assets/f2ec0079-8464-4735-ad63-d71f349f4332" />

Note: if there's already another way to accomplish this easily, let me know! I was surprised not to find a way to see sample generations because I find that quite useful, so let me know if I'm missing something.
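The "one row per validation run" layout described above can be sketched in plain Python. The column names and helper functions here are assumptions made for illustration, not the PR's actual code; in practice the resulting columns and rows would be handed to a wandb Table.

```python
from typing import List, Tuple

Sample = Tuple[str, str, float]  # (input, output, score)


def table_columns(n: int) -> List[str]:
    """Column headers: the step, then input/output/score for each logged sample."""
    cols = ["step"]
    for i in range(1, n + 1):
        cols += [f"input_{i}", f"output_{i}", f"score_{i}"]
    return cols


def table_row(step: int, samples: List[Sample], n: int) -> list:
    """One row per validation run; assumes at least n samples are available."""
    row: list = [step]
    for inp, out, score in samples[:n]:
        row += [inp, out, score]
    return row


samples = [("2+2?", "4", 1.0), ("3*3?", "6", 0.0), ("1+1?", "2", 1.0)]
columns = table_columns(2)
row = table_row(10, samples, 2)
print(columns)
print(row)
```

Because each validation run appends one row, reading down the `output_1` column shows how the answer to the same input changes across steps, provided the same samples are selected each time.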
Kyle Corbitt committed -
- As titled
Guangming Sheng committed -
add requirements to make some CI tests work
Zefan Wang committed -
HL committed
-
We have implemented the REINFORCE++ algorithm. To use it, specify the parameter `algorithm.adv_estimator=reinforce_plus_plus`. Preliminary performance evaluations were conducted within the [Unakar/Logic-RL](https://github.com/Unakar/Logic-RL) project, a reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset. Results indicate that our REINFORCE++ implementation exhibits performance and training stability comparable to, or potentially exceeding, that of PPO and GRPO. Related issue: #68
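For readers unfamiliar with the estimator, here is a minimal sketch of one common reading of REINFORCE++: compute a discounted return-to-go for every token, then normalize those returns across all tokens in the batch. This is an assumption for illustration and is not taken from the commit; the function name, the list-based inputs, and the defaults are hypothetical.

```python
import math
from typing import List


def reinforce_pp_advantages(token_rewards: List[List[float]],
                            gamma: float = 1.0,
                            eps: float = 1e-8) -> List[List[float]]:
    """Return-to-go per token, whitened over every token in the batch."""
    returns = []
    for rewards in token_rewards:
        running = 0.0
        seq = []
        # Walk backwards so each token accumulates its discounted future reward.
        for r in reversed(rewards):
            running = r + gamma * running
            seq.append(running)
        returns.append(seq[::-1])

    # Batch-wide mean and standard deviation (no per-prompt grouping).
    flat = [x for seq in returns for x in seq]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat))
    return [[(x - mean) / (std + eps) for x in seq] for seq in returns]


# Two responses: the first earns a reward of 1 on its last token, the second earns 0.
advantages = reinforce_pp_advantages([[0.0, 0.0, 1.0], [0.0, 0.0]])
print(advantages)
```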
4332001876 committed
-
- 08 Feb, 2025 3 commits
-
-
**Features:**

- Save actor and critic checkpoints:
  - Model
  - Optimizer
  - lr_scheduler
  - rng_state
  - dataloader
- A checkpoint is complete only when the dataloader, actor, and critic (if any) states have all been saved
- By default, the dataset itself is not saved; only the dataloader (with sampler) state is stored

**Usage:**

- Support three resume modes: auto, disable and resume_from_path
  - auto: veRL automatically looks for the latest checkpoint in `trainer.default_local_dir`
  - disable: veRL always trains from scratch
  - resume_from_path: when `resume_from_path`=True, the user only needs to set resume_mode to the path of the checkpoint to load

**TODO:**

- Support SFT resume in the next PR
- Support uploader

**Relevant issue:**

- https://github.com/volcengine/verl/issues/76
- https://github.com/volcengine/verl/issues/143
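The three resume modes can be sketched as a small resolver. Everything below is a hypothetical illustration: the `global_step_<N>` directory naming, the function names, and the separate `explicit_path` argument are assumptions, not the PR's implementation.

```python
import os
import re
import tempfile
from typing import Optional


def latest_checkpoint(local_dir: str) -> Optional[str]:
    """Return the checkpoint directory with the highest step number, if any."""
    pattern = re.compile(r"^global_step_(\d+)$")  # assumed naming scheme
    best_step, best_path = -1, None
    if not os.path.isdir(local_dir):
        return None
    for name in os.listdir(local_dir):
        match = pattern.match(name)
        path = os.path.join(local_dir, name)
        # Compare steps numerically so that step 20 beats step 5.
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def resolve_resume(mode: str, local_dir: str,
                   explicit_path: Optional[str] = None) -> Optional[str]:
    """Map a resume mode to the checkpoint path to load (None = train from scratch)."""
    if mode == "disable":
        return None
    if mode == "auto":
        return latest_checkpoint(local_dir)
    if mode == "resume_from_path":
        if not explicit_path:
            raise ValueError("resume_from_path requires a checkpoint path")
        return explicit_path
    raise ValueError(f"unknown resume mode: {mode}")


with tempfile.TemporaryDirectory() as tmp:
    for step in (5, 20):
        os.makedirs(os.path.join(tmp, f"global_step_{step}"))
    auto_choice = os.path.basename(resolve_resume("auto", tmp))
    scratch_choice = resolve_resume("disable", tmp)

print(auto_choice, scratch_choice)
```

Returning `None` when `auto` finds nothing lets the caller fall back to training from scratch without a separate code path.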
Guangming Sheng committed -
Fix typos in the tips of the SFT bash scripts. Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed -
The existing `logprobs_from_logits_v2` doesn't achieve the memory savings it claims. This is because `logsumexp` still allocates a `bs*seqlen*vocab` tensor internally to hold the element-wise application of `exp`. However, by looping over the rows and calling `logsumexp` on each, we can compute the logsumexp outputs iteratively. Benchmarks show this uses significantly less memory to compute logprobs. This PR provides the fix, as well as a separate memory-efficient approach for the bfloat16 case.
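The row-by-row idea can be shown in plain Python, without tensors. It relies on the identity log_softmax(x)[y] = x[y] - logsumexp(x) and processes one row of logits at a time, so any temporary is only vocabulary-sized. This is a sketch with hypothetical function names, not the patched `logprobs_from_logits_v2`; a PyTorch version would loop over rows and call `torch.logsumexp` on each.

```python
import math
from typing import List


def logsumexp(row: List[float]) -> float:
    """Numerically stable log(sum(exp(row))) for a single row of logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def logprobs_of_labels(logits: List[List[float]], labels: List[int]) -> List[float]:
    """Log-probability of each label, computed one row at a time."""
    out = []
    for row, label in zip(logits, labels):
        # Only this row's exp() values exist at once, never the full batch.
        out.append(row[label] - logsumexp(row))
    return out


logits = [[0.0, 0.0], [math.log(1.0), math.log(3.0)]]
labels = [0, 1]
logprobs = logprobs_of_labels(logits, labels)
print(logprobs)
```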
Tyler Romero committed
-
- 07 Feb, 2025 3 commits
-
-
[TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218)

---

### Pull Request Description

This PR introduces **SwanLab**, a lightweight open-source experiment tracking tool, as a new logging option for the training framework. The integration provides both online and offline tracking capabilities, along with a local dashboard for visualizing results. Below is a detailed overview of the changes and usage instructions:

---

#### **Key Features of SwanLab Integration**

1. **Online and Offline Tracking**:
   - **Online Mode**: Track experiments remotely and store data on SwanLab's cloud platform.
   - **Offline Mode**: Use a local dashboard to visualize training logs without an internet connection.
2. **Hardware Monitoring**:
   - Automatically tracks GPU usage, power consumption, temperature, and other hardware metrics.
   - Supports NVIDIA GPUs and Huawei Ascend NPUs.
3. **Remote Access**:
   - View training progress remotely via the SwanLab web interface or mobile app.
4. **Local Dashboard**:
   - Includes an open-source local dashboard for offline visualization of training logs.

---

#### **Usage Instructions**

##### **Step 1: Set Up Online Tracking (Optional)**

To use SwanLab's online tracking, log in to the [SwanLab website](https://swanlab.cn) and obtain your API key from the [Settings page](https://swanlab.cn/space/~/settings). Then, authenticate using the following command:

```bash
swanlab login
```

If you prefer offline mode, skip this step.

---

##### **Step 2: Configure SwanLab as the Logger**

To enable SwanLab as the experiment tracker, add `trainer.logger=['swanlab']` to your training command. For example, using the [Post-train a LLM using PPO with GSM8K dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html) workflow:

```bash
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.val_batch_size=1312 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console','swanlab'] \
 +trainer.val_before_train=False \
 trainer.default_hdfs_dir=null \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
```

If you are not logged in, you will be prompted to choose a tracking mode:

1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them locally.
3. **Local Mode**: Save logs locally for offline tracking.

<img width="1325" alt="select" src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29" />

Alternatively, you can configure SwanLab using environment variables:

```bash
export SWANLAB_API_KEY=<your_api_key> # Set API key for online tracking
export SWANLAB_LOG_DIR=<local_log_path> # Set local log directory
export SWANLAB_MODE=<mode> # Set tracking mode: cloud (default), cloud-only, local, or disabled
```

---

##### **Step 3: View Training Logs**

After logging in, you will see a confirmation message:

<img width="1415" alt="track" src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56" />

- **Online Tracking**: View logs on the [SwanLab website](https://swanlab.cn).

  <img width="1900" alt="remote" src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7" />

  For more details, refer to the [SwanLab Cloud Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).

- **Offline Tracking**: Use the local dashboard to visualize logs:

  ```bash
  swanlab watch
  ```

  For advanced configurations, such as setting a custom port, refer to the [Offline Dashboard Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html) and [CLI Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).

---

#### **Impact**

- Provides a lightweight, flexible, and user-friendly experiment tracking solution.
- Supports both online and offline use cases, making it suitable for environments with restricted internet access.
- Enhances hardware monitoring capabilities for better resource utilization.

---

This PR is ready for review. Feedback and suggestions are welcome!
Shaohon Chen committed -
This PR addresses issue https://github.com/volcengine/verl/issues/212. The changes include:

- Read eos_token_id from generation_config to ensure alignment with vLLM
- Modify the get_eos_mask function to accept both int and list types for the eos_token parameter
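A minimal sketch of a mask function that accepts either a single EOS id or a list of ids is shown below. It works on plain Python lists rather than tensors, uses a hypothetical name, and assumes the mask keeps every token up to and including the first EOS; none of this is copied from verl's `get_eos_mask`.

```python
from typing import List, Union


def eos_mask_sketch(response_ids: List[List[int]],
                    eos_token: Union[int, List[int]]) -> List[List[int]]:
    """1 for tokens up to and including the first EOS, 0 afterwards."""
    # Normalize the argument so both int and list inputs are handled alike.
    eos_ids = {eos_token} if isinstance(eos_token, int) else set(eos_token)
    masks = []
    for seq in response_ids:
        mask, seen_eos = [], False
        for tok in seq:
            mask.append(0 if seen_eos else 1)
            if tok in eos_ids:
                seen_eos = True
        masks.append(mask)
    return masks


responses = [[5, 6, 2, 9], [7, 3, 8, 8]]
mask_multi = eos_mask_sketch(responses, [2, 3])  # several EOS ids
mask_single = eos_mask_sketch(responses, 2)      # one EOS id
print(mask_multi)
print(mask_single)
```

Accepting a list matters when a model's generation config declares more than one end-of-sequence id, since a rollout may stop on any of them.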
Kinman Lei committed -
- Support FSDPCheckpointManager
- Support hdfs_io import if installed
- Add CI for FSDPCheckpointManager

TODO:

- Will integrate in the next PR
Guangming Sheng committed
-
- 06 Feb, 2025 3 commits
-
-
Install the scorecard workflow
Willem Jiang committed -
Use a general-purpose LLM for the math task instead of a code LLM.

---------

Co-authored-by: Your Name <you@example.com>
HL committed -
HL committed
-
- 05 Feb, 2025 7 commits
-
-
- As titled
- Relevant: https://github.com/volcengine/verl/issues/181
Guangming Sheng committed -
- As titled
Guangming Sheng committed -
- Move config to a class method of `RayPPOTrainer`
- Fix config problem when adv_estimator=grpo
- Add GRPO e2e CI
Chi Zhang committed -
Chi Zhang committed
-
https://github.com/volcengine/verl/pull/182 Add an assert statement to make sure flash-attn>=2.4.3, the version from which cross_entropy_loss returns Tuple[losses, z_losses] 🤯
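A version guard of this kind can be sketched as follows. The helper names are hypothetical and the parsing is deliberately simple (leading numeric components only); it is not the assert added in the PR. In real code the installed version string would come from the package itself, which this sketch does not import.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '2.4.3.post1' -> (2, 4, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def check_min_version(installed: str, minimum: str = "2.4.3") -> None:
    """Fail loudly when the installed version is older than the minimum."""
    assert parse_version(installed) >= parse_version(minimum), (
        f"flash-attn>={minimum} is required, found {installed}"
    )


check_min_version("2.10.0")  # passes: tuples compare numerically, so 10 > 4
```

Comparing integer tuples instead of raw strings avoids treating "2.10.0" as older than "2.4.3".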
be betterest committed -
This PR is similar to PR https://github.com/volcengine/verl/pull/174 but fixes the critic save error. I moved the old PR to this one because it contained some redundant commits.
Wei Xiong committed -
sry missed this last one, should be it cc @vermouth1992 Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
-
- 04 Feb, 2025 3 commits
-
-
Runs always show "crashed" on my wandb, despite finishing successfully. "Crashed" indicates that wandb did not finish sending the "success" signal to the server, so the server believes the client was terminated unexpectedly. Furthermore, the wandb log is incomplete (the last lines are missing). This PR adds a call to `wandb.finish` when the Tracker is destructed (often when `trainer.fit` finishes) so that the signals are sent to the server and a data sync is performed.

Without this change:

<img width="526" alt="image" src="https://github.com/user-attachments/assets/869da24e-c5b8-415c-b15a-bb79c49f96ce" />

With this change:

<img width="548" alt="image" src="https://github.com/user-attachments/assets/16f0a40d-ea3b-48ed-93a4-f40ee01cb7c6" />
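The pattern of finishing the logging backend when the tracker is destroyed can be sketched with a stand-in backend. Both classes below are hypothetical; the fake backend only mimics the role that `wandb.finish` plays in the PR, and this is not verl's Tracker class.

```python
class FakeBackend:
    """Stand-in for a logging backend that must be told when a run ends."""

    def __init__(self) -> None:
        self.logged = []
        self.finished = False

    def log(self, data: dict, step: int) -> None:
        self.logged.append((step, data))

    def finish(self) -> None:
        # A real backend would flush pending data and mark the run successful here.
        self.finished = True


class TrackerSketch:
    """Forwards metrics to a backend and finishes it on destruction."""

    def __init__(self, backend: FakeBackend) -> None:
        self.backend = backend

    def log(self, data: dict, step: int) -> None:
        self.backend.log(data, step)

    def __del__(self) -> None:
        self.backend.finish()


backend = FakeBackend()
tracker = TrackerSketch(backend)
tracker.log({"reward": 0.5}, step=1)
del tracker  # in CPython the last reference going away runs __del__ immediately
print(backend.finished)
```

Python does not guarantee when `__del__` runs in general, so an explicit finish call or a context manager is a sturdier option where the call site allows it.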
Long(Tony) Lian committed -
Neil Chowdhury committed
-
Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
-
- 03 Feb, 2025 4 commits
-
-
This PR adds documentation for the LigerKernel option in a new performance tuning section, addressing the comment from volcengine/verl#173.

Changes:

- Created new performance tuning section in docs
- Documented LigerKernel option for SFT
- Added performance tuning section to documentation index

Related to volcengine/verl#173

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
Xingyao Wang committed -
runnning -> running
Ikko Eltociear Ashimine committed -
HL committed
-
HL committed
-
- 02 Feb, 2025 1 commit
-
-
Chujie Zheng committed
-