- 09 Feb, 2025 1 commit
-
-
We have implemented the REINFORCE++ algorithm. To use it, specify the parameter `algorithm.adv_estimator=reinforce_plus_plus`. Preliminary performance evaluations were conducted within the [Unakar/Logic-RL](https://github.com/Unakar/Logic-RL) project, a reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle Dataset. Results indicate that our REINFORCE++ implementation exhibits performance and training stability comparable to, or potentially exceeding, that of PPO and GRPO. Related issue: #68
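For illustration only (an assumed sketch, not the verl implementation), a REINFORCE++-style estimator can be viewed as discounted reward-to-go returns whitened across the whole batch, with no learned critic:

```python
import torch

def reinforce_plus_plus_advantage(token_level_rewards, response_mask, gamma=1.0, eps=1e-6):
    """Illustrative REINFORCE++-style advantage: discounted reward-to-go,
    whitened over the whole batch; no value function is used."""
    returns = torch.zeros_like(token_level_rewards)
    running = torch.zeros(
        token_level_rewards.shape[0],
        dtype=token_level_rewards.dtype,
        device=token_level_rewards.device,
    )
    # accumulate discounted reward-to-go from the last token backwards
    for t in reversed(range(token_level_rewards.shape[1])):
        running = token_level_rewards[:, t] + gamma * running
        returns[:, t] = running
    # global whitening over valid (non-padding) tokens
    valid = returns[response_mask.bool()]
    advantages = (returns - valid.mean()) / (valid.std() + eps)
    return advantages * response_mask
```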
4332001876 committed
-
- 08 Feb, 2025 3 commits
-
-
**Features:**
- Save actor and critic checkpoints:
  - Model
  - Optimizer
  - lr_scheduler
  - rng_state
  - dataloader
- A complete checkpoint means that the dataloader, actor, and critic (if any) states are properly saved.
- By default, we do not save the dataset but only store the dataloader (with sampler) state.

**Usage:**
- Supported resume modes: auto, disable, and resume_from_path.
  - auto: veRL automatically checks for the latest checkpoint under `trainer.default_local_dir`.
  - disable: veRL always trains from scratch.
  - resume_from_path: when `resume_from_path=True`, the user only needs to set `resume_mode` to the checkpoint path they want to load.

**TODO:**
- Support SFT resume in the next PR.
- Support uploader.

**Relevant issues:**
- https://github.com/volcengine/verl/issues/76
- https://github.com/volcengine/verl/issues/143
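A rough sketch of what a complete checkpoint bundles (illustrative only; the real checkpoint manager handles sharded and distributed state):

```python
import random

import numpy as np
import torch

def save_checkpoint(path, model, optimizer, lr_scheduler, dataloader):
    # Illustrative only: bundles everything listed above into one file.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "lr_scheduler": lr_scheduler.state_dict(),
            "rng_state": {
                "torch": torch.get_rng_state(),
                "numpy": np.random.get_state(),
                "python": random.getstate(),
            },
            # dataloader/sampler state only, not the dataset itself
            "dataloader": getattr(dataloader, "state_dict", lambda: {})(),
        },
        path,
    )
```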
Guangming Sheng committed -
Fix typos in the tips in the SFT bash scripts. Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed -
The existing `logprobs_from_logits_v2` doesn't achieve the memory savings it claims, because `logsumexp` still allocates a `bs*seqlen*vocab` tensor internally to hold the element-wise `exp`. However, by applying a loop over `logsumexp`, we can compute the logsumexp outputs iteratively. Benchmarks show this uses significantly less memory to compute logprobs. A fix is provided, as well as a separate memory-efficient approach for the bfloat16 case.
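A sketch of the looped approach (function name and chunk size are illustrative): iterate `torch.logsumexp` over slices so that only a chunk-sized intermediate is materialized at a time instead of the full `(bs, seqlen, vocab)` tensor.

```python
import torch

def logprobs_from_logits_chunked(logits, labels, chunk_size=1024):
    """logits: (bs, seqlen, vocab); labels: (bs, seqlen).
    Computes per-token log-probs without materializing a full
    (bs, seqlen, vocab) intermediate inside logsumexp."""
    token_logits = torch.gather(logits, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    logsumexp_chunks = [
        torch.logsumexp(chunk, dim=-1)
        for chunk in torch.split(logits, chunk_size, dim=1)  # loop over the sequence dim
    ]
    return token_logits - torch.cat(logsumexp_chunks, dim=1)
```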
Tyler Romero committed
-
- 07 Feb, 2025 3 commits
-
-
[TRACKING] feat: Integrate SwanLab for experiment tracking with online/offline mode and local dashboard support (#218)

### Pull Request Description

This PR introduces **SwanLab**, a lightweight open-source experiment tracking tool, as a new logging option for the training framework. The integration provides both online and offline tracking capabilities, along with a local dashboard for visualizing results. Below is a detailed overview of the changes and usage instructions.

#### Key Features of SwanLab Integration

1. **Online and Offline Tracking**:
   - **Online Mode**: Track experiments remotely and store data on SwanLab's cloud platform.
   - **Offline Mode**: Use a local dashboard to visualize training logs without an internet connection.
2. **Hardware Monitoring**:
   - Automatically tracks GPU usage, power consumption, temperature, and other hardware metrics.
   - Supports NVIDIA GPUs and Huawei Ascend NPUs.
3. **Remote Access**:
   - View training progress remotely via the SwanLab web interface or mobile app.
4. **Local Dashboard**:
   - Includes an open-source local dashboard for offline visualization of training logs.

#### Usage Instructions

##### Step 1: Set Up Online Tracking (Optional)

To use SwanLab's online tracking, log in to the [SwanLab website](https://swanlab.cn) and obtain your API key from the [Settings page](https://swanlab.cn/space/~/settings). Then, authenticate using the following command:

```bash
swanlab login
```

If you prefer offline mode, skip this step.

##### Step 2: Configure SwanLab as the Logger

To enable SwanLab as the experiment tracker, add `trainer.logger=['swanlab']` to your training command. For example, using the [Post-train a LLM using PPO with GSM8K dataset](https://verl.readthedocs.io/en/latest/start/quickstart.html) workflow:

```bash
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=256 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=256 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    critic.optim.lr=1e-5 \
    critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    critic.ppo_micro_batch_size_per_gpu=4 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.logger=['console','swanlab'] \
    +trainer.val_before_train=False \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 2>&1 | tee verl_demo.log
```

If you are not logged in, you will be prompted to choose a tracking mode:

1. **Cloud Mode**: Upload logs to SwanLab's cloud platform.
2. **Cloud-Only Mode**: Upload logs to the cloud but do not save them locally.
3. **Local Mode**: Save logs locally for offline tracking.
<img width="1325" alt="select" src="https://github.com/user-attachments/assets/5c55fc45-79a9-4673-ae4e-ea9d0623dd29" />

Alternatively, you can configure SwanLab using environment variables:

```bash
export SWANLAB_API_KEY=<your_api_key>   # Set API key for online tracking
export SWANLAB_LOG_DIR=<local_log_path> # Set local log directory
export SWANLAB_MODE=<mode>              # Set tracking mode: cloud (default), cloud-only, local, or disabled
```

##### Step 3: View Training Logs

After logging in, you will see a confirmation message:

<img width="1415" alt="track" src="https://github.com/user-attachments/assets/87c4ff2f-c8c4-4e7a-a41e-21afa935cb56" />

- **Online Tracking**: View logs on the [SwanLab website](https://swanlab.cn).
  <img width="1900" alt="remote" src="https://github.com/user-attachments/assets/5b44b9f3-948f-4f93-9873-572bce56daf7" />
  For more details, refer to the [SwanLab Cloud Documentation](https://docs.swanlab.cn/guide_cloud/experiment_track/view-result.html).
- **Offline Tracking**: Use the local dashboard to visualize logs:

  ```bash
  swanlab watch
  ```

  For advanced configurations, such as setting a custom port, refer to the [Offline Dashboard Documentation](https://docs.swanlab.cn/guide_cloud/self_host/offline-board.html) and [CLI Documentation](https://docs.swanlab.cn/api/cli-swanlab-watch.html#%E8%AE%BE%E7%BD%AEip%E5%92%8C%E7%AB%AF%E5%8F%A3%E5%8F%B7).

#### Impact

- Provides a lightweight, flexible, and user-friendly experiment tracking solution.
- Supports both online and offline use cases, making it suitable for environments with restricted internet access.
- Enhances hardware monitoring capabilities for better resource utilization.

This PR is ready for review. Feedback and suggestions are welcome!
Shaohon Chen committed -
This PR addresses issue https://github.com/volcengine/verl/issues/212. The changes include:
- Read `eos_token_id` from `generation_config` to ensure alignment with vLLM.
- Modify the `get_eos_mask` function to accept both int and list types for the `eos_token` parameter, as sketched below.
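A sketch of the int-or-list handling (illustrative, not the exact verl signature):

```python
import torch

def get_eos_mask(response_ids, eos_token, dtype=torch.int64):
    """Mask is 1 up to and including the first EOS token, 0 afterwards.
    `eos_token` may be a single id or a list of ids (illustrative sketch)."""
    if isinstance(eos_token, int):
        eos_token = [eos_token]
    is_eos = torch.zeros_like(response_ids, dtype=torch.bool)
    for tok in eos_token:
        is_eos |= response_ids.eq(tok)
    # zero out every position after the first EOS
    return (torch.cumsum(is_eos.long(), dim=-1) - is_eos.long()).eq(0).to(dtype)
```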
Kinman Lei committed -
- Support `FSDPCheckpointManager`
- Support `hdfs_io` import if installed (see the guard sketch below)
- Add CI for `FSDPCheckpointManager`

TODO:
- Will integrate in the next PR
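The optional `hdfs_io` import can be guarded roughly like this (the helper name is hypothetical):

```python
try:
    from verl.utils import hdfs_io  # optional; only needed when copying to HDFS
except ImportError:
    hdfs_io = None

def maybe_copy_to_hdfs(local_path, hdfs_path):
    # hypothetical helper: skip the upload when HDFS support is unavailable
    if hdfs_io is None:
        return
    hdfs_io.copy(local_path, hdfs_path)
```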
Guangming Sheng committed
-
- 06 Feb, 2025 3 commits
-
-
Install the scorecard workflow
Willem Jiang committed -
Use the general-purpose LLM for the math task instead of the code LLM.
HL committed -
HL committed
-
- 05 Feb, 2025 7 commits
-
-
- As titled - Relevant: https://github.com/volcengine/verl/issues/181
Guangming Sheng committed -
- As titled
Guangming Sheng committed -
- Move config to a class method of `RayPPOTrainer`
- Fix config problem when `adv_estimator=grpo`
- Add GRPO e2e CI
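For context, the GRPO estimator referenced here computes group-relative advantages; a minimal illustrative sketch (not the verl implementation) is:

```python
import torch

def grpo_outcome_advantage(scores, group_ids, eps=1e-6):
    """Illustrative group-relative advantage: whiten each response's scalar
    reward against the other responses sampled for the same prompt."""
    advantages = torch.zeros_like(scores)
    for g in group_ids.unique():
        mask = group_ids == g
        group = scores[mask]
        if group.numel() > 1:
            advantages[mask] = (group - group.mean()) / (group.std() + eps)
    return advantages
```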
Chi Zhang committed -
Chi Zhang committed
-
https://github.com/volcengine/verl/pull/182 adds an assert statement to make sure flash-attn>=2.4.3, where `cross_entropy_loss` returns `Tuple[losses, z_losses]` 🤯
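A guard along these lines (exact message and placement assumed) makes the requirement explicit:

```python
import flash_attn
from packaging.version import Version

# cross_entropy_loss returns (losses, z_losses) only from flash-attn 2.4.3 onwards
assert Version(flash_attn.__version__) >= Version("2.4.3"), (
    "flash-attn>=2.4.3 is required: cross_entropy_loss must return (losses, z_losses)"
)
```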
be betterest committed -
This PR is similar to https://github.com/volcengine/verl/pull/174 but fixes the critic save error. I moved the old PR to this one due to some redundant commits.
Wei Xiong committed -
Sorry, I missed this last one; this should be it. cc @vermouth1992 Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
-
- 04 Feb, 2025 3 commits
-
-
Runs always show "crashed" on my wandb dashboard despite finishing successfully. "Crashed" indicates that wandb did not finish sending the "success" signal to the server, so the server believes the client was terminated unexpectedly. Furthermore, the wandb log is incomplete (the last lines are missing). This PR adds a call to `wandb.finish` when the Tracker is destructed (often when `trainer.fit` finishes) so that the signal is sent to the server and a data sync is performed.

Without this change:
<img width="526" alt="image" src="https://github.com/user-attachments/assets/869da24e-c5b8-415c-b15a-bb79c49f96ce" />

With this change:
<img width="548" alt="image" src="https://github.com/user-attachments/assets/16f0a40d-ea3b-48ed-93a4-f40ee01cb7c6" />
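A minimal version of the fix (class and attribute names assumed) finishes the run explicitly when the tracker is torn down:

```python
import wandb

class Tracking:
    """Simplified tracker; only the wandb-related teardown is shown."""

    def __init__(self, project, experiment, config=None):
        self.run = wandb.init(project=project, name=experiment, config=config)

    def __del__(self):
        # Flush pending data and tell the server the run finished cleanly,
        # otherwise the dashboard may report the run as "crashed".
        if wandb.run is not None:
            wandb.finish(exit_code=0)
```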
Long(Tony) Lian committed -
Neil Chowdhury committed
-
Co-authored-by: Jayson Francis <jaysonfrancis@users.noreply.github.com>
jaysonfrancis committed
-
- 03 Feb, 2025 4 commits
-
-
This PR adds documentation for the LigerKernel option in a new performance tuning section, addressing the comment from volcengine/verl#173.

Changes:
- Created a new performance tuning section in the docs
- Documented the LigerKernel option for SFT
- Added the performance tuning section to the documentation index

Related to volcengine/verl#173

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
Xingyao Wang committed -
runnning -> running
Ikko Eltociear Ashimine committed -
HL committed
-
HL committed
-
- 02 Feb, 2025 1 commit
-
-
Chujie Zheng committed
-
- 01 Feb, 2025 2 commits
-
-
Since 'lighteval/MATH' is no longer available on Hugging Face.
HL committed -
- As titled
Guangming Sheng committed
-
- 31 Jan, 2025 4 commits
-
-
HL committed
-
Xingyao Wang committed
-
Chujie Zheng committed
-
 --------- Co-authored-by: HL <linhaibin.eric@gmail.com>
dignfei committed
-
- 30 Jan, 2025 8 commits
-
-
HL committed
-
This is a follow-up to https://github.com/volcengine/verl/issues/151

## Motivation

Currently, in order to add a custom score function you need to fork verl and update `_select_rm_score_fn` to define your logic. This makes it harder to use verl as part of a larger application while staying up to date with upstream improvements in verl. It would be convenient to allow end users to directly pass in a reward function they wish to use, without requiring them to clone/fork verl to do so.

## Design

In this PR I slightly modify `main_ppo.py` to allow users to import a new function, `run_ppo`. `run_ppo` behaves very similarly to the existing `main`, with the important addition of a new `compute_score` argument. This argument, if passed in, is used to compute the score of every generation; this is the change that allows custom reward functions without forking. The `compute_score` function is similar in shape to the existing `compute_score` on gsm8k and math. However, I have added a new `data_source` parameter so that the user can compute the score differently, if desired, depending on the task shape.

## Example Usage

This is a sample script showing how to use the new functionality. I have tested that this works.

```python
from omegaconf import OmegaConf

from verl.trainer.main_ppo import run_ppo


def custom_compute_score(data_source, solution_str, ground_truth):
    """Dummy compute_score function that rewards the model for
    generations of exactly 20 characters :)"""
    # negative distance from 20 characters, so the reward is maximal at length 20
    return -abs(len(solution_str) - 20)


config = OmegaConf.load("vendor/verl/verl/trainer/config/ppo_trainer.yaml")

# Update config as needed
config.data.train_files = "path/to/train.parquet"
config.data.val_files = "path/to/test.parquet"
# ...

run_ppo(config, custom_compute_score)
```

## Breaking changes

There are no breaking changes in this PR. It is still possible to call `python -m verl.trainer.main_ppo ...` as before (although if you want to pass in a custom `compute_score` you will need to use the new method described above).

## Possible future work

It would be great to move to [structured configs](https://omegaconf.readthedocs.io/en/2.1_branch/structured_config.html) as well, since they'd allow us to have typesafe, autocompletable configurations from Python. I thought about adding those changes here too, but they would be much more extensive and I'm not sure whether there's interest from the project.
Kyle Corbitt committed -
Franz Srambical committed
-
Franz Srambical committed
-
## Summary

This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance` to initialize an FSDP worker model.

## Main Changes

1. Add an option to use `liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model from pretrained, instead of the default `transformers.AutoModelForCausalLM` (see the sketch below).
2. Add a test case using the configuration file `tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`.

## Related Issue

#96

## TODO

#97: optimize the memory usage when computing entropy & log_probs https://github.com/volcengine/verl/blob/6d96fda3d47f057caaa8f494ca7804181903e911/verl/workers/actor/dp_actor.py#L94-L106

Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
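A sketch of the new loading path (the flag and model name below are just examples, not the actual verl config keys):

```python
from liger_kernel.transformers import AutoLigerKernelForCausalLM
from transformers import AutoModelForCausalLM

use_liger = True  # hypothetical flag; in verl this would come from the model config

# Pick the Liger-patched loader when enabled, otherwise fall back to transformers
model_cls = AutoLigerKernelForCausalLM if use_liger else AutoModelForCausalLM
model = model_cls.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
```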
Hongpeng Guo committed -
The logits are of shape `(bsz, response_length, vocab_size)`. This PR doesn't change any code execution, but it explicitly shows the logits shape, making the code easier for readers to understand. Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Hongpeng Guo committed -
Add contribution guide
Chi Zhang committed -
Chi Zhang committed
-
- 29 Jan, 2025 1 commit
-
-
`token_level_rewards == (token_level_rewards * non_zero_mask)`
Franz Srambical committed
-