- 28 Feb, 2025 1 commit
Willem Jiang committed
- 27 Feb, 2025 6 commits
Add TensorBoard to the tracking backends. Users can set the TENSORBOARD_DIR environment variable to specify the TensorBoard log path.
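For context, a minimal sketch of how a TensorBoard backend can resolve the log path from that environment variable, using the standard `torch.utils.tensorboard` API; the fallback directory name and metric tag below are illustrative, not verl's actual defaults:

```python
import os
from torch.utils.tensorboard import SummaryWriter

# Resolve the log directory from TENSORBOARD_DIR, falling back to a local default.
log_dir = os.environ.get("TENSORBOARD_DIR", "tensorboard_log")
writer = SummaryWriter(log_dir=log_dir)

# Log scalar metrics keyed by global step, then close the writer.
writer.add_scalar("train/reward_mean", 0.42, global_step=1)
writer.close()
```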
Hongji Zhu committed
Chi Zhang committed
The current training script uses the same file for both training and evaluation, which appears to be a mistake.
yaguang committed
[ckpt] replace DataLoader with StatefulDataLoader to support resume training for SequentialSampler (#389) Tries to resolve this [issue](https://github.com/volcengine/verl/issues/356). As suggested in the issue discussion, the default DataLoader is replaced with StatefulDataLoader, which provides state_dict and load_state_dict methods that support resuming the iterator position after mid-epoch checkpointing.
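A minimal sketch of the save/resume pattern StatefulDataLoader enables, using `torchdata.stateful_dataloader`; the toy dataset and the way the state is stored are illustrative only, not verl's checkpointing code:

```python
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = list(range(100))  # toy map-style dataset
loader = StatefulDataLoader(dataset, batch_size=8, shuffle=False)

# Consume part of an epoch, then capture the loader state alongside the checkpoint.
it = iter(loader)
first_batch = next(it)
loader_state = loader.state_dict()

# On resume, load the saved state into a fresh loader; iteration continues
# from the recorded position instead of restarting the epoch.
resumed_loader = StatefulDataLoader(dataset, batch_size=8, shuffle=False)
resumed_loader.load_state_dict(loader_state)
second_batch = next(iter(resumed_loader))  # tensor([ 8,  9, ..., 15])
```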
alexchiu committed
Thanks: @HillZhang1999 - Related issue: https://github.com/volcengine/verl/issues/189 `(main_task pid=3523385) ValueError: max_num_batched_tokens (8192) is smaller than max_model_len (9216). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len.` When enable_chunked_prefill is activated, this issue is concealed. Please increase `max_num_batched_tokens` or decrease `max_model_len`.
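For reference, a hedged sketch of how the two limits interact when constructing a vLLM engine directly; the argument names follow vLLM's engine arguments as commonly documented, the model name is a placeholder, and verl normally sets these knobs through its rollout config rather than this API:

```python
from vllm import LLM

# Without chunked prefill, max_num_batched_tokens must be >= max_model_len;
# otherwise vLLM raises the ValueError quoted above.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",      # placeholder model
    max_model_len=9216,
    max_num_batched_tokens=9216,         # raise this to at least max_model_len
    enable_chunked_prefill=False,
)
```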
Guangming Sheng committed
Chi Zhang committed
- 26 Feb, 2025 2 commits
apis: add data proto to documentation page. use copy_to_local instead of copy_local_path_from_hdfs (#358)
HL committed
- As titled
Guangming Sheng committed
- 25 Feb, 2025 4 commits
See issue: https://github.com/volcengine/verl/issues/342
Mingjie Liu committed
#369 --------- Co-authored-by: Thom <zhangyi@zhangyideMacBook-Pro.local>
_T_L_R_ committed
kriswang committed
Chi Zhang committed
- 24 Feb, 2025 5 commits
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed
BearBiscuit committed
close #312 Add support for ulysses sp for transformers >= 4.48. I've tested transformers 4.45.0, 4.46.0, 4.47.0, 4.48.0 and 4.49.0, using sp=2 with the following script in my local env:
```bash
#!/bin/bash
set -ex

VERSIONS=("4.45.0" "4.46.0" "4.47.0" "4.48.0" "4.49.0")

for version in "${VERSIONS[@]}"; do
    echo "Testing with Transformers version ${version}"
    echo "----------------------------------------"

    pip install "transformers==${version}"
    PYTHONPATH=./ torchrun --nproc_per_node=2 tests/model/test_transformers_ulysses.py

    echo "----------------------------------------"
    echo "Completed testing for version ${version}"
    echo ""
done
```
zhou fan committed
fix the issue [#331](https://github.com/volcengine/verl/issues/331)
BearBiscuit committed
Validation datasets are sent to the inference engines as a whole batch, and the engines schedule memory themselves.
- [x] Remove `val_batch_size` from examples
- [x] Set default values of `val_batch_size` in configs as `null` and add DEPRECATED comments
- [x] Add deprecation warnings about `val_batch_size` in `_validate_config`
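A minimal sketch of what such a deprecation check could look like; this `_validate_config` is a hypothetical dict-based stand-in, not verl's actual implementation:

```python
import warnings

def _validate_config(config: dict) -> None:
    # Warn if the deprecated key is still set to a concrete value.
    if config.get("val_batch_size") is not None:
        warnings.warn(
            "val_batch_size is deprecated: validation data is sent to the "
            "inference engine as a whole batch.",
            DeprecationWarning,
        )

_validate_config({"val_batch_size": 32})  # emits a DeprecationWarning
```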
Shawn/Yuxuan Tong committed
- 23 Feb, 2025 2 commits
Tracking backend support vemlp wandb --------- Co-authored-by: liudayuan.carrot <liudayuan.carrot@bytedance.com>
liudayuan-carrot committed
Ikko Eltociear Ashimine committed
- 22 Feb, 2025 2 commits
HL committed
Implement RLOO algorithm according to https://arxiv.org/abs/2402.14740
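A minimal sketch of the leave-one-out baseline at the core of RLOO, where each of the k sampled responses for a prompt is compared against the mean reward of the other k-1 samples; this is illustrative only, not verl's implementation:

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, k) scalar reward for each of k sampled responses."""
    k = rewards.size(-1)
    # Leave-one-out baseline: the mean reward of the other k-1 samples.
    baseline = (rewards.sum(dim=-1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline

print(rloo_advantages(torch.tensor([[1.0, 0.0, 0.5, 0.5]])))
# tensor([[ 0.6667, -0.6667,  0.0000,  0.0000]])
```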
Zefan Wang committed
- 21 Feb, 2025 2 commits
HL committed
This PR adds Ray Serve to the requirements to enable support for multi-node training. It addresses the issue described here: https://github.com/volcengine/verl/issues/87#issuecomment-2659493418 Co-authored-by: Yu Feng <fengyufengyu@didiglobal.com>
Yu Feng committed
- 20 Feb, 2025 1 commit
HL committed
- 19 Feb, 2025 7 commits
Support Qwen2 Megatron backend. The code is primarily adapted from the llama folder, with modifications to use QKV bias and remove the rope_scaling of RoPE in `verl/models/qwen2/megatron/layers/parallel_attention.py`.
- Training Qwen2-7B-Instruct with PPO, the GSM8k score reaches 0.87 at step 75.
- The saver is not supported yet.
Kinman Lei committed
Chi Zhang committed
Willem Jiang committed
A working Slurm example adapted from https://docs.ray.io/en/latest/ray-core/starting-ray.html
Chenhui Zhang committed
HL committed
Willem Jiang committed
We need to specify the minimum permission in the workflow.
Willem Jiang committed
- 18 Feb, 2025 3 commits
- 17 Feb, 2025 3 commits
1. Fix wrong notes description. 2. Fix wrong code path. Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
湛露先生 committed
Fixed FSDP1 model offload.
- With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, GPU memory utilization can be increased to 0.9.
- With actor, critic, and reference offload all enabled, only one model copy resides in GPU memory at a time, so `micro_batch_size_per_gpu` or `max_token_per_gpu` can be increased further.

**Specifically:**
- During rollout, only the rollout model and the KVCache are in GPU memory.
- While the critic computes values, only the critic model stays in GPU memory; its optimizer and other model states are in CPU main memory.
- During the actor update, the actor model and its optimizer are stored on the GPU, while the reference model, the critic model, and the critic optimizer are offloaded to CPU.
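For background only, a short sketch of parameter offloading with the stock PyTorch FSDP `CPUOffload` option; it illustrates the general idea of keeping parameters in CPU memory between uses and is not the mechanism verl uses for `param_offload`. It assumes a GPU machine and a `torchrun` launch:

```python
# Launch with: torchrun --nproc_per_node=1 fsdp_offload_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Linear(1024, 1024).cuda()
# offload_params=True keeps the sharded parameters (and gradients) in CPU memory,
# moving them to the GPU only while they are needed for compute.
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))

out = fsdp_model(torch.randn(8, 1024, device="cuda"))
out.sum().backward()
dist.destroy_process_group()
```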
Guangming Sheng committed
### **Enhancement: Support for `extra_info` in Reward Calculation**

#### **Summary**
This update enhances the reward computation process by introducing an additional `extra_info` parameter. This allows users to pass in more contextual information when calculating rewards, improving flexibility for different datasets.

#### **Changes Made**
- **Updated `_default_compute_score`** to accept an `extra_info` argument:
```python
def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
```
- **Modified the reward manager (`naive.py`)** to pass `extra_info` from `data_item.non_tensor_batch` to `compute_score`:
```python
extra_info = data_item.non_tensor_batch['extra_info']
score = self.compute_score(
    data_source=data_source,
    solution_str=sequences_str,
    ground_truth=ground_truth,
    extra_info=extra_info,
)
```

#### **Why This Change?**
- Some datasets require additional context beyond `data_source`, `solution_str`, and `ground_truth` for accurate reward computation.
- The new `extra_info` field allows users to pass custom metadata, ideally in dictionary form, as specified in the [official documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
- This change maintains compatibility with existing dataset processing scripts, as they already include the `extra_info` field.

#### **Impact**
- **Improved flexibility**: Users can now pass additional contextual information, making reward computation more adaptable to different datasets.
- **Backward compatibility**: Since all example datasets already include `extra_info`, this update should integrate seamlessly.

Let me know if any modifications are needed!
Taiwei Shi committed
- 16 Feb, 2025 1 commit
HL committed
- 15 Feb, 2025 1 commit
The split placement example is outdated; I tried it and encountered some errors. To address this, the following changes were made in this PR:
1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to `examples/split_placement/config/ppo_trainer_split.yaml`
2. Copied the `RayPPOTrainer.fit` method into the `fit` function in `examples/split_placement/split_monkey_patch.py` and modified it to get the futures of `critic_output` and `actor_output`
zhou fan committed