Name |
Last commit
|
Last update |
---|---|---|
.github | ||
docker | ||
docs | ||
examples | ||
patches | ||
scripts | ||
tests | ||
verl | ||
.gitignore | ||
.readthedocs.yaml | ||
.style.yapf | ||
LICENSE | ||
Notice.txt | ||
README.md | ||
pyproject.toml | ||
requirements.txt | ||
setup.py |
[bugfix] Fix position embedding processing for Qwen2.5-VL In the `RLHFDataset.__getitem__` method, a bug was identified in how multimodal position IDs (3D in Qwen2.5-VL) are determined. Previously, the code checked for `self.image_key in row_dict` to decide whether to use multimodal position IDs. However, since `self.image_key` is popped from `row_dict` during image token expansion, this check incorrectly fails for subsequent operations. This causes the VL model to use incorrect position IDs, resulting in significant performance degradation. <img width="349" alt="image" src="https://github.com/user-attachments/assets/79790bbf-239e-4667-a2c5-d63d91d63165" /> The fix introduces an explicit `is_multi_modal` flag to properly track multimodal content throughout the processing pipeline. Co-authored-by: songyifan <songyifan3@xiaomi.com>
Name |
Last commit
|
Last update |
---|---|---|
.github | Loading commit data... | |
docker | Loading commit data... | |
docs | Loading commit data... | |
examples | Loading commit data... | |
patches | Loading commit data... | |
scripts | Loading commit data... | |
tests | Loading commit data... | |
verl | Loading commit data... | |
.gitignore | Loading commit data... | |
.readthedocs.yaml | Loading commit data... | |
.style.yapf | Loading commit data... | |
LICENSE | Loading commit data... | |
Notice.txt | Loading commit data... | |
README.md | Loading commit data... | |
pyproject.toml | Loading commit data... | |
requirements.txt | Loading commit data... | |
setup.py | Loading commit data... |