| Name |
Last commit
|
Last update |
|---|---|---|
| .github | ||
| docker | ||
| docs | ||
| examples | ||
| patches | ||
| scripts | ||
| tests | ||
| verl | ||
| .gitignore | ||
| .readthedocs.yaml | ||
| .style.yapf | ||
| LICENSE | ||
| Notice.txt | ||
| README.md | ||
| pyproject.toml | ||
| requirements.txt | ||
| setup.py |
# Background In RLHFDataset, we filter out prompts that are too long. This requires apply_chat_template to the whole dataset, which is not scalable when the dataset is large. https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132 Instead of performing filtering online, we probably want to move this process offline and add an assertion to avoid truncation or simply perform truncation Reference: #502 # Key Changes - Add an option `data.filter_overlong_prompts=True \` to enable the above data filtering. The default value is set to False, but we enable it for all the example scripts. - Add an option `data.truncation` to truncate the input_ids or prompt length if they exceed max_prompt_length. The default is 'error', which does not allow the max_prompt_length to be exceeded. The users should increase the max_prompt_length if throwing the error. You can also set `left` and `right`. ### Suggestion for large-scale dataset. For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filtering_overlong_prompts=False` and set `truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncation of the prompts.
| Name |
Last commit
|
Last update |
|---|---|---|
| .github | Loading commit data... | |
| docker | Loading commit data... | |
| docs | Loading commit data... | |
| examples | Loading commit data... | |
| patches | Loading commit data... | |
| scripts | Loading commit data... | |
| tests | Loading commit data... | |
| verl | Loading commit data... | |
| .gitignore | Loading commit data... | |
| .readthedocs.yaml | Loading commit data... | |
| .style.yapf | Loading commit data... | |
| LICENSE | Loading commit data... | |
| Notice.txt | Loading commit data... | |
| README.md | Loading commit data... | |
| pyproject.toml | Loading commit data... | |
| requirements.txt | Loading commit data... | |
| setup.py | Loading commit data... |