| Name | Last commit | Last update |
|---|---|---|
| .. | ||
| _static | ||
| advance | ||
| amd_tutorial | ||
| examples | ||
| experiment | ||
| faq | ||
| perf | ||
| preparation | ||
| start | ||
| workers | ||
| Makefile | ||
| README.md | ||
| README_vllm0.7.md | ||
| conf.py | ||
| data.rst | ||
| hybrid_flow.rst | ||
| index.rst | ||
| requirements-docs.txt |
# Background

In `RLHFDataset`, we filter out prompts that are too long. This requires applying `apply_chat_template` to the whole dataset, which does not scale when the dataset is large.

https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132

Instead of filtering online, we probably want to move this process offline and either add an assertion to prevent truncation or simply perform truncation.

Reference: #502

# Key Changes

- Add an option `data.filter_overlong_prompts=True` to enable the data filtering described above. The default value is False, but it is enabled in all the example scripts.
- Add an option `data.truncation` that controls what happens when the input_ids or prompt exceed `max_prompt_length`. The default is `error`, which raises an error instead of truncating; if you hit this error, increase `max_prompt_length`. You can also set it to `left` or `right` to truncate instead.

### Suggestion for large-scale datasets

For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filter_overlong_prompts=False` and `data.truncation='left'`. Also increase `data.max_prompt_length` to avoid over-truncating the prompts.
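The two options can be illustrated with a minimal Python sketch. It is not the code in `rl_dataset.py`: the function names `truncate_prompt` and `filter_overlong_prompts` and the whitespace tokenizer are hypothetical stand-ins. It also assumes that `left` drops tokens from the start of the prompt (keeping the end) and that `right` drops tokens from the end.

```python
from typing import Callable, List, Sequence


def truncate_prompt(input_ids: Sequence[int], max_prompt_length: int,
                    truncation: str = "error") -> List[int]:
    """Hypothetical helper mirroring the three `data.truncation` modes."""
    if max_prompt_length <= 0:
        raise ValueError("max_prompt_length must be positive")
    if len(input_ids) <= max_prompt_length:
        return list(input_ids)  # short enough, nothing to do
    if truncation == "left":
        # Assumption: 'left' removes tokens from the left, keeping the tail.
        return list(input_ids[-max_prompt_length:])
    if truncation == "right":
        # Assumption: 'right' removes tokens from the right, keeping the head.
        return list(input_ids[:max_prompt_length])
    if truncation == "error":
        raise ValueError(
            f"prompt length {len(input_ids)} exceeds "
            f"max_prompt_length={max_prompt_length}"
        )
    raise ValueError(f"unknown truncation mode: {truncation!r}")


def filter_overlong_prompts(prompts: Sequence[str],
                            tokenize: Callable[[str], List[str]],
                            max_prompt_length: int) -> List[str]:
    """Hypothetical filter: keep prompts whose token count fits the limit.

    Every prompt is tokenized once, which is why this step becomes
    expensive on large datasets.
    """
    return [p for p in prompts if len(tokenize(p)) <= max_prompt_length]


# Stand-in tokenizer for illustration only; a real pipeline would use the
# model tokenizer after applying the chat template.
kept = filter_overlong_prompts(["a b", "a b c d e"], str.split, 3)
left = truncate_prompt([1, 2, 3, 4, 5], 3, truncation="left")
right = truncate_prompt([1, 2, 3, 4, 5], 3, truncation="right")
```

With filtering disabled and `left` truncation, no dataset-wide tokenization pass is needed up front, at the cost of silently dropping the beginning of any overlong prompt.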