skip special tokens (#715)
Special tokens should be skipped when decoding here, as TRL does: https://github.com/huggingface/trl/blob/fc2b041b58f6fbe766dceaec819bc5a8f9d209da/trl/trainer/grpo_trainer.py#L597

With `skip_special_tokens=False`, a completion such as

```
<think>...</think><answer>....</answer>
```

is decoded as something like

```
<think>...</think><answer>....</answer><|im_end|><|endoftext|>
```

so a typical `format_reward_func` pattern no longer matches:

```python
r"^<think>.*?</think>\s*<answer>.*?</answer>$"
```
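The mismatch can be reproduced with the regex alone. The sketch below is illustrative only: `format_reward` is a hypothetical stand-in for a typical `format_reward_func`, and the sample strings are made up to mirror the two decoded forms described above.

```python
import re

# Pattern quoted in this description: the completion must end right after </answer>.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$")


def format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion follows the format, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion) else 0.0


# Made-up completion, as it would look when special tokens are skipped.
clean = "<think>2 + 2 = 4</think><answer>4</answer>"
# The same completion with trailing special tokens left in the decoded text.
with_special = clean + "<|im_end|><|endoftext|>"

clean_reward = format_reward(clean)
special_reward = format_reward(with_special)

print(clean_reward, special_reward)
```

Because the pattern is anchored with `$`, any trailing special token makes the match fail even though the completion itself is well formed.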