[misc] fix: disable chunked-prefill by default (#259)
Thanks: @HillZhang1999

Related issue: https://github.com/volcengine/verl/issues/189

`(main_task pid=3523385) ValueError: max_num_batched_tokens (8192) is smaller than max_model_len (9216). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len.`

When `enable_chunked_prefill` is enabled, this misconfiguration is concealed: vLLM accepts a `max_num_batched_tokens` smaller than `max_model_len` and never raises the error above. Disabling chunked prefill by default surfaces the problem immediately, so users can fix it by increasing `max_num_batched_tokens` or decreasing `max_model_len`.
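For illustration, a minimal sketch of the two configurations using a bare `vllm.LLM` call (this is not verl's actual rollout wiring, and the model name is a placeholder chosen only because it supports a context longer than 9216 tokens):

```python
# Minimal sketch of the vLLM engine arguments involved (placeholder model name).
from vllm import LLM

MAX_MODEL_LEN = 9216           # maximum prompt + response length
MAX_NUM_BATCHED_TOKENS = 8192  # per-scheduler-step token budget

# With chunked prefill disabled (the new default), vLLM validates this
# configuration eagerly and raises the ValueError quoted above, because a
# 9216-token sequence could never fit into a single 8192-token prefill step.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",  # placeholder, for illustration only
    max_model_len=MAX_MODEL_LEN,
    max_num_batched_tokens=MAX_NUM_BATCHED_TOKENS,
    enable_chunked_prefill=False,
)

# With enable_chunked_prefill=True, the same arguments are accepted (long
# prefills are split across multiple steps), which is how the mismatch was
# being concealed. The intended fix is to keep
# max_num_batched_tokens >= max_model_len (e.g. 9216 here), not to paper
# over the mismatch by re-enabling chunking.
```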