Unverified Commit c3e530de by Yuqian Fu Committed by GitHub

[misc] fix the wrong url (#657)

parent c6dc8b73
......@@ -37,7 +37,7 @@ Below are key factors for tuning vLLM-based rollout. Before tuning, we recommend
When GPU resources allow, a smaller tensor parallel size spawns more vLLM replicas.
Data parallelism (DP) can yield higher throughput than tensor parallelism (TP), but also increases KVCache consumption.
Carefully balance the trade-off between more replicas and higher memory usage.
Our experient in Sec. 8.4 of `HybridFlow paper <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/gsm8k.py>`_ evaluate this trade-off.
Our experient in Sec. 8.4 of `HybridFlow paper <https://arxiv.org/pdf/2409.19256v2>`_ evaluate this trade-off.
More tuning details such as dealing with Preemption and Chunked-prefill
can be found in `vLLM official tuning guide <https://docs.vllm.ai/en/latest/performance/optimization.html>`_
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment