Unverified Commit c3e530de by Yuqian Fu Committed by GitHub

[misc] fix the wrong url (#657)

parent c6dc8b73
@@ -37,7 +37,7 @@ Below are key factors for tuning vLLM-based rollout. Before tuning, we recommend
 When GPU resources allow, a smaller tensor parallel size spawns more vLLM replicas.
 Data parallelism (DP) can yield higher throughput than tensor parallelism (TP), but also increases KVCache consumption.
 Carefully balance the trade-off between more replicas and higher memory usage.
-Our experient in Sec. 8.4 of `HybridFlow paper <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/gsm8k.py>`_ evaluate this trade-off.
+Our experient in Sec. 8.4 of `HybridFlow paper <https://arxiv.org/pdf/2409.19256v2>`_ evaluate this trade-off.
 More tuning details such as dealing with Preemption and Chunked-prefill
 can be found in `vLLM official tuning guide <https://docs.vllm.ai/en/latest/performance/optimization.html>`_