Unverified Commit 0dfcb7f9 by 湛露先生 Committed by GitHub

Fix wrong args descriptions (#294)

1. Fix wrong notes descriptions.
2. Fix wrong code paths.

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
parent 9db52329
@@ -66,7 +66,7 @@ Here, ``SampleGenerator`` can be viewed as a multi-process pulled up by
 the control flow to call. The implementation details inside can use any
 inference engine including vllm, sglang and huggingface. Users can
 largely reuse the code in
-verl/verl/trainer/ppo/rollout/vllm_rollout/vllm_rollout.py and we won't
+verl/verl/workers/rollout/vllm_rollout/vllm_rollout.py and we won't
 go into details here.
 **ReferencePolicy inference**
......
@@ -159,8 +159,8 @@ whether it's a model-based RM or a function-based RM
 - Note that the pre-defined ``RewardModelWorker`` only supports models
 with the structure of huggingface
 ``AutoModelForSequenceClassification``. If it's not this model, you
-need to define your own RewardModelWorker in `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py>`_
-and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/megatron_workers.py>`_.
+need to define your own RewardModelWorker in `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py>`_
+and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/workers/megatron_workers.py>`_.
 - If it's a function-based RM, the users are required to specify the
 reward function for each dataset.
......
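The hunk above notes that a function-based RM needs a reward function specified per dataset. A minimal, self-contained sketch of that pattern (the registry and function names here are illustrative assumptions, not verl's actual API):

```python
# Hypothetical sketch of per-dataset reward functions for a function-based RM.
# REWARD_FN_REGISTRY, gsm8k_reward, and compute_reward are illustrative names,
# not part of verl's API.

def gsm8k_reward(response: str, ground_truth: str) -> float:
    # Toy exact-match reward: 1.0 if the answer matches, else 0.0.
    return 1.0 if response.strip() == ground_truth.strip() else 0.0

REWARD_FN_REGISTRY = {
    "gsm8k": gsm8k_reward,
}

def compute_reward(dataset: str, response: str, ground_truth: str) -> float:
    # Dispatch to the reward function registered for this dataset.
    return REWARD_FN_REGISTRY[dataset](response, ground_truth)
```

In practice each dataset row would carry a tag naming which reward function applies to it.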
@@ -77,7 +77,7 @@ def merge_megatron_ckpt_llama(wrapped_models, config, is_value_model=False, dtyp
 """Merge sharded parameters of a Megatron module into a merged checkpoint.
 Args:
-    wrapped_modelss (list of megatron.model.DistributedDataParallel):
+    wrapped_models (list of megatron.model.DistributedDataParallel):
     The local DDP wrapped megatron modules.
 dtype (str or None):
 The data type of state_dict. if None, the data type of the original parameters
......
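The corrected docstring describes merging sharded parameters from a list of DDP-wrapped modules into one checkpoint. Conceptually, the merge gathers each parameter's shards in rank order and concatenates them. A toy, stdlib-only sketch of that idea, using plain lists in place of tensors (this is not verl's actual `merge_megatron_ckpt_llama` logic):

```python
# Toy illustration of merging sharded parameter state dicts.
# Real code would concatenate torch tensors along the sharded dimension;
# plain Python lists stand in for tensors here.

def merge_sharded_state_dicts(shards):
    """Concatenate each parameter's shards, in rank order, into one dict."""
    merged = {}
    for name in shards[0]:
        merged[name] = [x for shard in shards for x in shard[name]]
    return merged
```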
@@ -77,7 +77,7 @@ def create_huggingface_actor(model_name: str, override_config_kwargs=None, autom
 Args:
     model_name:
-    actor_override_config_kwargs:
+    override_config_kwargs:
 Returns:
......
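The corrected docstring names ``override_config_kwargs``; the usual pattern behind such a parameter is to start from the model's default config and overlay user-supplied fields before constructing the actor. A hedged, self-contained sketch of that pattern (``build_config`` is a hypothetical helper, not verl's code):

```python
# Illustrative config-override pattern; build_config is hypothetical.

def build_config(defaults: dict, override_config_kwargs=None) -> dict:
    # Copy the defaults, then overlay any user-supplied overrides.
    cfg = dict(defaults)
    cfg.update(override_config_kwargs or {})
    return cfg
```

The copy-then-update order keeps defaults intact while letting any user-supplied key win.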