Unverified Commit 54574690 by Hongpeng Guo Committed by GitHub

[Config] Providing an option to turn off `torch.compile` in actor (#554)

## Summary

Adds a config option to disable the `torch.compile` call used in `dp_actor.py`.

## Usage

Add the following override to the driver or CLI scripts to turn off
`torch.compile`:
```bash
+actor_rollout_ref.actor.use_torch_compile=False
```
Otherwise, `torch.compile` is used by default.
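
For reference, the flag gates the compile call in `dp_actor.py` roughly as in the sketch below. This is a minimal, self-contained illustration of the pattern, not the actual actor code: the `EntropyHelper` class and the stand-in `entropy_from_logits` here are hypothetical, while the real actor applies the same conditional to `verl_F.entropy_from_logits` inside `DataParallelPPOActor` (see the diff).

```python
import torch


def entropy_from_logits(logits: torch.Tensor) -> torch.Tensor:
    """Stand-in for verl_F.entropy_from_logits: per-token entropy of a categorical distribution."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)


class EntropyHelper:
    def __init__(self, config: dict):
        # Compile the entropy kernel only when the flag is on; fall back to eager mode otherwise.
        self.compute_entropy_from_logits = (
            torch.compile(entropy_from_logits, dynamic=True)
            if config.get('use_torch_compile', True)  # compiled by default
            else entropy_from_logits
        )


# With use_torch_compile=False the plain eager function is used directly.
helper = EntropyHelper({'use_torch_compile': False})
print(helper.compute_entropy_from_logits(torch.randn(2, 4, 8)).shape)  # torch.Size([2, 4])
```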

## Related Issue

#354 #245

---------

Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
parent d2db7252
@@ -84,6 +84,7 @@ Actor/Rollout/Reference Policy
clip_ratio: 0.2
entropy_coeff: 0.001
use_kl_loss: False # True for GRPO
use_torch_compile: True # False to disable torch compile
kl_loss_coef: 0.001 # for grpo
kl_loss_type: low_var_kl # for grpo
ppo_epochs: 1
@@ -176,6 +177,8 @@ Actor/Rollout/Reference Policy
- ``actor_rollout_ref.actor.clip_ratio``: PPO clip ratio
- ``actor_rollout_ref.actor.use_torch_compile``: Whether to use torch compile in actor
- ``actor_rollout_ref.actor.entropy_coeff``: The weight of entropy when
calculating PPO loss
@@ -26,6 +26,7 @@ actor_rollout_ref:
ppo_micro_batch_size: null # will be deprecated, use ppo_micro_batch_size_per_gpu
ppo_micro_batch_size_per_gpu: null
use_dynamic_bsz: False
use_torch_compile: True # False to disable torch compile
clip_ratio: 0.2
entropy_coeff: 0.001
ppo_epochs: 1
@@ -33,6 +33,7 @@ actor_rollout_ref:
clip_ratio: 0.2
entropy_coeff: 0.001
use_kl_loss: False # True for GRPO
use_torch_compile: True # False to disable torch compile
kl_loss_coef: 0.001 # for grpo
kl_loss_type: low_var_kl # for grpo
ppo_epochs: 1
@@ -53,7 +53,10 @@ class DataParallelPPOActor(BasePPOActor):
         self.ulysses_sequence_parallel_size = self.config.ulysses_sequence_parallel_size
         self.use_ulysses_sp = self.ulysses_sequence_parallel_size > 1
 
-        self.compute_entropy_from_logits = torch.compile(verl_F.entropy_from_logits, dynamic=True)
+        self.compute_entropy_from_logits = (
+            torch.compile(verl_F.entropy_from_logits, dynamic=True)
+            if self.config.get('use_torch_compile', True)  # use torch compile by default
+            else verl_F.entropy_from_logits)
 
     def _forward_micro_batch(self, micro_batch, temperature) -> Tuple[torch.Tensor, torch.Tensor]:
         """