[misc] fix: GRPO KL loss should be added when minimizing (#179)

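- As titled: the KL loss is a penalty term, so when the total loss is minimized it must be added to the policy loss, not subtracted.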

Commit a65c9157 authored Feb 01, 2025 by Guangming Sheng; committed by GitHub Feb 01, 2025

Showing 1 changed file with 1 addition and 1 deletion

verl/workers/actor/dp_actor.py
+1 -1

--- a/verl/workers/actor/dp_actor.py
+++ b/verl/workers/actor/dp_actor.py
@@ -263,7 +263,7 @@ class DataParallelPPOActor(BasePPOActor):
                                                kl_penalty=self.config.kl_loss_type)
                    kl_loss = masked_mean(kld, response_mask)

-                    policy_loss = policy_loss - kl_loss * self.config.kl_loss_coef
+                    policy_loss = policy_loss + kl_loss * self.config.kl_loss_coef
                    metrics['actor/kl_loss'] = kl_loss.detach().item()
                    metrics['actor/kl_coef'] = self.config.kl_loss_coef
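
To see why the sign matters, here is a minimal sketch on a toy problem. It is not the verl code. It assumes a one-parameter "policy" theta, a toy policy loss of -theta (so minimization pushes theta upward), and a KL term of 0.5 * (theta - theta_ref)**2, which is the KL divergence between two unit-variance Gaussians with means theta and theta_ref. The function names, the step count, and the learning rate are hypothetical; only the name kl_loss_coef is borrowed from the diff.

```python
def combined_loss_grad(theta, theta_ref, kl_loss_coef, sign):
    """Gradient of  -theta + sign * kl_loss_coef * 0.5 * (theta - theta_ref)**2.

    sign=+1 mirrors the fixed line   (policy_loss + kl_loss * coef).
    sign=-1 mirrors the original bug (policy_loss - kl_loss * coef).
    """
    policy_grad = -1.0            # d/dtheta of the toy policy loss -theta
    kl_grad = theta - theta_ref   # d/dtheta of 0.5 * (theta - theta_ref)**2
    return policy_grad + sign * kl_loss_coef * kl_grad


def train(sign, steps=200, lr=0.1, kl_loss_coef=0.5, theta_ref=0.0):
    """Plain gradient descent on the combined toy loss, starting at theta_ref."""
    theta = theta_ref
    for _ in range(steps):
        theta -= lr * combined_loss_grad(theta, theta_ref, kl_loss_coef, sign)
    return theta


theta_added = train(sign=+1.0)       # KL penalty added: stays near the reference
theta_subtracted = train(sign=-1.0)  # KL penalty subtracted: runs away from it

print(f"added:      theta = {theta_added:.4f}")
print(f"subtracted: theta = {theta_subtracted:.1f}")
```

With the penalty added, theta settles at theta_ref + 1 / kl_loss_coef, the point where the pull of the policy loss balances the KL penalty. With the penalty subtracted, minimizing the loss rewards distance from the reference, so theta grows without bound.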