algo: RLOO advantage estimator (#341)
Implement the RLOO (REINFORCE Leave-One-Out) algorithm according to https://arxiv.org/abs/2402.14740
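
For readers unfamiliar with the estimator, here is a minimal sketch of the leave-one-out baseline idea, not the code added in this commit. It assumes one scalar reward per sampled response and a prompt id that groups responses drawn for the same prompt. The function name `rloo_advantages` and its arguments are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rloo_advantages(rewards, prompt_ids):
    """Leave-one-out advantages (illustrative sketch, hypothetical helper).

    rewards:    list of scalar rewards, one per sampled response
    prompt_ids: list of hashable ids; responses with the same id
                were sampled for the same prompt
    """
    # Collect the positions of all responses belonging to each prompt.
    groups = defaultdict(list)
    for idx, pid in enumerate(prompt_ids):
        groups[pid].append(idx)

    advantages = [0.0] * len(rewards)
    for indices in groups.values():
        k = len(indices)
        if k < 2:
            # Assumption: with a single sample there is no other response
            # to build a baseline from, so the advantage is left at 0.
            continue
        total = sum(rewards[i] for i in indices)
        for i in indices:
            # Baseline for sample i is the mean reward of the other k - 1 samples.
            baseline = (total - rewards[i]) / (k - 1)
            advantages[i] = rewards[i] - baseline
    return advantages


# Two prompts: three samples for prompt 0, two samples for prompt 1.
print(rloo_advantages([1.0, 0.0, 0.5, 2.0, 4.0], [0, 0, 0, 1, 1]))
# → [0.75, -0.75, 0.0, -2.0, 2.0]
```

Within each prompt group the advantages sum to zero, since each sample is compared only against its siblings.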
New file: examples/rloo_trainer/run_qwen2-7b.sh (mode 0 → 100644)