Add support for GRPO with the Megatron backend, and fix a configuration bug that occurs when virtual pipeline parallelism is not used. Results are calibrated against the FSDP backend.
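
For reviewers less familiar with GRPO, the core idea is that each response's reward is normalized against the other responses sampled for the same prompt, so no separate critic is needed. The snippet below is only a minimal, backend-agnostic sketch of that step and is not the code in this PR: the function name `grpo_advantages`, its parameters, the `eps` value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grpo_advantages(rewards, group_ids, eps=1e-6):
    """Hypothetical helper: group-relative advantages for GRPO.

    rewards   -- one scalar reward per sampled response
    group_ids -- the prompt (group) each response belongs to
    eps       -- assumed stabilizer so a zero-variance group does not divide by zero
    """
    # Collect the rewards of all responses that share a prompt.
    groups = defaultdict(list)
    for reward, gid in zip(rewards, group_ids):
        groups[gid].append(reward)

    # Per-group mean and population standard deviation (an assumption;
    # an implementation may use the sample standard deviation instead).
    stats = {gid: (mean(rs), pstdev(rs)) for gid, rs in groups.items()}

    # Advantage = (reward - group mean) / (group std + eps).
    return [
        (reward - stats[gid][0]) / (stats[gid][1] + eps)
        for reward, gid in zip(rewards, group_ids)
    ]


if __name__ == "__main__":
    # Two prompts ("a" and "b") with two sampled responses each.
    print(grpo_advantages([1.0, 0.0, 2.0, 2.0], ["a", "a", "b", "b"]))
```

Because the normalization depends only on rewards and group membership, it should produce the same advantages regardless of whether Megatron or FSDP is used for training, which is one reason the two backends can be compared against each other.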