- Add format script - Move save_checkpoint to a separate function - Add timing/step, response_length/clip_ratio, prompt_length/clip_ratio and critic/vf_explained_var metrics - The training step starts from 1