fix: typo (explanation) (#167)

41b7c583 · Franz Srambical · GitHub · e7fd415a
Unverified commit 41b7c583, authored Jan 30, 2025 by Franz Srambical, committed by GitHub Jan 30, 2025

Showing with 3 additions and 3 deletions

docs/examples/config.rst
+1 -1

docs/examples/gsm8k_example.rst
+1 -1

docs/start/quickstart.rst
+1 -1

--- a/docs/examples/config.rst
+++ b/docs/examples/config.rst
 .. _config-explain-page:

-Config Explaination
+Config Explanation
 ===================

 ppo_trainer.yaml for FSDP Backend

--- a/docs/examples/gsm8k_example.rst
+++ b/docs/examples/gsm8k_example.rst
@@ -101,7 +101,7 @@ Step 4: Perform PPO training with your model on GSM8K Dataset
 - Users could replace the ``data.train_files`` ,\ ``data.val_files``,
  ``actor_rollout_ref.model.path`` and ``critic.model.path`` based on
  their environment.
- See :doc:`config` for detailed explaination of each config field.
+- See :doc:`config` for detailed explanation of each config field.

 **Reward Model/Function**


--- a/docs/start/quickstart.rst
+++ b/docs/start/quickstart.rst
@@ -136,7 +136,7 @@ If you encounter out of memory issues with HBM less than 32GB, enable the follow
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    critic.ppo_micro_batch_size_per_gpu=1 \

-For the full set of configs, please refer to :ref:`config-explain-page` for detailed explaination and performance tuning.
+For the full set of configs, please refer to :ref:`config-explain-page` for detailed explanation and performance tuning.


 .. [1] The original paper (https://arxiv.org/pdf/2110.14168) mainly focuses on training a verifier (a reward model) to solve math problems via Best-of-N sampling. In this example, we train an RL agent using a rule-based reward model.