@@ -36,6 +36,21 @@ ORM training follows [@Ouyang2022TrainingLM]
### Step4 Evaluate ORM & Critic Model
Serve the reward model trained with LLaMA-Factory through LLaMA-Factory's own API server. See this [issue](https://github.com/hiyouga/LLaMA-Factory/issues/4743#issuecomment-2218022614) for details.
Run the following command:
```bash
API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api deepseekcoder_rm.yaml
```
where the config file `deepseekcoder_rm.yaml` looks like: