Unverified Commit fe547a33 by HL Committed by GitHub

docs: add hf ckpt to faq, and include verl apis in the website (#427)

Now APIs can be displayed: 


![image](https://github.com/user-attachments/assets/6592ce68-7bf6-46cb-8dd3-a5fa6cd99f3e)
parent 99fb2dde
@@ -6,11 +6,14 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.8"
+    python: "3.11"
+    rust: "1.70"
 sphinx:
   configuration: docs/conf.py
 python:
   install:
-    - requirements: docs/requirements-docs.txt
\ No newline at end of file
+    - requirements: docs/requirements-docs.txt
+    - method: pip
+      path: .
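
For context, the new `- method: pip` / `path: .` install step makes the repository importable inside the Read the Docs build, which is what lets Sphinx pull in and display the verl APIs mentioned in the description above. A minimal autodoc setup on the Sphinx side might look like the sketch below (illustrative only; the extension list is an assumption, not the actual `docs/conf.py`):

```python
# docs/conf.py (sketch) -- assumes the package is pip-installed in the docs
# environment, which is exactly what `method: pip` + `path: .` provides.
extensions = [
    "sphinx.ext.autodoc",   # import the installed verl package and pull in docstrings
    "sphinx.ext.napoleon",  # parse Google-style docstrings such as make_iterator's
]
```

An `.rst` page can then surface the API docs with a directive such as `.. automodule:: verl.protocol` plus `:members:`.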
@@ -118,18 +118,20 @@ If you find the project helpful, please cite:
 verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, and University of Hong Kong.

 ## Awesome work using verl
-- [Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization](https://arxiv.org/abs/2410.09302)
-- [Flaming-hot Initiation with Regular Execution Sampling for Large Language Models](https://arxiv.org/abs/2410.21236)
-- [Process Reinforcement Through Implicit Rewards](https://github.com/PRIME-RL/PRIME/)
-- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of DeepSeek R1 Zero recipe for reasoning tasks
-- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning agent training framework
-- [Logic R1](https://github.com/Unakar/Logic-RL): a reproduced DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
-- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO
-- [critic-rl](https://github.com/HKUNLP/critic-rl): Teaching Language Models to Critique via Reinforcement Learning
-- [Easy-R1](https://github.com/hiyouga/EasyR1): Multi-Modality RL
+- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of the **DeepSeek R1 Zero** recipe for reasoning tasks
+- [PRIME](https://github.com/PRIME-RL/PRIME): process reinforcement through implicit rewards
+- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework
+- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on the 2K Tiny Logic Puzzle dataset
+- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO
+- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation
+- [Easy-R1](https://github.com/hiyouga/EasyR1): **multi-modal** RL training framework
+- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models**
+- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs
+- [DQO](https://arxiv.org/abs/2410.09302): enhancing multi-step reasoning abilities of language models through direct Q-function optimization
+- [FIRE](https://arxiv.org/abs/2410.21236): flaming-hot initiation with regular execution sampling for large language models

 ## Contribution Guide
-Contributions from the community are welcome!
+Contributions from the community are welcome! Please check out our [roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354).

 ### Code formatting
 We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** `yapf`.
...
@@ -55,3 +55,8 @@ Please set the following environment variable. The env var must be set before th
 export VLLM_ATTENTION_BACKEND=XFORMERS

 If in doubt, print this env var in each rank to make sure it is properly set.
+
+Checkpoints
+------------------------
+
+If you want to convert the model checkpoint into Hugging Face safetensors format, please refer to ``scripts/model_merger.py``.
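
For reference, the sketch below shows the kind of conversion such a merge script performs: loading a saved actor state dict and re-saving it with safetensors serialization. The base model name and checkpoint path are placeholders, and this is not the command-line interface of ``scripts/model_merger.py`` itself.

```python
# Sketch only: re-save a trained policy checkpoint in Hugging Face safetensors
# format. Paths and the base model are hypothetical; for verl checkpoints,
# prefer scripts/model_merger.py.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16)
state_dict = torch.load("checkpoints/global_step_100/actor.pt", map_location="cpu")  # hypothetical path
model.load_state_dict(state_dict)

# safe_serialization=True writes model.safetensors instead of pytorch_model.bin
model.save_pretrained("outputs/hf_model", safe_serialization=True)
```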
@@ -6,4 +6,7 @@ sphinx-markdown-tables
 # theme default rtd
 # crate-docs-theme
-sphinx-rtd-theme
\ No newline at end of file
+sphinx-rtd-theme
+
+# pin tokenizers version to avoid env_logger version req
+tokenizers==0.19.1
@@ -84,7 +84,7 @@ def union_tensor_dict(tensor_dict1: TensorDict, tensor_dict2: TensorDict) -> TensorDict:
     return tensor_dict1


-def union_numpy_dict(tensor_dict1: dict[np.ndarray], tensor_dict2: dict[np.ndarray]) -> dict[np.ndarray]:
+def union_numpy_dict(tensor_dict1: dict[str, np.ndarray], tensor_dict2: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
     for key, val in tensor_dict2.items():
         if key in tensor_dict1:
             assert isinstance(tensor_dict2[key], np.ndarray)
@@ -448,19 +448,17 @@ class DataProto:
         return self

     def make_iterator(self, mini_batch_size, epochs, seed=None, dataloader_kwargs=None):
-        """Make an iterator from the DataProto. This is built upon that TensorDict can be used as a normal Pytorch
+        r"""Make an iterator from the DataProto. This is built on the fact that a TensorDict can be used as a normal PyTorch
         dataset. See https://pytorch.org/tensordict/tutorials/data_fashion for more details.

         Args:
-            mini_batch_size (int): mini-batch size when iterating the dataset. We require that
-                ``batch.batch_size[0] % mini_batch_size == 0``
+            mini_batch_size (int): mini-batch size when iterating the dataset. We require that ``batch.batch_size[0] % mini_batch_size == 0``.
             epochs (int): number of epochs when iterating the dataset.
-            dataloader_kwargs: internally, it returns a DataLoader over the batch.
-                The dataloader_kwargs is the kwargs passed to the DataLoader
+            dataloader_kwargs (Any): internally, this returns a DataLoader over the batch; dataloader_kwargs are the kwargs passed to that DataLoader.

         Returns:
-            Iterator: an iterator that yields a mini-batch data at a time. The total number of iteration steps is
-                ``self.batch.batch_size * epochs // mini_batch_size``
+            Iterator: an iterator that yields one mini-batch of data at a time. The total number of iteration steps is ``self.batch.batch_size * epochs // mini_batch_size``.
         """
         assert self.batch.batch_size[0] % mini_batch_size == 0, f"{self.batch.batch_size[0]} % {mini_batch_size} != 0"
         # we can directly create a dataloader from TensorDict
...
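
To illustrate the documented contract, here is a hedged usage sketch of `make_iterator`; the field name, tensor shapes, and `DataProto.from_dict` construction are illustrative assumptions rather than code taken from the verl tests.

```python
# Illustrative sketch of DataProto.make_iterator usage (shapes and field names
# are made up). batch_size 16 is divisible by mini_batch_size 4, as required.
import torch
from verl import DataProto

data = DataProto.from_dict({"input_ids": torch.randint(0, 100, (16, 8))})

it = data.make_iterator(mini_batch_size=4, epochs=2, seed=0)
for mini_batch in it:
    # 16 * 2 // 4 = 8 mini-batches in total, each holding 4 rows
    print(mini_batch.batch["input_ids"].shape)  # torch.Size([4, 8])
```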