Ziyuan Nan / codecritic
Commit a611bcf1, authored Jan 07, 2025 by nanziyuan
fix bugs
parent f3dd6691
Showing 2 changed files with 24 additions and 19 deletions
codecritic/cli/select_preference_pairs.py  +1 -8
scripts/gen_dataset.sh  +23 -11
codecritic/cli/select_preference_pairs.py
@@ -98,20 +98,13 @@ if __name__ == "__main__":
     # select pairs
-    ds = defaultdict(dict)
-    for item in dataset:
-        ds[item["task_id"]][item["solution_id"]] = item
-    sorted_pairinfo = sorted(pairinfo, key=lambda x: x["similarity"])
     task_groups = defaultdict(list)
     for item in pairinfo:
         task_groups[item["task_id"]].append(item)
-    # Step 2: Select the 4 pairs with the smallest score for each task
     selected_pairs = []
     for task, items in task_groups.items():
-        # Sort items for this task by score and select the top 4
-        sorted_items = sorted(items, key=lambda x: x["similarity"], reverse=True)[:4]
+        sorted_items = sorted(items, key=lambda x: x["similarity"])[:4]
         selected_pairs.extend(sorted_items)
     save_jsonl(selected_pairs, args.output)
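The per-task selection in the hunk above can be exercised in isolation. The following is a minimal sketch, not the project's code: it assumes each pair record is a dict with a `task_id` and a numeric `similarity`, and the function name `select_pairs`, the parameter `k`, and the toy records are introduced here purely for illustration. It uses the ascending form of the sort (no `reverse=True`), so the slice keeps the lowest-similarity pairs for each task.

```python
from collections import defaultdict


def select_pairs(pairinfo, k=4):
    """Keep the k lowest-similarity pairs for each task_id (hypothetical helper)."""
    task_groups = defaultdict(list)
    for item in pairinfo:
        task_groups[item["task_id"]].append(item)

    selected_pairs = []
    for items in task_groups.values():
        # sorted() is ascending by default, so the slice keeps the smallest values.
        selected_pairs.extend(sorted(items, key=lambda x: x["similarity"])[:k])
    return selected_pairs


if __name__ == "__main__":
    # Toy records; only the "task_id" and "similarity" keys come from the diff.
    pairinfo = [
        {"task_id": "t1", "similarity": 0.9},
        {"task_id": "t1", "similarity": 0.2},
        {"task_id": "t1", "similarity": 0.5},
        {"task_id": "t2", "similarity": 0.7},
    ]
    for pair in select_pairs(pairinfo, k=2):
        print(pair)
```

With `reverse=True` the same slice would instead keep the highest-similarity pairs, which is the behavioral difference between the two `sorted_items` lines in the hunk.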
scripts/gen_dataset.sh
@@ -4,15 +4,22 @@ model="/lustre/S/huangdi/open_for_out/models/Qwen2.5-Coder-7B-Instruct/"
 project="/lustre/S/nanziyuan/projects/ccc"
 modelname="qwen25_coder_inst"
-# APPS
-# CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python -m codecritic.cli.gen_dataset \
-    --model ${model} \
-    --apps /lustre/S/nanziyuan/datasets/apps/ \
-    --train "${project}/data/train/${modelname}-apps-train.jsonl" \
-    --test "${project}/data/test/${modelname}-apps-test.jsonl"
-# HumanEval & MBPP
+trainset="${project}/data/train/${modelname}-apps-train.jsonl"
+testset="${project}/data/test/${modelname}-apps-test.jsonl"
+train_selected_pairs="${project}/data/train/${modelname}-apps-train-selected_pairs.jsonl"
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+## Sampling
+## APPS
+# python -m codecritic.cli.gen_dataset \
+# --model ${model} \
+# --apps /lustre/S/nanziyuan/datasets/apps/ \
+# --train ${trainset} \
+# --test ${testset}
+## HumanEval & MBPP
 # evalplus.evaluate \
 # --model ${model} \
 # --n_samples 50 \
...
@@ -29,6 +36,11 @@ python -m codecritic.cli.gen_dataset \
 # --root "${project}/data/test/${modelname}-mbpp" \
 # --backend vllm
-# HumanEvalPack
+## HumanEvalPack
-# BigCodeBench
+## BigCodeBench
+## Training dataset
+python -m codecritic.cli.select_preference_pairs \
+    --dataset ${trainset} \
+    --output ${train_selected_pairs}
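The script now calls `codecritic.cli.select_preference_pairs` with `--dataset` and `--output`. The sketch below shows one way those two flags could be parsed and wired to JSONL input and output. It is an assumption-laden illustration, not the module's actual code: only the flag names and the `save_jsonl(..., args.output)` call appear in the commit, while `build_parser`, `load_jsonl`, and the body of `save_jsonl` are hypothetical stand-ins.

```python
import argparse
import json
import os
import tempfile


def build_parser():
    # Flag names mirror the shell invocation; everything else is assumed.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    return parser


def load_jsonl(path):
    """Read one JSON object per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def save_jsonl(records, path):
    """Write one JSON object per line (assumed behavior of the project's helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "train.jsonl")
        dst = os.path.join(tmp, "selected.jsonl")
        save_jsonl([{"task_id": "t1", "similarity": 0.2}], src)
        # Equivalent to: --dataset ${trainset} --output ${train_selected_pairs}
        args = build_parser().parse_args(["--dataset", src, "--output", dst])
        save_jsonl(load_jsonl(args.dataset), args.output)
        print(load_jsonl(args.output))
```

Defining the paths once as `trainset` and `train_selected_pairs` in the shell script means the sampling step and the selection step read and write the same files without repeating the path strings.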