Unverified Commit e9c051c9 by malachaux Committed by GitHub

Update README.md

parent deeedd00
......@@ -262,7 +262,42 @@ The format of each line in each file is `<FUNCTION_ID> | <function>`. The functi
For instance, for the line `COUNT_SET_BITS_IN_AN_INTEGER_3 | <function>` in the file test.cpp.shuf.valid.tok, the corresponding test script can be found in `data/evaluation/geeks_for_geeks_successful_test_scripts/cpp/COUNT_SET_BITS_IN_AN_INTEGER_3.cpp`.
If the script is missing, it means there was an issue with our automatically created tests for the corresponding function.
The code generated by your model can be tested by injecting it where the `TO_FILL` comment is in the test script.
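The lookup and injection steps described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`parse_line`, `script_path_for`, `inject`) are hypothetical, the language folder and file extension are passed in explicitly because only the `cpp` case is shown above, and the `TO_FILL` marker is assumed to sit on a line of its own.

```python
from pathlib import Path

# Root folder taken from the example path above.
TEST_SCRIPT_ROOT = Path("data/evaluation/geeks_for_geeks_successful_test_scripts")


def parse_line(line: str) -> tuple:
    """Split a `<FUNCTION_ID> | <function>` line into (function_id, function)."""
    # Split only on the first separator: tokenized code may itself contain " | ".
    function_id, function = line.split(" | ", 1)
    return function_id.strip(), function.strip()


def script_path_for(function_id: str, lang_dir: str, extension: str) -> Path:
    """Build the expected test script path, e.g. .../cpp/<FUNCTION_ID>.cpp."""
    return TEST_SCRIPT_ROOT / lang_dir / f"{function_id}{extension}"


def inject(script_text: str, generated_code: str, marker: str = "TO_FILL") -> str:
    """Replace each line containing the marker with the generated code."""
    if marker not in script_text:
        raise ValueError(f"marker {marker!r} not found in test script")
    lines = [
        generated_code if marker in line else line
        for line in script_text.splitlines()
    ]
    return "\n".join(lines) + "\n"
```

A missing script file (see above) should be treated as "no test available" rather than as a failure of the generated code.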
## A short guide to downloading GitHub data from Google BigQuery
Here is a short guide:
- Create a Google Cloud Platform account (you will get around $300 of free credit, which is sufficient for the GitHub data).
- Create a Google BigQuery project here
- In this project, create a dataset.
- In this dataset, create one table per programming language. The results of each SQL query (one per language) will be stored in these tables.
- Before running your SQL query, change the query settings so that the results are saved in the dedicated table (More -> Query Settings -> Destination -> Table for query results -> enter the table name).
- Run your SQL query (one per language, and don't forget to change the destination table for each query).
- Export your results to Google Cloud Storage:
  - In Google Cloud Storage, create a bucket, and inside it one folder per language.
  - Export your table to this bucket (EXPORT -> Export to GCS -> export format JSON, compression GZIP).
- To download the bucket to your machine, use the `gsutil` command-line tool:
  - `pip install gsutil`
  - `gsutil config` to configure gsutil with your Google account
  - `gsutil -m cp -r gs://name_of_bucket/name_of_folder .` to copy your bucket to your machine
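Once the folder has been copied locally, the exported files can be read back with the Python standard library. The sketch below assumes the export produced gzip-compressed, newline-delimited JSON (one row per line) and that the file names end in `.gz`; the actual names depend on what was entered in the export dialog, and `iter_records` is a hypothetical helper name.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_records(folder: str) -> Iterator[dict]:
    """Yield one dict per exported row from every .gz file under `folder`."""
    for path in sorted(Path(folder).rglob("*.gz")):
        # "rt" decompresses and decodes the file as text, line by line.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

Each yielded dict carries the columns selected by the query, so a file's source code would be available under the `content` key.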
Example query for Python:
```
SELECT
f.repo_name,
f.ref,
f.path,
c.copies,
c.content
FROM `bigquery-public-data.github_repos.files` as f
JOIN `bigquery-public-data.github_repos.contents` as c on f.id = c.id
WHERE
NOT c.binary
AND f.path like '%.py'
```
Google link for more info here
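Since one query is run per language, the query text can be generated from the file extension rather than edited by hand. This is a minimal sketch: `build_query` and `LANGUAGE_EXTENSIONS` are hypothetical names, and the extensions listed for Java and C++ are assumptions (only `.py` appears in the example above).

```python
QUERY_TEMPLATE = """\
SELECT
  f.repo_name,
  f.ref,
  f.path,
  c.copies,
  c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c ON f.id = c.id
WHERE
  NOT c.binary
  AND f.path LIKE '%{extension}'
"""

# Assumed extension per language; adjust to the files you want to keep.
LANGUAGE_EXTENSIONS = {"python": ".py", "java": ".java", "cpp": ".cpp"}


def build_query(extension: str) -> str:
    """Return the query text selecting files whose path ends with `extension`."""
    # Only accept simple extensions such as ".py" to keep the SQL well-formed.
    if not extension.startswith(".") or not extension[1:].isalnum():
        raise ValueError(f"unexpected extension: {extension!r}")
    return QUERY_TEMPLATE.format(extension=extension)


if __name__ == "__main__":
    for language, extension in LANGUAGE_EXTENSIONS.items():
        print(f"-- {language}")
        print(build_query(extension))
```

The printed text can be pasted into the BigQuery console; remember to change the destination table before each run.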
## References
This code was used to train and evaluate the TransCoder model in:
......