Unverified Commit e9c051c9 by malachaux Committed by GitHub

Update README.md

parent deeedd00
......@@ -264,6 +264,41 @@ If the script is missing, it means there was an issue with our automatically cre
The code generated by your model can be tested by injecting it where the `TO_FILL` comment is in the test script.
## Short guide to downloading GitHub data from Google BigQuery
Here is a short guide:
- Create a Google Cloud Platform account (you get around $300 of free credit, which is sufficient for the GitHub data).
- Create a Google BigQuery project here.
- In this project, create a dataset.
- In this dataset, create one table per programming language. The results of each SQL query (one per language) will be stored in these tables.
- Before running a SQL query, change the query settings so that the results are saved in the dedicated table (More -> Query Settings -> Destination -> Table for query results -> enter the table name).
- Run your SQL queries (one per language, and don't forget to change the destination table for each query).
- Export your results to Google Cloud Storage:
  - In Google Cloud Storage, create a bucket and, inside it, one folder per language.
  - Export each table to this bucket (EXPORT -> Export to GCS -> export format JSON, compression GZIP).
- To download the bucket to your machine, use the `gsutil` command-line tool:
  - `pip install gsutil`
  - `gsutil config` to configure gsutil with your Google account
  - `gsutil -m cp -r gs://name_of_bucket/name_of_folder .` to copy the bucket folder to your machine
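Once the folder is on your machine, the exported files can be read back with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes the export produced gzip-compressed, newline-delimited JSON (one object per line) and that the file names end in `.gz`. The function name `iter_exported_rows` and the `pattern` parameter are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_exported_rows(folder: str, pattern: str = "*.gz") -> Iterator[Dict]:
    """Yield one dict per exported row found under `folder`.

    Assumption: every file matching `pattern` is gzip-compressed,
    newline-delimited JSON. Adjust `pattern` to the names you chose
    when exporting the tables.
    """
    for path in sorted(Path(folder).rglob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

For example, `for row in iter_exported_rows("name_of_folder"): print(row["repo_name"], row["path"])` would list the files that were exported. Integer columns such as `copies` may come back as strings, so convert them explicitly if you need numbers.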
Example query for Python:
```
SELECT
  f.repo_name,
  f.ref,
  f.path,
  c.copies,
  c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c ON f.id = c.id
WHERE
  NOT c.binary
  AND f.path LIKE '%.py'
```
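Since one query is needed per language, the query text can be generated instead of edited by hand. The sketch below is only an illustration: the `EXTENSIONS` mapping and the `build_query` helper are hypothetical names, and the extensions listed are assumptions to adapt to the languages you want.

```python
# Hypothetical mapping from language name to file extension; adjust as needed.
EXTENSIONS = {"python": ".py", "java": ".java", "cpp": ".cpp"}

# Same query as above, with the file extension left as a placeholder.
QUERY_TEMPLATE = """\
SELECT
  f.repo_name,
  f.ref,
  f.path,
  c.copies,
  c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c ON f.id = c.id
WHERE
  NOT c.binary
  AND f.path LIKE '%{extension}'
"""


def build_query(language: str) -> str:
    """Return the SQL text for one language in EXTENSIONS."""
    return QUERY_TEMPLATE.format(extension=EXTENSIONS[language])


if __name__ == "__main__":
    for language in EXTENSIONS:
        print(f"-- {language}")
        print(build_query(language))
```

Each printed query can then be pasted into the BigQuery console, with the destination table changed to the one created for that language.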
More information is available from Google here.
## References
This code was used to train and evaluate the TransCoder model in:
......