first commit
.gitignore
0 → 100644
README.md
0 → 100644
data_tool/__init__.py
0 → 100644
data_tool/__main__.py
0 → 100644
data_tool/document.py
0 → 100644
data_tool/filter.py
0 → 100644
data_tool/header_cleaner.py
0 → 100644
data_tool/lsh.py
0 → 100644
data_tool/main.py
0 → 100644
data_tool/minhash.py
0 → 100644
data_tool/quality_signals.py
0 → 100644
data_tool/resources.py
0 → 100644
data_tool/utils.py
0 → 100644
pyproject.toml
0 → 100644
requirements.txt
0 → 100644
numpy>=1.24.0
scipy>=1.10.0
datasketch>=1.6.0
tests/__init__.py
0 → 100644
tests/conftest.py
0 → 100644
tests/test_cli.py
0 → 100644
tests/test_document.py
0 → 100644
tests/test_filter.py
0 → 100644
tests/test_header_cleaner.py
0 → 100644
tests/test_lsh.py
0 → 100644
tests/test_minhash.py
0 → 100644
tests/test_quality_signals.py
0 → 100644
tests/test_utils.py
0 → 100644
uv.lock
0 → 100644
This source diff could not be displayed because it is too large.