This moves the similarity metric code out of buf_text and into a new file. It also implements a different approach to similarity measurement, based on a Rabin-Karp rolling hash, in which we keep only the top 100 and bottom 100 hashes. In theory, that should be enough samples to give a fairly accurate measurement while bounding the amount of data we keep for a file signature, no matter how large the file is.
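As a rough illustration of the idea described above (not the code in this commit), the following sketch rolls a Rabin-Karp style hash over a buffer and keeps only the 100 smallest and 100 largest distinct hash values. The names `sketch_sig`, `sketch_sig_build`, and `sketch_sig_similarity`, the 8-byte window, the base 257, and the Dice-style scoring are all assumptions made for the example; only the "top 100 and bottom 100" figure comes from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_WINDOW 8     /* hypothetical window width in bytes */
#define SIG_KEEP   100   /* hashes kept at each extreme */
#define SIG_BASE   257u  /* hypothetical polynomial base */

typedef struct {
	uint32_t lo[SIG_KEEP]; size_t lo_n; /* smallest hashes, ascending */
	uint32_t hi[SIG_KEEP]; size_t hi_n; /* largest hashes, stored as ~h, ascending */
} sketch_sig;

/* Insert v into a sorted array that retains only the SIG_KEEP smallest
 * distinct values seen so far. */
static void keep_smallest(uint32_t *a, size_t *n, uint32_t v)
{
	size_t i = 0, last;

	while (i < *n && a[i] < v)
		i++;
	if (i < *n && a[i] == v)
		return; /* already present */
	if (i == SIG_KEEP)
		return; /* array is full and v is larger than everything kept */

	/* shift the tail right; when full, the largest entry falls off */
	last = (*n < SIG_KEEP) ? *n : SIG_KEEP - 1;
	memmove(&a[i + 1], &a[i], (last - i) * sizeof(*a));
	a[i] = v;
	if (*n < SIG_KEEP)
		(*n)++;
}

/* Build a fixed-size signature from a buffer of any length. */
static void sketch_sig_build(sketch_sig *sig, const unsigned char *buf, size_t len)
{
	uint32_t h = 0, top = 1; /* top = SIG_BASE^(SIG_WINDOW-1) mod 2^32 */
	size_t i;

	sig->lo_n = sig->hi_n = 0;
	if (len < SIG_WINDOW)
		return;

	for (i = 1; i < SIG_WINDOW; i++)
		top *= SIG_BASE;
	for (i = 0; i < SIG_WINDOW; i++)
		h = h * SIG_BASE + buf[i];

	for (i = SIG_WINDOW; ; i++) {
		keep_smallest(sig->lo, &sig->lo_n, h);
		keep_smallest(sig->hi, &sig->hi_n, ~h); /* smallest ~h == largest h */
		if (i == len)
			break;
		/* roll: drop the oldest byte, shift, append the next byte */
		h = (h - (uint32_t)buf[i - SIG_WINDOW] * top) * SIG_BASE + buf[i];
	}
}

/* Count values present in both sorted arrays. */
static size_t count_common(const uint32_t *a, size_t an, const uint32_t *b, size_t bn)
{
	size_t i = 0, j = 0, c = 0;

	while (i < an && j < bn) {
		if (a[i] < b[j]) i++;
		else if (a[i] > b[j]) j++;
		else { c++; i++; j++; }
	}
	return c;
}

/* Dice-style score in the range 0..100 (an assumed scoring rule). */
static int sketch_sig_similarity(const sketch_sig *a, const sketch_sig *b)
{
	size_t common = count_common(a->lo, a->lo_n, b->lo, b->lo_n) +
		count_common(a->hi, a->hi_n, b->hi, b->hi_n);
	size_t total = a->lo_n + a->hi_n + b->lo_n + b->hi_n;

	return total ? (int)(200 * common / total) : 0;
}
```

Whatever the input size, a signature in this sketch holds at most 200 hash values, which is the bounded-storage property the description is after. Storing the large hashes as their bitwise complement lets one insertion routine serve both extremes.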
Name |
---|
buffer.c |
copy.c |
dirent.c |
env.c |
errors.c |
filebuf.c |
hex.c |
mkdir.c |
oid.c |
opts.c |
path.c |
pool.c |
rmdir.c |
stat.c |
string.c |
strmap.c |
strtol.c |
vector.c |