Change similarity metric to sampled hashes
This moves the similarity metric code out of buf_text and into a new file. It also implements a different approach to similarity measurement, based on a Rabin-Karp rolling hash, in which we keep only the top 100 and bottom 100 hashes. In theory, that should provide enough samples to give a fairly accurate measurement while limiting the amount of data we keep for a file signature, no matter how large the file is.
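The sampling idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/hashsig.c: the names `sample_sig`, `sig_build`, and `sig_compare`, the 8-byte window, the base 31, and the overlap-based 0-100 score are all hypothetical choices made for this example. The hash is a Rabin-Karp-style polynomial rolling hash computed modulo 2^32 through unsigned overflow, and the signature keeps only the 100 smallest and 100 largest window hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_WINDOW 8    /* hypothetical window size in bytes */
#define SIG_KEEP   100  /* keep the bottom 100 and top 100 hashes */
#define HASH_BASE  31u  /* hypothetical polynomial base */

/* Hypothetical signature: fixed size no matter how large the input is. */
typedef struct {
    uint32_t lo[SIG_KEEP]; size_t nlo; /* smallest hashes, ascending */
    uint32_t hi[SIG_KEEP]; size_t nhi; /* ~h of largest hashes, ascending */
} sample_sig;

/* Keep the SIG_KEEP smallest values seen so far in a sorted array. */
static void keep_min(uint32_t *set, size_t *n, uint32_t h)
{
    size_t i;
    if (*n < SIG_KEEP)
        i = (*n)++;                 /* room left: append at the end */
    else if (h < set[SIG_KEEP - 1])
        i = SIG_KEEP - 1;           /* full: evict the current maximum */
    else
        return;
    while (i > 0 && set[i - 1] > h) { /* shift to keep ascending order */
        set[i] = set[i - 1];
        i--;
    }
    set[i] = h;
}

/* Count values shared by two ascending arrays (multiset intersection). */
static size_t count_common(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { common++; i++; j++; }
    }
    return common;
}

void sig_build(sample_sig *sig, const unsigned char *buf, size_t len)
{
    uint32_t h = 0, pow = 1;
    size_t i;
    memset(sig, 0, sizeof(*sig));
    if (len < SIG_WINDOW)
        return; /* too short to produce even one window */
    for (i = 0; i + 1 < SIG_WINDOW; i++)
        pow *= HASH_BASE; /* HASH_BASE^(SIG_WINDOW-1), weight of the oldest byte */
    for (i = 0; i < SIG_WINDOW; i++)
        h = h * HASH_BASE + buf[i]; /* hash of the first window */
    for (i = SIG_WINDOW; ; i++) {
        keep_min(sig->lo, &sig->nlo, h);  /* bottom hashes */
        keep_min(sig->hi, &sig->nhi, ~h); /* smallest ~h == largest h */
        if (i >= len)
            break;
        /* Roll: drop buf[i - SIG_WINDOW], append buf[i]. */
        h = (h - buf[i - SIG_WINDOW] * pow) * HASH_BASE + buf[i];
    }
}

/* Hypothetical score: percentage of sampled hashes the signatures share,
 * or -1 when neither input was long enough to be sampled. */
int sig_compare(const sample_sig *a, const sample_sig *b)
{
    size_t common, total;
    assert(a != NULL && b != NULL);
    total = (a->nlo > b->nlo ? a->nlo : b->nlo) +
            (a->nhi > b->nhi ? a->nhi : b->nhi);
    if (total == 0)
        return -1;
    common = count_common(a->lo, a->nlo, b->lo, b->nlo) +
             count_common(a->hi, a->nhi, b->hi, b->nhi);
    return (int)(100 * common / total);
}
```

Whatever the input size, the signature stays at 2 × `SIG_KEEP` hashes, which is the property the commit message is after. Inserting into a sorted array costs O(`SIG_KEEP`) per window; a binary heap would make each update O(log `SIG_KEEP`) at the cost of a little more code.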
src/hashsig.c (new file, mode 100644)
src/hashsig.h (new file, mode 100644)