1. 20 Feb, 2013 2 commits
    • Refine pluggable similarity API · 9bc8be3d
      This plugs in the three basic similarity strategies for handling
      whitespace via internal use of the pluggable API.  In so doing, I
      realized that the use of git_buf in the hashsig API was not needed
      and actually just made it harder to use, so I tweaked that API as
      well.
      
      Note that the similarity metric is still not hooked up in the
      find_similarity code - this is just setting out the function that
      will be used.
      Russell Belfer committed
    • Change similarity metric to sampled hashes · 5e5848eb
      This moves the similarity metric code out of buf_text and into a
      new file.  Also, this implements a different approach to similarity
      measurement based on a Rabin-Karp rolling hash where we only keep
      the top 100 and bottom 100 hashes.  In theory, that should be
      sufficient samples to give a fairly accurate measurement while
      limiting the amount of data we keep for file signatures no matter
      how large the file is.
      Russell Belfer committed
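
The sampled-hash idea described in 5e5848eb can be sketched roughly as below. This is an illustrative sketch only, not the code from the commit: the names `sketch_sig`, `sketch_compute`, and `sketch_similarity`, the 4-byte window, the base 257, and the 0-100 scoring formula are all assumptions made for the example. Only the "keep the top 100 and bottom 100 hashes" bound comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_MAX 100u  /* hashes kept at each end, per the commit message */
#define WINDOW     4u    /* assumed rolling-window width in bytes */
#define HASH_BASE  257u  /* assumed polynomial base */

/* Sorted (ascending), duplicate-free, bounded set of hash values. */
typedef struct {
    uint32_t v[SAMPLE_MAX];
    size_t n;
    int keep_largest;    /* 0: keep the smallest values, 1: keep the largest */
} sample_set;

/* Hypothetical signature: the smallest and largest hashes seen in a buffer. */
typedef struct { sample_set lo, hi; } sketch_sig;

static void sample_add(sample_set *s, uint32_t h)
{
    size_t pos = 0;
    while (pos < s->n && s->v[pos] < h) pos++;
    if (pos < s->n && s->v[pos] == h) return;        /* already present */
    if (s->n < SAMPLE_MAX) {                         /* room left: plain insert */
        memmove(&s->v[pos + 1], &s->v[pos], (s->n - pos) * sizeof(uint32_t));
        s->v[pos] = h;
        s->n++;
    } else if (!s->keep_largest) {
        if (pos == SAMPLE_MAX) return;               /* larger than all kept */
        memmove(&s->v[pos + 1], &s->v[pos],          /* evict current maximum */
                (SAMPLE_MAX - 1 - pos) * sizeof(uint32_t));
        s->v[pos] = h;
    } else {
        if (pos == 0) return;                        /* smaller than all kept */
        memmove(&s->v[0], &s->v[1], (pos - 1) * sizeof(uint32_t));
        s->v[pos - 1] = h;                           /* evicted current minimum */
    }
}

/* Slide a Rabin-Karp style hash over data; memory use is fixed by SAMPLE_MAX. */
void sketch_compute(sketch_sig *sig, const unsigned char *data, size_t len)
{
    uint32_t h = 0, top = 1;
    size_t i;

    memset(sig, 0, sizeof(*sig));
    sig->hi.keep_largest = 1;
    if (len < WINDOW) return;                        /* too short to sample */

    for (i = 0; i + 1 < WINDOW; i++) top *= HASH_BASE;   /* HASH_BASE^(WINDOW-1) */
    for (i = 0; i < WINDOW; i++) h = h * HASH_BASE + data[i];
    sample_add(&sig->lo, h);
    sample_add(&sig->hi, h);

    for (i = WINDOW; i < len; i++) {
        /* drop the outgoing byte, shift, add the incoming byte (mod 2^32) */
        h = (h - data[i - WINDOW] * top) * HASH_BASE + data[i];
        sample_add(&sig->lo, h);
        sample_add(&sig->hi, h);
    }
}

/* Count values present in both sorted sets. */
static size_t sample_common(const sample_set *a, const sample_set *b)
{
    size_t i = 0, j = 0, c = 0;
    while (i < a->n && j < b->n) {
        if (a->v[i] < b->v[j]) i++;
        else if (a->v[i] > b->v[j]) j++;
        else { c++; i++; j++; }
    }
    return c;
}

/* Assumed scoring: 0 (no shared samples) to 100 (identical sample sets). */
int sketch_similarity(const sketch_sig *a, const sketch_sig *b)
{
    size_t total = a->lo.n + a->hi.n + b->lo.n + b->hi.n;
    size_t common = sample_common(&a->lo, &b->lo) + sample_common(&a->hi, &b->hi);
    return total ? (int)(200 * common / total) : 0;
}
```

Because each signature holds at most 200 hash values, comparing two files costs the same regardless of their size. The trade-off is that the score is an estimate drawn from the extremes of the hash distribution rather than an exact measure of shared content.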