  1. 31 Jul, 2013 1 commit
    • Major rename detection changes · d730d3f4
      After doing further profiling, I found that a lot of time was
      being spent inserting hashes into the file hash signature when
      using the rolling hash.  The rolling hash approach generates a
      hash per byte of the file instead of one per run/line of data.
      
      To optimize this, I converted back to a run-based file signature
      algorithm, which is more like core Git.
      
      After changing this, a number of the existing tests started to
      fail.  In some cases, this appears to have been because the test
      was coded too specifically to the particular results of the file
      similarity metric.  In other cases, there appear to have been
      bugs in the core rename detection code, where the expected
      results were being generated only by a coincidence of the file
      similarity scoring.
      
      This commit also renames all the variables in the core rename
      detection code to be more consistent and hopefully easier to
      follow, which made it a bit easier to reason about the behavior
      of that code and fix the problems that I was seeing.  I think
      it's in better shape now.
      
      There are now a couple of tests that attempt to stress test the
      rename detection code, and they are quite slow.  Most of the time
      is spent setting up the test data on disk and in the index.  When
      we roll out performance improvements for index insertion, I hope
      that will also speed up these tests.
      Russell Belfer committed
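
      The difference between the two signature styles described in
      this commit can be sketched as follows.  This is only an
      illustration, not the library's code: the function names are
      hypothetical, zlib.crc32 stands in for whatever hash the real
      code uses, a "run" is assumed to be a newline-terminated line,
      and the per-byte version hashes a fixed window at every offset
      rather than updating a true rolling hash.

```python
import zlib


def per_byte_signature(data: bytes, window: int = 4) -> set:
    """Hash a window starting at every byte offset (roughly one hash per byte)."""
    return {zlib.crc32(data[i:i + window]) for i in range(len(data) - window + 1)}


def run_signature(data: bytes) -> set:
    """Hash each run of data once; a run is assumed to be one line here."""
    return {zlib.crc32(line) for line in data.splitlines(keepends=True)}


def similarity(a: set, b: set) -> float:
    """Shared hashes divided by all distinct hashes (Jaccard index)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


old = b"a\nb\nc\nd\n"
new = b"a\nb\nc\ne\n"

# The run-based signature inserts one hash per line, so it grows with the
# number of lines rather than with the number of bytes.
print(len(per_byte_signature(old)), len(run_signature(old)))
print(similarity(run_signature(old), run_signature(new)))
```

      In this toy input the two files share three of five distinct
      lines, so the run-based similarity is 0.6.  On realistic files
      the per-byte signature holds far more entries than the run-based
      one, which is where the insertion cost came from.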
  2. 26 Jul, 2013 1 commit
  3. 25 Jul, 2013 3 commits
    • Fix rename detection to use actual blob size · a16e4172
      The size data in the index may not reflect the actual size of the
      blob data from the ODB when content filtering comes into play.
      This commit fixes rename detection to use the actual blob size when
      calculating data signatures instead of the value from the index.
      
      Because of a misunderstanding on my part, I first converted the
      git_index_add_bypath API to use the post-filtered blob data size
      in creating the index entry.  I backed that change out, but I
      kept the overall refactoring of that routine and the new internal
      git_blob__create_from_paths API because it eliminates an extra
      stat() call from the code that adds a file to the index.
      
      The existing tests actually cover this code path, at least when
      running on Windows, so at this point I'm not adding new tests to
      cover the changes.
      Russell Belfer committed
    • Make rename detection file size fix better · effdbeb3
      The previous fix for checking file sizes with rename detection
      always loaded the blob.  In this version, if the ODB backend can
      get the object header without loading the whole object into
      memory, then we just use that, so that we can eliminate possible
      rename sources and targets without loading them.
      Russell Belfer committed
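
      A minimal sketch of the idea, under stated assumptions: FakeOdb,
      read_header, read, and the size ratio cutoff below are all
      hypothetical and chosen for illustration.  The commit message
      only says that sizes from the object header are used to rule out
      candidates; it does not give the exact rule.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOdb:
    """Hypothetical object store that can report a size without loading data."""
    objects: dict
    loads: list = field(default_factory=list)

    def read_header(self, oid: str) -> int:
        # Cheap path: return only the size, never touch the content.
        return len(self.objects[oid])

    def read(self, oid: str) -> bytes:
        # Expensive path: record that the whole object was loaded.
        self.loads.append(oid)
        return self.objects[oid]


def sizes_compatible(a: int, b: int, max_ratio: int = 8) -> bool:
    """Assumed rule: files whose sizes differ too much cannot be renames."""
    if a == 0 or b == 0:
        return a == b
    return max(a, b) <= max_ratio * min(a, b)


def candidate_pairs(odb: FakeOdb, sources: list, targets: list) -> list:
    """Keep only source/target pairs worth comparing, using headers alone."""
    return [
        (s, t)
        for s in sources
        for t in targets
        if sizes_compatible(odb.read_header(s), odb.read_header(t))
    ]


odb = FakeOdb({"small": b"x" * 10, "mid": b"x" * 12, "big": b"x" * 1000})
print(candidate_pairs(odb, ["small"], ["big", "mid"]))
print(odb.loads)  # stays empty: no blob content was read
```

      The point of the sketch is that the filtering step never calls
      read(), so blobs that are ruled out by size are never loaded.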
    • Fix rename detection for tree-to-tree diffs · a5140f4d
      The performance improvements I introduced for rename detection
      did not work for tree-to-tree diffs because the blob size was
      not known early enough, so the file signature always had to be
      calculated anyway.
      
      This change separates loading blobs into memory from calculating
      the signature.  I can't avoid having to load the large blobs into
      memory, but by moving it forward, I'm able to avoid the signature
      calculation if the blob won't come into play for renames.
      Russell Belfer committed
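
      The separation described here can be sketched as a candidate
      object with two distinct steps: load() and signature().  This is
      an assumption-laden illustration, not the library's code.  The
      Candidate class, best_match, the size ratio cutoff, and the use
      of zlib.crc32 over lines are all hypothetical.

```python
import zlib


class Candidate:
    """A possible rename source or target with lazy, separate steps."""

    def __init__(self, path, read_blob):
        self.path = path
        self._read_blob = read_blob   # callable that returns the blob bytes
        self._data = None
        self._signature = None
        self.signature_calls = 0      # counts real signature computations

    def load(self) -> int:
        """Load the blob if needed and return its actual size."""
        if self._data is None:
            self._data = self._read_blob()
        return len(self._data)

    def signature(self) -> frozenset:
        """Compute the signature only on first use, then reuse it."""
        if self._signature is None:
            self.load()
            self.signature_calls += 1
            self._signature = frozenset(
                zlib.crc32(line) for line in self._data.splitlines(keepends=True)
            )
        return self._signature


def best_match(source, targets, max_ratio=8):
    """Load sizes first; compute signatures only for plausible pairs."""
    src_size = source.load()
    best, best_score = None, 0.0
    for target in targets:
        tgt_size = target.load()
        smaller, larger = sorted((src_size, tgt_size))
        if smaller == 0 or larger > max_ratio * smaller:
            continue  # ruled out by size, so no signature is calculated
        a, b = source.signature(), target.signature()
        score = len(a & b) / len(a | b)
        if score > best_score:
            best, best_score = target, score
    return best, best_score


src = Candidate("old.txt", lambda: b"a\nb\nc\nd\n")
near = Candidate("new.txt", lambda: b"a\nb\nc\ne\n")
huge = Candidate("huge.bin", lambda: b"z\n" * 1000)
match, score = best_match(src, [huge, near])
print(match.path, score, huge.signature_calls)
```

      As in the commit message, the large blob is still loaded, but
      its signature is never computed because the size check rules it
      out first.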
  4. 24 Jul, 2013 6 commits
  5. 23 Jul, 2013 4 commits
  6. 22 Jul, 2013 4 commits
  7. 19 Jul, 2013 7 commits
  8. 18 Jul, 2013 3 commits
  9. 17 Jul, 2013 3 commits
  10. 16 Jul, 2013 2 commits
  11. 15 Jul, 2013 6 commits