  1. 12 May, 2020 1 commit
  2. 01 Apr, 2020 1 commit
    • merge: cache negative cache results for similarity metrics · 4dfcc50f
      When computing renames, we cache the hash signatures for each of the
      potentially conflicting entries so that we do not need to repeatedly
      read the file and can at least halfway efficiently determine whether two
      files are similar enough to be deemed a rename. In order to make the
      hash signatures meaningful, we require at least four lines of data to be
      present, resulting in at least four different hashes that can be
      compared. Files that are deemed too small are not cached at all and
      will thus be repeatedly re-hashed, which is usually not a huge issue.
      
      The issue with the above heuristic arises when a file does _not_ have at
      least four lines, where a line is anything separated by a consecutive
      run of "\n" or "\0" characters. For example "a\nb" is two lines, but
      "a\0\0b" is also just two lines. Taken to the extreme, a file that
      consists of megabytes of consecutive space or NUL characters may also
      be deemed too small and thus not get cached. As a result, we will
      repeatedly load its blob and calculate its hash signature, only to
      throw it away once we notice it is of no value. When you have a
      comparatively big file that you compare against a big set of
      potentially renamed files, the cost simply explodes.
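
      The line-counting rule described above can be sketched as follows.
      This is a minimal illustration, not libgit2's actual hash signature
      code: the helper name `count_lines` is hypothetical, and the sketch
      only assumes that a line is a maximal run of bytes that are neither
      "\n" nor "\0".

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libgit2): count "lines", where a line
 * is a maximal run of bytes that are neither '\n' nor '\0'. Consecutive
 * separators therefore never produce empty lines.
 */
static size_t count_lines(const char *buf, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;

	for (i = 0; i < len; i++) {
		int is_sep = (buf[i] == '\n' || buf[i] == '\0');

		/* A new line starts at the first non-separator byte after a separator. */
		if (!is_sep && !in_line)
			lines++;
		in_line = !is_sep;
	}

	return lines;
}
```

      Under this rule a buffer consisting only of NUL bytes has zero lines
      regardless of its size, so it always falls below the four-line
      minimum.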
      
      The issue can be trivially fixed by introducing negative cache entries.
      Whenever we determine that a given blob does not have a meaningful
      representation via a hash signature, we store this negative cache marker
      and will from then on not hash it again, but also ignore it as a
      potential rename target. This should help the "normal" case already
      where you have a lot of small files as rename candidates, but in the
      above scenario its savings are extraordinarily high.
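
      The idea of a negative cache entry can be sketched as a three-state
      cache slot. This is an illustrative sketch under stated assumptions,
      not the code from this commit: `sig_cache_entry`, `compute_signature`
      and `cached_signature` are hypothetical names, and the toy hash merely
      stands in for the real (expensive) signature computation.

```c
#include <stddef.h>

/* Hypothetical cache states; SIG_TOO_SMALL is the negative cache marker. */
enum sig_state { SIG_UNCOMPUTED = 0, SIG_PRESENT, SIG_TOO_SMALL };

struct sig_cache_entry {
	enum sig_state state;
	unsigned long signature;
};

/* Counts how often the expensive computation actually runs. */
static unsigned long compute_calls;

/*
 * Stand-in for the expensive signature computation. Returns 0 on success
 * and -1 if the buffer has fewer than four lines (runs of bytes that are
 * neither '\n' nor '\0').
 */
static int compute_signature(unsigned long *out, const char *buf, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;
	unsigned long h = 5381;

	compute_calls++;

	for (i = 0; i < len; i++) {
		int is_sep = (buf[i] == '\n' || buf[i] == '\0');

		if (!is_sep && !in_line)
			lines++;
		in_line = !is_sep;
		h = h * 33 + (unsigned char)buf[i];
	}

	if (lines < 4)
		return -1;

	*out = h;
	return 0;
}

/*
 * Look up the signature through the cache. A failed computation is
 * remembered as SIG_TOO_SMALL, so the buffer is never hashed again and
 * the caller can skip it as a rename candidate.
 */
static int cached_signature(struct sig_cache_entry *e, const char *buf,
	size_t len, unsigned long *out)
{
	if (e->state == SIG_UNCOMPUTED) {
		if (compute_signature(&e->signature, buf, len) == 0)
			e->state = SIG_PRESENT;
		else
			e->state = SIG_TOO_SMALL;
	}

	if (e->state == SIG_TOO_SMALL)
		return -1;

	*out = e->signature;
	return 0;
}
```

      With such a slot, repeated lookups for a NUL-only blob trigger the
      expensive computation once instead of once per comparison.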
      
      To verify that we no longer hit the issue with the described solution, this
      commit adds a test that uses the exact same setup described above with
      one 50 megabyte blob of '\0' characters and 1000 other files that get
      renamed. Without the negative cache:
      
      $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
      real    11m48.377s
      user    11m11.576s
      sys     0m35.187s
      
      And with the negative cache:
      
      $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
      real    0m1.972s
      user    0m1.851s
      sys     0m0.118s
      
      So this represents a ~350-fold performance improvement, but it obviously
      depends on how many files you have and how big the blob is. The test
      numbers were chosen so that one will notice immediately if the bug
      resurfaces.
      Patrick Steinhardt committed
  3. 18 Jan, 2020 1 commit
  4. 20 Jul, 2019 1 commit
  5. 15 Jun, 2019 1 commit
  6. 13 Jun, 2019 1 commit
  7. 10 Jun, 2019 5 commits
  8. 14 Dec, 2018 1 commit
  9. 01 Dec, 2018 1 commit
  10. 19 Oct, 2018 1 commit
  11. 13 Jul, 2018 1 commit
    • treewide: remove use of C++ style comments · 9994cd3f
      C++ style comments ("//") are not specified by the ISO C90 standard and
      thus do not conform to it. While libgit2 aims to conform to C90, we did
      not enforce it until now, which is why quite a lot of these
      non-conforming comments have snuck into our codebase. Do a tree-wide
      conversion of all C++ style comments to the supported C style comments
      to allow us to enforce strict C90 compliance in a later commit.
      Patrick Steinhardt committed
  12. 06 Jul, 2018 1 commit
  13. 10 Jun, 2018 1 commit
  14. 04 Feb, 2018 3 commits
  15. 21 Jan, 2018 2 commits
  16. 04 Dec, 2017 1 commit
  17. 11 Nov, 2017 1 commit
  18. 09 Feb, 2017 1 commit
  19. 01 Jan, 2017 1 commit
  20. 26 May, 2016 1 commit
  21. 17 Mar, 2016 8 commits
  22. 07 Mar, 2016 1 commit
  23. 12 Feb, 2016 1 commit
  24. 11 Feb, 2016 1 commit
  25. 25 Nov, 2015 2 commits
    • merge: handle conflicts in recursive base building · 78859c63
      When building a recursive merge base, allow conflicts to occur.
      Use the file (with conflict markers) as the common ancestor.
      
      The user has already seen and dealt with this conflict by virtue
      of having a criss-cross merge.  If they resolved this conflict
      identically in both branches, then there will be no conflict in the
      result.  This is the best case scenario.
      
      If they did not resolve the conflict identically in the two branches,
      then we will generate a new conflict.  If the user is simply using
      standard conflict output then the results will be fairly sensible.
      But if the user is using a mergetool or using diff3 output, then the
      common ancestor will be a conflict file (itself with diff3 output,
      haha!).  This is quite terrible, but it matches git's behavior.
      Edward Thomson committed