Commits · 322c15ee858622f2e3def514d3e7e1b47023950e · lvzhengyang / git2

12 May, 2020 1 commit

tests: merge: fix printf formatter on 32 bit arches · 0cf9b666

We currently use `PRIuMAX` to print an integer of type `size_t` in
merge::trees::renames::cache_recomputation. While this works just fine on
64 bit arches, it doesn't on 32 bit ones. As a result, our nightly
builds on x86 and arm32 fail.

Fix the issue by using `PRIuZ` instead.
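
As a rough illustration of the mismatch: `PRIuMAX` tells `printf` to expect a `uintmax_t`, which is at least 64 bits wide, whereas `size_t` is typically 32 bits on 32 bit arches, so passing a bare `size_t` is undefined behaviour there. The sketch below shows the two portable options. `SKETCH_PRIuZ` is a hypothetical stand-in for the project's `PRIuZ` macro and assumes a C99 `printf` that understands `%zu`; the function names are illustrative only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for PRIuZ; assumes a C99 printf with "%zu". */
#define SKETCH_PRIuZ "zu"

/* Format a size_t with the length modifier that matches its type. */
static int format_size(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "%" SKETCH_PRIuZ, value);
}

/* Alternative: keep PRIuMAX, but widen the argument explicitly. */
static int format_size_widened(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "%" PRIuMAX, (uintmax_t)value);
}
```

Either variant keeps the format specifier and the argument type in agreement on both 32 bit and 64 bit targets.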

committed 4 years ago

01 Apr, 2020 1 commit

merge: cache negative cache results for similarity metrics · 4dfcc50f

When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with the above heuristic arises when a file does _not_ have at
least four lines, where lines are separated by consecutive runs of "\n"
or "\0" characters. For example, "a\nb" is two lines, but "a\0\0b" is
also just two lines. Taken to the extreme, a file that consists only of
megabytes of consecutive space or NUL characters may also be deemed too
small and thus not get cached. As a result, we will repeatedly load its
blob and calculate its hash signature, only to throw it away once we
notice it is not of any value. When you compare a comparatively big file
against a big set of potentially renamed files, the cost simply
explodes.

The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should already help the "normal" case
where you have a lot of small files as rename candidates, but in the
above scenario its savings are extraordinarily high.
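
The idea can be sketched as a three-state cache slot. This is a minimal illustration under stated assumptions, not the code from this commit: every name below (`sig_state`, `sig_cache_entry`, `compute_signature`, `cached_signature`) is hypothetical, and the toy hash only mimics the "at least four lines" rule described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot states; SIG_TOO_SMALL is the negative cache marker. */
enum sig_state { SIG_UNSET = 0, SIG_TOO_SMALL, SIG_READY };

struct sig_cache_entry {
	enum sig_state state;
	unsigned long signature; /* placeholder for a real hash signature */
};

/* Counts how often the expensive hashing step actually runs. */
static size_t hash_calls;

/*
 * Toy stand-in for the hash signature: lines are separated by runs of
 * '\n' or '\0'; fewer than four lines yields no signature (-1).
 */
static int compute_signature(unsigned long *out, const char *data, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;
	unsigned long h = 5381;

	hash_calls++;
	for (i = 0; i < len; i++) {
		if (data[i] == '\n' || data[i] == '\0') {
			in_line = 0;
		} else {
			if (!in_line) {
				lines++;
				in_line = 1;
			}
			h = h * 33 + (unsigned char)data[i];
		}
	}
	if (lines < 4)
		return -1;
	*out = h;
	return 0;
}

/* Returns 0 and the signature, or -1 if the blob has no usable signature. */
static int cached_signature(struct sig_cache_entry *e, unsigned long *out,
	const char *data, size_t len)
{
	assert(e != NULL && out != NULL);
	if (e->state == SIG_UNSET)
		e->state = compute_signature(&e->signature, data, len) == 0 ?
			SIG_READY : SIG_TOO_SMALL;
	if (e->state == SIG_TOO_SMALL)
		return -1; /* negative hit: skip hashing and skip as rename target */
	*out = e->signature;
	return 0;
}
```

With a slot per blob, a NUL-only blob is hashed once, marked `SIG_TOO_SMALL`, and every later lookup returns immediately instead of re-reading and re-hashing the data.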

To verify that we do not hit the issue anymore with the described
solution, this commit adds a test that uses the exact setup described
above, with one 50 megabyte blob of '\0' characters and 1000 other files
that get renamed. Without the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real    11m48.377s
user    11m11.576s
sys     0m35.187s

And with the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real    0m1.972s
user    0m1.851s
sys     0m0.118s

So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
numbers were chosen such that one will immediately notice as soon as
the bug resurfaces.

committed 4 years ago

20 Jul, 2019 1 commit

fileops: rename to "futils.h" to match function signatures · e54343a4

Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".

Rename the files to match expectations.

committed 5 years ago

13 Jul, 2018 1 commit

treewide: remove use of C++ style comments · 9994cd3f

C++ style comments ("//") are not specified by the ISO C90 standard and
thus do not conform to it. While libgit2 aims to conform to C90, we did
not enforce it until now, which is why quite a lot of these
non-conforming comments have snuck into our codebase. Do a tree-wide
conversion of all C++ style comments to the supported C style comments
to allow us to enforce strict C90 compliance in a later commit.

committed 6 years ago

09 Feb, 2017 1 commit

merge_trees: introduce test for submodule renames · 49806e9b

Add a test that shows that submodules are incorrectly considered in
renames, which causes `git_merge_trees` to fail to look up the submodule
as a blob.

committed 7 years ago

01 Jan, 2017 1 commit

merge: set default rename threshold · 19ed4d0c

When `GIT_MERGE_FIND_RENAMES` is set, provide a default for
`rename_threshold` when it is unset.

Edward Thomson committed 8 years ago

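
A minimal sketch of such defaulting logic follows. The struct, the flag constant and the function are hypothetical mirrors written for illustration, not the library's definitions, and the default of 50 (percent similarity) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real flag and option struct. */
#define SKETCH_FIND_RENAMES (1u << 0)
#define SKETCH_DEFAULT_RENAME_THRESHOLD 50u /* assumed default, in percent */

struct sketch_merge_opts {
	unsigned int flags;
	unsigned int rename_threshold; /* 0 means "unset" */
};

/* Fill in the threshold only if rename detection is on and no value was given. */
static void sketch_normalize_opts(struct sketch_merge_opts *opts)
{
	assert(opts != NULL);
	if ((opts->flags & SKETCH_FIND_RENAMES) && opts->rename_threshold == 0)
		opts->rename_threshold = SKETCH_DEFAULT_RENAME_THRESHOLD;
}
```

A caller-supplied threshold is left untouched, and nothing is filled in when rename detection is disabled.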
20 Mar, 2014 1 commit

Update git_merge_tree_opts to git_merge_options · 5aa2ac6d

Edward Thomson committed 10 years ago

14 Nov, 2013 1 commit

Rename tests-clar to tests · 17820381

Ben Straub committed 11 years ago

15 May, 2013 1 commit

Fix trailing whitespaces · 1fed6b07

nulltoken committed 11 years ago

30 Apr, 2013 1 commit

renames! · 0462fba5

Edward Thomson committed 11 years ago