1. 01 Apr, 2020 1 commit
    • merge: cache negative cache results for similarity metrics · 4dfcc50f
      When computing renames, we cache the hash signatures for each of the
      potentially conflicting entries so that we do not need to repeatedly
      read the file and can determine reasonably efficiently whether two
      files are similar enough to be deemed a rename. In order to make the
      hash signatures meaningful, we require at least four lines of data to be
      present, resulting in at least four different hashes that can be
      compared. Files that are deemed too small are not cached at all and
      will thus be repeatedly re-hashed, which is usually not a huge issue.
      
      The issue with the above heuristic arises when a file does _not_ have
      at least four lines, where a line is anything separated by a
      consecutive run of "\n" or "\0" characters. For example, "a\nb" is two
      lines, but "a\0\0b" is also just two lines. Taken to the extreme, a
      file that consists of megabytes of consecutive newline- or NUL-only
      bytes may also be deemed too small and thus never get cached. As a
      result, we will repeatedly load its blob and calculate its hash
      signature, just to throw it away once we notice it is of no value.
      When you have a comparatively big file that you compare against a big
      set of potentially renamed files, the cost simply explodes.
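
      A rough sketch of that line-counting heuristic (illustrative only,
      not libgit2's actual hashsig code):

      /* Count "lines" the way the similarity signature sees them: any
       * consecutive run of '\n' or '\0' acts as a single separator. */
      static size_t count_lines(const char *data, size_t len)
      {
              size_t i = 0, lines = 0;

              while (i < len) {
                      /* skip a consecutive run of separators */
                      while (i < len && (data[i] == '\n' || data[i] == '\0'))
                              i++;
                      if (i == len)
                              break;
                      lines++;
                      /* consume this line's payload */
                      while (i < len && data[i] != '\n' && data[i] != '\0')
                              i++;
              }

              return lines;
      }

      With this, "a\nb" and "a\0\0b" both count two lines, while a blob of
      nothing but NUL bytes counts zero and never clears the four-line bar.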
      
      The issue can be trivially fixed by introducing negative cache entries.
      Whenever we determine that a given blob does not have a meaningful
      representation via a hash signature, we store this negative cache
      marker and from then on neither hash the blob again nor consider it as
      a potential rename target. This already helps the "normal" case, where
      you have a lot of small files as rename candidates, but in the above
      scenario the savings are extraordinarily high.
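
      A minimal sketch of that negative cache (the cache helpers and the
      sentinel name are illustrative, not libgit2's actual internals):

      /* Sentinel marking "hashed before, but too small for a signature". */
      static int too_small_marker;
      #define SIG_TOO_SMALL ((git_hashsig *)&too_small_marker)

      static int signature_for_blob(git_hashsig **out, cache *c, const blob *b)
      {
              git_hashsig *sig = cache_lookup(c, &b->id);

              if (sig == NULL) {
                      /* First lookup for this blob: hash it exactly once. */
                      if (compute_signature(&sig, b) < 0)
                              sig = SIG_TOO_SMALL;
                      cache_insert(c, &b->id, sig);
              }

              if (sig == SIG_TOO_SMALL)
                      return -1; /* negative hit: skip as rename candidate */

              *out = sig;
              return 0;
      }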
      
      To verify that we do not hit the issue anymore with the described
      solution, this
      commit adds a test that uses the exact same setup described above with
      one 50 megabyte blob of '\0' characters and 1000 other files that get
      renamed. Without the negative cache:
      
      $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
      real    11m48.377s
      user    11m11.576s
      sys     0m35.187s
      
      And with the negative cache:
      
      $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
      real    0m1.972s
      user    0m1.851s
      sys     0m0.118s
      
      So this represents a ~350-fold performance improvement, though it
      obviously depends on how many files you have and how big the blob is.
      The test numbers were chosen such that one will immediately notice if
      the bug ever resurfaces.
      Patrick Steinhardt committed
  2. 26 Mar, 2020 7 commits
  3. 23 Mar, 2020 3 commits
  4. 22 Mar, 2020 1 commit
  5. 21 Mar, 2020 1 commit
  6. 18 Mar, 2020 1 commit
  7. 17 Mar, 2020 1 commit
  8. 14 Mar, 2020 1 commit
    • cmake: use install directories provided via GNUInstallDirs · 87fc539f
      We currently hand-code logic to configure where to install our
      artifacts via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and
      `BIN_INSTALL_DIR` variables. This is reinventing the wheel, as CMake
      already provides a way to do that via the `CMAKE_INSTALL_<DIR>`
      variables, e.g. `CMAKE_INSTALL_LIBDIR`. The discrepancy requires users
      of libgit2 to know about it and forces special hacks onto any build
      system that handles these variables in an automated way. One such
      example is Gentoo Linux, which sets up these paths in both the cmake
      and cmake-utils eclasses.
      
      So let's stop doing that: the GNUInstallDirs module handles it in a
      better way for us, especially so as the actual values are dependent on
      CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and
      instead refers users to use the standard ones.
      
      As a second benefit, this commit also fixes our pkgconfig generation to
      use the GNUInstallDirs module. We had a bug there where we ignored
      CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys,
      so if the install libdir was set to "lib64", the generated libdir key
      would be the invalid relative path "lib64". With GNUInstallDirs, we can
      now use `CMAKE_INSTALL_FULL_LIBDIR`, which handles the prefix for us.
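
      A minimal sketch of the approach (not the full libgit2 build script):

      include(GNUInstallDirs) # defines CMAKE_INSTALL_LIBDIR, _BINDIR, ...

      install(TARGETS git2
              RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}"
              LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
              ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}")

      # The pkgconfig template can then use the _FULL_ variants, which
      # already incorporate CMAKE_INSTALL_PREFIX:
      #   libdir=@CMAKE_INSTALL_FULL_LIBDIR@
      #   includedir=@CMAKE_INSTALL_FULL_INCLUDEDIR@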
      Patrick Steinhardt committed
  9. 13 Mar, 2020 6 commits
  10. 10 Mar, 2020 4 commits
  11. 08 Mar, 2020 2 commits
  12. 06 Mar, 2020 2 commits
  13. 05 Mar, 2020 2 commits
    • Merge pull request #5432 from libgit2/ethomson/sslread · 76e45960
      httpclient: use a 16kb read buffer for macOS
      Patrick Steinhardt committed
    • httpclient: use a 16kb read buffer for macOS · 502e5d51
      Use a 16kb read buffer for compatibility with macOS SecureTransport.
      
      SecureTransport `SSLRead` has the following behavior:
      
      1. It will return _at most_ one TLS packet's worth of data, and
      2. It will try to give you as much data as you asked for.
      
      This means that if you call `SSLRead` with a buffer size that is smaller
      than what _it_ reads (in other words, the maximum size of a TLS packet),
      then it will buffer that data for subsequent calls.  However, it will
      also attempt to give you as much data as you requested in your SSLRead
      call.  This means that it will guarantee a network read in the event
      that it has buffered data.
      
      Consider our 8kb buffer and a server sending us 12kb of data on an HTTP
      Keep-Alive session.  Our first `SSLRead` will read the TLS packet off
      the network.  It will return us the 8kb that we requested and buffer the
      remaining 4kb.  Our second `SSLRead` call will see the 4kb that's
      buffered and decide that it could give us an additional 4kb.  So it will
      do a network read.
      
      But there's nothing left to read; that was the end of the data.  The
      HTTP server is waiting for us to provide a new request.  The server
      will eventually time out, our `read` system call will return, `SSLRead`
      can return to us, and we can make progress.
      
      While technically correct, this is wildly inefficient.  (Thanks, Tim
      Apple!)
      
      Moving to an internal buffer that is the maximum size of a TLS packet
      (16kb) ensures that `SSLRead` will never buffer and will always return
      everything that it read (albeit decrypted).
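
      A minimal sketch of the pattern (`stream_read` is a hypothetical
      wrapper, not libgit2's actual SecureTransport stream code):

      #include <Security/SecureTransport.h>

      /* Maximum TLS record payload: with a buffer at least this large,
       * SSLRead never has decrypted bytes left over to hold back. */
      #define READ_BLOCKSIZE (16 * 1024)

      /* `buf` must have room for READ_BLOCKSIZE bytes. */
      static int stream_read(SSLContextRef ctx, void *buf, size_t *out_len)
      {
              size_t processed = 0;
              OSStatus ret = SSLRead(ctx, buf, READ_BLOCKSIZE, &processed);

              if (ret != noErr && ret != errSSLWouldBlock)
                      return -1;

              *out_len = processed;
              return 0;
      }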
      Edward Thomson committed
  14. 03 Mar, 2020 1 commit
  15. 02 Mar, 2020 2 commits
  16. 01 Mar, 2020 3 commits
  17. 26 Feb, 2020 1 commit
    • deps: ntlmclient: fix htonll on big endian FreeBSD · c690136c
      In commit 3828ea67 (deps: ntlmclient: fix missing htonll symbols on
      FreeBSD and SunOS, 2020-02-21), we fixed compilation on BSDs that was
      broken due to missing `htonll` wrappers. While we now use `htobe64` on
      both Linux and OpenBSD, we decided to use `bswap64` on FreeBSD. That is
      correct on little-endian systems, where we swap from little- to
      big-endian, but on big-endian systems we perform the swap as well. As a
      result, we do not use network byte order on such systems.
      
      Fix the issue by using `htobe64` on FreeBSD as well.
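
      A short illustration of the difference, assuming FreeBSD's
      <sys/endian.h>:

      #include <sys/endian.h> /* FreeBSD: htobe64(), bswap64() */
      #include <stdint.h>

      uint64_t to_network_order(uint64_t host)
      {
              /* htobe64() swaps on little-endian hosts and is a no-op on
               * big-endian ones; bswap64() swaps unconditionally, which
               * is what broke big-endian FreeBSD. */
              return htobe64(host);
      }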
      Patrick Steinhardt committed
  18. 25 Feb, 2020 1 commit