- 03 Jun, 2020 1 commit
When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with the above heuristic arises when a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example, "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file that consists of megabytes of nothing but spaces or NUL bytes may also be deemed too small and thus never get cached. As a result, we will repeatedly load its blob and calculate its hash signature, just to finally throw it away as we notice it's not of any value. When you've got a comparatively big file that you compare against a big set of potentially renamed files, the cost simply explodes.

The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should already help the "normal" case where you have a lot of small files as rename candidates, but in the above scenario the savings are extraordinarily high.

To verify we do not hit the issue anymore with the described solution, this commit adds a test that uses the exact same setup described above, with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed. Without the negative cache:

    $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
    real    11m48.377s
    user    11m11.576s
    sys     0m35.187s

And with the negative cache:

    $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
    real    0m1.972s
    user    0m1.851s
    sys     0m0.118s

So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test numbers were chosen in a way that one will immediately notice as soon as the bug resurfaces.
Patrick Steinhardt committed
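The shape of the fix can be sketched as follows. This is a minimal illustration, not the actual libgit2 code: the cache-entry struct and helper function are invented for this note, while `git_hashsig_create()` is the real similarity-signature API from `git2/sys/hashsig.h`.

    #include <stdbool.h>
    #include <stddef.h>
    #include <git2/sys/hashsig.h>

    typedef struct {
        git_hashsig *sig;  /* NULL means "known to have no usable signature" */
        bool cached;       /* whether we already tried hashing this blob */
    } sig_cache_entry;

    /* Hash the blob at most once: both successful signatures and the
     * negative "too small to hash" result are remembered, so the blob's
     * contents never need to be loaded and hashed a second time. */
    static const git_hashsig *cached_signature(sig_cache_entry *entry,
                                               const char *data, size_t len)
    {
        if (!entry->cached) {
            if (git_hashsig_create(&entry->sig, data, len,
                                   GIT_HASHSIG_SMART_WHITESPACE) < 0)
                entry->sig = NULL;  /* negative entry, e.g. fewer than four lines */
            entry->cached = true;
        }
        return entry->sig;  /* NULL: skip this blob as a rename candidate */
    }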
- 01 Apr, 2020 1 commit
Release v1.0
Patrick Steinhardt committed
- 28 Mar, 2020 2 commits
Patrick Steinhardt committed
Patrick Steinhardt committed
- 26 Mar, 2020 7 commits
refdb_backend: improve callback documentation
Patrick Steinhardt committed
credentials: provide backcompat for opaque structs
Patrick Steinhardt committed
The credential structures are now opaque and defined in `sys/credential.h`. However, we should continue to provide them for backward compatibility, unless `GIT_DEPRECATED_HARD` is set.
Edward Thomson committed
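In practice this kind of shim boils down to a guarded re-export in the deprecation header; roughly (a simplified sketch, not the verbatim contents of `deprecated.h`):

    #ifndef GIT_DEPRECATED_HARD
    # include "git2/sys/credential.h"

    /* The legacy names stay available as aliases for the new types. */
    typedef git_credential git_cred;
    typedef git_credential_userpass_plaintext git_cred_userpass_plaintext;
    #endif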
Fix segfault when calling git_blame_buffer()
Edward Thomson committed
The callbacks are currently sparsely documented, making it really hard to implement a new backend without taking a look at the existing refdb_fs backend. Add documentation to make this task hopefully easier to achieve.
Patrick Steinhardt committed
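To illustrate the kind of contract such documentation spells out, a backend callback can now be expected to read along these lines (paraphrased, not the exact header comment):

    /**
     * Queries the refdb backend for the reference named `ref_name`.
     *
     * A refdb implementation must provide this function.
     *
     * @return 0 on success, GIT_ENOTFOUND if no such reference exists,
     *         or another error code.
     */
    int (*lookup)(git_reference **out,
                  git_refdb_backend *backend,
                  const char *ref_name);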
Fix spelling error
Patrick Steinhardt committed
Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>
Utkarsh Gupta committed
- 23 Mar, 2020 3 commits
This change makes sure that the hunk is not NULL before trying to dereference it. This avoids segfaults, especially when blaming against a modified buffer (i.e. the index). Fixes: #5443
lhchavez committed
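The caller-side shape of the guard looks roughly like this (a sketch against the public API; `print_line_origin` is a made-up helper):

    #include <stdio.h>
    #include <git2.h>

    static int print_line_origin(git_blame *blame, size_t lineno)
    {
        const git_blame_hunk *hunk = git_blame_get_hunk_byline(blame, lineno);

        /* A modified buffer can contain lines no hunk covers; without
         * this check, dereferencing `hunk` below would segfault. */
        if (hunk == NULL)
            return -1;

        printf("line %zu: %s\n", lineno,
               git_oid_tostr_s(&hunk->final_commit_id));
        return 0;
    }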
refdb_fs: initialize backend version
Edward Thomson committed
repository: improve commondir docs
Edward Thomson committed
- 22 Mar, 2020 1 commit
While the `git_refdb_backend` struct has a version field, we do not initialize it correctly when calling `git_refdb_backend_fs()`. Fix this by adding the call to `git_refdb_init_backend()`.
Patrick Steinhardt committed
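Schematically, the fix amounts to running the standard init helper before handing out the struct (simplified; `setup_backend` is illustrative):

    #include <git2/sys/refdb_backend.h>

    static int setup_backend(git_refdb_backend *backend)
    {
        /* Sets backend->version to GIT_REFDB_BACKEND_VERSION. */
        if (git_refdb_init_backend(backend, GIT_REFDB_BACKEND_VERSION) < 0)
            return -1;

        /* ... assign the lookup/iterator/write/... callbacks here ... */
        return 0;
    }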
- 21 Mar, 2020 1 commit
cmake: use install directories provided via GNUInstallDirs
Edward Thomson committed
- 18 Mar, 2020 1 commit
azure: fix errors due to curl and removal of old VM images
Edward Thomson committed
- 17 Mar, 2020 1 commit
mbedTLS has fixed their certificate. 🎉
Edward Thomson committed
- 14 Mar, 2020 1 commit
We currently hand-code logic to configure where to install our artifacts via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and `BIN_INSTALL_DIR` variables. This is reinventing the wheel, as CMake already provides a way to do that via `CMAKE_INSTALL_<DIR>` paths, e.g. `CMAKE_INSTALL_LIBDIR`. This requires users of libgit2 to know about the discrepancy and will require special hacks for any build systems that handle these variables in an automated way. One such example is Gentoo Linux, which sets up these paths in both the cmake and cmake-utils eclasses.

So let's stop doing that: the GNUInstallDirs module handles it in a better way for us, especially so as the actual values are dependent on CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and instead refers users to the standard ones.

As a second benefit, this commit also fixes our pkgconfig generation to use the GNUInstallDirs module. We had a bug there where we ignored the CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys, so if libdir was set to "lib64", the resulting libdir key would be an invalid relative path. With GNUInstallDirs, we can now use `CMAKE_INSTALL_FULL_LIBDIR`, which handles the prefix for us.
Patrick Steinhardt committed
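The resulting setup looks roughly like this (a CMake sketch of the approach, not the project's exact build files):

    # GNUInstallDirs derives CMAKE_INSTALL_LIBDIR and friends from
    # CMAKE_INSTALL_PREFIX, replacing the hand-rolled *_INSTALL_DIR variables.
    include(GNUInstallDirs)

    install(TARGETS git2
            RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}"
            LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
            ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}")

    # The FULL variant already includes the prefix, which fixes the
    # pkgconfig keys for cases like libdir being set to "lib64".
    set(PKGCONFIG_LIBDIR "${CMAKE_INSTALL_FULL_LIBDIR}")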
- 13 Mar, 2020 6 commits
We currently have some problems with our curl downloads when building Docker images. It's not quite obvious what the problem is, and the failures seem to occur semi-randomly. To unblock our CI, let's add the "--insecure" flag to curl so that it ignores any certificate errors. This is intended as a temporary solution only.
Patrick Steinhardt committed
Azure is phasing out old images on March 23rd, 2020, but we're currently still using them. So let's upgrade images as follows:

- Ubuntu 16.04 -> ubuntu-18.04
- macOS 10.13 -> macOS-10.15
- Hosted Windows machines -> vs2017-win2016

Each of them is currently the latest available version. As the new Microsoft Windows machine has upgraded to MSVS2017, we need to also adjust our CMake generators to "Visual Studio 15 2017". As this CMake generator doesn't accept the target platform name anymore, we instead need to set it up via either "-A Win32" or "-A x64".

[1]: https://devblogs.microsoft.com/devops/removing-older-images-in-azure-pipelines-hosted-pools/
Patrick Steinhardt committed
Our Docurium builds currently depend on Debian Jessie, which only has CMake v3.0 available. As Rugged has bumped its CMake requirements to at least v3.5 now, the documentation build is thus failing. Fix this by converting our Docurium Docker image to be based on Ubuntu Bionic. We already base all of our images on Ubuntu, so I don't see any sense in using Debian here. If this was only to speed up builds, we should just go all the way and use some minimal container like Alpine anyway.

Also remove the cache busters. As we're rebuilding the image every time, we really don't need them at all.
Patrick Steinhardt committed
We currently pass the "--silent" flag to most invocations of curl, but this does not only suppress the progress meter: it also suppresses error messages. So let's pass "--show-error" as well.
Patrick Steinhardt committed
The `CC_MD4()` function has been deprecated in macOS 10.15. Silence this warning for now until we implement a proper fix.
Patrick Steinhardt committed
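The interim silencing presumably looks like this (a sketch; the wrapper function is invented, while `CC_MD4()` and the diagnostic pragmas are real):

    #include <CommonCrypto/CommonDigest.h>

    /* CC_MD4 is deprecated as of macOS 10.15; suppress the warning
     * locally until the call site can be replaced properly. */
    static void md4_digest(unsigned char out[CC_MD4_DIGEST_LENGTH],
                           const void *data, CC_LONG len)
    {
    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wdeprecated-declarations"
        CC_MD4(data, len, out);
    #pragma clang diagnostic pop
    }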
The Secure Transport interface we're currently using has been deprecated with macOS 10.15. As we're currently in code freeze, we cannot migrate to newer interfaces. As such, let's disable deprecation warnings for our "stransport.c" stream.
Patrick Steinhardt committed
- 10 Mar, 2020 4 commits
win32: don't canonicalize relative paths
Edward Thomson committed
Ensure that we don't canonicalize symlink targets.
Patrick Steinhardt committed
Don't canonicalize symlink targets; our win32 path canonicalization routines expect an absolute path. In particular, stop using the path canonicalization routines for symlink targets (a use introduced in commit 7d55bee6, "win32: fix relative symlinks pointing into dirs", 2020-01-10). Instead, use the utf8 -> utf16 relative path handling functions, so that paths like "../foo" will be translated to "..\foo".
Edward Thomson committed
Add a function that takes a (possibly) relative UTF-8 path and emits a UTF-16 path with forward slashes translated to backslashes. If the given path is, in fact, absolute, it will be translated according to the absolute path handling rules.
Edward Thomson committed
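A toy model of the described translation (deliberately simplified: it skips real UTF-8 decoding and bounds checking, which the actual helper must handle):

    #include <wchar.h>

    /* Translate path separators without canonicalizing, so a relative
     * target like "../foo" comes out as L"..\\foo" with the ".." intact. */
    static void relative_utf8_to_utf16(wchar_t *dest, const char *src)
    {
        for (; *src; src++, dest++)
            *dest = (*src == '/') ? L'\\' : (wchar_t)*src;
        *dest = L'\0';
    }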
- 08 Mar, 2020 2 commits
The path canonicalization functions on win32 are intended to canonicalize absolute paths, meaning those with prefixes: paths that start with drive letters (`C:\`), share names (`\\server\share`), or other prefixes (`\\?\`). This function removes leading `..` components that occur after the prefix but before the directory/file portion (e.g., turning `C:\..\..\..\foo` into `C:\foo`). This translation is not appropriate for relative paths.
Edward Thomson committed
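The behavior can be demonstrated with a toy model (illustrative only; the real routines operate on UTF-16 and understand all of the prefix forms above):

    #include <stdio.h>
    #include <string.h>

    /* Strip ".." components that occur between the prefix ("C:\") and the
     * file portion. Note how this only makes sense when a prefix exists:
     * applied to a relative path, it would eat real path components. */
    static void toy_canonicalize(char *path)
    {
        char *body = path + 3;  /* skip a drive-letter prefix like "C:\" */

        while (strncmp(body, "..\\", 3) == 0)
            memmove(body, body + 3, strlen(body + 3) + 1);
    }

    int main(void)
    {
        char path[] = "C:\\..\\..\\..\\foo";
        toy_canonicalize(path);
        printf("%s\n", path);  /* prints "C:\foo" */
        return 0;
    }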
Fixes #5428
Josh Bleecher Snyder committed
- 06 Mar, 2020 2 commits
CMake booleans
Edward Thomson committed
Set proper pkg-config dependency for pcre2
Edward Thomson committed
- 05 Mar, 2020 2 commits
httpclient: use a 16kb read buffer for macOS
Patrick Steinhardt committed
Use a 16kb read buffer for compatibility with macOS SecureTransport.

SecureTransport `SSLRead` has the following behavior:

1. It will return _at most_ one TLS packet's worth of data, and
2. It will try to give you as much data as you asked for.

This means that if you call `SSLRead` with a buffer size that is smaller than what _it_ reads (in other words, the maximum size of a TLS packet), then it will buffer that data for subsequent calls. However, it will also attempt to give you as much data as you requested in your `SSLRead` call. This means that it will guarantee a network read in the event that it has buffered data.

Consider our 8kb buffer and a server sending us 12kb of data on an HTTP Keep-Alive session. Our first `SSLRead` will read the TLS packet off the network. It will return us the 8kb that we requested and buffer the remaining 4kb. Our second `SSLRead` call will see the 4kb that's buffered and decide that it could give us an additional 4kb. So it will do a network read. But there's nothing left to read; that was the end of the data. The HTTP server is waiting for us to provide a new request. The server will eventually time out, our `read` system call will return, `SSLRead` can return back to us and we can make progress.

While technically correct, this is wildly inefficient. (Thanks, Tim Apple!)

Moving us to use an internal buffer that is the maximum size of a TLS packet (16kb) ensures that `SSLRead` will never buffer and it will always return everything that it read (albeit decrypted).
Edward Thomson committed
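Schematically, the read path now looks like this (a sketch; the real buffer management around the call is more involved):

    #include <Security/SecureTransport.h>

    /* 16kb is the maximum TLS record size: with a buffer this large,
     * SSLRead can hand back everything it decrypted in a single call and
     * never holds residual data that would provoke a speculative,
     * blocking network read on the next call. */
    #define READ_BUFFER_SIZE (16 * 1024)

    static OSStatus tls_read(SSLContextRef ctx,
                             char buf[READ_BUFFER_SIZE],
                             size_t *bytes_read)
    {
        return SSLRead(ctx, buf, READ_BUFFER_SIZE, bytes_read);
    }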
- 03 Mar, 2020 1 commit
Signed-off-by: Igor Raits <i.gnatenko.brain@gmail.com>
Igor Gnatenko committed
- 02 Mar, 2020 2 commits
ci: provide globalsign certs for bionic
Edward Thomson committed
tls.mbed.org has neglected to send their full certificate chain. Add their intermediate cert manually. 🙄
Edward Thomson committed
- 01 Mar, 2020 2 commits
deps: ntlmclient: fix htonll on big endian FreeBSD
Edward Thomson committed
azure-pipelines: download GlobalSign's certificate manually
Edward Thomson committed