- 03 Jan, 2018 1 commit
When generating a diff between two trees, we have to determine for each file to be diffed whether it shall be treated as text or as binary. While git has heuristics to determine which kind of diff to generate, users can also override that default behaviour by setting or unsetting the 'diff' attribute for specific files. Because of that, we have to query gitattributes in order to determine how to diff the current files. Instead of hitting the '.gitattributes' file every time we need to query an attribute, which can get expensive, especially on networked file systems, we try to cache them instead.

This works perfectly fine for every '.gitattributes' file that is found, but we hit cache invalidation problems when we determine that an attributes file does _not_ exist. We do create an entry in the cache for missing '.gitattributes' files, but as soon as we hit that file again we invalidate it and stat it again to see if it has now appeared.

In the case of diffing large trees against each other, this behaviour is very suboptimal. For each pair of files that is to be diffed, we will repeatedly query every directory component leading towards their respective location for an attributes file. This leads to thousands or even hundreds of thousands of wasted syscalls.

The attributes cache already has a mechanism to help in that scenario in the form of the `git_attr_session`. As long as the same attributes session is still active, we will not try to re-query the gitattributes files at all but simply retain our currently cached results.

To fix our problem, we can create a session at the top-most level, which is the initialization of the `git_diff` structure, and use it in order to look up the correct diff driver. As the `git_diff` structure is used to generate patches for multiple files at once, this neatly solves our problem by retaining the session until patches for all files have been generated.
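A minimal sketch of the caching behaviour described above, using hypothetical types (`attr_session`, `attr_cache_entry`, `attr_file_exists`) rather than libgit2's real attribute cache. For simplicity it re-`stat`s every entry when no session is active, whereas the commit message says only missing files are re-checked; the point is that within one session the answer for a given path is computed once:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical session: results recorded under the active key are trusted
 * without touching the filesystem again. */
typedef struct {
    unsigned key;
} attr_session;

/* Hypothetical cache entry for one candidate .gitattributes path. */
typedef struct {
    char path[256];
    bool exists;          /* result of the last stat() */
    unsigned session_key; /* session the result was recorded in (0 = none) */
} attr_cache_entry;

/* Counts stat() calls so the effect of the session is observable. */
static size_t stat_calls;

static bool stat_exists(const char *path)
{
    struct stat st;
    stat_calls++;
    return stat(path, &st) == 0;
}

/* Without a session (or with a different one), the cached result is
 * re-validated with stat(); inside the session that recorded it, the
 * cached answer is reused as-is. */
bool attr_file_exists(attr_cache_entry *entry, const attr_session *session)
{
    if (session && entry->session_key == session->key)
        return entry->exists;

    entry->exists = stat_exists(entry->path);
    entry->session_key = session ? session->key : 0;
    return entry->exists;
}
```

Creating one session when the `git_diff` is initialized and passing it to every lookup corresponds to calling `attr_file_exists` with the same non-NULL session for every file pair.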
The fix has been tested with linux.git by calling `git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and v4.14^{tree}.

|             | time    | .gitattributes stats |
|-------------|---------|----------------------|
| without fix | 33.201s | 844614               |
| with fix    | 30.327s | 4441                 |

While execution time only improved by roughly 9% (33.2s to 30.3s), the number of stat(2) calls on .gitattributes files decreased by 99.5% (844614 to 4441). The benchmarks were quite simple, with best-of-three timings on Linux ext4 systems. One can assume that for network-based file systems the performance gain will be a lot larger due to much higher latency.
Patrick Steinhardt committed
- 01 Jan, 2018 1 commit
winhttp: properly support ntlm and negotiate
Edward Thomson committed
- 30 Dec, 2017 6 commits
Support using notes via a commit rather than a ref
Edward Thomson committed
Transfer fewer objects on push and local fetch
Edward Thomson committed
refs: traverse symlinked directories
Edward Thomson committed
Inflate large loose blobs
Edward Thomson committed
Ensure that we can recurse into directories via symbolic links.
Edward Thomson committed
Perform some error checking when examining symlink directories.
Edward Thomson committed
- 29 Dec, 2017 2 commits
Native Git allows symlinked directories under .git/refs. This change allows libgit2 to also look for references that live under symlinked directories. Signed-off-by: Andy Doan <andy@opensourcefoundries.com>
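When walking `refs/` with `readdir`, an entry that is a symbolic link to a directory reports `DT_LNK` rather than `DT_DIR`, so a walker that only recurses on `DT_DIR` silently skips it. A minimal POSIX sketch of the check, assuming `stat()` (which follows symlinks) decides whether to recurse and that dangling links are skipped rather than treated as fatal; `entry_is_directory` is a hypothetical helper, not libgit2's code:

```c
#include <errno.h>
#include <sys/stat.h>

/* Returns 1 if path (following symlinks) is a directory, 0 if it is not a
 * directory or is a dangling symlink, and -1 on any other error. */
int entry_is_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0) {
        /* A dangling symlink under refs/ should not abort the traversal. */
        if (errno == ENOENT || errno == ENOTDIR)
            return 0;
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```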
Andy Doan committed
When parsing unauthorized responses, properly parse headers looking for both NTLM and Negotiate challenges. Set the HTTP credentials to default credentials (using a `NULL` username and password) with the schemes supported by ourselves and the server.
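A simplified sketch of collecting every supported challenge across the `WWW-Authenticate` header values of a 401 response, rather than stopping at the first one. The bit flags and `parse_auth_schemes` are hypothetical; real header parsing (quoted parameters, several challenges on one header line) is more involved than this token match:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bit flags for the schemes we understand. */
enum {
    AUTH_BASIC     = 1 << 0,
    AUTH_NTLM      = 1 << 1,
    AUTH_NEGOTIATE = 1 << 2
};

/* True if value starts with the scheme token (case-insensitive), followed
 * by end of string or whitespace. */
static int has_scheme(const char *value, const char *scheme)
{
    size_t i;
    for (i = 0; scheme[i]; i++) {
        if (tolower((unsigned char)value[i]) != tolower((unsigned char)scheme[i]))
            return 0;
    }
    return value[i] == '\0' || isspace((unsigned char)value[i]);
}

/* ORs together every scheme offered across all header values. */
int parse_auth_schemes(const char *const *values, size_t count)
{
    int schemes = 0;
    for (size_t i = 0; i < count; i++) {
        if (has_scheme(values[i], "Basic"))
            schemes |= AUTH_BASIC;
        else if (has_scheme(values[i], "NTLM"))
            schemes |= AUTH_NTLM;
        else if (has_scheme(values[i], "Negotiate"))
            schemes |= AUTH_NEGOTIATE;
    }
    return schemes;
}
```

The resulting mask can then be intersected with the schemes the client supports before choosing default credentials.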
Edward Thomson committed
- 28 Dec, 2017 1 commit
FETCH_HEAD and multiple refspecs
Edward Thomson committed
- 26 Dec, 2017 4 commits
Carlos Martín Nieto committed
We treat each refspec on its own, but the code currently overwrites the contents of FETCH_HEAD so we end up with the entries for the last refspec we processed. Instead, truncate it before performing the updates and append to it when updating the references.
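A minimal stdio sketch of the truncate-once, append-per-refspec ordering. The helper names are hypothetical and the entry lines are placeholders; libgit2 writes FETCH_HEAD through its own file layer, so this only illustrates why earlier refspecs' entries survive:

```c
#include <stdio.h>

/* Truncate FETCH_HEAD once, before any refspec is processed. */
int fetch_head_truncate(const char *path)
{
    FILE *f = fopen(path, "w"); /* "w" creates or truncates the file */
    if (!f)
        return -1;
    return fclose(f);
}

/* Append the entries for one refspec; lines from earlier refspecs remain. */
int fetch_head_append(const char *path, const char *const *lines, size_t count)
{
    FILE *f = fopen(path, "a");
    if (!f)
        return -1;
    for (size_t i = 0; i < count; i++)
        fprintf(f, "%s\n", lines[i]);
    return fclose(f);
}
```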
Carlos Martín Nieto committed
We want to do this in order to get FETCH_HEAD to be empty when we start updating it due to fetching from the remote.
Carlos Martín Nieto committed
Carlos Martín Nieto committed
- 23 Dec, 2017 7 commits
patch_parse: fix parsing unquoted filenames with spaces
Edward Thomson committed
Fix unpack double free
Edward Thomson committed
If an element has been cached, but then the call to packfile_unpack_compressed() fails, the very next thing that happens is that its data is freed, yet the element is not removed from the cache, so the cache later frees the same data again. This change sets obj->data to NULL to avoid the double free. It also stops trying to resolve deltas after two consecutive failed rounds of resolution, and adds a test for this.
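The pattern behind the fix, as a hypothetical sketch: once the failure path frees the buffer, it clears the pointer, so the cache's later cleanup sees `NULL` and `free(NULL)` is a no-op. The `cached_obj` struct and both functions are illustrative, not libgit2's pack cache:

```c
#include <stdlib.h>

/* Hypothetical cached object whose data buffer the cache also frees. */
typedef struct {
    void *data;
} cached_obj;

/* Called when decompression fails after the object was already cached. */
void on_unpack_failure(cached_obj *obj)
{
    free(obj->data);
    obj->data = NULL; /* without this, cache eviction would free it again */
}

/* Cache eviction: frees whatever the entry still owns. */
void cache_evict(cached_obj *obj)
{
    free(obj->data); /* free(NULL) is a no-op, so this is safe after a failure */
    obj->data = NULL;
}
```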
lhchavez committed
Free OpenSSL peer certificate
Edward Thomson committed
libFuzzer: Prevent a potential shift overflow
Edward Thomson committed
cmake: let USE_ICONV be optional on macOS
Edward Thomson committed
Do not attempt to check out submodule as blob when merging a submodule modify/delete conflict
Edward Thomson committed
- 20 Dec, 2017 10 commits
Writing very large files may be slow, particularly on inefficient filesystems and when running instrumented code to detect invalid memory accesses (e.g. within valgrind or similar tools). Introduce `GITTEST_SLOW` so that slow tests can be skipped by the CI system.
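A sketch of how a test might gate itself on the variable. It assumes an opt-in scheme, where slow tests run only when `GITTEST_SLOW` is set to a non-empty value other than "0"; both helpers are hypothetical, and libgit2's test harness may interpret the variable differently:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if an environment-style flag value counts as "set": present,
 * non-empty, and not "0" (an assumed convention). */
bool env_flag_is_set(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Assumption: slow tests are opt-in and run only when GITTEST_SLOW is set. */
bool slow_tests_enabled(void)
{
    return env_flag_is_set(getenv("GITTEST_SLOW"));
}
```

A slow test would return early, or call its harness's skip routine, when `slow_tests_enabled()` is false.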
Edward Thomson committed
Teach the CommonCrypto hash mechanisms to support large files. The hash primitives take a length of type `CC_LONG` (aka `uint32_t`), so they can process at most `UINT32_MAX` bytes per call. Loop, giving the hash function at most that many bytes at a time, until we have hashed the entire file.
Edward Thomson committed
Teach the win32 hash mechanisms to support large files. The hash primitives take at most `ULONG_MAX` bytes at a time. Loop, giving the hash function the maximum supported number of bytes, until we have hashed the entire file.
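Both hash changes follow the same pattern: the platform primitive's length parameter is 32 bits wide (`CC_LONG` for CommonCrypto's `CC_SHA1_Update`; on Windows the limit is `ULONG_MAX`), so a `size_t`-sized buffer is fed in pieces no larger than that limit. A portable sketch, with a hypothetical `hash_update_fn` callback standing in for the platform primitive and the limit passed in so it can be exercised with small buffers; `toy_update` is a demonstration callback, not a real hash:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash update primitive whose length
 * parameter is only 32 bits wide. */
typedef int (*hash_update_fn)(void *ctx, const void *data, uint32_t len);

/* Feeds len bytes to update in chunks of at most max_chunk bytes
 * (e.g. UINT32_MAX for CC_LONG). Returns 0 on success, -1 on failure. */
int hash_update_chunked(hash_update_fn update, void *ctx,
                        const void *data, size_t len, uint32_t max_chunk)
{
    const unsigned char *p = data;

    if (max_chunk == 0)
        return -1;
    while (len > 0) {
        uint32_t chunk = len > max_chunk ? max_chunk : (uint32_t)len;
        if (update(ctx, p, chunk) != 0)
            return -1;
        p += chunk;
        len -= chunk;
    }
    return 0;
}

/* Toy callback for demonstration: sums the bytes and counts the calls. */
typedef struct {
    unsigned long sum;
    unsigned calls;
} toy_ctx;

int toy_update(void *ctx, const void *data, uint32_t len)
{
    toy_ctx *t = ctx;
    const unsigned char *p = data;
    for (uint32_t i = 0; i < len; i++)
        t->sum += p[i];
    t->calls++;
    return 0;
}
```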
Edward Thomson committed
Check the size of objects being read from the loose odb backend and reject those that would not fit in memory with an error message that reflects the actual problem, instead of failing later with an unintuitive error message about truncation or invalid hashes.
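A sketch of the up-front check, assuming the object size is parsed from the loose object header into a 64-bit integer and that one extra byte is reserved for a NUL terminator; `loose_size_fits_in_memory` is a hypothetical helper. On 32-bit platforms `SIZE_MAX` is far smaller than the sizes a header can describe, which is where this check matters:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an object of size bytes, plus a trailing NUL, can be
 * allocated and indexed with size_t on this platform. */
bool loose_size_fits_in_memory(uint64_t size)
{
    /* Compare in uint64_t so the check is meaningful when size_t is 32-bit;
     * size < SIZE_MAX guarantees size + 1 does not overflow size_t. */
    return size < (uint64_t)SIZE_MAX;
}
```

A caller would reject the object with an "object too large to fit in memory" style error as soon as this returns false, before inflating anything.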
Edward Thomson committed
Instead of feeding data to zlib in `INT_MAX`-sized chunks, we can give it as many as `UINT_MAX` bytes at a time. zlib doesn't care how big a buffer we give it; this simply results in fewer calls into zlib.
Edward Thomson committed
zlib will only inflate/deflate an `int`'s worth of data at a time. We need to loop through large files in order to ensure that we inflate the entire file, not just an `int`'s worth of data. Thankfully, we already have this loop in our `git_zstream` layer. Handle large objects using the `git_zstream`.
Edward Thomson committed
Introduce an internal API to get the object type based on a length-specified (not null terminated) string representation. This can be used to compare the (space terminated) object type name in a loose object. Reimplement `git_object_string2type` based on this API.
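A sketch of a length-bounded type lookup. `obj_type`, `object_stringn2type` and `object_string2type` are hypothetical names mirroring the description, not libgit2's internal signatures; the numeric values follow git's object type numbering. A loose object header looks like `blob 12` followed by a NUL, so the caller passes the length up to the space:

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    OBJ_BAD = -1,
    OBJ_COMMIT = 1,
    OBJ_TREE = 2,
    OBJ_BLOB = 3,
    OBJ_TAG = 4
} obj_type;

static const struct {
    const char *name;
    obj_type type;
} type_names[] = {
    { "commit", OBJ_COMMIT },
    { "tree", OBJ_TREE },
    { "blob", OBJ_BLOB },
    { "tag", OBJ_TAG },
};

/* Maps the first len bytes of str (not necessarily NUL-terminated) to an
 * object type; the whole span must match a name exactly. */
obj_type object_stringn2type(const char *str, size_t len)
{
    for (size_t i = 0; i < sizeof(type_names) / sizeof(type_names[0]); i++) {
        if (strlen(type_names[i].name) == len &&
            memcmp(type_names[i].name, str, len) == 0)
            return type_names[i].type;
    }
    return OBJ_BAD;
}

/* The NUL-terminated variant layered on top of the bounded one. */
obj_type object_string2type(const char *str)
{
    return object_stringn2type(str, strlen(str));
}
```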
Edward Thomson committed
Introduce a test for very large objects in the ODB. Write a large object (5 GB) and ensure that the write succeeds and provides us the expected object ID. Introduce a test that writes that file and ensures that we can subsequently read it.
Edward Thomson committed
Introduce `git_prefixncmp` that will search up to the first `n` characters of a string to see if it is prefixed by another string. This is useful for examining if a non-null terminated character array is prefixed by a particular substring. Consolidate the various implementations of `git__prefixcmp` around a single core implementation and add some test cases to validate its behavior.
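A minimal sketch of a bounded prefix comparison under the hypothetical name `prefixncmp`; the exact signature and return conventions of the function the commit adds are not given here. It returns 0 when the first `str_n` bytes of `str` begin with `prefix`, and otherwise a negative or positive value in the spirit of `strncmp`, and the NUL-terminated variant becomes a special case of it:

```c
#include <stddef.h>

/* Returns 0 if the first str_n bytes of str start with prefix. Never reads
 * past str_n bytes, so str need not be NUL-terminated. */
int prefixncmp(const char *str, size_t str_n, const char *prefix)
{
    for (; *prefix; str++, prefix++, str_n--) {
        if (str_n == 0)
            return -1; /* str ran out before the prefix did */
        if ((unsigned char)*str != (unsigned char)*prefix)
            return (unsigned char)*str - (unsigned char)*prefix;
    }
    return 0;
}

/* NUL-terminated variant: an effectively unbounded length, relying on the
 * terminator to produce a mismatch if str is shorter than prefix. */
int prefixcmp(const char *str, const char *prefix)
{
    return prefixncmp(str, (size_t)-1, prefix);
}
```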
Edward Thomson committed
zlib will return `Z_BUF_ERROR` whenever there is more input to inflate or deflate than there is output to store the result. This is normal for us as we iterate through the input, particularly with very large input buffers.
Edward Thomson committed
- 19 Dec, 2017 1 commit
Add Jonathan Tan to git.git-authors
Edward Thomson committed
- 18 Dec, 2017 1 commit
Jonathan has consented via email to have his contributions to git reused in libgit2
Charlie Somerville committed
- 16 Dec, 2017 1 commit
diff_file: properly refcount blobs when initializing file contents
Edward Thomson committed
- 15 Dec, 2017 5 commits
Per the SSL_get_peer_certificate docs:

```
The reference count of the X509 object is incremented by one, so that it will not be destroyed when the session containing the peer certificate is freed. The X509 object must be explicitly freed using X509_free().
```
Etienne Samson committed
This makes it easier to clean up allocated resources on exit.
Etienne Samson committed
lhchavez committed
libFuzzer: Fix missing trailer crash
Patrick Steinhardt committed
When initializing a `git_diff_file_content` from a source whose data is derived from a blob, we simply assign the blob's pointer to the resulting struct without incrementing its refcount. Thus, the structure can only be used as long as the blob is kept alive by the caller. Fix the issue by using `git_blob_dup` instead of a direct assignment. This function will increment the refcount of the blob without allocating new memory, so it does exactly what we want. As `git_diff_file_content__unload` already frees the blob when `GIT_DIFF_FLAG__FREE_BLOB` is set, we don't need to add new code handling the free but only have to set that flag correctly.
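A toy model of the ownership rule, assuming a plain integer refcount. `toy_blob`, `toy_blob_dup`, `toy_blob_free`, `file_content` and `FLAG_FREE_BLOB` are hypothetical stand-ins for `git_blob`, `git_blob_dup`, `git_blob_free`, `git_diff_file_content` and `GIT_DIFF_FLAG__FREE_BLOB`. Taking a reference at initialization and setting the flag lets the unload path release exactly that reference, so the content outlives the caller's own reference:

```c
#include <stdlib.h>

typedef struct {
    int refcount;
} toy_blob;

/* Hypothetical analogue of git_blob_dup: shares the object, no copy. */
int toy_blob_dup(toy_blob **out, toy_blob *src)
{
    src->refcount++;
    *out = src;
    return 0;
}

/* Hypothetical analogue of git_blob_free: drops one reference. */
void toy_blob_free(toy_blob *blob)
{
    if (blob && --blob->refcount == 0)
        free(blob);
}

enum { FLAG_FREE_BLOB = 1 << 0 };

typedef struct {
    toy_blob *blob;
    unsigned flags;
} file_content;

/* Initialize from a caller-owned blob by taking our own reference. */
int file_content_init_from_blob(file_content *fc, toy_blob *blob)
{
    if (toy_blob_dup(&fc->blob, blob) < 0)
        return -1;
    fc->flags |= FLAG_FREE_BLOB;
    return 0;
}

/* Unload releases the reference only if this struct owns one. */
void file_content_unload(file_content *fc)
{
    if (fc->flags & FLAG_FREE_BLOB) {
        toy_blob_free(fc->blob);
        fc->flags &= ~(unsigned)FLAG_FREE_BLOB;
    }
    fc->blob = NULL;
}
```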
Patrick Steinhardt committed