Commits · d8896bda5c43616f3c755242703fce7c2a97ad67 · lvzhengyang / git2

03 Jan, 2018 1 commit

diff_generate: avoid excessive stats of .gitattributes files · d8896bda

When generating a diff between two trees, for each file that is to be
diffed we have to determine whether it shall be treated as a text or as
a binary file. While git has heuristics to determine which kind of diff
to generate, users can also override that default behaviour by setting
or unsetting the 'diff' attribute for specific files.

Because of that, we have to query gitattributes in order to determine
how to diff the current files. Instead of hitting the '.gitattributes'
file every time we need to query an attribute, which can get expensive
especially on networked file systems, we try to cache them instead. This
works perfectly fine for every '.gitattributes' file that is found, but
we hit cache invalidation problems when we determine that an attributes
file does _not_ exist. We do create an entry in the cache for missing
'.gitattributes' files, but as soon as we hit that file again we
invalidate it and stat it again to see if it has now appeared.

In the case of diffing large trees with each other, this behaviour is
very suboptimal. For each pair of files that is to be diffed, we will
repeatedly query every directory component leading towards their
respective location for an attributes file. This leads to thousands or
even hundreds of thousands of wasted syscalls.

The attributes cache already has a mechanism to help in that scenario in
the form of the `git_attr_session`. As long as the same attributes
session is still active, we will not try to re-query the
'.gitattributes' files at all but simply retain our currently cached
results. To fix our problem, we can create a session at the top-most
level, which is the initialization of the `git_diff` structure, and use
it in order to look up the correct diff driver. As the `git_diff`
structure is used to generate patches for multiple files at once, this
neatly solves our problem by retaining the session until patches for all
files have been generated.

The fix has been tested with linux.git by calling
`git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and
v4.14^{tree}.

                | time    | .gitattributes stats
    without fix | 33.201s | 844614
    with fix    | 30.327s | 4441

While execution time only improved by roughly 9%, the stat(2) syscalls
for .gitattributes files decreased by 99.5%. The benchmarks were quite
simple with best-of-three timings on Linux ext4 systems. One can assume
that for network-based file systems the performance gain will be a lot
larger due to a much higher latency.
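
The effect of a session on lookups of missing files can be illustrated with a small, self-contained sketch. This is not libgit2's `git_attr_session` implementation: `attr_session`, `attr_cache_entry`, `lookup_attr_file` and the simulated `fake_stat` are hypothetical names, and the sketch assumes session keys are non-zero. It only models the rule described above, namely that a cached "missing" result is trusted while the same session is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are libgit2 API. */
typedef struct {
    unsigned int key;      /* non-zero; identifies one operation, e.g. one diff */
} attr_session;

typedef struct {
    bool cached;           /* has this path been looked up before? */
    bool exists;           /* result of the last (simulated) stat */
    unsigned int seen_by;  /* key of the session that last stat'ed it */
} attr_cache_entry;

static unsigned long stat_calls; /* counts simulated stat() calls */

/* Simulated stat(): in this sketch the file never exists. */
static bool fake_stat(const char *path)
{
    assert(path != NULL);
    stat_calls++;
    return false;
}

/*
 * Look up whether an attributes file exists. A cached "missing" result is
 * trusted only if it was produced within the same session; without a
 * session, every lookup of a missing file stats again.
 * Simplification: files that exist are never revalidated here.
 */
static bool lookup_attr_file(attr_cache_entry *e, const attr_session *s,
                             const char *path)
{
    if (e->cached && e->exists)
        return true;
    if (e->cached && s != NULL && e->seen_by == s->key)
        return false; /* negative result still valid for this session */

    e->exists = fake_stat(path);
    e->cached = true;
    e->seen_by = s ? s->key : 0;
    return e->exists;
}
```

With a `NULL` session, a thousand lookups of the same missing path cause a thousand simulated stats; with one shared session they cause a single stat.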

committed Jan 03, 2018


01 Jan, 2018 1 commit
- Merge pull request #4453 from libgit2/ethomson/spnego · 7610638e
```
winhttp: properly support ntlm and negotiate
```
  Edward Thomson committed Jan 01, 2018
30 Dec, 2017 6 commits
- Merge pull request #4159 from richardipsum/notes-commit · d6210245
```
Support using notes via a commit rather than a ref
```
  Edward Thomson committed Dec 30, 2017
- Merge pull request #4028 from chescock/improve-local-fetch · 8cdf439b
```
Transfer fewer objects on push and local fetch
```
  Edward Thomson committed Dec 30, 2017
- Merge pull request #4455 from libgit2/ethomson/branch_symlinks · 2b7a3393
```
refs: traverse symlinked directories
```
  Edward Thomson committed Dec 30, 2017
- Merge pull request #4443 from libgit2/ethomson/large_loose_blobs · e14bf97e
```
Inflate large loose blobs
```
  Edward Thomson committed Dec 30, 2017
- refs:iterator: add tests to recurse symlinks · 7a830f28
```
Ensure that we can recurse into directories via symbolic links.
```
  Edward Thomson committed Dec 30, 2017
- iterator: cleanups with symlink dir handling · 9e94b6af
```
Perform some error checking when examining symlink directories.
```
  Edward Thomson committed Dec 30, 2017
29 Dec, 2017 2 commits

branches: Check symlinked subdirectories · e9628e7b

Native Git allows symlinked directories under .git/refs. This
change allows libgit2 to also look for references that live under
symlinked directories.

Signed-off-by: Andy Doan <andy@opensourcefoundries.com>

committed Dec 29, 2017


winhttp: properly support ntlm and negotiate · 526dea1c

When parsing unauthorized responses, properly parse headers looking for
both NTLM and Negotiate challenges. Set the HTTP credentials to default
credentials (using a `NULL` username and password) with the schemes
supported by ourselves and the server.

committed Dec 29, 2017


28 Dec, 2017 1 commit
- Merge pull request #4021 from carlosmn/cmn/refspecs-fetchhead · 083b1a2e
```
FETCH_HEAD and multiple refspecs
```
  Edward Thomson committed Dec 28, 2017
26 Dec, 2017 4 commits
- fetch: go over FETCH_HEAD just once when counting the prefixes in test · c081f0d0
  Carlos Martín Nieto committed Dec 26, 2017
  
- remote: append to FETCH_HEAD rather than overwrite for each refspec · 1b4fbf2e
```
We treat each refspec on its own, but the code currently overwrites the contents
of FETCH_HEAD so we end up with the entries for the last refspec we processed.

Instead, truncate it before performing the updates and append to it when
updating the references.
```
  Carlos Martín Nieto committed Dec 26, 2017
- futils: add a function to truncate a file · 3ccc1a4d
```
We want to do this in order to get FETCH_HEAD to be empty when we start updating
it due to fetching from the remote.
```
  Carlos Martín Nieto committed Dec 26, 2017
- fetch: add a failing test for FETCH_HEAD with multiple fetch refspecs · c0bfda87
  Carlos Martín Nieto committed Dec 26, 2017
  
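
The truncate-then-append approach from 1b4fbf2e and 3ccc1a4d above can be sketched with plain stdio. This is only an illustration under stated assumptions: `truncate_file`, `append_line`, `write_fetch_head` and `count_lines` are hypothetical helpers, not the functions added to libgit2, and the entries are placeholder strings rather than real FETCH_HEAD lines.

```c
#include <assert.h>
#include <stdio.h>

/* Truncate `path` to zero length, creating it if needed. Returns 0 on success. */
static int truncate_file(const char *path)
{
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "w"); /* "w" truncates an existing file */
    if (!f)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}

/* Append one line for a processed refspec. Returns 0 on success. */
static int append_line(const char *path, const char *line)
{
    FILE *f = fopen(path, "a"); /* "a" keeps earlier entries */
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", line) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Truncate once up front, then append the entries of every refspec. */
static int write_fetch_head(const char *path, const char *const *entries,
                            size_t n)
{
    size_t i;

    if (truncate_file(path) < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (append_line(path, entries[i]) < 0)
            return -1;
    return 0;
}

/* Count newline-terminated lines; returns -1 if the file cannot be opened. */
static int count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    int c, lines = 0;

    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

Because the file is truncated only once per fetch, entries from every refspec survive, while entries from a previous fetch do not leak into the new result.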
23 Dec, 2017 7 commits
- Merge pull request #4285 from pks-t/pks/patches-with-whitespace · 4110fc84
```
patch_parse: fix parsing unquoted filenames with spaces
```
  Edward Thomson committed Dec 23, 2017
- Merge pull request #4045 from lhchavez/fix-unpack-double-free · d734466c
```
Fix unpack double free
```
  Edward Thomson committed Dec 23, 2017
- Fix unpack double free · c3514b0b
```
If an element has been cached, but then the call to
packfile_unpack_compressed() fails, the very next thing that happens is
that its data is freed and then the element is not removed from the
cache, which frees the data again.

This change sets obj->data to NULL to avoid the double-free. It also
stops trying to resolve deltas after two continuous failed rounds of
resolution, and adds a test for this.
```
  lhchavez committed Dec 23, 2017
- Merge pull request #4430 from tiennou/fix/openssl-x509-leak · 9f7ad3c5
```
Free OpenSSL peer certificate
```
  Edward Thomson committed Dec 23, 2017
- Merge pull request #4435 from lhchavez/ubsan-shift-overflow · 30d91760
```
libFuzzer: Prevent a potential shift overflow
```
  Edward Thomson committed Dec 23, 2017
- Merge pull request #4402 from libgit2/ethomson/iconv · 1ddc57b3
```
cmake: let USE_ICONV be optional on macOS
```
  Edward Thomson committed Dec 23, 2017
- Merge pull request #4429 from novalis/delete-modify-submodule-merge · 06f3aa5f
```
Do not attempt to check out submodule as blob when merging a submodule modify/delete conflict
```
  Edward Thomson committed Dec 23, 2017
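
The double free described in c3514b0b above follows a common pattern: an error path frees data that a cache still points to. The sketch below illustrates the "set the pointer to NULL" remedy. The types `raw_obj` and `obj_cache` and both functions are hypothetical and much simpler than libgit2's pack cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object and cache types; not libgit2's pack structures. */
typedef struct { void *data; } raw_obj;
typedef struct { raw_obj *slot; } obj_cache;

/* Releasing the cache frees whatever data the cached element still owns. */
static void cache_clear(obj_cache *c)
{
    if (c->slot) {
        free(c->slot->data); /* free(NULL) is a no-op */
        c->slot->data = NULL;
        c->slot = NULL;
    }
}

/*
 * Error path after a failed unpack: the element stays in the cache, so
 * after freeing its data we clear the pointer. Otherwise cache_clear()
 * would free the same allocation a second time.
 */
static void unpack_failed(raw_obj *obj)
{
    assert(obj != NULL);
    free(obj->data);
    obj->data = NULL;
}
```

Clearing the pointer keeps the later cache teardown safe because `free(NULL)` does nothing.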
20 Dec, 2017 10 commits

tests: add GITTEST_SLOW env var check · 456e5218

Writing very large files may be slow, particularly on inefficient
filesystems and when running instrumented code to detect invalid memory
accesses (e.g. within valgrind or similar tools).

Introduce `GITTEST_SLOW` so that tests that are slow can be skipped by
the CI system.

committed Dec 20, 2017


hash: commoncrypto hash should support large files · bdb54214

Teach the CommonCrypto hash mechanisms to support large files.  The hash
primitives take a `CC_LONG` (aka `uint32_t`) length at a time.  So loop,
giving the hash function at most an unsigned 32-bit integer's worth of
bytes per call, until we have hashed the entire file.
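
A minimal sketch of such a chunking loop follows. It does not use CommonCrypto: a toy FNV-1a hash stands in for the real digest, and `toy_hash_ctx`, `toy_hash_update32` and `toy_hash_update` are hypothetical names. The `max_chunk` parameter is also an assumption added for illustration so the loop can be exercised with small inputs; real code would pass the primitive's actual limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash context; a toy FNV-1a stands in for the real digest. */
typedef struct { uint64_t state; } toy_hash_ctx;

static void toy_hash_init(toy_hash_ctx *ctx)
{
    ctx->state = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
}

/* Like the primitive described above, this accepts only a 32-bit length. */
static void toy_hash_update32(toy_hash_ctx *ctx, const void *data, uint32_t len)
{
    const unsigned char *p = data;
    uint32_t i;

    for (i = 0; i < len; i++) {
        ctx->state ^= p[i];
        ctx->state *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
}

/*
 * Feed an arbitrarily large buffer to the 32-bit primitive in pieces of
 * at most `max_chunk` bytes until everything has been hashed.
 */
static void toy_hash_update(toy_hash_ctx *ctx, const void *data, size_t len,
                            uint32_t max_chunk)
{
    const unsigned char *p = data;

    assert(max_chunk > 0);

    while (len > 0) {
        uint32_t chunk = len > max_chunk ? max_chunk : (uint32_t)len;

        toy_hash_update32(ctx, p, chunk);
        p += chunk;
        len -= chunk;
    }
}
```

Hashing in pieces gives the same state as a single call, which is what makes the loop a safe way around the 32-bit length limit.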

committed Dec 20, 2017


hash: win32 hash mechanism should support large files · a89560d5

Teach the win32 hash mechanisms to support large files.  The hash
primitives take at most `ULONG_MAX` bytes at a time.  Loop, giving the
hash function the maximum supported number of bytes, until we have
hashed the entire file.

committed Dec 20, 2017


odb_loose: reject objects that cannot fit in memory · 3e6533ba

Check the size of objects being read from the loose odb backend and
reject those that would not fit in memory with an error message that
reflects the actual problem, instead of error'ing later with an
unintuitive error message regarding truncation or invalid hashes.

committed Dec 20, 2017


zstream: use UINT_MAX sized chunks · 8642feba

Instead of paging to zlib in INT_MAX sized chunks, we can give it
as many as UINT_MAX bytes at a time. zlib doesn't care how big
a buffer we give it; this simply results in fewer calls into zlib.

committed Dec 20, 2017


odb: support large loose objects · ddefea75

zlib will only inflate/deflate an `int`'s worth of data at a time.
We need to loop through large files in order to ensure that we inflate
the entire file, not just an `int`'s worth of data.  Thankfully, we
already have this loop in our `git_zstream` layer.  Handle large objects
using the `git_zstream`.

committed Dec 20, 2017


object: introduce git_object_stringn2type · d1e44655

Introduce an internal API to get the object type based on a
length-specified (not null terminated) string representation.  This can
be used to compare the (space terminated) object type name in a loose
object.

Reimplement `git_object_string2type` based on this API.
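
As an illustration of a length-specified lookup, here is a minimal sketch. It is not the libgit2 implementation: the `obj_type` enum, the `obj_names` table and both function names are hypothetical, chosen only to show how a space-terminated type name can be matched without a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for this sketch; not libgit2's type constants. */
typedef enum {
    OBJ_INVALID = -1,
    OBJ_COMMIT,
    OBJ_TREE,
    OBJ_BLOB,
    OBJ_TAG
} obj_type;

static const struct { const char *name; obj_type type; } obj_names[] = {
    { "commit", OBJ_COMMIT },
    { "tree",   OBJ_TREE },
    { "blob",   OBJ_BLOB },
    { "tag",    OBJ_TAG },
};

/* Map the first `len` bytes of `str` to a type; no NUL terminator needed. */
static obj_type obj_stringn2type(const char *str, size_t len)
{
    size_t i;

    assert(str != NULL);

    for (i = 0; i < sizeof(obj_names) / sizeof(obj_names[0]); i++) {
        const char *name = obj_names[i].name;

        /* Both the length and the bytes must match exactly. */
        if (strlen(name) == len && memcmp(name, str, len) == 0)
            return obj_names[i].type;
    }
    return OBJ_INVALID;
}

/* The NUL-terminated variant becomes a thin wrapper. */
static obj_type obj_string2type(const char *str)
{
    return str ? obj_stringn2type(str, strlen(str)) : OBJ_INVALID;
}
```

Given a loose object header such as `blob 1234`, a caller would pass the header pointer and the offset of the first space as the length.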

committed Dec 20, 2017


odb: test loose reading/writing large objects · dacc3291

Introduce a test for very large objects in the ODB.  Write a large
object (5 GB) and ensure that the write succeeds and provides us the
expected object ID.  Introduce a test that writes that file and
ensures that we can subsequently read it.

committed Dec 20, 2017


util: introduce `git__prefixncmp` and consolidate implementations · 86219f40

Introduce `git__prefixncmp` that will search up to the first `n`
characters of a string to see if it is prefixed by another string.
This is useful for examining if a non-null terminated character
array is prefixed by a particular substring.

Consolidate the various implementations of `git__prefixcmp` around a
single core implementation and add some test cases to validate its
behavior.
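
A length-bounded prefix comparison of this kind might look roughly like the sketch below. The function name `prefixncmp_sketch` is hypothetical and the code is an independent illustration, not the consolidated libgit2 implementation. It returns 0 when the first `str_n` bytes of `str` start with `prefix`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 0 if the first `str_n` bytes of `str` begin with the
 * NUL-terminated `prefix`, and a non-zero difference otherwise.
 * `str` does not have to be NUL-terminated.
 */
static int prefixncmp_sketch(const char *str, size_t str_n, const char *prefix)
{
    assert(str != NULL && prefix != NULL);

    while (str_n--) {
        unsigned char s = (unsigned char)*str++;
        unsigned char p = (unsigned char)*prefix++;

        if (p == '\0')
            return 0;       /* whole prefix matched */
        if (s != p)
            return s - p;   /* first differing byte decides */
    }

    /* Ran out of input: a match only if the prefix is also exhausted. */
    return 0 - (unsigned char)*prefix;
}
```

The final return handles the case where the input is shorter than the prefix, which a plain NUL-terminated comparison cannot detect on an unterminated array.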

committed Dec 20, 2017


zstream: treat `Z_BUF_ERROR` as non-fatal · b7d36ef4

zlib will return `Z_BUF_ERROR` whenever there is more input to inflate
or deflate than there is output to store the result.  This is normal for
us as we iterate through the input, particularly with very large input
buffers.

committed Dec 20, 2017


19 Dec, 2017 1 commit
- Merge pull request #4449 from libgit2/charliesome/git-authors-jonathan-tan · a0867242
```
Add Jonathan Tan to git.git-authors
```
  Edward Thomson committed Dec 19, 2017
18 Dec, 2017 1 commit
- Add Jonathan Tan to git.git-authors · 1ee0628d
```
Jonathan has consented via email to have his contributions to git reused in libgit2
```
  Charlie Somerville committed Dec 19, 2017
16 Dec, 2017 1 commit
- Merge pull request #4447 from pks-t/pks/diff-file-contents-refcount-blob · fa8cf14f
```
diff_file: properly refcount blobs when initializing file contents
```
  Edward Thomson committed Dec 16, 2017
15 Dec, 2017 5 commits

openssl: free the peer certificate · 8be2a790

Per SSL_get_peer_certificate docs:
```
The reference count of the X509 object is incremented by one, so that it will not be destroyed when the session containing the peer certificate is freed. The X509 object must be explicitly freed using X509_free().
```

committed Dec 16, 2017


- openssl: merge all the exit paths of verify_server_cert · 2518eb81
```
This makes it easier to clean up allocated resources on exit.
```
  Etienne Samson committed Dec 16, 2017
- Simplified overflow condition · 53f2c6b1
  lhchavez committed Dec 15, 2017
- Merge pull request #4432 from lhchavez/fix-missing-trailer · 2482559d
```
libFuzzer: Fix missing trailer crash
```
  Patrick Steinhardt committed Dec 15, 2017

diff_file: properly refcount blobs when initializing file contents · 2388a9e2

When initializing a `git_diff_file_content` from a source whose data is
derived from a blob, we simply assign the blob's pointer to the
resulting struct without incrementing its refcount. Thus, the structure
can only be used as long as the blob is kept alive by the caller.

Fix the issue by using `git_blob_dup` instead of a direct assignment.
This function will increment the refcount of the blob without allocating
new memory, so it does exactly what we want. As
`git_diff_file_content__unload` already frees the blob when
`GIT_DIFF_FLAG__FREE_BLOB` is set, we don't need to add new code
handling the free but only have to set that flag correctly.
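
The ownership rule described here can be sketched with a toy reference-counted object. Everything below is hypothetical: `rc_blob`, `rc_blob_dup`, `file_content` and `CONTENT_FLAG_FREE_BLOB` only mirror the roles of the blob, the dup call, the content struct and the free flag named above, and the signatures are simplified for illustration rather than copied from libgit2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not libgit2's git_blob. */
typedef struct {
    int refcount;
    /* payload omitted */
} rc_blob;

static rc_blob *rc_blob_new(void)
{
    rc_blob *b = calloc(1, sizeof(*b));

    if (b)
        b->refcount = 1;
    return b;
}

/* "Duplicate" by taking another reference; no new memory is allocated. */
static rc_blob *rc_blob_dup(rc_blob *b)
{
    assert(b != NULL && b->refcount > 0);
    b->refcount++;
    return b;
}

/* Drop one reference; returns 1 if the object was destroyed. */
static int rc_blob_free(rc_blob *b)
{
    if (!b || --b->refcount > 0)
        return 0;
    free(b);
    return 1;
}

#define CONTENT_FLAG_FREE_BLOB 1u /* hypothetical counterpart of the flag */

typedef struct {
    rc_blob *blob;
    unsigned int flags;
} file_content;

/* Take our own reference so the content outlives the caller's handle. */
static void file_content_init(file_content *fc, rc_blob *src)
{
    fc->blob = rc_blob_dup(src);
    fc->flags = CONTENT_FLAG_FREE_BLOB;
}

/* Release the reference only if we own one; returns 1 if the blob died. */
static int file_content_unload(file_content *fc)
{
    int destroyed = 0;

    if (fc->flags & CONTENT_FLAG_FREE_BLOB) {
        destroyed = rc_blob_free(fc->blob);
        fc->flags &= ~CONTENT_FLAG_FREE_BLOB;
    }
    fc->blob = NULL;
    return destroyed;
}
```

After `file_content_init`, the caller can release its own handle and the content still holds a valid blob until `file_content_unload` drops the last reference.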

committed Dec 15, 2017
