- 03 Jan, 2018 1 commit
-
-
When generating a diff between two trees, for each file that is to be diffed we have to determine whether it shall be treated as text or as binary files. While git has heuristics to determine which kind of diff to generate, users can also that default behaviour by setting or unsetting the 'diff' attribute for specific files. Because of that, we have to query gitattributes in order to determine how to diff the current files. Instead of hitting the '.gitattributes' file every time we need to query an attribute, which can get expensive especially on networked file systems, we try to cache them instead. This works perfectly fine for every '.gitattributes' file that is found, but we hit cache invalidation problems when we determine that an attribuse file is _not_ existing. We do create an entry in the cache for missing '.gitattributes' files, but as soon as we hit that file again we invalidate it and stat it again to see if it has now appeared. In the case of diffing large trees with each other, this behaviour is very suboptimal. For each pair of files that is to be diffed, we will repeatedly query every directory component leading towards their respective location for an attributes file. This leads to thousands or even hundreds of thousands of wasted syscalls. The attributes cache already has a mechanism to help in that scenario in form of the `git_attr_session`. As long as the same attributes session is still active, we will not try to re-query the gitmodules files at all but simply retain our currently cached results. To fix our problem, we can create a session at the top-most level, which is the initialization of the `git_diff` structure, and use it in order to look up the correct diff driver. As the `git_diff` structure is used to generate patches for multiple files at once, this neatly solves our problem by retaining the session until patches for all files have been generated. The fix has been tested with linux.git by calling `git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and v4.14^{tree}. | time | .gitattributes stats without fix | 33.201s | 844614 with fix | 30.327s | 4441 While execution only improved by roughly 10%, the stat(3) syscalls for .gitattributes files decreased by 99.5%. The benchmarks were quite simple with best-of-three timings on Linux ext4 systems. One can assume that for network based file systems the performance gain will be a lot larger due to a much higher latency.
Patrick Steinhardt committed
-
- 30 Nov, 2017 1 commit
-
-
The macro `DIFF_FLAG_SET` can be used to set or unset a flag by modifying the diff's bitmask. While the case of setting the flag is handled correctly, the case of unsetting the flag was not. Instead of inverting the flags, we are inverting the value which is used to decide whether we want to set or unset the bits. The value being used here is a simple `bool` which is `false`. As that is being uplifted to `int` when getting the bitwise-complement, we will end up retaining all bits inside of the bitmask. As that's only ever used to set `GIT_DIFF_IGNORE_CASE`, we were actually always ignoring case for generated diffs. Fix that by instead getting the bitwise-complement of `FLAG`, not `VAL`.
Patrick Steinhardt committed
-
- 18 Nov, 2017 1 commit
-
-
Strict aliasing rules dictate that for most data types, you are not allowed to cast them to another data type and then access the casted pointers. While this works just fine for most compilers, technically we end up in undefined behaviour when we hurt that rule. Our current refcounting code makes heavy use of casting and thus violates that rule. While we didn't have any problems with that code, Travis started spitting out a lot of warnings due to a change in their toolchain. In the refcounting case, the code is also easy to fix: as all refcounting-statements are actually macros, we can just access the `rc` field directly instead of casting. There are two outliers in our code where that doesn't work. Both the `git_diff` and `git_patch` structures have specializations for generated and parsed diffs/patches, which directly inherit from them. Because of that, the refcounting code is only part of the base structure and not of the children themselves. We can help that by instead passing their base into `GIT_REFCOUNT_INC`, though.
Patrick Steinhardt committed
-
- 03 Jul, 2017 1 commit
-
-
Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt committed
-
- 29 Dec, 2016 1 commit
-
-
Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Edward Thomson committed
-
- 24 Aug, 2016 1 commit
-
-
Ensure that `git_patch_from_diff` can return the patch for parsed diffs, not just generate a patch for a generated diff.
Edward Thomson committed
-
- 26 May, 2016 3 commits
-
-
Parse diff files into a `git_diff` structure.
Edward Thomson committed -
Edward Thomson committed
-
Now that `git_diff_delta` data can be produced by reading patch file data, which may have an abbreviated oid, allow consumers to know that the id is abbreviated.
Edward Thomson committed
-
- 03 May, 2016 1 commit
-
-
When determining diffs between two iterators we may need to recurse into an unmatched directory for the "new" iterator when it is either a prefix to the current item of the "old" iterator or when untracked/ignored changes are requested by the user and the directory is untracked/ignored. When advancing into the directory and no files are found, we will get back `GIT_ENOTFOUND`. If so, we simply skip the directory, handling resulting unmatched old items in the next iteration. The other case of `iterator_advance_into` returning either `GIT_NOERROR` or any other error but `GIT_ENOTFOUND` will be handled by the caller, which will now either compare the first directory entry of the "new" iterator in case of `GIT_ENOERROR` or abort on other cases. Improve readability of the code to make the above logic more clear.
Patrick Steinhardt committed
-
- 24 Mar, 2016 1 commit
-
-
Remove some unused functions, refactor some ugliness.
Edward Thomson committed
-
- 23 Mar, 2016 2 commits
-
-
When a directory is removed out from underneath us, stop trying to manipulate it.
Edward Thomson committed -
Drop some of the layers of indirection between the workdir and the filesystem iterators. This makes the code a little bit easier to follow, and reduces the number of unnecessary allocations a bit as well. (Prior to this, when we filter entries, we would allocate them, filter them and then free them; now we do the filtering before allocation.) Also, rename `git_iterator_advance_over_with_status` to just `git_iterator_advance_over`. Mostly because it's a fucking long-ass function name otherwise.
Edward Thomson committed
-
- 11 Feb, 2016 1 commit
-
-
Arthur Schreiber committed
-
- 01 Dec, 2015 1 commit
-
-
When formatting a patch as email we do not include the commit's message in the formatted patch output. Implement this and add a test that verifies behavior.
Patrick Steinhardt committed
-
- 23 Nov, 2015 1 commit
-
-
When examining the working directory and determining whether it's up-to-date, only consider the nanoseconds in the index entry when built with `GIT_USE_NSEC`. This prevents us from believing that the working directory is always dirty when the index was originally written with a git client that uinderstands nsecs (like git 2.x).
Edward Thomson committed
-
- 02 Nov, 2015 1 commit
-
-
Jason Haslam committed
-
- 28 Oct, 2015 1 commit
-
-
Vicent Marti committed
-
- 22 Oct, 2015 1 commit
-
-
Although our index contains the literal time present in the index, we do not read nanoseconds from disk, and thus we should not use them in any comparisons, lest we always think our working directory is dirty. Guard this behind a `GIT_USE_NSECS` for future improvement.
Edward Thomson committed
-
- 02 Oct, 2015 1 commit
-
-
Axel Rasmussen committed
-
- 19 Sep, 2015 2 commits
-
-
Axel Rasmussen committed
-
Axel Rasmussen committed
-
- 12 Sep, 2015 1 commit
-
-
When we're not doing pathspec matching, we let the iterator handle file matching for us. However, we can only trust the iterator to return *files* that match the pattern, because the iterator must return directories that are not strictly in the pathlist, but that are the parents of files that match the pattern, so that diff can later recurse into them. Thus, diff must examine non-files explicitly before including them in the delta list.
Edward Thomson committed
-
- 30 Aug, 2015 1 commit
-
-
When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH` turn on the faster iterator pathlist handling. Updates iterator pathspecs to include directory prefixes (eg, `foo/`) for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Edward Thomson committed
-
- 28 Aug, 2015 2 commits
-
-
Edward Thomson committed
-
Edward Thomson committed
-
- 30 Jun, 2015 1 commit
-
-
Pierre-Olivier Latour committed
-
- 26 Jun, 2015 1 commit
-
-
When diffing the index with the workdir and GIT_DIFF_UPDATE_INDEX has been passed, the previous implementation was always writing the index to disk even if it wasn't modified.
Pierre-Olivier Latour committed
-
- 23 Jun, 2015 1 commit
-
-
If an index entry for a file that is not in HEAD is in conflicted state, when diffing HEAD with the index, the status field of the corresponding git_diff_delta was incorrectly reported as GIT_DELTA_ADDED instead of GIT_DELTA_CONFLICTED. This was due to handle_unmatched_new_item() initially setting the status to GIT_DELTA_CONFLICTED but then overriding it later with GIT_DELTA_ADDED.
Pierre-Olivier Latour committed
-
- 22 Jun, 2015 1 commit
-
-
When a file on the workdir has the same or a newer timestamp than the index, we need to perform a full check of the contents, as the update of the file may have happened just after we wrote the index. The iterator changes are such that we can reach inside the workdir iterator from the diff, though it may be better to have an accessor instead of moving these structs into the header.
Carlos Martín Nieto committed
-
- 20 Jun, 2015 1 commit
-
-
When updating the index during a diff, preserve the original mode, which prevents us from dropping the mode to what we have interpreted as on our system (eg, what the working directory claims it to be, which may be a lie on some systems.)
Edward Thomson committed
-
- 28 May, 2015 7 commits
-
-
Mark the `old_file` and `new_file` sides of a delta with a new bit, `GIT_DIFF_FLAG_EXISTS`, that introduces that a particular side of the delta exists in the diff. This is useful for indicating whether a working directory item exists or not, in the presence of a conflict. Diff users may have previously used DELETED to determine this information.
Edward Thomson committed -
Edward Thomson committed
-
It's not always obvious the mapping between stage level and conflict-ness. More importantly, this can lead otherwise sane people to write constructs like `if (!git_index_entry_stage(entry))`, which (while technically correct) is unreadable. Provide a nice method to help avoid such messy thinking.
Edward Thomson committed -
Since a diff entry only concerns a single entry, zero the information for the index side of a conflict. (The index entry would otherwise erroneously include the lowest-stage index entry - generally the ancestor of a conflict.) Test that during status, the index side of the conflict is empty.
Edward Thomson committed -
Make sure that we provide a blanked nitem side when the item does not exist in the working directory.
Edward Thomson committed -
When diffing against an index, return a new `GIT_DELTA_CONFLICTED` delta type for items that are conflicted. For a single file path, only one delta will be produced (despite the fact that there are multiple entries in the index). Index iterators now have the (optional) ability to return conflicts in the index. Prior to this change, they would be omitted, and callers (like diff) would omit conflicted index entries entirely.
Edward Thomson committed -
Wrap the iterator current / advance functions so that we can extend them, but also handle GIT_ITEROVER cases in the iterator funcs instead of the callers.
Edward Thomson committed
-
- 04 May, 2015 1 commit
-
-
When checking out with a case-insensitive working directory, we want to change the case of items in the working directory to reflect changes that occured in the checkout target. Diff now has an option to break case-changing renames into delete/add.
Edward Thomson committed
-
- 15 Apr, 2015 1 commit
-
-
In the prior implementation, enabling GIT_DIFF_UPDATE_INDEX would overwrite entries in the index with the ones generated from scanning the working if the OID was the same. Because this OID comparison ignores file modes, this means an file in the workdir with only an exec bit difference with the one in the index would end up being overwritten, resulting in the exec bit being loss. There might be other related bugs but the fix of comparing OIDs and file modes should address them all.
Pierre-Olivier Latour committed
-