- 03 Aug, 2020 6 commits
-
-
As part of a push towards more inclusive language, git is reconsidering using "master" as the default branch name. As a first step, this setting will be configurable with the `init.defaultBranch` configuration option. Honor this during repository initialization. During initialization, we will create an initial branch: 1. Using the `initial_head` setting, if specified; 2. Using the `HEAD` configured in a template, if it exists; 3. Using the `init.defaultBranch` configuration option, if it is set; or 4. Using `master` in the absence of additional configuration.
Edward Thomson committed -
sanitizer ci: skip negotiate tests
Edward Thomson committed -
We don't build with SPNEGO enabled on our focal-based sanitizer builds, so we need to disable the negotiate tests.
Edward Thomson committed -
Add CI support for Memory and UndefinedBehavior Sanitizers
Edward Thomson committed -
Access HEAD via the refdb backends
Edward Thomson committed -
config_entries: Avoid excessive map operations
Edward Thomson committed
-
- 13 Jul, 2020 1 commit
-
-
When appending config entries, we currently always first get the currently existing map entry and then afterwards update the map to contain the current config value. In the common scenario where keys aren't being overridden, this is the best we can do. But in case a key gets set multiple times, then we'll also perform these two map operations. In extreme cases, hashing the map keys will thus start to dominate performance. Let's optimize the pattern by using a separately allocated map entry. Currently, we always put the current list entry into the map and update it to get any overridden multivar. As these list entries are also used to iterate config entries, we cannot update them in-place in the map and are thus forced to always set the map to contain the new entry. But with a separately allocated map entry, we can now create one once per config key and insert it into the map. Whenever appending a new config value with the same key, we can now just update the map entry in-place instead of having to replace the map entry completely. This reduces calls to the hashing function by half and trades the improved runtime for one more allocation per unique config key. Given that the refactoring arguably improves code readability by splitting concerns of the `config_entry_list` type and not having to track it in two different structures, this alone would already be reason enough to take the trade. Given a pathological case of a gitconfig with 100.000 repeated keys and a section of length 10.000 characters, this reduces runtime by half from approximately 14 seconds to 7 seconds as expected.
Patrick Steinhardt committed
-
- 12 Jul, 2020 22 commits
-
-
mwindow: set limit on number of open files
Edward Thomson committed -
lhchavez committed
-
In case where a branch is getting renamed, all HEADs of the main repository and of its worktrees that point to the old branch need to get updated to point to the new branch. We already do so and have a test for this, but the test only verifies that we're able to lookup the updated HEAD, not what it contains. Let's make the test more specific by verifying the updated HEAD also has the correct updated symbolic target.
Patrick Steinhardt committed -
With the last user of `git_reference__read_head` gone, let's remove it as it's been reading references without consulting the refdb backends.
Patrick Steinhardt committed -
The function `git_repository_head_for_worktree` currently uses `git_reference__read_head` to directly read a given worktree's HEAD from the filesystem. This is broken in case the repository uses a different refdb implementation than the filesystem-based one, so let's instead open the worktree as a real repository and use `git_reference_lookup`. This also fixes the case where the worktree's HEAD is not a symref, but a detached HEAD, which would have resulted in an error previously.
Patrick Steinhardt committed -
The function `git_repository_foreach_head` is broken, as it directly interacts with the on-disk representation of the reference database, thus assuming that no other refdb is used for the given repository. As this is an internal function only and all users have been replaced, let's remove this function.
Patrick Steinhardt committed -
We currently determine whether a branch is checked out via `git_repository_foreach_head`. As this function reads references directly from the disk, it breaks our refdb abstraction in case the repository uses a different reference backend implementation than the filesystem-based one. So let's use `git_repository_foreach_worktree` instead -- while it's less efficient, it is at least correct in all corner cases.
Patrick Steinhardt committed -
When renaming a reference, we need to iterate over every HEAD and potentially update it in case it is a symbolic reference pointing to the previous name of the renamed reference. Most importantly, this doesn't only include HEADs from the repo we're renaming the reference in, but we also need to iterate over HEADs from linked worktrees. In order to update the HEADs, we directly read them from the worktree's gitdir and thus assume that both repository and worktrees use the filesystem-based reference backend. But this breaks as soon as one got a repository with a different refdb and breaks our own abstractions. So let's instead update HEAD references via the refdb by first opening each worktree as a repository and then using the usual functions to read and update HEADs. This is a lot less efficient than the current code, but it's not like we can really help this: going via the refdb is mandatory.
Patrick Steinhardt committed -
Given a Git repository, it's non-trivial to iterate over all worktrees that are associated with it, including the "main" repository. This commit adds a new internal function `git_repository_foreach_worktree` that does this for us.
Patrick Steinhardt committed -
refdb: a set of preliminary refactorings for the reftable backend
Edward Thomson committed -
To determine whether another reflog entry needs to be written for HEAD on a reference update, we need to see whether HEAD directly or indirectly points to the reference we're updating. The resolve logic is currently completely unbounded except an error occurs, which effectively means that we'd be spinning forever in case we have a symref loop in the repository refdb. Let's fix the issue by using `git_refdb_resolve` instead, which is always bounded.
Patrick Steinhardt committed -
The refs code currently has a second implementation that resolves references in order to find any final symbolic reference pointing to a nonexistent target branch. As we've just extended `git_refdb_resolve` to also return such references, let's use that one instead in order to reduce code duplication.
Patrick Steinhardt committed -
In some cases, resolving references requires us to also know about the final symbolic reference that's pointing to a nonexistent branch, e.g. in an empty repository where the main branch is yet unborn but HEAD already points to it. Right now, the resolving logic is thus split up into two, where one is the new refdb implementation and the second one is an ad-hoc implementation inside "refs.c". Let's extend `git_refdb_resolve` to also return such final dangling references pointing to nonexistent branches so we can deduplicate the resolving logic.
Patrick Steinhardt committed -
Resolving of symbolic references is currently implemented inside the "refs" layer. As a result, it's hard to call this function from low-level parts that only have a refdb available, but no repository, as the "refs" layer always operates on the repository-level. So let's move the function into the generic "refdb" implementation to lift this restriction.
Patrick Steinhardt committed -
CMake modernization pt2
Patrick Steinhardt committed -
There's two tests that create a commit signature, but never make any use of it. Let's remove these to avoid any confusion.
Patrick Steinhardt committed -
The logic to determine whether a reflog entry should be for the HEAD reference is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause a entry to be written to the HEAD reflog as a preparatory step.
Patrick Steinhardt committed -
The logic to determine whether a reflog should be written is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause a reflog to be written as a preparatory step.
Patrick Steinhardt committed -
In the past, we've imported the CheckPrototypeDefinition into our own module directory as it wasn't yet available in all supported CMake versions. Now that we require at least CMake v3.5, we don't need to bundle it anymore as it's included with the distribution already. Let's drop the included modules and always use upstream's version.
Patrick Steinhardt committed -
We set up some compile definitions as part of our src/CMakeLists.txt. While the definitions are global, we really only need them as part of the git2internal target which compiles all the objects. Let's thus use `target_compile_definitions` instead of `add_definitions`.
Patrick Steinhardt committed -
Modern CMake is usually target-driven in that a target is first defined and then the likes of `target_sources`, `target_include_directories` etc. are used to further populate the target. We still use old-style CMake, where we first set up a set of variables and then populate the target in a single call. Let's migrate to modern CMake usage by starting to populate the sources of our git2internal target piece-by-piece. While this is a small step, it allows us to convert to target-based build instructions piece-by-piece.
Patrick Steinhardt committed -
We currently do not set up a project version within CMake, meaning that it can't be use by other projects including libgit2 as a sub-project and also not by other tools like IDEs. This commit changes this to always set up a project version, but instead of extracting it from the "version.h" header we now set it up directly. This is mostly to avoid mis-use of the previous `LIBGIT2_VERSION` variables, as we should now always use the `libgit2_VERSION` ones that are set up by CMake if one provides the "VERSION" keyword to the `project()` call. While this is one more moving target we need to adjust on releases, this commit also adjusts our release script to verify that the project version was incremented as expected.
Patrick Steinhardt committed
-
- 09 Jul, 2020 4 commits
-
-
This change adds two new build targets: MSan and UBSan. This is because even though OSS-Fuzz is great and adds a lot of coverage, it only does that for the fuzz targets, so the rest of the codebase is not necessarily run with the Sanitizers ever :( So this change makes sure that MSan/UBSan warnings don't make it into the codebase. As part of this change, the Ubuntu focal container is introduced. It builds mbedTLS and libssh2 as debug libraries into /usr/local and as MSan-enabled libraries into /usr/local/msan. This latter part is needed because MSan requires the binary and all its dependent libraries to be built with MSan support so that memory allocations and deallocations are tracked correctly to avoid false positives.
lhchavez committed -
Make the tests run cleanly under UndefinedBehaviorSanitizer
Edward Thomson committed -
Make the tests pass cleanly with MemorySanitizer
Edward Thomson committed -
Enable building git2.rc resource script with GCC
Edward Thomson committed
-
- 02 Jul, 2020 1 commit
-
-
Make NTLMClient Memory and UndefinedBehavior Sanitizer-clean
Edward Thomson committed
-
- 01 Jul, 2020 3 commits
-
-
Fix the default LIBGIT2_FILENAME for GNU windres
Alexander Ovchinnikov committed -
Alexander Ovchinnikov committed
-
Alexander Ovchinnikov committed
-
- 30 Jun, 2020 3 commits
-
-
This change makes the code pass the libgit2 tests cleanly when MSan/UBSan are enabled. Notably: * Changes malloc/memset combos into calloc for easier auditing. * Makes `write_buf` return early if the buffer length is empty to avoid arithmetic with NULL pointers (which UBSan does not like). * Initializes a few arrays that were sometimes being read before being written to.
lhchavez committed -
This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
lhchavez committed -
This change makes the tests run cleanly under `-fsanitize=undefined,nullability` and comprises of: * Avoids some arithmetic with NULL pointers (which UBSan does not like). * Avoids an overflow in a shift, due to an uint8_t being implicitly converted to a signed 32-bit signed integer after being shifted by a 32-bit signed integer. * Avoids a unaligned read in libgit2. * Ignores unaligned reads in the SHA1 library, since it only happens on Intel processors, where it is _still_ undefined behavior, but the semantics are moderately well-understood. Of notable omission is `-fsanitize=integer`, since there are lots of warnings in zlib and the SHA1 library which probably don't make sense to fix and I could not figure out how to silence easily. libgit2 itself also has ~100s of warnings which are mostly innocuous (e.g. use of enum constants that only fit on an `uint32_t`, but there is no way to do that in a simple fashion because the data type chosen for enumerated types is implementation-defined), and investigating whether there are worrying warnings would need reducing the noise significantly.
lhchavez committed
-