Commits · 62d492dee448d98ef61d33680bbd7de614ce8fd8 · lvzhengyang / git2

26 Sep, 2021 1 commit

oidarray: introduce `git_oidarray_dispose` · 0bd132ab

Since users are disposing the _contents_ of the oidarray, not freeing
the oidarray itself, the proper cleanup function is
`git_oidarray_dispose`.  Deprecate `git_oidarray_free`.
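
The distinction between disposing contents and freeing the container can be sketched as follows. This is a minimal model, not libgit2 code: `demo_oid`, `demo_oidarray`, `demo_oidarray_fill` and `demo_oidarray_dispose` are hypothetical stand-ins, and the array struct is assumed to be owned by the caller (for instance, living on the stack).

```c
#include <stdlib.h>

/* Hypothetical stand-ins for an object id and an id array; not the libgit2 types. */
typedef struct { unsigned char id[20]; } demo_oid;
typedef struct { demo_oid *ids; size_t count; } demo_oidarray;

/* Fill a caller-owned array with `count` zeroed ids; returns 0, or -1 on allocation failure. */
static int demo_oidarray_fill(demo_oidarray *out, size_t count)
{
	out->count = 0;
	out->ids = calloc(count, sizeof(demo_oid));
	if (out->ids == NULL && count > 0)
		return -1;
	out->count = count;
	return 0;
}

/* Release the contents only; the struct itself still belongs to the caller. */
static void demo_oidarray_dispose(demo_oidarray *array)
{
	if (array == NULL)
		return;
	free(array->ids);
	array->ids = NULL;
	array->count = 0;
}
```

A caller would declare a `demo_oidarray` locally, fill it, use it, and then dispose of it; the struct remains valid (and empty) afterwards.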

committed 3 years ago


15 Sep, 2021 1 commit

merge: Check file mode when resolving renames. · 479a38bf

When determining if ours or theirs changed, we check the oids but not
their respective file modes. This can lead to merges introducing incorrect
file mode changes (e.g., in a revert). A simple linear example might be:

commit A - introduces file `foo` with chmod 0755
commit B - updates some unrelated file
commit C - renames `foo` to `bar` and chmod 0644

If B is reverted, `bar` will unexpectedly acquire mode 0755.
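
The revert scenario can be illustrated with a small sketch. It assumes a simplified three-way rule in which the side that changed relative to the ancestor wins; `demo_entry`, `demo_entry_changed` and `demo_merged_mode` are hypothetical names, and conflicts (both sides changing) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index entry: a content id plus a file mode. */
typedef struct {
	unsigned char id[20];
	uint32_t mode; /* e.g. 0100755 or 0100644 */
} demo_entry;

/* An entry counts as changed if either its content or its mode differs. */
static bool demo_entry_changed(const demo_entry *ancestor, const demo_entry *side)
{
	return memcmp(ancestor->id, side->id, sizeof(ancestor->id)) != 0 ||
	       ancestor->mode != side->mode;
}

/* Simplified rule: if ours changed, keep ours; otherwise take theirs. */
static uint32_t demo_merged_mode(const demo_entry *ancestor,
                                 const demo_entry *ours,
                                 const demo_entry *theirs)
{
	if (demo_entry_changed(ancestor, ours))
		return ours->mode;
	return theirs->mode;
}
```

With identical content on all three sides, an ancestor and theirs at mode 0755, and ours at 0644, comparing ids alone would treat ours as unchanged and pick 0755; including the mode in the comparison keeps 0644.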

committed 3 years ago


09 Sep, 2021 1 commit
- Fix coding style for pointer · 379c4646
```
Make some syntax changes to follow the coding style.
```
  punkymaniac committed 3 years ago
27 Jul, 2021 2 commits

graph: Create `git_graph_reachable_from_any()` · ce5400cd

This change introduces a new API function
`git_graph_reachable_from_any()`, that answers the question whether a
commit is reachable from any of the provided commits through following
parent edges.

This function can take advantage of optimizations provided by the
existence of a `commit-graph` file, since it makes it faster to know
whether, given two commits X and Y, X cannot possibly be reachable
from Y.
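
A rough sketch of the idea, under stated assumptions: commits are modelled as array indices with at most two parents, and each carries a generation number that is strictly greater than that of every ancestor. `demo_commit` and `demo_reachable_from_any` are hypothetical and do not reflect the actual libgit2 implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_COMMITS 64
#define DEMO_NO_PARENT (-1)

/* Hypothetical commit node: parent indices plus a generation number. */
typedef struct {
	int parents[2];
	unsigned generation; /* assumed strictly greater than any ancestor's */
} demo_commit;

/* Is `target` reachable from any commit in `starts` by following parent edges? */
static bool demo_reachable_from_any(const demo_commit *graph, size_t graph_len,
                                    int target, const int *starts, size_t starts_len)
{
	bool seen[DEMO_MAX_COMMITS] = { false };
	int stack[DEMO_MAX_COMMITS];
	size_t top = 0, i;

	if (graph_len > DEMO_MAX_COMMITS)
		return false;

	for (i = 0; i < starts_len; i++) {
		if (!seen[starts[i]]) {
			seen[starts[i]] = true;
			stack[top++] = starts[i];
		}
	}

	while (top > 0) {
		int current = stack[--top];
		int p;

		if (current == target)
			return true;

		/* Pruning: a commit whose generation is not above the target's
		 * cannot have the target among its ancestors. */
		if (graph[current].generation <= graph[target].generation)
			continue;

		for (p = 0; p < 2; p++) {
			int parent = graph[current].parents[p];
			if (parent != DEMO_NO_PARENT && !seen[parent]) {
				seen[parent] = true;
				stack[top++] = parent;
			}
		}
	}
	return false;
}
```

Without generation numbers the walk would have to continue down to the root commits before it could answer "no"; with them, whole subgraphs are skipped.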

Part of: #5757

committed 3 years ago


commit-graph: Introduce `git_commit_list_generation_cmp` · 6f544140

This change makes calculations of merge-bases a bit faster when there
are complex graphs and the commit times cause visiting nodes multiple
times. This is done by visiting the nodes in the graph in reverse
generation order when the generation number is available instead of
commit timestamp. If the generation number is missing in any pair of
commits, it can safely fall back to the old heuristic with no negative
side-effects.
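
The comparison described above might look roughly like the following. This is a sketch only: `demo_commit_info` and `demo_generation_cmp` are hypothetical, a generation of 0 is assumed to mean "unknown", and the old heuristic is assumed to visit newer commit times first.

```c
#include <stdint.h>

/* Hypothetical per-commit data used for ordering. */
typedef struct {
	uint32_t generation; /* assumption: 0 means "not available" */
	int64_t time;        /* commit timestamp */
} demo_commit_info;

/* Returns <0 if `a` should be visited before `b`, >0 if after, 0 if equal. */
static int demo_generation_cmp(const demo_commit_info *a, const demo_commit_info *b)
{
	/* If either generation is missing, fall back to the timestamp heuristic. */
	if (a->generation == 0 || b->generation == 0) {
		if (a->time != b->time)
			return a->time > b->time ? -1 : 1;
		return 0;
	}

	/* Reverse generation order: higher generations are visited first. */
	if (a->generation != b->generation)
		return a->generation > b->generation ? -1 : 1;
	return 0;
}
```

Because the fallback is decided per pair, mixing commits with and without generation numbers does not necessarily yield a strict total order; the sketch only illustrates the pairwise decision.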

Part of: #5757

committed 3 years ago


19 Jul, 2021 1 commit
- merge: don't try to malloc(0) · 31e84edb
  Edward Thomson committed 3 years ago
  
03 Mar, 2021 1 commit
- merge: Check insert_head_ids error in create_virtual_base · dc1095a5
```
insert_head_ids can fail due to allocation error
```
  panda committed 3 years ago
27 Nov, 2020 1 commit
- merge: use GIT_ASSERT · c59fbafd
  Edward Thomson committed 4 years ago
  
09 Jun, 2020 2 commits

tree-wide: do not compile deprecated functions with hard deprecation · c6184f0c

When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor
definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h"
header to be empty. As a result, no function declarations are made
available to callers, but the implementations are still available to
link against. This has the problem that function declarations also
aren't visible to the implementations, meaning that the symbol's
visibility will not be set up correctly. As a result, the resulting
library may not expose those deprecated symbols at all on some platforms
and thus cause linking errors.

Fix the issue by conditionally compiling deprecated functions, only.
While it becomes impossible to link against such a library in case one
uses deprecated functions, distributors of libgit2 aren't expected to
pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually
define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real"
hard deprecation still makes sense in the context of CI to test we don't
use deprecated symbols ourselves and in case a dependant uses libgit2 in
a vendored way and knows it won't ever use any of the deprecated symbols
anyway.
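
Conditional compilation of a deprecated function can be sketched like this. The names are hypothetical (`DEMO_DEPRECATE_HARD`, `demo_list_dispose`, `demo_list_free`); only the pattern of guarding both the declaration and the implementation with the same macro is taken from the description above.

```c
#include <stdlib.h>

/* Hypothetical container type used only for this illustration. */
typedef struct { int *items; } demo_list;

/* Current API: always compiled. */
static void demo_list_dispose(demo_list *list)
{
	free(list->items);
	list->items = NULL;
}

#ifndef DEMO_DEPRECATE_HARD
/* Deprecated alias: with hard deprecation enabled, neither a declaration
 * nor an implementation exists, so any remaining use fails to build. */
static void demo_list_free(demo_list *list)
{
	demo_list_dispose(list);
}
#endif
```

Building with `-DDEMO_DEPRECATE_HARD` removes `demo_list_free` entirely, which is what makes the setting useful in CI for catching internal uses of deprecated symbols.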

committed 4 years ago


tree-wide: mark local functions as static · a6c9e0b3

We've accumulated quite some functions which are never used outside of
their respective code unit, but which are lacking the `static` keyword.
Add it to reduce their linkage scope and allow the compiler to optimize
better.

committed 4 years ago


01 Jun, 2020 1 commit
- git_pool_init: handle failure cases · 0f35efeb
```
Propagate failures caused by pool initialization errors.
```
  Edward Thomson committed 4 years ago
01 Apr, 2020 1 commit

merge: cache negative cache results for similarity metrics · 4dfcc50f

When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with the above heuristic arises when a file does _not_ have at
least four lines, where a line is anything separated by a consecutive
run of "\n" or "\0" characters. For example "a\nb" is two lines, but
"a\0\0b" is also just two lines. Taken to the extreme, a file that
consists of megabytes of consecutive space or NUL characters only may
also be deemed too small and thus not get cached. As a result, we will
repeatedly load its blob and calculate its hash signature, just to
finally throw it away as we notice it's not of any value. When you've
got a comparatively big file that you compare against a big set of
potentially renamed files, then the cost simply explodes.

The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should help the "normal" case already
where you have a lot of small files as rename candidates, but in the
above scenario its savings are extraordinarily high.

To verify we do not hit the issue anymore with the described solution, this
commit adds a test that uses the exact same setup described above with
one 50 megabyte blob of '\0' characters and 1000 other files that get
renamed. Without the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real    11m48.377s
user    11m11.576s
sys     0m35.187s

And with the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real    0m1.972s
user    0m1.851s
sys     0m0.118s

So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
numbers were chosen in a way that one will immediately notice as soon as
the bug resurfaces.
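
The negative cache entry described in this commit can be modelled with a three-state cache slot. This is a simplified sketch: the line count stands in for a real hash signature, and all `demo_` names are hypothetical rather than taken from libgit2.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MIN_LINES 4

typedef enum {
	DEMO_CACHE_EMPTY = 0,   /* never computed */
	DEMO_CACHE_SIGNATURE,   /* usable signature cached */
	DEMO_CACHE_TOO_SMALL    /* negative entry: do not recompute */
} demo_cache_state;

typedef struct {
	demo_cache_state state;
	size_t lines; /* stand-in for a real hash signature */
} demo_cache_entry;

static size_t demo_hash_calls; /* counts how often the expensive scan runs */

/* Count lines, where any run of '\n' or '\0' separates two lines. */
static size_t demo_count_lines(const char *data, size_t len)
{
	size_t lines = 0, i;
	bool in_line = false;

	demo_hash_calls++;
	for (i = 0; i < len; i++) {
		bool sep = (data[i] == '\n' || data[i] == '\0');
		if (!sep && !in_line)
			lines++;
		in_line = !sep;
	}
	return lines;
}

/* Returns true if a usable signature exists; scans the data at most once. */
static bool demo_signature_lookup(demo_cache_entry *entry, const char *data, size_t len)
{
	if (entry->state == DEMO_CACHE_EMPTY) {
		entry->lines = demo_count_lines(data, len);
		entry->state = entry->lines >= DEMO_MIN_LINES
			? DEMO_CACHE_SIGNATURE : DEMO_CACHE_TOO_SMALL;
	}
	return entry->state == DEMO_CACHE_SIGNATURE;
}
```

A blob consisting only of NUL bytes yields zero lines, so the first lookup stores the negative marker and every later lookup returns immediately without rescanning the data.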

committed 4 years ago


22 Nov, 2019 1 commit
- blob: use `git_object_size_t` for object size · 4334b177
```
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
```
  Edward Thomson committed 5 years ago
10 Oct, 2019 1 commit

refs: fix locks getting forcibly removed · 3335a034

The flag GIT_FILEBUF_FORCE currently does two things:

     1. It will cause the filebuf to create non-existing leading
        directories for the file that is about to be written.
     2. It will forcibly remove any pre-existing locks.

While most call sites actually do want (1), they do not want to
remove pre-existing locks, as that renders the locking mechanisms
effectively useless.

Introduce a new flag `GIT_FILEBUF_CREATE_LEADING_DIRS` to
separate both behaviours cleanly from each other and convert
callers to use it instead of `GIT_FILEBUF_FORCE` to have them
honor locked files correctly.

As this conversion removes all current users of `GIT_FILEBUF_FORCE`,
this commit removes the flag altogether.
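
The effect of separating the two behaviours can be sketched with a toy model of the filesystem state. Everything here is hypothetical (`demo_fs`, `demo_filebuf_open`, `DEMO_FILEBUF_CREATE_LEADING_DIRS`); the point is only that the remaining flag controls directory creation and nothing can override an existing lock.

```c
#include <stdbool.h>

/* Hypothetical flag set: only directory creation remains, no "force". */
enum {
	DEMO_FILEBUF_CREATE_LEADING_DIRS = 1u << 0
};

/* Toy model of the on-disk state relevant to opening a locked file. */
typedef struct {
	bool lock_exists;
	bool dirs_created;
} demo_fs;

/* Returns 0 on success, or -1 if somebody else already holds the lock. */
static int demo_filebuf_open(demo_fs *fs, unsigned flags)
{
	if (flags & DEMO_FILEBUF_CREATE_LEADING_DIRS)
		fs->dirs_created = true;

	/* A pre-existing lock is always honored, whatever the flags say. */
	if (fs->lock_exists)
		return -1;

	fs->lock_exists = true;
	return 0;
}
```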

committed 5 years ago


23 Aug, 2019 1 commit

merge: check return value of `git_commit_list_insert` · d4fe402b

The function `git_commit_list_insert` dynamically allocates memory and
may thus fail to insert a given commit, but we didn't check for that in
several places in "merge.c".

Convert surrounding functions to return error codes and check whether
`git_commit_list_insert` was successful, returning an error if not.
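
The pattern of checking an allocating insert and propagating the failure can be sketched as below. The list type and functions are simplified, hypothetical stand-ins (commits are plain integers here), not the definitions from "merge.c".

```c
#include <stdlib.h>

/* Hypothetical singly linked list of commits, identified by an int. */
typedef struct demo_commit_list {
	int commit;
	struct demo_commit_list *next;
} demo_commit_list;

/* Prepend a commit; returns the new node, or NULL if allocation fails. */
static demo_commit_list *demo_commit_list_insert(int commit, demo_commit_list **list)
{
	demo_commit_list *node = malloc(sizeof(*node));
	if (node == NULL)
		return NULL;
	node->commit = commit;
	node->next = *list;
	*list = node;
	return node;
}

static void demo_commit_list_free(demo_commit_list **list)
{
	while (*list != NULL) {
		demo_commit_list *next = (*list)->next;
		free(*list);
		*list = next;
	}
}

/* A caller converted to return an error code instead of ignoring failures. */
static int demo_collect(demo_commit_list **out, const int *commits, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (demo_commit_list_insert(commits[i], out) == NULL) {
			demo_commit_list_free(out);
			return -1;
		}
	}
	return 0;
}
```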

committed 5 years ago


24 Jun, 2019 1 commit
- merge: safely cast size of merged file for index · 9a6992c4
```
Explicitly truncate the file size to a `uint32_t`.
```
  Edward Thomson committed 5 years ago
14 Jun, 2019 1 commit

Rename opt init functions to `options_init` · 0b5ba0d7

In libgit2 nomenclature, when we need to verb a direct object, we name
a function `git_directobject_verb`.  Thus, if we need to init an options
structure named `git_foo_options`, then the name of the function that
does that should be `git_foo_options_init`.

The previous name, `git_foo_init_options`, is close - it _sounds_ as
if it's initializing the options of a `foo`, but in fact
`git_foo_options` is its own noun that should be respected.

Deprecate the old names; they'll now call directly to the new ones.
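
Using the placeholder `foo` from the message, the renaming pattern might look like this. The struct layout and version constant are assumptions made for illustration; only the naming convention and the "old name calls the new one" relationship come from the text.

```c
#include <stddef.h>

#define DEMO_FOO_OPTIONS_VERSION 1

/* Hypothetical options structure for an imaginary `foo` subsystem. */
typedef struct {
	unsigned int version;
	unsigned int flags;
} demo_foo_options;

/* New name: the noun `foo_options` followed by the verb `init`. */
static int demo_foo_options_init(demo_foo_options *opts, unsigned int version)
{
	if (opts == NULL || version != DEMO_FOO_OPTIONS_VERSION)
		return -1;
	opts->version = version;
	opts->flags = 0;
	return 0;
}

/* Deprecated name: kept for compatibility, forwards to the new function. */
static int demo_foo_init_options(demo_foo_options *opts, unsigned int version)
{
	return demo_foo_options_init(opts, version);
}
```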

committed 5 years ago


10 Jun, 2019 1 commit
- merge: analysis support for bare repositories · 6d2ab2cf
  Robert Coup committed 5 years ago
  
15 Feb, 2019 3 commits

oidmap: introduce high-level setter for key/value pairs · 2e0a3048

Currently, one would use either `git_oidmap_insert` to insert key/value pairs
into a map or `git_oidmap_put` to insert a key only. These functions have
historically been macros, which is why their syntax is kind of weird: instead of
returning an error code directly, they instead have to be passed a pointer to
where the return value shall be stored. This does not match libgit2's common
idiom of directly returning error codes. Furthermore, `git_oidmap_put` is tightly
coupled with implementation details of the map as it exposes the index of
inserted entries.

Introduce a new function `git_oidmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert all trivial callers of
`git_oidmap_insert` and `git_oidmap_put` to make use of it.
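
The shape of such a setter, returning the error code directly rather than through an out-parameter, can be sketched with a deliberately tiny map. `demo_map` and `demo_map_set` are hypothetical; the fixed capacity and linear search replace the real hash table purely to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAP_CAPACITY 8

/* Hypothetical fixed-size map from a 32-bit key to an opaque value. */
typedef struct {
	uint32_t keys[DEMO_MAP_CAPACITY];
	void *values[DEMO_MAP_CAPACITY];
	size_t count;
} demo_map;

/* Insert or overwrite; returns 0 on success, -1 if the map is full.
 * No index or iterator leaks out to the caller. */
static int demo_map_set(demo_map *map, uint32_t key, void *value)
{
	size_t i;

	for (i = 0; i < map->count; i++) {
		if (map->keys[i] == key) {
			map->values[i] = value;
			return 0;
		}
	}
	if (map->count == DEMO_MAP_CAPACITY)
		return -1;

	map->keys[map->count] = key;
	map->values[map->count] = value;
	map->count++;
	return 0;
}
```

Callers can then use the usual `if (demo_map_set(...) < 0) goto on_error;` style of error handling.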

committed 5 years ago


oidmap: introduce high-level getter for values · 9694ef20

The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites errors if the correct
error checking sequence is not followed.

Introduce a new high-level function `git_oidmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.
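
A getter with this contract (value pointer on a hit, `NULL` on a miss) might be sketched as follows. `demo_pair`, `demo_table` and `demo_table_get` are hypothetical names, and the linear scan is a placeholder for the real hash lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key/value pair and read-only table built from such pairs. */
typedef struct {
	uint32_t key;
	const char *value;
} demo_pair;

typedef struct {
	const demo_pair *pairs;
	size_t count;
} demo_table;

/* Returns the value stored for `key`, or NULL if the key is absent.
 * The caller never sees an index into the table. */
static const char *demo_table_get(const demo_table *table, uint32_t key)
{
	size_t i;

	for (i = 0; i < table->count; i++) {
		if (table->pairs[i].key == key)
			return table->pairs[i].value;
	}
	return NULL;
}
```

One consequence of this design is that a stored `NULL` value cannot be told apart from a missing key, so it suits maps whose values are always non-NULL pointers.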

committed 5 years ago


maps: use uniform lifecycle management functions · 351eeff3

Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:

- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
  error code if we ran out of memory

- `void git_<t>map_free(git_<t>map *map)` to free a map

- `void git_<t>map_clear(git_<t>map *map)` to remove all entries from a map

This commit also fixes all existing callers.

committed 5 years ago


22 Jan, 2019 1 commit
- git_error: use new names in internal APIs and usage · f673e232
```
Move to the `git_error` name in the internal API for error-related
functions.
```
  Edward Thomson committed 5 years ago
01 Dec, 2018 1 commit
- object_type: use new enumeration names · 168fe39b
```
Use the new object_type enumeration names within the codebase.
```
  Edward Thomson committed 6 years ago
28 Nov, 2018 1 commit

khash: remove intricate knowledge of khash types · 852bc9f4

Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.

committed 6 years ago


20 Oct, 2018 1 commit
- merge: don't leak the index during reloads · 32b81661
  Edward Thomson committed 6 years ago
  
19 Oct, 2018 2 commits
- merge: assert that we're passed sane parameters · cb71a9ce
  Etienne Samson committed 6 years ago
  
- merge: make analysis possible against a non-HEAD reference · 6e9fb040
```
This moves the current merge analysis code into a more generic version
that can work against any reference.

Also change the tests to check returned analysis values exactly.
```
  Etienne Samson committed 6 years ago
10 Jun, 2018 1 commit
- Convert usage of `git_buf_free` to new `git_buf_dispose` · ecf4f33a
  Patrick Steinhardt committed 6 years ago
  
04 Feb, 2018 2 commits

merge: virtual commit should be last argument to merge-base · 1403c612

Our virtual commit must be the last argument to merge-base: since our
algorithm pushes _both_ parents of the virtual commit, it needs to be
the last argument, since merge-base:

> Given three commits A, B and C, git merge-base A B C will compute the
> merge base between A and a hypothetical commit M

We want to calculate the merge base between the actual commit ("two")
and the virtual commit ("one") - since one actually pushes its parents
to the merge-base calculation, we need to calculate the merge base of
"two" and the parents of one.

committed 6 years ago


merge: reverse merge bases for recursive merge · b924df1e

When the commits being merged have multiple merge bases, reverse the
order when creating the virtual merge base.  This is for compatibility
with git's merge-recursive algorithm, and ensures that we build
identical trees.

Git does this to try to use older merge bases first.  Per 8918b0c:

> It seems to be the only sane way to do it: when a two-head merge is
> done, and the merge-base and one of the two branches agree, the
> merge assumes that the other branch has something new.
>
> If we start creating virtual commits from newer merge-bases, and go
> back to older merge-bases, and then merge with newer commits again,
> chances are that a patch is lost, _because_ the merge-base and the
> head agree on it. Unlikely, yes, but it happened to me.

committed 6 years ago


21 Jan, 2018 1 commit

merge: recursive uses larger conflict markers · 185b0d08

Git uses longer conflict markers in the recursive merge base - two more
than the default (thus, 9 character long conflict markers). This allows
users to tell the difference between the recursive merge conflicts and
conflicts between the ours and theirs branches.
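
The marker length arithmetic can be sketched as below, assuming a default marker size of 7 characters (implied by "two more than the default" giving 9). `demo_marker_size` and `demo_write_marker` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DEFAULT_MARKER_SIZE 7

/* Markers in the recursive merge base are two characters longer. */
static size_t demo_marker_size(bool in_recursive_base)
{
	return DEMO_DEFAULT_MARKER_SIZE + (in_recursive_base ? 2 : 0);
}

/* Write `size` copies of `marker` plus a terminating NUL into `buf`.
 * Returns 0 on success, -1 if the buffer is too small. */
static int demo_write_marker(char *buf, size_t buf_len, char marker, size_t size)
{
	if (buf_len < size + 1)
		return -1;
	memset(buf, marker, size);
	buf[size] = '\0';
	return 0;
}
```

For the recursive case this yields `<<<<<<<<<` (nine characters) instead of the usual `<<<<<<<` (seven).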

This was introduced in git d694a17986a28bbc19e2a6c32404ca24572e400f.

Update our tests to expect this as well.

committed 6 years ago


11 Nov, 2017 2 commits

merge: add error handling for index reload · e8d373c4
```
Cleans up should git_repository_index or git_index_read fail
```
Etiene Dalcol committed 7 years ago

merge: reload index before git_merge · bb9e3797

If the index in memory is different from the index on the disk,
previously merge would abort with GIT_ECONFLICT.
Reload the index before merging to fix this.

Fixes #4203

committed 7 years ago


03 Jul, 2017 1 commit

Make sure to always include "common.h" first · 0c7f49dd

Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.

This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as their first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.

This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.

committed 7 years ago


21 Jun, 2017 1 commit

merge: fix potential free of uninitialized memory · 4dc87e72

The function `merge_diff_mark_similarity_exact` may error out early and,
when it does so, free the `ours_deletes_by_oid` and
`theirs_deletes_by_oid` variables. While the first one can never be
uninitialized due to the first call actually assigning to it, the second
variable can be freed without being initialized.

Fix the issue by initializing both variables to `NULL`.
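
The failure mode and its fix can be sketched with a simplified cleanup path. The names below (`demo_map_new`, `demo_mark_similarity`) are hypothetical; the structure mirrors the description: the first allocation may fail before the second pointer is ever assigned, yet both are freed on exit.

```c
#include <stdlib.h>

/* Hypothetical allocator: always assigns *out, NULL on (simulated) failure. */
static int demo_map_new(int **out, int should_fail)
{
	*out = should_fail ? NULL : malloc(sizeof(int));
	return *out != NULL ? 0 : -1;
}

/* Both pointers start as NULL, so the shared cleanup is safe on every path. */
static int demo_mark_similarity(int fail_first)
{
	int *ours_deletes = NULL, *theirs_deletes = NULL;
	int error;

	if ((error = demo_map_new(&ours_deletes, fail_first)) < 0)
		goto done;
	if ((error = demo_map_new(&theirs_deletes, 0)) < 0)
		goto done;

done:
	/* free(NULL) is a no-op; an uninitialized pointer here would be
	 * undefined behavior. */
	free(ours_deletes);
	free(theirs_deletes);
	return error;
}
```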

committed 7 years ago


17 May, 2017 1 commit

merge: perform exact rename detection in linear time · cee1e7af

The current exact rename detection has order n^2 complexity.
We can do better by using a map to first aggregate deletes and
using that to match deletes to adds.

This results in a substantial performance improvement for merges
with a large quantity of adds and deletes.
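
The map-based approach can be sketched as follows. This is an illustration under simplifying assumptions, not the libgit2 implementation: content ids are 32-bit integers, the table has a fixed size, and each add is matched to the first delete with the same id rather than consuming deletes one by one. All `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 64 /* power of two; must exceed the number of deletes */
#define DEMO_NO_MATCH ((size_t)-1)

/* Hypothetical open-addressing table from content id to delete index. */
typedef struct {
	uint32_t id[DEMO_TABLE_SIZE];
	size_t index[DEMO_TABLE_SIZE];
	unsigned char used[DEMO_TABLE_SIZE];
} demo_delete_table;

/* Find the slot holding `id`, or the empty slot where it would go. */
static size_t demo_slot(const demo_delete_table *t, uint32_t id)
{
	size_t slot = (id * 2654435761u) & (DEMO_TABLE_SIZE - 1);

	while (t->used[slot] && t->id[slot] != id)
		slot = (slot + 1) & (DEMO_TABLE_SIZE - 1);
	return slot;
}

/* For each add, store the index of a delete with identical content in
 * match_out, or DEMO_NO_MATCH. Returns -1 if there are too many deletes. */
static int demo_match_exact_renames(const uint32_t *deleted, size_t n_deleted,
                                    const uint32_t *added, size_t n_added,
                                    size_t *match_out)
{
	demo_delete_table table = { {0}, {0}, {0} };
	size_t i;

	if (n_deleted >= DEMO_TABLE_SIZE)
		return -1;

	/* Pass 1: aggregate deletes by content id. */
	for (i = 0; i < n_deleted; i++) {
		size_t slot = demo_slot(&table, deleted[i]);
		if (!table.used[slot]) {
			table.used[slot] = 1;
			table.id[slot] = deleted[i];
			table.index[slot] = i;
		}
	}

	/* Pass 2: one expected constant-time lookup per add. */
	for (i = 0; i < n_added; i++) {
		size_t slot = demo_slot(&table, added[i]);
		match_out[i] = table.used[slot] ? table.index[slot] : DEMO_NO_MATCH;
	}
	return 0;
}
```

Two linear passes replace the nested loop that compared every add against every delete.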

committed 7 years ago


23 Mar, 2017 1 commit
- merge: indentation fixup · b53d834f
  Edward Thomson committed 7 years ago
  
13 Feb, 2017 1 commit

repository: rename `path_repository` and `path_gitlink` · 84f56cb0

The `path_repository` variable is actually confusing to think
about, as it is not always clear what the repository actually is.
It may either be the path to the folder containing worktree and
.git directory, the path to .git itself, a worktree or something
entirely different. Actually, the intent of the variable is to
hold the path to the gitdir, which is either the .git directory
or the bare repository.

Rename the variable to `gitdir` to avoid confusion. While at it,
also rename `path_gitlink` to `gitlink` to improve consistency.

committed 7 years ago


09 Feb, 2017 1 commit
- merge: don't do rename detection on submodules · 95367366
  Edward Thomson committed 7 years ago
  
01 Jan, 2017 1 commit
- merge: set default rename threshold · 19ed4d0c
```
When `GIT_MERGE_FIND_RENAMES` is set, provide a default for
`rename_threshold` when it is unset.
```
  Edward Thomson committed 8 years ago
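
Defaulting an unset threshold can be sketched like this. The flag and struct are hypothetical stand-ins, and the default value of 50 is an assumption used for illustration rather than a value stated in this commit.

```c
/* Hypothetical flag and default; the value 50 is assumed for illustration. */
#define DEMO_MERGE_FIND_RENAMES (1u << 0)
#define DEMO_DEFAULT_RENAME_THRESHOLD 50

typedef struct {
	unsigned int flags;
	unsigned int rename_threshold; /* 0 means "unset" */
} demo_merge_options;

/* Fill in the default only when rename detection is on and no value was given. */
static void demo_normalize_options(demo_merge_options *opts)
{
	if ((opts->flags & DEMO_MERGE_FIND_RENAMES) && opts->rename_threshold == 0)
		opts->rename_threshold = DEMO_DEFAULT_RENAME_THRESHOLD;
}
```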