Commits · 7d90637069e03567a07b57ccbe4cf728ab823644 · lvzhengyang / git2

09 Feb, 2018 7 commits
- odb: fix memory leaks due to not freeing hash context · a43bcd2c
  Patrick Steinhardt committed 6 years ago
  
  a43bcd2c Browse File
- odb: error when we can't create object header · 619f61a8
```
Return an error to the caller when we can't create an object header for
some reason (printf failure) instead of simply asserting.
```
  Edward Thomson committed 6 years ago
  619f61a8 Browse File
- odb: assert on logic errors when writing objects · 7ec7aa4a
```
There's no recovery possible if we're so confused or corrupted that
we're trying to overwrite our memory.  Simply assert.
```
  Edward Thomson committed 6 years ago
  7ec7aa4a Browse File
- git_odb__hashfd: propagate error on failures · 138e4c2b
  Edward Thomson committed 6 years ago
  
  138e4c2b Browse File
- git_odb__hashobj: provide errors messages on failures · 35ed256b
```
Provide error messages on hash failures: assert when given invalid
input instead of failing with a user error; provide error messages
on program errors.
```
  Edward Thomson committed 6 years ago
  35ed256b Browse File
- odb: check for alloc errors on hardcoded objects · 59d99adc
```
It's unlikely that we'll fail to allocate a single byte, but let's check
for allocation failures for good measure.  Untangle `-1` being a marker
of not having found the hardcoded odb object; use that to reflect actual
errors.
```
  Edward Thomson committed 6 years ago
  59d99adc Browse File
- odb: error when we can't alloc an object · ef902864
```
At the moment, we're swallowing the allocation failure.  We need to
return the error to the caller.
```
  Edward Thomson committed 6 years ago
  ef902864 Browse File
02 Feb, 2018 1 commit

odb: provide length and type with streaming read · 97f9a5f0

The streaming read functionality should provide the length and the type
of the object, like the normal read functionality does.

committed 6 years ago

97f9a5f0 Browse File

26 Jan, 2018 1 commit

odb: reject reading and writing null OIDs · 275f103d

The null OID (hash with all zeroes) indicates a missing object in
upstream git and is thus not a valid object ID. Add defensive
measurements to avoid writing such a hash to the object database in the
very unlikely case where some data results in the null OID. Furthermore,
add shortcuts when reading the null OID from the ODB to avoid ever
returning an object when a faulty repository may contain the null OID.

committed 6 years ago

275f103d Browse File

03 Jul, 2017 1 commit

Make sure to always include "common.h" first · 0c7f49dd

Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.

This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.

This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.

committed 7 years ago

0c7f49dd Browse File

12 Jun, 2017 1 commit

odb_read_prefix: reset error in backends loop · cb3010c5

When looking for an object by prefix, we query all the backends so that
we can ensure that there is no ambiguity.  We need to reset the `error`
value between backends; otherwise the first backend may find an object
by prefix, but subsequent backends may not.  If we do not reset the
`error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1`
will fail, despite having actually found an object.

committed 7 years ago

cb3010c5 Browse File

15 May, 2017 2 commits

odb: fix printf formatter for git_off_t · 8d93a11c

The fields `declared_size` and `received_bytes` of the `git_odb_stream`
are both of type `git_off_t` which is defined as a signed integer. When
passing these values to a printf-style string in
`git_odb_stream__invalid_length`, though, we format these as PRIuZ,
which is unsigned.

Fix the issue by using PRIdZ instead, silencing warnings on macOS.

committed 7 years ago

8d93a11c Browse File

odb: shut up gcc warnings regarding uninitilized variables · 7776db51

The `error` variable is used as a return value in the out-section of
both `odb_read_1` and `read_prefix_1`. While the value will actually
always be initialized inside of this section, GCC fails to realize this
due to interactions with the `found` variable: if `found` is set, the
error will always be initialized. If it is not, we return early without
reaching the out-statements.

Shut up the warnings by initializing the error variable, even though it
is unnecessary.

committed 7 years ago

7776db51 Browse File

28 Apr, 2017 4 commits

odb: verify hashes in read_prefix_1 · e0973bc0

While the function reading an object from the complete OID already
verifies OIDs, we do not yet do so for reading objects from a partial
OID. Do so when strict OID verification is enabled.

committed 7 years ago

e0973bc0 Browse File

odb: improve error handling in read_prefix_1 · 14109620

The read_prefix_1 function has several return statements springled
throughout the code. As we have to free memory upon getting an error,
the free code has to be repeated at every single retrun -- which it is
not, so we have a memory leak here.

Refactor the code to use the typical `goto out` pattern, which will free
data when an error has occurred. While we're at it, we can also improve
the error message thrown when multiple ambiguous prefixes are found. It
will now include the colliding prefixes.

committed 7 years ago

14109620 Browse File

odb: add option to turn off hash verification · 35079f50

Verifying hashsums of objects we are reading from the ODB may be costly
as we have to perform an additional hashsum calculation on the object.
Especially when reading large objects, the penalty can be as high as
35%, as can be seen when executing the equivalent of `git cat-file` with
and without verification enabled. To mitigate for this, we add a global
option for libgit2 which enables the developer to turn off the
verification, e.g. when he can be reasonably sure that the objects on
disk won't be corrupted.

committed 7 years ago

35079f50 Browse File

odb: verify object hashes · 28a0741f

The upstream git.git project verifies objects when looking them up from
disk. This avoids scenarios where objects have somehow become corrupt on
disk, e.g. due to hardware failures or bit flips. While our mantra is
usually to follow upstream behavior, we do not do so in this case, as we
never check hashes of objects we have just read from disk.

To fix this, we create a new error class `GIT_EMISMATCH` which denotes
that we have looked up an object with a hashsum mismatch. `odb_read_1`
will then, after having read the object from its backend, hash the
object and compare the resulting hash to the expected hash. If hashes do
not match, it will return an error.

This obviously introduces another computation of checksums and could
potentially impact performance. Note though that we usually perform I/O
operations directly before doing this computation, and as such the
actual overhead should be drowned out by I/O. Running our test suite
seems to confirm this guess. On a Linux system with best-of-five
timings, we had 21.592s with the check enabled and 21.590s with the
ckeck disabled. Note though that our test suite mostly contains very
small blobs only. It is expected that repositories with bigger blobs may
notice an increased hit by this check.

In addition to a new test, we also had to change the
odb::backend::nonrefreshing test suite, which now triggers a hashsum
mismatch when looking up the commit "deadbeef...". This is expected, as
the fake backend allocated inside of the test will return an empty
object for the OID "deadbeef...", which will obviously not hash back to
"deadbeef..." again. We can simply adjust the hash to equal the hash of
the empty object here to fix this test.

committed 7 years ago

28a0741f Browse File

03 Mar, 2017 1 commit
- git_commit_create: freshen tree objects in commit · 52d03f37
```
Freshen the tree object that a commit points to during commit time.
```
  Edward Thomson committed 7 years ago
  52d03f37 Browse File
02 Mar, 2017 1 commit
- Honor `core.fsyncObjectFiles` · 1c04a96b
  Edward Thomson committed 7 years ago
  
  1c04a96b Browse File
29 Dec, 2016 1 commit

giterr_set: consistent error messages · 909d5494

Error messages should be sentence fragments, and therefore:

1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one

committed 8 years ago

909d5494 Browse File

14 Nov, 2016 1 commit
- common: cast precision specifiers to int · 901434b0
  Patrick Steinhardt committed 8 years ago
  
  901434b0 Browse File
05 Aug, 2016 1 commit

odb: only provide the empty tree · becadafc

Only provide the empty tree internally, which matches git's behavior.
If we provide the empty blob then any users trying to write it with
libgit2 would omit it from actually landing in the odb, which appear
to git proper as a broken repository (missing that object).

committed 8 years ago

becadafc Browse File

04 Aug, 2016 1 commit

odb: freshen existing objects when writing · 8f09a98e

When writing an object, we calculate its OID and see if it exists in the
object database.  If it does, we need to freshen the file that contains
it.

committed 8 years ago

8f09a98e Browse File

20 Jun, 2016 1 commit
- fix error message SHA truncation in git_odb__error_notfound() · 2076d329
  Sim Domingo committed 8 years ago
  
  2076d329 Browse File
26 May, 2016 1 commit

delta: move delta application to delta.c · 6a2d2f8a

Move the delta application functions into `delta.c`, next to the
similar delta creation functions.  Make the `git__delta_apply`
functions adhere to other naming and parameter style within the
library.

committed 8 years ago

6a2d2f8a Browse File

09 Mar, 2016 4 commits
- odb: Try to lookup headers in all backends before passthrough · 1bbcb2b2
  Vicent Marti committed 8 years ago
  
  1bbcb2b2 Browse File
- odb: Refactor `git_odb_expand_ids` · e78d2ac9
  Vicent Marti committed 8 years ago
  
  e78d2ac9 Browse File
- odb: Implement new helper to read types without refreshing · 4416aa77
  Vicent Marti committed 8 years ago
  
  4416aa77 Browse File
- odb: Handle corner cases in `git_odb_expand_ids` · 9a786650
```
The old implementation had two issues:

1. OIDs that were too short as to be ambiguous were not being handled
properly.

2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
```
  Vicent Marti committed 8 years ago
  9a786650 Browse File
08 Mar, 2016 2 commits
- git_odb_expand_ids: accept git_odb_expand_id array · 62484f52
```
Take (and write to) an array of a struct, `git_odb_expand_id`.
```
  Edward Thomson committed 8 years ago
  62484f52 Browse File
- git_odb_expand_ids: rename func, return the type · 4b1f0f79
  Edward Thomson committed 8 years ago
  
  4b1f0f79 Browse File
07 Mar, 2016 2 commits

git_odb_exists_many_prefixes: query odb for multiple short ids · 6c04269c
```
Query the object database for multiple objects at a time, given their
object ID (which may be abbreviated) and optional type.
```
Edward Thomson committed 8 years ago
6c04269c Browse File

odb: improved not found error messages · e10144ae

When looking up an abbreviated oid, show the actual (abbreviated) oid
the caller passed instead of a full (but ambiguously truncated) oid.

committed 8 years ago

e10144ae Browse File

14 Oct, 2015 2 commits

odb: Prioritize alternate backends · a0a1b19a

For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.

Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.

committed 9 years ago

a0a1b19a Browse File

odb: Be smarter when refreshing backends · 43820f20

In the current implementation of ODB backends, each backend is tasked
with refreshing itself after a failed lookup. This is standard Git
behavior: we want to e.g. reload the packfiles on disk in case they have
changed and that's the reason we can't find the object we're looking
for.

This behavior, however, becomes pathological in repositories where
multiple alternates have been loaded. Given that each alternate counts
as a separate backend, a miss in the main repository (which can
potentially be very frequent in cases where object storage comes from
the alternate) will result in refreshing all its packfiles before we
move on to the alternate backend where the object will most likely be
found.

To fix this, the code in `odb.c` has been refactored as to perform the
refresh of all the backends externally, once we've verified that the
object is nowhere to be found.

If the refresh is successful, we then perform the lookup sequentially
through all the backends, skipping the ones that we know for sure
weren't refreshed (because they have no refresh API).

The on-disk pack backend has been adjusted accordingly: it no longer
performs refreshes internally.

committed 9 years ago

43820f20 Browse File

30 Sep, 2015 1 commit

refdb and odb backends must provide `free` function · d3b29fb9

As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.

Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.

committed 9 years ago

d3b29fb9 Browse File

29 Jun, 2015 1 commit
- odb: cast to long long for printf · e5f9df7b
  Edward Thomson committed 9 years ago
  
  e5f9df7b Browse File
02 Jun, 2015 1 commit
- Fixed build warnings on Xcode 6.1 · 9f3c18e2
  Pierre-Olivier Latour committed 9 years ago
  
  9f3c18e2 Browse File
13 May, 2015 2 commits

odb: reverse the default backend priorities · b0d7f329

We currently first look in the loose object dir and then in the packs
for objects. When performing operations on recent history this has a
higher likelihood of hitting, but when we deal with operations which
look further back into the past, we start spending a large amount of
time getting ENOTENT from `access`.

Reversing the priorities means that long-running operations can get to
their objects faster, as we can look at the index data we have in memory
(or rather mapped) to figure out whether we have an object, which is
faster than going out to the filesystem.

The packed backend already implements an optimistic read algorithm by
first looking at the packs we know about and only going out to disk to
referesh if the object is not found which means that in the case where
we do have the object (which will be in the majority for anything that
traverses the graph) we can avoid going to to disk entirely to determine
whether an object exists.

Operations which look at recent history may take a slight impact, but
these would be operations which look a lot less at object and thus take
less time regardless.

committed 9 years ago

b0d7f329 Browse File

odb: make the writestream's size a git_off_t · 77b339f7

Restricting files to size_t is a silly limitation. The loose backend
writes to a file directly, so there is no issue in using 63 bits for the
size.

We still assume that the header is going to fit in 64 bytes, which does
mean quite a bit smaller files due to the run-length encoding, but it's
still a much larger size than you would want Git to handle.

committed 9 years ago

77b339f7 Browse File