1. 09 Feb, 2018 7 commits
  2. 02 Feb, 2018 1 commit
  3. 26 Jan, 2018 1 commit
    • odb: reject reading and writing null OIDs · 275f103d
      The null OID (hash with all zeroes) indicates a missing object in
      upstream git and is thus not a valid object ID. Add defensive
      measures to avoid writing such a hash to the object database in the
      very unlikely case where some data results in the null OID. Furthermore,
      add shortcuts when reading the null OID from the ODB to avoid ever
      returning an object when a faulty repository may contain the null OID.
      Patrick Steinhardt committed
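
      A minimal sketch of the kind of guard this describes, using the public
      git_oid_iszero helper; the function name and error message are
      illustrative, not the actual odb.c code:

        #include <git2.h>

        /* Reject the all-zero OID before it reaches the object database. */
        static int check_not_null_oid(const git_oid *id)
        {
            if (git_oid_iszero(id)) {
                giterr_set_str(GITERR_ODB, "cannot read or write the null OID");
                return -1;
            }
            return 0;
        }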
  4. 03 Jul, 2017 1 commit
    • Make sure to always include "common.h" first · 0c7f49dd
      Next to including several files, our "common.h" header also declares
      various macros which are then used throughout the project. As such, we
      have to make sure to always include this file first in all
      implementation files. Otherwise, we might encounter problems or even
      silent behavioural differences due to macros or defines not being
      defined as they should be. So in fact, our header and implementation
      files should make sure to always include "common.h" first.
      
      This commit does so by establishing a common include pattern. Header
      files inside of "src" will now always include "common.h" as its first
      other file, separated by a newline from all the other includes to make
      it stand out as special. There are two cases for the implementation
      files. If they do have a matching header file, they will always include
      this one first, leading to "common.h" being transitively included as
      first file. If they do not have a matching header file, they instead
      include "common.h" as first file themselves.
      
      This fixes the outlined problems and will become our standard practice
      for header and source files inside of the "src/" from now on.
      Patrick Steinhardt committed
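
      A sketch of the include convention being established, using an invented
      pair of files for illustration:

        /* src/example.h -- headers under src/ include "common.h" first,
         * separated from the remaining includes. */
        #ifndef INCLUDE_example_h__
        #define INCLUDE_example_h__

        #include "common.h"

        #include "git2/types.h"

        #endif

        /* src/example.c -- the implementation includes its own header first,
         * so "common.h" is transitively included before anything else. */
        #include "example.h"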
  5. 12 Jun, 2017 1 commit
    • odb_read_prefix: reset error in backends loop · cb3010c5
      When looking for an object by prefix, we query all the backends so that
      we can ensure that there is no ambiguity.  We need to reset the `error`
      value between backends; otherwise the first backend may find an object
      by prefix, but subsequent backends may not.  If we do not reset the
      `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1`
      will fail, despite having actually found an object.
      Edward Thomson committed
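
      A simplified sketch of the loop shape after the fix; backend_read_prefix
      and the surrounding names are placeholders for the libgit2 internals,
      not actual API:

        #include <git2.h>
        #include <stdbool.h>

        /* Placeholder for the per-backend prefix lookup (not a real libgit2 call). */
        extern int backend_read_prefix(void **out, git_odb_backend *backend,
                                       const git_oid *short_id, size_t len);

        static int read_prefix_sketch(void **out, git_odb_backend **backends,
                                      size_t num_backends, const git_oid *short_id,
                                      size_t len)
        {
            bool found = false;
            int error = 0;
            size_t i;

            for (i = 0; i < num_backends; i++) {
                error = backend_read_prefix(out, backends[i], short_id, len);

                if (error == GIT_ENOTFOUND || error == GIT_PASSTHROUGH) {
                    /* Reset so a miss in this backend does not mask a hit
                     * found by an earlier backend. */
                    error = 0;
                    continue;
                }

                if (error < 0)
                    return error;

                found = true;   /* keep looping to detect ambiguity */
            }

            return found ? 0 : GIT_ENOTFOUND;
        }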
  6. 15 May, 2017 2 commits
    • odb: fix printf formatter for git_off_t · 8d93a11c
      The fields `declared_size` and `received_bytes` of the `git_odb_stream`
      are both of type `git_off_t` which is defined as a signed integer. When
      passing these values to a printf-style string in
      `git_odb_stream__invalid_length`, though, we format these as PRIuZ,
      which is unsigned.
      
      Fix the issue by using PRIdZ instead, silencing warnings on macOS.
      Patrick Steinhardt committed
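
      The before/after shape of the formatter change; giterr_set and the
      PRIuZ/PRIdZ macros are libgit2-internal (from "common.h"), and the
      message text is illustrative:

        /* before: unsigned formatter for the signed git_off_t fields */
        giterr_set(GITERR_ODB, "expected %"PRIuZ" bytes, got %"PRIuZ,
                   stream->declared_size, stream->received_bytes);

        /* after: signed formatter matches git_off_t, silencing the macOS warning */
        giterr_set(GITERR_ODB, "expected %"PRIdZ" bytes, got %"PRIdZ,
                   stream->declared_size, stream->received_bytes);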
    • odb: shut up gcc warnings regarding uninitialized variables · 7776db51
      The `error` variable is used as a return value in the out-section of
      both `odb_read_1` and `read_prefix_1`. While the value will actually
      always be initialized inside of this section, GCC fails to realize this
      due to interactions with the `found` variable: if `found` is set, the
      error will always be initialized. If it is not, we return early without
      reaching the out-statements.
      
      Shut up the warnings by initializing the error variable, even though it
      is unnecessary.
      Patrick Steinhardt committed
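
      The gist of the change, a redundant initialization whose only purpose is
      to quiet the warning:

        int error = 0;   /* redundant, but silences GCC's maybe-uninitialized warning */
        bool found = false;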
  7. 28 Apr, 2017 4 commits
    • odb: verify hashes in read_prefix_1 · e0973bc0
      While the function reading an object from the complete OID already
      verifies OIDs, we do not yet do so for reading objects from a partial
      OID. Do so when strict OID verification is enabled.
      Patrick Steinhardt committed
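
      A hedged sketch of the added check: once a prefix resolves to an object,
      re-hash its contents and compare against the resolved OID (the flag and
      variable names are illustrative, not libgit2 internals):

        if (strict_hash_verification) {
            git_oid computed;

            if ((error = git_odb_hash(&computed, data, len, type)) < 0)
                goto out;

            if (git_oid_cmp(&computed, &resolved_id) != 0) {
                error = GIT_EMISMATCH;
                goto out;
            }
        }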
    • odb: improve error handling in read_prefix_1 · 14109620
      The read_prefix_1 function has several return statements sprinkled
      throughout the code. As we have to free memory upon getting an error,
      the free code has to be repeated at every single return -- which it is
      not, so we have a memory leak here.
      
      Refactor the code to use the typical `goto out` pattern, which will free
      data when an error has occurred. While we're at it, we can also improve
      the error message thrown when multiple ambiguous prefixes are found. It
      will now include the colliding prefixes.
      Patrick Steinhardt committed
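
      The general shape of the `goto out` cleanup pattern the function is
      refactored to; do_lookup and do_verify are placeholders standing in for
      the real lookup and verification steps:

        #include <stdlib.h>

        extern int do_lookup(void **out);
        extern int do_verify(void *data);

        static int read_prefix_outline(void)
        {
            void *data = NULL;
            int error = 0;

            if ((error = do_lookup(&data)) < 0)
                goto out;

            error = do_verify(data);

        out:
            /* single cleanup point, reached on success and on every error path */
            free(data);
            return error;
        }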
    • odb: add option to turn off hash verification · 35079f50
      Verifying hashsums of objects we are reading from the ODB may be costly
      as we have to perform an additional hashsum calculation on the object.
      Especially when reading large objects, the penalty can be as high as
      35%, as can be seen when executing the equivalent of `git cat-file` with
      and without verification enabled. To mitigate this, we add a global
      option for libgit2 which enables the developer to turn off the
      verification, e.g. when they can be reasonably sure that the objects on
      disk won't be corrupted.
      Patrick Steinhardt committed
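
      A usage sketch of the opt-out; GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION
      is the option this commit introduces, and disabling it trades safety for
      speed:

        #include <git2.h>

        int main(void)
        {
            git_libgit2_init();

            /* Objects on disk are trusted: skip the extra hashing pass. */
            git_libgit2_opts(GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION, 0);

            /* ... object reads happen here without verification ... */

            git_libgit2_opts(GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION, 1);

            git_libgit2_shutdown();
            return 0;
        }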
    • odb: verify object hashes · 28a0741f
      The upstream git.git project verifies objects when looking them up from
      disk. This avoids scenarios where objects have somehow become corrupt on
      disk, e.g. due to hardware failures or bit flips. While our mantra is
      usually to follow upstream behavior, we do not do so in this case, as we
      never check hashes of objects we have just read from disk.
      
      To fix this, we create a new error class `GIT_EMISMATCH` which denotes
      that we have looked up an object with a hashsum mismatch. `odb_read_1`
      will then, after having read the object from its backend, hash the
      object and compare the resulting hash to the expected hash. If hashes do
      not match, it will return an error.
      
      This obviously introduces another computation of checksums and could
      potentially impact performance. Note though that we usually perform I/O
      operations directly before doing this computation, and as such the
      actual overhead should be drowned out by I/O. Running our test suite
      seems to confirm this guess. On a Linux system with best-of-five
      timings, we had 21.592s with the check enabled and 21.590s with the
      check disabled. Note though that our test suite mostly contains very
      small blobs only. It is expected that repositories with bigger blobs may
      notice an increased hit by this check.
      
      In addition to a new test, we also had to change the
      odb::backend::nonrefreshing test suite, which now triggers a hashsum
      mismatch when looking up the commit "deadbeef...". This is expected, as
      the fake backend allocated inside of the test will return an empty
      object for the OID "deadbeef...", which will obviously not hash back to
      "deadbeef..." again. We can simply adjust the hash to equal the hash of
      the empty object here to fix this test.
      Patrick Steinhardt committed
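
      From a caller's perspective, a corrupted object now surfaces as the new
      error code instead of being returned silently. A small usage sketch,
      assuming `odb` is an open git_odb handle and `id` the OID being read:

        #include <git2.h>
        #include <stdio.h>

        static int read_checked(git_odb *odb, const git_oid *id)
        {
            git_odb_object *obj = NULL;
            int error = git_odb_read(&obj, odb, id);

            if (error == GIT_EMISMATCH) {
                /* the stored content no longer hashes back to `id` */
                fprintf(stderr, "corrupt object: %s\n", giterr_last()->message);
                return error;
            }

            if (error == 0)
                git_odb_object_free(obj);

            return error;
        }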
  8. 03 Mar, 2017 1 commit
  9. 02 Mar, 2017 1 commit
  10. 29 Dec, 2016 1 commit
  11. 14 Nov, 2016 1 commit
  12. 05 Aug, 2016 1 commit
    • odb: only provide the empty tree · becadafc
      Only provide the empty tree internally, which matches git's behavior.
      If we provide the empty blob then any users trying to write it with
      libgit2 would find that it never actually lands in the odb, which appears
      to git proper as a broken repository (missing that object).
      Edward Thomson committed
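
      A small illustration of the retained behavior, assuming `repo` is an
      already-opened git_repository; the OID is the well-known SHA-1 of the
      empty tree:

        #include <git2.h>

        static int lookup_empty_tree(git_repository *repo)
        {
            git_oid empty_tree;
            git_tree *tree = NULL;
            int error;

            /* resolves even when no such object exists on disk */
            git_oid_fromstr(&empty_tree, "4b825dc642cb6eb9a060e54bf8d69288fbee4904");

            if ((error = git_tree_lookup(&tree, repo, &empty_tree)) == 0)
                git_tree_free(tree);

            return error;
        }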
  13. 04 Aug, 2016 1 commit
  14. 20 Jun, 2016 1 commit
  15. 26 May, 2016 1 commit
  16. 09 Mar, 2016 4 commits
  17. 08 Mar, 2016 2 commits
  18. 07 Mar, 2016 2 commits
  19. 14 Oct, 2015 2 commits
    • odb: Prioritize alternate backends · a0a1b19a
      For most real use cases, repositories with alternates use them as main
      object storage. Checking the alternate for objects before the main
      repository should result in measurable speedups.
      
      Because of this, we're changing the sorting algorithm to prioritize
      alternates *in cases where two backends have the same priority*. This
      means that the pack backend for the alternate will be checked before the
      pack backend for the main repository *but* both of them will be checked
      before any loose backends.
      Vicent Marti committed
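
      An illustrative comparator for the ordering described above; the real
      sorting lives in odb.c on an internal struct, so the names and fields
      here are simplified:

        #include <stdbool.h>

        typedef struct {
            int priority;
            bool is_alternate;
        } backend_entry;

        static int backend_cmp(const void *a, const void *b)
        {
            const backend_entry *ba = a, *bb = b;

            if (ba->priority != bb->priority)
                return bb->priority - ba->priority;                 /* higher priority first */

            return (int)bb->is_alternate - (int)ba->is_alternate;   /* alternates win ties */
        }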
    • odb: Be smarter when refreshing backends · 43820f20
      In the current implementation of ODB backends, each backend is tasked
      with refreshing itself after a failed lookup. This is standard Git
      behavior: we want to e.g. reload the packfiles on disk in case they have
      changed and that's the reason we can't find the object we're looking
      for.
      
      This behavior, however, becomes pathological in repositories where
      multiple alternates have been loaded. Given that each alternate counts
      as a separate backend, a miss in the main repository (which can
      potentially be very frequent in cases where object storage comes from
      the alternate) will result in refreshing all its packfiles before we
      move on to the alternate backend where the object will most likely be
      found.
      
      To fix this, the code in `odb.c` has been refactored as to perform the
      refresh of all the backends externally, once we've verified that the
      object is nowhere to be found.
      
      If the refresh is successful, we then perform the lookup sequentially
      through all the backends, skipping the ones that we know for sure
      weren't refreshed (because they have no refresh API).
      
      The on-disk pack backend has been adjusted accordingly: it no longer
      performs refreshes internally.
      Vicent Marti committed
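
      A simplified sketch of the resulting two-pass flow; backend_read is a
      placeholder, while the `refresh` callback is the one defined on
      git_odb_backend:

        #include <git2.h>
        #include <git2/sys/odb_backend.h>

        /* Placeholder for a single-backend lookup (not a real libgit2 call). */
        extern int backend_read(void **out, git_odb_backend *backend, const git_oid *id);

        static int read_with_refresh(void **out, git_odb_backend **backends,
                                     size_t n, const git_oid *id)
        {
            size_t i;
            int error;

            /* First pass: query every backend without refreshing. */
            for (i = 0; i < n; i++)
                if ((error = backend_read(out, backends[i], id)) != GIT_ENOTFOUND)
                    return error;   /* hit, or a hard error */

            /* Nothing found: refresh only the backends that support it and
             * retry those, since the others cannot have changed their answer. */
            for (i = 0; i < n; i++) {
                if (backends[i]->refresh == NULL)
                    continue;

                if ((error = backends[i]->refresh(backends[i])) < 0)
                    return error;

                if ((error = backend_read(out, backends[i], id)) != GIT_ENOTFOUND)
                    return error;
            }

            return GIT_ENOTFOUND;
        }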
  20. 30 Sep, 2015 1 commit
    • refdb and odb backends must provide `free` function · d3b29fb9
      As refdb and odb backends can be allocated by client code, libgit2
      can’t know whether an alternative memory allocator was used, and thus
      should not try to call `git__free` on those objects.
      
      Instead, odb and refdb backend implementations must always provide
      their own `free` functions to ensure memory gets freed correctly.
      Arthur Schreiber committed
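
      A sketch of a custom backend providing its own `free`; everything except
      the free callback is omitted for brevity:

        #include <git2.h>
        #include <git2/sys/odb_backend.h>
        #include <stdlib.h>

        typedef struct {
            git_odb_backend parent;
            /* ... backend-specific state ... */
        } my_backend;

        static void my_backend_free(git_odb_backend *backend)
        {
            free(backend);   /* must match the allocator used at creation time */
        }

        int my_backend_new(git_odb_backend **out)
        {
            my_backend *b = calloc(1, sizeof(*b));
            if (b == NULL)
                return -1;

            b->parent.version = GIT_ODB_BACKEND_VERSION;
            b->parent.free = my_backend_free;

            *out = &b->parent;
            return 0;
        }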
  21. 29 Jun, 2015 1 commit
  22. 02 Jun, 2015 1 commit
  23. 13 May, 2015 2 commits
    • odb: reverse the default backend priorities · b0d7f329
      We currently first look in the loose object dir and then in the packs
      for objects. When performing operations on recent history this has a
      higher likelihood of hitting, but when we deal with operations which
      look further back into the past, we start spending a large amount of
      time getting ENOENT from `access`.
      
      Reversing the priorities means that long-running operations can get to
      their objects faster, as we can look at the index data we have in memory
      (or rather mapped) to figure out whether we have an object, which is
      faster than going out to the filesystem.
      
      The packed backend already implements an optimistic read algorithm by
      first looking at the packs we know about and only going out to disk to
      refresh if the object is not found, which means that in the case where
      we do have the object (which will be in the majority for anything that
      traverses the graph) we can avoid going to disk entirely to determine
      whether an object exists.
      
      Operations which look at recent history may see a slight impact, but
      these would be operations which look at far fewer objects and thus take
      less time regardless.
      Carlos Martín Nieto committed
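
      The mechanics behind this are just the two default priority values in
      odb.c; after the change the packed backend's is the higher one (the
      values shown here only illustrate the relative ordering):

        /* higher priority is consulted first */
        #define GIT_LOOSE_PRIORITY  1
        #define GIT_PACKED_PRIORITY 2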
    • odb: make the writestream's size a git_off_t · 77b339f7
      Restricting files to size_t is a silly limitation. The loose backend
      writes to a file directly, so there is no issue in using 63 bits for the
      size.
      
      We still assume that the header is going to fit into 64 bytes, which does
      mean a somewhat smaller maximum file size due to the run-length encoding,
      but it's still a much larger size than you would want Git to handle.
      Carlos Martín Nieto committed
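
      A usage sketch of the widened API as it looked in this era, assuming
      `odb` is an open git_odb handle (GIT_OBJ_BLOB was the object-type
      constant at the time):

        #include <git2.h>

        static int write_large_blob(git_odb *odb)
        {
            git_odb_stream *stream = NULL;
            git_oid oid;
            git_off_t size = (git_off_t)5 * 1024 * 1024 * 1024;   /* > 4GB now representable */
            int error;

            if ((error = git_odb_open_wstream(&stream, odb, size, GIT_OBJ_BLOB)) < 0)
                return error;

            /* ... repeated git_odb_stream_write(stream, buf, len) calls ... */

            error = git_odb_stream_finalize_write(&oid, stream);
            git_odb_stream_free(stream);
            return error;
        }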