1. 23 Feb, 2022 2 commits
  2. 11 Nov, 2021 1 commit
  3. 17 Oct, 2021 1 commit
    • str: introduce `git_str` for internal, `git_buf` is external · f0e693b1
      libgit2 has two distinct requirements that were previously solved by
      `git_buf`.  We require:
      
      1. A general purpose string class that provides a number of utility APIs
         for manipulating data (eg, concatenating, truncating, etc).
      2. A structure that we can use to return strings to callers that they
         can take ownership of.
      
      By using a single class (`git_buf`) for both of these purposes, we have
      confused the API to the point that refactorings are difficult and
      reasoning about correctness is also difficult.
      
      Move the utility class `git_buf` to be called `git_str`: this represents
      its general purpose, as an internal string buffer class.  The name also
      is an homage to Junio Hamano ("gitstr").
      
      The public API remains `git_buf`, and has a much smaller footprint.  It
      is generally only used as an "out" param with strict requirements that
      follow the documentation.  (Exceptions exist for some legacy APIs to
      avoid breaking callers unnecessarily.)
      
      Utility functions exist to convert a user-specified `git_buf` to a
      `git_str` so that we can call internal functions, then converting it
      back again.
      Edward Thomson committed
  4. 20 Sep, 2021 1 commit
  5. 14 Apr, 2021 1 commit
    • utf8: refactor utf8 functions · 1d95b59b
      Move the utf8 functions into a proper namespace `git_utf8` instead of
      being in the namespaceless `git__` function group.  Update them to
      have out-params first and use `char *` instead of `uint8_t *` to match
      our API treating strings as `char *` (even if they truly contain `uchar`s
      inside).
      Edward Thomson committed
  6. 13 Dec, 2020 1 commit
    • Make git__strntol64() ~70%* faster · e99e833f
      This change uses compiler intrinsics to detect overflows instead of
      using divisions to detect potential overflow. This makes the code faster
      and makes it easier to read as a bonus side-effect!
      
      Some of the things this quickens:
      
      * Config parsing.
      * Tree parsing.
      * Smart protocol negotiation.
      
      \* Measured by running `libgit2_clar` with `-fno-optimize-sibling-calls
      -fno-omit-frame-pointer` under `perf(1)`:
      
      ```shell
      $ perf diff --symbols=git__strntol64 --compute=ratio \
        --percentage=absolute baseline.data perf.data
      \# Event 'cycles'
      \#
      \# Baseline           Ratio  Shared Object
      \# ........  ..............  .............
      \#
           0.25%        0.321836  libgit2_clar
      ```
      lhchavez committed
  7. 08 Dec, 2020 1 commit
  8. 06 Dec, 2020 1 commit
  9. 09 Jun, 2020 1 commit
  10. 01 Jun, 2020 1 commit
  11. 01 Apr, 2020 1 commit
  12. 13 Oct, 2019 1 commit
  13. 23 Aug, 2019 1 commit
    • util: do not perform allocations in insertsort · 8cbef12d
      Our hand-rolled fallback sorting function `git__insertsort_r` does an
      in-place sort of the given array. As elements may not necessarily be
      pointers, it needs a way of swapping two values of arbitrary size, which
      is currently implemented by allocating a temporary buffer of the
      element's size. This is problematic, though, as the emulated `qsort`
      interface doesn't provide any return values and thus cannot signal an
      error if allocation of that temporary buffer has failed.
      
      Convert the function to swap via a temporary buffer allocated on the
      stack. Like this, it can `memcpy` contents of both elements in small
      batches without requiring a heap allocation. The buffer size has been
      chosen such that in most cases, a single iteration of copying will
      suffice. Most importantly, it can fully contain `git_oid` structures and
      pointers.
      
      Add a bunch of tests for the `git__qsort_r` interface to verify nothing
      breaks. Furthermore, this removes the declaration of `git__insertsort_r`
      and makes it static as it is not used anywhere else.
      Patrick Steinhardt committed
  14. 24 Jun, 2019 2 commits
  15. 29 Mar, 2019 1 commit
  16. 17 Feb, 2019 1 commit
    • Remove `git_time_monotonic` · e6c6d3bb
      `git_time_monotonic` was added so that non-native bindings like rugged
      could get high-resolution timing for benchmarking.  However, this is
      outside the scope of libgit2 *and* rugged decided not to use this
      function in the first place.
      
      Google suggests that absolutely _nobody_ is using this function and we
      don't want to be in the benchmarking business.  Remove the function.
      Edward Thomson committed
  17. 22 Jan, 2019 1 commit
  18. 14 Nov, 2018 1 commit
    • strntol: fix out-of-bounds reads when parsing numbers with leading sign · 4209a512
      When parsing a number, we accept a leading plus or minus sign to return
      a positive or negative number. When the parsed string has such a leading
      sign, we set up a flag indicating that the number is negative and
      advance the pointer to the next character in that string. This misses
      updating the number of bytes in the string, though, which is why the
      parser may later on do an out-of-bounds read.
      
      Fix the issue by correctly updating both the pointer and the number of
      remaining bytes. Furthermore, we need to check whether we actually have
      any bytes left after having advanced the pointer, as otherwise the
      auto-detection of the base may do an out-of-bonuds access. Add a test
      that detects the out-of-bound read.
      
      Note that this is not actually security critical. While there are a lot
      of places where the function is called, all of these places are guarded
      or irrelevant:
      
      - commit list: this operates on objects from the ODB, which are always
        NUL terminated any may thus not trigger the off-by-one OOB read.
      
      - config: the configuration is NUL terminated.
      
      - curl stream: user input is being parsed that is always NUL terminated
      
      - index: the index is read via `git_futils_readbuffer`, which always NUL
        terminates it.
      
      - loose objects: used to parse the length from the object's header. As
        we check previously that the buffer contains a NUL byte, this is safe.
      
      - rebase: this parses numbers from the rebase instruction sheet. As the
        rebase code uses `git_futils_readbuffer`, the buffer is always NUL
        terminated.
      
      - revparse: this parses a user provided buffer that is NUL terminated.
      
      - signature: this parser the header information of objects. As objects
        read from the ODB are always NUL terminated, this is a non-issue. The
        constructor `git_signature_from_buffer` does not accept a length
        parameter for the buffer, so the buffer needs to be NUL terminated, as
        well.
      
      - smart transport: the buffer that is parsed is NUL terminated
      
      - tree cache: this parses the tree cache from the index extension. The
        index itself is read via `git_futils_readbuffer`, which always NUL
        terminates it.
      
      - winhttp transport: user input is being parsed that is always NUL
        terminated
      Patrick Steinhardt committed
  19. 02 Nov, 2018 2 commits
    • strntol: fix detection and skipping of base prefixes · 50d09407
      The `git__strntol` family of functions has the ability to auto-detect
      a number's base if the string has either the common '0x' prefix for
      hexadecimal numbers or '0' prefix for octal numbers. The detection of
      such prefixes and following handling has two major issues though that are
      being fixed in one go now.
      
      - We do not do any bounds checking previous to verifying the '0x' base.
        While we do verify that there is at least one digit available
        previously, we fail to verify that there are two digits available and
        thus may do an out-of-bounds read when parsing this
        two-character-prefix.
      
      - When skipping the prefix of such numbers, we only update the pointer
        length without also updating the number of remaining bytes. Thus if we
        try to parse a number '0x1' of total length 3, we will first skip the
        first two bytes and then try to read 3 bytes starting at '1'.
      
      Fix both issues by disentangling the logic. Instead of doing the
      detection and skipping of such prefixes in one go, we will now first try
      to detect the base while also honoring how many bytes are left. Only if
      we have a valid base that is either 8 or 16 and have one of the known
      prefixes, we will now advance the pointer and update the remaining bytes
      in one step.
      
      Add some tests that verify that no out-of-bounds parsing happens and
      that autodetection works as advertised.
      Patrick Steinhardt committed
    • strntol: fix out-of-bounds read when skipping leading spaces · 41863a00
      The `git__strntol` family of functions accepts leading spaces and will
      simply skip them. The skipping will not honor the provided buffer's
      length, though, which may lead it to read outside of the provided
      buffer's bounds if it is not a simple NUL-terminated string.
      Furthermore, if leading space is trimmed, the function will further
      advance the pointer but not update the number of remaining bytes, which
      may also lead to out-of-bounds reads.
      
      Fix the issue by properly paying attention to the buffer length and
      updating it when stripping leading whitespace characters. Add a test
      that verifies that we won't read past the provided buffer length.
      Patrick Steinhardt committed
  20. 25 Oct, 2018 1 commit
    • util: provide `git__memmem` function · 83e8a6b3
      Unfortunately, neither the `memmem` nor the `strnstr` functions are part
      of any C standard but are merely extensions of C that are implemented by
      e.g. glibc. Thus, there is no standardized way to search for a string in
      a block of memory with a limited size, and using `strstr` is to be
      considered unsafe in case where the buffer has not been sanitized. In
      fact, there are some uses of `strstr` in exactly that unsafe way in our
      codebase.
      
      Provide a new function `git__memmem` that implements the `memmem`
      semantics. That is in a given haystack of `n` bytes, search for the
      occurrence of a byte sequence of `m` bytes and return a pointer to the
      first occurrence. The implementation chosen is the "Not So Naive"
      algorithm from [1]. It was chosen as the implementation is comparably
      simple while still being reasonably efficient in most cases.
      Preprocessing happens in constant time and space, searching has a time
      complexity of O(n*m) with a slightly sub-linear average case.
      
      [1]: http://www-igm.univ-mlv.fr/~lecroq/string/
      Patrick Steinhardt committed
  21. 19 Oct, 2018 1 commit
    • util: fix out of bounds read in error message · ea19efc1
      When an integer that is parsed with `git__strntol32` is too big to fit
      into an int32, we will generate an error message that includes the
      actual string that failed to parse. This does not acknowledge the fact
      that the string may either not be NUL terminated or alternative include
      additional characters after the number that is to be parsed. We may thus
      end up printing characters into the buffer that aren't the number or,
      worse, read out of bounds.
      
      Fix the issue by utilizing the `endptr` that was set by
      `git__strntol64`. This pointer is guaranteed to be set to the first
      character following the number, and we can thus use it to compute the
      width of the number that shall be printed. Create a test to verify that
      we correctly truncate the number.
      Patrick Steinhardt committed
  22. 18 Oct, 2018 3 commits
    • util: avoid signed integer overflows in `git__strntol64` · b09c1c7b
      While `git__strntol64` tries to detect integer overflows when doing the
      necessary arithmetics to come up with the final result, it does the
      detection only after the fact. This check thus relies on undefined
      behavior of signed integer overflows. Fix this by instead checking
      up-front whether the multiplications or additions will overflow.
      
      Note that a detected overflow will not cause us to abort parsing the
      current sequence of digits. In the case of an overflow, previous
      behavior was to still set up the end pointer correctly to point to the
      first character immediately after the currently parsed number. We do not
      want to change this now as code may rely on the end pointer being set up
      correctly even if the parsed number is too big to be represented as
      64 bit integer.
      Patrick Steinhardt committed
    • util: remove `git__strtol32` · 8d7fa88a
      The function `git__strtol32` can easily be misused when untrusted data
      is passed to it that may not have been sanitized with trailing `NUL`
      bytes. As all usages of this function have now been removed, we can
      remove this function altogether to avoid future misuse of it.
      Patrick Steinhardt committed
    • util: remove unsafe `git__strtol64` function · 68deb2cc
      The function `git__strtol64` does not take a maximum buffer length as
      parameter. This has led to some unsafe usages of this function, and as
      such we may consider it as being unsafe to use. As we have now
      eradicated all usages of this function, let's remove it completely to
      avoid future misuse.
      Patrick Steinhardt committed
  23. 24 Aug, 2018 1 commit
  24. 05 May, 2018 1 commit
  25. 29 Mar, 2018 1 commit
  26. 16 Feb, 2018 2 commits
    • util: clean up header includes · 92324d84
      While "util.h" declares the macro `git__tolower`, which simply resorts
      to tolower(3P) on Unix-like systems, the <ctype.h> header is only being
      included in "util.c". Thus, anybody who has included "util.h" without
      having <ctype.h> included will fail to compile as soon as the macro is
      in use.
      
      Furthermore, we can clean up additional includes in "util.c" and simply
      replace them with an include for "common.h".
      Patrick Steinhardt committed
    • Explicitly mark fallthrough cases with comments · 06b8a40f
      A lot of compilers nowadays generate warnings when there are cases in a
      switch statement which implicitly fall through to the next case. To
      avoid this warning, the last line in the case that is falling through
      can have a comment matching a regular expression, where one possible
      comment body would be `/* fall through */`.
      
      An alternative to the comment would be an explicit attribute like e.g.
      `[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only
      introduced support for such an attribute recently with GCC 7. Thus, and
      also because the fallthrough comment is supported by most compilers, we
      settle for using comments instead.
      
      One shortcoming of that method is that compilers are very strict about
      that. Most interestingly, that comment _really_ has to be the last line.
      In case a closing brace follows the comment, the heuristic will fail.
      Patrick Steinhardt committed
  27. 20 Dec, 2017 1 commit
    • util: introduce `git__prefixncmp` and consolidate implementations · 86219f40
      Introduce `git_prefixncmp` that will search up to the first `n`
      characters of a string to see if it is prefixed by another string.
      This is useful for examining if a non-null terminated character
      array is prefixed by a particular substring.
      
      Consolidate the various implementations of `git__prefixcmp` around a
      single core implementation and add some test cases to validate its
      behavior.
      Edward Thomson committed
  28. 03 Jul, 2017 1 commit
    • Make sure to always include "common.h" first · 0c7f49dd
      Next to including several files, our "common.h" header also declares
      various macros which are then used throughout the project. As such, we
      have to make sure to always include this file first in all
      implementation files. Otherwise, we might encounter problems or even
      silent behavioural differences due to macros or defines not being
      defined as they should be. So in fact, our header and implementation
      files should make sure to always include "common.h" first.
      
      This commit does so by establishing a common include pattern. Header
      files inside of "src" will now always include "common.h" as its first
      other file, separated by a newline from all the other includes to make
      it stand out as special. There are two cases for the implementation
      files. If they do have a matching header file, they will always include
      this one first, leading to "common.h" being transitively included as
      first file. If they do not have a matching header file, they instead
      include "common.h" as first file themselves.
      
      This fixes the outlined problems and will become our standard practice
      for header and source files inside of the "src/" from now on.
      Patrick Steinhardt committed
  29. 29 Dec, 2016 1 commit
  30. 13 Sep, 2016 1 commit
  31. 29 Jun, 2016 1 commit
  32. 26 May, 2016 1 commit
  33. 30 Sep, 2015 1 commit
  34. 05 Aug, 2015 1 commit