1. 26 Mar, 2020 1 commit
    • patch: correctly handle mode changes for renames · 5f47cb48
      When generating a patch for a renamed file whose mode bits have changed
      in addition to the rename, then we currently fail to parse the generated
      patch. Furthermore, when generating a diff we output mode bits after the
      similarity metric, which is different to how upstream git handles it.
      
      Fix both issues by adding another state transition that allows
      similarity indices after mode changes and by printing mode changes
      before the similarity index.
      Patrick Steinhardt committed
  2. 13 Dec, 2019 1 commit
    • patch_parse: fix undefined behaviour due to arithmetic on NULL pointers · c6f9ad73
      Doing arithmetic with NULL pointers is undefined behaviour in the C
      standard. We do so regardless when parsing patches, as we happily add a
      potential prefix length to prefixed paths. While this works out just
      fine as the prefix length is always equal to zero in these cases, thus
      resulting in another NULL pointer, it still is undefined behaviour and
      was pointed out to us by OSSfuzz.
      
      Fix the issue by checking whether paths are NULL, avoiding the
      arithmetic if they are.
      Patrick Steinhardt committed
  3. 28 Nov, 2019 1 commit
    • patch_parse: fix out-of-bounds reads caused by integer underflow · 33e6c402
      The patch format for binary files is a simple Base85 encoding with a
      length byte as prefix that encodes the current line's length. For each
      line, we thus check whether the line's actual length matches its
      expected length in order to not faultily apply a truncated patch. This
      also acts as a check to verify that we're not reading outside of the
      line's string:
      
      	if (encoded_len > ctx->parse_ctx.line_len - 1) {
      		error = git_parse_err(...);
      		goto done;
      	}
      
      There is the possibility for an integer underflow, though. Given a line
      with a single prefix byte, only, `line_len` will be zero when reaching
      this check. As a result, subtracting one from that will result in an
      integer underflow, causing us to assume that there's a wealth of bytes
      available later on. Naturally, this may result in an out-of-bounds read.
      
      Fix the issue by checking both `encoded_len` and `line_len` for a
      non-zero value. The binary format doesn't make use of zero-length lines
      anyway, so we need to know that there are both encoded bytes and
      remaining characters available at all.
      
      This patch also adds a test that works based on the last error message.
      Checking error messages is usually too tightly coupled, but in fact
      parsing the patch failed even before the change. Thus the only
      possibility is to use e.g. Valgrind, but that'd result in us not
      catching issues when run without Valgrind. As a result, using the error
      message is considered a viable tradeoff as we know that we didn't start
      decoding Base85 in the first place.
      Patrick Steinhardt committed
  4. 22 Nov, 2019 1 commit
  5. 19 Nov, 2019 1 commit
  6. 10 Nov, 2019 1 commit
    • patch_parse: use paths from "---"/"+++" lines for binary patches · de7659cc
      For some patches, it is not possible to derive the old and new file
      paths from the patch header's first line, most importantly when they
      contain spaces. In such a case, we derive both paths from the "---" and
      "+++" lines, which allow for non-ambiguous parsing. We fail to use these
      paths when parsing binary patches without data, though, as we always
      expect the header paths to be filled in.
      
      Fix this by using the "---"/"+++" paths by default and only fall back to
      header paths if they aren't set. If neither of those paths are set, we
      just return an error. Add two tests to verify this behaviour, one of
      which would have previously caused a segfault.
      Patrick Steinhardt committed
  7. 05 Nov, 2019 1 commit
    • patch_parse: fix segfault when header path contains whitespace only · de543e29
      When parsing header paths from a patch, we reject any patches with empty
      paths as malformed patches. We perform the check whether a path is empty
      before sanitizing it, though, which may lead to a path becoming empty
      after the check, e.g. if we have trimmed whitespace. This may lead to a
      segfault later when any part of our patching logic actually references
      such a path, which may then be a `NULL` pointer.
      
      Fix the issue by performing the check after sanitizing. Add tests to
      catch the issue as they would have produced a segfault previosuly.
      Patrick Steinhardt committed
  8. 21 Oct, 2019 1 commit
    • patch_parse: detect overflow when calculating old/new line position · 37141ff7
      When the patch contains lines close to INT_MAX, then it may happen that
      we end up with an integer overflow when calculating the line of the
      current diff hunk. Reject such patches as unreasonable to avoid the
      integer overflow.
      
      As the calculation is performed on integers, we introduce two new
      helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform
      the integer overflow check in a generic way.
      Patrick Steinhardt committed
  9. 19 Oct, 2019 3 commits
    • patch_parse: fix out-of-bounds read with No-NL lines · 468e3ddc
      We've got two locations where we copy lines into the patch. The first
      one is when copying normal " ", "-" or "+" lines, while the second
      location gets executed when we copy "\ No newline at end of file" lines.
      While the first one correctly uses `git__strndup` to copy only until the
      newline, the other one doesn't. Thus, if the line occurs at the end of
      the patch and if there is no terminating NUL character, then it may
      result in an out-of-bounds read.
      
      Fix the issue by using `git__strndup`, as was already done in the other
      location. Furthermore, add allocation checks to both locations to detect
      out-of-memory situations.
      Patrick Steinhardt committed
    • patch_parse: reject empty path names · 6c6c15e9
      When parsing patch headers, we currently accept empty path names just
      fine, e.g. a line "--- \n" would be parsed as the empty filename. This
      is not a valid patch format and may cause `NULL` pointer accesses at a
      later place as `git_buf_detach` will return `NULL` in that case.
      
      Reject such patches as malformed with a nice error message.
      Patrick Steinhardt committed
    • patch_parse: reject patches with multiple old/new paths · 223e7e43
      It's currently possible to have patches with multiple old path name
      headers. As we didn't check for this case, this resulted in a memory
      leak when overwriting the old old path with the new old path because we
      simply discarded the old pointer.
      
      Instead of fixing this by free'ing the old pointer, we should reject
      such patches altogether. It doesn't make any sense for the "---" or
      "+++" markers to occur multiple times within a patch n the first place.
      This also implicitly fixes the memory leak.
      Patrick Steinhardt committed
  10. 16 Oct, 2019 1 commit
  11. 28 Sep, 2019 1 commit
  12. 01 Aug, 2019 1 commit
    • parse: remove use of variadic macros which are not C89 compliant · 27b8b31e
      The macro `git_parse_error` is implemented in a variadic way so
      that it's possible to pass printf-style parameters.
      Unfortunately, variadic macros are not defined by C89 and thus we
      cannot use that functionality. But as we have implemented
      `git_error_vset` in the previous commit, we can now just use that
      instead.
      
      Convert `git_parse_error` to a variadic function and use
      `git_error_vset` to fix the compliance violation. While at it,
      move the function to "patch_parse.c".
      Patrick Steinhardt committed
  13. 20 Jul, 2019 1 commit
    • patch_parse: fix segfault due to line containing static contents · a613832e
      With commit dedf70ad (patch_parse: do not depend on parsed buffer's
      lifetime, 2019-07-05), all lines of the patch are allocated with
      `strdup` to make lifetime of the parsed patch independent of the buffer
      that is currently being parsed. In patch b0893282 (patch_parse: ensure
      valid patch output with EOFNL, 2019-07-11), we introduced another
      code location where we add lines to the parsed patch. But as that one
      was implemented via a separate pull request, it wasn't converted to use
      `strdup`, as well. As a consequence, we generate a segfault when trying
      to deallocate the potentially static buffer that's now in some of the
      lines.
      
      Use `git__strdup` to fix the issue.
      Patrick Steinhardt committed
  14. 11 Jul, 2019 2 commits
  15. 05 Jul, 2019 1 commit
    • patch_parse: do not depend on parsed buffer's lifetime · dedf70ad
      When parsing a patch from a buffer, we let the patch lines point into
      the original buffer. While this is efficient use of resources, this also
      ties the lifetime of the parsed patch to the parsed buffer. As this
      behaviour is not documented anywhere in our API it is very surprising to
      its users.
      
      Untie the lifetime by duplicating the lines into the parsed patch. Add a
      test that verifies that lifetimes are indeed independent of each other.
      Patrick Steinhardt committed
  16. 24 Jun, 2019 1 commit
  17. 06 Apr, 2019 1 commit
  18. 29 Mar, 2019 2 commits
    • patch_parse: fix parsing addition/deletion of file with space · b3497344
      The diff header format is a strange beast in that it is inherently
      unparseable in an unambiguous way. While parsing
      
          a/file.txt b/file.txt
      
      is obvious and trivially doable, parsing a diff header of
      
          a/file b/file ab.txt b/file b/file ab.txt
      
      is not (but in fact valid and created by git.git).
      
      Due to that, we have relaxed our diff header parser in commit 80226b5f
      (patch_parse: allow parsing ambiguous patch headers, 2017-09-22), so
      that we started to bail out when seeing diff headers with spaces in
      their file names. Instead, we try to use the "---" and "+++" lines,
      which are unambiguous.
      
      In some cases, though, we neither have a useable file name from the
      header nor from the "---" or "+++" lines. This is the case when we have
      a deletion or addition of a file with spaces: the header is unparseable
      and the other lines will simply show "/dev/null". This trips our parsing
      logic when we try to extract the prefix (the "a/" part) that is being
      used in the path line, where we unconditionally try to dereference a
      NULL pointer in such a scenario.
      
      We can fix this by simply not trying to parse the prefix in cases where
      we have no useable path name. That'd leave the parsed patch without
      either `old_prefix` or `new_prefix` populated. But in fact such cases
      are already handled by users of the patch object, which simply opt to
      use the default prefixes in that case.
      Patrick Steinhardt committed
  19. 25 Jan, 2019 1 commit
  20. 22 Jan, 2019 1 commit
  21. 14 Nov, 2018 1 commit
  22. 18 Jun, 2018 1 commit
  23. 10 Jun, 2018 1 commit
  24. 16 Feb, 2018 1 commit
    • Explicitly mark fallthrough cases with comments · 06b8a40f
      A lot of compilers nowadays generate warnings when there are cases in a
      switch statement which implicitly fall through to the next case. To
      avoid this warning, the last line in the case that is falling through
      can have a comment matching a regular expression, where one possible
      comment body would be `/* fall through */`.
      
      An alternative to the comment would be an explicit attribute like e.g.
      `[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only
      introduced support for such an attribute recently with GCC 7. Thus, and
      also because the fallthrough comment is supported by most compilers, we
      settle for using comments instead.
      
      One shortcoming of that method is that compilers are very strict about
      that. Most interestingly, that comment _really_ has to be the last line.
      In case a closing brace follows the comment, the heuristic will fail.
      Patrick Steinhardt committed
  25. 18 Nov, 2017 1 commit
    • refcount: make refcounting conform to aliasing rules · 585b5dac
      Strict aliasing rules dictate that for most data types, you are not
      allowed to cast them to another data type and then access the casted
      pointers. While this works just fine for most compilers, technically we
      end up in undefined behaviour when we hurt that rule.
      
      Our current refcounting code makes heavy use of casting and thus
      violates that rule. While we didn't have any problems with that code,
      Travis started spitting out a lot of warnings due to a change in their
      toolchain. In the refcounting case, the code is also easy to fix:
      as all refcounting-statements are actually macros, we can just access
      the `rc` field directly instead of casting.
      
      There are two outliers in our code where that doesn't work. Both the
      `git_diff` and `git_patch` structures have specializations for generated
      and parsed diffs/patches, which directly inherit from them. Because of
      that, the refcounting code is only part of the base structure and not of
      the children themselves. We can help that by instead passing their base
      into `GIT_REFCOUNT_INC`, though.
      Patrick Steinhardt committed
  26. 11 Nov, 2017 7 commits
    • patch_parse: allow parsing ambiguous patch headers · 80226b5f
      The git patch format allows for having unquoted paths with whitespaces
      inside. This format becomes ambiguous to parse, e.g. in the following
      example:
      
          diff --git a/file b/with spaces.txt b/file b/with spaces.txt
      
      While we cannot parse this in a correct way, we can instead use the
      "---" and "+++" lines to retrieve the file names, as the path is not
      followed by anything here but spans the complete remaining line. Because
      of this, we can simply bail outwhen parsing the "diff --git" header here
      without an actual error and then proceed to just take the paths from the
      other headers.
      Patrick Steinhardt committed
    • patch_parse: treat complete line after "---"/"+++" as path · 3892f70d
      When parsing the "---" and "+++" line, we stop after the first
      whitespace inside of the filename. But as files containing whitespaces
      do not need to be quoted, we should instead use the complete line here.
      
      This fixes parsing patches with unquoted paths with whitespaces.
      Patrick Steinhardt committed
    • parse: always initialize line pointer · 7bdfc0a6
      Upon initializing the parser context, we do not currently initialize the
      current line, line length and line number. Do so in order to make the
      interface easier to use and more obvious for future consumers of the
      parsing API.
      Patrick Steinhardt committed
    • parse: implement `git_parse_peek` · e72cb769
      Some code parts need to inspect the next few bytes without actually
      consuming it yet, for example to examine what content it has to expect
      next. Create a new function `git_parse_peek` which returns the next byte
      without modifying the parsing context and use it at multiple call sites.
      Patrick Steinhardt committed
    • parse: implement and use `git_parse_advance_digit` · 252f2eee
      The patch parsing code has multiple recurring patterns where we want to
      parse an actual number. Create a new function `git_parse_advance_digit`
      and use it to avoid code duplication.
      Patrick Steinhardt committed
    • patch_parse: use git_parse_contains_s · 65dcb645
      Instead of manually checking the parsing context's remaining length and
      comparing the leading bytes with a specific string, we can simply re-use
      the function `git_parse_ctx_contains_s`. Do so to avoid code duplication
      and to further decouple patch parsing from the parsing context's struct
      members.
      Patrick Steinhardt committed
    • parse: extract parse module · ef1395f3
      The `git_patch_parse_ctx` encapsulates both parser state as well as
      options specific to patch parsing. To advance this state and keep it
      consistent, we provide a few functions which handle advancing the
      current position and accessing bytes of the patch contents. In fact,
      these functions are quite generic and not related to patch-parsing by
      themselves. Seeing that we have similar logic inside of other modules,
      it becomes quite enticing to extract this functionality into its own
      parser module.
      
      To do so, we create a new module `parse` with a central struct called
      `git_parse_ctx`. It encapsulates both the content that is to be parsed
      as well as its lengths and the current position. `git_patch_parse_ctx`
      now only contains this `parse_ctx` only, which is then accessed whenever
      we need to touch the current parser. This is the first step towards
      re-using this functionality across other modules which require parsing
      functionality and remove code-duplication.
      Patrick Steinhardt committed
  27. 01 Sep, 2017 1 commit
  28. 25 Aug, 2017 1 commit
    • patch_parse: implement state machine for parsing patch headers · 57bc9dab
      Our code parsing Git patch headers is rather lax in parsing headers of a
      Git-style patch. Most notably, we do not care for the exact order in
      which header lines appear and as such, we may parse patch files which
      are not really valid after all. Furthermore, the state transitions
      inside of the parser are not as obvious as they could be, making it
      harder than required to follow its logic.
      
      To improve upon this situation, this patch introduces a real state
      machine to parse the patches. Instead of simply parsing each line
      without caring for previous state and the exact ordering, we define a
      set of states with their allowed transitions. This makes the patch
      parser more strict in only allowing valid successions of header lines.
      As the transition table is defined inside of a single structure with
      the expected line, required state as well as the state that we end up
      in, all state transitions are immediately obvious from just having a
      look at this structure. This improves both maintainability and eases
      reasoning about the patch parser.
      Patrick Steinhardt committed
  29. 03 Jul, 2017 1 commit
    • Make sure to always include "common.h" first · 0c7f49dd
      Next to including several files, our "common.h" header also declares
      various macros which are then used throughout the project. As such, we
      have to make sure to always include this file first in all
      implementation files. Otherwise, we might encounter problems or even
      silent behavioural differences due to macros or defines not being
      defined as they should be. So in fact, our header and implementation
      files should make sure to always include "common.h" first.
      
      This commit does so by establishing a common include pattern. Header
      files inside of "src" will now always include "common.h" as its first
      other file, separated by a newline from all the other includes to make
      it stand out as special. There are two cases for the implementation
      files. If they do have a matching header file, they will always include
      this one first, leading to "common.h" being transitively included as
      first file. If they do not have a matching header file, they instead
      include "common.h" as first file themselves.
      
      This fixes the outlined problems and will become our standard practice
      for header and source files inside of the "src/" from now on.
      Patrick Steinhardt committed
  30. 21 Mar, 2017 1 commit