1. 28 Nov, 2019 2 commits
    • patch_parse: fix out-of-bounds reads caused by integer underflow · 33e6c402
      The patch format for binary files is a simple Base85 encoding with a
      length byte as prefix that encodes the current line's length. For each
      line, we thus check whether the line's actual length matches its
      expected length in order to not faultily apply a truncated patch. This
      also acts as a check to verify that we're not reading outside of the
      line's string:
      
      	if (encoded_len > ctx->parse_ctx.line_len - 1) {
      		error = git_parse_err(...);
      		goto done;
      	}
      
      There is a possibility of an integer underflow, though. Given a line
      that consists of a single prefix byte only, `line_len` will be zero
      when reaching this check. Subtracting one from it then wraps around,
      causing us to assume that there's a wealth of bytes available later
      on. Naturally, this may result in an out-of-bounds read.
      
      Fix the issue by checking both `encoded_len` and `line_len` for a
      non-zero value. The binary format doesn't make use of zero-length lines
      anyway, so we need to know that there are both encoded bytes and
      remaining characters available at all.
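      
      As a minimal sketch of such a check (the helper name
      `binary_line_len_ok` is hypothetical and not part of libgit2), testing
      both lengths for zero before subtracting keeps `line_len - 1` from
      wrapping around:
      
      ```c
      #include <stdbool.h>
      #include <stddef.h>
      
      /*
       * Hypothetical helper, not libgit2 code: returns true if a binary patch
       * line with `line_len` remaining bytes can hold `encoded_len` encoded
       * bytes in addition to its one-byte length prefix.
       */
      static bool binary_line_len_ok(size_t encoded_len, size_t line_len)
      {
          /* Reject zero lengths first so that `line_len - 1` cannot wrap. */
          if (encoded_len == 0 || line_len == 0)
              return false;
      
          return encoded_len <= line_len - 1;
      }
      ```
      
      With `size_t` operands, `0 - 1` yields the largest representable
      value, which is why the zero check has to come before the comparison.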
      
      This patch also adds a test that works based on the last error message.
      Checking error messages is usually too tightly coupled, but parsing the
      patch in fact failed even before the change. Thus the only alternative
      would be to use e.g. Valgrind, but that'd result in us not catching
      issues when run without Valgrind. As a result, using the error
      message is considered a viable tradeoff as we know that we didn't start
      decoding Base85 in the first place.
      Patrick Steinhardt committed
    • diff: make patchid computation work with all types of commits. · ece5bb5e
      The current implementation of patchid does not compute a correct patch
      ID when given a patch where, for example, a file is added or removed.
      Some more corner cases need to be handled to get the same behavior as
      the git patch-id command.
      Add some more tests to cover those corner cases.
      
      Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
      Gregory Herrero committed
  2. 19 Nov, 2019 1 commit
  3. 10 Nov, 2019 1 commit
    • patch_parse: use paths from "---"/"+++" lines for binary patches · de7659cc
      For some patches, it is not possible to derive the old and new file
      paths from the patch header's first line, most importantly when they
      contain spaces. In such a case, we derive both paths from the "---" and
      "+++" lines, which allow for non-ambiguous parsing. We fail to use these
      paths when parsing binary patches without data, though, as we always
      expect the header paths to be filled in.
      
      Fix this by using the "---"/"+++" paths by default and only falling
      back to header paths if they aren't set. If neither of those paths is
      set, we just return an error. Add two tests to verify this behaviour,
      one of which would have previously caused a segfault.
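      
      A rough sketch of that precedence, assuming hypothetical names
      (`pick_path` and its parameters are illustrative and not taken from
      libgit2):
      
      ```c
      #include <stddef.h>
      
      /*
       * Hypothetical helper: prefer the path taken from the "---"/"+++" line
       * and fall back to the "diff --git" header path. Returns NULL if
       * neither is set, which the caller has to treat as a malformed patch.
       */
      static const char *pick_path(const char *marker_path,
                                   const char *header_path)
      {
          if (marker_path != NULL)
              return marker_path;
      
          return header_path;
      }
      ```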
      Patrick Steinhardt committed
  4. 05 Nov, 2019 1 commit
    • patch_parse: fix segfault when header path contains whitespace only · de543e29
      When parsing header paths from a patch, we reject any patches with empty
      paths as malformed patches. We perform the check whether a path is empty
      before sanitizing it, though, which may lead to a path becoming empty
      after the check, e.g. if we have trimmed whitespace. This may lead to a
      segfault later when any part of our patching logic actually references
      such a path, which may then be a `NULL` pointer.
      
      Fix the issue by performing the check after sanitizing. Add tests to
      catch the issue as they would have produced a segfault previously.
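      
      The ordering matters because sanitizing can shorten the path. A small
      sketch, assuming a hypothetical sanitizer that only strips trailing
      whitespace (`rtrim` and `check_header_path` are not libgit2
      functions):
      
      ```c
      #include <ctype.h>
      #include <stddef.h>
      #include <string.h>
      
      /* Hypothetical sanitizer: strips trailing whitespace in place and
       * returns the new length. */
      static size_t rtrim(char *path)
      {
          size_t len = strlen(path);
      
          while (len > 0 && isspace((unsigned char)path[len - 1]))
              path[--len] = '\0';
      
          return len;
      }
      
      /* Returns 0 if the path is usable, -1 if it is empty once sanitized. */
      static int check_header_path(char *path)
      {
          /* Sanitize first, then test for emptiness. */
          if (rtrim(path) == 0)
              return -1;
      
          return 0;
      }
      ```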
      Patrick Steinhardt committed
  5. 21 Oct, 2019 1 commit
    • patch_parse: detect overflow when calculating old/new line position · 37141ff7
      When the patch contains line numbers close to INT_MAX, we may end up
      with an integer overflow when calculating the line position of the
      current diff hunk. Reject such patches as unreasonable to avoid the
      integer overflow.
      
      As the calculation is performed on integers, we introduce two new
      helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform
      the integer overflow check in a generic way.
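      
      One portable way to write such helpers is sketched below. This is an
      illustration only: the names `add_int_overflow` and
      `sub_int_overflow` and the "return true on overflow" convention are
      assumptions, not the actual libgit2 definitions.
      
      ```c
      #include <limits.h>
      #include <stdbool.h>
      
      /* Returns true on overflow; otherwise stores one + two in *out. */
      static bool add_int_overflow(int *out, int one, int two)
      {
          if ((two > 0 && one > INT_MAX - two) ||
              (two < 0 && one < INT_MIN - two))
              return true;
      
          *out = one + two;
          return false;
      }
      
      /* Returns true on overflow; otherwise stores one - two in *out. */
      static bool sub_int_overflow(int *out, int one, int two)
      {
          if ((two > 0 && one < INT_MIN + two) ||
              (two < 0 && one > INT_MAX + two))
              return true;
      
          *out = one - two;
          return false;
      }
      ```
      
      Both checks compare against the limits before doing the arithmetic,
      as signed overflow itself is undefined behaviour in C.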
      Patrick Steinhardt committed
  6. 19 Oct, 2019 2 commits
    • patch_parse: reject empty path names · 6c6c15e9
      When parsing patch headers, we currently accept empty path names just
      fine, e.g. a line "--- \n" would be parsed as the empty filename. This
      is not a valid patch format and may cause `NULL` pointer accesses at a
      later place as `git_buf_detach` will return `NULL` in that case.
      
      Reject such patches as malformed with a nice error message.
      Patrick Steinhardt committed
    • patch_parse: reject patches with multiple old/new paths · 223e7e43
      It's currently possible to have patches with multiple old path name
      headers. As we didn't check for this case, this resulted in a memory
      leak when overwriting the old old path with the new old path because we
      simply discarded the old pointer.
      
      Instead of fixing this by free'ing the old pointer, we should reject
      such patches altogether. It doesn't make any sense for the "---" or
      "+++" markers to occur multiple times within a patch in the first place.
      This also implicitly fixes the memory leak.
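      
      A sketch of the idea, using a hypothetical `header_paths` struct and
      `set_old_path` function rather than the real parser state: refusing
      to set a path twice means no earlier allocation is ever overwritten.
      
      ```c
      #include <stdlib.h>
      #include <string.h>
      
      /* Hypothetical stand-in for the parser's per-patch state. */
      struct header_paths {
          char *old_path;
      };
      
      /* Returns 0 on success, -1 if the path was already set or on OOM. */
      static int set_old_path(struct header_paths *h, const char *value)
      {
          char *copy;
      
          /* A second "---" line is treated as a malformed patch. */
          if (h->old_path != NULL)
              return -1;
      
          copy = malloc(strlen(value) + 1);
          if (copy == NULL)
              return -1;
      
          strcpy(copy, value);
          h->old_path = copy;
          return 0;
      }
      ```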
      Patrick Steinhardt committed
  7. 16 Oct, 2019 1 commit
  8. 28 Aug, 2019 1 commit
  9. 11 Jul, 2019 1 commit
    • patch_parse: handle missing newline indicator in old file · 3f855fe8
      When either the old or new file contents have no newline at the end of
      the file, then git-diff(1) will print out a "\ No newline at end of
      file" indicator. While we do correctly handle this in the case where the
      new file has this indicator, we fail to parse patches where the old file
      is missing a newline at EOF.
      
      Fix this bug by handling missing newline indicators in the old file.
      Add tests to verify that we can parse such files.
      Patrick Steinhardt committed
  10. 06 Apr, 2019 1 commit
  11. 29 Mar, 2019 1 commit
  12. 05 Nov, 2018 1 commit
  13. 11 Nov, 2017 1 commit
    • patch_parse: allow parsing ambiguous patch headers · 80226b5f
      The git patch format allows for having unquoted paths with whitespace
      inside. This format becomes ambiguous to parse, e.g. in the following
      example:
      
          diff --git a/file b/with spaces.txt b/file b/with spaces.txt
      
      While we cannot parse this in a correct way, we can instead use the
      "---" and "+++" lines to retrieve the file names, as the path is not
      followed by anything here but spans the complete remaining line. Because
      of this, we can simply bail out when parsing the "diff --git" header here
      without an actual error and then proceed to just take the paths from the
      other headers.
      Patrick Steinhardt committed
  14. 26 Jun, 2017 1 commit
    • diff: implement function to calculate patch ID · 89a34828
      The upstream git project provides the ability to calculate a so-called
      patch ID. Quoting from git-patch-id(1):
      
          A "patch ID" is nothing but a sum of SHA-1 of the file diffs
          associated with a patch, with whitespace and line numbers ignored.
      
      Patch IDs can be used to identify two patches which are probably the
      same thing, e.g. when a patch has been cherry-picked to another branch.
      
      This commit implements a new function `git_diff_patchid`, which gets a
      patch and derives an OID from the diff. Note the different terminology
      here: in libgit2, a patch is the set of differences in a single file,
      and a diff can contain multiple patches for different files. The
      implementation
      matches the upstream implementation and should derive the same OID for
      the same diff. In fact, some code has been directly derived from the
      upstream implementation.
      
      The upstream implementation has two different modes to calculate patch
      IDs: a stable and an unstable mode. The old way of calculating patch
      IDs was unstable in the sense that a different ordering of the diffs
      led to different results. This oversight was fixed in git 1.9, but as
      git tries hard to never break existing workflows, the old and unstable
      way is still the default. The newer and stable way does not
      care for ordering of the diff hunks, and in fact it is the mode that
      should probably be used today. So right now, we only implement the
      stable way of generating the patch ID.
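      
      One way to make a combined ID independent of ordering is to merge the
      per-file digests with a commutative operation, for example byte-wise
      addition with carry. The sketch below only illustrates that idea; the
      name `accumulate_digest` is hypothetical and the code is not taken
      from libgit2.
      
      ```c
      #include <stddef.h>
      
      #define DIGEST_LEN 20 /* size of a SHA-1 digest in bytes */
      
      /*
       * Adds `digest` into `result`, treating both as little-endian
       * multi-byte integers. Addition is commutative, so the order in
       * which digests are accumulated does not change the outcome.
       */
      static void accumulate_digest(unsigned char result[DIGEST_LEN],
                                    const unsigned char digest[DIGEST_LEN])
      {
          unsigned int carry = 0;
          size_t i;
      
          for (i = 0; i < DIGEST_LEN; i++) {
              carry += (unsigned int)result[i] + digest[i];
              result[i] = (unsigned char)(carry & 0xff);
              carry >>= 8;
          }
      }
      ```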
      Patrick Steinhardt committed
  15. 05 Sep, 2016 1 commit
  16. 26 May, 2016 5 commits