Commits · e300547b7bec48e3be96ab94b82f8777d747cedf · lvzhengyang / git2

26 Mar, 2020 1 commit

patch: correctly handle mode changes for renames · 5f47cb48

When generating a patch for a renamed file whose mode bits have changed
in addition to the rename, then we currently fail to parse the generated
patch. Furthermore, when generating a diff we output mode bits after the
similarity metric, which is different to how upstream git handles it.

Fix both issues by adding another state transition that allows
similarity indices after mode changes and by printing mode changes
before the similarity index.

committed 4 years ago

5f47cb48 Browse File

13 Dec, 2019 1 commit

patch_parse: fix undefined behaviour due to arithmetic on NULL pointers · c6f9ad73

Doing arithmetic with NULL pointers is undefined behaviour in the C
standard. We do so regardless when parsing patches, as we happily add a
potential prefix length to prefixed paths. While this works out just
fine as the prefix length is always equal to zero in these cases, thus
resulting in another NULL pointer, it still is undefined behaviour and
was pointed out to us by OSSfuzz.

Fix the issue by checking whether paths are NULL, avoiding the
arithmetic if they are.

committed 5 years ago

c6f9ad73 Browse File

28 Nov, 2019 1 commit

patch_parse: fix out-of-bounds reads caused by integer underflow · 33e6c402

The patch format for binary files is a simple Base85 encoding with a
length byte as prefix that encodes the current line's length. For each
line, we thus check whether the line's actual length matches its
expected length in order to not faultily apply a truncated patch. This
also acts as a check to verify that we're not reading outside of the
line's string:

	if (encoded_len > ctx->parse_ctx.line_len - 1) {
		error = git_parse_err(...);
		goto done;
	}

There is the possibility for an integer underflow, though. Given a line
with a single prefix byte, only, `line_len` will be zero when reaching
this check. As a result, subtracting one from that will result in an
integer underflow, causing us to assume that there's a wealth of bytes
available later on. Naturally, this may result in an out-of-bounds read.

Fix the issue by checking both `encoded_len` and `line_len` for a
non-zero value. The binary format doesn't make use of zero-length lines
anyway, so we need to know that there are both encoded bytes and
remaining characters available at all.

This patch also adds a test that works based on the last error message.
Checking error messages is usually too tightly coupled, but in fact
parsing the patch failed even before the change. Thus the only
possibility is to use e.g. Valgrind, but that'd result in us not
catching issues when run without Valgrind. As a result, using the error
message is considered a viable tradeoff as we know that we didn't start
decoding Base85 in the first place.

committed 5 years ago

33e6c402 Browse File

22 Nov, 2019 1 commit
- odb: use `git_object_size_t` for object size · fefefd1d
```
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
```
  Edward Thomson committed 5 years ago
  fefefd1d Browse File
19 Nov, 2019 1 commit

patch_parse: correct parsing of patch containing not shown binary data. · 048e94ad

When not shown binary data is added or removed in a patch, patch parser
is currently returning 'error -1 - corrupt git binary header at line 4'.
Fix it by correctly handling case where binary data is added/removed.

Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>

committed 5 years ago

048e94ad Browse File

10 Nov, 2019 1 commit

patch_parse: use paths from "---"/"+++" lines for binary patches · de7659cc

For some patches, it is not possible to derive the old and new file
paths from the patch header's first line, most importantly when they
contain spaces. In such a case, we derive both paths from the "---" and
"+++" lines, which allow for non-ambiguous parsing. We fail to use these
paths when parsing binary patches without data, though, as we always
expect the header paths to be filled in.

Fix this by using the "---"/"+++" paths by default and only fall back to
header paths if they aren't set. If neither of those paths are set, we
just return an error. Add two tests to verify this behaviour, one of
which would have previously caused a segfault.

committed 5 years ago

de7659cc Browse File

05 Nov, 2019 1 commit

patch_parse: fix segfault when header path contains whitespace only · de543e29

When parsing header paths from a patch, we reject any patches with empty
paths as malformed patches. We perform the check whether a path is empty
before sanitizing it, though, which may lead to a path becoming empty
after the check, e.g. if we have trimmed whitespace. This may lead to a
segfault later when any part of our patching logic actually references
such a path, which may then be a `NULL` pointer.

Fix the issue by performing the check after sanitizing. Add tests to
catch the issue as they would have produced a segfault previosuly.

committed 5 years ago

de543e29 Browse File

21 Oct, 2019 1 commit

patch_parse: detect overflow when calculating old/new line position · 37141ff7

When the patch contains lines close to INT_MAX, then it may happen that
we end up with an integer overflow when calculating the line of the
current diff hunk. Reject such patches as unreasonable to avoid the
integer overflow.

As the calculation is performed on integers, we introduce two new
helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform
the integer overflow check in a generic way.

committed 5 years ago

37141ff7 Browse File

19 Oct, 2019 3 commits

patch_parse: fix out-of-bounds read with No-NL lines · 468e3ddc

We've got two locations where we copy lines into the patch. The first
one is when copying normal " ", "-" or "+" lines, while the second
location gets executed when we copy "\ No newline at end of file" lines.
While the first one correctly uses `git__strndup` to copy only until the
newline, the other one doesn't. Thus, if the line occurs at the end of
the patch and if there is no terminating NUL character, then it may
result in an out-of-bounds read.

Fix the issue by using `git__strndup`, as was already done in the other
location. Furthermore, add allocation checks to both locations to detect
out-of-memory situations.

committed 5 years ago

468e3ddc Browse File

patch_parse: reject empty path names · 6c6c15e9

When parsing patch headers, we currently accept empty path names just
fine, e.g. a line "--- \n" would be parsed as the empty filename. This
is not a valid patch format and may cause `NULL` pointer accesses at a
later place as `git_buf_detach` will return `NULL` in that case.

Reject such patches as malformed with a nice error message.

committed 5 years ago

6c6c15e9 Browse File

patch_parse: reject patches with multiple old/new paths · 223e7e43

It's currently possible to have patches with multiple old path name
headers. As we didn't check for this case, this resulted in a memory
leak when overwriting the old old path with the new old path because we
simply discarded the old pointer.

Instead of fixing this by free'ing the old pointer, we should reject
such patches altogether. It doesn't make any sense for the "---" or
"+++" markers to occur multiple times within a patch n the first place.
This also implicitly fixes the memory leak.

committed 5 years ago

223e7e43 Browse File

16 Oct, 2019 1 commit

patch_parse: handle patches without extended headers · 11de594f

Extended header lines (especially the "index <hash>..<hash> <mode>") are
not required by "git apply" so it import patches. So we allow the
from-file/to-file lines (--- a/file\n+++ b/file) to directly follow the
git diff header.

This fixes #5267.

committed 5 years ago

11de594f Browse File

28 Sep, 2019 1 commit

patch_parse: handle patches with new empty files · b61810bf

Patches containing additions of empty files will not contain diff data
but will end with the index header line followed by the terminating
sequence "-- ". We follow the same logic as in cc4c44a9 and allow "-- "
to immediately follow the index header.

committed 5 years ago

b61810bf Browse File

01 Aug, 2019 1 commit

parse: remove use of variadic macros which are not C89 compliant · 27b8b31e

The macro `git_parse_error` is implemented in a variadic way so
that it's possible to pass printf-style parameters.
Unfortunately, variadic macros are not defined by C89 and thus we
cannot use that functionality. But as we have implemented
`git_error_vset` in the previous commit, we can now just use that
instead.

Convert `git_parse_error` to a variadic function and use
`git_error_vset` to fix the compliance violation. While at it,
move the function to "patch_parse.c".

committed 5 years ago

27b8b31e Browse File

20 Jul, 2019 1 commit

patch_parse: fix segfault due to line containing static contents · a613832e

With commit dedf70ad (patch_parse: do not depend on parsed buffer's
lifetime, 2019-07-05), all lines of the patch are allocated with
`strdup` to make lifetime of the parsed patch independent of the buffer
that is currently being parsed. In patch b0893282 (patch_parse: ensure
valid patch output with EOFNL, 2019-07-11), we introduced another
code location where we add lines to the parsed patch. But as that one
was implemented via a separate pull request, it wasn't converted to use
`strdup`, as well. As a consequence, we generate a segfault when trying
to deallocate the potentially static buffer that's now in some of the
lines.

Use `git__strdup` to fix the issue.

committed 5 years ago

a613832e Browse File

11 Jul, 2019 2 commits

patch_parse: ensure valid patch output with EOFNL · b0893282
Erik Aigner committed 5 years ago

b0893282 Browse File

patch_parse: handle missing newline indicator in old file · 3f855fe8

When either the old or new file contents have no newline at the end of
the file, then git-diff(1) will print out a "\ No newline at end of
file" indicator. While we do correctly handle this in the case where the
new file has this indcator, we fail to parse patches where the old file
is missing a newline at EOF.

Fix this bug by handling and missing newline indicators in the old file.
Add tests to verify that we can parse such files.

committed 5 years ago

3f855fe8 Browse File

05 Jul, 2019 1 commit

patch_parse: do not depend on parsed buffer's lifetime · dedf70ad

When parsing a patch from a buffer, we let the patch lines point into
the original buffer. While this is efficient use of resources, this also
ties the lifetime of the parsed patch to the parsed buffer. As this
behaviour is not documented anywhere in our API it is very surprising to
its users.

Untie the lifetime by duplicating the lines into the parsed patch. Add a
test that verifies that lifetimes are indeed independent of each other.

committed 5 years ago

dedf70ad Browse File

24 Jun, 2019 1 commit
- patch: use size_t for size when parsing · f1d73189
  Edward Thomson committed 5 years ago
  
  f1d73189 Browse File
06 Apr, 2019 1 commit
- patch_parse.c: Handle CRLF in parse_header_start · 30c06b60
  Drew DeVault committed 5 years ago
  
  30c06b60 Browse File
29 Mar, 2019 2 commits

patch_parse: fix parsing addition/deletion of file with space · b3497344

The diff header format is a strange beast in that it is inherently
unparseable in an unambiguous way. While parsing

    a/file.txt b/file.txt

is obvious and trivially doable, parsing a diff header of

    a/file b/file ab.txt b/file b/file ab.txt

is not (but in fact valid and created by git.git).

Due to that, we have relaxed our diff header parser in commit 80226b5f
(patch_parse: allow parsing ambiguous patch headers, 2017-09-22), so
that we started to bail out when seeing diff headers with spaces in
their file names. Instead, we try to use the "---" and "+++" lines,
which are unambiguous.

In some cases, though, we neither have a useable file name from the
header nor from the "---" or "+++" lines. This is the case when we have
a deletion or addition of a file with spaces: the header is unparseable
and the other lines will simply show "/dev/null". This trips our parsing
logic when we try to extract the prefix (the "a/" part) that is being
used in the path line, where we unconditionally try to dereference a
NULL pointer in such a scenario.

We can fix this by simply not trying to parse the prefix in cases where
we have no useable path name. That'd leave the parsed patch without
either `old_prefix` or `new_prefix` populated. But in fact such cases
are already handled by users of the patch object, which simply opt to
use the default prefixes in that case.

committed 5 years ago

b3497344 Browse File

patch_parse: improve formatting · 131cd9b1
Patrick Steinhardt committed 5 years ago

131cd9b1 Browse File

25 Jan, 2019 1 commit

patch: explicitly cast down in parse_header_percent · 80c3867b

Quiet down a warning from MSVC about how we're potentially losing data.
This is safe since we've explicitly tested that it's within the range of
0-100.

committed 6 years ago

80c3867b Browse File

22 Jan, 2019 1 commit
- git_error: use new names in internal APIs and usage · f673e232
```
Move to the `git_error` name in the internal API for error-related
functions.
```
  Edward Thomson committed 6 years ago
  f673e232 Browse File
14 Nov, 2018 1 commit

patch_parse: remove unused function `parse_number` · 4b84db6a

The function `parse_number` was replaced by `git_parse_advance_digit`
which is provided by the parser interface in commit 252f2eee (parse:
implement and use `git_parse_advance_digit`, 2017-07-14). As there are
no remaining callers, remove it.

committed 6 years ago

4b84db6a Browse File

18 Jun, 2018 1 commit
- patch_parse: populate line numbers while parsing diffs · f9e28026
  Etienne Samson committed 6 years ago
  
  f9e28026 Browse File
10 Jun, 2018 1 commit
- Convert usage of `git_buf_free` to new `git_buf_dispose` · ecf4f33a
  Patrick Steinhardt committed 6 years ago
  
  ecf4f33a Browse File
16 Feb, 2018 1 commit

Explicitly mark fallthrough cases with comments · 06b8a40f

A lot of compilers nowadays generate warnings when there are cases in a
switch statement which implicitly fall through to the next case. To
avoid this warning, the last line in the case that is falling through
can have a comment matching a regular expression, where one possible
comment body would be `/* fall through */`.

An alternative to the comment would be an explicit attribute like e.g.
`[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only
introduced support for such an attribute recently with GCC 7. Thus, and
also because the fallthrough comment is supported by most compilers, we
settle for using comments instead.

One shortcoming of that method is that compilers are very strict about
that. Most interestingly, that comment _really_ has to be the last line.
In case a closing brace follows the comment, the heuristic will fail.

committed 7 years ago

06b8a40f Browse File

18 Nov, 2017 1 commit

refcount: make refcounting conform to aliasing rules · 585b5dac

Strict aliasing rules dictate that for most data types, you are not
allowed to cast them to another data type and then access the casted
pointers. While this works just fine for most compilers, technically we
end up in undefined behaviour when we hurt that rule.

Our current refcounting code makes heavy use of casting and thus
violates that rule. While we didn't have any problems with that code,
Travis started spitting out a lot of warnings due to a change in their
toolchain. In the refcounting case, the code is also easy to fix:
as all refcounting-statements are actually macros, we can just access
the `rc` field directly instead of casting.

There are two outliers in our code where that doesn't work. Both the
`git_diff` and `git_patch` structures have specializations for generated
and parsed diffs/patches, which directly inherit from them. Because of
that, the refcounting code is only part of the base structure and not of
the children themselves. We can help that by instead passing their base
into `GIT_REFCOUNT_INC`, though.

committed 7 years ago

585b5dac Browse File

11 Nov, 2017 7 commits

patch_parse: allow parsing ambiguous patch headers · 80226b5f

The git patch format allows for having unquoted paths with whitespaces
inside. This format becomes ambiguous to parse, e.g. in the following
example:

    diff --git a/file b/with spaces.txt b/file b/with spaces.txt

While we cannot parse this in a correct way, we can instead use the
"---" and "+++" lines to retrieve the file names, as the path is not
followed by anything here but spans the complete remaining line. Because
of this, we can simply bail outwhen parsing the "diff --git" header here
without an actual error and then proceed to just take the paths from the
other headers.

committed 7 years ago

80226b5f Browse File

patch_parse: treat complete line after "---"/"+++" as path · 3892f70d

When parsing the "---" and "+++" line, we stop after the first
whitespace inside of the filename. But as files containing whitespaces
do not need to be quoted, we should instead use the complete line here.

This fixes parsing patches with unquoted paths with whitespaces.

committed 7 years ago

3892f70d Browse File

parse: always initialize line pointer · 7bdfc0a6

Upon initializing the parser context, we do not currently initialize the
current line, line length and line number. Do so in order to make the
interface easier to use and more obvious for future consumers of the
parsing API.

committed 7 years ago

7bdfc0a6 Browse File

parse: implement `git_parse_peek` · e72cb769

Some code parts need to inspect the next few bytes without actually
consuming it yet, for example to examine what content it has to expect
next. Create a new function `git_parse_peek` which returns the next byte
without modifying the parsing context and use it at multiple call sites.

committed 7 years ago

e72cb769 Browse File

parse: implement and use `git_parse_advance_digit` · 252f2eee

The patch parsing code has multiple recurring patterns where we want to
parse an actual number. Create a new function `git_parse_advance_digit`
and use it to avoid code duplication.

committed 7 years ago

252f2eee Browse File

patch_parse: use git_parse_contains_s · 65dcb645

Instead of manually checking the parsing context's remaining length and
comparing the leading bytes with a specific string, we can simply re-use
the function `git_parse_ctx_contains_s`. Do so to avoid code duplication
and to further decouple patch parsing from the parsing context's struct
members.

committed 7 years ago

65dcb645 Browse File

parse: extract parse module · ef1395f3

The `git_patch_parse_ctx` encapsulates both parser state as well as
options specific to patch parsing. To advance this state and keep it
consistent, we provide a few functions which handle advancing the
current position and accessing bytes of the patch contents. In fact,
these functions are quite generic and not related to patch-parsing by
themselves. Seeing that we have similar logic inside of other modules,
it becomes quite enticing to extract this functionality into its own
parser module.

To do so, we create a new module `parse` with a central struct called
`git_parse_ctx`. It encapsulates both the content that is to be parsed
as well as its lengths and the current position. `git_patch_parse_ctx`
now only contains this `parse_ctx` only, which is then accessed whenever
we need to touch the current parser. This is the first step towards
re-using this functionality across other modules which require parsing
functionality and remove code-duplication.

committed 7 years ago

ef1395f3 Browse File

01 Sep, 2017 1 commit

patch_parse: fix parsing patches only containing exact renames · cc4c44a9

Patches which contain exact renames only will not contain an actual diff
body, but only a list of files that were renamed. Thus, the patch header
is immediately followed by the terminating sequence "-- ". We currently
do not recognize this character sequence as a possible terminating
sequence. Add it and create a test to catch the failure.

committed 7 years ago

cc4c44a9 Browse File

25 Aug, 2017 1 commit

patch_parse: implement state machine for parsing patch headers · 57bc9dab

Our code parsing Git patch headers is rather lax in parsing headers of a
Git-style patch. Most notably, we do not care for the exact order in
which header lines appear and as such, we may parse patch files which
are not really valid after all. Furthermore, the state transitions
inside of the parser are not as obvious as they could be, making it
harder than required to follow its logic.

To improve upon this situation, this patch introduces a real state
machine to parse the patches. Instead of simply parsing each line
without caring for previous state and the exact ordering, we define a
set of states with their allowed transitions. This makes the patch
parser more strict in only allowing valid successions of header lines.
As the transition table is defined inside of a single structure with
the expected line, required state as well as the state that we end up
in, all state transitions are immediately obvious from just having a
look at this structure. This improves both maintainability and eases
reasoning about the patch parser.

committed 7 years ago

57bc9dab Browse File

03 Jul, 2017 1 commit

Make sure to always include "common.h" first · 0c7f49dd

Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.

This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.

This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.

committed 7 years ago

0c7f49dd Browse File

21 Mar, 2017 1 commit

patch_parse: check if advancing over header newline succeeds · 723bdf48

While parsing patch header lines, we iterate over each line and check if
the line has trailing garbage. What we do not check though is that the
line is actually a line ending with a trailing newline.

Fix this by checking the return code of `parse_advance_expected_str`.

committed 7 years ago

723bdf48 Browse File