Commits · f98dd5438f8d7bfd557b612fdf1605b1c3fb8eaf · lvzhengyang / git2

23 Feb, 2022 2 commits

refactor: make util an object library · c3b7ace9

Instead of simply including the utility files directly, make them a
cmake object library for easy reusability between other projects within
libgit2.

Now the top-level `src` is responsible for platform selection, while the
next-level `libgit2` and `util` configurations are responsible for
identifying what objects they include.

committed 2 years ago

c3b7ace9 Browse File

refactor: `src` is now `src/libgit2` · ef4ab298
Edward Thomson committed 2 years ago

ef4ab298 Browse Directory

11 Nov, 2021 1 commit
- cmake: qsort detection in features.h · 19e99de0
  Edward Thomson committed 3 years ago
  
  19e99de0 Browse Directory
17 Oct, 2021 1 commit

str: introduce `git_str` for internal, `git_buf` is external · f0e693b1

libgit2 has two distinct requirements that were previously solved by
`git_buf`.  We require:

1. A general purpose string class that provides a number of utility APIs
   for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
   can take ownership of.

By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.

Move the utility class `git_buf` to be called `git_str`: this represents
its general purpose, as an internal string buffer class.  The name also
is an homage to Junio Hamano ("gitstr").

The public API remains `git_buf`, and has a much smaller footprint.  It
is generally only used as an "out" param with strict requirements that
follow the documentation.  (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)

Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, then converting it
back again.

committed 3 years ago

f0e693b1 Browse Directory

20 Sep, 2021 1 commit

win32: allow empty environment variables · 18fc751e

A length of 0 indicates an error and GetLastError() will be set.  If
GetLastError() is unset then the environment variable has a length of 0.

committed 3 years ago

18fc751e Browse Directory

14 Apr, 2021 1 commit

utf8: refactor utf8 functions · 1d95b59b

Move the utf8 functions into a proper namespace `git_utf8` instead of
being in the namespaceless `git__` function group.  Update them to
have out-params first and use `char *` instead of `uint8_t *` to match
our API treating strings as `char *` (even if they truly contain `uchar`s
inside).

committed 3 years ago

1d95b59b Browse Directory

13 Dec, 2020 1 commit

Make git__strntol64() ~70%* faster · e99e833f

This change uses compiler intrinsics to detect overflows instead of
using divisions to detect potential overflow. This makes the code faster
and makes it easier to read as a bonus side-effect!

Some of the things this quickens:

* Config parsing.
* Tree parsing.
* Smart protocol negotiation.

\* Measured by running `libgit2_clar` with `-fno-optimize-sibling-calls
-fno-omit-frame-pointer` under `perf(1)`:

```shell
$ perf diff --symbols=git__strntol64 --compute=ratio \
  --percentage=absolute baseline.data perf.data
\# Event 'cycles'
\#
\# Baseline           Ratio  Shared Object
\# ........  ..............  .............
\#
     0.25%        0.321836  libgit2_clar
```

committed 4 years ago

e99e833f Browse Directory

08 Dec, 2020 1 commit

win32: provide macros for calling conventions · 42d5f110

Provide more clarity for Win32 calling conventions - now that we always
compile to __cdecl on Win32, we'll define that as the the libgit2
calling convention. Also offer NTAPI (__stdcall) calling conventions
for things that need callbacks from NTAPI code (eg fiber-local storage).

committed 4 years ago

42d5f110 Browse Directory

06 Dec, 2020 1 commit

util: move git_online_cpus into util · 9800728a

The number of CPUs is useful information for creating a thread pool or a
number of workers, but it's not really about threading directly.  Evict
it from the thread file

committed 4 years ago

9800728a Browse Directory

09 Jun, 2020 1 commit

tree-wide: mark local functions as static · a6c9e0b3

We've accumulated quite some functions which are never used outside of
their respective code unit, but which are lacking the `static` keyword.
Add it to reduce their linkage scope and allow the compiler to optimize
better.

committed 4 years ago

a6c9e0b3 Browse Directory

01 Jun, 2020 1 commit
- strarray: move to its own file · a9746b30
  Edward Thomson committed 4 years ago
  
  a9746b30 Browse Directory
01 Apr, 2020 1 commit
- git__hexdump: better mimic `hexdump -C` · 5d37128d
  Edward Thomson committed 4 years ago
  
  5d37128d Browse Directory
13 Oct, 2019 1 commit
- util: hide helper qsort code to silence unused functions warning · aa234ac0
  Etienne Samson committed 5 years ago
  
  aa234ac0 Browse Directory
23 Aug, 2019 1 commit

util: do not perform allocations in insertsort · 8cbef12d

Our hand-rolled fallback sorting function `git__insertsort_r` does an
in-place sort of the given array. As elements may not necessarily be
pointers, it needs a way of swapping two values of arbitrary size, which
is currently implemented by allocating a temporary buffer of the
element's size. This is problematic, though, as the emulated `qsort`
interface doesn't provide any return values and thus cannot signal an
error if allocation of that temporary buffer has failed.

Convert the function to swap via a temporary buffer allocated on the
stack. Like this, it can `memcpy` contents of both elements in small
batches without requiring a heap allocation. The buffer size has been
chosen such that in most cases, a single iteration of copying will
suffice. Most importantly, it can fully contain `git_oid` structures and
pointers.

Add a bunch of tests for the `git__qsort_r` interface to verify nothing
breaks. Furthermore, this removes the declaration of `git__insertsort_r`
and makes it static as it is not used anywhere else.

committed 5 years ago

8cbef12d Browse Directory

24 Jun, 2019 2 commits

strtol: cast error message length to int · db6b8f7d
Edward Thomson committed 5 years ago

db6b8f7d Browse Directory

utf8: use size_t for length of buffer · ad6f2153

The `git__utf8_charlen` now takes `size_t` as the buffer length, since
it contains the full length of the buffer at the current position. It
now returns `-1` in all cases where utf8 codepoints are invalid, since
callers only care about a valid length of a sequence of codepoints, or
if the current position is not valid utf8.

committed 5 years ago

ad6f2153 Browse Directory

29 Mar, 2019 1 commit
- optimize string comparisons · 30a56ba6
  romkatv committed 5 years ago
  
  30a56ba6 Browse Directory
17 Feb, 2019 1 commit

Remove `git_time_monotonic` · e6c6d3bb

`git_time_monotonic` was added so that non-native bindings like rugged
could get high-resolution timing for benchmarking.  However, this is
outside the scope of libgit2 *and* rugged decided not to use this
function in the first place.

Google suggests that absolutely _nobody_ is using this function and we
don't want to be in the benchmarking business.  Remove the function.

committed 5 years ago

e6c6d3bb Browse Directory

22 Jan, 2019 1 commit
- git_error: use new names in internal APIs and usage · f673e232
```
Move to the `git_error` name in the internal API for error-related
functions.
```
  Edward Thomson committed 6 years ago
  f673e232 Browse Directory
14 Nov, 2018 1 commit

strntol: fix out-of-bounds reads when parsing numbers with leading sign · 4209a512

When parsing a number, we accept a leading plus or minus sign to return
a positive or negative number. When the parsed string has such a leading
sign, we set up a flag indicating that the number is negative and
advance the pointer to the next character in that string. This misses
updating the number of bytes in the string, though, which is why the
parser may later on do an out-of-bounds read.

Fix the issue by correctly updating both the pointer and the number of
remaining bytes. Furthermore, we need to check whether we actually have
any bytes left after having advanced the pointer, as otherwise the
auto-detection of the base may do an out-of-bonuds access. Add a test
that detects the out-of-bound read.

Note that this is not actually security critical. While there are a lot
of places where the function is called, all of these places are guarded
or irrelevant:

- commit list: this operates on objects from the ODB, which are always
  NUL terminated any may thus not trigger the off-by-one OOB read.

- config: the configuration is NUL terminated.

- curl stream: user input is being parsed that is always NUL terminated

- index: the index is read via `git_futils_readbuffer`, which always NUL
  terminates it.

- loose objects: used to parse the length from the object's header. As
  we check previously that the buffer contains a NUL byte, this is safe.

- rebase: this parses numbers from the rebase instruction sheet. As the
  rebase code uses `git_futils_readbuffer`, the buffer is always NUL
  terminated.

- revparse: this parses a user provided buffer that is NUL terminated.

- signature: this parser the header information of objects. As objects
  read from the ODB are always NUL terminated, this is a non-issue. The
  constructor `git_signature_from_buffer` does not accept a length
  parameter for the buffer, so the buffer needs to be NUL terminated, as
  well.

- smart transport: the buffer that is parsed is NUL terminated

- tree cache: this parses the tree cache from the index extension. The
  index itself is read via `git_futils_readbuffer`, which always NUL
  terminates it.

- winhttp transport: user input is being parsed that is always NUL
  terminated

committed 6 years ago

4209a512 Browse Directory

02 Nov, 2018 2 commits

strntol: fix detection and skipping of base prefixes · 50d09407

The `git__strntol` family of functions has the ability to auto-detect
a number's base if the string has either the common '0x' prefix for
hexadecimal numbers or '0' prefix for octal numbers. The detection of
such prefixes and following handling has two major issues though that are
being fixed in one go now.

- We do not do any bounds checking previous to verifying the '0x' base.
  While we do verify that there is at least one digit available
  previously, we fail to verify that there are two digits available and
  thus may do an out-of-bounds read when parsing this
  two-character-prefix.

- When skipping the prefix of such numbers, we only update the pointer
  length without also updating the number of remaining bytes. Thus if we
  try to parse a number '0x1' of total length 3, we will first skip the
  first two bytes and then try to read 3 bytes starting at '1'.

Fix both issues by disentangling the logic. Instead of doing the
detection and skipping of such prefixes in one go, we will now first try
to detect the base while also honoring how many bytes are left. Only if
we have a valid base that is either 8 or 16 and have one of the known
prefixes, we will now advance the pointer and update the remaining bytes
in one step.

Add some tests that verify that no out-of-bounds parsing happens and
that autodetection works as advertised.

committed 6 years ago

50d09407 Browse Directory

strntol: fix out-of-bounds read when skipping leading spaces · 41863a00

The `git__strntol` family of functions accepts leading spaces and will
simply skip them. The skipping will not honor the provided buffer's
length, though, which may lead it to read outside of the provided
buffer's bounds if it is not a simple NUL-terminated string.
Furthermore, if leading space is trimmed, the function will further
advance the pointer but not update the number of remaining bytes, which
may also lead to out-of-bounds reads.

Fix the issue by properly paying attention to the buffer length and
updating it when stripping leading whitespace characters. Add a test
that verifies that we won't read past the provided buffer length.

committed 6 years ago

41863a00 Browse Directory

25 Oct, 2018 1 commit

util: provide `git__memmem` function · 83e8a6b3

Unfortunately, neither the `memmem` nor the `strnstr` functions are part
of any C standard but are merely extensions of C that are implemented by
e.g. glibc. Thus, there is no standardized way to search for a string in
a block of memory with a limited size, and using `strstr` is to be
considered unsafe in case where the buffer has not been sanitized. In
fact, there are some uses of `strstr` in exactly that unsafe way in our
codebase.

Provide a new function `git__memmem` that implements the `memmem`
semantics. That is in a given haystack of `n` bytes, search for the
occurrence of a byte sequence of `m` bytes and return a pointer to the
first occurrence. The implementation chosen is the "Not So Naive"
algorithm from [1]. It was chosen as the implementation is comparably
simple while still being reasonably efficient in most cases.
Preprocessing happens in constant time and space, searching has a time
complexity of O(n*m) with a slightly sub-linear average case.

[1]: http://www-igm.univ-mlv.fr/~lecroq/string/

committed 6 years ago

83e8a6b3 Browse Directory

19 Oct, 2018 1 commit

util: fix out of bounds read in error message · ea19efc1

When an integer that is parsed with `git__strntol32` is too big to fit
into an int32, we will generate an error message that includes the
actual string that failed to parse. This does not acknowledge the fact
that the string may either not be NUL terminated or alternative include
additional characters after the number that is to be parsed. We may thus
end up printing characters into the buffer that aren't the number or,
worse, read out of bounds.

Fix the issue by utilizing the `endptr` that was set by
`git__strntol64`. This pointer is guaranteed to be set to the first
character following the number, and we can thus use it to compute the
width of the number that shall be printed. Create a test to verify that
we correctly truncate the number.

committed 6 years ago

ea19efc1 Browse Directory

18 Oct, 2018 3 commits

util: avoid signed integer overflows in `git__strntol64` · b09c1c7b

While `git__strntol64` tries to detect integer overflows when doing the
necessary arithmetics to come up with the final result, it does the
detection only after the fact. This check thus relies on undefined
behavior of signed integer overflows. Fix this by instead checking
up-front whether the multiplications or additions will overflow.

Note that a detected overflow will not cause us to abort parsing the
current sequence of digits. In the case of an overflow, previous
behavior was to still set up the end pointer correctly to point to the
first character immediately after the currently parsed number. We do not
want to change this now as code may rely on the end pointer being set up
correctly even if the parsed number is too big to be represented as
64 bit integer.

committed 6 years ago

b09c1c7b Browse Directory

util: remove `git__strtol32` · 8d7fa88a

The function `git__strtol32` can easily be misused when untrusted data
is passed to it that may not have been sanitized with trailing `NUL`
bytes. As all usages of this function have now been removed, we can
remove this function altogether to avoid future misuse of it.

committed 6 years ago

8d7fa88a Browse Directory

util: remove unsafe `git__strtol64` function · 68deb2cc

The function `git__strtol64` does not take a maximum buffer length as
parameter. This has led to some unsafe usages of this function, and as
such we may consider it as being unsafe to use. As we have now
eradicated all usages of this function, let's remove it completely to
avoid future misuse.

committed 6 years ago

68deb2cc Browse Directory

24 Aug, 2018 1 commit

util: make the qsort_r check work on macOS · 1a9cc182

This performs a compile-check by using CMake support, to differentiate the GNU
version from the BSD version of qsort_r.

Module taken from 4f252abea5f1d17c60f6ff115c9c44cc0b6f1df6, which I've checked
against CMake 2.8.11.

committed 6 years ago

1a9cc182 Browse Directory

05 May, 2018 1 commit

Sanitize the hunk header to ensure it contains UTF-8 valid data · 9d83a2b0

The diff driver truncates the hunk header text to 80 bytes, which can truncate
4-byte Unicode characters and introduce garbage characters in the diff
output. This change sanitizes the hunk header before it is displayed.

This mirrors the test in git: https://github.com/git/git/blob/master/t/t4025-hunk-header.sh

Closes https://github.com/libgit2/rugged/issues/716

committed 6 years ago

9d83a2b0 Browse Directory

29 Mar, 2018 1 commit
- util: fix missing headers for MinGW environments · fbe52fa3
```
There are multiple references to undefined functions in the Microsoft
builds. Add headers to make them known.
```
  Patrick Steinhardt committed 6 years ago
  fbe52fa3 Browse Directory
16 Feb, 2018 2 commits

util: clean up header includes · 92324d84

While "util.h" declares the macro `git__tolower`, which simply resorts
to tolower(3P) on Unix-like systems, the <ctype.h> header is only being
included in "util.c". Thus, anybody who has included "util.h" without
having <ctype.h> included will fail to compile as soon as the macro is
in use.

Furthermore, we can clean up additional includes in "util.c" and simply
replace them with an include for "common.h".

committed 6 years ago

92324d84 Browse Directory

Explicitly mark fallthrough cases with comments · 06b8a40f

A lot of compilers nowadays generate warnings when there are cases in a
switch statement which implicitly fall through to the next case. To
avoid this warning, the last line in the case that is falling through
can have a comment matching a regular expression, where one possible
comment body would be `/* fall through */`.

An alternative to the comment would be an explicit attribute like e.g.
`[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only
introduced support for such an attribute recently with GCC 7. Thus, and
also because the fallthrough comment is supported by most compilers, we
settle for using comments instead.

One shortcoming of that method is that compilers are very strict about
that. Most interestingly, that comment _really_ has to be the last line.
In case a closing brace follows the comment, the heuristic will fail.

committed 6 years ago

06b8a40f Browse Directory

20 Dec, 2017 1 commit

util: introduce `git__prefixncmp` and consolidate implementations · 86219f40

Introduce `git_prefixncmp` that will search up to the first `n`
characters of a string to see if it is prefixed by another string.
This is useful for examining if a non-null terminated character
array is prefixed by a particular substring.

Consolidate the various implementations of `git__prefixcmp` around a
single core implementation and add some test cases to validate its
behavior.

committed 7 years ago

86219f40 Browse Directory

03 Jul, 2017 1 commit

Make sure to always include "common.h" first · 0c7f49dd

Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.

This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.

This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.

committed 7 years ago

0c7f49dd Browse Directory

29 Dec, 2016 1 commit

giterr_set: consistent error messages · 909d5494

Error messages should be sentence fragments, and therefore:

1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one

committed 8 years ago

909d5494 Browse Directory

13 Sep, 2016 1 commit
- time: Export `git_time_monotonic` · 2749ff46
  Vicent Marti committed 8 years ago
  
  2749ff46 Browse Directory
29 Jun, 2016 1 commit
- Fixed bug while parsing INT64_MIN · 70b9b841
  Krishna Ram Prakash R committed 8 years ago
  
  70b9b841 Browse Directory
26 May, 2016 1 commit
- Patch parsing from patch files · d34f6826
  Edward Thomson committed 8 years ago
  
  d34f6826 Browse Directory
30 Sep, 2015 1 commit
- qsort_r/qsort_s: detect their support · e683d152
  Edward Thomson committed 9 years ago
  
  e683d152 Browse Directory
05 Aug, 2015 1 commit

Fix duplicate basenames to support older VS · eba784d2

With Visual Studio versions 2008 and older they ignore the full path to files and only check
the basename of the file to find a collision. Additionally, having duplicate basenames can break
other build tools like GYP.

This fixes https://github.com/libgit2/libgit2/issues/3356

committed 9 years ago

eba784d2 Browse Directory