Commits · 0e0781f6f3d51b6eda93010db3c56530cb3953af · lvzhengyang / git2

23 Feb, 2022 2 commits
- refactor: move utility tests into util · e6d93612
  Edward Thomson committed 2 years ago
  
- refactor: `tests` is now `tests/libgit2` · 3344fddc
```
Just as we want to separate libgit2 and utility source code, we want to
separate libgit2 and utility tests.  Start by moving all the tests into
libgit2.
```
  Edward Thomson committed 2 years ago
17 Oct, 2021 1 commit

str: introduce `git_str` for internal, `git_buf` is external · f0e693b1

libgit2 has two distinct requirements that were previously solved by
`git_buf`.  We require:

1. A general purpose string class that provides a number of utility APIs
   for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
   can take ownership of.

By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.

Rename the utility class `git_buf` to `git_str`: this represents
its general purpose, as an internal string buffer class.  The name is
also an homage to Junio Hamano ("gitstr").

The public API remains `git_buf`, and has a much smaller footprint.  It
is generally only used as an "out" param with strict requirements that
follow the documentation.  (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)

Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, and then to convert it
back again.

committed 3 years ago


27 Sep, 2021 1 commit

buf: common_prefix takes a string array · 7e7cfe8a

`git_strarray` is a public-facing type.  Change
`git_buf_text_common_prefix` to not use it, and just take an array of
strings instead.

committed 3 years ago


11 May, 2021 1 commit

buf: remove internal `git_buf_text` namespace · d525e063

The `git_buf_text` namespace is unnecessary and strange.  Remove it
and just keep the functions prefixed with `git_buf`.

committed 3 years ago


21 Sep, 2019 2 commits

buffer: fix printing into out-of-memory buffer · 174b7a32

Before printing into a `git_buf` structure, we always call `ENSURE_SIZE`
first. This macro will reallocate the buffer as-needed depending on
whether the current amount of allocated bytes is sufficient or not. If
`asize` is big enough, then it will just do nothing, otherwise it will
call out to `git_buf_try_grow`. But in fact, it is insufficient to only
check `asize`.

When we fail to allocate any more bytes e.g. via `git_buf_try_grow`,
then we set the buffer's pointer to `git_buf__oom`. Note that we touch
neither `asize` nor `size`. So if we just check `asize > targetsize`,
then we will happily let the caller of `ENSURE_SIZE` proceed with an
out-of-memory buffer. As a result, we will print all bytes into the
out-of-memory buffer instead, resulting in an out-of-bounds write.

Fix the issue by having `ENSURE_SIZE` verify that the buffer is not
marked as OOM. Add a test to verify that we're not writing into the OOM
buffer.
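
The check described above can be sketched roughly as follows. This is an illustration only, not libgit2's code: `struct buf`, `buf_oom`, `buf_ensure_size` and `buf_put` are hypothetical stand-ins for `git_buf`, `git_buf__oom`, `ENSURE_SIZE` and the printing functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the buffer type and the OOM sentinel. */
struct buf { char *ptr; size_t asize; size_t size; };
static char buf_oom[1];

/* Make sure `needed` bytes are available; returns 0 on success, -1 on error. */
static int buf_ensure_size(struct buf *b, size_t needed)
{
	char *p;

	assert(b != NULL);

	/* The fix: an OOM-marked buffer is never usable, whatever asize says. */
	if (b->ptr == buf_oom)
		return -1;
	if (b->asize >= needed)
		return 0;

	p = realloc(b->ptr, needed);
	if (p == NULL) {
		free(b->ptr);
		b->ptr = buf_oom; /* asize and size are left untouched */
		return -1;
	}
	b->ptr = p;
	b->asize = needed;
	return 0;
}

/* Append `len` bytes; overflow of size + len + 1 is ignored in this sketch. */
static int buf_put(struct buf *b, const char *data, size_t len)
{
	if (buf_ensure_size(b, b->size + len + 1) < 0)
		return -1;
	memcpy(b->ptr + b->size, data, len);
	b->size += len;
	b->ptr[b->size] = '\0';
	return 0;
}
```

Without the sentinel comparison, a buffer whose `asize` still looks large enough would pass the size check and `buf_put` would write into the one-byte sentinel.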

committed 5 years ago


buffer: fix infinite loop when growing buffers · 208f1d7a

When growing buffers, we repeatedly multiply the currently allocated
number of bytes by 1.5 until it exceeds the requested number of bytes.
This has two major problems:

    1. If the current number of bytes is tiny and one wishes to resize
       to a comparatively huge number of bytes, then we may need to loop
       thousands of times.

    2. If resizing to a value close to `SIZE_MAX` (which would fail
       anyway), then we probably hit an infinite loop as multiplying the
       current amount of bytes will repeatedly result in integer
       overflows.

When reallocating buffers, one typically chooses values close to 1.5 to
enable re-use of resulting memory holes in later reallocations. But
because of this, it only makes sense to apply the factor of 1.5
_once_, rather than looping until the result finally fits. Thus, we
can avoid the loop entirely and opt for the much simpler
algorithm of multiplying by 1.5 once and, if the result doesn't fit,
just using the target size. This avoids both problems: looping
extensively and hitting overflows.

This commit also adds a test that would've previously resulted in an
infinite loop.
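
A minimal sketch of the loop-free sizing rule described above. The helper name `next_alloc_size` is hypothetical and the overflow guard is one possible way to handle sizes near `SIZE_MAX`; the real implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick a new allocation size from the current
 * allocation and the requested target, without looping.
 */
static size_t next_alloc_size(size_t current, size_t target)
{
	size_t grown;

	/* Nothing to do if the current allocation already fits. */
	if (target <= current)
		return current;

	/* If current * 1.5 would overflow, fall back to the target. */
	if (current > SIZE_MAX - current / 2)
		return target;

	/* Multiply by 1.5 exactly once. */
	grown = current + current / 2;

	/* If one step is not enough, jump straight to the target. */
	if (grown < target)
		grown = target;

	assert(grown >= target);
	return grown;
}
```

Because the factor is applied at most once, the function runs in constant time for any input, including a tiny `current` with a huge `target`.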

committed 5 years ago


20 Jul, 2019 1 commit

fileops: rename to "futils.h" to match function signatures · e54343a4

Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".

Rename the files to match expectations.

committed 5 years ago


10 Jun, 2018 1 commit
- Convert usage of `git_buf_free` to new `git_buf_dispose` · ecf4f33a
  Patrick Steinhardt committed 6 years ago
  
26 May, 2016 2 commits
- patch parsing: squash some memory leaks · 6278fbc5
  Edward Thomson committed 8 years ago
  
- git_buf: decode base85 inputs · 5b78dbdb
  Edward Thomson committed 8 years ago
  
17 Sep, 2015 1 commit

git_futils_mkdir_*: make a relative-to-base mkdir · ac2fba0e

Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.

This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.

committed 9 years ago


24 Jun, 2015 2 commits

buffer: make use of EINVALID for growing a borrowed buffer · a6599235
```
This explains more closely what happens. While here, set an error
message.
```
Carlos Martín Nieto committed 9 years ago

buffer: don't allow growing borrowed buffers · caab22c0

When we don't own a buffer (asize=0) we currently allow the usage of
grow to copy the memory into a buffer we do own. This muddles the
meaning of grow, and lets us be a bit cavalier with ownership semantics.

Don't allow this any more. Usage of grow should be restricted to buffers
which we know own their own memory. If unsure, we must not attempt to
modify it.
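
As a rough illustration of the ownership rule, the sketch below refuses to grow a buffer that holds data but owns no allocation. `struct sbuf` and `sbuf_grow` are hypothetical names, and treating "asize is 0 while size is not" as the borrowed state is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer: asize == 0 means we own no allocation. */
struct sbuf { char *ptr; size_t asize; size_t size; };

/* Grow an owned buffer to at least `target` bytes; -1 on error. */
static int sbuf_grow(struct sbuf *b, size_t target)
{
	char *p;

	assert(b != NULL);

	/* Borrowed buffer: it has contents but no allocation of its own. */
	if (b->asize == 0 && b->size != 0)
		return -1;
	if (target <= b->asize)
		return 0;

	p = realloc(b->ptr, target);
	if (p == NULL)
		return -1;
	b->ptr = p;
	b->asize = target;
	return 0;
}
```

A caller that needs to modify borrowed data would have to copy it into an owned buffer explicitly instead of relying on grow to do so.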

committed 9 years ago


22 Jun, 2015 1 commit
- git_buf_text_lf_to_crlf: allow mixed line endings · 8293c8f9
```
Allow files to have mixed line endings instead of skipping processing
on them.
```
  Edward Thomson committed 9 years ago
20 Jan, 2015 1 commit

Make binary detection work similar to vanilla git · 0161e096

Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files)

Signed-off-by: Sven Strickroth <email@cs-ware.de>

committed 10 years ago


21 Nov, 2014 1 commit
- buffer: Do not `put` anything if len is 0 · 92e0b679
  Vicent Marti committed 10 years ago
  
01 Oct, 2014 1 commit
- hashsig: Export as a `sys` header · 737b5051
  Vicent Marti committed 10 years ago
  
15 Aug, 2014 1 commit
- Introduce git_buf_decode_base64 · e003f83a
```
Decode base64-encoded text into a git_buf
```
  Edward Thomson committed 10 years ago
23 Jun, 2014 1 commit

crlf: pass-through mixed EOL buffers from LF->CRLF · 5a76ad35

When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.

The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.

committed 10 years ago


23 Apr, 2014 1 commit
- patch: emit binary patches (optionally) · e349ed50
  Edward Thomson committed 10 years ago
  
01 Apr, 2014 1 commit

Add efficient git_buf join3 API · 18234b14

There are a few places where we need to join three strings to
assemble a path.  This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
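
The idea of a fixed three-way join can be sketched as below. This is not the `git_buf` API: the standalone `join3` here is a hypothetical helper that returns a newly allocated string, calls `strlen` once per input, and does not collapse separators at the start of the second or third part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Join a, b and c with `sep`; the caller frees the result. */
static char *join3(char sep, const char *a, const char *b, const char *c)
{
	size_t la, lb, lc, sep_ab, sep_bc;
	char *out, *p;

	assert(a != NULL && b != NULL && c != NULL);

	/* Each length is measured exactly once. */
	la = strlen(a);
	lb = strlen(b);
	lc = strlen(c);

	/* Add a separator only after a non-empty part that lacks one. */
	sep_ab = (la > 0 && a[la - 1] != sep) ? 1 : 0;
	sep_bc = (lb > 0 && b[lb - 1] != sep) ? 1 : 0;

	out = malloc(la + sep_ab + lb + sep_bc + lc + 1);
	if (out == NULL)
		return NULL;

	p = out;
	memcpy(p, a, la); p += la;
	if (sep_ab) *p++ = sep;
	memcpy(p, b, lb); p += lb;
	if (sep_bc) *p++ = sep;
	memcpy(p, c, lc); p += lc;
	*p = '\0';
	return out;
}
```

Since the total size is known up front, the result needs a single allocation and three `memcpy` calls.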

committed 10 years ago


20 Jan, 2014 1 commit
- add unit tests for git_buf_join corner cases · abdaf936
  Patrick Reynolds committed 11 years ago
  
14 Nov, 2013 1 commit
- Rename tests-clar to tests · 17820381
  Ben Straub committed 11 years ago
  
17 Sep, 2013 1 commit

Start of filter API + git_blob_filtered_content · 0cf77103

This begins the process of exposing git_filter objects to the
public API.  This includes:

* new public type and API for `git_buffer` through which an
  allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public

committed 11 years ago


19 Aug, 2013 1 commit

Skip UTF-8 BOM in binary detection · c0b01b75

When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters.  For a small git_buf,
the three BOM characters overwhelm the printable characters.  This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
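
To make the effect concrete, here is a small sketch of skipping the BOM before counting unprintable bytes. Both function names are hypothetical, and the printable/unprintable classification is a simplified assumption for this example rather than the heuristic libgit2 uses.

```c
#include <assert.h>
#include <stddef.h>

/* Return 3 if the data starts with the UTF-8 BOM (EF BB BF), else 0. */
static size_t utf8_bom_len(const unsigned char *data, size_t len)
{
	if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
		return 3;
	return 0;
}

/*
 * Count bytes treated as unprintable, starting after any BOM.
 * Assumed rule: control bytes other than tab/LF/CR, DEL, and all
 * bytes >= 0x80 count as unprintable.
 */
static size_t count_nonprintable(const unsigned char *data, size_t len)
{
	size_t i, count = 0;

	assert(data != NULL || len == 0);

	for (i = utf8_bom_len(data, len); i < len; i++) {
		unsigned char c = data[i];

		if (c == '\t' || c == '\n' || c == '\r')
			continue;
		if (c < 0x20 || c >= 0x7F)
			count++;
	}
	return count;
}
```

For a three-character file with a BOM, skipping those first three bytes is the difference between an even split of printable and unprintable bytes and no unprintable bytes at all.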

committed 11 years ago


31 Jul, 2013 1 commit

Major rename detection changes · d730d3f4

After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.

To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.

After changing this, a number of the existing tests started to
fail.  In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.

This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing.  I think it's in better shape now.

There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow.  Most of the time
is spent setting up the test data on disk and in the index.  When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.

committed 11 years ago


25 Mar, 2013 1 commit

Move crlf conversion into buf_text · 3658e81e

This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions.  They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.

Tests are added for these new functions.  The crlf.c code is
updated to use the new functions.

Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.

committed 11 years ago


20 Feb, 2013 4 commits

Refine pluggable similarity API · 9bc8be3d

This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API.  In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.

Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.

committed 12 years ago


More tests of file signatures with whitespace opts · aa643260
```
Seems to be working pretty well...
```
Russell Belfer committed 12 years ago

This moves the similarity metric code out of buf_text and into a
new file.  Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes.  In theory, that should be
sufficient samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.

committed 12 years ago

5e5848eb

Initial implementation of similarity scoring algo · 9c454b00

This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score.  This can be plugged into diff similarity
scoring.

committed 12 years ago


29 Jan, 2013 1 commit
- Test buf join with NULL behavior explicitly · 17c92bea
  Russell Belfer committed 12 years ago
  
11 Jan, 2013 1 commit

Match binary file check of core git in diff · 0d65acad

Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
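
The NUL-byte rule is simple enough to sketch in a few lines. The helper name `contains_nul` is hypothetical, and this version scans the whole buffer, which is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Treat content as binary if any byte in it is NUL. */
static bool contains_nul(const void *data, size_t len)
{
	assert(data != NULL || len == 0);

	/* Guard the empty case so memchr is never called with NULL. */
	return len > 0 && memchr(data, '\0', len) != NULL;
}
```

Text in encodings that embed NUL bytes, such as UTF-16, would be classified as binary by this rule.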

committed 12 years ago


28 Nov, 2012 1 commit

Consolidate text buffer functions · 7bf87ab6

There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc).  This groups all those functions together into a
new file and converts the code to use that.

This has two enhancements to existing functionality.  The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).

committed 12 years ago


10 Oct, 2012 1 commit
- Add git_buf_put_base64 to buffer API · 2d3579be
  Russell Belfer committed 12 years ago
  
23 Aug, 2012 1 commit
- Fix warnings and merge issues on Win64 · e9ca852e
  Russell Belfer committed 12 years ago
  
24 Jul, 2012 1 commit
- Add git_buf_unescape and git__unescape to unescape all characters in a string (in-place) · 02a0d651
  yorah committed 12 years ago
  
12 Jul, 2012 1 commit
- Fix memory leak in test · 465092ce
  Russell Belfer committed 12 years ago
  
11 Jul, 2012 1 commit

Add a couple of useful git_buf utilities · 039fc406

* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
* `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
  to copy strings into a buffer while injecting an escape sequence
  (e.g. '\') in front of particular characters.
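
The escaping behaviour described in the second bullet might look roughly like this. The standalone `puts_escaped` below is a hypothetical helper with an assumed signature that returns a newly allocated string; it is not the `git_buf_puts_escaped` function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `src`, inserting `esc_with` in front of every character that
 * appears in `esc_chars`. The caller frees the result.
 */
static char *puts_escaped(const char *src, const char *esc_chars,
			  const char *esc_with)
{
	size_t esc_len, total = 0;
	const char *s;
	char *out, *p;

	assert(src != NULL && esc_chars != NULL && esc_with != NULL);
	esc_len = strlen(esc_with);

	/* First pass: compute the exact output length. */
	for (s = src; *s; s++)
		total += strchr(esc_chars, *s) ? esc_len + 1 : 1;

	out = malloc(total + 1);
	if (out == NULL)
		return NULL;

	/* Second pass: copy, injecting the escape sequence where needed. */
	for (p = out, s = src; *s; s++) {
		if (strchr(esc_chars, *s)) {
			memcpy(p, esc_with, esc_len);
			p += esc_len;
		}
		*p++ = *s;
	}
	*p = '\0';
	return out;
}
```

Counting first keeps the copy to a single allocation, at the cost of scanning the input twice.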

committed 12 years ago
