- 23 Feb, 2022 2 commits
-
-
Edward Thomson committed
-
Like we want to separate libgit2 and utility source code, we want to separate libgit2 and utility tests. Start by moving all the tests into libgit2.
Edward Thomson committed
-
- 17 Oct, 2021 1 commit
-
-
libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Edward Thomson committed
-
- 27 Sep, 2021 1 commit
-
-
`git_strarray` is a public-facing type. Change `git_buf_text_common_prefix` to not use it, and just take an array of strings instead.
Edward Thomson committed
-
- 11 May, 2021 1 commit
-
-
The `git_buf_text` namespace is unnecessary and strange. Remove it, just keep the functions prefixed with `git_buf`.
Edward Thomson committed
-
- 21 Sep, 2019 2 commits
-
-
Before printing into a `git_buf` structure, we always call `ENSURE_SIZE` first. This macro will reallocate the buffer as-needed depending on whether the current amount of allocated bytes is sufficient or not. If `asize` is big enough, then it will just do nothing, otherwise it will call out to `git_buf_try_grow`. But in fact, it is insufficient to only check `asize`. When we fail to allocate any more bytes e.g. via `git_buf_try_grow`, then we set the buffer's pointer to `git_buf__oom`. Note that we touch neither `asize` nor `size`. So if we just check `asize > targetsize`, then we will happily let the caller of `ENSURE_SIZE` proceed with an out-of-memory buffer. As a result, we will print all bytes into the out-of-memory buffer instead, resulting in an out-of-bounds write. Fix the issue by having `ENSURE_SIZE` verify that the buffer is not marked as OOM. Add a test to verify that we're not writing into the OOM buffer.
Patrick Steinhardt committed -
When growing buffers, we repeatedly multiply the currently allocated number of bytes by 1.5 until it exceeds the requested number of bytes. This has two major problems: 1. If the current number of bytes is tiny and one wishes to resize to a comparatively huge number of bytes, then we may need to loop thousands of times. 2. If resizing to a value close to `SIZE_MAX` (which would fail anyway), then we probably hit an infinite loop as multiplying the current amount of bytes will repeatedly result in integer overflows. When reallocating buffers, one typically chooses values close to 1.5 to enable re-use of resulting memory holes in later reallocations. But because of this, it really only makes sense to use a factor of 1.5 _once_, but not looping until we finally are able to fit it. Thus, we can completely avoid the loop and just opt for the much simpler algorithm of multiplying with 1.5 once and, if the result doesn't fit, just use the target size. This avoids both problems of looping extensively and hitting overflows. This commit also adds a test that would've previously resulted in an infinite loop.
Patrick Steinhardt committed
-
- 20 Jul, 2019 1 commit
-
-
Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
Patrick Steinhardt committed
-
- 10 Jun, 2018 1 commit
-
-
Patrick Steinhardt committed
-
- 26 May, 2016 2 commits
-
-
Edward Thomson committed
-
Edward Thomson committed
-
- 17 Sep, 2015 1 commit
-
-
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter assumes that we own everything beneath the base, as if it were being called with a base of the repository or working directory, and is tailored towards checkout and ensuring that there is no bogosity beneath the base that must be cleaned up. This is (at best) slow and (at worst) unsafe in the larger context of a filesystem where we do not own things and cannot do things like unlink symlinks that are in our way.
Edward Thomson committed
-
- 24 Jun, 2015 2 commits
-
-
This explains more closely what happens. While here, set an error message.
Carlos Martín Nieto committed -
When we don't own a buffer (asize=0) we currently allow the usage of grow to copy the memory into a buffer we do own. This muddles the meaning of grow, and lets us be a bit cavalier with ownership semantics. Don't allow this any more. Usage of grow should be restricted to buffers which we know own their own memory. If unsure, we must not attempt to modify it.
Carlos Martín Nieto committed
-
- 22 Jun, 2015 1 commit
-
-
Allow files to have mixed line endings instead of skipping processing on them.
Edward Thomson committed
-
- 20 Jan, 2015 1 commit
-
-
Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files) Signed-off-by: Sven Strickroth <email@cs-ware.de>
Sven Strickroth committed
-
- 21 Nov, 2014 1 commit
-
-
Vicent Marti committed
-
- 01 Oct, 2014 1 commit
-
-
Vicent Marti committed
-
- 15 Aug, 2014 1 commit
-
-
Decode base64-encoded text into a git_buf
Edward Thomson committed
-
- 23 Jun, 2014 1 commit
-
-
When checking out files, we're performing conversion into the user's native line endings, but we only want to do it for files which have consistent line endings. Refuse to perform the conversion for mixed-EOL files. The CRLF->LF filter is left as-is, as that conversion is considered to be normalization by git and should force a conversion of the line endings.
Carlos Martín Nieto committed
-
- 23 Apr, 2014 1 commit
-
-
Edward Thomson committed
-
- 01 Apr, 2014 1 commit
-
-
There are a few places where we need to join three strings to assemble a path. This adds a simple join3 function to avoid the comparatively expensive join_n (which calls strlen on each string twice).
Russell Belfer committed
-
- 20 Jan, 2014 1 commit
-
-
Patrick Reynolds committed
-
- 14 Nov, 2013 1 commit
-
-
Ben Straub committed
-
- 17 Sep, 2013 1 commit
-
-
This begins the process of exposing git_filter objects to the public API. This includes: * new public type and API for `git_buffer` through which an allocated buffer can be passed to the user * new API `git_blob_filtered_content` * make the git_filter type and GIT_FILTER_TO_... constants public
Russell Belfer committed
-
- 19 Aug, 2013 1 commit
-
-
When a git_buf contains a UTF-8 BOM, the three bytes comprising that BOM are treated as unprintable characters. For a small git_buf, the three BOM characters overwhelm the printable characters. This is problematic when trying to check out a small file as the CR/LF filtering will not apply.
Edward Thomson committed
-
- 31 Jul, 2013 1 commit
-
-
After doing further profiling, I found that a lot of time was being spent attempting to insert hashes into the file hash signature when using the rolling hash because the rolling hash approach generates a hash per byte of the file instead of one per run/line of data. To optimize this, I decided to convert back to a run-based file signature algorithm which would be more like core Git. After changing this, a number of the existing tests started to fail. In some cases, this appears to have been because the test was coded to be too specific to the particular results of the file similarity metric and in some cases there appear to have been bugs in the core rename detection code where only by the coincidence of the file similarity scoring were the expected results being generated. This renames all the variables in the core rename detection code to be more consistent and hopefully easier to follow which made it a bit easier to reason about the behavior of that code and fix the problems that I was seeing. I think it's in better shape now. There are a couple of tests now that attempt to stress test the rename detection code and they are quite slow. Most of the time is spent setting up the test data on disk and in the index. When we roll out performance improvements for index insertion, it should also speed up these tests I hope.
Russell Belfer committed
-
- 25 Mar, 2013 1 commit
-
-
This adds crlf/lf conversion functions into buf_text with more efficient implementations that bypass the high level buffer functions. They attempt to minimize the number of reallocations done and they directly write the buffer data as needed if they know that there is enough memory allocated to memcpy data. Tests are added for these new functions. The crlf.c code is updated to use the new functions. Removed the include of buf_text.h from filter.h and just include it more narrowly in the places that need it.
Russell Belfer committed
-
- 20 Feb, 2013 4 commits
-
-
This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code - this is just setting out the function that will be used.
Russell Belfer committed -
Seems to be working pretty well...
Russell Belfer committed -
This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.
Russell Belfer committed -
This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.
Russell Belfer committed
-
- 29 Jan, 2013 1 commit
-
-
Russell Belfer committed
-
- 11 Jan, 2013 1 commit
-
-
Core git just looks for NUL bytes in files when deciding about is-binary inside diff (although it uses a better algorithm in checkout, when deciding if CRLF conversion should be done). Libgit2 was using the better algorithm in both places, but that is causing some confusion. For now, this makes diff just look for NUL bytes to decide if a file is binary by content in diff.
Russell Belfer committed
-
- 28 Nov, 2012 1 commit
-
-
There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).
Russell Belfer committed
-
- 10 Oct, 2012 1 commit
-
-
Russell Belfer committed
-
- 23 Aug, 2012 1 commit
-
-
Russell Belfer committed
-
- 24 Jul, 2012 1 commit
-
-
yorah committed
-
- 12 Jul, 2012 1 commit
-
-
Russell Belfer committed
-
- 11 Jul, 2012 1 commit
-
-
* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`) * `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests) to copy strings into a buffer while injecting an escape sequence (e.g. '\') in front of particular characters.
Russell Belfer committed
-