Commits · 24e53d2fba1ea10c27c3b19f202dc92cabedf0ed · lvzhengyang / git2

20 Jan, 2015 1 commit

Make binary detection work similar to vanilla git · 0161e096

Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files)

Signed-off-by: Sven Strickroth <email@cs-ware.de>

committed 10 years ago
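
A minimal sketch of this kind of heuristic, not libgit2's actual code. The helper name `sketch_looks_binary`, the 0x80 cutoff, the set of tolerated control characters, and the one-in-128 threshold are all assumptions made for illustration. What it shows is that bytes with the high bit set are counted as printable, so UTF-8 text is not flagged as binary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not part of the libgit2 API. */
bool sketch_looks_binary(const unsigned char *data, size_t len)
{
    size_t printable = 0, nonprintable = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];

        if (c == 0x00)
            return true;              /* a NUL byte settles it */
        if (c >= 0x80)
            printable++;              /* high-bit bytes: likely UTF-8 (assumed cutoff) */
        else if (c >= 0x20 && c != 0x7f)
            printable++;              /* plain printable ASCII */
        else if (c == '\n' || c == '\r' || c == '\t' ||
                 c == '\b' || c == '\f' || c == 0x1b)
            printable++;              /* control characters tolerated in text (assumed set) */
        else
            nonprintable++;
    }

    /* Assumed threshold: more than one non-printable byte per 128 printable ones. */
    return (printable >> 7) < nonprintable;
}
```

With this rule, a short UTF-8 string such as "héllo" counts as text, while any buffer containing a NUL byte counts as binary.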


21 Nov, 2014 1 commit
- buffer: Do not `put` anything if len is 0 · 92e0b679
  Vicent Marti committed 10 years ago
  
01 Oct, 2014 1 commit
- hashsig: Export as a `sys` header · 737b5051
  Vicent Marti committed 10 years ago
  
15 Aug, 2014 1 commit
- Introduce git_buf_decode_base64 · e003f83a
```
Decode base64-encoded text into a git_buf
```
  Edward Thomson committed 10 years ago
23 Jun, 2014 1 commit

crlf: pass-through mixed EOL buffers from LF->CRLF · 5a76ad35

When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.

The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.

committed 10 years ago
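
The pass-through rule for the LF->CRLF direction could be sketched as below. This is an illustration under stated assumptions, not the libgit2 filter: `sketch_should_convert_to_crlf` is a hypothetical name, lone CR bytes are ignored, and a buffer that is already entirely CRLF is treated as needing no conversion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether an LF->CRLF conversion should run. */
bool sketch_should_convert_to_crlf(const char *data, size_t len)
{
    size_t lf = 0, crlf = 0;

    for (size_t i = 0; i < len; i++) {
        if (data[i] != '\n')
            continue;
        lf++;                               /* every line ending */
        if (i > 0 && data[i - 1] == '\r')
            crlf++;                         /* ...of which this many are CRLF */
    }

    /* Convert only when every line ending is a bare LF.  Mixed buffers
     * (some CRLF, some LF) and all-CRLF buffers are passed through. */
    return lf > 0 && crlf == 0;
}
```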


23 Apr, 2014 1 commit
- patch: emit binary patches (optionally) · e349ed50
  Edward Thomson committed 10 years ago
  
01 Apr, 2014 1 commit

Add efficient git_buf join3 API · 18234b14

There are a few places where we need to join three strings to
assemble a path.  This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).

committed 10 years ago
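
A rough sketch of the idea, assuming a standalone function that returns a freshly allocated string rather than writing into a git_buf. The name `sketch_join3` and its separator handling are hypothetical and not taken from the libgit2 source. The point is that each input is measured with `strlen` exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join three path parts with `sep`.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *sketch_join3(char sep, const char *a, const char *b, const char *c)
{
    size_t la, lb, lc;
    char *out, *p;

    /* Skip leading separators on the second and third parts. */
    while (*b == sep) b++;
    while (*c == sep) c++;

    /* One strlen per input string. */
    la = strlen(a);
    lb = strlen(b);
    lc = strlen(c);

    out = malloc(la + lb + lc + 3);   /* up to two separators plus NUL */
    if (!out)
        return NULL;
    p = out;

    memcpy(p, a, la);
    p += la;
    if (lb > 0 && p > out && p[-1] != sep)
        *p++ = sep;
    memcpy(p, b, lb);
    p += lb;
    if (lc > 0 && p > out && p[-1] != sep)
        *p++ = sep;
    memcpy(p, c, lc);
    p += lc;
    *p = '\0';

    return out;
}
```

For example, joining "a/", "/b" and "c" with '/' yields "a/b/c" without doubled separators.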


20 Jan, 2014 1 commit
- add unit tests for git_buf_join corner cases · abdaf936
  Patrick Reynolds committed 11 years ago
  
14 Nov, 2013 1 commit
- Rename tests-clar to tests · 17820381
  Ben Straub committed 11 years ago
  
17 Sep, 2013 1 commit

Start of filter API + git_blob_filtered_content · 0cf77103

This begins the process of exposing git_filter objects to the
public API.  This includes:

* new public type and API for `git_buffer` through which an
  allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public

committed 11 years ago


19 Aug, 2013 1 commit

Skip UTF-8 BOM in binary detection · c0b01b75

When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters.  For a small git_buf,
the three BOM characters overwhelm the printable characters.  This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.

committed 11 years ago
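
One way to picture the fix is a small helper that reports how many leading bytes belong to a UTF-8 BOM, so that the text statistics can start counting after them. The function name is hypothetical; the three byte values are the standard UTF-8 encoding of U+FEFF.

```c
#include <stddef.h>

/* Hypothetical helper: length of a leading UTF-8 BOM (EF BB BF), or 0. */
size_t sketch_utf8_bom_len(const unsigned char *data, size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller would skip that many bytes before classifying the rest of the buffer, so a three-byte BOM no longer dominates the statistics of a very small file.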


31 Jul, 2013 1 commit

Major rename detection changes · d730d3f4

After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.

To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.

After changing this, a number of the existing tests started to
fail.  In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.

This renames all the variables in the core rename detection code
to be more consistent and, hopefully, easier to follow, which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing.  I think it's in better shape now.

There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow.  Most of the time
is spent setting up the test data on disk and in the index.  When
we roll out performance improvements for index insertion, these
tests should, I hope, get faster as well.

committed 11 years ago


25 Mar, 2013 1 commit

Move crlf conversion into buf_text · 3658e81e

This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions.  They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.

Tests are added for these new functions.  The crlf.c code is
updated to use the new functions.

Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.

committed 11 years ago


20 Feb, 2013 4 commits

Refine pluggable similarity API · 9bc8be3d

This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API.  In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.

Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.

committed 11 years ago


More tests of file signatures with whitespace opts · aa643260
```
Seems to be working pretty well...
```
Russell Belfer committed 11 years ago

This moves the similarity metric code out of buf_text and into a
new file.  Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes.  In theory, that should be
sufficient samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
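
The "keep only the extreme hashes" idea can be sketched as a small bounded set. This illustration covers only the bottom-N half, with N shrunk to 4 for readability. The type and function names are hypothetical, and the rolling hash that would produce the values is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_KEEP 4   /* the commit describes 100; 4 keeps the example small */

/* Hypothetical type: the SKETCH_KEEP smallest hashes seen so far, sorted. */
typedef struct {
    uint32_t vals[SKETCH_KEEP];
    size_t count;
} sketch_minset;

void sketch_minset_add(sketch_minset *s, uint32_t h)
{
    size_t i;

    /* Full, and h is no smaller than the largest kept value: ignore it. */
    if (s->count == SKETCH_KEEP && h >= s->vals[SKETCH_KEEP - 1])
        return;

    /* Either take a fresh slot or overwrite the largest kept value. */
    i = (s->count < SKETCH_KEEP) ? s->count++ : SKETCH_KEEP - 1;

    /* Insertion step: shift larger values right to keep the array sorted. */
    while (i > 0 && s->vals[i - 1] > h) {
        s->vals[i] = s->vals[i - 1];
        i--;
    }
    s->vals[i] = h;
}
```

Memory use stays fixed regardless of input size, which is the property the commit message is after. A mirror-image set for the largest values would complete the signature.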

committed 11 years ago

5e5848eb

Initial implementation of similarity scoring algo · 9c454b00

This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score.  This can be plugged into diff similarity
scoring.

committed 11 years ago


29 Jan, 2013 1 commit
- Test buf join with NULL behavior explicitly · 17c92bea
  Russell Belfer committed 12 years ago
  
11 Jan, 2013 1 commit

Match binary file check of core git in diff · 0d65acad

Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.

committed 12 years ago
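
The NUL-byte rule described here is simple enough to sketch in a few lines. The function name is hypothetical, and the real check may look at only part of the content; this version scans the whole buffer it is given.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat content as binary if it contains a NUL byte. */
bool sketch_diff_is_binary(const void *data, size_t len)
{
    return len > 0 && memchr(data, '\0', len) != NULL;
}
```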


28 Nov, 2012 1 commit

Consolidate text buffer functions · 7bf87ab6

There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc).  This groups all those functions together into a
new file and converts the code to use that.

This has two enhancements to existing functionality.  The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF-8 BOM).

committed 12 years ago


10 Oct, 2012 1 commit
- Add git_buf_put_base64 to buffer API · 2d3579be
  Russell Belfer committed 12 years ago
  
23 Aug, 2012 1 commit
- Fix warnings and merge issues on Win64 · e9ca852e
  Russell Belfer committed 12 years ago
  
24 Jul, 2012 1 commit
- Add git_buf_unescape and git__unescape to unescape all characters in a string (in-place) · 02a0d651
  yorah committed 12 years ago
  
12 Jul, 2012 1 commit
- Fix memory leak in test · 465092ce
  Russell Belfer committed 12 years ago
  
11 Jul, 2012 1 commit

Add a couple of useful git_buf utilities · 039fc406

* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
* `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
  to copy strings into a buffer while injecting an escape sequence
  (e.g. '\') in front of particular characters.

committed 12 years ago
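
The escaping behaviour can be illustrated with a standalone function. This is a sketch only: `sketch_escape` is a hypothetical name, it returns a new string instead of appending to a git_buf, and it takes a single escape character, which may differ from the real signatures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `src`, inserting `esc_with` before every
 * character that appears in `esc_chars`.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *sketch_escape(const char *src, const char *esc_chars, char esc_with)
{
    size_t len = strlen(src);
    char *out = malloc(2 * len + 1);   /* worst case: every character escaped */
    char *p = out;

    if (!out)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        if (strchr(esc_chars, src[i]) != NULL)
            *p++ = esc_with;
        *p++ = src[i];
    }
    *p = '\0';

    return out;
}
```

Escaping `*` and `?` with a backslash turns `a*b?c` into `a\*b\?c`.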


15 May, 2012 1 commit

Ranged iterators and rewritten git_status_file · 41a82592

The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.

This is done in 3 phases:

1. Extend iterators to allow ranged iteration with start and
   end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
   a common non-wildcard prefix of the pathspec, it will use
   ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
   with a pathspec that covers just the one file being checked.

Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient.  The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.

committed 12 years ago


17 Apr, 2012 1 commit

Update clar and remove old helpers · 1a6e8f8a

This updates to the latest clar which includes the helpers
`cl_assert_equal_s` and `cl_assert_equal_i`.  Convert the code
over to use those and remove the old libgit2-only helpers.

committed 12 years ago


21 Mar, 2012 1 commit
- Convert reflog to new errors · a4c291ef
```
Cleaned up some other issues.
```
  Russell Belfer committed 12 years ago
27 Feb, 2012 1 commit

buffer: Unify `git_fbuffer` and `git_buf` · 13224ea4

This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.

Also: In order to fully support 4GB files on 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.

Hopefully this won't break anything.

committed 12 years ago
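
The sentinel-pointer approach might look roughly like this. It is a sketch under assumptions: the struct layout, the names `sketch_obuf`, `sketch_oom_marker` and `sketch_obuf_grow`, and the exact failure handling are illustrative and not taken from the libgit2 source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer type using unsigned sizes. */
typedef struct {
    char *ptr;
    size_t asize;   /* bytes allocated */
    size_t size;    /* bytes in use */
} sketch_obuf;

/* Only the address of this object matters; it marks a failed buffer. */
static char sketch_oom_marker;

bool sketch_obuf_oom(const sketch_obuf *b)
{
    return b->ptr == &sketch_oom_marker;
}

int sketch_obuf_grow(sketch_obuf *b, size_t target)
{
    char *p;

    if (sketch_obuf_oom(b))
        return -1;                  /* the failure state is sticky */
    if (target <= b->asize)
        return 0;

    p = realloc(b->ptr, target);
    if (!p) {
        free(b->ptr);
        b->ptr = &sketch_oom_marker;   /* record failure in the pointer itself */
        b->asize = b->size = 0;
        return -1;
    }

    b->ptr = p;
    b->asize = target;
    return 0;
}
```

Because failure lives in the pointer rather than in a negative size, the size fields can use the full `size_t` range.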


25 Jan, 2012 1 commit

Rename the Clay test suite to Clar · 3fd1520c

Clay is the name of a programming language in the making, and we want
to avoid confusion. Sorry for the huge diff!

committed 13 years ago


08 Dec, 2011 1 commit

Use git_buf for path storage instead of stack-based buffers · 97769280

This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.

This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.

This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that they use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongside git_config_find_global).

This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.

committed 13 years ago


30 Nov, 2011 4 commits

Optimized of git_buf_join. · 969d588d

This streamlines git_buf_join and removes the join-append behavior,
opting instead for a very compact join-replace of the git_buf contents.
The unit tests had to be updated to remove the join-append tests and
have a bunch more exhaustive tests added.

committed 13 years ago


Make initial value of git_buf ptr always be a valid empty string. · 309113c9

Taking a page from core git's strbuf, this introduces git_buf_initbuf
which is an empty string that is used to initialize the git_buf ptr
value even for new buffers. Now the git_buf ptr will always point to
a valid NUL-terminated string.

This change required jumping through a few hoops for git_buf_grow
and git_buf_free to distinguish between an actual allocated buffer
and the global initial value. Also, this moves the allocation
related functions to be next to each other near the top of buffer.c.

committed 13 years ago
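
The shared-empty-string trick can be sketched as follows. The names (`sketch_strbuf`, `sketch_initbuf`, `SKETCH_STRBUF_INIT`) and the growth policy are assumptions for illustration and do not reproduce buffer.c. The part to notice is that both the append and the free paths must check for the global initial value before touching the allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Shared, never-freed empty string used as the initial pointer value. */
static char sketch_initbuf[1] = "";

/* Hypothetical buffer type; ptr always points at a NUL-terminated string. */
typedef struct {
    char *ptr;
    size_t asize;
    size_t size;
} sketch_strbuf;

#define SKETCH_STRBUF_INIT { sketch_initbuf, 0, 0 }

int sketch_strbuf_puts(sketch_strbuf *b, const char *s)
{
    size_t len = strlen(s);
    size_t need = b->size + len + 1;   /* +1 for the terminator */

    if (need > b->asize) {
        /* Never hand the shared initial value to realloc. */
        char *old = (b->ptr == sketch_initbuf) ? NULL : b->ptr;
        char *p = realloc(old, need);
        if (!p)
            return -1;
        b->ptr = p;
        b->asize = need;
    }

    memcpy(b->ptr + b->size, s, len + 1);   /* copies the NUL as well */
    b->size += len;
    return 0;
}

void sketch_strbuf_free(sketch_strbuf *b)
{
    if (b->ptr != sketch_initbuf)
        free(b->ptr);
    b->ptr = sketch_initbuf;                /* back to the initial state */
    b->asize = b->size = 0;
}
```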


Adding unit tests for git_buf_copy_cstr · 7df41387
Russell Belfer committed 13 years ago


Make git_buf functions always maintain a valid cstr. · c63728cd

At a tiny cost of 1 extra byte per allocation, this makes
git_buf_cstr into basically a noop, which simplifies error
checking when trying to convert things to use dynamic allocation.

This patch also adds a new function (git_buf_copy_cstr) for copying
the cstr data directly into an external buffer.

committed 13 years ago
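
Copying buffer contents into a caller-supplied array could be sketched like this. The function name and the choice to truncate silently when the destination is too small are assumptions; the source does not say how the real git_buf_copy_cstr handles that case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to outsize-1 bytes of src into out and
 * always NUL-terminate (assumed behaviour: truncate if out is too small). */
void sketch_copy_cstr(char *out, size_t outsize, const char *src, size_t srclen)
{
    size_t n;

    if (outsize == 0)
        return;                      /* no room even for the terminator */

    n = (srclen < outsize - 1) ? srclen : outsize - 1;
    memcpy(out, src, n);
    out[n] = '\0';
}
```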


28 Nov, 2011 3 commits

Resolve remaining feedback · 679b69c4
```
* replace some ints with size_ts
* update NULL checks in various places
```
Russell Belfer committed 13 years ago

Add two string git_buf_join and tweak input error checking. · 3aa294fd

This commit addresses two of the comments:
* renamed existing n-input git_buf_join to git_buf_join_n
* added new git_buf_join that always takes two inputs
* moved some parameter error checking to asserts
* extended unit tests to cover new version of git_buf_join

committed 13 years ago


Extend git_buf with new utility functions and unit tests. · 8c74d22e

Add new functions to git_buf for:
* initializing a buffer from a string
* joining one or more strings onto a buffer with separators
* swapping two buffers in place
* extracting data from a git_buf (leaving it empty)

Also, make git_buf_free return a git_buf to its initialized state,
and slightly tweak buffer allocation sizes and thresholds.

Finally, port unit tests to clay and extend with lots of new tests
for the various git_buf functions.

committed 13 years ago
