Commits · 1a90b1e3f106139f75183ef21dd5b461d9d83f1d · lvzhengyang / git2

17 Sep, 2013 2 commits

Fix longstanding valgrind warning · e7d0ced2

Under some circumstances the code could read past the end of a
buffer.  This adds a check that prevents that.

committed 11 years ago

Start of filter API + git_blob_filtered_content · 0cf77103

This begins the process of exposing git_filter objects to the
public API.  This includes:

* new public type and API for `git_buffer` through which an
  allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public

committed 11 years ago

19 Aug, 2013 1 commit

Skip UTF-8 BOM in binary detection · c0b01b75

When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters.  For a small git_buf,
the three BOM bytes can outweigh the printable characters, so the
content is detected as binary.  This is a problem when checking out
a small file, because the CR/LF filtering will then not be applied.
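
A minimal sketch of the idea, not the code from this commit: the helper name `looks_binary`, the set of bytes counted as printable, and the threshold are all assumptions made for illustration. It only shows how skipping a leading UTF-8 BOM keeps those three bytes out of the unprintable count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: guess whether a buffer holds binary content.
 * The printable set and the threshold below are assumed, not taken
 * from the commit. */
bool looks_binary(const unsigned char *data, size_t len)
{
    size_t printable = 0, nonprintable = 0, i = 0;

    /* Skip a leading UTF-8 BOM (EF BB BF) so it is not counted. */
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        i = 3;

    for (; i < len; i++) {
        unsigned char c = data[i];

        if (c == '\0')
            return true;            /* a NUL byte always means binary */
        if (c == '\n' || c == '\r' || c == '\t' || (c >= 0x20 && c < 0x7F))
            printable++;
        else
            nonprintable++;         /* includes all bytes >= 0x7F */
    }

    /* Assumed threshold: binary if more than roughly 1 byte in 128
     * is unprintable. */
    return (printable >> 7) < nonprintable;
}
```

With this sketch, a six-byte buffer holding a BOM followed by `hi\n` is classified as text; without the skip, its three BOM bytes would be enough to tip it to binary.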

committed 11 years ago

14 Jul, 2013 2 commits
- Fix the initial line · b74d4478
  crazymaster committed 11 years ago
  
- Fix gather_stats · 6550565a
  crazymaster committed 11 years ago
  
25 Mar, 2013 1 commit

Move crlf conversion into buf_text · 3658e81e

This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions.  They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.

Tests are added for these new functions.  The crlf.c code is
updated to use the new functions.

Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
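
As a rough illustration of the approach described above (the function names `crlf_to_lf` and `lf_to_crlf` are hypothetical and are not the buf_text functions added by this commit), a conversion can avoid repeated reallocation by either rewriting in place or sizing the output once before copying:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: convert CRLF to LF in place and return the new length.
 * The result never grows, so no allocation is needed. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t r, w = 0;

    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;               /* drop a CR that precedes an LF */
        buf[w++] = buf[r];
    }
    return w;
}

/* Hypothetical: convert bare LF to CRLF.  The output size is computed
 * first so that exactly one allocation is made.  Returns a
 * NUL-terminated buffer the caller must free, or NULL on failure. */
char *lf_to_crlf(const char *src, size_t len, size_t *out_len)
{
    size_t i, extra = 0, w = 0;
    char *dst;

    for (i = 0; i < len; i++)
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            extra++;

    dst = malloc(len + extra + 1);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[w++] = '\r';        /* insert the missing CR */
        dst[w++] = src[i];
    }
    dst[w] = '\0';
    *out_len = w;
    return dst;
}
```

Counting first costs an extra pass over the input, which is one possible trade-off for a single allocation; the commit itself may make a different choice.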

committed 11 years ago

20 Feb, 2013 3 commits

This moves the similarity metric code out of buf_text and into a
new file.  Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes.  In theory, that should be
sufficient samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
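
The following is a sketch of that sampling idea under stated assumptions, not the implementation from this commit. The type `hashsig`, the functions `hashsig_build` and `hashsig_compare`, the window size, the hash base, and the scoring formula are all invented for illustration; only the "keep the 100 smallest and 100 largest hashes" part comes from the message above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_KEEP   100   /* hashes kept at each extreme */
#define SIG_WINDOW 8     /* assumed rolling window size in bytes */
#define SIG_BASE   257u  /* assumed polynomial base */

typedef struct {
    uint32_t lo[SIG_KEEP]; size_t nlo;  /* smallest hashes, ascending */
    uint32_t hi[SIG_KEEP]; size_t nhi;  /* largest hashes, descending */
} hashsig;

/* Insert h into a sorted, duplicate-free array capped at SIG_KEEP. */
static void keep_extreme(uint32_t *arr, size_t *n, uint32_t h, int desc)
{
    size_t i, pos = 0;

    while (pos < *n && (desc ? arr[pos] > h : arr[pos] < h))
        pos++;
    if (pos < *n && arr[pos] == h)
        return;                      /* already present */
    if (pos == SIG_KEEP)
        return;                      /* not extreme enough to keep */
    if (*n < SIG_KEEP)
        (*n)++;
    for (i = *n - 1; i > pos; i--)   /* shift; last entry falls off if full */
        arr[i] = arr[i - 1];
    arr[pos] = h;
}

/* Build a signature from a Rabin-Karp style rolling hash (mod 2^32). */
void hashsig_build(hashsig *sig, const unsigned char *data, size_t len)
{
    uint32_t h = 0, top = 1;         /* top = SIG_BASE^(SIG_WINDOW-1) */
    size_t i;

    memset(sig, 0, sizeof(*sig));
    if (len < SIG_WINDOW)
        return;
    for (i = 1; i < SIG_WINDOW; i++)
        top *= SIG_BASE;

    for (i = 0; i < len; i++) {
        if (i >= SIG_WINDOW)
            h -= top * data[i - SIG_WINDOW];  /* remove outgoing byte */
        h = h * SIG_BASE + data[i];           /* add incoming byte */
        if (i + 1 >= SIG_WINDOW) {
            keep_extreme(sig->lo, &sig->nlo, h, 0);
            keep_extreme(sig->hi, &sig->nhi, h, 1);
        }
    }
}

/* Count values present in both sorted arrays. */
static size_t count_common(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb, int desc)
{
    size_t i = 0, j = 0, c = 0;

    while (i < na && j < nb) {
        if (a[i] == b[j]) { c++; i++; j++; }
        else if (desc ? a[i] > b[j] : a[i] < b[j]) i++;
        else j++;
    }
    return c;
}

/* Assumed score: 0..100, share of kept hashes the signatures have in
 * common.  Two empty signatures are treated as identical. */
int hashsig_compare(const hashsig *a, const hashsig *b)
{
    size_t common = count_common(a->lo, a->nlo, b->lo, b->nlo, 0) +
                    count_common(a->hi, a->nhi, b->hi, b->nhi, 1);
    size_t total = a->nlo + a->nhi + b->nlo + b->nhi;

    return total ? (int)(200 * common / total) : 100;
}
```

Because at most 200 hashes are retained, the signature size is fixed no matter how large the input is, which is the property the message describes.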

committed 11 years ago

5e5848eb

Some similarity metric adjustments · f3327cac

This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).

This also separates the text and binary hash calculators into
two separate functions instead of having more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
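
One way to picture the text normalization described in the first paragraph is as a filter applied to bytes before they are hashed. The helper below is a hypothetical sketch (the name `normalize_text` and the copy-to-buffer design are assumptions; the real code may fold this into its hashing loop instead):

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: copy src into dst, treating \r as \n and dropping
 * whitespace that immediately follows a line terminator.  dst must
 * have room for len bytes.  Returns the number of bytes written. */
size_t normalize_text(char *dst, const char *src, size_t len)
{
    size_t i, w = 0;
    int after_newline = 0;

    for (i = 0; i < len; i++) {
        char c = src[i];

        if (c == '\r')
            c = '\n';               /* \r is equivalent to \n */
        if (after_newline && isspace((unsigned char)c))
            continue;               /* skips indentation and the \n of \r\n */
        dst[w++] = c;
        after_newline = (c == '\n');
    }
    return w;
}
```

In this sketch `"a\r\n  b"` and `"a\nb"` normalize to the same bytes, so indentation and line-ending style would not affect a hash computed over the result. A side effect of this simple rule is that consecutive blank lines also collapse.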

committed 11 years ago

Initial implementation of similarity scoring algo · 9c454b00

This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score.  This can be plugged into diff similarity
scoring.

committed 11 years ago

12 Jan, 2013 1 commit
- buf: Is this the function you were looking for? · 355dddbf
  Vicent Marti committed 12 years ago
  
11 Jan, 2013 1 commit

Match binary file check of core git in diff · 0d65acad

Core git just looks for NUL bytes when deciding whether a file is
binary inside diff (although it uses a better algorithm in
checkout, when deciding whether CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff look only
for NUL bytes to decide whether a file is binary by content.
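
A NUL-byte check of this kind can be very small. The helper below is a hypothetical sketch (the name `contains_nul` is an assumption, and it scans the whole buffer, which may differ from how much of the content the real check examines):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: treat content as binary if it contains any NUL byte. */
bool contains_nul(const void *data, size_t len)
{
    return len > 0 && memchr(data, '\0', len) != NULL;
}
```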

committed 12 years ago

08 Jan, 2013 1 commit
- update copyrights · 359fc2d2
  Edward Thomson committed 12 years ago
  
30 Nov, 2012 1 commit
- buf test: make sure we always set the bom variable · 9ff07c24
  Carlos Martín Nieto committed 12 years ago
  
28 Nov, 2012 1 commit

Consolidate text buffer functions · 7bf87ab6

There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc).  This groups all those functions together into a
new file and converts the code to use that.

This includes two enhancements to existing functionality.  The old
text stats function was significantly rewritten and the BOM
detection code was extended (although for the most part we can't
handle anything other than a UTF-8 BOM).
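
For reference, extended BOM detection could look roughly like the sketch below. The enum `bom_type` and the function `detect_bom` are hypothetical names, not the ones introduced by this commit; the byte sequences themselves are the standard Unicode byte order marks.

```c
#include <stddef.h>

typedef enum {
    BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE, BOM_UTF32_LE, BOM_UTF32_BE
} bom_type;

/* Hypothetical: identify a byte order mark at the start of a buffer.
 * Stores the BOM kind in *out and returns its length in bytes
 * (0 if there is none). */
size_t detect_bom(bom_type *out, const unsigned char *d, size_t len)
{
    *out = BOM_NONE;

    /* The 4-byte marks are checked first, because the UTF-32 LE mark
     * starts with the same two bytes as the UTF-16 LE mark. */
    if (len >= 4 && d[0] == 0xFF && d[1] == 0xFE && d[2] == 0x00 && d[3] == 0x00) {
        *out = BOM_UTF32_LE;
        return 4;
    }
    if (len >= 4 && d[0] == 0x00 && d[1] == 0x00 && d[2] == 0xFE && d[3] == 0xFF) {
        *out = BOM_UTF32_BE;
        return 4;
    }
    if (len >= 3 && d[0] == 0xEF && d[1] == 0xBB && d[2] == 0xBF) {
        *out = BOM_UTF8;
        return 3;
    }
    if (len >= 2 && d[0] == 0xFF && d[1] == 0xFE) {
        *out = BOM_UTF16_LE;
        return 2;
    }
    if (len >= 2 && d[0] == 0xFE && d[1] == 0xFF) {
        *out = BOM_UTF16_BE;
        return 2;
    }
    return 0;
}
```

A caller that can only handle UTF-8 could use the returned length to skip the mark when the kind is `BOM_UTF8` and fall back to treating other encodings as opaque data.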

committed 12 years ago
