  1. 13 Feb, 2015 3 commits
  2. 20 Jan, 2015 1 commit
  3. 21 Nov, 2014 1 commit
  4. 16 Jul, 2014 2 commits
  5. 23 Jun, 2014 1 commit
    • crlf: pass-through mixed EOL buffers from LF->CRLF · 5a76ad35
      When checking out files, we're performing conversion into the user's
      native line endings, but we only want to do it for files which have
      consistent line endings. Refuse to perform the conversion for mixed-EOL
      files.
      
      The CRLF->LF filter is left as-is, as that conversion is considered to be
      normalization by git and should force a conversion of the line endings.
      Carlos Martín Nieto committed
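A minimal sketch of the pass-through rule in C, written independently of libgit2's filter code: count each line-ending style in the buffer and expand bare LF to CRLF only when bare LF is the sole style present. Any other buffer (mixed endings, CRLF-only, or no newlines) is copied through unchanged. The names `eol_stats`, `gather_eol_stats` and `lf_to_crlf_if_consistent` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-buffer line-ending counters. */
struct eol_stats {
    size_t lf;    /* bare "\n" */
    size_t crlf;  /* "\r\n" */
    size_t cr;    /* bare "\r" */
};

static void gather_eol_stats(const char *buf, size_t len, struct eol_stats *st)
{
    size_t i;

    memset(st, 0, sizeof(*st));
    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                st->crlf++;
                i++; /* consume the '\n' of the pair */
            } else {
                st->cr++;
            }
        } else if (buf[i] == '\n') {
            st->lf++;
        }
    }
}

/*
 * Returns a newly allocated, NUL-terminated copy of buf. Bare LFs are
 * expanded to CRLF only when bare LF is the only line ending present.
 * Returns NULL on allocation failure; *out_len receives the length.
 */
char *lf_to_crlf_if_consistent(const char *buf, size_t len, size_t *out_len)
{
    struct eol_stats st;
    char *out, *p;
    size_t i, n;

    gather_eol_stats(buf, len, &st);

    if (st.crlf > 0 || st.cr > 0 || st.lf == 0) {
        /* Mixed endings, CRLF-only, or no newlines: pass through as-is. */
        n = len;
        if ((out = malloc(n + 1)) == NULL)
            return NULL;
        memcpy(out, buf, n);
    } else {
        n = len + st.lf; /* each bare LF gains one CR */
        if ((out = malloc(n + 1)) == NULL)
            return NULL;
        for (p = out, i = 0; i < len; i++) {
            if (buf[i] == '\n')
                *p++ = '\r';
            *p++ = buf[i];
        }
    }
    out[n] = '\0';
    *out_len = n;
    return out;
}
```

With this rule, "a\nb\n" becomes "a\r\nb\r\n", while the mixed buffer "a\r\nb\n" is returned byte-for-byte unchanged.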
  6. 17 Sep, 2013 2 commits
  7. 19 Aug, 2013 1 commit
    • Skip UTF-8 BOM in binary detection · c0b01b75
      When a git_buf contains a UTF-8 BOM, the three bytes comprising
      that BOM are treated as unprintable characters.  For a small git_buf,
      the three BOM characters overwhelm the printable characters.  This
      is problematic when trying to check out a small file as the CR/LF
      filtering will not apply.
      Edward Thomson committed
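The effect is easy to reproduce with a small C sketch. The byte classification below (bytes of 0x80 and above count as non-printable, which is what makes the BOM hurt) and the `(printable >> 7) < nonprintable` threshold are assumptions modeled on core git's heuristic, not copied from libgit2; `looks_binary` and its helpers are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical counters; the classification below is an assumption. */
struct text_stats {
    size_t printable;
    size_t nonprintable;
};

/* Returns 3 if buf starts with the UTF-8 BOM (EF BB BF), else 0. */
static size_t utf8_bom_len(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}

static void gather_text_stats(const unsigned char *buf, size_t len,
                              int skip_bom, struct text_stats *st)
{
    size_t i = skip_bom ? utf8_bom_len(buf, len) : 0;

    st->printable = st->nonprintable = 0;
    for (; i < len; i++) {
        unsigned char c = buf[i];
        /* Assumption: ASCII graphics plus common control whitespace are
         * printable; everything else, including bytes >= 0x80, is not. */
        if ((c >= 0x20 && c < 0x7F) || c == '\n' || c == '\r' ||
            c == '\t' || c == '\b' || c == '\f' || c == 0x1B)
            st->printable++;
        else
            st->nonprintable++;
    }
}

/* Binary when nonprintable exceeds printable / 128 (integer division). */
int looks_binary(const unsigned char *buf, size_t len, int skip_bom)
{
    struct text_stats st;

    gather_text_stats(buf, len, skip_bom, &st);
    return (st.printable >> 7) < st.nonprintable;
}
```

For the six-byte file consisting of a BOM followed by "hi\n", the three BOM bytes outweigh everything else and the file is classified as binary; skipping the BOM leaves zero non-printable bytes and the file is treated as text, so CR/LF filtering can apply.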
  8. 14 Jul, 2013 2 commits
  9. 25 Mar, 2013 1 commit
    • Move crlf conversion into buf_text · 3658e81e
      This adds crlf/lf conversion functions into buf_text with more
      efficient implementations that bypass the high level buffer
      functions.  They attempt to minimize the number of reallocations
      done and they directly write the buffer data as needed if they
      know that there is enough memory allocated to memcpy data.
      
      Tests are added for these new functions.  The crlf.c code is
      updated to use the new functions.
      
      Removed the include of buf_text.h from filter.h and just include
      it more narrowly in the places that need it.
      Russell Belfer committed
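A minimal sketch of the allocation strategy described in this commit, in plain C rather than against the `git_buf` API: because CRLF to LF conversion never lengthens the data, a single allocation of `len + 1` bytes is always enough, and the spans between carriage returns can be copied with `memcpy` instead of byte by byte. The name `crlf_to_lf` and the choice to keep lone CRs are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical CRLF -> LF conversion that allocates exactly once.
 * The output is never longer than the input, so no reallocation is
 * needed. Returns a NUL-terminated buffer, or NULL on failure.
 */
char *crlf_to_lf(const char *buf, size_t len, size_t *out_len)
{
    const char *scan = buf, *end = buf + len, *cr;
    char *out = malloc(len + 1), *p = out;

    if (out == NULL)
        return NULL;

    while (scan < end &&
           (cr = memchr(scan, '\r', (size_t)(end - scan))) != NULL) {
        /* Copy everything up to (but not including) the CR in one go. */
        memcpy(p, scan, (size_t)(cr - scan));
        p += cr - scan;

        if (cr + 1 < end && cr[1] == '\n') {
            *p++ = '\n';   /* CRLF collapses to LF */
            scan = cr + 2;
        } else {
            *p++ = '\r';   /* a lone CR is kept as-is */
            scan = cr + 1;
        }
    }

    /* Copy the tail after the last CR. */
    memcpy(p, scan, (size_t)(end - scan));
    p += end - scan;
    *p = '\0';
    *out_len = (size_t)(p - out);
    return out;
}
```

The reverse direction (LF to CRLF) can grow the data, so a similar single-allocation approach would first count the LFs to size the output exactly.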
  10. 20 Feb, 2013 3 commits
    • Change similarity metric to sampled hashes · 5e5848eb
      This moves the similarity metric code out of buf_text and into a
      new file.  Also, this implements a different approach to similarity
      measurement based on a Rabin-Karp rolling hash where we only keep
      the top 100 and bottom 100 hashes.  In theory, that should be
      enough samples to give a fairly accurate measurement while
      limiting the amount of data we keep for file signatures no matter
      how large the file is.
      Russell Belfer committed
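A sketch in C of the sampling idea, under stated assumptions: a polynomial Rabin-Karp rolling hash over fixed-size byte windows, keeping only the 100 smallest and 100 largest window hashes. The 8-byte window, base 257, `struct hashsig` layout and function names are hypothetical choices, not libgit2's. Memory stays bounded at 200 hashes regardless of input size, which is the property the commit message is after.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_SAMPLES 100   /* samples kept at each end, per the commit text */
#define SIG_WINDOW  8     /* hypothetical window length in bytes */
#define SIG_BASE    257u  /* hypothetical polynomial base */

/* Hypothetical signature: the smallest and largest window hashes seen. */
struct hashsig {
    uint32_t mins[SIG_SAMPLES];
    uint32_t maxs[SIG_SAMPLES];
    size_t nmins, nmaxs;
};

/* Keep h if it is among the SIG_SAMPLES smallest (want_max == 0)
 * or largest (want_max != 0) values seen so far. */
static void keep_sample(uint32_t *set, size_t *n, uint32_t h, int want_max)
{
    size_t i, worst = 0;

    if (*n < SIG_SAMPLES) {
        set[(*n)++] = h;
        return;
    }
    /* Find the weakest kept sample and replace it if h beats it. */
    for (i = 1; i < *n; i++)
        if (want_max ? set[i] < set[worst] : set[i] > set[worst])
            worst = i;
    if (want_max ? h > set[worst] : h < set[worst])
        set[worst] = h;
}

void hashsig_build(struct hashsig *sig, const unsigned char *buf, size_t len)
{
    uint32_t h = 0, top = 1; /* top = SIG_BASE^(SIG_WINDOW-1) mod 2^32 */
    size_t i;

    memset(sig, 0, sizeof(*sig));
    if (len < SIG_WINDOW)
        return;

    for (i = 1; i < SIG_WINDOW; i++)
        top *= SIG_BASE;

    /* Hash of the first window; uint32_t arithmetic wraps mod 2^32. */
    for (i = 0; i < SIG_WINDOW; i++)
        h = h * SIG_BASE + buf[i];

    for (i = SIG_WINDOW; ; i++) {
        keep_sample(sig->mins, &sig->nmins, h, 0);
        keep_sample(sig->maxs, &sig->nmaxs, h, 1);
        if (i == len)
            break;
        /* Roll the window: drop buf[i - SIG_WINDOW], append buf[i]. */
        h = (h - (uint32_t)buf[i - SIG_WINDOW] * top) * SIG_BASE + buf[i];
    }
}
```

The linear scan in `keep_sample` keeps the sketch short; a pair of heaps would make each update logarithmic in the sample count.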
    • Some similarity metric adjustments · f3327cac
      This makes the text similarity metric treat \r as equivalent
      to \n and makes it skip whitespace immediately following a line
      terminator, so line indentation will have less effect on the
      difference measurement (and so \r\n will be treated as just a
      single line terminator).
      
      This also splits the text and binary hash calculators into
      two separate functions instead of having more if statements inside
      the loop. This should make it easier to have more differentiated
      heuristics in the future if we so wish.
      Russell Belfer committed
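The two text rules above can be sketched in C as a pre-pass over the data before hashing. `normalize_for_hash` is a hypothetical helper that writes a normalized copy for clarity; a text hash function could apply the same rules inline, while a separate binary hash function would skip them and hash raw bytes.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical pre-pass for the text hash: '\r' counts as a line
 * terminator just like '\n', and whitespace directly after a
 * terminator is skipped. Writes at most len bytes to out and
 * returns the number written.
 */
size_t normalize_for_hash(const char *in, size_t len, char *out)
{
    size_t i, n = 0;
    int after_eol = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (after_eol && isspace(c))
            continue;         /* indentation, or the '\n' of "\r\n" */
        if (c == '\r' || c == '\n') {
            out[n++] = '\n';  /* both terminators hash the same */
            after_eol = 1;
        } else {
            out[n++] = (char)c;
            after_eol = 0;
        }
    }
    return n;
}
```

Under these rules "a\r\n  b\n" and "a\n\tb\n" both normalize to "a\nb\n". Note that blank lines collapse as well, since they consist only of whitespace following a terminator.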
    • Initial implementation of similarity scoring algo · 9c454b00
      This adds a new `git_buf_text_hashsig` type and functions to
      generate these hash signatures and compare them to give a
      similarity score.  This can be plugged into diff similarity
      scoring.
      Russell Belfer committed
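How two signatures are reduced to a single score is not spelled out here. One simple possibility, sketched in C under the assumption that a signature is just an array of 32-bit hashes, is a Dice coefficient: twice the number of shared hashes divided by the total count. The formula and the names `hashsig_similarity` and `cmp_u32` are hypothetical and are not the `git_buf_text_hashsig` API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical score in [0, 100]: 2 * shared / (na + nb).
 * Both arrays are sorted in place, then merged to count matches.
 */
int hashsig_similarity(uint32_t *a, size_t na, uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;

    if (na + nb == 0)
        return 100; /* two empty signatures: treat as identical */

    qsort(a, na, sizeof(*a), cmp_u32);
    qsort(b, nb, sizeof(*b), cmp_u32);

    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            common++;
            i++;
            j++;
        }
    }
    return (int)(200 * common / (na + nb));
}
```

For example, {4, 1, 3, 2} and {3, 6, 4, 5} share two of eight hashes and score 50, while any permutation of the same hashes scores 100.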
  11. 12 Jan, 2013 1 commit
  12. 11 Jan, 2013 1 commit
    • Match binary file check of core git in diff · 0d65acad
      Core git just looks for NUL bytes in files when deciding whether
      a file is binary inside diff (although it uses a better algorithm in
      checkout, when deciding if CRLF conversion should be done).
      Libgit2 was using the better algorithm in both places, but that
      is causing some confusion.  For now, this makes diff just look
      for NUL bytes to decide if a file is binary by content in diff.
      Russell Belfer committed
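The diff-side check reduces to a NUL search. A minimal C sketch, assuming the scan is limited to a fixed-size prefix the way core git limits it to its first 8000 bytes (`FIRST_FEW_BYTES`); `is_binary_by_nul` and `BINARY_CHECK_BYTES` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: like core git, only the first 8000 bytes are inspected. */
#define BINARY_CHECK_BYTES 8000

/* Returns 1 if a NUL byte appears in the inspected prefix, else 0. */
int is_binary_by_nul(const char *buf, size_t len)
{
    if (len > BINARY_CHECK_BYTES)
        len = BINARY_CHECK_BYTES;
    return memchr(buf, '\0', len) != NULL;
}
```

The prefix limit is a trade-off: it keeps the check cheap on large files, but a NUL that first appears after byte 8000 goes unnoticed.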
  13. 08 Jan, 2013 1 commit
  14. 30 Nov, 2012 1 commit
  15. 28 Nov, 2012 1 commit
    • Consolidate text buffer functions · 7bf87ab6
      There are many scattered functions that look into the contents of
      buffers to do various text manipulations (such as escaping or
      unescaping data, calculating text stats, guessing if content is
      binary, etc).  This groups all those functions together into a
      new file and converts the code to use that.
      
      This has two enhancements to existing functionality.  The old
      text stats function is significantly rewritten and the BOM
      detection code was extended (although largely we can't deal with
      anything other than a UTF8 BOM).
      Russell Belfer committed
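A sketch in C of recognizing the common Unicode byte-order marks, in the spirit of the extended BOM detection mentioned above; the enum and `detect_bom` are hypothetical and not libgit2's API. The one subtlety is ordering: the UTF-32 little-endian mark FF FE 00 00 begins with the UTF-16 little-endian mark FF FE, so the longer pattern has to be tested first.

```c
#include <stddef.h>

/* Hypothetical BOM kinds; libgit2's own definitions may differ. */
enum bom_kind {
    BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE,
    BOM_UTF32_LE, BOM_UTF32_BE
};

/* Identify a leading byte-order mark; *bom_len receives its size. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;
    /* UTF-32 LE (FF FE 00 00) must be checked before UTF-16 LE (FF FE). */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return BOM_UTF32_LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return BOM_UTF32_BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    return BOM_NONE;
}
```

The ordering leaves one unavoidable ambiguity: a UTF-16 LE file whose first character after the BOM is U+0000 looks exactly like a UTF-32 LE BOM.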