  1. 20 Feb, 2013 4 commits
    • Refine pluggable similarity API · 9bc8be3d
      This plugs in the three basic similarity strategies for handling
      whitespace via internal use of the pluggable API.  In so doing, I
      realized that the use of git_buf in the hashsig API was not needed
      and actually just made it harder to use, so I tweaked that API as
      well.
      
      Note that the similarity metric is still not hooked up in the
      find_similarity code - this is just setting out the function that
      will be used.
      Russell Belfer committed
    • More tests of file signatures with whitespace opts · aa643260
      Seems to be working pretty well...
      Russell Belfer committed
    • Change similarity metric to sampled hashes · 5e5848eb
      This moves the similarity metric code out of buf_text and into a
      new file.  Also, this implements a different approach to similarity
      measurement based on a Rabin-Karp rolling hash where we only keep
      the top 100 and bottom 100 hashes.  In theory, that should be
      sufficient samples to give a fairly accurate measurement while
      limiting the amount of data we keep for file signatures no matter
      how large the file is.
      Russell Belfer committed
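
      The sampling idea described in this commit can be sketched roughly as
      follows. This is not the libgit2 code: the `sample_sig` type, the
      `sig_*` names, the 8-byte window, the multiplier 257, and the 0..100
      scoring formula are all assumptions made for illustration. Only the
      "keep the 100 smallest and 100 largest rolling hashes" part comes from
      the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_WINDOW 8     /* bytes per rolling-hash window (assumed) */
#define SIG_KEEP   100   /* samples kept at each extreme */
#define SIG_BASE   257u  /* rolling-hash multiplier (assumed) */

/* Hypothetical signature type: fixed size no matter how large the file is. */
typedef struct {
    uint32_t lo[SIG_KEEP];  /* smallest distinct hashes, ascending */
    uint32_t hi[SIG_KEEP];  /* largest distinct hashes, ascending */
    size_t n_lo, n_hi;
} sample_sig;

/* Insert v into the sorted array a, skipping duplicates. When the array is
 * full, keep_small decides whether the largest or the smallest is evicted. */
static void sig_insert(uint32_t *a, size_t *n, uint32_t v, int keep_small)
{
    size_t pos = 0;
    while (pos < *n && a[pos] < v)
        pos++;
    if (pos < *n && a[pos] == v)
        return;
    if (*n < SIG_KEEP) {
        memmove(a + pos + 1, a + pos, (*n - pos) * sizeof(*a));
        a[pos] = v;
        (*n)++;
    } else if (keep_small) {
        if (pos == SIG_KEEP)
            return;  /* larger than every kept sample */
        memmove(a + pos + 1, a + pos, (SIG_KEEP - 1 - pos) * sizeof(*a));
        a[pos] = v;
    } else {
        if (pos == 0)
            return;  /* smaller than every kept sample */
        memmove(a, a + 1, (pos - 1) * sizeof(*a));
        a[pos - 1] = v;
    }
}

/* Slide a Rabin-Karp style hash (mod 2^32) over data and sample extremes. */
static void sig_compute(sample_sig *sig, const unsigned char *data, size_t len)
{
    uint32_t hash = 0, top = 1;  /* top = SIG_BASE^(SIG_WINDOW-1) */
    size_t i;

    assert(sig != NULL);
    memset(sig, 0, sizeof(*sig));
    if (len < SIG_WINDOW)
        return;
    for (i = 1; i < SIG_WINDOW; i++)
        top *= SIG_BASE;
    for (i = 0; i < SIG_WINDOW; i++)
        hash = hash * SIG_BASE + data[i];
    for (i = SIG_WINDOW; ; i++) {
        sig_insert(sig->lo, &sig->n_lo, hash, 1);
        sig_insert(sig->hi, &sig->n_hi, hash, 0);
        if (i == len)
            break;
        /* drop the oldest byte, shift, and add the newest byte */
        hash = (hash - data[i - SIG_WINDOW] * top) * SIG_BASE + data[i];
    }
}

/* Count values present in both sorted arrays. */
static size_t sorted_common(const uint32_t *a, size_t na,
                            const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { common++; i++; j++; }
    }
    return common;
}

/* Score from 0 (no shared samples) to 100 (identical sample sets). */
static int sig_similarity(const sample_sig *a, const sample_sig *b)
{
    size_t total = a->n_lo + a->n_hi + b->n_lo + b->n_hi;
    size_t common = sorted_common(a->lo, a->n_lo, b->lo, b->n_lo) +
                    sorted_common(a->hi, a->n_hi, b->hi, b->n_hi);
    return total ? (int)(200 * common / total) : 0;
}
```

      Because each signature holds at most 200 values, comparing two files
      costs the same whether they are a few kilobytes or several gigabytes.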
    • Initial implementation of similarity scoring algo · 9c454b00
      This adds a new `git_buf_text_hashsig` type and functions to
      generate these hash signatures and compare them to give a
      similarity score.  This can be plugged into diff similarity
      scoring.
      Russell Belfer committed
  2. 29 Jan, 2013 1 commit
  3. 11 Jan, 2013 1 commit
    • Match binary file check of core git in diff · 0d65acad
      Core git just looks for NUL bytes in files when deciding about
      is-binary inside diff (although it uses a better algorithm in
      checkout, when deciding if CRLF conversion should be done).
      Libgit2 was using the better algorithm in both places, but that
      is causing some confusion.  For now, this makes diff just look
      for NUL bytes to decide if a file is binary by content in diff.
      Russell Belfer committed
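
      A NUL-byte check of the kind this commit describes can be as small as
      the sketch below. The function name `looks_binary` is hypothetical and
      this is not the libgit2 implementation; it only illustrates the
      "any NUL byte means binary" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat content as binary if it contains a NUL byte. */
static bool looks_binary(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    return len > 0 && memchr(data, '\0', len) != NULL;
}
```

      The rule is cheap and predictable, which is presumably why matching
      core git here reduces confusion, even though a statistical check can
      classify some files more accurately.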
  4. 28 Nov, 2012 1 commit
    • Consolidate text buffer functions · 7bf87ab6
      There are many scattered functions that look into the contents of
      buffers to do various text manipulations (such as escaping or
      unescaping data, calculating text stats, guessing if content is
      binary, etc).  This groups all those functions together into a
      new file and converts the code to use that.
      
      This has two enhancements to existing functionality.  The old
      text stats function is significantly rewritten and the BOM
      detection code was extended (although largely we can't deal with
      anything other than a UTF8 BOM).
      Russell Belfer committed
  5. 10 Oct, 2012 1 commit
  6. 23 Aug, 2012 1 commit
  7. 24 Jul, 2012 1 commit
  8. 12 Jul, 2012 1 commit
  9. 11 Jul, 2012 1 commit
    • Add a couple of useful git_buf utilities · 039fc406
      * `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
      * `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
        to copy strings into a buffer while injecting an escape sequence
        (e.g. '\') in front of particular characters.
      Russell Belfer committed
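
      The escaping behaviour described above might look roughly like the
      sketch below. It writes into a caller-supplied fixed buffer rather
      than a `git_buf`, and `puts_escaped` with this signature is a
      hypothetical stand-in, not the libgit2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: copy src into dst, inserting esc_with in front of
 * every character that appears in esc_chars. Output is truncated to fit and
 * always NUL-terminated when dst_size > 0. Returns the length the full
 * result needs, excluding the terminating NUL. */
static size_t puts_escaped(char *dst, size_t dst_size, const char *src,
                           const char *esc_chars, const char *esc_with)
{
    size_t esc_len, out = 0, k;

    assert(src != NULL && esc_chars != NULL && esc_with != NULL);
    esc_len = strlen(esc_with);

    for (; *src != '\0'; src++) {
        if (strchr(esc_chars, *src) != NULL) {
            for (k = 0; k < esc_len; k++, out++)
                if (out + 1 < dst_size)
                    dst[out] = esc_with[k];
        }
        if (out + 1 < dst_size)
            dst[out] = *src;
        out++;
    }
    if (dst_size > 0)
        dst[out < dst_size ? out : dst_size - 1] = '\0';
    return out;
}
```

      For example, escaping `*` and `?` with a backslash turns `a*b?c` into
      `a\*b\?c`.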
  10. 15 May, 2012 1 commit
    • Ranged iterators and rewritten git_status_file · 41a82592
      The goal of this work is to rewrite git_status_file to use the
      same underlying code as git_status_foreach.
      
      This is done in 3 phases:
      
      1. Extend iterators to allow ranged iteration with start and
         end prefixes for the range of file names to be covered.
      2. Improve diff so that when there is a pathspec and there is
         a common non-wildcard prefix of the pathspec, it will use
         ranged iterators to minimize excess iteration.
      3. Rewrite git_status_file to call git_status_foreach_ext
         with a pathspec that covers just the one file being checked.
      
      Since ranged iterators underlie the status & diff implementation,
      this is actually fairly efficient.  The workdir iterator does
      end up loading the contents of all the directories down to the
      single file, which should ideally be avoided, but it is pretty
      good.
      Russell Belfer committed
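
      The first two phases can be illustrated with a small sketch over a
      sorted list of paths. The names `literal_prefix_len` and
      `count_in_range` are hypothetical, and the real iterators walk trees,
      the index, and the working directory rather than a flat array; the
      sketch only shows how a literal pathspec prefix bounds the iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the pathspec up to its first wildcard
 * character. Backslash escapes are ignored in this sketch. */
static size_t literal_prefix_len(const char *pathspec)
{
    assert(pathspec != NULL);
    return strcspn(pathspec, "*?[");
}

/* Hypothetical helper: count entries of a byte-sorted path list that fall
 * inside the range given by the start and end prefixes. Entries before the
 * range are skipped, and the scan stops at the first entry past it. */
static size_t count_in_range(const char *const *paths, size_t n,
                             const char *start, const char *end)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++) {
        if (start && strncmp(paths[i], start, strlen(start)) < 0)
            continue;  /* before the range */
        if (end && strncmp(paths[i], end, strlen(end)) > 0)
            break;     /* past the range: nothing later can match */
        hits++;
    }
    return hits;
}
```

      With the pathspec `src/diff/*.c`, the literal prefix is `src/diff/`,
      and using it as both start and end prefix restricts the walk to that
      directory.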
  11. 17 Apr, 2012 1 commit
  12. 21 Mar, 2012 1 commit
  13. 27 Feb, 2012 1 commit
    • buffer: Unify `git_fbuffer` and `git_buf` · 13224ea4
      This makes so much sense that I can't believe it hasn't been done
      before. Kill the old `git_fbuffer` and read files straight into
      `git_buf` objects.
      
      Also: In order to fully support 4GB files in 32-bit systems, the
      `git_buf` implementation has been changed from using `ssize_t` for
      storage and storing negative values on allocation failure, to using
      `size_t` and changing the buffer pointer to a magical pointer on
      allocation failure.
      
      Hopefully this won't break anything.
      Vicent Martí committed
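
      The "magical pointer" approach can be sketched as below. The `sbuf`
      type and its functions are hypothetical and much simpler than
      `git_buf`; the sketch only shows how a sentinel pointer lets sizes
      stay `size_t` while still recording an allocation failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer with unsigned sizes. */
typedef struct {
    char *ptr;
    size_t asize;  /* bytes allocated */
    size_t size;   /* bytes used, excluding the trailing NUL */
} sbuf;

#define SBUF_INIT { NULL, 0, 0 }

/* Sentinel address: a buffer whose ptr equals this has failed to allocate. */
static char sbuf_oom_marker[1];

static int sbuf_oom(const sbuf *b)
{
    return b->ptr == sbuf_oom_marker;
}

static int sbuf_fail(sbuf *b)
{
    if (!sbuf_oom(b))
        free(b->ptr);
    b->ptr = sbuf_oom_marker;
    b->asize = b->size = 0;
    return -1;
}

/* Append len bytes. Returns 0 on success, -1 once the buffer is poisoned. */
static int sbuf_put(sbuf *b, const char *data, size_t len)
{
    size_t need, new_size;
    char *p;

    assert(b != NULL);
    if (sbuf_oom(b))
        return -1;
    if (len >= SIZE_MAX - b->size)  /* size + len + 1 would overflow */
        return sbuf_fail(b);

    need = b->size + len + 1;
    if (need > b->asize) {
        new_size = b->asize ? b->asize : 16;
        while (new_size < need)
            new_size = (new_size > SIZE_MAX / 2) ? need : new_size * 2;
        p = realloc(b->ptr, new_size);
        if (p == NULL)
            return sbuf_fail(b);
        b->ptr = p;
        b->asize = new_size;
    }
    if (len > 0)
        memcpy(b->ptr + b->size, data, len);
    b->size += len;
    b->ptr[b->size] = '\0';
    return 0;
}

static void sbuf_free(sbuf *b)
{
    if (!sbuf_oom(b))
        free(b->ptr);
    b->ptr = NULL;
    b->asize = b->size = 0;
}
```

      A caller can chain several `sbuf_put` calls and test `sbuf_oom` once
      at the end, since every call after a failure is a no-op that returns
      -1.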
  14. 25 Jan, 2012 1 commit
  15. 08 Dec, 2011 1 commit
    • Use git_buf for path storage instead of stack-based buffers · 97769280
      This converts virtually all of the places that allocate GIT_PATH_MAX
      buffers on the stack for manipulating paths to use git_buf objects
      instead.  The patch is pretty careful not to touch the public API
      for libgit2, so there are a few places that still use GIT_PATH_MAX.
      
      This extends and changes some details of the git_buf implementation
      to add a couple of extra functions and to make error handling easier.
      
      This includes serious alterations to all the path.c functions, and
      several of the fileops.c ones, too.  Also, there are a number of new
      functions that parallel existing ones except that they use a git_buf
      instead of a stack-based buffer (such as git_config_find_global_r
      that exists alongside git_config_find_global).
      
      This also modifies the win32 version of p_realpath to allocate whatever
      buffer size is needed to accommodate the realpath instead of hardcoding
      a GIT_PATH_MAX limit, but that change needs to be tested still.
      Russell Belfer committed
  16. 30 Nov, 2011 4 commits
  17. 28 Nov, 2011 3 commits