1. 01 Feb, 2018 1 commit
  2. 03 Jan, 2018 1 commit
    • diff_generate: avoid excessive stats of .gitattribute files · d8896bda
      When generating a diff between two trees, for each file that is to be
      diffed we have to determine whether it shall be treated as text or as
      binary files. While git has heuristics to determine which kind of diff
      to generate, users can also that default behaviour by setting or
      unsetting the 'diff' attribute for specific files.
      
      Because of that, we have to query gitattributes in order to determine
      how to diff the current files. Instead of hitting the '.gitattributes'
      file every time we need to query an attribute, which can get expensive
      especially on networked file systems, we try to cache them instead. This
      works perfectly fine for every '.gitattributes' file that is found, but
      we hit cache invalidation problems when we determine that an attribuse
      file is _not_ existing. We do create an entry in the cache for missing
      '.gitattributes' files, but as soon as we hit that file again we
      invalidate it and stat it again to see if it has now appeared.
      
      In the case of diffing large trees with each other, this behaviour is
      very suboptimal. For each pair of files that is to be diffed, we will
      repeatedly query every directory component leading towards their
      respective location for an attributes file. This leads to thousands or
      even hundreds of thousands of wasted syscalls.
      
      The attributes cache already has a mechanism to help in that scenario in
      form of the `git_attr_session`. As long as the same attributes session
      is still active, we will not try to re-query the gitmodules files at all
      but simply retain our currently cached results. To fix our problem, we
      can create a session at the top-most level, which is the initialization
      of the `git_diff` structure, and use it in order to look up the correct
      diff driver. As the `git_diff` structure is used to generate patches for
      multiple files at once, this neatly solves our problem by retaining the
      session until patches for all files have been generated.
      
      The fix has been tested with linux.git by calling
      `git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and
      v4.14^{tree}.
      
                      | time    | .gitattributes stats
          without fix | 33.201s | 844614
          with fix    | 30.327s | 4441
      
      While execution only improved by roughly 10%, the stat(3) syscalls for
      .gitattributes files decreased by 99.5%. The benchmarks were quite
      simple with best-of-three timings on Linux ext4 systems. One can assume
      that for network based file systems the performance gain will be a lot
      larger due to a much higher latency.
      Patrick Steinhardt committed
  3. 03 Jul, 2017 1 commit
    • Make sure to always include "common.h" first · 0c7f49dd
      Next to including several files, our "common.h" header also declares
      various macros which are then used throughout the project. As such, we
      have to make sure to always include this file first in all
      implementation files. Otherwise, we might encounter problems or even
      silent behavioural differences due to macros or defines not being
      defined as they should be. So in fact, our header and implementation
      files should make sure to always include "common.h" first.
      
      This commit does so by establishing a common include pattern. Header
      files inside of "src" will now always include "common.h" as its first
      other file, separated by a newline from all the other includes to make
      it stand out as special. There are two cases for the implementation
      files. If they do have a matching header file, they will always include
      this one first, leading to "common.h" being transitively included as
      first file. If they do not have a matching header file, they instead
      include "common.h" as first file themselves.
      
      This fixes the outlined problems and will become our standard practice
      for header and source files inside of the "src/" from now on.
      Patrick Steinhardt committed
  4. 24 Aug, 2016 1 commit
  5. 26 May, 2016 2 commits
  6. 23 Nov, 2015 1 commit
    • checkout: only consider nsecs when built that way · 25e84f95
      When examining the working directory and determining whether it's
      up-to-date, only consider the nanoseconds in the index entry when
      built with `GIT_USE_NSEC`.  This prevents us from believing that
      the working directory is always dirty when the index was originally
      written with a git client that uinderstands nsecs (like git 2.x).
      Edward Thomson committed
  7. 26 Jun, 2015 1 commit
  8. 23 Jun, 2015 2 commits
    • stash: save the workdir file when deleted in index · 90177111
      When stashing the workdir tree, examine the index as well.  Using
      a mechanism similar to `git_diff_tree_to_workdir_with_index`
      allows us to determine that a file was added in the index and
      subsequently modified in the working directory.  Without examining
      the index, we would erroneously believe that this file was
      untracked and fail to include it in the working directory tree.
      
      Use a slightly modified `git_diff_tree_to_workdir_with_index` in
      order to avoid some of the behavior custom to `git diff`.  In
      particular, be sure to include the working directory side of a
      file when it was deleted in the index.
      Edward Thomson committed
  9. 20 Jun, 2015 1 commit
  10. 02 May, 2014 5 commits
  11. 15 Apr, 2014 1 commit
  12. 25 Jan, 2014 1 commit
  13. 01 Nov, 2013 1 commit
    • Fix --assume-unchanged support · 3e57069e
      This was never really working right because we were checking the
      wrong flag and not checking it in all the places that we need to
      be checking it.  I finally got around to writing a test and adding
      actual support for it.
      Russell Belfer committed
  14. 11 Oct, 2013 1 commit
  15. 25 Jul, 2013 1 commit
    • Make rename detection file size fix better · effdbeb3
      The previous fix for checking file sizes with rename detection
      always loads the blob.  In this version, if the odb backend can
      get the object header without loading the whole thing into memory,
      then we'll just use that, so that we can eliminate possible rename
      sources & targets without loading them.
      Russell Belfer committed
  16. 23 Jul, 2013 1 commit
    • Add hunk/file headers to git_diff_patch_size · 197b8966
      This allows git_diff_patch_size to account for hunk headers and
      file headers in the returned size.  This required some refactoring
      of the code that is used to print file headers so that it could be
      invoked by the git_diff_patch_size API.
      
      Also this increases the test coverage and fixes an off-by-one bug
      in the size calculation when newline changes happen at the end of
      the file.
      Russell Belfer committed
  17. 10 Jul, 2013 1 commit
    • Add git_pathspec_match_diff API · 2b672d5b
      This adds an additional pathspec API that will match a pathspec
      against a diff object.  This is convenient if you want to handle
      renames (so you need the whole diff and can't use the pathspec
      constraint built into the diff API) but still want to tell if the
      diff had any files that matched the pathspec.
      
      When the pathspec is matched against a diff, instead of keeping
      a list of filenames that matched, instead the API keeps the list
      of git_diff_deltas that matched and they can be retrieved via a
      new API git_pathspec_match_list_diff_entry.
      
      There are a couple of other minor API extensions here that were
      mostly for the sake of convenience and to reduce dependencies
      on knowing the internal data structure between files inside the
      library.
      Russell Belfer committed
  18. 17 Jun, 2013 2 commits
    • More tests and bug fixes for status with rename · a1683f28
      This changes the behavior of the status RENAMED flags so that they
      will be combined with the MODIFIED flags if appropriate.  If a file
      is modified in the index and also renamed, then the status code
      will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
      set.  If it is renamed but the OID has not changed, then just the
      GIT_STATUS_INDEX_RENAMED bit will be set.  Similarly, the flags
      GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
      independently of one another.
      
      This fixes a serious bug where the check for unmodified files that
      was done at data load time could end up erasing the RENAMED state
      of a file that was renamed with no changes.
      
      Lastly, this contains a bunch of new tests for status with renames,
      including tests where the only rename changes are case changes.
      The expected results of these tests have to vary by whether the
      platform uses a case sensitive filesystem or not, so the expected
      data covers those platform differences separately.
      Russell Belfer committed
    • Improve case handling in git_diff__paired_foreach · 351888cf
      This commit reinstates some changes to git_diff__paired_foreach
      that were discarded during the rebase (because the diff_output.c
      file had gone away), and also adjusts the case insensitively
      logic slightly to hopefully deal with either mismatched icase
      diffs and other case insensitivity scenarios.
      Russell Belfer committed
  19. 10 Jun, 2013 1 commit
    • Reorganize diff and add basic diff driver · 114f5a6c
      This is a significant reorganization of the diff code to break it
      into a set of more clearly distinct files and to document the new
      organization.  Hopefully this will make the diff code easier to
      understand and to extend.
      
      This adds a new `git_diff_driver` object that looks of diff driver
      information from the attributes and the config so that things like
      function content in diff headers can be provided.  The full driver
      spec is not implemented in the commit - this is focused on the
      reorganization of the code and putting the driver hooks in place.
      
      This also removes a few #includes from src/repository.h that were
      overbroad, but as a result required extra #includes in a variety
      of places since including src/repository.h no longer results in
      pulling in the whole world.
      Russell Belfer committed
  20. 23 May, 2013 1 commit
    • More diff rename tests; better split swap handling · 67db583d
      This adds a couple more tests of different rename scenarios.
      
      Also, this fixes a problem with the case where you have two
      "split" deltas and the left half of one matches the right half of
      the other.  That case was already being handled, but in the wrong
      order in a way that could result in bad output.  Also, if the swap
      also happened to put the other two halves into the correct place
      (i.e. two files exchanged places with each other), then the second
      delta was left with the SPLIT flag set when it really should be
      cleared.
      Russell Belfer committed
  21. 22 May, 2013 1 commit
    • Significant rename detection rewrite · a21cbb12
      This flips rename detection around so instead of creating a
      forward mapping from deltas to possible rename targets, instead
      it creates a reverse mapping, looking at possible targets and
      trying to find a source that they could have been renamed or
      copied from.  This is important because each output can only
      have a single source, but a given source could map to multiple
      outputs (in the form of COPIED records).
      
      Additionally, this makes a couple of tweaks to the public rename
      detection APIs, mostly renaming a couple of options that control
      the behavior to make more sense and to be more like core Git.
      
      I walked through the tests looking at the exact results and
      updated the expectations based on what I saw.  The new code is
      different from the old because it cannot give some nonsense
      results (like A was renamed to both B and C) which were part of
      the outputs previously.
      Russell Belfer committed
  22. 07 May, 2013 1 commit
    • Add GIT_DIFF_LINE_CONTEXT_EOFNL · e35e2684
      This adds a new line origin constant for the special line that
      is used when both files end without a newline.
      
      In the course of writing the tests for this, I was having problems
      with modifying a file but not having diff notice because it was
      the same size and modified less than one second from the start of
      the test, so I decided to start working on nanosecond timestamp
      support.  This commit doesn't contain the nanosecond support, but
      it contains the reorganization of maybe_modified and the hooks so
      that if the nanosecond data were being read by stat() (or rather
      being copied by git_index_entry__init_from_stat), then the nsec
      would be taken into account.
      
      This new stuff could probably use some more tests, although there
      is some amount of it here.
      Russell Belfer committed
  23. 30 Apr, 2013 1 commit
  24. 20 Feb, 2013 1 commit
    • Replace diff delta binary with flags · 71a3d27e
      Previously the git_diff_delta recorded if the delta was binary.
      This replaces that (with no net change in structure size) with
      a full set of flags.  The flag values that were already in use
      for individual git_diff_file objects are reused for the delta
      flags, too (along with renaming those flags to make it clear that
      they are used more generally).
      
      This (a) makes things somewhat more consistent (because I was
      using a -1 value in the "boolean" binary field to indicate unset,
      whereas now I can just use the flags that are easier to understand),
      and (b) will make it easier for me to add some additional flags to
      the delta object in the future, such as marking the results of a
      copy/rename detection or other deltas that might want a special
      indicator.
      
      While making this change, I officially moved some of the flags that
      were internal only into the private diff header.
      
      This also allowed me to remove a gross hack in rename/copy detect
      code where I was overwriting the status field with an internal
      value.
      Russell Belfer committed
  25. 08 Jan, 2013 1 commit
  26. 10 Dec, 2012 1 commit
    • Clean up iterator APIs · 9950d27a
      This removes the need to explicitly pass the repo into iterators
      where the repo is implied by the other parameters.  This moves
      the repo to be owned by the parent struct.  Also, this has some
      iterator related updates to the internal diff API to lay the
      groundwork for checkout improvements.
      Russell Belfer committed
  27. 01 Dec, 2012 1 commit
  28. 30 Nov, 2012 1 commit
  29. 15 Nov, 2012 2 commits
  30. 09 Nov, 2012 2 commits
    • Fix various cross-platform build issues · 0f3def71
      This fixes a number of warnings and problems with cross-platform
      builds.  Among other things, it's not safe to name a member of a
      structure "strcmp" because that may be #defined.
      Russell Belfer committed
    • Some diff refactorings to help code reuse · 55cbd05b
      There are some diff functions that are useful in a rewritten
      checkout and this lays some groundwork for that.  This contains
      three main things:
      
      1. Share the function diff uses to calculate the OID for a file
         in the working directory (now named `git_diff__oid_for_file`
      2. Add a `git_diff__paired_foreach` function to iterator over
         two diff lists concurrently.  Convert status to use it.
      3. Move all the string/prefix/index entry comparisons into
         function pointers inside the `git_diff_list` object so they
         can be switched between case sensitive and insensitive
         versions.  This makes them easier to reuse in various
         functions without replicating logic.  As part of this, move
         a couple of index functions out of diff.c and into index.c.
      Russell Belfer committed
  31. 30 Oct, 2012 1 commit
    • Move rename detection into new file · db106d01
      This improves the naming for the rename related functionality
      moving it to be called `git_diff_find_similar()` and renaming
      all the associated constants, etc. to make more sense.
      
      I also moved the new code (plus the existing `git_diff_merge`)
      into a new file `diff_tform.c` where I can put new functions
      related to manipulating git diff lists.
      
      This also updates the implementation significantly from the
      last revision fixing some ordering issues (where break-rewrite
      needs to be handled prior to copy and rename detection) and
      improving config option handling.
      Russell Belfer committed