1. 16 Aug, 2017 2 commits
    • cmake: move regcomp and futimens checks to "features.h" · 8341d6cf
      In our CMakeLists.txt, we have to check multiple functions in order to
      determine if we have to use our own or whether we can use the
      platform-provided one. For two of these functions, namely `regcomp_l()`
      and `futimens`, the defined macro is actually used inside of the header
      file "src/unix/posix.h". As such, these macros are not only required by
      the library, but also by our test suite, which is makes use of internal
      headers.
      
      To prepare for the CMakeLists.txt split, move these two defines inside
      of the "features.h" header.
      Patrick Steinhardt committed
    • cmake: move defines into "features.h" header · a390a846
      In a future commit, we will split out the build instructions for our
      library directory and move them into a subdirectory. One of the benefits
      is fixing scoping issues, where e.g. defines do not leak to build
      targets where they do not belong to. But unfortunately, this does also
      pose the problem of how to propagate some defines which are required by
      both the library and the test suite.
      
      One way would be to create another variable keeping track of all added
      defines and declare it inside of the parent scope. While this is the
      most obvious and simplest way of going ahead, it is kind of unfortunate.
      The main reason to not use this is that these defines become implicit
      dependencies between the build targets. By simply observing a define
      inside of the CMakeLists.txt file, one cannot reason whether this define
      is only required by the current target or whether it is required by
      different targets, as well.
      
      Another approach would be to use an internal header file keeping track
      of all defines shared between targets. While configuring the library, we
      will set various variables and let CMake configure the file, adding or
      removing defines based on what has been configured. Like this, one can
      easily keep track of the current environment by simply inspecting the
      header file. Furthermore, these dependencies are becoming clear inside
      the CMakeLists.txt, as instead of simply adding a define, we now call
      e.g. `SET(GIT_THREADSAFE 1)`.
      
      Having this header file though requires us to make sure it is always
      included before any "#ifdef"-preprocessor checks are executed. As we
      have already refactored code to always include the "common.h" header
      file before any statement inside of a file, this becomes easy: just make
      sure "common.h" includes the new "features.h" header file first.
      Patrick Steinhardt committed
  2. 09 Aug, 2017 3 commits
    • oid: use memcmp in git_oid__hashcmp · c9b1e646
      The open-coded version was inherited from git.git. But it
      turns out it was based on an older version of glibc, whose
      memcmp was not very optimized.
      
      Modern glibc does much better, and some compilers (like gcc
      7) can even inline the memcmp into a series of multi-byte
      xors.
      
      Upstream is switching to using memcmp in
      git/git@0b006014c87f400bd9a86267ed30fd3e7b383884.
      Jeff King committed
    • sha1_lookup: drop sha1_entry_pos function · 9842b327
      This was pulled over from git.git, and is an experiment in
      making binary-searching lists of sha1s faster. It was never
      compiled by default (nor was it used upstream by default
      without a special environment variable).
      
      Unfortunately, it is actually slower in practice, and
      upstream is planning to drop it in
      git/git@f1068efefe6dd3beaa89484db5e2db730b094e0b (which has
      some timing results). It's worth doing the same here for
      simplicity.
      Jeff King committed
    • sha1_position: convert do-while to while · 09930192
      If we enter the sha1_position() function with "lo == hi",
      we have no elements. But the do-while loop means that we'll
      enter the loop body once anyway, picking "mi" at that same
      value and comparing nonsense to our desired key. This is
      unlikely to match in practice, but we still shouldn't be
      looking at the memory in the first place.
      
      This bug is inherited from git.git; it was fixed there in
      e01580cfe01526ec2c4eb4899f776a82ade7e0e1.
      Jeff King committed
  3. 26 Jul, 2017 2 commits
  4. 21 Jul, 2017 1 commit
  5. 15 Jul, 2017 6 commits
    • config_file: refuse modifying included variables · 1b329089
      Modifying variables pulled in by an included file currently succeeds,
      but it doesn't actually do what one would expect, as refreshing the
      configuration will cause the values to reappear. As we are currently not
      really able to support this use case, we will instead just return an
      error for deleting and setting variables which were included via an
      include.
      Patrick Steinhardt committed
    • config_file: move reader into `config_read` only · 28c2cc3d
      Right now, we have multiple call sites which initialize a `reader`
      structure. As the structure is only actually used inside of
      `config_read`, we can instead just move the reader inside of the
      `config_read` function. Instead, we can just pass in the configuration
      file into `config_read`, which eases code readability.
      Patrick Steinhardt committed
    • config_file: refresh all files if includes were modified · 83bcd3a1
      Currently, we only re-parse the top-level configuration file when it has
      changed itself. This can cause problems when an include is changed, as
      we were not updating all values correctly.
      
      Instead of conditionally reparsing only refreshed files, the logic
      becomes much clearer and easier to follow if we always re-parse the
      top-level configuration file when either the file itself or one of its
      included configuration files has changed on disk. This commit implements
      this logic.
      
      Note that this might impact performance in some cases, as we need to
      re-read all configuration files whenever any of the included files
      changed. It could increase performance to just re-parse include files
      which have actually changed, but this would compromise maintainability
      of the code without much gain. The only case where we will gain anything
      is when we actually use includes and when only these includes are
      updated, which will probably be quite an unusual scenario to actually be
      worthwhile to optimize.
      Patrick Steinhardt committed
    • config_file: remove unused backend field from parse data · 56a7a264
      The backend passed to `config_read` is never actually used anymore, so
      we can remove it from the function and the `parse_data` structure.
      Patrick Steinhardt committed
    • config_file: pass reader directly to callbacks · 3a7f7a6e
      Previously, the callbacks passed to `config_parse` got the reader via a
      pointer to a pointer. This allowed the callbacks to update the callers
      `reader` variable when the array holding it has been reallocated. As the
      array is no longer present, we can simply the code by making the reader
      a simple pointer.
      Patrick Steinhardt committed
    • config_file: refactor include handling · 73df75d8
      Current code for configuration files uses the `reader` structure to
      parse configuration files and store additional metadata like the file's
      path and checksum. These structures are stored within an array in the
      backend itself, which causes multiple problems.
      
      First, it does not make sense to keep around the file's contents with
      the backend itself. While this data is usually free'd before being added
      to the backend, this brings along somewhat intricate lifecycle problems.
      A better solution would be to store only the file paths as well as the
      checksum of the currently parsed content only.
      
      The second problem is that the `reader` structures are stored inside an
      array. When re-parsing configuration files due to changed contents, we
      may cause this array to be reallocated, requiring us to update pointers
      hold by callers. Furthermore, we do not keep track of includes which
      are already associated to a reader inside of this array. This causes us
      to add readers multiple times to the backend, e.g. in the scenario of
      refreshing configurations.
      
      This commit fixes these shortcomings. We introduce a split between the
      parsing data and the configuration file's metadata. The `reader` will
      now only hold the file's contents and the parser state and the new
      `config_file` structure holds the file's path and checksum. Furthermore,
      the new structure is a recursive structure in that it will also hold
      references to the files it directly includes. The diskfile is changed to
      only store the top-level configuration file.
      
      These changes allow us further refactorings and greatly simplify
      understanding the code.
      Patrick Steinhardt committed
  6. 12 Jul, 2017 1 commit
  7. 10 Jul, 2017 1 commit
    • patch_generate: represent buffers as void pointers · 9093ced6
      Pointers to general data should usually be used as a void pointer such
      that it is possible to hand in variables of a different pointer type
      without the need to cast. This is the same when creating patches from
      buffers, where the buffers may contain arbitrary data. Instead of
      requiring the caller to care whether his buffer is e.g. `char *` or
      `unsigned char *`, we should instead just accept a `void *`. This is
      also consistent in how we tread other types like for example `git_blob`,
      which also just has a void pointer as its raw contents.
      Patrick Steinhardt committed
  8. 03 Jul, 2017 3 commits
    • Make sure to always include "common.h" first · 0c7f49dd
      Next to including several files, our "common.h" header also declares
      various macros which are then used throughout the project. As such, we
      have to make sure to always include this file first in all
      implementation files. Otherwise, we might encounter problems or even
      silent behavioural differences due to macros or defines not being
      defined as they should be. So in fact, our header and implementation
      files should make sure to always include "common.h" first.
      
      This commit does so by establishing a common include pattern. Header
      files inside of "src" will now always include "common.h" as its first
      other file, separated by a newline from all the other includes to make
      it stand out as special. There are two cases for the implementation
      files. If they do have a matching header file, they will always include
      this one first, leading to "common.h" being transitively included as
      first file. If they do not have a matching header file, they instead
      include "common.h" as first file themselves.
      
      This fixes the outlined problems and will become our standard practice
      for header and source files inside of the "src/" from now on.
      Patrick Steinhardt committed
    • Add missing license headers · 2480d0eb
      Some implementation files were missing the license headers. This commit
      adds them.
      Patrick Steinhardt committed
    • Fix missing include for header files · 0fb4b351
      Some of our header files are not included at all by any of their
      implementing counter-parts. Including them inside of these files leads
      to some compile errors mostly due to unknown types because of missing
      includes. But there's also one case where a declared function does not
      match the implementation's prototype.
      
      Fix all these errors by fixing up the prototype and adding missing
      includes. This is preparatory work for fixing up missing includes in the
      implementation files.
      Patrick Steinhardt committed
  9. 30 Jun, 2017 2 commits
    • win32: fix circular include deps with w32_crtdbg · 459fb8fe
      The current order of declarations and includes between "common.h" and
      "w32_crtdbg_stacktrace.h" is rather complicated. Both header files make
      use of things defined in the other one and are thus circularly dependent
      on each other. This makes it currently impossible to compile the
      "w32_crtdbg_stacktrace.c" file when including "common.h" inside of
      "w32_crtdbg_stacktrace.h".
      
      We can disentangle the mess by moving declaration of the inline crtdbg
      functions into the "w32_crtdbg_stacktrace.h" file and adding additional
      includes inside of it, such that all required functions are available to
      it. This allows us to break the dependency cycle.
      Patrick Steinhardt committed
  10. 26 Jun, 2017 1 commit
    • diff: implement function to calculate patch ID · 89a34828
      The upstream git project provides the ability to calculate a so-called
      patch ID. Quoting from git-patch-id(1):
      
          A "patch ID" is nothing but a sum of SHA-1 of the file diffs
          associated with a patch, with whitespace and line numbers ignored."
      
      Patch IDs can be used to identify two patches which are probably the
      same thing, e.g. when a patch has been cherry-picked to another branch.
      
      This commit implements a new function `git_diff_patchid`, which gets a
      patch and derives an OID from the diff. Note the different terminology
      here: a patch in libgit2 are the differences in a single file and a diff
      can contain multiple patches for different files. The implementation
      matches the upstream implementation and should derive the same OID for
      the same diff. In fact, some code has been directly derived from the
      upstream implementation.
      
      The upstream implementation has two different modes to calculate patch
      IDs, which is the stable and unstable mode. The old way of calculating
      the patch IDs was unstable in a sense that a different ordering the
      diffs was leading to different results. This oversight was fixed in git
      1.9, but as git tries hard to never break existing workflows, the old
      and unstable way is still default. The newer and stable way does not
      care for ordering of the diff hunks, and in fact it is the mode that
      should probably be used today. So right now, we only implement the
      stable way of generating the patch ID.
      Patrick Steinhardt committed
  11. 23 Jun, 2017 1 commit
  12. 21 Jun, 2017 1 commit
    • merge: fix potential free of uninitialized memory · 4dc87e72
      The function `merge_diff_mark_similarity_exact` may error our early and,
      when it does so, free the `ours_deletes_by_oid` and
      `theirs_deletes_by_oid` variables. While the first one can never be
      uninitialized due to the first call actually assigning to it, the second
      variable can be freed without being initialized.
      
      Fix the issue by initializing both variables to `NULL`.
      Patrick Steinhardt committed
  13. 19 Jun, 2017 2 commits
  14. 13 Jun, 2017 1 commit
  15. 12 Jun, 2017 4 commits
  16. 10 Jun, 2017 3 commits
    • git_futils_rmdir: only allow `EBUSY` when asked · 4a0df574
      Only ignore `EBUSY` from `rmdir` when the `GIT_RMDIR_SKIP_NONEMPTY` bit
      is set.
      Edward Thomson committed
    • checkout: cope with untracked files in directory deletion · 83989d70
      When deleting a directory during checkout, do not simply delete the
      directory, since there may be untracked files.  Instead, go into
      the iterator and examine each file.
      
      In the original code (the code with the faulty assumption), we look to
      see if there's an index entry beneath the directory that we want to
      remove.   Eg, it looks to see if we have a workdir entry foo and an
      index entry foo/bar.txt. If this is not the case, then the working
      directory must have precious files in that directory. This part is okay.
      The part that's not okay is if there is an index entry foo/bar.txt. It
      just blows away the whole damned directory.
      
      That's not cool.
      
      Instead, by simply pushing the directory itself onto the stack and
      iterating each entry, we will deal with the files one by one - whether
      they're in the index (and can be force removed) or not (and are
      precious).
      
      The original code was a bad optimization, assuming that we didn't need
      to git_iterator_advance_into if there was any index entry in the folder.
      That's wrong - we could have optimized this iff all folder entries are
      in the index.
      
      Instead, we need to simply dig into the directory and analyze its
      entries.
      Edward Thomson committed
    • smart_protocol: fix parsing of server ACK responses · e141f079
      Fix ACK parsing in wait_while_ack() internal function. This patch
      handles the case where multi_ack_detailed mode sends 'ready' ACKs. The
      existing functionality would bail out too early, thus causing the
      processing of the ensuing packfile to fail if/when 'ready' ACKs were
      sent.
      Roger Gee committed
  17. 08 Jun, 2017 6 commits