1. 20 Jul, 2019 6 commits
    • checkout: postpone creation of symlinks to the end · 6be5ac23
      On most platforms it's fine to create symlinks to nonexisting files. Not
      so on Windows, where the type of a symlink (file or directory) needs to
      be set at creation time. So depending on whether the target file exists
      or not, we may end up with different symlink types. This creates a
      problem when performing checkouts, where we simply iterate over all blobs
      that need to be updated without treating symlinks any special. If the
      target file of the symlink is going to be checked out after the symlink
      itself, then the symlink will be created as directory symlink and not as
      file symlink.
      
      Fix the issue by iterating over blobs twice: once to perform postponed
      deletions and updates to non-symlink blobs, and once to perform updates
      to symlink blobs.
      Patrick Steinhardt committed
    • win32: fix symlinks to relative file targets · 50194dcd
      When creating a symlink in Windows, one needs to tell Windows whether
      the symlink should be a file or directory symlink. To determine which
      flag to pass, we call `GetFileAttributesW` on the target file to see
      whether it is a directory and then pass the flag accordingly. The
      problem though is if create a symlink with a relative target path, then
      we will check that relative path while not necessarily being inside of
      the working directory where the symlink is to be created. Thus, getting
      its attributes will either fail or return attributes of the wrong
      target.
      
      Fix this by resolving the target path relative to the directory in which
      the symlink is to be created.
      Patrick Steinhardt committed
    • win32: correctly unlink symlinks to directories · a00842c4
      When deleting a symlink on Windows, then the way to delete it depends on
      whether it is a directory symlink or a file symlink. In the first case,
      we need to use `DeleteFile`, in the second `RemoveDirectory`. Right now,
      `p_unlink` will only ever try to use `DeleteFile`, though, and thus fail
      to remove directory symlinks. This mismatches how unlink(3P) is expected
      to behave, though, as it shall remove any symlink disregarding whether
      it is a file or directory symlink.
      
      In order to correctly unlink a symlink, we thus need to check what kind
      of file this is. If we were to first query file attributes of every file
      upon calling `p_unlink`, then this would penalize the common case
      though. Instead, we can try to first delete the file with `DeleteFile`
      and only if the error returned is `ERROR_ACCESS_DENIED` will we query
      file attributes and determine whether it is a directory symlink to use
      `RemoveDirectory` instead.
      Patrick Steinhardt committed
    • path: extract function to check whether a path supports symlinks · ded77bb1
      When initializing a repository, we need to check whether its working
      directory supports symlinks to correctly set the initial value of the
      "core.symlinks" config variable. The code to check the filesystem is
      reusable in other parts of our codebase, like for example in our tests
      to determine whether certain tests can be expected to succeed or not.
      
      Extract the code into a new function `git_path_supports_symlinks` to
      avoid duplicate implementations. Remove a duplicate implementation in
      the repo test helper code.
      Patrick Steinhardt committed
    • fileops: rename to "futils.h" to match function signatures · e54343a4
      Our file utils functions all have a "futils" prefix, e.g.
      `git_futils_touch`. One would thus naturally guess that their
      definitions and implementation would live in files "futils.h" and
      "futils.c", respectively, but in fact they live in "fileops.h".
      
      Rename the files to match expectations.
      Patrick Steinhardt committed
    • patch_parse: fix segfault due to line containing static contents · a613832e
      With commit dedf70ad (patch_parse: do not depend on parsed buffer's
      lifetime, 2019-07-05), all lines of the patch are allocated with
      `strdup` to make lifetime of the parsed patch independent of the buffer
      that is currently being parsed. In patch b0893282 (patch_parse: ensure
      valid patch output with EOFNL, 2019-07-11), we introduced another
      code location where we add lines to the parsed patch. But as that one
      was implemented via a separate pull request, it wasn't converted to use
      `strdup`, as well. As a consequence, we generate a segfault when trying
      to deallocate the potentially static buffer that's now in some of the
      lines.
      
      Use `git__strdup` to fix the issue.
      Patrick Steinhardt committed
  2. 19 Jul, 2019 3 commits
    • repository: do not initialize HEAD if it's provided by templates · 9d46f167
      When using templates to initialize a git repository, then git-init(1)
      will copy over all contents of the template directory. These will be
      preferred over the default ones created by git-init(1). While we mostly
      do the same, there is the exception of "HEAD". While we do copy over the
      template's HEAD file, afterwards we'll immediately re-initialize its
      contents with either the default "ref: refs/origin/master" or the init
      option's `initial_head` field.
      
      Let's fix the inconsistency with upstream git-init(1) by not overwriting
      the template HEAD, but only if the user hasn't set `opts.initial_head`.
      If the `initial_head` field has been supplied, we should use that
      indifferent from whether the template contained a HEAD file or not. Add
      tests to verify we correctly use the template directory's HEAD file and
      that `initial_head` overrides the template.
      Patrick Steinhardt committed
    • repository: update error handling in `init_ext` · f3134a84
      Update `git_repository_init_ext` to use our typical style of error
      handling. The function had multiple statements which didn't `goto out`
      immediately but instead deferred it to later calls combined with `if`
      statements.
      Patrick Steinhardt committed
    • repository: avoid swallowing error codes in `create_head` · 869ae5a3
      The error handling in `git_repository_create_head` completely swallows
      all error codes. While probably not too much of a problem, this also
      violates our usual coding style.
      
      Refactor the code to use a local `error` variable with the typical `goto
      out` statements.
      Patrick Steinhardt committed
  3. 18 Jul, 2019 3 commits
    • configuration: cvar -> configmap · 658022c4
      `cvar` is an unhelpful name.  Refactor its usage to `configmap` for more
      clarity.
      Patrick Steinhardt committed
    • sha1: win32: fix compilation due to unknown type · 7574564e
      In commit bbf034ab (hash: move `git_hash_prov` into Win32 backend,
      2019-02-22), the `git_hash_prov`'s structure name has been removed in
      favour of its typedef'ed name. But as we have no CI that compiles with
      the WinHTTPS hashing backend right now, it wasn't noticed that the
      implementation that uses this struct wasn't changed correctly.
      
      Fix the struct type to make it compile again.
      Patrick Steinhardt committed
    • ignore: fix determining whether a shorter pattern negates another · 6f6340af
      When computing whether we need to store a negative pattern, we iterate
      through all previously known patterns and check whether the negative
      pattern undoes any of the previous ones. In doing so we call `wildmatch`
      and check it's return for any negative error values. If there was a
      negative return, we will abort and bubble up that error to the caller.
      
      In fact, this check for negative values stems from the time where we
      still used `fnmatch` instead of `wildmatch`. For `fnmatch`, negative
      values indicate a "real" error, while for `wildmatch` a negative value
      may be returned if the matching was prematurely aborted. A premature
      abort may for example also happen if the pattern matches a prefix of the
      haystack if the pattern is shorter. Returning an error in that case is
      the wrong thing to do.
      
      Fix the code to compare for equality with `WM_MATCH`, only. Negative
      values returned by `wildmatch` are perfectly fine and thus should be
      ignored. Add a test that verifies we do not see the error.
      Patrick Steinhardt committed
  4. 17 Jul, 2019 2 commits
    • cache: evict items more efficiently · 770b91b1
      When our object cache is full, we pick eight items (or the whole cache,
      if there are fewer) and evict them. For small cache sizes, this is fine,
      but when we're dealing with a large number of objects, we can repeatedly
      exhaust the cache and spend a large amount of time in git_oidmap_iterate
      trying to find items to evict.
      
      Instead, let's assume that if the cache gets full, we have a large
      number of objects that we're handling, and be more aggressive about
      evicting items. Let's remove one item for every 2048 items, but not less
      than 8. This causes us to scale our evictions in proportion to the size
      of the cache and significantly reduces the time we spend in
      git_oidmap_iterate.
      
      Before this change, a full pack of all the non-blob objects in the Linux
      repository took in excess of 30 minutes and spent 62.3% of total runtime
      in odb_read_1 and its children, and 44.3% of the time in
      git_oidmap_iterate. With this change, the same operation now takes 14
      minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total
      time, whereas git_oidmap_iterate consists of 6.2%.
      
      Note that we do spend a little more time inflating objects and a decent
      amount more time in memcmp. However, overall, the time taken is
      significantly improved, and time in pack building is now dominated by
      git_delta_create_from_index (33.7%), which is what we would expect.
      brian m. carlson committed
    • pack-objects: allocate memory more efficiently · c4df926b
      The packbuilder code allocates memory in chunks. When it needs to
      allocate, it tries to add 1024 to the number of objects and multiply by
      3/2. However, it actually multiplies by 1 instead, since it performs an
      integral division in the expression "3 / 2" and only then multiplies by
      the increased number of objects.
      
      The current behavior causes the code to waste massive amounts of time
      copying memory when it reallocates, causing inserting all non-blob
      objects in the Linux repository into a new pack to take some
      indeterminate time greater than 5 minutes instead of 52 seconds.
      
      Correct this error by first dividing by two, and only then multiplying
      by 3. We still check for overflow for the multiplication, which is the
      only part that can overflow. This appears to be the only place in the
      code base which has this problem.
      brian m. carlson committed
  5. 12 Jul, 2019 5 commits
    • attr_file: ignore macros defined in subdirectories · f8346905
      Right now, we are unconditionally applying all macros found in a
      gitatttributes file. But quoting gitattributes(5):
      
          Custom macro attributes can be defined only in top-level
          gitattributes files ($GIT_DIR/info/attributes, the .gitattributes
          file at the top level of the working tree, or the global or
          system-wide gitattributes files), not in .gitattributes files in
          working tree subdirectories. The built-in macro attribute "binary"
          is equivalent to:
      
      So gitattribute files in subdirectories of the working tree may
      explicitly _not_ contain macro definitions, but we do not currently
      enforce this limitation.
      
      This patch introduces a new parameter to the gitattributes parser that
      tells whether macros are allowed in the current file or not. If set to
      `false`, we will still parse macros, but silently ignore them instead of
      adding them to the list of defined macros. Update all callers to
      correctly determine whether the to-be-parsed file may contain macros or
      not. Most importantly, when walking up the directory hierarchy, we will
      only set it to `true` once it reaches the root directory of the repo
      itself.
      
      Add a test that verifies that we are indeed not applying macros from
      subdirectories. Previous to these changes, the test would've failed.
      Patrick Steinhardt committed
    • attr_file: refactor `parse_buffer` function · 97968529
      The gitattributes code is one of our oldest and most-untouched codebases
      in libgit2, and as such its code style doesn't quite match our current
      best practices. Refactor the function `git_attr_file__parse_buffer` to
      better match them.
      Patrick Steinhardt committed
    • attr_file: refactor `load_standalone` function · dbc7e4b1
      The gitattributes code is one of our oldest and most-untouched codebases
      in libgit2, and as such its code style doesn't quite match our current
      best practices. Refactor the function `git_attr_file__lookup_standalone`
      to better match them.
      Patrick Steinhardt committed
    • attrcache: fix memory leak if inserting invalid macro to cache · be8f9bb1
      A macro without any assignments is considered an invalid macro by the
      attributes cache and is thus not getting added to the macro map at all.
      But as `git_attr_cache__insert_macro` returns success with neither
      free'ing nor adopting the macro into its map, this will cause a memory
      leak.
      
      Fix this by freeing the macro in the function if it's not going to be
      added. This is perfectly fine to do, as callers assume that the
      attrcache will have the macro adopted on success anyway.
      Patrick Steinhardt committed
    • attrcache: fix multiple memory leaks when inserting macros · 7277bf83
      The function `git_attr_cache__insert_macro` is responsible for adopting
      macros in the per-repo macro cache. When adding a macro that replaces an
      already existing macro (e.g. because of re-parsing gitattributes files),
      then we do not free the previous macro and thus cause a memory leak.
      
      Fix this leak by first checking if the cache already has a macro defined
      with the same name. If so, free it before replacing the cache entry with
      the new instance.
      Patrick Steinhardt committed
  6. 11 Jul, 2019 17 commits
    • fileops: fix creation of directory in filesystem root · 5ae22a63
      In commit 45f24e78 (git_repository_init: stop traversing at
      windows root, 2019-04-12), we have fixed `git_futils_mkdir` to
      correctly handle the case where we create a directory in
      Windows-style filesystem roots like "C:\repo".
      
      The problem here is an off-by-one: previously, to that commit,
      we've been checking wether the parent directory's length is equal
      to the root directory's length incremented by one. When we call
      the function with "/example", then the parent directory's length
      ("/") is 1, but the root directory offset is 0 as the path is
      directly rooted without a drive prefix. This resulted in `1 == 0 +
      1`, which was true. With the change, we've stopped incrementing
      the root directory length, and thus now compare `1 <= 0`, which
      is false.
      
      The previous way of doing it was kind of finicky any non-obvious,
      which is also why the error was introduced. So instead of just
      re-adding the increment, let's explicitly add a condition that
      aborts finding the parent if the current parent path is "/".
      
      Making this change causes Azure Pipelines to fail the testcase
      repo::init::nonexistent_paths on Unix-based systems. This is because we
      have just fixed creating directories in the filesystem root, which
      previously didn't work. As Docker-based tests are running as root user,
      we are thus able to create the non-existing path and will now succeed to
      create the repository that was expected to actually fail.
      
      Let's split this up into three different tests:
      
      - A test to verify that we do not create repos in a non-existing parent
        directoy if the flag `GIT_REPOSITORY_INIT_MKPATH` is not set.
      
      - A test to verify that we fail if the root directory does not exist. As
        there is a common root directory on Unix-based systems that always
        exist, we can only test for this on Windows-based systems.
      
      - A test to verify that we fail if trying to create a repository in an
        unwriteable parent directory. We can only test this if not running
        tests as root user, as CAP_DAC_OVERRIDE will cause us to ignore
        permissions when creating files.
      Patrick Steinhardt committed
    • patch_parse: handle missing newline indicator in old file · 3f855fe8
      When either the old or new file contents have no newline at the end of
      the file, then git-diff(1) will print out a "\ No newline at end of
      file" indicator. While we do correctly handle this in the case where the
      new file has this indcator, we fail to parse patches where the old file
      is missing a newline at EOF.
      
      Fix this bug by handling and missing newline indicators in the old file.
      Add tests to verify that we can parse such files.
      Patrick Steinhardt committed
    • diff: ignore EOFNL for computing patch IDs · 001d76e1
      The patch ID is supposed to be mostly context-insignificant and
      thus only includes added or deleted lines. As such, we shouldn't honor
      end-of-file-without-newline markers in diffs.
      
      Ignore such lines to fix how we compute the patch ID for such diffs.
      Patrick Steinhardt committed
    • config_parse: provide parser init and dispose functions · dbeadf8a
      Right now, all configuration file backends are expected to
      directly mess with the configuration parser's internals in order
      to set it up. Let's avoid doing that by implementing both a
      `git_config_parser_init` and `git_config_parser_dispose` function
      to clearly define the interface between configuration backends
      and the parser.
      
      Ideally, we would make the `git_config_parser` structure
      definition private to its implementation. But as that would
      require an additional memory allocation that was not required
      before we just live with it being visible to others.
      Patrick Steinhardt committed
    • config_file: refactor error handling in `config_write` · 32157526
      Error handling in `config_write` is rather convoluted and does
      not match our current code style. Refactor it to make it easier
      to understand.
      Patrick Steinhardt committed
    • config_file: internalize `git_config_file` struct · 820fa1a3
      With the previous commits, we have finally separated the config
      parsing logic from the specific configuration file backend. Due
      to that, we can now move the `git_config_file` structure into the
      config file backend's implementation so that no other code may
      accidentally start using it again. Furthermore, we rename the
      structure to `diskfile` to make it obvious that it is internal,
      only, and to unify it with naming scheme of the other diskfile
      structures.
      Patrick Steinhardt committed
    • config_parse: remove use of `git_config_file` · 6e6da75f
      The config parser code needs to keep track of the current parsed
      file's name so that we are able to provide proper error messages
      to the user. Right now, we do that by storing a `git_config_file`
      in the parser structure, but as that is a specific backend and
      the parser aims to be generic, it is a layering violation.
      
      Switch over to use a simple string to fix that.
      Patrick Steinhardt committed
    • config_file: embed file in diskfile parse data · 54d350e0
      The config file code needs to keep track of the actual
      `git_config_file` structure, as it not only contains the path
      of the current configuration file, but it also keeps tracks of
      all includes of that file. Right now, we keep track of that
      structure via the `git_config_parser`, but as that's supposed to
      be a backend generic implementation of configuration parsing it's
      a layering violation to have it in there.
      
      Switch over the config file backend to use its own config file
      structure that's embedded in the backend parse data. This allows
      us to switch over the generic config parser to avoid using the
      `git_config_file` structure.
      Patrick Steinhardt committed
    • config_parse: rename `data` parameter to `payload` for clarity · 76749dfb
      By convention, parameters that get passed to callbacks are
      usually named `payload` in our codebase. Rename the `data`
      parameters in the configuration parser callbacks to `payload` to
      avoid confusion.
      Patrick Steinhardt committed
    • config_file: avoid re-reading files on write · 2ba7020f
      When we rewrite the configuration file due to any of its values
      being modified, we call `config_refresh` to update the in-memory
      representation of our config file backend. This is needlessly
      wasteful though, as `config_refresh` will always open the on-disk
      representation to reads the file contents while we already know
      the complete file contents at this point in time as we have just
      written it to disk.
      
      Implement a new function `config_refresh_from_buffer` that will
      refresh the backend's config entries from a buffer instead of
      from the config file itself. Note that this will thus _not_
      update the backend's timestamp, which will cause us to re-read
      the buffer when performing a read operation on it. But this is
      still an improvement as we now lazily re-read the contents, and
      most importantly we will avoid constantly re-reading the contents
      if we perform multiple write operations.
      
      The following strace demonstrates this if we're re-writing a key
      multiple times. It uses our config example with `config_set`
      changed to update the file 10 times with different keys:
      
      	$ strace lg2 config x.x z |& grep '^open.*config'
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      
      And now with the optimization of `config_refresh_from_buffer`:
      
      	$ strace lg2 config x.x z |& grep '^open.*config'
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      
      As can be seen, this is quite a lot of `open` calls less.
      Patrick Steinhardt committed
    • config_file: split out function that sets config entries · a0dc3027
      Updating a config file backend's config entries is a bit more
      involved, as it requires clearing of the old config entries as
      well as handling locking correctly. As we will need this
      functionality in a future patch to refresh config entries from a
      buffer, let's extract this into its own function
      `config_set_entries`.
      Patrick Steinhardt committed
    • config_file: split out function that reads entries from a buffer · 985f5cdf
      The `config_read` function currently performs both reading the
      on-disk config file as well as parsing the retrieved buffer
      contents. To optimize how we refresh our config entries from an
      in-memory buffer, we need to be able to directly parse buffers,
      though, without involving any on-disk files at all.
      
      Extract a new function `config_read_buffer` that sets up the
      parsing logic and then parses config entries from a buffer, only.
      Have `config_read` use it to avoid duplicated logic.
      Patrick Steinhardt committed
    • config_file: move refresh into `write` function · 3e1c137a
      We are quite lazy in how we refresh our config file backend when
      updating any of its keys: instead of just updating our in-memory
      representation of the keys, we just discard the old set of keys
      and then re-read the config file contents from disk. This refresh
      currently happens separately at every callsite of `config_write`,
      but it is clear that we _always_ want to refresh if we have
      written the config file to disk. If we didn't, then we'd run
      around with an outdated config file backend that does not
      represent what we have on disk.
      
      By moving the refresh into `config_write`, we are also able to
      optimize the case where the config file is currently locked.
      Before, we would've tried to re-read the file even if we have
      only updated its cached contents without touching the on-disk
      file. Thus we'd have unnecessarily stat'd the file, even though
      we know that it shouldn't have been modified in the meantime due
      to its lock.
      Patrick Steinhardt committed
    • config_file: implement stat cache to avoid repeated rehashing · d7f58eab
      To decide whether a config file has changed, we always hash its
      complete contents. This is unnecessarily expensive, as
      well-behaved filesystems will always update stat information for
      files which have changed. So before computing the hash, we should
      first check whether the stat info has actually changed for either
      the configuration file or any of its includes. This avoids having
      to re-read the configuration file and its includes every time
      when we check whether it's been modified.
      
      Tracing the for-each-ref example previous to this commit, one can
      see that we repeatedly re-open both the repo configuration as
      well as the global configuration:
      
      	$ strace lg2 for-each-ref |& grep config
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	access("/home/pks/.config/git/config", F_OK) = 0
      	access("/etc/gitconfig", F_OK)          = -1 ENOENT (No such file or directory)
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	access("/tmp/repo/.git/config", F_OK)   = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory)
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	access("/home/pks/.config/git/config", F_OK) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      
      With the change, we only do stats for those files and open them a
      single time, only:
      
      	$ strace lg2 for-each-ref |& grep config
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	access("/home/pks/.config/git/config", F_OK) = 0
      	access("/etc/gitconfig", F_OK)          = -1 ENOENT (No such file or directory)
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	access("/tmp/repo/.git/config", F_OK)   = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory)
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	access("/home/pks/.config/git/config", F_OK) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      
      The following benchmark has been performed with and without the
      stat cache in a best-of-ten run:
      
      ```
      
      int lg2_repro(git_repository *repo, int argc, char **argv)
      {
      	git_config *cfg;
      	int32_t dummy;
      	int i;
      
      	UNUSED(argc);
      	UNUSED(argv);
      
      	check_lg2(git_repository_config(&cfg, repo),
      			"Could not obtain config", NULL);
      
      	for (i = 1; i < 100000; ++i)
      		git_config_get_int32(&dummy, cfg, "foo.bar");
      
      	git_config_free(cfg);
      	return 0;
      }
      ```
      
      Without stat cache:
      
      	$ time lg2 repro
      	real    0m1.528s
      	user    0m0.568s
      	sys     0m0.944s
      
      With stat cache:
      
      	$ time lg2 repro
      	real    0m0.526s
      	user    0m0.268s
      	sys     0m0.258s
      
      This benchmark shows a nearly three-fold performance improvement.
      
      This change requires that we check our configuration stress tests
      as we're now in fact becoming more racy. If somebody is writing a
      configuration file at nearly the same time (there is a window of
      100ns on Windows-based systems), then it might be that we realize
      that this file has actually changed and thus may not re-read it.
      This will only happen if either an external process is rewriting
      the configuration file or if the same process has multiple
      `git_config` structures pointing to the same time, where one of
      both is being used to write and the other one is used to read
      values.
      Patrick Steinhardt committed
  7. 05 Jul, 2019 2 commits
    • w32_stack: convert buffer length param to `size_t` · 2f14c4fc
      In both `git_win32__stack_format` and `git_win32__stack`, we handle
      buffer lengths via an integer variable. As we only ever pass buffer
      sizes to it, this should be a `size_t` though to avoid loss of
      precision. As we also use it to compare with other `size_t` variables,
      this also silences signed/unsigned comparison warnings.
      Patrick Steinhardt committed
    • patch_parse: do not depend on parsed buffer's lifetime · dedf70ad
      When parsing a patch from a buffer, we let the patch lines point into
      the original buffer. While this is efficient use of resources, this also
      ties the lifetime of the parsed patch to the parsed buffer. As this
      behaviour is not documented anywhere in our API it is very surprising to
      its users.
      
      Untie the lifetime by duplicating the lines into the parsed patch. Add a
      test that verifies that lifetimes are indeed independent of each other.
      Patrick Steinhardt committed
  8. 04 Jul, 2019 2 commits
    • attr_file: completely initialize attribute sessions · 1bbec26d
      The function `git_attr_session__init` is currently only initializing
      setting up the attribute's session key by incrementing the repo-global
      key by one. Most notably, all other members of the `git_attr_session`
      struct are not getting initialized at all. So if one is to allocate a
      session on the stack and then calls `git_attr_session__init`, the
      session will still not be fully initialized. We have fared just fine
      with that until now as all users of the function have allocated the
      session structure as part of bigger structs with `calloc`, and thus its
      contents have been zero-initialized implicitly already.
      
      Fix this by explicitly zeroing out the session to enable allocation of
      sessions on the stack.
      Patrick Steinhardt committed
    • attr: Don't fail in attr_setup if there exists a system attributes file · 18a6d9f3
      Regression introduced in commit 5452e49f on PR #4967.
      
      Signed-off-by: Sven Strickroth <email@cs-ware.de>
      Sven Strickroth committed