1. 12 Jul, 2019 2 commits
  2. 11 Jul, 2019 16 commits
    • Merge pull request #5134 from pks-t/pks/config-parser-separation · a6ad9e8a
      Config parser separation
      Patrick Steinhardt committed
    • config_parse: provide parser init and dispose functions · dbeadf8a
      Right now, all configuration file backends are expected to
      directly mess with the configuration parser's internals in order
      to set it up. Let's avoid doing that by implementing both a
      `git_config_parser_init` and `git_config_parser_dispose` function
      to clearly define the interface between configuration backends
      and the parser.
      
      Ideally, we would make the `git_config_parser` structure
      definition private to its implementation. But as that would
      require an additional memory allocation that was not required
      before we just live with it being visible to others.
      Patrick Steinhardt committed
    • config_file: refactor error handling in `config_write` · 32157526
      Error handling in `config_write` is rather convoluted and does
      not match our current code style. Refactor it to make it easier
      to understand.
      Patrick Steinhardt committed
    • config_file: internalize `git_config_file` struct · 820fa1a3
      With the previous commits, we have finally separated the config
      parsing logic from the specific configuration file backend. Due
      to that, we can now move the `git_config_file` structure into the
      config file backend's implementation so that no other code may
      accidentally start using it again. Furthermore, we rename the
      structure to `diskfile` to make it obvious that it is internal,
      only, and to unify it with naming scheme of the other diskfile
      structures.
      Patrick Steinhardt committed
    • config_parse: remove use of `git_config_file` · 6e6da75f
      The config parser code needs to keep track of the current parsed
      file's name so that we are able to provide proper error messages
      to the user. Right now, we do that by storing a `git_config_file`
      in the parser structure, but as that is a specific backend and
      the parser aims to be generic, it is a layering violation.
      
      Switch over to use a simple string to fix that.
      Patrick Steinhardt committed
    • config_file: embed file in diskfile parse data · 54d350e0
      The config file code needs to keep track of the actual
      `git_config_file` structure, as it not only contains the path
      of the current configuration file, but it also keeps tracks of
      all includes of that file. Right now, we keep track of that
      structure via the `git_config_parser`, but as that's supposed to
      be a backend generic implementation of configuration parsing it's
      a layering violation to have it in there.
      
      Switch over the config file backend to use its own config file
      structure that's embedded in the backend parse data. This allows
      us to switch over the generic config parser to avoid using the
      `git_config_file` structure.
      Patrick Steinhardt committed
    • config_parse: rename `data` parameter to `payload` for clarity · 76749dfb
      By convention, parameters that get passed to callbacks are
      usually named `payload` in our codebase. Rename the `data`
      parameters in the configuration parser callbacks to `payload` to
      avoid confusion.
      Patrick Steinhardt committed
    • Merge pull request #5132 from pks-t/pks/config-stat-cache · ba9725a2
      config_file: implement stat cache to avoid repeated rehashing
      Patrick Steinhardt committed
    • config_file: avoid re-reading files on write · 2ba7020f
      When we rewrite the configuration file due to any of its values
      being modified, we call `config_refresh` to update the in-memory
      representation of our config file backend. This is needlessly
      wasteful though, as `config_refresh` will always open the on-disk
      representation to reads the file contents while we already know
      the complete file contents at this point in time as we have just
      written it to disk.
      
      Implement a new function `config_refresh_from_buffer` that will
      refresh the backend's config entries from a buffer instead of
      from the config file itself. Note that this will thus _not_
      update the backend's timestamp, which will cause us to re-read
      the buffer when performing a read operation on it. But this is
      still an improvement as we now lazily re-read the contents, and
      most importantly we will avoid constantly re-reading the contents
      if we perform multiple write operations.
      
      The following strace demonstrates this if we're re-writing a key
      multiple times. It uses our config example with `config_set`
      changed to update the file 10 times with different keys:
      
      	$ strace lg2 config x.x z |& grep '^open.*config'
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      
      And now with the optimization of `config_refresh_from_buffer`:
      
      	$ strace lg2 config x.x z |& grep '^open.*config'
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      	open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
      
      As can be seen, this is quite a lot of `open` calls less.
      Patrick Steinhardt committed
    • config_file: split out function that sets config entries · a0dc3027
      Updating a config file backend's config entries is a bit more
      involved, as it requires clearing of the old config entries as
      well as handling locking correctly. As we will need this
      functionality in a future patch to refresh config entries from a
      buffer, let's extract this into its own function
      `config_set_entries`.
      Patrick Steinhardt committed
    • config_file: split out function that reads entries from a buffer · 985f5cdf
      The `config_read` function currently performs both reading the
      on-disk config file as well as parsing the retrieved buffer
      contents. To optimize how we refresh our config entries from an
      in-memory buffer, we need to be able to directly parse buffers,
      though, without involving any on-disk files at all.
      
      Extract a new function `config_read_buffer` that sets up the
      parsing logic and then parses config entries from a buffer, only.
      Have `config_read` use it to avoid duplicated logic.
      Patrick Steinhardt committed
    • config_file: move refresh into `write` function · 3e1c137a
      We are quite lazy in how we refresh our config file backend when
      updating any of its keys: instead of just updating our in-memory
      representation of the keys, we just discard the old set of keys
      and then re-read the config file contents from disk. This refresh
      currently happens separately at every callsite of `config_write`,
      but it is clear that we _always_ want to refresh if we have
      written the config file to disk. If we didn't, then we'd run
      around with an outdated config file backend that does not
      represent what we have on disk.
      
      By moving the refresh into `config_write`, we are also able to
      optimize the case where the config file is currently locked.
      Before, we would've tried to re-read the file even if we have
      only updated its cached contents without touching the on-disk
      file. Thus we'd have unnecessarily stat'd the file, even though
      we know that it shouldn't have been modified in the meantime due
      to its lock.
      Patrick Steinhardt committed
    • config_file: implement stat cache to avoid repeated rehashing · d7f58eab
      To decide whether a config file has changed, we always hash its
      complete contents. This is unnecessarily expensive, as
      well-behaved filesystems will always update stat information for
      files which have changed. So before computing the hash, we should
      first check whether the stat info has actually changed for either
      the configuration file or any of its includes. This avoids having
      to re-read the configuration file and its includes every time
      when we check whether it's been modified.
      
      Tracing the for-each-ref example previous to this commit, one can
      see that we repeatedly re-open both the repo configuration as
      well as the global configuration:
      
      	$ strace lg2 for-each-ref |& grep config
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	access("/home/pks/.config/git/config", F_OK) = 0
      	access("/etc/gitconfig", F_OK)          = -1 ENOENT (No such file or directory)
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	access("/tmp/repo/.git/config", F_OK)   = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory)
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	access("/home/pks/.config/git/config", F_OK) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      
      With the change, we only do stats for those files and open them a
      single time, only:
      
      	$ strace lg2 for-each-ref |& grep config
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	access("/home/pks/.config/git/config", F_OK) = 0
      	access("/etc/gitconfig", F_OK)          = -1 ENOENT (No such file or directory)
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	access("/tmp/repo/.git/config", F_OK)   = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory)
      	access("/home/pks/.gitconfig", F_OK)    = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	access("/home/pks/.config/git/config", F_OK) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      	stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
      	stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
      	stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
      
      The following benchmark has been performed with and without the
      stat cache in a best-of-ten run:
      
      ```
      
      int lg2_repro(git_repository *repo, int argc, char **argv)
      {
      	git_config *cfg;
      	int32_t dummy;
      	int i;
      
      	UNUSED(argc);
      	UNUSED(argv);
      
      	check_lg2(git_repository_config(&cfg, repo),
      			"Could not obtain config", NULL);
      
      	for (i = 1; i < 100000; ++i)
      		git_config_get_int32(&dummy, cfg, "foo.bar");
      
      	git_config_free(cfg);
      	return 0;
      }
      ```
      
      Without stat cache:
      
      	$ time lg2 repro
      	real    0m1.528s
      	user    0m0.568s
      	sys     0m0.944s
      
      With stat cache:
      
      	$ time lg2 repro
      	real    0m0.526s
      	user    0m0.268s
      	sys     0m0.258s
      
      This benchmark shows a nearly three-fold performance improvement.
      
      This change requires that we check our configuration stress tests
      as we're now in fact becoming more racy. If somebody is writing a
      configuration file at nearly the same time (there is a window of
      100ns on Windows-based systems), then it might be that we realize
      that this file has actually changed and thus may not re-read it.
      This will only happen if either an external process is rewriting
      the configuration file or if the same process has multiple
      `git_config` structures pointing to the same time, where one of
      both is being used to write and the other one is used to read
      values.
      Patrick Steinhardt committed
    • examples: implement config example · 8ee3d39a
      Implement a new example that resembles git-config(1). Right now,
      this example can both read and set configuration keys, only.
      Patrick Steinhardt committed
    • cmake: report whether we are using sub-second stat information · df54c7fb
      Depending on the platform and on build options, we may or may not
      build libgit2 with support for nanoseconds when using `stat`
      calls. It's currently unclear though whether sub-second stat
      information is used at all.
      
      Add feature info for this to tell at configure time whether it's
      being used or not.
      Patrick Steinhardt committed
  3. 05 Jul, 2019 14 commits
  4. 04 Jul, 2019 4 commits
  5. 27 Jun, 2019 4 commits