stash: avoid recomputing tree when committing worktree

When creating a new stash, we need to create there separate commits storing differences stored in the index, untracked changes as well as differences in the working directory. The first two will only be done conditionally if the equivalent options "git stash --keep-index --include-untracked" are being passed to `git_stash_save`, but even when only creating a stash of worktree changes we're much slower than git.git. Using our new stash example: $ time git stash Saved working directory and index state WIP on (no branch): 2f7d9d47575e Linux 5.1.7 real 0m0.528s user 0m0.309s sys 0m0.381s $ time lg2 stash real 0m27.165s user 0m13.645s sys 0m6.403s As can be seen, libgit2 is more than 50x slower than git.git! When creating the stash commit that includes all worktree changes, we create a completely new index to prepare for the new commit and populate it with the entries contained in the index' tree. Here comes the catch: by populating the index with a tree's contents, we do not have any stat caches in the index. This means that we have to re-validate every single file from the worktree and see whether it has changed. The issue can be fixed by populating the new index with the repo's existing index instead of with the tree. This retains all stat cache information, and thus we really only need to check files that have changed stat information. This is semantically equivalent to what we previously did: previously, we used the tree of the commit computed from the index. Now we're just using the index directly. And, in fact, the cache is doing wonders: time lg2 stash real 0m1.836s user 0m1.166s sys 0m0.663s We're now performing 15x faster than before and are only 3x slower than git.git now.

stash: avoid recomputing tree when committing worktree
When creating a new stash, we need to create there separate commits storing differences stored in the index, untracked changes as well as differences in the working directory. The first two will only be done conditionally if the equivalent options "git stash --keep-index --include-untracked" are being passed to `git_stash_save`, but even when only creating a stash of worktree changes we're much slower than git.git. Using our new stash example: $ time git stash Saved working directory and index state WIP on (no branch): 2f7d9d47575e Linux 5.1.7 real 0m0.528s user 0m0.309s sys 0m0.381s $ time lg2 stash real 0m27.165s user 0m13.645s sys 0m6.403s As can be seen, libgit2 is more than 50x slower than git.git! When creating the stash commit that includes all worktree changes, we create a completely new index to prepare for the new commit and populate it with the entries contained in the index' tree. Here comes the catch: by populating the index with a tree's contents, we do not have any stat caches in the index. This means that we have to re-validate every single file from the worktree and see whether it has changed. The issue can be fixed by populating the new index with the repo's existing index instead of with the tree. This retains all stat cache information, and thus we really only need to check files that have changed stat information. This is semantically equivalent to what we previously did: previously, we used the tree of the commit computed from the index. Now we're just using the index directly. And, in fact, the cache is doing wonders: time lg2 stash real 0m1.836s user 0m1.166s sys 0m0.663s We're now performing 15x faster than before and are only 3x slower than git.git now.
a7d32d60 · Patrick Steinhardt · 88731e3c · a7d32d60
Commit a7d32d60 authored Jul 20, 2019 by Patrick Steinhardt
Hide whitespace changes
Inline Side-by-side

Showing with 9 additions and 14 deletions

src/stash.c
+9 -14

No files found.
--- a/src/stash.c
+++ b/src/stash.c
@@ -398,28 +398,23 @@ static int commit_worktree(
 	git_commit *b_commit,
 	git_commit *u_commit)
 {
-	int error = 0;
+	const git_commit *parents[] = {	NULL, NULL, NULL };
-	git_tree *w_tree = NULL, *i_tree = NULL;
+	git_index *i_index = NULL, *r_index = NULL;
-	git_index *i_index = NULL;
+	git_tree *w_tree = NULL;
-	const git_commit *parents[] = {	NULL, NULL,	NULL };
+	int error = 0, ignorecase;
-	int ignorecase;
 	parents[0] = b_commit;
 	parents[1] = i_commit;
 	parents[2] = u_commit;
-	if ((error = git_commit_tree(&i_tree, i_commit)) < 0)
+	if ((error = git_repository_index(&r_index, repo) < 0) ||
-		goto cleanup;
+	    (error = git_index_new(&i_index)) < 0 ||
+	    (error = git_index__fill(i_index, &r_index->entries) < 0) ||
-	if ((error = git_index_new(&i_index)) < 0 ||
+	    (error = git_repository__configmap_lookup(&ignorecase, repo, GIT_CONFIGMAP_IGNORECASE)) < 0)
-		(error = git_repository__configmap_lookup(&ignorecase, repo, GIT_CONFIGMAP_IGNORECASE)) < 0)
 		goto cleanup;
 	git_index__set_ignore_case(i_index, ignorecase);
-	if ((error = git_index_read_tree(i_index, i_tree)) < 0)
-		goto cleanup;
 	if ((error = build_workdir_tree(&w_tree, repo, i_index, b_commit)) < 0)
 		goto cleanup;
@@ -436,9 +431,9 @@ static int commit_worktree(
 		parents);
 cleanup:
-	git_tree_free(i_tree);
 	git_tree_free(w_tree);
 	git_index_free(i_index);
+	git_index_free(r_index);
 	return error;
 }