cache: evict items more efficiently

When our object cache is full, we pick eight items (or the whole cache,
if there are fewer) and evict them. For small cache sizes, this is fine,
but when we're dealing with a large number of objects, we can repeatedly
exhaust the cache and spend a large amount of time in git_oidmap_iterate
trying to find items to evict.

Instead, let's assume that if the cache gets full, we have a large
number of objects that we're handling, and be more aggressive about
evicting items. Let's remove one item for every 2048 items, but not less
than 8. This causes us to scale our evictions in proportion to the size
of the cache and significantly reduces the time we spend in
git_oidmap_iterate.

Before this change, a full pack of all the non-blob objects in the Linux
repository took in excess of 30 minutes and spent 62.3% of total runtime
in odb_read_1 and its children, and 44.3% of the time in
git_oidmap_iterate. With this change, the same operation now takes 14
minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total
time, whereas git_oidmap_iterate consists of 6.2%.

Note that we do spend a little more time inflating objects and a decent
amount more time in memcmp. However, overall, the time taken is
significantly improved, and time in pack building is now dominated by
git_delta_create_from_index (33.7%), which is what we would expect.
parent f92d495d
......@@ -115,9 +115,12 @@ void git_cache_dispose(git_cache *cache)
/* Called with lock */
static void cache_evict_entries(git_cache *cache)
{
size_t evict_count = 8, i;
size_t evict_count = git_cache_size(cache) / 2048, i;
ssize_t evicted_memory = 0;
if (evict_count < 8)
evict_count = 8;
/* do not infinite loop if there's not enough entries to evict */
if (evict_count > git_cache_size(cache)) {
clear_cache(cache);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment