Commit b7dcea04 by Patrick Steinhardt

config_entries: micro-optimize storage of multivars

Multivars are configuration entries that have many values for the same
name; we can thus micro-optimize this case by just retaining the name of
the first configuration entry and freeing all the others, letting them
point to the string of the first entry.
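
The idea can be sketched outside of libgit2 as follows. This is a minimal illustration only: `struct entry`, `struct node`, `struct entry_list`, `find_by_name`, `list_append` and `list_free` are hypothetical names, and a linear search stands in for the string map the real code uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct entry {
	char *name;  /* owned only by the first entry with this name */
	char *value; /* always owned by the entry itself */
};

struct node {
	struct entry *entry;
	struct node *next;
	int first; /* non-zero if this node owns entry->name */
};

struct entry_list {
	struct node *head, *tail;
};

static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (copy != NULL)
		memcpy(copy, s, len);
	return copy;
}

/* Build a heap-allocated entry with its own copies of both strings. */
static struct entry *make_entry(const char *name, const char *value)
{
	struct entry *e = calloc(1, sizeof(*e));
	if (e == NULL)
		return NULL;
	e->name = dup_string(name);
	e->value = dup_string(value);
	if (e->name == NULL || e->value == NULL) {
		free(e->name);
		free(e->value);
		free(e);
		return NULL;
	}
	return e;
}

/* Linear lookup of the first node with the given name. */
static struct node *find_by_name(struct entry_list *list, const char *name)
{
	struct node *n;
	for (n = list->head; n != NULL; n = n->next)
		if (strcmp(n->entry->name, name) == 0)
			return n;
	return NULL;
}

/* Append an entry; a repeated name is freed and replaced by the first copy. */
static int list_append(struct entry_list *list, struct entry *entry)
{
	struct node *existing, *node = calloc(1, sizeof(*node));
	if (node == NULL)
		return -1;
	node->entry = entry;

	if ((existing = find_by_name(list, entry->name)) != NULL) {
		free(entry->name);
		entry->name = existing->entry->name;
	} else {
		node->first = 1;
	}

	if (list->tail)
		list->tail->next = node;
	else
		list->head = node;
	list->tail = node;
	return 0;
}

/* Free all nodes; only "first" nodes free the shared name. */
static void list_free(struct entry_list *list)
{
	struct node *n = list->head, *next;
	while (n != NULL) {
		next = n->next;
		if (n->first)
			free(n->entry->name);
		free(n->entry->value);
		free(n->entry);
		free(n);
		n = next;
	}
	list->head = list->tail = NULL;
}
```

Because the owning node always precedes the nodes that borrow its name, walking the list front to back frees each name exactly once, and the borrowed pointers are never dereferenced afterwards.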

The attached test case is an extreme example that demonstrates this. It
contains a section name that is approximately 500kB in size with 20,000
entries "a=b". Without the optimization, this would require at least
20000*500kB, which is around 10GB. With this patch, it only requires
500kB+20000*1B=520kB.
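
As a rough check of these figures, the two footprints can be computed as below. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it ignores NUL terminators, allocator overhead and the size of the entry structures themselves.

```c
#include <stdint.h>

/* Every entry keeps its own copy of the full name plus its value. */
static uint64_t bytes_without_sharing(uint64_t entries, uint64_t name_len,
				      uint64_t value_len)
{
	return entries * (name_len + value_len);
}

/* The name is stored once; only the values are stored per entry. */
static uint64_t bytes_with_sharing(uint64_t entries, uint64_t name_len,
				   uint64_t value_len)
{
	return name_len + entries * value_len;
}
```

With 20,000 entries, a 500,000-byte name and 1-byte values, the first function yields roughly 10GB and the second 520,000 bytes.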

The obvious culprit here is the section header, which we repeatedly
include in each of the configuration entry's names. This makes it very
easy for an adversary to provide a small configuration file that
disproportionately blows up in memory during processing and is thus a
feasible vector for a denial-of-service attack. Unfortunately, we cannot
fix the root cause, e.g. by having a separate "section" field that could
easily be deduplicated, because the `git_config_entry` structure is
part of our public API. So this micro-optimization is the best we can do
for now.
parent 62320860
@@ -108,7 +108,8 @@ static void config_entries_free(git_config_entries *entries)
 	list = entries->list;
 	while (list != NULL) {
 		next = list->next;
-		git__free((char *) list->entry->name);
+		if (list->first)
+			git__free((char *) list->entry->name);
 		git__free((char *) list->entry->value);
 		git__free(list->entry);
 		git__free(list);
@@ -126,12 +127,24 @@ void git_config_entries_free(git_config_entries *entries)
 int git_config_entries_append(git_config_entries *entries, git_config_entry *entry)
 {
-	config_entry_list *head;
+	config_entry_list *existing, *head;
 	head = git__calloc(1, sizeof(config_entry_list));
 	GIT_ERROR_CHECK_ALLOC(head);
 	head->entry = entry;
-	head->first = (git_strmap_get(entries->map, entry->name) == NULL);
+	/*
+	 * This is a micro-optimization for configuration files
+	 * with a lot of same keys. As for multivars the entry's
+	 * key will be the same for all entries, we can just free
+	 * all except the first entry's name and just re-use it.
+	 */
+	if ((existing = git_strmap_get(entries->map, entry->name)) != NULL) {
+		git__free((char *) entry->name);
+		entry->name = existing->entry->name;
+	} else {
+		head->first = 1;
+	}
 	if (entries->list)
 		entries->list->last->next = head;
@@ -176,3 +176,23 @@ void test_config_stress__foreach_refreshes_snapshot(void)
 	git_config_free(config);
 	git__free(value);
 }
+void test_config_stress__huge_section_with_many_values(void)
+{
+	git_config *config;
+	/*
+	 * The config file is structured in such a way that it
+	 * has a section header that is approximately 500kb of
+	 * size followed by 40k entries. While the resulting
+	 * configuration file itself is roughly 650kb in size and
+	 * thus considered to be rather small, in the past we'd
+	 * balloon to more than 20GB of memory (40000x500kb)
+	 * while parsing the file. It thus was a trivial way to
+	 * cause an out-of-memory situation and thus cause denial
+	 * of service, e.g. via gitmodules.
+	 */
+	cl_git_pass(git_config_open_ondisk(&config, cl_fixture("config/config-oom")));
+	git_config_free(config);
+}
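
The fixture itself is not shown here. As a rough illustration of the shape the test comment describes, a file with one very long section header followed by many "a=b" lines could be built as sketched below. The function name `build_huge_config`, the use of 'a' as the repeated section-name character and the exact byte layout are assumptions for illustration, not the contents of the real `config-oom` fixture.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill `buf` with "[aaa...a]\n" followed by `entries` lines of "a=b\n".
 * Always returns the number of bytes required; the buffer is only
 * written when it is non-NULL and `cap` is large enough. No NUL
 * terminator is appended.
 */
static size_t build_huge_config(char *buf, size_t cap, size_t section_len,
				size_t entries)
{
	size_t need = section_len + 3 + entries * 4;
	size_t i;
	char *p = buf;

	if (buf == NULL || cap < need)
		return need;

	*p++ = '[';
	memset(p, 'a', section_len);
	p += section_len;
	*p++ = ']';
	*p++ = '\n';

	for (i = 0; i < entries; i++) {
		memcpy(p, "a=b\n", 4);
		p += 4;
	}
	return need;
}
```

Calling it with a NULL buffer first gives the size to allocate. For a 500,000-byte section name and 40,000 entries that is 660,003 bytes, in line with the "roughly 650kb" mentioned in the test comment.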