- 31 Jul, 2013 1 commit
-
-
After further profiling, I found that a lot of time was being spent inserting hashes into the file hash signature when using the rolling hash, because the rolling hash approach generates a hash per byte of the file instead of one per run/line of data. To optimize this, I converted back to a run-based file signature algorithm, which is more like core Git. After this change, a number of the existing tests started to fail. In some cases, the test appears to have been coded too specifically to the particular results of the file similarity metric; in others, there appear to have been bugs in the core rename detection code, where the expected results were being generated only by coincidence of the file similarity scoring. This commit also renames all the variables in the core rename detection code to be more consistent and hopefully easier to follow, which made it a bit easier to reason about the behavior of that code and fix the problems I was seeing. I think it's in better shape now. There are now a couple of tests that attempt to stress test the rename detection code, and they are quite slow. Most of the time is spent setting up the test data on disk and in the index. When we roll out performance improvements for index insertion, these tests should hopefully speed up as well.
Russell Belfer committed
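To illustrate the difference the message above describes, here is a minimal sketch of a run-based signature that emits one hash per newline-terminated run, next to a count of how many hashes a per-byte rolling window would emit. The names `run_hash`, `line_signature`, and `rolling_hash_count` are hypothetical, and FNV-1a is only a stand-in; this is not the hash or the code used in libgit2 or core Git.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over one run of bytes. A stand-in hash chosen for brevity,
 * not the function the real rename detection code uses. */
uint32_t run_hash(const char *p, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)p[i];
		h *= 16777619u;
	}
	return h;
}

/* Split buf into newline-terminated runs and store one hash per run
 * in out (at most cap entries). Returns the number of runs found. */
size_t line_signature(const char *buf, size_t len, uint32_t *out, size_t cap)
{
	size_t start = 0, i, n = 0;

	for (i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			if (n < cap)
				out[n] = run_hash(buf + start, i + 1 - start);
			n++;
			start = i + 1;
		}
	}
	if (start < len) { /* trailing run without a final newline */
		if (n < cap)
			out[n] = run_hash(buf + start, len - start);
		n++;
	}
	return n;
}

/* Number of hashes a rolling window of the given width emits when it
 * advances one byte at a time (assumed behaviour, for comparison). */
size_t rolling_hash_count(size_t len, size_t window)
{
	return len < window ? 0 : len - window + 1;
}
```

For a 17-byte buffer holding three lines, the run-based signature stores three hashes, while a 4-byte rolling window would produce fourteen insertions into the signature.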
-
- 26 Jul, 2013 1 commit
-
-
Russell Belfer committed
-
- 25 Jul, 2013 3 commits
-
-
The size data in the index may not reflect the actual size of the blob data from the ODB when content filtering comes into play. This commit fixes rename detection to use the actual blob size when calculating data signatures instead of the value from the index. Because of a misunderstanding on my part, I first converted the git_index_add_bypath API to use the post-filtered blob data size in creating the index entry. I backed that change out, but I kept the overall refactoring of that routine and the new internal git_blob__create_from_paths API because it eliminates an extra stat() call from the code that adds a file to the index. The existing tests actually cover this code path, at least when running on Windows, so at this point I'm not adding new tests to cover the changes.
Russell Belfer committed -
The previous fix for checking file sizes with rename detection always loads the blob. In this version, if the odb backend can get the object header without loading the whole thing into memory, then we'll just use that, so that we can eliminate possible rename sources & targets without loading them.
Russell Belfer committed -
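A rough sketch of the idea in the message above: ask the backend for the object size through a header-only read when it offers one, fall back to a full load otherwise, and use the sizes to discard candidate pairs before any content is loaded. Everything here (`demo_backend`, `object_size`, `sizes_rule_out`, and the size-ratio rule itself) is a hypothetical illustration, not libgit2's actual ODB interface or its real elimination criterion.

```c
#include <stddef.h>

typedef struct demo_backend demo_backend;

/* Hypothetical object store. read_header is optional and may be NULL. */
struct demo_backend {
	int (*read_header)(demo_backend *b, int id, size_t *size);
	int (*read_full)(demo_backend *b, int id, size_t *size);
	const size_t *sizes;  /* fake store: object size indexed by id */
	unsigned full_loads;  /* how many times content was fully loaded */
};

/* Header-only read: reports the size without touching the content. */
int demo_read_header(demo_backend *b, int id, size_t *size)
{
	*size = b->sizes[id];
	return 0;
}

/* Full read: stands in for loading the whole object into memory. */
int demo_read_full(demo_backend *b, int id, size_t *size)
{
	b->full_loads++;
	*size = b->sizes[id];
	return 0;
}

/* Prefer the cheap header read; fall back to a full load. */
int object_size(demo_backend *b, int id, size_t *size)
{
	if (b->read_header != NULL)
		return b->read_header(b, id, size);
	return b->read_full(b, id, size);
}

/* Assumed rule: if the smaller object is less than threshold_pct percent
 * of the larger one, treat the pair as unable to reach the threshold. */
int sizes_rule_out(size_t a, size_t b, unsigned threshold_pct)
{
	size_t small = a < b ? a : b;
	size_t large = a < b ? b : a;

	if (large == 0)
		return 0;
	return (double)small / (double)large * 100.0 < (double)threshold_pct;
}
```

With a backend that supports header reads, both sizes are obtained with `full_loads` still at zero, so a pair such as 1000 bytes versus 100 bytes can be dropped at a 50 percent threshold without loading either object.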
The performance improvements I introduced for rename detection did not take effect for tree-to-tree diffs because the blob size was not known early enough, so the file signature always had to be calculated anyway. This change separates loading blobs into memory from calculating the signature. I can't avoid having to load the large blobs into memory, but by doing the load earlier, I'm able to skip the signature calculation if the blob won't come into play for renames.
Russell Belfer committed
-
- 24 Jul, 2013 6 commits
-
-
Russell Belfer committed
-
Before the optimization commits, this test took about 20 seconds to run on my machine. Afterwards, there are still a couple of seconds of data setup, but the actual diff and rename detection run in a fraction of a second.
Russell Belfer committed -
Russell Belfer committed
-
Russell Belfer committed
-
Russell Belfer committed
-
Russell Belfer committed
-
- 23 Jul, 2013 4 commits
-
-
Doc fixes
Russell Belfer committed -
The description of what the function does hasn't been true for quite a while. Change it to reflect the way it currently works. While here, remove an even older comment about missing features that have been implemented.
Carlos Martín Nieto committed -
clang's docparser highlighted these.
Carlos Martín Nieto committed -
Invalid refs on disk cause revwalk globbing to fail
Vicent Martí committed
-
- 22 Jul, 2013 4 commits
-
-
The new tests don't always want to use the same fixture data as the old ones so this makes it configurable on a per-test basis.
Russell Belfer committed -
Russell Belfer committed
-
Instead of using lots of strdup calls, this adds a memory pool to the loose refs iteration code and uses it for keeping track of the loose refs array. Memory usage could probably be reduced even further by eliminating the vector and just scanning by adding the strlen of each ref, but that would be a more intrusive change. This also updates the error handling to be more thorough about checking for failed allocations, etc.
Russell Belfer committed -
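As a sketch of the strdup-versus-pool trade-off mentioned above: a pool copies many short strings into a few large pages and frees them all at once, instead of making one heap allocation per string. The `demo_pool` type and its functions are hypothetical and much simpler than libgit2's internal pool; they only illustrate the technique, including the check for failed allocations.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* One page of string storage; pages form a singly linked list. */
typedef struct demo_page {
	struct demo_page *next;
	size_t used;
	size_t cap;
	char data[]; /* C99 flexible array member */
} demo_page;

typedef struct {
	demo_page *head; /* most recently allocated page */
} demo_pool;

/* Copy str into the pool. Returns the pooled copy, or NULL if a new
 * page was needed and the allocation failed. */
char *demo_pool_strdup(demo_pool *pool, const char *str)
{
	size_t len = strlen(str) + 1; /* include the terminating NUL */
	demo_page *page = pool->head;

	if (page == NULL || page->cap - page->used < len) {
		size_t cap = len > DEMO_PAGE_SIZE ? len : DEMO_PAGE_SIZE;

		page = malloc(sizeof(*page) + cap);
		if (page == NULL)
			return NULL;
		page->next = pool->head;
		page->used = 0;
		page->cap = cap;
		pool->head = page;
	}

	memcpy(page->data + page->used, str, len);
	page->used += len;
	return page->data + page->used - len;
}

/* Release every page, and with it every string handed out. */
void demo_pool_free(demo_pool *pool)
{
	while (pool->head != NULL) {
		demo_page *next = pool->head->next;

		free(pool->head);
		pool->head = next;
	}
}
```

Because pages are never reallocated, pointers returned earlier stay valid until `demo_pool_free` is called, and consecutive strings sit back to back in the same page, which is what makes the "scan by adding the strlen of each ref" idea from the message possible in principle.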
The git_reference_next API silently skips invalid references when scanning the loose refs. The git_reference_next_name API should skip the same ones even though it isn't creating the reference object. This adds a test with an invalid loose reference and makes sure that both APIs skip the same entries and generate the same results.
Russell Belfer committed
-
- 19 Jul, 2013 7 commits
-
-
Clarify when to use github issues
Martin Woodward committed -
Edward Thomson committed
-
git_buf_text_gather_stats doesn't work for multi-byte characters
Ben Straub committed -
Suggest that github issues are to be used for bug reports, while questions about usage should be directed to StackOverflow.
Edward Thomson committed -
Refresh readme and contributing guidance
Ben Straub committed -
Updating the contributing guidance to explain a bit more about how we use PRs.
Martin Woodward committed -
Updated the methods of getting involved with the project and asking questions.
Martin Woodward committed
-
- 18 Jul, 2013 3 commits
-
-
Ben Straub committed
-
Switch default calling convention to cdecl
Vicent Martí committed -
git_revparse_ext: should return a NULL reference when the revparse expression doesn't lead to a reference
Ben Straub committed
-
- 17 Jul, 2013 3 commits
-
-
don't include ignored as rename candidates
Vicent Martí committed -
Edward Thomson committed
-
Ben Straub committed
-
- 16 Jul, 2013 2 commits
-
-
Small grammar fix in docs
Ben Straub committed -
Andy Lindeman committed
-
- 15 Jul, 2013 6 commits
-
-
Small fixes
Vicent Martí committed -
Add `git_remote_owner`.
Vicent Martí committed -
Etienne Samson committed
-
Rémi Duraffort committed
-
Rémi Duraffort committed
-
Rémi Duraffort committed
-