- 22 Feb, 2013 3 commits
Instead of statically creating three git_diff_similarity_metric structures for the various config options, just create the metric structure on demand and populate it, using the payload to specify the extra flags that should be passed to the hashsig. This removes a level of obfuscation from the code, I think.
Russell Belfer committed
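Below is a minimal sketch of the pattern described above. It uses a toy byte-sum signature in place of the real hashsig. There is one generic set of callbacks, and the metric struct is filled in on demand, with the option flags carried through `payload`. The names `sim_metric`, `sim_metric_init`, and `SIM_IGNORE_WHITESPACE` are hypothetical, not libgit2 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option flags forwarded to the signature function. */
#define SIM_NORMAL            0u
#define SIM_IGNORE_WHITESPACE (1u << 0)

/* Hypothetical pluggable metric: callbacks plus an opaque payload. */
typedef struct {
	uint32_t (*signature)(const char *buf, size_t len, void *payload);
	int (*similarity)(uint32_t a, uint32_t b, void *payload);
	void *payload;
} sim_metric;

/* Toy signature: a byte sum that skips whitespace when the flags carried
 * in the payload pointer ask for it. */
static uint32_t toy_signature(const char *buf, size_t len, void *payload)
{
	unsigned int flags = (unsigned int)(uintptr_t)payload;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];
		if ((flags & SIM_IGNORE_WHITESPACE) &&
		    (c == ' ' || c == '\t' || c == '\n' || c == '\r'))
			continue;
		sum += c;
	}
	return sum;
}

/* Toy similarity: 100 for equal signatures, 0 otherwise. */
static int toy_similarity(uint32_t a, uint32_t b, void *payload)
{
	(void)payload;
	return a == b ? 100 : 0;
}

/* Build the metric on demand rather than keeping one static struct per
 * option; the flags ride along in the payload pointer. */
static void sim_metric_init(sim_metric *m, unsigned int flags)
{
	m->signature = toy_signature;
	m->similarity = toy_similarity;
	m->payload = (void *)(uintptr_t)flags;
}
```

Packing the flags into the pointer avoids an allocation. Pointing `payload` at an options struct would work just as well.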
This fixes both a test that I broke in diff::patch, where I was relying on the current state of the working directory for the renames test data, and an unstable test in diff::rename, where the environment setting for the "diff.renames" config was being allowed to influence the test results.
Russell Belfer committed
This adds some new tests that actually exercise the similarity metric between files to detect renames and copies, and to split modified files that are too heavily modified. There is still more testing to do; these tests only partially cover the cases. This also includes one bug fix: a change set whose only changes were MODIFY deltas broken into ADD/DELETE (due to low self-similarity), without any additional RENAMED entries, would end up not processing the split requests because the num_rewrites counter got reset.
Russell Belfer committed
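Here is a simplified sketch of the rewrite-splitting step and the counter bug. It assumes hypothetical `delta` records that carry a precomputed self-similarity score (0 to 100), plus an illustrative break threshold. It is not libgit2's `git_diff_find_similar()` code.

```c
#include <stddef.h>

typedef enum { DELTA_ADDED, DELTA_DELETED, DELTA_MODIFIED } delta_status;

typedef struct {
	delta_status status;
	int self_similarity; /* 0..100: old content vs. new content */
	int split;           /* set when the delta should become DELETE + ADD */
} delta;

/* Mark heavily modified files for splitting and return how many were marked.
 * The caller must keep this count across the rename-matching pass. */
static size_t mark_rewrites(delta *d, size_t n, int break_threshold)
{
	size_t i, num_rewrites = 0;

	for (i = 0; i < n; i++) {
		if (d[i].status == DELTA_MODIFIED &&
		    d[i].self_similarity < break_threshold) {
			d[i].split = 1;
			num_rewrites++;
		}
	}
	return num_rewrites;
}

/* Expand marked deltas into DELETE + ADD pairs; `out` must hold 2 * n
 * entries. The pass is gated on num_rewrites, so a counter reset to 0
 * silently skips every split. */
static size_t apply_splits(const delta *in, size_t n, size_t num_rewrites,
                           delta *out)
{
	size_t i, j = 0;

	for (i = 0; i < n; i++) {
		if (num_rewrites > 0 && in[i].split) {
			out[j] = in[i];
			out[j].status = DELTA_DELETED;
			out[j++].split = 0;
			out[j] = in[i];
			out[j].status = DELTA_ADDED;
			out[j++].split = 0;
		} else {
			out[j++] = in[i];
		}
	}
	return j;
}
```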
- 21 Feb, 2013 1 commit
This is the initial integration of the similarity metric into the `git_diff_find_similar()` code path. The existing tests all pass, but the new functionality isn't currently well tested. The integration does go through the pluggable metric interface, so it should be possible to drop in an alternative to the internal metric that libgit2 implements. This comes along with a behavior change for an existing interface; namely, passing two NULLs to git_diff_blobs (or passing NULLs to git_diff_blob_to_buffer) will now call the file_cb parameter zero times instead of one time. I know it's strange that that change is paired with this other change, but it emerged from some initialization changes that I ended up making.
Russell Belfer committed
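The sketch below shows how a caller-supplied metric can replace a built-in one through a function-pointer interface. The `metric` and `find_opts` types, the exact-match default, and the length-ratio alternative are all hypothetical stand-ins, chosen to keep the example small.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pluggable metric: score two buffers from 0 to 100. */
typedef struct {
	int (*similarity)(const char *a, size_t alen,
	                  const char *b, size_t blen, void *payload);
	void *payload;
} metric;

/* Stand-in for the built-in metric: 100 for identical content, else 0. */
static int builtin_similarity(const char *a, size_t alen,
                              const char *b, size_t blen, void *payload)
{
	(void)payload;
	return (alen == blen && memcmp(a, b, alen) == 0) ? 100 : 0;
}

static const metric builtin_metric = { builtin_similarity, NULL };

typedef struct {
	const metric *sim; /* NULL selects the built-in metric */
} find_opts;

/* Every score goes through the interface, so a drop-in replacement is
 * used transparently. */
static int score_pair(const find_opts *opts, const char *a, size_t alen,
                      const char *b, size_t blen)
{
	const metric *m = (opts && opts->sim) ? opts->sim : &builtin_metric;
	return m->similarity(a, alen, b, blen, m->payload);
}

/* Example alternative metric: compares lengths only. */
static int length_ratio(const char *a, size_t alen,
                        const char *b, size_t blen, void *payload)
{
	size_t lo = alen < blen ? alen : blen;
	size_t hi = alen < blen ? blen : alen;
	(void)a; (void)b; (void)payload;
	return hi == 0 ? 100 : (int)(lo * 100 / hi);
}
```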
- 20 Feb, 2013 19 commits
Previously the git_diff_delta recorded if the delta was binary. This replaces that (with no net change in structure size) with a full set of flags. The flag values that were already in use for individual git_diff_file objects are reused for the delta flags, too (along with renaming those flags to make it clear that they are used more generally). This (a) makes things somewhat more consistent (because I was using a -1 value in the "boolean" binary field to indicate unset, whereas now I can just use the flags that are easier to understand), and (b) will make it easier for me to add some additional flags to the delta object in the future, such as marking the results of a copy/rename detection or other deltas that might want a special indicator. While making this change, I officially moved some of the flags that were internal only into the private diff header. This also allowed me to remove a gross hack in rename/copy detect code where I was overwriting the status field with an internal value.
Russell Belfer committed
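This is a minimal sketch of replacing a tri-state "boolean" (with -1 meaning unset) by flag bits. It assumes hypothetical names `FLAG_BINARY`, `FLAG_NOT_BINARY`, and `FLAG_VALID_OID`. When neither binary bit is set, the answer is "not yet known", and the spare bits leave room for later markers such as rename or copy results.

```c
#include <stdint.h>

/* Hypothetical flag bits shared by file and delta records. */
enum {
	FLAG_BINARY     = 1 << 0, /* content known to be binary */
	FLAG_NOT_BINARY = 1 << 1, /* content known to be text */
	FLAG_VALID_OID  = 1 << 2  /* object id has been computed */
};

typedef struct {
	uint32_t flags; /* replaces a "boolean" field that used -1 for unset */
} delta_rec;

/* Neither binary bit set means "not yet determined". */
static int binary_known(const delta_rec *d)
{
	return (d->flags & (FLAG_BINARY | FLAG_NOT_BINARY)) != 0;
}

/* Record the binary/text decision, clearing any earlier one. */
static void mark_binary(delta_rec *d, int binary)
{
	d->flags &= ~(uint32_t)(FLAG_BINARY | FLAG_NOT_BINARY);
	d->flags |= binary ? FLAG_BINARY : FLAG_NOT_BINARY;
}
```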
This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code - this is just setting out the function that will be used.
Russell Belfer committed
Russell Belfer committed
Seems to be working pretty well...
Russell Belfer committed
This moves the similarity metric code out of buf_text and into a new file. It also implements a different approach to similarity measurement, based on a Rabin-Karp rolling hash, where we keep only the top 100 and bottom 100 hashes. In theory, that should be enough samples to give a fairly accurate measurement while limiting the amount of data we keep for file signatures, no matter how large the file is.
Russell Belfer committed
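The following sketch shows the sampling idea. It assumes a Rabin-Karp polynomial hash over a fixed byte window, computed mod 2^32, and a scoring rule (shared samples divided by the larger sample count) chosen for illustration. The window size, base, and scoring are not taken from libgit2. `SIG_K` is 4 here to keep the arrays small, whereas the commit keeps 100 at each end. Each bounded array is updated by insertion, so memory stays fixed however large the input is.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_K      4    /* samples kept at each end (the commit uses 100) */
#define SIG_WINDOW 8    /* bytes per rolling window (assumed) */
#define SIG_BASE   257u /* polynomial base (assumed) */

typedef struct {
	uint32_t lo[SIG_K]; size_t nlo; /* smallest hashes, ascending */
	uint32_t hi[SIG_K]; size_t nhi; /* largest hashes, descending */
} sig;

/* Insert h into a bounded sorted array; the last slot is the weakest. */
static void keep(uint32_t *arr, size_t *n, uint32_t h, int want_small)
{
	size_t i;

	if (*n == SIG_K) {
		if (want_small ? h >= arr[SIG_K - 1] : h <= arr[SIG_K - 1])
			return;
		(*n)--; /* evict the weakest sample */
	}
	for (i = *n; i > 0 && (want_small ? arr[i - 1] > h : arr[i - 1] < h); i--)
		arr[i] = arr[i - 1];
	arr[i] = h;
	(*n)++;
}

static void sig_compute(sig *s, const unsigned char *buf, size_t len)
{
	uint32_t h = 0, pow = 1; /* pow = BASE^(WINDOW-1) mod 2^32 */
	size_t i;

	s->nlo = s->nhi = 0;
	for (i = 1; i < SIG_WINDOW; i++)
		pow *= SIG_BASE;
	for (i = 0; i < len; i++) {
		if (i >= SIG_WINDOW)
			h -= buf[i - SIG_WINDOW] * pow; /* byte leaving the window */
		h = h * SIG_BASE + buf[i];          /* byte entering the window */
		if (i + 1 >= SIG_WINDOW) {
			keep(s->lo, &s->nlo, h, 1);
			keep(s->hi, &s->nhi, h, 0);
		}
	}
}

/* Count equal values in two runs sorted the same way. */
static size_t common(const uint32_t *a, size_t na,
                     const uint32_t *b, size_t nb, int asc)
{
	size_t i = 0, j = 0, c = 0;

	while (i < na && j < nb) {
		if (a[i] == b[j]) { c++; i++; j++; }
		else if (asc ? a[i] < b[j] : a[i] > b[j]) i++;
		else j++;
	}
	return c;
}

/* Score 0..100: shared samples over the larger sample count. */
static int sig_compare(const sig *a, const sig *b)
{
	size_t na = a->nlo + a->nhi, nb = b->nlo + b->nhi;
	size_t total = na > nb ? na : nb;
	size_t shared = common(a->lo, a->nlo, b->lo, b->nlo, 1) +
	                common(a->hi, a->nhi, b->hi, b->nhi, 0);
	return total == 0 ? 100 : (int)(shared * 100 / total);
}
```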
Russell Belfer committed
This makes the text similarity metric treat \r as equivalent to \n and skip whitespace immediately following a line terminator, so line indentation has less effect on the difference measurement (and \r\n is treated as a single line terminator). This also separates the text and binary hash calculators into two functions instead of having more if statements inside the loop, which should make it easier to have more differentiated heuristics in the future if we so wish.
Russell Belfer committed
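A minimal sketch of the normalization rules described above follows. For clarity it is written as a separate pass, whereas the commit applies the rules inside the hashing loop. `normalize_text` is a hypothetical helper. It maps `\r` to `\n` and drops any whitespace, including further line terminators, that directly follows a line terminator.

```c
#include <stddef.h>

/* Writes at most len bytes into out and returns how many were written. */
static size_t normalize_text(const char *in, size_t len, char *out)
{
	size_t i, n = 0;
	int after_eol = 0; /* previous kept character was a line terminator */

	for (i = 0; i < len; i++) {
		char c = in[i];
		if (c == '\r')
			c = '\n'; /* treat CR like LF */
		if (after_eol && (c == '\n' || c == ' ' || c == '\t'))
			continue; /* swallow "\r\n" tails, blank lines, indentation */
		after_eol = (c == '\n');
		out[n++] = c;
	}
	return n;
}
```

With these rules, `"a\r\n  b"` and `"a\nb"` normalize to the same bytes.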
This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.
Russell Belfer committed
Add more treebuilder tests
Vicent Martí committed
The recent changes with git_treebuilder_entrycount point out that the test coverage for git_treebuilder_remove and git_treebuilder_entrycount is completely absent. This adds tests.
Russell Belfer committed
Add explicit entrycount to tree builder
Vicent Martí committed
This replaces most of the explicit vector iteration with calls to git_vector_foreach, adds in some git__free and giterr_clear calls to clean up during some error paths, and a couple of other code simplifications.
Russell Belfer committed
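Here is a sketch of a `foreach` macro in the style this commit adopts. It uses a hypothetical `ptr_vector` type rather than libgit2's `git_vector`. The comma expression assigns the current element and then yields 1, so the loop condition depends only on the bounds check.

```c
#include <stddef.h>

typedef struct {
	void **contents;
	size_t length;
} ptr_vector;

/* Bind `elem` to each item of `v` in turn, with `iter` as the index. */
#define vector_foreach(v, iter, elem) \
	for ((iter) = 0; (iter) < (v)->length && ((elem) = (v)->contents[(iter)], 1); (iter)++)

/* Example use: count the non-NULL items. */
static size_t count_nonnull(const ptr_vector *v)
{
	size_t i, n = 0;
	void *item;

	vector_foreach(v, i, item) {
		if (item != NULL)
			n++;
	}
	return n;
}
```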
The treebuilder entries vector flags removed items, which means we can't rely on the entries vector length to accurately get the number of entries. This adds an entrycount value and maintains it while updating the treebuilder entries.
Russell Belfer committed
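This simplified sketch of the entry counting described above assumes a hypothetical fixed-size builder in which removal only sets a flag, and duplicate names are not handled. Because removed slots stay in the array, `length` overcounts, while `entrycount` tracks only the live entries.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct {
	const char *name;
	int removed; /* removal flags the slot instead of compacting */
} entry;

typedef struct {
	entry entries[MAX_ENTRIES];
	size_t length;     /* slots used, including removed ones */
	size_t entrycount; /* live entries only */
} builder;

static int builder_insert(builder *bld, const char *name)
{
	if (bld->length == MAX_ENTRIES)
		return -1;
	bld->entries[bld->length].name = name;
	bld->entries[bld->length].removed = 0;
	bld->length++;
	bld->entrycount++;
	return 0;
}

static int builder_remove(builder *bld, const char *name)
{
	size_t i;

	for (i = 0; i < bld->length; i++) {
		entry *e = &bld->entries[i];
		if (!e->removed && strcmp(e->name, name) == 0) {
			e->removed = 1;    /* the slot stays in the array */
			bld->entrycount--; /* keep the live count accurate */
			return 0;
		}
	}
	return -1; /* not found */
}
```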
Simplify signature parsing
Russell Belfer committed
Disable caching in Clar
Vicent Martí committed
Vicent Marti committed
Vicent Marti committed
Vicent Marti committed
Vicent Marti committed
- 17 Feb, 2013 1 commit
Fix static analyzer issues
Vicent Martí committed
- 16 Feb, 2013 2 commits
The cppcheck static analyzer generates warnings for a bunch of places in the libgit2 code base. All the ones fixed in this commit are actually false positives, but I've reorganized the code to hopefully make it easier for static analysis tools to correctly understand the structure. I wouldn't do this if I felt like it was making the code harder to read or worse for humans, but in this case, these fixes don't seem too bad and will hopefully make it easier for better analysis tools to get at any real issues.
Russell Belfer committed
If gethostbyname() fails on platforms with NO_ADDRINFO, the code leaks the struct addrinfo that was allocated. This fixes that (and a number of code formatting issues in that area of code in src/posix.c).
Russell Belfer committed
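The sketch below shows the leak and its fix in the `NO_ADDRINFO` fallback path. Here `gethostbyname()` is swapped for a hypothetical injectable resolver so that the failure branch can run without a network. `fake_addrinfo` and `fake_getaddrinfo` are illustrative names, not the code in src/posix.c.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down result record. */
typedef struct {
	const char *canonname;
} fake_addrinfo;

/* Hypothetical resolver standing in for gethostbyname(); NULL means failure. */
typedef const char *(*resolve_fn)(const char *host);

static int fake_getaddrinfo(const char *host, resolve_fn resolve,
                            fake_addrinfo **out)
{
	fake_addrinfo *ai;
	const char *name;

	*out = NULL;
	if ((ai = calloc(1, sizeof(*ai))) == NULL)
		return -1;

	if ((name = resolve(host)) == NULL) {
		free(ai); /* the fix: release the allocation on lookup failure */
		return -1;
	}

	ai->canonname = name;
	*out = ai;
	return 0;
}

/* Test doubles for the resolver. */
static const char *resolve_ok(const char *host) { return host; }
static const char *resolve_fail(const char *host) { (void)host; return NULL; }
```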
- 15 Feb, 2013 3 commits
There were a number of functions assigning their return value to `error` without much explanation. I added in some rudimentary error checking to help flesh out the example. Also, I reformatted all of the comments down to 80 cols (and in some cases, slightly updated the wording).
Russell Belfer committed
push: fix typo in git_push_finish() doc
Vicent Martí committed
Alessandro Ghedini committed
- 14 Feb, 2013 6 commits
Topic/diff tree coverage
Vicent Martí committed
push: improve docs on success / failure of git_push_finish
Vicent Martí committed
Michael Schubert committed
Ben Straub committed
Philip Kelley committed
Improve MSVC compiler, linker flags
Vicent Martí committed
- 13 Feb, 2013 2 commits
Philip Kelley committed
Philip Kelley committed
- 12 Feb, 2013 3 commits
Add git_push_options, to set packbuilder parallelism
Ben Straub committed
Add FORCE_TEXT check into git_diff_blobs code path
Russell Belfer committed
Allow network operations to cancel
Russell Belfer committed
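This is a sketch of callback-driven cancellation. It assumes a hypothetical transfer loop that reports progress after each chunk and stops when the callback returns nonzero. `fake_download`, `progress_cb`, and `ERR_USER_CANCEL` are invented for illustration and are not the libgit2 callback signatures.

```c
#include <stddef.h>

#define ERR_USER_CANCEL (-7) /* hypothetical "cancelled by user" code */

/* Hypothetical progress callback: return nonzero to cancel. */
typedef int (*progress_cb)(size_t received, size_t total, void *payload);

static int fake_download(size_t total_chunks, progress_cb cb, void *payload,
                         size_t *done)
{
	size_t i;

	*done = 0;
	for (i = 0; i < total_chunks; i++) {
		(*done)++; /* pretend one chunk arrived */
		if (cb != NULL && cb(*done, total_chunks, payload) != 0)
			return ERR_USER_CANCEL; /* stop early and report the cancel */
	}
	return 0;
}

/* Example callback: cancel once the count in `payload` has been received. */
static int stop_after(size_t received, size_t total, void *payload)
{
	(void)total;
	return received >= *(const size_t *)payload;
}
```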