Commits · 19f1a8e6f289b07389d525a12a13a4aaeaabe443 · lvzhengyang / git2

29 Sep, 2016 4 commits

diff: improve positioning of add/delete blocks in diffs · 19f1a8e6

Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
     }

     if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
    +if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {
                            $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
            $initial_reply_to =~ s/(^\s+|\s+$)//g;
     }

    +if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
     if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2) it
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.
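The three steps above can be sketched in C roughly as follows. This is an illustration only, not the code from the patch: the field names, the helpers `get_indent()`, `measure_split()`, `score_split()` and `best_slider_start()`, the single integer score, and all of the weights are assumptions made here to show the shape of the computation (measure, weight, compare).

```c
/* Hypothetical measurements for a split placed just before lines[pos]. */
struct split_measurement {
	int blank_above;   /* blank lines directly above the split */
	int next_is_blank; /* is the line directly after the split blank? */
	int blank_below;   /* blank lines following that line */
	int indent_above;  /* indent of nearest non-blank line above, or -1 */
	int indent;        /* indent of the line after the split, -1 if blank */
	int indent_below;  /* indent of nearest non-blank line after it, or -1 */
};

/* Indentation width of a line (tabs advance to a multiple of 8), -1 if blank. */
static int get_indent(const char *line)
{
	int n = 0;

	for (; *line; line++) {
		if (*line == ' ')
			n++;
		else if (*line == '\t')
			n += 8 - n % 8;
		else
			return n;
	}
	return -1;
}

/* Step 1: measure the surroundings of a split before lines[pos]. */
static void measure_split(const char **lines, int nlines, int pos,
			  struct split_measurement *m)
{
	int i;

	m->blank_above = 0;
	m->indent_above = -1;
	for (i = pos - 1; i >= 0; i--) {
		m->indent_above = get_indent(lines[i]);
		if (m->indent_above != -1)
			break;
		m->blank_above++;
	}

	m->indent = pos < nlines ? get_indent(lines[pos]) : -1;
	m->next_is_blank = pos < nlines && m->indent == -1;

	m->blank_below = 0;
	m->indent_below = -1;
	for (i = pos + 1; i < nlines; i++) {
		m->indent_below = get_indent(lines[i]);
		if (m->indent_below != -1)
			break;
		m->blank_below++;
	}
}

/* Step 2: turn a measurement into a "badness" (lower is better).
 * The weights 10, 30 and 15 are made up for this sketch. */
static int score_split(const struct split_measurement *m)
{
	int indent = m->indent != -1 ? m->indent : m->indent_below;
	int blanks = m->blank_above +
		     (m->next_is_blank ? 1 + m->blank_below : 0);
	int badness = 0;

	if (indent > 0)
		badness += 10 * indent; /* prefer less-indented lines */
	badness -= 30 * blanks;         /* prefer splits next to blank lines */
	if (indent != -1 && m->indent_above != -1 && indent != m->indent_above)
		badness += 15;          /* prefer equal indentation on both sides */
	return badness;
}

/* Step 3: for a slider of `len` lines that may start anywhere in
 * [start_min, start_max], add the scores of its top and bottom splits
 * and return the start position with the lowest total. */
static int best_slider_start(const char **lines, int nlines, int len,
			     int start_min, int start_max)
{
	int best = start_min, best_score = 0, start;

	for (start = start_min; start <= start_max; start++) {
		struct split_measurement top, bottom;
		int score;

		measure_split(lines, nlines, start, &top);
		measure_split(lines, nlines, start + len, &bottom);
		score = score_split(&top) + score_split(&bottom);
		if (start == start_min || score < best_score) {
			best = start;
			best_score = score;
		}
	}
	return best;
}
```

Applied to a cut-down version of the `git-send-email.perl` example, where the three added lines can start at index 3 or 4 of the new file, this sketch picks index 3, the position directly after the blank line.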

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 | training set |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- | ------------ |
| afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
| alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
| angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
| animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
| ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
| bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
| corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
| couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
| cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
| discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
| docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
| electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
| git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
| gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
| ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
| junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
| lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
| magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
| neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
| phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
| react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
| rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
| spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
| tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
| test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
| test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
| xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
| totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
| totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |

In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are

* "repository" - the name of the repository used. I used the diffs
  between successive non-merge commits on the HEAD branch of the
  corresponding repository.

* "count" - the number of sliders that were human-rated. I chose most,
  but not all, sliders to rate from those among which the various
  algorithms gave different answers.

* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.

* "compaction" - the heuristic used by `git diff --compaction-heuristic`
  in Git 2.9.0.

* "compaction-fixed" - the heuristic used by `git diff
  --compaction-heuristic` after the fixes from earlier in this patch
  series. Note that the results are not dramatically different than
  those for "compaction". Both produce non-ideal diffs only about 1/3 as
  often as the default `git diff`.

* "indent-1" - the new `--indent-heuristic` algorithm, using the first
  set of weighting factors, determined as described above.

* "indent-2" - the new `--indent-heuristic` algorithm, using the final
  set of weighting factors, determined as described below.

* `*` - indicates that the repository was part of the training set used to
  determine the first set of weighting factors.

The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.

The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)
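
For reference, the quoted ratios can be checked against the "totals" row of the table (4391, 1496 and 144 wrongly placed sliders for "Git 2.9.0", "compaction" and "indent-2" respectively):

```latex
\frac{144}{4391} \approx 0.033 \approx \frac{1}{30}, \qquad
\frac{144}{1496} \approx 0.096 \approx \frac{1}{10}, \qquad
\frac{1496}{4391} \approx 0.34 \approx \frac{1}{3}
```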

The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].

[1] https://github.com/mhagger/diff-slider-tools

Original Git commit: 433860f3d0beb0c6f205290bd16cda413148f098

committed Sep 29, 2016


xdl_change_compact(): introduce the concept of a change group · a49895b5

The idea of xdl_change_compact() is fairly simple:

* Proceed through groups of changed lines in the file to be compacted,
  keeping track of the corresponding location in the "other" file.

* If possible, slide the group up and down to try to give the most
  aesthetically pleasing diff. Whenever it is slid, the current location
  in the other file needs to be adjusted.

But these simple concepts are obfuscated by a lot of index handling that
is written in terse, subtle, and varied patterns. I found it very hard
to convince myself that the function was correct.

So introduce a "struct group" that represents a group of changed lines
in a file. Add some functions that perform elementary operations on
groups:

* Initialize a group to the first group in a file
* Move to the next or previous group in a file
* Slide a group up or down

Even though the resulting code is longer, I think it is easier to
understand and review. Its performance is not changed
appreciably (though it would be if `group_next()` and `group_previous()`
were not inlined).
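
A minimal sketch of what such a group abstraction could look like is given below. It is not the code from this commit: the `struct file` layout, the use of `strcmp()` to compare lines, and the explicit bounds checks are assumptions made to keep the example self-contained, and the bookkeeping for the "other" file is left out entirely.

```c
#include <string.h>

/* Hypothetical, simplified view of one side of a diff. */
struct file {
	const char **recs; /* the lines of the file */
	char *changed;     /* changed[i] != 0 if line i is added/deleted */
	long nrec;         /* number of lines */
};

/* A run of changed lines [start, end); start == end is an empty group. */
struct group {
	long start;
	long end;
};

/* Initialize g to the first group in the file. */
static void group_init(const struct file *f, struct group *g)
{
	g->start = g->end = 0;
	while (g->end < f->nrec && f->changed[g->end])
		g->end++;
}

/* Move g to the next group; return -1 if there is none. */
static int group_next(const struct file *f, struct group *g)
{
	if (g->end >= f->nrec)
		return -1;
	g->start = g->end + 1;
	for (g->end = g->start; g->end < f->nrec && f->changed[g->end]; g->end++)
		;
	return 0;
}

/* Move g to the previous group; return -1 if there is none. */
static int group_previous(const struct file *f, struct group *g)
{
	if (g->start == 0)
		return -1;
	g->end = g->start - 1;
	for (g->start = g->end; g->start > 0 && f->changed[g->start - 1]; g->start--)
		;
	return 0;
}

/* Slide a non-empty group up one line if the line above it equals its
 * last line; absorb any group it runs into. Return -1 if it cannot move. */
static int group_slide_up(struct file *f, struct group *g)
{
	if (g->start < g->end && g->start > 0 &&
	    !strcmp(f->recs[g->start - 1], f->recs[g->end - 1])) {
		f->changed[--g->start] = 1;
		f->changed[--g->end] = 0;
		while (g->start > 0 && f->changed[g->start - 1])
			g->start--;
		return 0;
	}
	return -1;
}

/* Slide a non-empty group down one line if the line below it equals its
 * first line; absorb any group it runs into. Return -1 if it cannot move. */
static int group_slide_down(struct file *f, struct group *g)
{
	if (g->start < g->end && g->end < f->nrec &&
	    !strcmp(f->recs[g->start], f->recs[g->end])) {
		f->changed[g->start++] = 0;
		f->changed[g->end++] = 1;
		while (g->end < f->nrec && f->changed[g->end])
			g->end++;
		return 0;
	}
	return -1;
}
```

With the lines `1`, `A`, (blank), `B`, (blank), `A`, `2` and lines 2 to 5 marked as changed, one call to `group_slide_up()` moves the group to lines 1 to 4, and a second call fails because `1` does not match the blank line that would have to leave the group.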

...and in fact, the rewriting helped me discover another bug in the
--compaction-heuristic code: The update of blank_lines was never done
for the highest possible position of the group. This means that it could
fail to slide the group to its highest possible position, even if that
position had a blank line as its last line. So for example, it yielded
the following diff:

    $ git diff --no-index --compaction-heuristic a.txt b.txt
    diff --git a/a.txt b/b.txt
    index e53969f..0d60c5fe 100644
    --- a/a.txt
    +++ b/b.txt
    @@ -1,3 +1,7 @@
     1
     A
    +
    +B
    +
    +A
     2

when in fact the following diff is better (according to the rules of
--compaction-heuristic):

    $ git diff --no-index --compaction-heuristic a.txt b.txt
    diff --git a/a.txt b/b.txt
    index e53969f..0d60c5fe 100644
    --- a/a.txt
    +++ b/b.txt
    @@ -1,3 +1,7 @@
     1
    +A
    +
    +B
    +
     A
     2

The new code gives the bottom answer.

Original Git commit: e8adf23d1ee97b57c8aea32ee8365203b77c0e42

committed Sep 29, 2016


recs_match(): take two xrecord_t pointers as arguments · 09fb5b2a

There is no reason for it to take an array and two indexes as arguments,
as it only accesses two elements of the array.

Original Git commit: 152598cbb667471c8f5be16e199922a41452b2d5

committed Sep 29, 2016


xdiff: add recs_match helper function · 506bf09d

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this check causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.
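
As a rough sketch of such a helper, assuming a cut-down record type: `struct record` below is a hypothetical stand-in for xdiff's `xrecord_t`, keeping only a precomputed hash, a pointer and a length, and the plain `memcmp()` comparison ignores any whitespace-handling flags the real comparison may take. It is shown in the two-pointer form that the follow-up commit listed above moves to.

```c
#include <string.h>

/* Hypothetical stand-in for xrecord_t; only the fields used here. */
struct record {
	unsigned long ha; /* precomputed hash of the line */
	const char *ptr;  /* start of the line */
	long size;        /* length of the line in bytes */
};

/* Compare the cheap hashes first; only look at the bytes when they agree. */
static int recs_match(const struct record *a, const struct record *b)
{
	return a->ha == b->ha && a->size == b->size &&
	       memcmp(a->ptr, b->ptr, (size_t)a->size) == 0;
}
```

Call sites then shrink to something like `recs_match(&recs[i], &recs[j])` instead of repeating both checks inline.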

Original Git commit: 92e5b62fec0e9b647429e8d3736c571c434dd375

committed Sep 29, 2016


13 Sep, 2016 2 commits
- Merge pull request #3929 from libgit2/vmg/time · 89c332e4
```
time: Export `git_time_monotonic`
```
  Edward Thomson committed Sep 13, 2016
- time: Export `git_time_monotonic` · 2749ff46
  Vicent Marti committed Sep 13, 2016
09 Sep, 2016 1 commit
- Merge pull request #3925 from pks-t/pks/cmake-library-dirs · bba704ad
```
cmake: add curl library path
```
  Patrick Steinhardt committed Sep 09, 2016
06 Sep, 2016 2 commits
- Merge pull request #3923 from libgit2/ethomson/diff-read-empty-binary · 9ad07fc0
```
Read binary patches (with no binary data)
```
  Edward Thomson committed Sep 06, 2016
- Merge pull request #3882 from pks-t/pks/fix-fetch-refspec-dst-parsing · 46035d98
```
refspec: do not set empty rhs for fetch refspecs
```
  Patrick Steinhardt committed Sep 06, 2016
05 Sep, 2016 2 commits

diff: treat binary patches with no data special · adedac5a

When creating and printing diffs, deal with binary deltas that have
binary data specially, versus diffs that have a binary file but lack the
actual binary data.

committed Sep 05, 2016


cmake: add curl library path · 528b2f7d

The `PKG_CHECK_MODULES` function searches a pkg-config module and
then proceeds to set various variables containing information on
how to link to the library. In contrast to the `FIND_PACKAGE`
function, the library path set by `PKG_CHECK_MODULES` will not
necessarily contain linking instructions with a complete path to
the library, though. So when a library is not installed in a
standard location, the linker might later fail due to being
unable to locate it.

While we already honor this when configuring libssh2 by adding
`LIBSSH2_LIBRARY_DIRS` to the link directories, we fail to do so
for libcurl, preventing us from building libgit2 on e.g. FreeBSD. Fix
the issue by adding the curl library directory to the linker
search path.

committed Sep 05, 2016


02 Sep, 2016 3 commits

diff_print: change test for skipping binary printing · f4e3dae7

Instead of skipping printing a binary diff when there is no data, skip
printing when we have a status of `UNMODIFIED`.  This is more in line
with our internal data model and allows us to expand the notion of
binary data.

In the future, there may be no data because the files were unmodified
(there was no data to produce) or it may have no data because there was
no data given to us in a patch.  We want to treat these cases
separately.

committed Sep 02, 2016


patch: error on diff callback failure · 4bfd7c63
Edward Thomson committed Sep 02, 2016

Merge pull request #3922 from pks-t/pks/diff-only-load-binaries-when-requested · ce54e77c
```
patch_generate: only calculate binary diffs if requested
```
Edward Thomson committed Sep 02, 2016

01 Sep, 2016 1 commit

patch_generate: only calculate binary diffs if requested · 4b34f687

When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.

Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.

committed Sep 01, 2016


30 Aug, 2016 3 commits
- Merge pull request #3915 from pks-t/pks/index-collision-test-leak · 40b08124
```
tests: index: do not re-allocate index
```
  Carlos Martín Nieto committed Aug 30, 2016
- Merge pull request #3907 from steffhip/git_checkout_tree-fix · a08e8825
  Patrick Steinhardt committed Aug 30, 2016
- git_checkout_tree options fix · 88cfe614
```
According to the reference the git_checkout_tree and git_checkout_head
functions should accept NULL in the opts field

This was broken since the opts field was dereferenced and thus led to a
crash.
```
  Stefan Huber committed Aug 30, 2016
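
The usual way to make an options pointer optional is to substitute defaults before anything dereferences it. The sketch below shows that pattern with a made-up options struct; `struct checkout_opts`, `CHECKOUT_OPTS_DEFAULT` and `resolve_opts()` are hypothetical names for illustration and are not libgit2's `git_checkout_options` or its internals.

```c
#include <stddef.h>

/* Hypothetical options struct; not libgit2's git_checkout_options. */
struct checkout_opts {
	unsigned int version;
	unsigned int strategy;
};

#define CHECKOUT_OPTS_DEFAULT { 1, 0 }

/* Resolve caller-supplied options, falling back to defaults for NULL,
 * so that no later code has to dereference a possibly-NULL pointer. */
static struct checkout_opts resolve_opts(const struct checkout_opts *opts)
{
	struct checkout_opts defaults = CHECKOUT_OPTS_DEFAULT;

	return opts ? *opts : defaults;
}
```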
29 Aug, 2016 4 commits
- Merge pull request #3914 from pks-t/pks/libqgit2-binding-url · dfd79576
```
README: adjust URL to libqgit2 repository
```
  Edward Thomson committed Aug 29, 2016
- tests: index: do not re-allocate index · 86e88534
```
Plug a memory leak caused by re-allocating a `git_index`
structure which has already been allocated by the test suite's
initializer.
```
  Patrick Steinhardt committed Aug 29, 2016
- README: adjust URL to libqgit2 repository · 8044ee42
  Patrick Steinhardt committed Aug 29, 2016
- Merge pull request #3900 from pks-t/pks/http-close-substream-on-connect · ace0d36b
```
transports: http: set substream as disconnected after closing
```
  Patrick Steinhardt committed Aug 29, 2016
26 Aug, 2016 1 commit
- Merge pull request #3908 from libgit2/ethomson/patch_from_diff · 5671e81f
```
Teach `git_patch_from_diff` about parsed diffs
```
  Edward Thomson committed Aug 26, 2016
24 Aug, 2016 2 commits
- Teach `git_patch_from_diff` about parsed diffs · b859faa6
```
Ensure that `git_patch_from_diff` can return the patch for parsed diffs,
not just generate a patch for a generated diff.
```
  Edward Thomson committed Aug 24, 2016
- Merge pull request #3904 from stinb/filesystem-iterator-double-free · c60210d3
```
filesystem_iterator: fixed double free on error
```
  Patrick Steinhardt committed Aug 24, 2016
22 Aug, 2016 1 commit
- filesystem_iterator: fixed double free on error · 7a3f1de5
  Jason Haslam committed Aug 22, 2016
17 Aug, 2016 4 commits

Merge pull request #3837 from novalis/dturner/indexv4 · c1b370e9
```
Support index v4
```
Edward Thomson committed Aug 17, 2016

Merge pull request #3895 from pks-t/pks/negate-basename-in-subdirs · 635a9222
```
ignore: allow unignoring basenames in subdirectories
```
Edward Thomson committed Aug 17, 2016

transports: http: reset `connected` flag when closing transport · b1453601
Patrick Steinhardt committed Aug 17, 2016


transports: http: reset `connected` flag when re-connecting transport · c4cba4e9

When calling `http_connect` on a subtransport whose stream is already
connected, we first close the stream in case no keep-alive is in use.
When doing so, we do not reset the transport's connection state,
though. Usually, this will do no harm in case the subsequent connect
will succeed. But when the connection fails we are left with a
subtransport which is tagged as connected but which has no valid
stream attached.

Fix the issue by resetting the subtransport's connected-state when
closing its stream in `http_connect`.
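
The failure mode and the fix can be modelled with a toy state machine. Everything in the sketch below is hypothetical (`struct subtransport`, `http_connect_sketch()`, and the two stand-in connect callbacks); it only mirrors the control flow described above and is not the libgit2 transport code.

```c
/* Hypothetical, simplified model of an HTTP subtransport. */
struct subtransport {
	int stream_open; /* is a usable stream attached? */
	int connected;   /* does the transport consider itself connected? */
	int keepalive;   /* may the existing connection be reused? */
};

/* Stand-ins for the real connect step: one succeeds, one fails. */
static int connect_ok(struct subtransport *t)
{
	t->stream_open = 1;
	return 0;
}

static int connect_fail(struct subtransport *t)
{
	(void)t;
	return -1;
}

static int http_connect_sketch(struct subtransport *t,
			       int (*do_connect)(struct subtransport *))
{
	if (t->connected && t->keepalive)
		return 0; /* reuse the existing connection */

	if (t->stream_open) {
		t->stream_open = 0; /* close the old stream */
		t->connected = 0;   /* the fix: keep the flag in sync */
	}

	if (do_connect(t) < 0)
		return -1; /* transport stays marked as disconnected */

	t->connected = 1;
	return 0;
}
```

Without the `t->connected = 0` line, a failed reconnect would leave `connected` set while no stream is attached, which is the inconsistent state the commit message describes.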

committed Aug 17, 2016


12 Aug, 2016 1 commit

ignore: allow unignoring basenames in subdirectories · fcb2c1c8

The .gitignore file allows for patterns which unignore previous
ignore patterns. When unignoring a previous pattern, there are
basically three cases for how this is matched when no globbing is
used:

1. when a previous file has been ignored, it can be unignored by
   using its exact name, e.g.

   foo/bar
   !foo/bar

2. when a file in a subdirectory has been ignored, it can be
   unignored by using its basename, e.g.

   foo/bar
   !bar

3. when all files with a basename are ignored, a specific file
   can be unignored again by specifying its path in a
   subdirectory, e.g.

   bar
   !foo/bar

The first problem in libgit2 is that we did not correctly treat
the second case. While we verified that the negative pattern
matches the tail of the positive one, we did not verify if it
only matches the basename of the positive pattern. So e.g. we
would have also negated a pattern like

    foo/fruz_bar
    !bar

Furthermore, we did not check for the third case, where a
basename is being unignored in a certain subdirectory again.

Both issues are fixed with this commit.
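
The three cases, and the `foo/fruz_bar` counterexample, can be expressed as a small predicate. This is a sketch under simplifying assumptions (no globbing, patterns passed without the leading `!`); `ends_with_component()` and `negation_applies()` are names invented here and do not correspond to functions in libgit2.

```c
#include <string.h>

/* Does `longer` end with "/" followed by exactly `shorter`? */
static int ends_with_component(const char *longer, const char *shorter)
{
	size_t ll = strlen(longer), sl = strlen(shorter);

	return ll > sl && longer[ll - sl - 1] == '/' &&
	       !strcmp(longer + ll - sl, shorter);
}

/* `rule` is an earlier ignore pattern, `neg` a later negative pattern
 * (without the '!'). Return 1 if `neg` unignores what `rule` ignored. */
static int negation_applies(const char *rule, const char *neg)
{
	/* Case 1: exact name, e.g. "foo/bar" and "!foo/bar". */
	if (!strcmp(rule, neg))
		return 1;
	/* Case 2: `neg` is the basename of `rule`, e.g. "foo/bar" and "!bar".
	 * The '/' check is what rejects "foo/fruz_bar" and "!bar". */
	if (!strchr(neg, '/') && ends_with_component(rule, neg))
		return 1;
	/* Case 3: `rule` is a bare basename and `neg` names it inside a
	 * subdirectory, e.g. "bar" and "!foo/bar". */
	if (!strchr(rule, '/') && ends_with_component(neg, rule))
		return 1;
	return 0;
}
```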

committed Aug 12, 2016


10 Aug, 2016 2 commits

index: support index v4 · 5625d86b

Support reading and writing index v4. Index v4 uses a very simple
compression scheme for pathnames, but is otherwise similar to index v3.
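
In Git's index format documentation, the version 4 pathname scheme is prefix compression against the previous entry: each entry stores a count N of bytes to drop from the end of the previous path, followed by a NUL-terminated suffix to append. The sketch below shows only that reconstruction step; `expand_path()` is a hypothetical helper, and it takes N as an already-decoded integer rather than reading it from the index.

```c
#include <string.h>

/* Rebuild a path from the previous entry's path, the number of trailing
 * bytes to strip from it, and the stored suffix. `out` must not alias
 * `prev`. Returns 0 on success, -1 on malformed input or a short buffer. */
static int expand_path(const char *prev, size_t strip, const char *suffix,
		       char *out, size_t outsz)
{
	size_t prev_len = strlen(prev);
	size_t suffix_len = strlen(suffix);
	size_t keep;

	if (strip > prev_len)
		return -1; /* cannot strip more than the previous path has */
	keep = prev_len - strip;
	if (keep + suffix_len + 1 > outsz)
		return -1; /* result (plus NUL) does not fit */

	memcpy(out, prev, keep);
	memcpy(out + keep, suffix, suffix_len + 1); /* copies the NUL too */
	return 0;
}
```

For example, following `src/diff.c` with N = 6 and the suffix `index.c` yields `src/index.c`, so only the differing tail of each path is stored.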

Signed-off-by: David Turner <dturner@twitter.com>

committed Aug 10, 2016


varint: Add varint encoding/decoding · aeb5ee5a

This code is ported from git.git
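
For readers unfamiliar with the encoding, here is a sketch of the variable-width integer scheme as it is understood to work in git.git: seven payload bits per byte, most significant group first, the high bit marking "more bytes follow", and an offset of one added per continuation so that every value has exactly one encoding. It is written for illustration using `uint64_t` and is not copied from this commit, so details may differ from the ported code.

```c
#include <stdint.h>
#include <string.h>

/* Decode one varint from *bufp and advance *bufp past it.
 * Returns 0 without advancing if the value would overflow. */
static uint64_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uint64_t val = c & 127;

	while (c & 128) {
		val += 1; /* offset that makes encodings unique */
		if (!val || (val >> 57))
			return 0; /* shifting by 7 would overflow */
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* Encode value into buf (at most 10 bytes for 64 bits are needed).
 * Returns the encoded length; buf may be NULL to only compute it. */
static int encode_varint(uint64_t value, unsigned char *buf)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;

	varint[pos] = value & 127; /* last byte has the high bit clear */
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	if (buf)
		memcpy(buf, varint + pos, sizeof(varint) - pos);
	return (int)(sizeof(varint) - pos);
}
```

Under this scheme 127 fits in the single byte `0x7f`, while 128 becomes the two bytes `0x80 0x00`.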

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>

committed Aug 10, 2016


09 Aug, 2016 4 commits
- Merge pull request #3891 from pks-t/pks/stransport-memory-management-improvements · 26a8617d
```
stransport memory management improvements
```
  Carlos Martín Nieto committed Aug 09, 2016
- Merge pull request #3893 from pks-t/pks/remove-unused-test-cb · 5961face
```
tests: blob: remove unused callback function
```
  Edward Thomson committed Aug 09, 2016
- tests: blob: remove unused callback function · 4006455f
  Patrick Steinhardt committed Aug 09, 2016
- stransport: do not use `git_stream_free` on uninitialized stransport · b9895144
```
When failing to initialize a new stransport stream, we try to
release already allocated memory by calling out to
`git_stream_free`, which in turn called out to the stream's
`free` function pointer. As we only initialize the function
pointer later on, this leads to a `NULL` pointer dereference.

Furthermore, plug another memory leak when failing to create the
SSL context.
```
  Patrick Steinhardt committed Aug 09, 2016
08 Aug, 2016 3 commits
- Merge pull request #3887 from libgit2/ethomson/empty_blob · 97e57e87
```
odb: only provide the empty tree
```
  Carlos Martín Nieto committed Aug 08, 2016
- Merge pull request #3890 from pks-t/pks/stransport-static-linkage · b47e79e2
```
stransport: make internal functions static
```
  Edward Thomson committed Aug 08, 2016
- stransport: make internal functions static · 067bf5dc
  Patrick Steinhardt committed Aug 08, 2016