1. 23 Feb, 2022 2 commits
  2. 21 Sep, 2019 1 commit
    • regexp: implement new regular expression API · d77378eb
      We currently support several regular expression backends: PCRE, PCRE2,
      regcomp(3P) and regcomp_l(3). All of them are currently accessed via a
      thin POSIX wrapper that either uses the supplied functions directly or
      adds a very small shim around them.
      
      To support PCRE and PCRE2, we use their provided <pcreposix.h> and
      <pcre2posix.h> wrappers. These wrappers are implemented in such a way
      that the accompanying libraries pcre-posix and pcre2-posix provide the
      same symbols as the libc ones, namely regcomp(3P) et al. This works out
      on some systems just fine, most importantly on glibc-based ones, where
      the regular expression functions are implemented as weak aliases and
      thus get overridden by linking in the pcre{,2}-posix library. On other
      systems we depend on the linking order of libc and the PCRE library, and
      as libc always comes first we end up with the functions of the libc
      implementation. As a result, we may use the structures `regex_t` and
      `regmatch_t` declared by <pcre{,2}posix.h> together with functions
      defined by libc, leading to segfaults.
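
      The mismatch described above can be illustrated with a minimal sketch.
      Both struct layouts and all `demo_*` names below are hypothetical and
      only stand in for "the `regex_t` the caller was compiled against" and
      "the `regex_t` the linked function expects"; they are not copied from
      any real header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout the caller sees through the wrapper library's header. */
struct demo_caller_regex {
    void *compiled;
    size_t re_nsub;
    size_t re_erroffset;
};

/* Hypothetical layout assumed by the function that actually gets linked. */
struct demo_callee_regex {
    void *buffer;
    unsigned long allocated;
    unsigned long used;
    unsigned long syntax;
    char *fastmap;
    unsigned char *translate;
    size_t re_nsub;
    unsigned flags;
};

/* Returns 1 only if size and the offset of re_nsub match in both layouts. */
int demo_layouts_agree(void)
{
    return sizeof(struct demo_caller_regex) == sizeof(struct demo_callee_regex) &&
           offsetof(struct demo_caller_regex, re_nsub) ==
           offsetof(struct demo_callee_regex, re_nsub);
}

/* Bytes the callee could write past the end of the caller's allocation. */
size_t demo_overrun_bytes(void)
{
    if (sizeof(struct demo_callee_regex) <= sizeof(struct demo_caller_regex))
        return 0;
    return sizeof(struct demo_callee_regex) - sizeof(struct demo_caller_regex);
}
```

      With these assumed layouts, `demo_layouts_agree()` returns 0 and
      `demo_overrun_bytes()` is non-zero. In other words, a function built
      for the larger structure would write beyond the storage the caller
      reserved, which is the kind of memory corruption that shows up as a
      segfault.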
      
      The issue is not easily solvable. Some distributions like Debian have
      resolved this by patching PCRE and PCRE2 to carry custom prefixes to all
      the POSIX function wrappers. But this is not supported by upstream and
      thus inherently unportable between distributions. We could instead try
      to modify the linking order, but this quickly becomes fragile and will
      not work e.g. when libgit2 is loaded via dlopen(3P) or similar means. In the
      end, this means that we simply cannot use the POSIX wrappers provided by
      the PCRE libraries at all.
      
      Thus, this commit introduces a new regular expression API. The new API
      sits at a slightly higher level than the previous POSIX abstraction
      layer, as it abstracts away non-portable flags such as REG_EXTENDED,
      which does not have an equivalent in every supported backend. As there
      are no users of POSIX regular expressions that do _not_ request
      REG_EXTENDED, abstracting it away is fine. Because the API is
      higher-level than before, it should also be a bit easier to use than
      the previous one.
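
      As a rough illustration of what "abstracting away REG_EXTENDED" can
      look like, here is a minimal sketch on top of plain POSIX regcomp(3P).
      The `demo_regexp_*` names, the `DEMO_REGEXP_ICASE` flag and the return
      convention are assumptions made for this example; they are not the
      functions introduced by this commit.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical opaque handle; a real backend could hold PCRE state instead. */
typedef struct {
    regex_t re;
} demo_regexp;

/* Hypothetical backend-neutral flag; POSIX flags never reach the caller. */
enum { DEMO_REGEXP_ICASE = 1 << 0 };

/* Compile `pattern`. Returns 0 on success, -1 on a compile error. */
int demo_regexp_compile(demo_regexp *r, const char *pattern, int flags)
{
    int cflags = REG_EXTENDED; /* always on, so callers never pass it */

    assert(r != NULL && pattern != NULL);
    if (flags & DEMO_REGEXP_ICASE)
        cflags |= REG_ICASE;
    return regcomp(&r->re, pattern, cflags) == 0 ? 0 : -1;
}

/* Release resources held by a successfully compiled expression. */
void demo_regexp_dispose(demo_regexp *r)
{
    assert(r != NULL);
    regfree(&r->re);
}

/* Returns 0 if `string` matches, 1 if it does not, -1 on any other error. */
int demo_regexp_match(const demo_regexp *r, const char *string)
{
    int error;

    assert(r != NULL && string != NULL);
    error = regexec(&r->re, string, 0, NULL, 0);
    if (error == 0)
        return 0;
    return error == REG_NOMATCH ? 1 : -1;
}
```

      A caller would compile once, call `demo_regexp_match()` as often as
      needed and finish with `demo_regexp_dispose()`. Because only
      wrapper-defined flags cross the interface, a different backend could
      be swapped in behind the same three functions without touching
      callers.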
      
      Note: ideally, the new API would've been called `git_regex_foobar` with
      a file "regex.h" and "regex.c". Unfortunately, this is currently
      impossible to implement due to naming clashes between the then-existing
      "regex.h" and <regex.h> provided by the libc. As we add the source
      directory of libgit2 to the header search path, an include of <regex.h>
      would always find our own "regex.h". Thus, we have to swallow the bitter
      pill of adding one more character to all the functions to disambiguate
      the includes.
      
      To improve guarantees around cross-backend compatibility, this commit
      also brings along an improved regular expression test suite
      core::regexp.
      Patrick Steinhardt committed