Commit d3d43aab by Neil Booth Committed by Neil Booth

* doc/cppinternals.texi: Update.

From-SVN: r46009
parent 3054eeed
2001-10-04 Neil Booth <neil@daikokuya.demon.co.uk>
* doc/cppinternals.texi: Update.
2001-10-04 Eric Christopher <echristo@redhat.com> 2001-10-04 Eric Christopher <echristo@redhat.com>
* config/mips/mips.c (init_cumulative_args): Remember to set * config/mips/mips.c (init_cumulative_args): Remember to set
......
...@@ -66,7 +66,8 @@ into another language, under the above conditions for modified versions. ...@@ -66,7 +66,8 @@ into another language, under the above conditions for modified versions.
@contents @contents
@page @page
@node Top, Conventions,, (DIR) @node Top
@top
@chapter Cpplib---the core of the GNU C Preprocessor @chapter Cpplib---the core of the GNU C Preprocessor
The GNU C preprocessor in GCC 3.x has been completely rewritten. It is The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
...@@ -87,16 +88,18 @@ tricky issues encountered. It also describes certain behaviour we would ...@@ -87,16 +88,18 @@ tricky issues encountered. It also describes certain behaviour we would
like to preserve, such as the format and spacing of its output. like to preserve, such as the format and spacing of its output.
@menu @menu
* Conventions:: Conventions used in the code. * Conventions:: Conventions used in the code.
* Lexer:: The combined C, C++ and Objective-C Lexer. * Lexer:: The combined C, C++ and Objective-C Lexer.
* Whitespace:: Input and output newlines and whitespace. * Hash Nodes:: All identifiers are entered into a hash table.
* Hash Nodes:: All identifiers are hashed. * Macro Expansion:: Macro expansion algorithm.
* Macro Expansion:: Macro expansion algorithm. * Token Spacing:: Spacing and paste avoidance issues.
* Files:: File handling. * Line Numbering:: Tracking location within files.
* Index:: Index. * Guard Macros:: Optimizing header files with guard macros.
* Files:: File handling.
* Index:: Index.
@end menu @end menu
@node Conventions, Lexer, Top, Top @node Conventions
@unnumbered Conventions @unnumbered Conventions
@cindex interface @cindex interface
@cindex header files @cindex header files
...@@ -118,9 +121,11 @@ change internals in the future without worrying whether library clients ...@@ -118,9 +121,11 @@ change internals in the future without worrying whether library clients
are perhaps relying on some kind of undocumented implementation-specific are perhaps relying on some kind of undocumented implementation-specific
behaviour. behaviour.
@node Lexer, Whitespace, Conventions, Top @node Lexer
@unnumbered The Lexer @unnumbered The Lexer
@cindex lexer @cindex lexer
@cindex newlines
@cindex escaped newlines
@section Overview @section Overview
The lexer is contained in the file @file{cpplex.c}. It is a hand-coded The lexer is contained in the file @file{cpplex.c}. It is a hand-coded
...@@ -143,7 +148,7 @@ output. ...@@ -143,7 +148,7 @@ output.
@section Lexing a token @section Lexing a token
Lexing of an individual token is handled by @code{_cpp_lex_direct} and Lexing of an individual token is handled by @code{_cpp_lex_direct} and
its subroutines. In its current form the code is quite complicated, its subroutines. In its current form the code is quite complicated,
with read ahead characters and suchlike, since it strives to not step with read ahead characters and such-like, since it strives to not step
back in the character stream in preparation for handling non-ASCII file back in the character stream in preparation for handling non-ASCII file
encodings. The current plan is to convert any such files to UTF-8 encodings. The current plan is to convert any such files to UTF-8
before processing them. This complexity is therefore unnecessary and before processing them. This complexity is therefore unnecessary and
...@@ -175,7 +180,7 @@ using the line map code. ...@@ -175,7 +180,7 @@ using the line map code.
The first token on a logical, i.e.@: unescaped, line has the flag The first token on a logical, i.e.@: unescaped, line has the flag
@code{BOL} set for beginning-of-line. This flag is intended for @code{BOL} set for beginning-of-line. This flag is intended for
internal use, both to distinguish a @samp{#} that begins a directive internal use, both to distinguish a @samp{#} that begins a directive
from one that doesn't, and to generate a callback to clients that want from one that doesn't, and to generate a call-back to clients that want
to be notified about the start of every non-directive line with tokens to be notified about the start of every non-directive line with tokens
on it. Clients cannot reliably determine this for themselves: the first on it. Clients cannot reliably determine this for themselves: the first
token might be a macro, and the tokens of a macro expansion do not have token might be a macro, and the tokens of a macro expansion do not have
...@@ -219,9 +224,28 @@ foo ...@@ -219,9 +224,28 @@ foo
@end smallexample @end smallexample
This is a good example of the subtlety of getting token spacing correct This is a good example of the subtlety of getting token spacing correct
in the preprocessor; there are plenty of tests in the testsuite for in the preprocessor; there are plenty of tests in the test-suite for
corner cases like this. corner cases like this.
The lexer is written to treat each of @samp{\r}, @samp{\n}, @samp{\r\n}
and @samp{\n\r} as a single newline indicator. This allows it to
transparently preprocess MS-DOS, Macintosh and Unix files without their
needing to pass through a special filter beforehand.
We also decided to treat a backslash, either @samp{\} or the trigraph
@samp{??/}, separated from one of the above newline indicators by
non-comment whitespace only, as intending to escape the newline. It
tends to be a typing mistake, and cannot reasonably be mistaken for
anything else in any of the C-family grammars. Since handling it this
way is not strictly conforming to the ISO standard, the library issues a
warning wherever it encounters it.
Handling newlines like this is made simpler by doing it in one place
only. The function @code{handle_newline} takes care of all newline
characters, and @code{skip_escaped_newlines} takes care of arbitrarily
long sequences of escaped newlines, deferring to @code{handle_newline}
to handle the newlines themselves.
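The newline rule above can be sketched in a few lines of C. This is only an illustration under stated assumptions: @code{skip_newline} and @code{count_lines} are hypothetical names, not cpplib's @code{handle_newline}, and escaped newlines are not handled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: P points at '\r' or '\n'.  Return a pointer just
   past the newline indicator, treating "\r\n" and "\n\r" as one newline.  */
static const char *skip_newline(const char *p)
{
    assert(*p == '\n' || *p == '\r');
    char first = *p++;
    /* Swallow the partner character only if it differs from the first,
       so that "\n\n" still counts as two newlines.  */
    if ((*p == '\n' || *p == '\r') && *p != first)
        p++;
    return p;
}

/* Count newline indicators in TEXT, whatever convention it uses.  */
static unsigned count_lines(const char *text)
{
    unsigned n = 0;
    while (*text != '\0') {
        if (*text == '\n' || *text == '\r') {
            text = skip_newline(text);
            n++;
        } else {
            text++;
        }
    }
    return n;
}
```

Doing the pairing in a single helper mirrors the point made above: every caller sees one newline per indicator and never needs to know which convention the file used.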
The most painful aspect of lexing ISO-standard C and C++ is handling The most painful aspect of lexing ISO-standard C and C++ is handling
trigraphs and backslash-escaped newlines. Trigraphs are processed before trigraphs and backslash-escaped newlines. Trigraphs are processed before
any interpretation of the meaning of a character is made, and unfortunately any interpretation of the meaning of a character is made, and unfortunately
...@@ -255,6 +279,7 @@ should be done even within C-style comments; they can appear in the ...@@ -255,6 +279,7 @@ should be done even within C-style comments; they can appear in the
middle of a line, and we want to report diagnostics in the correct middle of a line, and we want to report diagnostics in the correct
position for text appearing after the end of the comment. position for text appearing after the end of the comment.
@anchor{Invalid identifiers}
Some identifiers, such as @code{__VA_ARGS__} and poisoned identifiers, Some identifiers, such as @code{__VA_ARGS__} and poisoned identifiers,
may be invalid and require a diagnostic. However, if they appear in a may be invalid and require a diagnostic. However, if they appear in a
macro expansion we don't want to complain with each use of the macro. macro expansion we don't want to complain with each use of the macro.
...@@ -282,94 +307,100 @@ two separate @samp{:} tokens and almost certainly a syntax error. Such ...@@ -282,94 +307,100 @@ two separate @samp{:} tokens and almost certainly a syntax error. Such
cases are handled by @code{_cpp_lex_direct} based upon command-line cases are handled by @code{_cpp_lex_direct} based upon command-line
flags stored in the @code{cpp_options} structure. flags stored in the @code{cpp_options} structure.
Once a token has been lexed, it leads an independent existence. The
spelling of numbers, identifiers and strings is copied to permanent
storage from the original input buffer, so a token remains valid and
correct even if its source buffer is freed with @code{_cpp_pop_buffer}.
The storage holding the spellings of such tokens remains until the
client program calls @code{cpp_destroy}, probably at the end of the translation
unit.
@anchor{Lexing a line} @anchor{Lexing a line}
@section Lexing a line @section Lexing a line
@cindex token run
@node Whitespace, Hash Nodes, Lexer, Top
@unnumbered Whitespace When the preprocessor was changed to return pointers to tokens, one
@cindex whitespace feature I wanted was some sort of guarantee regarding how long a
@cindex newlines returned pointer remains valid. This is important to the stand-alone
@cindex escaped newlines preprocessor, the future direction of the C family front ends, and even
@cindex paste avoidance to cpplib itself internally.
@cindex line numbers
Occasionally the preprocessor wants to be able to peek ahead in the
The lexer has been written to treat each of @samp{\r}, @samp{\n}, token stream. For example, after the name of a function-like macro, it
@samp{\r\n} and @samp{\n\r} as a single new line indicator. This allows wants to check the next token to see if it is an opening parenthesis.
it to transparently preprocess MS-DOS, Macintosh and Unix files without Another example is that, after reading the first few tokens of a
their needing to pass through a special filter beforehand. @code{#pragma} directive and not recognising it as a registered pragma,
it wants to backtrack and allow the user-defined handler for unknown
We also decided to treat a backslash, either @samp{\} or the trigraph pragmas to access the full @code{#pragma} token stream. The stand-alone
@samp{??/}, separated from one of the above newline indicators by preprocessor wants to be able to test the current token with the
non-comment whitespace only, as intending to escape the newline. It previous one to see if a space needs to be inserted to preserve their
tends to be a typing mistake, and cannot reasonably be mistaken for separate tokenization upon re-lexing (paste avoidance), so it needs to
anything else in any of the C-family grammars. Since handling it this be sure the pointer to the previous token is still valid. The
way is not strictly conforming to the ISO standard, the library issues a recursive-descent C++ parser wants to be able to perform tentative
warning wherever it encounters it. parsing arbitrarily far ahead in the token stream, and then to be able
to jump back to a prior position in that stream if necessary.
Handling newlines like this is made simpler by doing it in one place
only. The function @samp{handle_newline} takes care of all newline The rule I chose, which is fairly natural, is to arrange that the
characters, and @samp{skip_escaped_newlines} takes care of arbitrarily preprocessor lex all tokens on a line consecutively into a token buffer,
long sequences of escaped newlines, deferring to @samp{handle_newline} which I call a @dfn{token run}, and when meeting an unescaped new line
to handle the newlines themselves. (newlines within comments do not count either), to start lexing back at
the beginning of the run. Note that we do @emph{not} lex a line of
Another whitespace issue only concerns the stand-alone preprocessor: we tokens at once; if we did that @code{parse_identifier} would not have
want to guarantee that re-reading the preprocessed output results in an state flags available to warn about invalid identifiers (@pxref{Invalid
identical token stream. Without taking special measures, this might not identifiers}).
be the case because of macro substitution. We could simply insert a
space between adjacent tokens, but ideally we would like to keep this to In other words, accessing tokens that appeared earlier in the current
a minimum, both for aesthetic reasons and because it causes problems for line is valid, but since each logical line overwrites the tokens of the
people who still try to abuse the preprocessor for things like Fortran previous line, tokens from prior lines are unavailable. In particular,
source and Makefiles. since a directive only occupies a single logical line, this means that
the directive handlers like the @code{#pragma} handler can jump around
The token structure contains a flags byte, and two flags are of interest in the directive's tokens if necessary.
here: @samp{PREV_WHITE} and @samp{AVOID_LPASTE}. @samp{PREV_WHITE}
indicates that the token was preceded by whitespace; if this is the case Two issues remain: what about tokens that arise from macro expansions,
we need not worry about it incorrectly pasting with its predecessor. and what happens when we have a long line that overflows the token run?
The @samp{AVOID_LPASTE} flag is set by the macro expansion routines, and
indicates that paste avoidance by insertion of a space to the left of Since we promise clients that we preserve the validity of pointers that
the token may be necessary. Recursively, the first token of a macro we have already returned for tokens that appeared earlier in the line,
substitution, the first token after a macro substitution, the first we cannot reallocate the run. Instead, on overflow it is expanded by
token of a substituted argument, and the first token after a substituted chaining a new token run on to the end of the existing one.
argument are all flagged @samp{AVOID_LPASTE} by the macro expander.
The tokens forming a macro's replacement list are collected by the
If a token flagged in this way does not have a @samp{PREV_WHITE} flag, @code{#define} handler, and placed in storage that is only freed by
and the routine @code{cpp_avoid_paste} determines that it might be @code{cpp_destroy}. So if a macro is expanded in our line of tokens,
misinterpreted by the lexer if a space is not inserted between it and the pointers to the tokens of its expansion that we return will always
the immediately preceding token, then stand-alone CPP's output routines remain valid. However, macros are a little trickier than that, since
will insert a space between them. To avoid excessive spacing, they give rise to three sources of fresh tokens. They are the built-in
@code{cpp_avoid_paste} tries hard to only request a space if one is macros like @code{__LINE__}, and the @samp{#} and @samp{##} operators
likely to be necessary, but for reasons of efficiency it is slightly for stringification and token pasting. I handled this by allocating
conservative and might recommend a space where one is not strictly space for these tokens from the lexer's token run chain. This means
needed. they automatically receive the same lifetime guarantees as lexed tokens,
and we don't need to concern ourselves with freeing them.
Finally, the preprocessor takes great care to ensure it keeps track of
both the position of a token in the source file, for diagnostic Lexing into a line of tokens solves some of the token memory management
purposes, and where it should appear in the output file, because using issues, but not all. The opening parenthesis after a function-like
CPP for other languages like assembler requires this. The two positions macro name might lie on a different line, and the front ends definitely
may differ for the following reasons: want the ability to look ahead past the end of the current line. So
cpplib only moves back to the start of the token run at the end of a
@itemize @bullet line if the variable @code{keep_tokens} is zero. Line-buffering is
@item quite natural for the preprocessor, and as a result the only time cpplib
Escaped newlines are deleted, so lines spliced in this way are joined to needs to increment this variable is whilst looking for the opening
form a single logical line. parenthesis to, and reading the arguments of, a function-like macro. In
the near future cpplib will export an interface to increment and
@item decrement this variable, so that clients can share full control over the
A macro expansion replaces the tokens that form its invocation, but any lifetime of token pointers too.
newlines appearing in the macro's arguments are interpreted as a single
space, with the result that the macro's replacement appears in full on The routine @code{_cpp_lex_token} handles moving to new token runs,
the same line that the macro name appeared in the source file. This is calling @code{_cpp_lex_direct} to lex new tokens, or returning
particularly important for stringification of arguments---newlines previously-lexed tokens if we stepped back in the token stream. It also
embedded in the arguments must appear in the string as spaces. checks each token for the @code{BOL} flag, which might indicate a
@end itemize directive that needs to be handled, or require a start-of-line call-back
to be made. @code{_cpp_lex_token} also handles skipping over tokens in
The source file location is maintained in the @code{lineno} member of the failed conditional blocks, and invalidates the control macro of the
@code{cpp_buffer} structure, and the column number inferred from the multiple-include optimization if a token was successfully lexed outside
current position in the buffer relative to the @code{line_base} buffer a directive. In other words, its callers do not need to concern
variable, which is updated with every newline whether escaped or not. themselves with such issues.
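The token run scheme described in the new text can be sketched as follows. This is a minimal model, not cpplib's code: the structure layout, @code{RUN_SIZE}, @code{next_slot} and @code{end_of_line} are all hypothetical; only the @code{keep_tokens} idea and the "chain, never reallocate" rule come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RUN_SIZE 4  /* deliberately tiny so the example overflows */

struct token { int value; };

/* One fixed-size block of tokens; blocks are chained, never reallocated,
   so pointers already handed out stay valid.  */
struct tokenrun {
    struct tokenrun *next;
    struct token slots[RUN_SIZE];
};

struct lexer {
    struct tokenrun *first;   /* head of the chain */
    struct tokenrun *run;     /* run currently being filled */
    size_t used;              /* slots used in the current run */
    unsigned keep_tokens;     /* non-zero: do not rewind at end of line */
};

/* Returns 0 on success, -1 if allocation fails.  */
static int lexer_init(struct lexer *lx)
{
    lx->first = lx->run = calloc(1, sizeof *lx->first);
    lx->used = 0;
    lx->keep_tokens = 0;
    return lx->first != NULL ? 0 : -1;
}

/* Hand out the next token slot, chaining a new run on overflow.  */
static struct token *next_slot(struct lexer *lx)
{
    assert(lx->run != NULL);
    if (lx->used == RUN_SIZE) {
        if (lx->run->next == NULL) {
            lx->run->next = calloc(1, sizeof *lx->run->next);
            if (lx->run->next == NULL)
                return NULL;
        }
        lx->run = lx->run->next;  /* may reuse a run from an earlier line */
        lx->used = 0;
    }
    return &lx->run->slots[lx->used++];
}

/* At an unescaped newline, rewind to the start of the chain unless
   someone has asked for older tokens to be kept.  */
static void end_of_line(struct lexer *lx)
{
    if (lx->keep_tokens == 0) {
        lx->run = lx->first;
        lx->used = 0;
    }
}

static void lexer_free(struct lexer *lx)
{
    struct tokenrun *r = lx->first;
    while (r != NULL) {
        struct tokenrun *next = r->next;
        free(r);
        r = next;
    }
    lx->first = lx->run = NULL;
}
```

A caller that lexes six tokens on one line ends up in the second run, yet the pointer to the first token is still valid; after @code{end_of_line} with @code{keep_tokens} at zero, the next slot handed out is that first slot again.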
TODO: Finish this. @node Hash Nodes
@node Hash Nodes, Macro Expansion, Whitespace, Top
@unnumbered Hash Nodes @unnumbered Hash Nodes
@cindex hash table @cindex hash table
@cindex identifiers @cindex identifiers
...@@ -377,12 +408,12 @@ TODO: Finish this. ...@@ -377,12 +408,12 @@ TODO: Finish this.
@cindex assertions @cindex assertions
@cindex named operators @cindex named operators
When cpplib encounters an ``identifier'', it generates a hash code for it When cpplib encounters an ``identifier'', it generates a hash code for
and stores it in the hash table. By ``identifier'' we mean tokens with it and stores it in the hash table. By ``identifier'' we mean tokens
type @samp{CPP_NAME}; this includes identifiers in the usual C sense, as with type @code{CPP_NAME}; this includes identifiers in the usual C
well as keywords, directive names, macro names and so on. For example, sense, as well as keywords, directive names, macro names and so on. For
all of @samp{pragma}, @samp{int}, @samp{foo} and @samp{__GNUC__} are identifiers and hashed example, all of @code{pragma}, @code{int}, @code{foo} and
when lexed. @code{__GNUC__} are identifiers and hashed when lexed.
Each node in the hash table contains various information about the Each node in the hash table contains various information about the
identifier it represents. For example, its length and type. At any one identifier it represents. For example, its length and type. At any one
...@@ -392,12 +423,12 @@ time, each identifier falls into exactly one of three categories: ...@@ -392,12 +423,12 @@ time, each identifier falls into exactly one of three categories:
@item Macros @item Macros
These have been declared to be macros, either on the command line or These have been declared to be macros, either on the command line or
with @code{#define}. A few, such as @samp{__TIME__} are builtins with @code{#define}. A few, such as @code{__TIME__} are built-ins
entered in the hash table during initialisation. The hash node for a entered in the hash table during initialisation. The hash node for a
normal macro points to a structure with more information about the normal macro points to a structure with more information about the
macro, such as whether it is function-like, how many arguments it takes, macro, such as whether it is function-like, how many arguments it takes,
and its expansion. Builtin macros are flagged as special, and instead and its expansion. Built-in macros are flagged as special, and instead
contain an enum indicating which of the various builtin macros it is. contain an enum indicating which of the various built-in macros it is.
@item Assertions @item Assertions
...@@ -413,7 +444,7 @@ currently a macro, or a macro that has since been undefined with ...@@ -413,7 +444,7 @@ currently a macro, or a macro that has since been undefined with
@code{#undef}. @code{#undef}.
When preprocessing C++, this category also includes the named operators, When preprocessing C++, this category also includes the named operators,
such as @samp{xor}. In expressions these behave like the operators they such as @code{xor}. In expressions these behave like the operators they
represent, but in contexts where the spelling of a token matters they represent, but in contexts where the spelling of a token matters they
are spelt differently. This spelling distinction is relevant when they are spelt differently. This spelling distinction is relevant when they
are operands of the stringizing and pasting macro operators @code{#} and are operands of the stringizing and pasting macro operators @code{#} and
...@@ -429,13 +460,173 @@ hash node with the index of that argument. This makes duplicated ...@@ -429,13 +460,173 @@ hash node with the index of that argument. This makes duplicated
argument checking an O(1) operation for each argument. Similarly, for argument checking an O(1) operation for each argument. Similarly, for
each identifier in the macro's expansion, lookup to see if it is an each identifier in the macro's expansion, lookup to see if it is an
argument, and which argument it is, is also an O(1) operation. Further, argument, and which argument it is, is also an O(1) operation. Further,
each directive name, such as @samp{endif}, has an associated directive each directive name, such as @code{endif}, has an associated directive
enum stored in its hash node, so that directive lookup is also O(1). enum stored in its hash node, so that directive lookup is also O(1).
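The O(1) duplicate-argument check relies on every spelling of an identifier sharing one hash node. A rough sketch, assuming a hypothetical node layout (the field @code{arg_index} and both function names are invented for illustration and are not cpplib's):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hash node: every occurrence of the same identifier is
   assumed to resolve to the same node.  */
struct hashnode {
    const char *name;
    unsigned arg_index;   /* 0 means "not a parameter of the current macro" */
};

/* Record NODE as parameter number INDEX (1-based).  Returns false if the
   identifier is already a parameter, i.e. a duplicate.  No search over
   the parameter list is needed.  */
static bool save_parameter(struct hashnode *node, unsigned index)
{
    assert(index > 0);
    if (node->arg_index != 0)
        return false;
    node->arg_index = index;
    return true;
}

/* While collecting the expansion: which parameter is this identifier,
   if any?  Again a single field read.  */
static unsigned parameter_index(const struct hashnode *node)
{
    return node->arg_index;
}
```

In a real implementation the index would have to be cleared again once the @code{#define} has been processed, so that the next macro starts with clean nodes; that step is omitted here.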
@node Macro Expansion, Files, Hash Nodes, Top @node Macro Expansion
@unnumbered Macro Expansion Algorithm @unnumbered Macro Expansion Algorithm
@node Files, Index, Macro Expansion, Top @c TODO
@node Token Spacing
@unnumbered Token Spacing
@cindex paste avoidance
@cindex spacing
@cindex token spacing
First, let's look at an issue that only concerns the stand-alone
preprocessor: we want to guarantee that re-reading its preprocessed
output results in an identical token stream. Without taking special
measures, this might not be the case because of macro substitution. For
example:
@smallexample
#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
@expansion{} + + - - + + = = =
@emph{not}
@expansion{} ++ -- ++ ===
@end smallexample
One solution would be to simply insert a space between all adjacent
tokens. However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.
For now, just notice that the only places we need to be careful about
@dfn{paste avoidance} are where tokens are added to (or removed from) the
original token stream. This only occurs because of macro expansion, but
care is needed in many places: before @strong{and} after each macro
replacement, each argument replacement, and additionally each token
created by the @samp{#} and @samp{##} operators.
Let's look at how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one
of those flags is @code{PREV_WHITE}. This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
than a newline. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
Now consider the following:
@smallexample
#define add(x, y, z) x + y +z;
sum = add (1,2, 3);
@expansion{} sum = 1 + 2 +3;
@end smallexample
The interesting thing here is that the tokens @samp{1} and @samp{2} are
output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding
whitespace from the space preceding @samp{add} in the macro
@emph{invocation}, @samp{2} gets its whitespace from the space preceding
the parameter @samp{y} in the macro @emph{replacement list}, and
@samp{3} has no preceding space because parameter @samp{z} has none in
the replacement list.
Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
@dfn{padding token}, into the token stream in front of every macro
expansion and expanded macro argument, to indicate that the subsequent
token should assume its @code{PREV_WHITE} flag from a different
@dfn{source token}. In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively.
It is quite easy to get multiple padding tokens in a row, for example if
a macro's first replacement token expands straight into another macro.
@smallexample
#define foo bar
#define bar baz
[foo]
@expansion{} [baz]
@end smallexample
Here, two padding tokens are generated: one whose source is the
@samp{foo} between the brackets, and one whose source is the @samp{bar}
in @samp{foo}'s replacement list. Clearly the first padding token is
the one that matters. But what if we happen to leave a macro
expansion? Adjusting the above example slightly:
@smallexample
#define foo bar
#define bar EMPTY baz
#define EMPTY
[foo] EMPTY;
@expansion{} [ baz] ;
@end smallexample
As shown, now there should be a space before @samp{baz} and the
semicolon. Our initial algorithm fails for the former, because we would
see three padding tokens, one per macro invocation, followed by
@samp{baz}, which would inherit its spacing from the original source,
@samp{foo}, which has no leading space. Note that it is vital that
cpplib get spacing correct in these examples, since any of these macro
expansions could be stringified, where spacing matters.
So entering macro and argument expansions is not the only case that
requires special handling; leaving them does too. cpplib therefore
inserts a padding token with a @code{NULL} source token when leaving
macro expansions and after each replaced argument in a macro's
replacement list. It also inserts appropriate padding tokens on either
side of tokens created by the @samp{#} and @samp{##} operators.
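To make the padding rules concrete, here is a sketch of one way an output routine could resolve a run of padding tokens in front of a real token. The rule is an assumption chosen to agree with the two examples above, not a description of cpplib's code; @code{struct tok} and @code{space_before} are hypothetical, the first token of a replacement list is assumed to carry no @code{PREV_WHITE}, and paste avoidance is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified token.  For a padding token only SOURCE is
   meaningful; for a real token only PREV_WHITE is.  */
struct tok {
    bool prev_white;
    const struct tok *source;
};

/* PADS[0..NPADS-1] are the padding tokens immediately preceding REAL.
   Return true if a space should be printed before REAL.  */
static bool space_before(const struct tok *pads, size_t npads,
                         const struct tok *real)
{
    const struct tok *source = NULL;
    assert(real != NULL);
    for (size_t k = 0; k < npads; k++) {
        /* The first source wins, unless it carries no whitespace and we
           then leave an expansion (NULL source): in that case forget it.  */
        if (source == NULL
            || (!source->prev_white && pads[k].source == NULL))
            source = pads[k].source;
    }
    /* No usable source: the real token's own flag decides.  */
    if (source == NULL)
        source = real;
    return source->prev_white;
}
```

Applied to @samp{[foo] EMPTY;}, the run before @samp{baz} is (@samp{foo}, @samp{bar}, @samp{EMPTY}, @code{NULL}): the @code{NULL} padding discards @samp{foo}, so @samp{baz} falls back on its own @code{PREV_WHITE} and gets a space. The run before the semicolon starts with the second @samp{EMPTY}, which was preceded by whitespace, so the semicolon gets a space too.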
Now we can see the relationship with paste avoidance: we have to be
careful about paste avoidance in exactly the same locations we take care
to get white space correct. This makes implementation of paste
avoidance easy: wherever the stand-alone preprocessor is fixing up
spacing because of padding tokens, and the whitespace rules alone call
for no space, it takes the extra step of checking whether a space is
needed after all to avoid an accidental paste. The function
@code{cpp_avoid_paste} advises whether a space is required between two
consecutive tokens. To avoid excessive spacing, it tries hard to only
require a space if one is likely to be necessary, but for reasons of
efficiency it is slightly conservative and might recommend a space where
one is not strictly needed.
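The flavour of such a check can be shown with a small sketch that works on token spellings. It is illustrative only: @code{might_paste} is a hypothetical function, not @code{cpp_avoid_paste}, and it covers just a handful of cases (identifiers and numbers, doubled punctuators, @samp{=}-suffixed operators, @samp{->}, comment openers and @samp{.} before a digit).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_word_char(char c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Would LEFT immediately followed by RIGHT risk re-lexing as different
   tokens?  Only the last character of LEFT and the first character of
   RIGHT are examined, which errs on the side of asking for a space.  */
static bool might_paste(const char *left, const char *right)
{
    assert(left[0] != '\0' && right[0] != '\0');
    char a = left[strlen(left) - 1];
    char b = right[0];

    if (is_word_char(a) && is_word_char(b))
        return true;                      /* x 1  ->  x1 */
    if (b == '=' && strchr("=!<>+-*/%&|^", a) != NULL)
        return true;                      /* = =  ->  == */
    if (a == b && strchr("+-&|<>:#./", a) != NULL)
        return true;                      /* + +  ->  ++ */
    if (a == '-' && b == '>')
        return true;                      /* - >  ->  -> */
    if (a == '/' && b == '*')
        return true;                      /* / *  opens a comment */
    if (a == '.' && isdigit((unsigned char) b))
        return true;                      /* . 5  ->  .5 */
    return false;
}
```

With the first example of this section, the check asks for a space between the two @samp{+} tokens, the two @samp{-} tokens and each pair of @samp{=} tokens, but not between, say, @samp{+} and @samp{-}.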
@node Line Numbering
@unnumbered Line numbering
@cindex line numbers
The preprocessor takes great care to ensure it keeps track of both the
position of a token in the source file, for diagnostic purposes, and
where it should appear in the output file, because using CPP for other
languages like assembler requires this. The two positions may differ
for the following reasons:
@itemize @bullet
@item
Escaped newlines are deleted, so lines spliced in this way are joined to
form a single logical line.
@item
A macro expansion replaces the tokens that form its invocation, but any
newlines appearing in the macro's arguments are interpreted as a single
space, with the result that the macro's replacement appears in full on
the same line that the macro name appeared in the source file. This is
particularly important for stringification of arguments---newlines
embedded in the arguments must appear in the string as spaces.
@end itemize
The source file location is maintained in the @code{lineno} member of the
@code{cpp_buffer} structure, and the column number inferred from the
current position in the buffer relative to the @code{line_base} buffer
variable, which is updated with every newline whether escaped or not.
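A minimal sketch of that bookkeeping follows. The member names @code{lineno} and @code{line_base} are taken from the text, but this structure is a stand-in rather than the real @code{cpp_buffer}; the @code{cur} member, the two functions and the 1-based column convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the relevant part of a buffer.  */
struct buffer {
    const char *cur;        /* current read position (assumed name) */
    const char *line_base;  /* start of the current physical line */
    unsigned lineno;        /* current line number */
};

/* Called after any newline, escaped or not, has been consumed.
   AFTER_NEWLINE points at the first character of the next line.  */
static void note_newline(struct buffer *b, const char *after_newline)
{
    assert(after_newline >= b->line_base);
    b->lineno++;
    b->line_base = after_newline;
}

/* The column is not stored; it is inferred from the read position.  */
static unsigned current_column(const struct buffer *b)
{
    return (unsigned) (b->cur - b->line_base) + 1;
}
```

Because only @code{line_base} is updated per newline, no per-character column counter has to be maintained while lexing.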
@c FINISH THIS
@node Guard Macros
@unnumbered The Multiple-Include Optimization
@c TODO
@node Files
@unnumbered File Handling @unnumbered File Handling
@cindex files @cindex files
...@@ -459,10 +650,10 @@ filesystem queries whilst searching for the correct file. ...@@ -459,10 +650,10 @@ filesystem queries whilst searching for the correct file.
For each file we try to open, we store the constructed path in a splay For each file we try to open, we store the constructed path in a splay
tree. This path first undergoes simplification by the function tree. This path first undergoes simplification by the function
@code{_cpp_simplify_pathname}. For example, @code{_cpp_simplify_pathname}. For example,
@samp{/usr/include/bits/../foo.h} is simplified to @file{/usr/include/bits/../foo.h} is simplified to
@samp{/usr/include/foo.h} before we enter it in the splay tree and try @file{/usr/include/foo.h} before we enter it in the splay tree and try
to @code{open ()} the file. CPP will then find subsequent uses of to @code{open ()} the file. CPP will then find subsequent uses of
@samp{foo.h}, even as @samp{/usr/include/foo.h}, in the splay tree and @file{foo.h}, even as @file{/usr/include/foo.h}, in the splay tree and
save system calls. save system calls.
Further, it is likely the file contents have also been cached, saving a Further, it is likely the file contents have also been cached, saving a
...@@ -486,7 +677,7 @@ directory on a per-file basis is handled by the function ...@@ -486,7 +677,7 @@ directory on a per-file basis is handled by the function
Note that a header included with a directory component, such as Note that a header included with a directory component, such as
@code{#include "mydir/foo.h"} and opened as @code{#include "mydir/foo.h"} and opened as
@samp{/usr/local/include/mydir/foo.h}, will have the complete path minus @file{/usr/local/include/mydir/foo.h}, will have the complete path minus
the basename @samp{foo.h} as the current directory. the basename @samp{foo.h} as the current directory.
Enough information is stored in the splay tree that CPP can immediately Enough information is stored in the splay tree that CPP can immediately
...@@ -503,7 +694,7 @@ command line (or system) include directories to which the mapping ...@@ -503,7 +694,7 @@ command line (or system) include directories to which the mapping
applies. This may be higher up the directory tree than the full path to applies. This may be higher up the directory tree than the full path to
the file minus the base name. the file minus the base name.
@node Index,, Files, Top @node Index
@unnumbered Index @unnumbered Index
@printindex cp @printindex cp
......