Commit 5b810d3c, authored and committed by Neil Booth

* doc/cppinternals.texi: Update.

From-SVN: r46050
parent d644be7b
2001-10-06 Neil Booth <neil@daikokuya.demon.co.uk>
* doc/cppinternals.texi: Update.
2001-10-06 Zack Weinberg <zack@codesourcery.com> 2001-10-06 Zack Weinberg <zack@codesourcery.com>
* gcc.c (main): Set this_file_error if the appropriate * gcc.c (main): Set this_file_error if the appropriate
......
@@ -41,7 +41,7 @@ into another language, under the above conditions for modified versions.
@titlepage @titlepage
@c @finalout @c @finalout
@title Cpplib Internals @title Cpplib Internals
@subtitle Last revised September 2001 @subtitle Last revised October 2001
@subtitle for GCC version 3.1 @subtitle for GCC version 3.1
@author Neil Booth @author Neil Booth
@page @page
@@ -71,7 +71,7 @@ into another language, under the above conditions for modified versions.
@chapter Cpplib---the core of the GNU C Preprocessor @chapter Cpplib---the core of the GNU C Preprocessor
The GNU C preprocessor in GCC 3.x has been completely rewritten. It is The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
now implemented as a library, cpplib, so it can be easily shared between now implemented as a library, @dfn{cpplib}, so it can be easily shared between
a stand-alone preprocessor, and a preprocessor integrated with the C, a stand-alone preprocessor, and a preprocessor integrated with the C,
C++ and Objective-C front ends. It is also available for use by other C++ and Objective-C front ends. It is also available for use by other
programs, though this is not recommended as its exposed interface has programs, though this is not recommended as its exposed interface has
@@ -498,12 +498,13 @@ both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and still try to abuse the preprocessor for things like Fortran source and
Makefiles. Makefiles.
For now, just notice that the only places we need to be careful about For now, just notice that when tokens are added (or removed, as shown by
@dfn{paste avoidance} are when tokens are added (or removed) from the the @code{EMPTY} example) from the original lexed token stream, we need
original token stream. This only occurs because of macro expansion, but to check for accidental token pasting. We call this @dfn{paste
care is needed in many places: before @strong{and} after each macro avoidance}. Token addition and removal can only occur because of macro
replacement, each argument replacement, and additionally each token expansion, but accidental pasting can occur in many places: both before
created by the @samp{#} and @samp{##} operators. and after each macro replacement, each argument replacement, and
additionally each token created by the @samp{#} and @samp{##} operators.
Let's look at how the preprocessor gets whitespace output correct Let's look at how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one normally. The @code{cpp_token} structure contains a flags byte, and one
@@ -512,7 +513,7 @@ indicates that the token was preceded by whitespace of some form other
than a new line. The stand-alone preprocessor can use this flag to than a new line. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output. decide whether to insert a space between tokens in the output.
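As a rough illustration of the flag-driven spacing just described, here is a minimal sketch in C. The `toy_token` type, the `toy_print_tokens` function, and the bit value chosen for `PREV_WHITE` are hypothetical stand-ins for illustration only; they are not cpplib's real definitions, and the real stand-alone preprocessor does considerably more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit value, for illustration only; not taken from cpplib. */
#define PREV_WHITE 0x01

/* Hypothetical, heavily simplified stand-in for cpp_token. */
struct toy_token {
    const char *spelling;   /* text of the token */
    unsigned char flags;    /* PREV_WHITE if whitespace preceded it */
};

/* Write the tokens into OUT, emitting one space before every token
   whose PREV_WHITE flag is set.  Stops early rather than overflowing
   OUT.  Returns the number of characters written, excluding the
   terminating NUL.  */
size_t toy_print_tokens(const struct toy_token *toks, size_t n,
                        char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        size_t slen = strlen(toks[i].spelling);
        size_t space = (toks[i].flags & PREV_WHITE) ? 1 : 0;

        if (len + space + slen >= cap)
            break;
        if (space)
            out[len++] = ' ';
        memcpy(out + len, toks[i].spelling, slen);
        len += slen;
    }
    out[len] = '\0';
    return len;
}
```

This only covers tokens that come straight from the lexer; it has no notion of macro expansion, which is where the difficulties discussed next arise.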
Now consider the following: Now consider the result of the following macro expansion:
@smallexample @smallexample
#define add(x, y, z) x + y +z; #define add(x, y, z) x + y +z;
@@ -524,20 +525,21 @@ The interesting thing here is that the tokens @samp{1} and @samp{2} are
output with a preceding space, and @samp{3} is output without a output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property. preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding Careful consideration reveals that @samp{1} gets its preceding
whitespace from the space preceding @samp{add} in the macro whitespace from the space preceding @samp{add} in the macro invocation,
@emph{invocation}, @samp{2} gets its whitespace from the space preceding @emph{not} replacement list. @samp{2} gets its whitespace from the
the parameter @samp{y} in the macro @emph{replacement list}, and space preceding the parameter @samp{y} in the macro replacement list,
@samp{3} has no preceding space because parameter @samp{z} has none in and @samp{3} has no preceding space because parameter @samp{z} has none
the replacement list. in the replacement list.
Once lexed, tokens are effectively fixed and cannot be altered, since Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a above, the preprocessor inserts a special token, which I call a
@dfn{padding token}, into the token stream in front of every macro @dfn{padding token}, into the token stream to indicate that spacing of
expansion and expanded macro argument, to indicate that the subsequent the subsequent token is special. The preprocessor inserts padding
token should assume its @code{PREV_WHITE} flag from a different tokens in front of every macro expansion and expanded macro argument.
@dfn{source token}. In the above example, the source tokens are These point to a @dfn{source token} from which the subsequent real token
should inherit its spacing. In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the @samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively. macro replacement list, respectively.
@@ -551,10 +553,14 @@ a macro's first replacement token expands straight into another macro.
@expansion{} [baz] @expansion{} [baz]
@end smallexample @end smallexample
Here, two padding tokens with sources @samp{foo} between the brackets, Here, two padding tokens are generated with sources the @samp{foo} token
and @samp{bar} from foo's replacement list, are generated. Clearly the between the brackets, and the @samp{bar} token from foo's replacement
first padding token is the one that matters. But what if we happen to list, respectively. Clearly the first padding token is the one we
leave a macro expansion? Adjusting the above example slightly: should use, so our output code should contain a rule that the first
padding token in a sequence is the one that matters.
But what if we happen to leave a macro expansion? Adjusting the above
example slightly:
@smallexample @smallexample
#define foo bar #define foo bar
@@ -564,33 +570,41 @@ leave a macro expansion? Adjusting the above example slightly:
@expansion{} [ baz] ; @expansion{} [ baz] ;
@end smallexample @end smallexample
As shown, now there should be a space before baz and the semicolon. Our As shown, now there should be a space before @samp{baz} and the
initial algorithm fails for the former, because we would see three semicolon in the output.
padding tokens, one per macro invocation, followed by @samp{baz}, which
would have inherit its spacing from the original source, @samp{foo}, The rules we decided above fail for @samp{baz}: we generate three
which has no leading space. Note that it is vital that cpplib get padding tokens, one per macro invocation, before the token @samp{baz}.
spacing correct in these examples, since any of these macro expansions We would then have it take its spacing from the first of these, which
could be stringified, where spacing matters. carries source token @samp{foo} with no leading space.
So, I have demonstrated that not just entering macro and argument It is vital that cpplib get spacing correct in these examples since any
expansions, but leaving them requires special handling too. So cpplib of these macro expansions could be stringified, where spacing matters.
inserts a padding token with a @code{NULL} source token when leaving
macro expansions and after each replaced argument in a macro's So, this demonstrates that not just entering macro and argument
replacement list. It also inserts appropriate padding tokens on either expansions, but leaving them requires special handling too. I made
side of tokens created by the @samp{#} and @samp{##} operators. cpplib insert a padding token with a @code{NULL} source token when
leaving macro expansions, as well as after each replaced argument in a
Now we can see the relationship with paste avoidance: we have to be macro's replacement list. It also inserts appropriate padding tokens on
careful about paste avoidance in exactly the same locations we take care either side of tokens created by the @samp{#} and @samp{##} operators.
to get white space correct. This makes implementation of paste I expanded the rule so that, if we see a padding token with a
avoidance easy: wherever the stand-alone preprocessor is fixing up @code{NULL} source token, @emph{and} that source token has no leading
spacing because of padding tokens, and it turns out that no space is space, then we behave as if we have seen no padding tokens at all. A
needed, it has to take the extra step to check that a space is not quick check shows this rule will then get the above example correct as
needed after all to avoid an accidental paste. The function well.
@code{cpp_avoid_paste} advises whether a space is required between two
consecutive tokens. To avoid excessive spacing, it tries hard to only Now a relationship with paste avoidance is apparent: we have to be
require a space if one is likely to be necessary, but for reasons of careful about paste avoidance in exactly the same locations we have
efficiency it is slightly conservative and might recommend a space where padding tokens in order to get white space correct. This makes
one is not strictly needed. implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function @code{cpp_avoid_paste} advises whether a space is required
between two consecutive tokens. To avoid excessive spacing, it tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.
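The padding rules and the paste check described above can be sketched as a small output loop. This is only an illustration under stated assumptions: `toy_tok`, `toy_emit`, and `toy_would_paste` are hypothetical names, and `toy_would_paste` is a deliberately crude and incomplete stand-in for `cpp_avoid_paste`, not a description of what that function actually checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_REAL, TOY_PADDING };

/* Hypothetical simplified token; not cpplib's cpp_token. */
struct toy_tok {
    enum toy_kind kind;
    const char *spelling;          /* TOY_REAL only */
    int prev_white;                /* lexed with preceding whitespace? */
    const struct toy_tok *source;  /* TOY_PADDING only; may be NULL */
};

static int toy_is_ident_char(char c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Crude guess at whether A directly followed by B would lex as a
   different token sequence.  Only two cases are handled.  */
static int toy_would_paste(const struct toy_tok *a, const struct toy_tok *b)
{
    char last = a->spelling[strlen(a->spelling) - 1];
    char first = b->spelling[0];

    if (toy_is_ident_char(last) && toy_is_ident_char(first))
        return 1;
    return last == first && strchr("+-&|<>=", last) != NULL;
}

/* Emit tokens into OUT, applying the rules from the text: the first
   padding token in a run supplies the source, except that a padding
   token with a NULL source discards a source that has no leading
   space.  Returns the length written.  */
size_t toy_emit(const struct toy_tok *toks, size_t n, char *out, size_t cap)
{
    const struct toy_tok *source = NULL, *prev = NULL;
    int padded = 0;
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        const struct toy_tok *tok = &toks[i];

        if (tok->kind == TOY_PADDING) {
            padded = 1;
            if (source == NULL
                || (!source->prev_white && tok->source == NULL))
                source = tok->source;
            continue;
        }
        assert(tok->spelling != NULL && tok->spelling[0] != '\0');

        /* Without a padding source, the token uses its own flag.  */
        const struct toy_tok *from = source ? source : tok;
        int space = from->prev_white;

        /* Padding said "no space": check for an accidental paste.  */
        if (!space && padded && prev != NULL)
            space = toy_would_paste(prev, tok);

        size_t slen = strlen(tok->spelling);
        if (len + (size_t) space + slen >= cap)
            break;
        if (space)
            out[len++] = ' ';
        memcpy(out + len, tok->spelling, slen);
        len += slen;

        prev = tok;
        source = NULL;
        padded = 0;
    }
    out[len] = '\0';
    return len;
}
```

Keeping a single `source` pointer, with `NULL` meaning "no padding seen so far", lets one condition express both the first-padding-wins rule and the `NULL`-source reset. The paste check runs only when padding was seen and no space was otherwise called for, which matches the point in the text that paste avoidance matters in the same places as padding.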
@node Line Numbering @node Line Numbering
@unnumbered Line numbering @unnumbered Line numbering
......