Commit d6d5f795 authored Apr 26, 2000 by Neil Booth
* cpplex.c: add comment describing lexer algorithm.
From-SVN: r33443
parent
6f0ae5b4
Showing 1 changed file with 90 additions and 0 deletions
gcc/cpplex.c
...
...
@@ -2050,6 +2050,96 @@ _cpp_init_input_buffer (pfile)
#if 0
/* Lexing algorithm.
The original lexer in cpplib was made up of two passes: a first pass
that replaced trigraphs and deleted escaped newlines, and a second
pass that tokenized the result of the first pass. Tokenisation was
performed by peeking at the next character in the input stream. For
example, if the input stream contained "~=", the handler for the ~
character would peek at the next character, and if it were a '='
would skip over it, and return a "~=" token, otherwise it would
return just the "~" token.
To implement a single-pass lexer, this peeking ahead is unworkable.
An arbitrary number of escaped newlines, and trigraphs (in particular
??/ which translates to the escape \), could separate the '~' and '='
in the input stream, yet the next token is still a "~=".
Suppose instead that we lex by one logical line at a time, producing
a token list or stack for each logical line, and when seeing the '~'
push a CPP_COMPLEMENT token on the list. Then if the '~' is part of
a longer token ("~=") we know we must see the remainder of the token
by the time we reach the end of the logical line. Thus we can have
the '=' handler look at the previous token (at the end of the list /
top of the stack) and see if it is a "~" token, and if so, instead of
pushing a "=" token revise the existing token to be a "~=" token.
This works in the presence of escaped newlines, because the '\' would
have been pushed on the top of the stack as a CPP_BACKSLASH. The
newline ('\n' or '\r') handler looks at the token at the top of the
stack to see if it is a CPP_BACKSLASH, and if so discards both.
Otherwise it pushes the newline (CPP_VSPACE) token as normal. Hence
the '=' handler would never see any intervening escaped newlines.
To make trigraphs work in this context, since trigraphs have the
highest precedence and are converted before anything else, the '?' handler does
lookahead to see if it is a trigraph, and if so skips the trigraph
and pushes the token it represents onto the top of the stack. This
also works in the particular case of a CPP_BACKSLASH trigraph.
To the preprocessor, whitespace is only significant to the point of
knowing whether whitespace precedes a particular token. For example,
the '=' handler needs to know whether there was whitespace between it
and a "~" token on the top of the stack, to make the token conversion
decision correctly. So each token has a PREV_WHITESPACE flag to
indicate this - the standard permits consecutive whitespace to be
regarded as a single space. The compiler front ends are not
interested in whitespace at all; they just require a token stream.
Another place where whitespace is significant to the preprocessor is
a #define statement - if there is whitespace between the macro name
and an initial "(" token the macro is "object-like", otherwise it is
a function-like macro that takes arguments.
However, all is not rosy. Parsing of identifiers, numbers, comments
and strings becomes trickier because of the possibility of raw
trigraphs and escaped newlines in the input stream.
The trigraphs are three consecutive characters beginning with two
question marks. A question mark is not valid as part of a number
or identifier, so parsing of a number or identifier terminates
normally upon reaching it, returning to the main loop which handles
the trigraph just like it would in any other position. Similarly for
the backslash of a backslash-newline combination. So we just need
the escaped-newline dropper in the main loop to check if the token on
the top of the stack is a number or identifier, and to continue the
processing of the token as if nothing had happened.
For strings, we replace trigraphs whenever we reach a quote or
newline, because there might be a backslash trigraph escaping them.
We need to be careful that we start trigraph replacing from where we
left off previously, because it is possible for a first scan to leave
"fake" trigraphs that a second scan would pick up as real (e.g. the
sequence "????\\n=" would find a fake ??= trigraph after removing the
escaped newline.)
For line comments, on reaching a newline we scan the previous
character(s) to see if it is escaped, and continue if it is. Block
comments ignore everything and just focus on finding the comment
termination mark. The only difficult thing, and it is surprisingly
tricky, is checking if an asterisk precedes the final slash since
they could be separated by escaped newlines. If the preprocessor is
invoked with the output comments option, we don't bother removing
escaped newlines and replacing trigraphs for output.
Finally, numbers can begin with a period, which is pushed initially
as a CPP_DOT token in its own right. The digit handler checks if the
previous token was a CPP_DOT not separated by whitespace, and if so
pops it off the stack and pushes a period into the number's buffer
before calling the number parser.
*/
static void expand_comment_space PARAMS ((cpp_toklist *));
void init_trigraph_map PARAMS ((void));
static unsigned char* trigraph_replace PARAMS ((cpp_reader *, unsigned char *,
...
...
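The token-revision scheme described in the comment can be illustrated with a small stand-alone sketch. This is not the cpplib implementation: the names `tok_type`, `token`, `toklist`, `push` and `lex_sketch`, and the fixed-size array, are hypothetical and chosen only for illustration. It assumes a tiny alphabet ('~', '=', '\', newline, blanks, and the ??/ trigraph) and shows how the '=' handler revises a preceding "~" token, how the newline handler discards an escaping backslash, and how a per-token whitespace flag blocks the merge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds.  The comment above only names CPP_COMPLEMENT,
   CPP_BACKSLASH and CPP_VSPACE; the rest are assumptions.  */
enum tok_type { T_COMPLEMENT, T_EQ, T_COMPL_EQ, T_BACKSLASH, T_VSPACE, T_OTHER };

struct token { enum tok_type type; int prev_white; };

#define MAX_TOKENS 64
struct toklist { struct token tok[MAX_TOKENS]; size_t n; };

static void push(struct toklist *l, enum tok_type type, int prev_white)
{
    assert(l->n < MAX_TOKENS);
    l->tok[l->n].type = type;
    l->tok[l->n].prev_white = prev_white;
    l->n++;
}

/* Lex the NUL-terminated string SRC into L, revising the token on top
   of the stack instead of peeking ahead in the input.  */
void lex_sketch(const char *src, struct toklist *l)
{
    int white = 0;              /* blanks seen since the last token? */

    l->n = 0;
    for (; *src; src++) {
        struct token *top = l->n ? &l->tok[l->n - 1] : NULL;

        switch (*src) {
        case ' ':
        case '\t':
            white = 1;
            continue;
        case '~':
            push(l, T_COMPLEMENT, white);
            break;
        case '=':
            /* Revise an adjacent "~" into "~=" rather than pushing "=". */
            if (top && top->type == T_COMPLEMENT && !white)
                top->type = T_COMPL_EQ;
            else
                push(l, T_EQ, white);
            break;
        case '?':
            /* The only lookahead: the backslash trigraph is replaced by
               the token it stands for.  */
            if (src[1] == '?' && src[2] == '/') {
                push(l, T_BACKSLASH, white);
                src += 2;
            } else
                push(l, T_OTHER, white);
            break;
        case '\\':
            push(l, T_BACKSLASH, white);
            break;
        case '\n':
            if (top && top->type == T_BACKSLASH && !white) {
                /* Escaped newline: drop the backslash and the newline,
                   restoring the whitespace state from before the backslash. */
                l->n--;
                white = l->tok[l->n].prev_white;
                continue;
            }
            push(l, T_VSPACE, white);
            break;
        default:
            push(l, T_OTHER, white);
            break;
        }
        white = 0;
    }
}
```

Under these assumptions, lexing a '~', a backslash, a newline and an '=' leaves a single `T_COMPL_EQ` token on the list, as does the same input with the backslash spelled as the ??/ trigraph. Lexing "~ =" leaves two tokens, the second with `prev_white` set.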