Commit 111e0469 authored Jan 19, 2001 by Neil Booth
Committed by Neil Booth, Jan 19, 2001
* cppinternals.texi: Update.
From-SVN: r39144
parent 55cf7bb9
Showing 2 changed files with 92 additions and 11 deletions
gcc/ChangeLog: +4 -0
gcc/cppinternals.texi: +88 -11
gcc/ChangeLog
2001-01-19 Neil Booth <neil@daikokuya.demon.co.uk>
* cppinternals.texi: Update.
2001-01-19 Richard Earnshaw <rearnsha@arm.com>
* arm.c (arm_init_builtins): Re-enable builtins.
...
...
gcc/cppinternals.texi
...
...
@@ -91,11 +91,15 @@ Identifiers, macro expansion, hash nodes, lexing.
* Conventions::     Conventions used in the code.
* Lexer::           The combined C, C++ and Objective C Lexer.
* Whitespace::      Input and output newlines and whitespace.
* Hash Nodes::      All identifiers are hashed.
* Macro Expansion:: Macro expansion algorithm.
* Files::           File handling.
* Concept Index::   Index of concepts and terms.
* Index::           Index.
@end menu
@node Conventions, Lexer, Top, Top
@unnumbered Conventions
cpplib has two interfaces - one is exposed internally only, and the
other is for both internal and external use.
...
...
@@ -113,6 +117,7 @@ are perhaps relying on some kind of undocumented implementation-specific
behaviour.
@node Lexer, Whitespace, Conventions, Top
@unnumbered The Lexer
The lexer is contained in the file @samp{cpplex.c}. We want to have a
lexer that is single-pass, for efficiency reasons. We would also like
...
...
@@ -194,7 +199,8 @@ a trigraph, but the command line option @samp{-trigraphs} is not in
force but @samp{-Wtrigraphs} is, we need to warn about it but then
buffer it and continue to treat it as 3 separate characters.
@node Whitespace, Concept Index, Lexer, Top
@node Whitespace, Hash Nodes, Lexer, Top
@unnumbered Whitespace
The lexer has been written to treat each of @samp{\r}, @samp{\n},
@samp{\r\n} and @samp{\n\r} as a single new line indicator. This allows
...
...
@@ -202,18 +208,89 @@ it to transparently preprocess MS-DOS, Macintosh and Unix files without
their needing to pass through a special filter beforehand.
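The newline rule described above can be sketched in C as follows. This is a minimal illustration and not the code in @samp{cpplex.c}; the names @samp{newline_length} and @samp{count_lines} are hypothetical. It assumes that a two-character indicator is formed only when the second character is the other member of the CR/LF pair, so that two identical characters count as two newlines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not cpplib's handle_newline): return the number
   of bytes occupied by a newline indicator starting at P, or 0 if P
   does not point at one.  "\r", "\n", "\r\n" and "\n\r" each count as
   a single indicator.  */
size_t newline_length(const char *p, const char *end)
{
    assert(p <= end);
    if (p == end || (*p != '\r' && *p != '\n'))
        return 0;
    /* A different newline character right after the first one pairs
       with it; a repeated character starts a new indicator instead.  */
    if (p + 1 < end && (p[1] == '\r' || p[1] == '\n') && p[1] != p[0])
        return 2;
    return 1;
}

/* Count the newline indicators in a buffer, whatever the file's
   platform of origin.  */
unsigned count_lines(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;
    unsigned lines = 0;

    while (p < end) {
        size_t n = newline_length(p, end);
        if (n != 0) {
            lines++;
            p += n;
        } else {
            p++;
        }
    }
    return lines;
}
```

With this sketch, a buffer using Unix, MS-DOS or Macintosh line endings yields the same line count without any prior filtering.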
We also decided to treat a backslash, either @samp{\} or the trigraph
@samp{??/}, separated from one of the above newline forms by whitespace
only (one or more space, tab, form-feed, vertical tab or NUL characters),
as an intended escaped newline. The library issues a diagnostic in this
case.
Handling newlines in this way is made simpler by doing it in one place
@samp{??/}, separated from one of the above newline indicators by
non-comment whitespace only, as intending to escape the newline. It
tends to be a typing mistake, and cannot reasonably be mistaken for
anything else in any of the C-family grammars. Since handling it this
way is not strictly conforming to the ISO standard, the library issues a
warning wherever it encounters it.
Handling newlines like this is made simpler by doing it in one place
only. The function @samp{handle_newline} takes care of all newline
characters, and @samp{skip_escaped_newlines} takes care of all escaping
of newlines, deferring to @samp{handle_newline} to handle the newlines
themselves.
characters, and @samp{skip_escaped_newlines} takes care of arbitrarily
long sequences of escaped newlines, deferring to @samp{handle_newline}
to handle the newlines themselves.
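A possible shape for this escaped-newline handling is sketched below. It is an illustration under stated assumptions, not the actual @samp{skip_escaped_newlines}: the function names are hypothetical, the whitespace set is taken to be space, tab, form-feed and vertical tab, and comments between the backslash and the newline are not considered. A counter stands in for the warning the library issues.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by a newline indicator at P, or 0 if there is none.  */
size_t nl_len(const char *p, const char *end)
{
    if (p >= end || (*p != '\r' && *p != '\n'))
        return 0;
    if (p + 1 < end && (p[1] == '\r' || p[1] == '\n') && p[1] != p[0])
        return 2;
    return 1;
}

/* Length of a backslash at P: 1 for the character itself, 3 for the
   trigraph spelled '?', '?', '/', and 0 if neither is present.  */
size_t backslash_len(const char *p, const char *end)
{
    if (p < end && *p == '\\')
        return 1;
    if (end - p >= 3 && p[0] == '?' && p[1] == '?' && p[2] == '/')
        return 3;
    return 0;
}

/* Assumed whitespace set for this sketch.  */
int is_hspace(char c)
{
    return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}

/* Hypothetical sketch: skip an arbitrarily long run of escaped
   newlines starting at P and return the first position after it.
   *WARNINGS is incremented for every escape that has whitespace
   between the backslash and the newline.  */
const char *skip_escaped_newlines_sketch(const char *p, const char *end,
                                         unsigned *warnings)
{
    assert(p <= end);
    for (;;) {
        size_t b = backslash_len(p, end);
        if (b == 0)
            return p;
        const char *ws = p + b;   /* where trailing whitespace starts */
        const char *q = ws;
        while (q < end && is_hspace(*q))
            q++;
        size_t n = nl_len(q, end);
        if (n == 0)
            return p;             /* an ordinary backslash */
        if (q != ws && warnings != NULL)
            ++*warnings;          /* not strictly conforming */
        p = q + n;                /* the newline logic lives in nl_len */
    }
}
```

Keeping the newline test in one helper mirrors the division of labour described above: the escape-skipping loop never inspects the newline characters directly.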
@node Hash Nodes, Macro Expansion, Whitespace, Top
@unnumbered Hash Nodes
When cpplib encounters an "identifier", it generates a hash code for it
and stores it in the hash table. By "identifier" we mean tokens with
type @samp{CPP_NAME}; this includes identifiers in the usual C sense, as
well as keywords, directive names, macro names and so on. For example,
all of "pragma", "int", "foo" and "__GNUC__" are identifiers and hashed
when lexed.
Each node in the hash table contains various information about the
identifier it represents, for example its length and type. At any one
time, each identifier falls into exactly one of three categories:
@itemize @bullet
@item Macros
These have been declared to be macros, either on the command line or
with @samp{#define}. A few, such as @samp{__TIME__}, are builtins
entered in the hash table during initialisation. The hash node for a
normal macro points to a structure with more information about the
macro, such as whether it is function-like, how many arguments it takes,
and its expansion. Builtin macros are flagged as special, and each instead
contains an enum indicating which of the various builtin macros it is.
@item Assertions
Assertions are in a separate namespace to macros. To enforce this, cpp
actually prepends a @samp{#} character to the assertion's name before
hashing it and entering it in the hash table. An assertion's node points
to a chain of answers to that assertion.
@item Void
Everything else falls into this category - an identifier that is not
currently a macro, or a macro that has since been undefined with
@samp{#undef}.
When preprocessing C++, this category also includes the named operators,
such as @samp{xor}. In expressions these behave like the operators they
represent, but in contexts where the spelling of a token matters they
are spelt differently. This spelling distinction is relevant when they
are operands of the stringizing and pasting macro operators @samp{#} and
@samp{##}. Named operator hash nodes are flagged, both to catch the
spelling distinction and to prevent them from being defined as macros.
@end itemize
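The three categories can be pictured with a data structure along the following lines. This is a sketch only: every type, field and function name here is hypothetical and chosen for illustration, and the real cpplib structures may be laid out quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative names, not taken from the cpplib sources.  */
enum node_kind { NODE_VOID, NODE_MACRO, NODE_ASSERTION };
enum builtin_kind { BUILTIN_NONE, BUILTIN_TIME, BUILTIN_DATE };

struct macro_def {
    int function_like;        /* nonzero for function-like macros */
    unsigned nargs;           /* number of arguments */
    const char *expansion;    /* replacement text */
};

struct answer {
    const char *text;
    struct answer *next;      /* chain of answers to an assertion */
};

struct hash_node {
    const char *name;
    size_t length;
    enum node_kind kind;              /* exactly one category at a time */
    unsigned is_builtin : 1;          /* "special" macro such as __TIME__ */
    unsigned is_named_operator : 1;   /* C++ "xor" and friends */
    union {
        struct macro_def *macro;      /* normal macro */
        enum builtin_kind builtin;    /* builtin macro */
        struct answer *answers;       /* assertion */
    } value;
};

/* Named operators may not be defined as macros.  Returns 1 on success.  */
int define_macro(struct hash_node *n, struct macro_def *def)
{
    if (n->is_named_operator)
        return 0;
    n->kind = NODE_MACRO;
    n->is_builtin = 0;
    n->value.macro = def;
    return 1;
}

/* After #undef the identifier falls back into the void category.  */
void undefine_macro(struct hash_node *n)
{
    if (n->kind == NODE_MACRO) {
        n->kind = NODE_VOID;
        n->value.macro = NULL;
    }
}

/* Assertions live in their own namespace: the key that gets hashed is
   the name with '#' prepended.  */
void assertion_key(const char *name, char *out, size_t outsz)
{
    snprintf(out, outsz, "#%s", name);
}
```

Because the assertion key starts with @samp{#}, which cannot begin an identifier, an assertion and a macro of the same name can never collide in the table.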
All occurrences of an identifier share the same hash node. Since each identifier
token, after lexing, contains a pointer to its hash node, this is used
to provide rapid lookup of various information. For example, when
parsing a @samp{#define} statement, CPP flags each argument's identifier
hash node with the index of that argument. This makes duplicated
argument checking an O(1) operation for each argument. Similarly, for
each identifier in the macro's expansion, lookup to see if it is an
argument, and which argument it is, is also an O(1) operation. Further,
each directive name, such as @samp{endif}, has an associated directive
enum stored in its hash node, so that directive lookup is also O(1).
Later, CPP may also store C front-end information in its identifier hash
table, such as a @samp{tree} pointer.
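The constant-time duplicate check can be illustrated with a small, self-contained sketch. The interning table, the @samp{arg_index} field and all function names below are assumptions made for the example and are not cpplib's own. The point is that once an identifier has been interned, checking or recording its argument position is a single field access on its node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64
#define MAX_NODES 64

/* Hypothetical node: one per distinct identifier spelling.  */
struct node {
    const char *name;
    unsigned arg_index;       /* 0 = not a macro argument, else 1-based */
    struct node *next;        /* hash chain */
};

static struct node pool[MAX_NODES];
static size_t pool_used;
static struct node *table[TABLE_SIZE];

static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31 + (unsigned char) *s++;
    return h % TABLE_SIZE;
}

/* Return the unique node for NAME, creating it on first use.  The
   lexer would do this once per token; later stages use the pointer.  */
struct node *intern(const char *name)
{
    unsigned h = hash_name(name);
    struct node *n;

    for (n = table[h]; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;
    assert(pool_used < MAX_NODES);
    n = &pool[pool_used++];
    n->name = name;
    n->arg_index = 0;
    n->next = table[h];
    table[h] = n;
    return n;
}

/* Flag each parameter's node with its position.  Returns 0 on success,
   or the 1-based position of the first duplicated parameter.  Each
   check is one field read, independent of the parameter count.  */
unsigned mark_params(struct node **params, unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        if (params[i]->arg_index != 0)
            return i + 1;
        params[i]->arg_index = i + 1;
    }
    return 0;
}

/* Remove the flags once the directive has been processed.  */
void clear_params(struct node **params, unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++)
        params[i]->arg_index = 0;
}
```

The same field answers the second question in the text: while scanning the expansion, a nonzero @samp{arg_index} on an identifier's node says both that it is an argument and which one.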
@node Macro Expansion, Files, Hash Nodes, Top
@unnumbered Macro Expansion Algorithm
@printindex cp
@node Files, Concept Index, Macro Expansion, Top
@unnumbered File Handling
@printindex cp
@node Concept Index, Index, Whitespace, Top
@node Concept Index, Index, Files, Top
@unnumbered Concept Index
@printindex cp
...
...