Commit 4cf817a7
Authored Sep 27, 2001 by Neil Booth; committed by Neil Booth, Sep 27, 2001

* doc/cppinternals.texi: Update.
From-SVN: r45839
Parent: ef1d8fc8

Showing 2 changed files with 137 additions and 53 deletions (+137 -53):

  gcc/ChangeLog               +4    -0
  gcc/doc/cppinternals.texi   +133  -53

gcc/ChangeLog  (view file @ 4cf817a7)
2001-09-27  Neil Booth  <neil@daikokuya.demon.co.uk>

	* doc/cppinternals.texi: Update.

2001-09-26  Neil Booth  <neil@daikokuya.demon.co.uk>

	* cpphash.h (struct cpp_pool): Remove locks and locked.
...
...

gcc/doc/cppinternals.texi  (view file @ 4cf817a7)
...
...
@@ -41,8 +41,8 @@ into another language, under the above conditions for modified versions.
 @titlepage
 @c @finalout
 @title Cpplib Internals
-@subtitle Last revised Jan 2001
-@subtitle for GCC version 3.0
+@subtitle Last revised September 2001
+@subtitle for GCC version 3.1
 @author Neil Booth
 @page
 @vskip 0pt plus 1filll
...
...
@@ -69,14 +69,14 @@ into another language, under the above conditions for modified versions.
 @node Top, Conventions,, (DIR)
 @chapter Cpplib --- the core of the GNU C Preprocessor
 
-The GNU C preprocessor in GCC 3.0 has been completely rewritten.  It is
+The GNU C preprocessor in GCC 3.x has been completely rewritten.  It is
 now implemented as a library, cpplib, so it can be easily shared between
 a stand-alone preprocessor, and a preprocessor integrated with the C,
 C++ and Objective-C front ends.  It is also available for use by other
 programs, though this is not recommended as its exposed interface has
 not yet reached a point of reasonable stability.
 
-This library has been written to be re-entrant, so that it can be used
+The library has been written to be re-entrant, so that it can be used
 to preprocess many files simultaneously if necessary.  It has also been
 written with the preprocessing token as the fundamental unit; the
 preprocessor in previous versions of GCC would operate on text strings
...
...
@@ -86,8 +86,6 @@ This brief manual documents some of the internals of cpplib, and a few
 tricky issues encountered.  It also describes certain behaviour we would
 like to preserve, such as the format and spacing of its output.
 
-Identifiers, macro expansion, hash nodes, lexing.
-
 @menu
 * Conventions::    Conventions used in the code.
 * Lexer::          The combined C, C++ and Objective-C Lexer.
...
...
@@ -123,18 +121,106 @@ behaviour.
 @node Lexer, Whitespace, Conventions, Top
 @unnumbered The Lexer
 @cindex lexer
 @cindex tokens
 
-The lexer is contained in the file @file{cpplex.c}.  We want to have a
-lexer that is single-pass, for efficiency reasons.  We would also like
-the lexer to only step forwards through the input files, and not step
-back.  This will make future changes to support different character
-sets, in particular state or shift-dependent ones, much easier.
-
-This file also contains all information needed to spell a token, i.e.@:
-to output it either in a diagnostic or to a preprocessed output file.
-This information is not exported, but made available to clients through
-such functions as @samp{cpp_spell_token} and @samp{cpp_token_len}.
+@section Overview
+
+The lexer is contained in the file @file{cpplex.c}.  It is a hand-coded
+lexer, and not implemented as a state machine.  It can understand C, C++
+and Objective-C source code, and has been extended to allow reasonably
+successful preprocessing of assembly language.  The lexer does not make
+an initial pass to strip out trigraphs and escaped newlines, but handles
+them as they are encountered in a single pass of the input file.  It
+returns preprocessing tokens individually, not a line at a time.
+
+It is mostly transparent to users of the library, since the library's
+interface for obtaining the next token, @code{cpp_get_token}, takes care
+of lexing new tokens, handling directives, and expanding macros as
+necessary.  However, the lexer does expose some functionality so that
+clients of the library can easily spell a given token, such as
+@code{cpp_spell_token} and @code{cpp_token_len}.  These functions are
+useful when generating diagnostics, and for emitting the preprocessed
+output.
+
+@section Lexing a token
+
+Lexing of an individual token is handled by @code{_cpp_lex_direct} and
+its subroutines.  In its current form the code is quite complicated,
+with read ahead characters and suchlike, since it strives to not step
+back in the character stream in preparation for handling non-ASCII file
+encodings.  The current plan is to convert any such files to UTF-8
+before processing them.  This complexity is therefore unnecessary and
+will be removed, so I'll not discuss it further here.
+
+The job of @code{_cpp_lex_direct} is simply to lex a token.  It is not
+responsible for issues like directive handling, returning lookahead
+tokens directly, multiple-include optimisation, or conditional block
+skipping.  It necessarily has a minor r@^ole to play in memory
+management of lexed lines.  I discuss these issues in a separate section
+(@pxref{Lexing a line}).
+
+The lexer places the token it lexes into storage pointed to by the
+variable @var{cur_token}, and then increments it.  This variable is
+important for correct diagnostic positioning.  Unless a specific line
+and column are passed to the diagnostic routines, they will examine the
+@var{line} and @var{col} values of the token just before the location
+that @var{cur_token} points to, and use that location to report the
+diagnostic.
+
+The lexer does not consider whitespace to be a token in its own right.
+If whitespace (other than a new line) precedes a token, it sets the
+@code{PREV_WHITE} bit in the token's flags.  Each token has its
+@var{line} and @var{col} variables set to the line and column of the
+first character of the token.  This line number is the line number in
+the translation unit, and can be converted to a source (file, line)
+pair using the line map code.
+
+The first token on a logical, i.e.@: unescaped, line has the flag
+@code{BOL} set for beginning-of-line.  This flag is intended for
+internal use, both to distinguish a @samp{#} that begins a directive
+from one that doesn't, and to generate a callback to clients that want
+to be notified about the start of every non-directive line with tokens
+on it.  Clients cannot reliably determine this for themselves: the
+first token might be a macro, and the tokens of a macro expansion do
+not have the @code{BOL} flag set.  The macro expansion may even be
+empty, and the next token on the line certainly won't have the
+@code{BOL} flag set.
+
+New lines are treated specially; exactly how the lexer handles them is
+context-dependent.  The C standard mandates that directives are
+terminated by the first unescaped newline character, even if it appears
+in the middle of a macro expansion.  Therefore, if the state variable
+@var{in_directive} is set, the lexer returns a @code{CPP_EOF} token,
+which is normally used to indicate end-of-file, to indicate
+end-of-directive.  In a directive a @code{CPP_EOF} token never means
+end-of-file.  Conveniently, if the caller was @code{collect_args}, it
+already handles @code{CPP_EOF} as if it were end-of-file, and reports
+an error about an unterminated macro argument list.
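The @code{collect_args} case above can be triggered from ordinary source.  A
minimal illustration (a hypothetical fragment, not part of this patch; the
macro name @code{f} is made up):

@smallexample
#define f(x) x
#if f(1
)
#endif
@end smallexample

The unescaped newline after @samp{f(1} terminates the @code{#if} directive,
so the argument list is never closed and the preprocessor reports an error
about an unterminated argument list rather than silently reading the
@samp{)} on the next line.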
+The C standard also specifies that a new line in the middle of the
+arguments to a macro is treated as whitespace.  This white space is
+important in case the macro argument is stringified.  The state variable
+@code{parsing_args} is non-zero when the preprocessor is collecting the
+arguments to a macro call.  It is set to 1 when looking for the opening
+parenthesis to a function-like macro, and 2 when collecting the actual
+arguments up to the closing parenthesis, since these two cases need to
+be distinguished sometimes.  One such time is here: the lexer sets the
+@code{PREV_WHITE} flag of a token if it meets a new line when
+@code{parsing_args} is set to 2.  It doesn't set it if it meets a new
+line when @code{parsing_args} is 1, since then code like
+
+@smallexample
+#define foo() bar
+foo
+baz
+@end smallexample
+
+@noindent
+would be output with an erroneous space before @samp{baz}:
+
+@smallexample
+foo
+ baz
+@end smallexample
+
+This is a good example of the subtlety of getting token spacing correct
+in the preprocessor; there are plenty of tests in the testsuite for
+corner cases like this.
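The stringification case mentioned above is where the recorded whitespace
becomes visible.  An illustrative standard-C fragment (not from the patch):

@smallexample
#define str(x) #x
const char *s = str(hello
world);
@end smallexample

Here @code{s} is @samp{"hello world"}: the newline inside the argument is
recorded as whitespace, and stringification turns that whitespace into a
single space.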
 
 The most painful aspect of lexing ISO-standard C and C++ is handling
 trigraphs and backslash-escaped newlines.  Trigraphs are processed before
...
...
@@ -148,62 +234,56 @@ within the characters of an identifier, and even between the @samp{*}
 and @samp{/} that terminates a comment.  Moreover, you cannot be sure
 there is just one --- there might be an arbitrarily long sequence of
 them.
 
-So the routine @samp{parse_identifier}, that lexes an identifier, cannot
-assume that it can scan forwards until the first non-identifier
+So, for example, the routine that lexes a number, @code{parse_number},
+cannot assume that it can scan forwards until the first non-number
 character and be done with it, because this could be the @samp{\}
 introducing an escaped newline, or the @samp{?} introducing the trigraph
-sequence that represents the @samp{\} of an escaped newline.  Similarly
-for the routine that handles numbers, @samp{parse_number}.  If these
-routines stumble upon a @samp{?} or @samp{\}, they call
-@samp{skip_escaped_newlines} to skip over any potential escaped newlines
-before checking whether they can finish.
+sequence that represents the @samp{\} of an escaped newline.  If it
+encounters a @samp{?} or @samp{\}, it calls @code{skip_escaped_newlines}
+to skip over any potential escaped newlines before checking whether the
+number has been finished.
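For illustration (not part of the patch), both an escaped newline and the
trigraph @samp{??/}, which stands for @samp{\}, can land in the middle of a
single preprocessing number, which is why @code{parse_number} cannot simply
scan up to the first non-number character:

@smallexample
int i = 12\
3;              /* the backslash-newline is removed: the number is 123 */
int j = 12??/
3;              /* with trigraphs enabled (e.g. -trigraphs), ??/ is \,
                   so this is again the single number 123 */
@end smallexample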
 
-Similarly code in the main body of @samp{_cpp_lex_token} cannot simply
+Similarly code in the main body of @code{_cpp_lex_direct} cannot simply
 check for a @samp{=} after a @samp{+} character to determine whether it
 has a @samp{+=} token; it needs to be prepared for an escaped newline of
-some sort.  These cases use the function @samp{get_effective_char},
-which returns the first character after any intervening newlines.
+some sort.  Such cases use the function @code{get_effective_char},
+which returns the first character after any intervening escaped
+newlines.
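A small sketch of the same issue for operators (illustrative only; the macro
name @code{BUMP} is made up): an escaped newline can fall between the
@samp{+} and the @samp{=}, yet a single @samp{+=} token must still result.

@smallexample
#define BUMP(a) a +\
= 1
/* After the backslash-newline is removed, the replacement list holds
   the single token +=, so BUMP(x) expands to x += 1.  */
@end smallexample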
 
-The lexer needs to keep track of the correct column position, including
-counting tabs as specified by the @option{-ftabstop=} option.  This
-should be done even within comments; C-style comments can appear in the
-middle of a line, and we want to report diagnostics in the correct
+The lexer needs to keep track of the correct column position, including
+counting tabs as specified by the @option{-ftabstop=} option.  This
+should be done even within C-style comments; they can appear in the
+middle of a line, and we want to report diagnostics in the correct
 position for text appearing after the end of the comment.
 
-Some identifiers, such as @samp{__VA_ARGS__} and poisoned identifiers,
+Some identifiers, such as @code{__VA_ARGS__} and poisoned identifiers,
 may be invalid and require a diagnostic.  However, if they appear in a
 macro expansion we don't want to complain with each use of the macro.
 It is therefore best to catch them during the lexing stage, in
-@samp{parse_identifier}.  In both cases, whether a diagnostic is needed
-or not is dependent upon lexer state.  For example, we don't want to
-issue a diagnostic for re-poisoning a poisoned identifier, or for using
-@samp{__VA_ARGS__} in the expansion of a variable-argument macro.
-Therefore @samp{parse_identifier} makes use of flags to determine
+@code{parse_identifier}.  In both cases, whether a diagnostic is needed
+or not is dependent upon the lexer's state.  For example, we don't want
+to issue a diagnostic for re-poisoning a poisoned identifier, or for
+using @code{__VA_ARGS__} in the expansion of a variable-argument macro.
+Therefore @code{parse_identifier} makes use of state flags to determine
 whether a diagnostic is appropriate.  Since we change state on a
 per-token basis, and don't lex whole lines at a time, this is not a
 problem.
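Two standard-C fragments (illustrative, not from the patch) showing the
identifiers this paragraph is about; whether a diagnostic is wanted depends
on where they appear, which is the state the lexer has to track:

@smallexample
#include <stdio.h>
#pragma GCC poison gets
/* Any later use of `gets' is diagnosed; poisoning it again is not.  */

#define log_err(...) fprintf (stderr, __VA_ARGS__)
/* __VA_ARGS__ is valid only inside the replacement list of a variadic
   macro such as this one; elsewhere it warrants a diagnostic.  */
@end smallexample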
 
 Another place where state flags are used to change behaviour is whilst
-parsing header names.  Normally, a @samp{<} would be lexed as a single
-token.  After a @code{#include} directive, though, it should be lexed
-as a single token as far as the nearest @samp{>} character.  Note that
-we don't allow the terminators of header names to be escaped; the first
+lexing header names.  Normally, a @samp{<} would be lexed as a single
+token.  After a @code{#include} directive, though, it should be lexed
+as a single token as far as the nearest @samp{>} character.  Note that
+we don't allow the terminators of header names to be escaped; the first
 @samp{"} or @samp{>} terminates the header name.
 
 Interpretation of some character sequences depends upon whether we are
 lexing C, C++ or Objective-C, and on the revision of the standard in
-force.  For example, @samp{::} is a single token in C++, but two
-separate @samp{:} tokens, and almost certainly a syntax error, in C@.
-Such cases are handled in the main function @samp{_cpp_lex_token}, based
-upon the flags set in the @samp{cpp_options} structure.
-
-Note we have almost, but not quite, achieved the goal of not stepping
-backwards in the input stream.  Currently @samp{skip_escaped_newlines}
-does step back, though with care it should be possible to adjust it so
-that this does not happen.  For example, one tricky issue is if we meet
-a trigraph, but the command line option @option{-trigraphs} is not in
-force but @option{-Wtrigraphs} is, we need to warn about it but then
-buffer it and continue to treat it as 3 separate characters.
+force.  For example, @samp{::} is a single token in C++, but in C it is
+two separate @samp{:} tokens and almost certainly a syntax error.  Such
+cases are handled by @code{_cpp_lex_direct} based upon command-line
+flags stored in the @code{cpp_options} structure.
+
+@anchor{Lexing a line}
+@section Lexing a line
 
 @node Whitespace, Hash Nodes, Lexer, Top
 @unnumbered Whitespace
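Finally, a sketch of the language dependence described in the last hunk
(illustrative only; the macro @code{Q} is made up).  The same characters lex
differently depending on whether cpplib is preprocessing C or C++:

@smallexample
#define Q str::npos
/* In C++, the replacement list holds three tokens: `str' `::' `npos'.
   In C, it holds four: `str' `:' `:' `npos'.  The macro is never
   expanded here, so both translation units remain valid.  */
@end smallexample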
...
...