Commit ee925640 by Lewis Hyatt, committed by David Malcolm

Byte vs column awareness for diagnostic-show-locus.c (PR 49973)

contrib/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* unicode/from_glibc/unicode_utils.py: Support script from
	glibc (commit 464cd3) to extract character widths from Unicode data
	files.
	* unicode/from_glibc/utf8_gen.py: Likewise.
	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/gen_wcwidth.py: New utility to generate
	libcpp/generated_cpp_wcwidth.h with help from the glibc support
	scripts and the Unicode data files.
	* unicode/unicode-license.txt: Added.
	* unicode/README: New explanatory file.

libcpp/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* generated_cpp_wcwidth.h: New file generated by
	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
	* charset.c (compute_next_display_width): New function to help
	implement display columns.
	(cpp_byte_column_to_display_column): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	(cpp_wcwidth): Likewise.
	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
	(cpp_display_column_to_byte_column): Declare.
	(cpp_wcwidth): Declare.
	(cpp_display_width): New function.

gcc/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* input.c (location_compute_display_column): New function to help with
	multibyte awareness in diagnostics.
	(test_cpp_utf8): New self-test.
	(input_c_tests): Call the new test.
	* input.h (location_compute_display_column): Declare.
	* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
	to all classes and functions.
	(enum column_unit): New enum.
	(class exploc_with_display_col): New class.
	(class layout_point): Convert m_column member to array m_columns[2].
	(layout_range::contains_point): Add col_unit argument.
	(test_layout_range_for_single_point): Pass new argument.
	(test_layout_range_for_single_line): Likewise.
	(test_layout_range_for_multiple_lines): Likewise.
	(line_bounds::convert_to_display_cols): New function.
	(layout::get_state_at_point): Add col_unit argument.
	(make_range): Use empty filename rather than dummy filename.
	(get_line_width_without_trailing_whitespace): Rename to...
	(get_line_bytes_without_trailing_whitespace): ...this.
	(test_get_line_width_without_trailing_whitespace): Rename to...
	(test_get_line_bytes_without_trailing_whitespace): ...this.
	(class layout): m_exploc changed to exploc_with_display_col from
	plain expanded_location.
	(layout::get_linenum_width): New accessor member function.
	(layout::get_x_offset_display): Likewise.
	(layout::calculate_linenum_width): New subroutine for the constructor.
	(layout::calculate_x_offset_display): Likewise.
	(layout::layout): Use the new subroutines. Add multibyte awareness.
	(layout::print_source_line): Add multibyte awareness.
	(layout::print_line): Likewise.
	(layout::print_annotation_line): Likewise.
	(line_label::line_label): Likewise.
	(layout::print_any_labels): Likewise.
	(layout::annotation_line_showed_range_p): Likewise.
	(get_printed_columns): Likewise.
	(class line_label): Rename m_length to m_display_width.
	(get_affected_columns): Rename to...
	(get_affected_range): ...this; add col_unit argument and multibyte
	awareness.
	(class correction): Add m_affected_bytes and m_display_cols
	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
	awareness throughout.
	(correction::insertion_p): Add multibyte awareness.
	(correction::compute_display_cols): New function.
	(correction::ensure_terminated): Use new member name m_byte_length.
	(line_corrections::add_hint): Add multibyte awareness.
	(layout::print_trailing_fixits): Likewise.
	(layout::get_x_bound_for_row): Likewise.
	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
	with _utf8 suffix removed, testing multibyte awareness.
	(test_one_liner_caret_and_range_utf8): Likewise.
	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
	(test_one_liner_fixit_insert_before_utf8): Likewise.
	(test_one_liner_fixit_insert_after_utf8): Likewise.
	(test_one_liner_fixit_remove_utf8): Likewise.
	(test_one_liner_fixit_replace_utf8): Likewise.
	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
	(test_one_liner_many_fixits_1_utf8): Likewise.
	(test_one_liner_many_fixits_2_utf8): Likewise.
	(test_one_liner_labels_utf8): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing): Adapt for changes to
	get_affected_columns, get_printed_columns and class corrections.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_linenum_sep): New constant.
	(test_left_margin): Likewise.
	(test_offset_impl): Helper function for new test.
	(test_layout_x_offset_display_utf8): New test.
	(diagnostic_show_locus_c_tests): Call new tests.

gcc/testsuite/ChangeLog:

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(test_show_locus): Tweak so that expected output is the same as
	before the diagnostic-show-locus.c changes.
	* gcc.dg/cpp/pr66415-1.c: Likewise.

From-SVN: r279137
parent 763c9f4a
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* unicode/from_glibc/unicode_utils.py: Support script from
glibc (commit 464cd3) to extract character widths from Unicode data
files.
* unicode/from_glibc/utf8_gen.py: Likewise.
* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
* unicode/EastAsianWidth.txt: Likewise.
* unicode/PropList.txt: Likewise.
* unicode/gen_wcwidth.py: New utility to generate
libcpp/generated_cpp_wcwidth.h with help from the glibc support
scripts and the Unicode data files.
* unicode/unicode-license.txt: Added.
* unicode/README: New explanatory file.
2019-12-07 Richard Sandiford <richard.sandiford@arm.com>
* texi2pod.pl: Handle @headitems in @multitables, printing them
......
This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality. (cpp_wcwidth () in libcpp/charset.c).
The idea is to produce the necessary lookup table
(../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the
following files that are distributed by the Unicode Consortium:
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
These three files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.
In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's. To that end, we also put in this directory, in the from_glibc/
directory, the glibc Python code that implements their logic. This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
source code repository. The files copied from that repository are:
localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py
And the most recent versions added to GCC are from glibc git commit:
2a764c6ee848dfe92cb2921ed3b14085f15d9e79
Finally, the script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require. This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code.
The procedure to update GCC's wcwidth tables is the following:
1. Update the three Unicode data files from the above URLs.
2. Update the two glibc files in from_glibc/ from glibc's git. Update
the commit number above in this README.
3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
(where X.Y is the version of the Unicode standard corresponding to the
Unicode data files being used, most recently, 12.1).
After that, GCC's wcwidth will match the most recent glibc.
#!/usr/bin/env python3
#
# Script to generate tables for cpp_wcwidth, leveraging glibc's utf8_gen.py.
#
# This file is part of GCC.
#
# GCC is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 3, or (at your option) any later
# version.
#
# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING3. If not see
# <http://www.gnu.org/licenses/>.

import sys
import os

if len(sys.argv) != 2:
    print("usage: %s <unicode version>" % sys.argv[0], file=sys.stderr)
    sys.exit(1)
unicode_version = sys.argv[1]

# Parse a codepoint in the format output by glibc tools.
def parse_ucn(s):
    if not (s.startswith("<U") and s.endswith(">")):
        raise ValueError
    return int(s[2:-1], base=16)

# Process a line of width output from utf8_gen.py and update global array.
widths = [1] * (1 + 0x10FFFF)
def process_width(line):
    # Example lines:
    # <UA8FF> 0
    # <UA926>...<UA92D> 0
    s = line.split()
    width = int(s[1])
    r = s[0].split("...")
    if len(r) == 1:
        begin = parse_ucn(r[0])
        end = begin + 1
    elif len(r) == 2:
        begin = parse_ucn(r[0])
        end = parse_ucn(r[1]) + 1
    else:
        raise ValueError
    widths[begin:end] = [width] * (end - begin)

# To keep things simple, we use glibc utf8_gen.py as-is. It only outputs to a
# file named UTF-8, which is not configurable. Then we parse this into the form
# we want it.
os.system("from_glibc/utf8_gen.py --unicode_version %s" % unicode_version)
processing = False
for line in open("UTF-8", "r"):
    if processing:
        if line == "END WIDTH\n":
            processing = False
        else:
            try:
                process_width(line)
            except (ValueError, IndexError) as e:
                print(e, "warning: ignored unexpected line: %s" % line,
                      file=sys.stderr, end="")
    elif line == "WIDTH\n":
        processing = True

# All bytes < 256 we treat as width 1.
widths[0:256] = [1] * 256

# Condense the list to contiguous ranges.
cur_range = [-1, 1]
all_ranges = []
for i, width in enumerate(widths):
    if width == cur_range[1]:
        cur_range[0] = i
    else:
        all_ranges.append(cur_range)
        cur_range = [i, width]

# Output the arrays for generated_cpp_wcwidth.h
print("/* Generated by contrib/unicode/gen_wcwidth.py,",
      "with the help of glibc's")
print(" utf8_gen.py, using version %s" % unicode_version,
      "of the Unicode standard. */")
print("\nstatic const cppchar_t wcwidth_range_ends[] = {", end="")
for i, r in enumerate(all_ranges):
    if i % 8:
        print(" ", end="")
    else:
        print("\n ", end="")
    print("0x%x," % (r[0]), end="")
print("\n};\n")
print("static const unsigned char wcwidth_widths[] = {", end="")
for i, r in enumerate(all_ranges):
    if i % 24:
        print(" ", end="")
    else:
        print("\n ", end="")
    print("%d," % r[1], end="")
print("\n};")
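
The range-condensing step of the script can be tried on a small table. The following is a minimal sketch, not part of the commit: `condense` and `expand` are hypothetical helper names, and the toy width list stands in for the full per-codepoint array. Unlike the script's loop, which appends a range only when the width changes and so never emits the final one, this sketch keeps the last range so that the round trip can be checked.

```python
def condense(widths):
    """Collapse a per-codepoint width list into two parallel lists:
    the last codepoint of each run, and the width shared by that run."""
    range_ends, range_widths = [], []
    for codepoint, width in enumerate(widths):
        if range_widths and range_widths[-1] == width:
            # Same width as the current run: extend the run's end.
            range_ends[-1] = codepoint
        else:
            # Width changed: start a new run ending (so far) here.
            range_ends.append(codepoint)
            range_widths.append(width)
    return range_ends, range_widths


def expand(range_ends, range_widths):
    """Inverse of condense, used only to check the round trip."""
    widths, start = [], 0
    for end, width in zip(range_ends, range_widths):
        widths.extend([width] * (end - start + 1))
        start = end + 1
    return widths


# Toy table for "codepoints" 0..13 (hypothetical values).
toy = [1] * 5 + [0] * 3 + [2] * 4 + [1] * 2
ends, run_widths = condense(toy)
print(ends, run_widths)
```

Storing only the run ends keeps the generated header small, at the cost of a search at lookup time.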
UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
Unicode Data Files include all data files under the directories
http://www.unicode.org/Public/, http://www.unicode.org/reports/, and
http://www.unicode.org/cldr/data/. Unicode Data Files do not include PDF
online code charts under the directory http://www.unicode.org/Public/.
Software includes any source code published in the Unicode Standard or under
the directories http://www.unicode.org/Public/,
http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.
NOTICE TO USER: Carefully read the following legal agreement. BY
DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES
("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND
AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF
YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA
FILES OR SOFTWARE.
COPYRIGHT AND PERMISSION NOTICE
Copyright © 1991-2013 Unicode, Inc. All rights reserved. Distributed under
the Terms of Use in http://www.unicode.org/copyright.html.
Permission is hereby granted, free of charge, to any person obtaining a
copy of the Unicode data files and any associated documentation (the "Data
Files") or Unicode software and any associated documentation (the "Software")
to deal in the Data Files or Software without restriction, including without
limitation the rights to use, copy, modify, merge, publish, distribute, and/or
sell copies of the Data Files or Software, and to permit persons to whom the
Data Files or Software are furnished to do so, provided that (a) the above
copyright notice(s) and this permission notice appear with all copies of the
Data Files or Software, (b) both the above copyright notice(s) and this
permission notice appear in associated documentation, and (c) there is clear
notice in each modified Data File or in the Software as well as in the
documentation associated with the Data File(s) or Software that the data or
software has been modified.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD
PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN
THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE
DATA FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written authorization
of the copyright holder.
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* input.c (location_compute_display_column): New function to help with
multibyte awareness in diagnostics.
(test_cpp_utf8): New self-test.
(input_c_tests): Call the new test.
* input.h (location_compute_display_column): Declare.
* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
to all classes and functions.
(enum column_unit): New enum.
(class exploc_with_display_col): New class.
(class layout_point): Convert m_column member to array m_columns[2].
(layout_range::contains_point): Add col_unit argument.
(test_layout_range_for_single_point): Pass new argument.
(test_layout_range_for_single_line): Likewise.
(test_layout_range_for_multiple_lines): Likewise.
(line_bounds::convert_to_display_cols): New function.
(layout::get_state_at_point): Add col_unit argument.
(make_range): Use empty filename rather than dummy filename.
(get_line_width_without_trailing_whitespace): Rename to...
(get_line_bytes_without_trailing_whitespace): ...this.
(test_get_line_width_without_trailing_whitespace): Rename to...
(test_get_line_bytes_without_trailing_whitespace): ...this.
(class layout): m_exploc changed to exploc_with_display_col from
plain expanded_location.
(layout::get_linenum_width): New accessor member function.
(layout::get_x_offset_display): Likewise.
(layout::calculate_linenum_width): New subroutine for the constructor.
(layout::calculate_x_offset_display): Likewise.
(layout::layout): Use the new subroutines. Add multibyte awareness.
(layout::print_source_line): Add multibyte awareness.
(layout::print_line): Likewise.
(layout::print_annotation_line): Likewise.
(line_label::line_label): Likewise.
(layout::print_any_labels): Likewise.
(layout::annotation_line_showed_range_p): Likewise.
(get_printed_columns): Likewise.
(class line_label): Rename m_length to m_display_width.
(get_affected_columns): Rename to...
(get_affected_range): ...this; add col_unit argument and multibyte
awareness.
(class correction): Add m_affected_bytes and m_display_cols
members. Rename m_len to m_byte_length for clarity. Add multibyte
awareness throughout.
(correction::insertion_p): Add multibyte awareness.
(correction::compute_display_cols): New function.
(correction::ensure_terminated): Use new member name m_byte_length.
(line_corrections::add_hint): Add multibyte awareness.
(layout::print_trailing_fixits): Likewise.
(layout::get_x_bound_for_row): Likewise.
(test_one_liner_simple_caret_utf8): New self-test analogous to the one
with _utf8 suffix removed, testing multibyte awareness.
(test_one_liner_caret_and_range_utf8): Likewise.
(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
(test_one_liner_fixit_insert_before_utf8): Likewise.
(test_one_liner_fixit_insert_after_utf8): Likewise.
(test_one_liner_fixit_remove_utf8): Likewise.
(test_one_liner_fixit_replace_utf8): Likewise.
(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
(test_one_liner_many_fixits_1_utf8): Likewise.
(test_one_liner_many_fixits_2_utf8): Likewise.
(test_one_liner_labels_utf8): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_overlapped_fixit_printing_utf8): Likewise.
(test_overlapped_fixit_printing): Adapt for changes to
get_affected_columns, get_printed_columns and class corrections.
(test_overlapped_fixit_printing_2): Likewise.
(test_linenum_sep): New constant.
(test_left_margin): Likewise.
(test_offset_impl): Helper function for new test.
(test_layout_x_offset_display_utf8): New test.
(diagnostic_show_locus_c_tests): Call new tests.
2019-12-09 Eric Botcazou <ebotcazou@adacore.com>
* tree.c (build_array_type_1): Add SET_CANONICAL parameter and compute
......@@ -908,6 +908,22 @@ make_location (location_t caret, source_range src_range)
  return COMBINE_LOCATION_DATA (line_table, pure_loc, src_range, NULL);
}

/* An expanded_location stores the column in byte units. This function
   converts that column to display units. That requires reading the associated
   source line in order to calculate the display width. If that cannot be done
   for any reason, then returns the byte column as a fallback. */
int
location_compute_display_column (expanded_location exploc)
{
  if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
    return exploc.column;
  char_span line = location_get_source_line (exploc.file, exploc.line);
  /* If line is NULL, this function returns exploc.column which is the
     desired fallback. */
  return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
                                            exploc.column);
}

/* Dump statistics to stderr about the memory usage of the line_table
   set of line maps. This also displays some statistics about macro
   expansion. */
......@@ -3590,6 +3606,93 @@ test_line_offset_overflow ()
ASSERT_NE (ordmap_a, ordmap_b);
}
void test_cpp_utf8 ()
{
  /* Verify that wcwidth of invalid UTF-8 or control bytes is 1. */
  {
    int w_bad = cpp_display_width ("\xf0!\x9f!\x98!\x82!", 8);
    ASSERT_EQ (8, w_bad);
    int w_ctrl = cpp_display_width ("\r\t\n\v\0\1", 6);
    ASSERT_EQ (6, w_ctrl);
  }

  /* Verify that wcwidth of valid UTF-8 is as expected. */
  {
    const int w_pi = cpp_display_width ("\xcf\x80", 2);
    ASSERT_EQ (1, w_pi);
    const int w_emoji = cpp_display_width ("\xf0\x9f\x98\x82", 4);
    ASSERT_EQ (2, w_emoji);
    const int w_umlaut_precomposed = cpp_display_width ("\xc3\xbf", 2);
    ASSERT_EQ (1, w_umlaut_precomposed);
    const int w_umlaut_combining = cpp_display_width ("y\xcc\x88", 3);
    ASSERT_EQ (1, w_umlaut_combining);
    const int w_han = cpp_display_width ("\xe4\xb8\xba", 3);
    ASSERT_EQ (2, w_han);
    const int w_ascii = cpp_display_width ("GCC", 3);
    ASSERT_EQ (3, w_ascii);
    const int w_mixed = cpp_display_width ("\xcf\x80 = 3.14 \xf0\x9f\x98\x82"
                                           "\x9f! \xe4\xb8\xba y\xcc\x88", 24);
    ASSERT_EQ (18, w_mixed);
  }

  /* Verify that cpp_byte_column_to_display_column can go past the end,
     and similar edge cases. */
  {
    const char *str
      /* Display columns.
         111111112345 */
      = "\xcf\x80 abc";
      /* 111122223456
         Byte columns. */
    ASSERT_EQ (5, cpp_display_width (str, 6));
    ASSERT_EQ (105, cpp_byte_column_to_display_column (str, 6, 106));
    ASSERT_EQ (10000, cpp_byte_column_to_display_column (NULL, 0, 10000));
    ASSERT_EQ (0, cpp_byte_column_to_display_column (NULL, 10000, 0));
  }

  /* Verify that cpp_display_column_to_byte_column can go past the end,
     and similar edge cases, and check invertibility. */
  {
    const char *str
      /* Display columns.
         000000000000000000000000000000000000011
         111111112222222234444444455555555678901 */
      = "\xf0\x9f\x98\x82 \xf0\x9f\x98\x82 hello";
      /* 000000000000000000000000000000000111111
         111122223333444456666777788889999012345
         Byte columns. */
    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 2));
    ASSERT_EQ (15, cpp_display_column_to_byte_column (str, 15, 11));
    ASSERT_EQ (115, cpp_display_column_to_byte_column (str, 15, 111));
    ASSERT_EQ (10000, cpp_display_column_to_byte_column (NULL, 0, 10000));
    ASSERT_EQ (0, cpp_display_column_to_byte_column (NULL, 10000, 0));

    /* Verify that we do not interrupt a UTF-8 sequence. */
    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 1));

    for (int byte_col = 1; byte_col <= 15; ++byte_col)
      {
        const int disp_col = cpp_byte_column_to_display_column (str, 15,
                                                                byte_col);
        const int byte_col2 = cpp_display_column_to_byte_column (str, 15,
                                                                 disp_col);
        /* If we ask for the display column in the middle of a UTF-8
           sequence, it will return the length of the partial sequence,
           matching the behavior of GCC before display column support.
           Otherwise check the round trip was successful. */
        if (byte_col < 4)
          ASSERT_EQ (byte_col, disp_col);
        else if (byte_col >= 6 && byte_col < 9)
          ASSERT_EQ (3 + (byte_col - 5), disp_col);
        else
          ASSERT_EQ (byte_col2, byte_col);
      }
  }
}
/* Run all of the selftests within this file. */
void
......@@ -3631,6 +3734,8 @@ input_c_tests ()
test_reading_source_line ();
test_line_offset_overflow ();
test_cpp_utf8 ();
}
} // namespace selftest
......
......@@ -38,6 +38,7 @@ STATIC_ASSERT (BUILTINS_LOCATION < RESERVED_LOCATION_COUNT);
extern bool is_location_from_builtin_token (location_t);
extern expanded_location expand_location (location_t);
extern int location_compute_display_column (expanded_location);
/* A class capturing the bounds of a buffer, to allow for run-time
bounds-checking in a checked build. */
......
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
(test_show_locus): Tweak so that expected output is the same as
before the diagnostic-show-locus.c changes.
* gcc.dg/cpp/pr66415-1.c: Likewise.
2019-12-09 Eric Botcazou <ebotcazou@adacore.com>
* gnat.dg/lto23.adb: New test.
......
/* PR c/66415 */
/* { dg-do compile } */
/* { dg-options "-Wformat -fdiagnostics-show-caret" } */
-/* { dg-set-compiler-env-var COLUMNS "82" } */
+/* { dg-set-compiler-env-var COLUMNS "83" } */
void
fn1 (void)
......
......@@ -174,7 +174,7 @@ test_show_locus (function *fun)
/* Hardcode the "terminal width", to verify the behavior of
very wide lines. */
-  global_dc->caret_max_width = 70;
+  global_dc->caret_max_width = 71;
if (0 == strcmp (fnname, "test_simple"))
{
......
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* generated_cpp_wcwidth.h: New file generated by
../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
* charset.c (compute_next_display_width): New function to help
implement display columns.
(cpp_byte_column_to_display_column): Likewise.
(cpp_display_column_to_byte_column): Likewise.
(cpp_wcwidth): Likewise.
* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
(cpp_display_column_to_byte_column): Declare.
(cpp_wcwidth): Declare.
(cpp_display_width): New function.
2019-11-14 Joseph Myers <joseph@codesourcery.com>
* charset.c (narrow_str_to_charconst): Make CPP_UTF8CHAR constants
......
......@@ -2265,3 +2265,106 @@ cpp_string_location_reader::get_next ()
m_loc += m_offset_per_column;
return result;
}
/* Helper for cpp_byte_column_to_display_column and its inverse. Given a
   pointer to a UTF-8-encoded character, compute its display width. *INBUFP
   points on entry to the start of the UTF-8 encoding of the character, and
   is updated to point just after the last byte of the encoding. *INBYTESLEFTP
   contains on entry the remaining size of the buffer into which *INBUFP
   points, and this is also updated accordingly. If *INBUFP does not
   point to a valid UTF-8-encoded sequence, then it will be treated as a single
   byte with display width 1. */
static inline int
compute_next_display_width (const uchar **inbufp, size_t *inbytesleftp)
{
  cppchar_t c;
  if (one_utf8_to_cppchar (inbufp, inbytesleftp, &c) != 0)
    {
      /* Input is not convertible to UTF-8. This could be fine, e.g. in a
         string literal, so don't complain. Just treat it as if it has a width
         of one. */
      ++*inbufp;
      --*inbytesleftp;
      return 1;
    }
  /* one_utf8_to_cppchar() has updated inbufp and inbytesleftp for us. */
  return cpp_wcwidth (c);
}
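
The fallback rule described in the comment above (an invalid sequence counts as one byte of width 1) can be illustrated outside libcpp. The Python sketch below is not GCC code. `approx_width` and `next_display_width` are hypothetical names, and the width rule uses Python's `unicodedata` module as a rough stand-in for the generated lookup table, so it may disagree with cpp_wcwidth for some codepoints.

```python
import unicodedata


def approx_width(ch):
    """Approximate display width of one character (assumption: combining
    marks take 0 columns, East Asian wide/fullwidth take 2, all else 1)."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def next_display_width(buf, pos):
    """Return (width, new_pos) for the character starting at buf[pos].
    Invalid or truncated UTF-8 is consumed as a single byte of width 1."""
    first = buf[pos]
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        # Stray continuation byte or a byte that never starts a sequence.
        return 1, pos + 1
    try:
        ch = buf[pos:pos + length].decode("utf-8")
    except UnicodeDecodeError:
        # Bad continuation bytes, overlong form, or truncated sequence.
        return 1, pos + 1
    return approx_width(ch), pos + length


print(next_display_width(b"\xcf\x80x", 0))         # pi: one column, two bytes
print(next_display_width(b"\xf0!\x9f!", 0))        # invalid: one byte, width 1
```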
/* For the string of length DATA_LENGTH bytes that begins at DATA, compute
   how many display columns are occupied by the first COLUMN bytes. COLUMN
   may exceed DATA_LENGTH, in which case the phantom bytes at the end are
   treated as if they have display width 1. */
int
cpp_byte_column_to_display_column (const char *data, int data_length,
                                   int column)
{
  int display_col = 0;
  const uchar *udata = (const uchar *) data;
  const int offset = MAX (0, column - data_length);
  size_t inbytesleft = column - offset;
  while (inbytesleft)
    display_col += compute_next_display_width (&udata, &inbytesleft);
  return display_col + offset;
}
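
A small Python model of the byte-to-display conversion may help when checking the expected values in the self-tests by hand. This is a sketch under stated assumptions, not the libcpp implementation: the function names are hypothetical, widths come from Python's `unicodedata` rather than GCC's generated table, and invalid bytes are mapped one-for-one to placeholder characters with the `surrogateescape` error handler so that each counts as width 1.

```python
import unicodedata


def approx_width(ch):
    """Approximate width: 0 for combining marks, 2 for East Asian
    wide/fullwidth characters, 1 otherwise (an assumption, not GCC's table)."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def byte_column_to_display_column(data, column):
    """Display columns occupied by the first `column` bytes of `data`.
    Bytes requested past the end of `data` count as width 1 each."""
    offset = max(0, column - len(data))
    prefix = data[:column - offset]
    # surrogateescape turns every undecodable byte into one lone surrogate,
    # which approx_width treats as a single column.
    text = prefix.decode("utf-8", errors="surrogateescape")
    return sum(approx_width(ch) for ch in text) + offset


line = b"\xcf\x80 abc"                              # "pi abc", 6 bytes
print(byte_column_to_display_column(line, 6))       # 5
print(byte_column_to_display_column(line, 106))     # 105
```

Cutting the prefix in the middle of a multibyte sequence leaves a partial sequence, whose bytes are then counted one column each; this mirrors the behavior the self-test describes for byte columns that fall inside a UTF-8 sequence.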
/* For the string of length DATA_LENGTH bytes that begins at DATA, compute
   the least number of bytes that will result in at least DISPLAY_COL display
   columns. The return value may exceed DATA_LENGTH if the entire string does
   not occupy enough display columns. */
int
cpp_display_column_to_byte_column (const char *data, int data_length,
                                   int display_col)
{
  int column = 0;
  const uchar *udata = (const uchar *) data;
  size_t inbytesleft = data_length;
  while (column < display_col && inbytesleft)
    column += compute_next_display_width (&udata, &inbytesleft);
  return data_length - inbytesleft + MAX (0, display_col - column);
}
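
The inverse mapping can be modeled the same way. The sketch below is illustrative only and rests on the same assumptions as any Python approximation of this code: hypothetical function names, widths taken from `unicodedata` instead of the generated table, and `surrogateescape` used so that each invalid byte becomes one width-1 placeholder character.

```python
import unicodedata


def approx_width(ch):
    """Approximate width: 0 for combining marks, 2 for East Asian
    wide/fullwidth characters, 1 otherwise (an assumption, not GCC's table)."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_column_to_byte_column(data, display_col):
    """Least number of bytes of `data` that cover at least `display_col`
    display columns; may exceed len(data) if the line is too short."""
    text = data.decode("utf-8", errors="surrogateescape")
    column = 0      # display columns covered so far
    consumed = 0    # bytes consumed so far
    for ch in text:
        if column >= display_col:
            break
        column += approx_width(ch)
        # An escaped invalid byte is one byte; otherwise use the UTF-8 length.
        if "\udc80" <= ch <= "\udcff":
            consumed += 1
        else:
            consumed += len(ch.encode("utf-8"))
    return consumed + max(0, display_col - column)


emoji_line = b"\xf0\x9f\x98\x82 \xf0\x9f\x98\x82 hello"   # 15 bytes
print(display_column_to_byte_column(emoji_line, 1))        # 4
print(display_column_to_byte_column(emoji_line, 111))      # 115
```

Asking for display column 1 of a line that starts with a two-column character returns the byte column of that character's last byte, so a multibyte sequence is never split.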
/* Our own version of wcwidth(). We don't use the actual wcwidth() in glibc,
   because that will inspect the user's locale, and in particular in an ASCII
   locale, it will not return anything useful for extended characters. But GCC
   in other respects (see e.g. _cpp_default_encoding()) behaves as if
   everything is UTF-8. We also make some tweaks that are useful for the way
   GCC needs to use this data, e.g. tabs and other control characters should be
   treated as having width 1. The lookup tables are generated from
   contrib/unicode/gen_wcwidth.py and were made by simply calling glibc
   wcwidth() on all codepoints, then applying the small tweaks. These tables
   are not highly optimized, but for the present purpose of outputting
   diagnostics, they are sufficient. */
#include "generated_cpp_wcwidth.h"
int cpp_wcwidth (cppchar_t c)
{
  if (__builtin_expect (c <= wcwidth_range_ends[0], true))
    return wcwidth_widths[0];

  /* Binary search the tables. */
  int begin = 1;
  static const int end
    = sizeof wcwidth_range_ends / sizeof (*wcwidth_range_ends);
  int len = end - begin;
  do
    {
      int half = len/2;
      int middle = begin + half;
      if (c > wcwidth_range_ends[middle])
        {
          begin = middle + 1;
          len -= half + 1;
        }
      else
        len = half;
    } while (len);

  if (__builtin_expect (begin != end, true))
    return wcwidth_widths[begin];
  return 1;
}
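
The binary search above can be traced on a toy table. The Python port below is a sketch for illustration: `range_lookup` is a hypothetical name, the three-entry tables are made up rather than taken from generated_cpp_wcwidth.h, and a `while` loop replaces the `do ... while` so that a one-entry table is also handled.

```python
def range_lookup(range_ends, range_widths, c):
    """Return the width of codepoint c, where range_ends[i] is the last
    codepoint of run i and range_widths[i] is the width of that run.
    Codepoints beyond the last run get the default width 1."""
    if c <= range_ends[0]:
        # Fast path: the first run (ASCII and friends in the real table).
        return range_widths[0]
    begin = 1
    end = len(range_ends)
    length = end - begin
    while length > 0:
        half = length // 2
        middle = begin + half
        if c > range_ends[middle]:
            # c lies strictly after run `middle`: discard it and the left part.
            begin = middle + 1
            length -= half + 1
        else:
            # c lies in run `middle` or earlier: keep searching the left part.
            length = half
    if begin != end:
        return range_widths[begin]
    return 1


# Toy table (hypothetical): 0..4 -> 1, 5..7 -> 0, 8..11 -> 2.
toy_ends = [4, 7, 11]
toy_widths = [1, 0, 2]
print([range_lookup(toy_ends, toy_widths, c) for c in range(14)])
```

The search finds the first run whose end is not below `c`, which is why the table only needs to store run ends and no run starts.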
......@@ -1320,4 +1320,15 @@ extern bool cpp_userdef_char_p
extern const char * cpp_get_userdef_suffix
(const cpp_token *);
/* In charset.c */
int cpp_byte_column_to_display_column (const char *data, int data_length,
                                       int column);
inline int cpp_display_width (const char *data, int data_length)
{
  return cpp_byte_column_to_display_column (data, data_length, data_length);
}
int cpp_display_column_to_byte_column (const char *data, int data_length,
                                       int display_col);
int cpp_wcwidth (cppchar_t c);
#endif /* ! LIBCPP_CPPLIB_H */