Byte vs column awareness for diagnostic-show-locus.c (PR 49973)

contrib/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* unicode/from_glibc/unicode_utils.py: Support script from
	glibc (commit 464cd3) to extract character widths from Unicode data
	files.
	* unicode/from_glibc/utf8_gen.py: Likewise.
	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/gen_wcwidth.py: New utility to generate
	libcpp/generated_cpp_wcwidth.h with help from the glibc support
	scripts and the Unicode data files.
	* unicode/unicode-license.txt: Added.
	* unicode/README: New explanatory file.

libcpp/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* generated_cpp_wcwidth.h: New file generated by
	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
	* charset.c (compute_next_display_width): New function to help
	implement display columns.
	(cpp_byte_column_to_display_column): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	(cpp_wcwidth): Likewise.
	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
	(cpp_display_column_to_byte_column): Declare.
	(cpp_wcwidth): Declare.
	(cpp_display_width): New function.

gcc/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* input.c (location_compute_display_column): New function to help with
	multibyte awareness in diagnostics.
	(test_cpp_utf8): New self-test.
	(input_c_tests): Call the new test.
	* input.h (location_compute_display_column): Declare.
	* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
	to all classes and functions.
	(enum column_unit): New enum.
	(class exploc_with_display_col): New class.
	(class layout_point): Convert m_column member to array m_columns[2].
	(layout_range::contains_point): Add col_unit argument.
	(test_layout_range_for_single_point): Pass new argument.
	(test_layout_range_for_single_line): Likewise.
	(test_layout_range_for_multiple_lines): Likewise.
	(line_bounds::convert_to_display_cols): New function.
	(layout::get_state_at_point): Add col_unit argument.
	(make_range): Use empty filename rather than dummy filename.
	(get_line_width_without_trailing_whitespace): Rename to...
	(get_line_bytes_without_trailing_whitespace): ...this.
	(test_get_line_width_without_trailing_whitespace): Rename to...
	(test_get_line_bytes_without_trailing_whitespace): ...this.
	(class layout): m_exploc changed to exploc_with_display_col from
	plain expanded_location.
	(layout::get_linenum_width): New accessor member function.
	(layout::get_x_offset_display): Likewise.
	(layout::calculate_linenum_width): New subroutine for the constructor.
	(layout::calculate_x_offset_display): Likewise.
	(layout::layout): Use the new subroutines. Add multibyte awareness.
	(layout::print_source_line): Add multibyte awareness.
	(layout::print_line): Likewise.
	(layout::print_annotation_line): Likewise.
	(line_label::line_label): Likewise.
	(layout::print_any_labels): Likewise.
	(layout::annotation_line_showed_range_p): Likewise.
	(get_printed_columns): Likewise.
	(class line_label): Rename m_length to m_display_width.
	(get_affected_columns): Rename to...
	(get_affected_range): ...this; add col_unit argument and multibyte
	awareness.
	(class correction): Add m_affected_bytes and m_display_cols
	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
	awareness throughout.
	(correction::insertion_p): Add multibyte awareness.
	(correction::compute_display_cols): New function.
	(correction::ensure_terminated): Use new member name m_byte_length.
	(line_corrections::add_hint): Add multibyte awareness.
	(layout::print_trailing_fixits): Likewise.
	(layout::get_x_bound_for_row): Likewise.
	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
	with _utf8 suffix removed, testing multibyte awareness.
	(test_one_liner_caret_and_range_utf8): Likewise.
	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
	(test_one_liner_fixit_insert_before_utf8): Likewise.
	(test_one_liner_fixit_insert_after_utf8): Likewise.
	(test_one_liner_fixit_remove_utf8): Likewise.
	(test_one_liner_fixit_replace_utf8): Likewise.
	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
	(test_one_liner_many_fixits_1_utf8): Likewise.
	(test_one_liner_many_fixits_2_utf8): Likewise.
	(test_one_liner_labels_utf8): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing): Adapt for changes to
	get_affected_columns, get_printed_columns and class corrections.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_linenum_sep): New constant.
	(test_left_margin): Likewise.
	(test_offset_impl): Helper function for new test.
	(test_layout_x_offset_display_utf8): New test.
	(diagnostic_show_locus_c_tests): Call new tests.

gcc/testsuite/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(test_show_locus): Tweak so that expected output is the same as
	before the diagnostic-show-locus.c changes.
	* gcc.dg/cpp/pr66415-1.c: Likewise.

From-SVN: r279137

Commit ee925640, authored Dec 09, 2019 by Lewis Hyatt; committed by David Malcolm on Dec 09, 2019
20 changed files
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+
+	PR preprocessor/49973
+	* unicode/from_glibc/unicode_utils.py: Support script from
+	glibc (commit 464cd3) to extract character widths from Unicode data
+	files.
+	* unicode/from_glibc/utf8_gen.py: Likewise.
+	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
+	* unicode/EastAsianWidth.txt: Likewise.
+	* unicode/PropList.txt: Likewise.
+	* unicode/gen_wcwidth.py: New utility to generate
+	libcpp/generated_cpp_wcwidth.h with help from the glibc support
+	scripts and the Unicode data files.
+	* unicode/unicode-license.txt: Added.
+	* unicode/README: New explanatory file.
+
 2019-12-07  Richard Sandiford  <richard.sandiford@arm.com>

 	* texi2pod.pl: Handle @headitems in @multitables, printing them

--- a/contrib/unicode/EastAsianWidth.txt
+++ b/contrib/unicode/EastAsianWidth.txt
--- a/contrib/unicode/PropList.txt
+++ b/contrib/unicode/PropList.txt
--- a/contrib/unicode/README
+++ b/contrib/unicode/README
+This directory contains a mechanism for GCC to have its own internal
+implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c).
+
+The idea is to produce the necessary lookup table
+(../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the
+following files that are distributed by the Unicode Consortium:
+
+ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
+ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
+ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
+
+These three files have been added to source control in this directory;
+please see unicode-license.txt for the relevant copyright information.
+
+In order to keep in sync with glibc's wcwidth as much as possible, it is
+desirable for the logic that processes the Unicode data to be the same as
+glibc's.  To that end, we also put in this directory, in the from_glibc/
+directory, the glibc python code that implements their logic.  This code was
+copied verbatim from glibc, and it can be updated at any time from the glibc
+source code repository.  The files copied from that repository are:
+
+localedata/unicode-gen/unicode_utils.py
+localedata/unicode-gen/utf8_gen.py
+
+And the most recent versions added to GCC are from glibc git commit:
+2a764c6ee848dfe92cb2921ed3b14085f15d9e79
+
+Finally, the script gen_wcwidth.py found here contains the GCC-specific code to
+map glibc's output to the lookup tables we require.  This script should not need
+to change, unless there are structural changes to the Unicode data files or to
+the glibc code.
+
+The procedure to update GCC's wcwidth tables is the following:
+
+1.  Update the three Unicode data files from the above URLs.
+
+2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
+    the commit number above in this README.
+
+3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
+    (where X.Y is the version of the Unicode standard corresponding to the
+    Unicode data files being used, most recently, 12.1).
+
+After that, GCC's wcwidth will match the most recent glibc.
--- a/contrib/unicode/UnicodeData.txt
+++ b/contrib/unicode/UnicodeData.txt
--- a/contrib/unicode/from_glibc/unicode_utils.py
+++ b/contrib/unicode/from_glibc/unicode_utils.py
--- a/contrib/unicode/from_glibc/utf8_gen.py
+++ b/contrib/unicode/from_glibc/utf8_gen.py
--- a/contrib/unicode/gen_wcwidth.py
+++ b/contrib/unicode/gen_wcwidth.py
+#!/usr/bin/env python3
+#
+# Script to generate tables for cpp_wcwidth, leveraging glibc's utf8_gen.py.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+import sys
+import os
+
+if len(sys.argv) != 2:
+    print("usage: %s <unicode version>" % sys.argv[0], file=sys.stderr)
+    sys.exit(1)
+unicode_version = sys.argv[1]
+
+# Parse a codepoint in the format output by glibc tools.
+def parse_ucn(s):
+    if not (s.startswith("<U") and s.endswith(">")):
+        raise ValueError
+    return int(s[2:-1], base=16)
+
+# Process a line of width output from utf8_gen.py and update global array.
+widths = [1] * (1 + 0x10FFFF)
+def process_width(line):
+    # Example lines:
+    # <UA8FF>	0
+    # <UA926>...<UA92D>	0
+
+    s = line.split()
+    width = int(s[1])
+    r = s[0].split("...")
+    if len(r) == 1:
+        begin = parse_ucn(r[0])
+        end = begin + 1
+    elif len(r) == 2:
+        begin = parse_ucn(r[0])
+        end = parse_ucn(r[1]) + 1
+    else:
+        raise ValueError
+    widths[begin:end] = [width] * (end - begin)
+
+# To keep things simple, we use glibc utf8_gen.py as-is.  It only outputs to a
+# file named UTF-8, which is not configurable.  Then we parse this into the form
+# we want it.
+os.system("from_glibc/utf8_gen.py --unicode_version %s" % unicode_version)
+processing = False
+for line in open("UTF-8", "r"):
+    if processing:
+        if line == "END WIDTH\n":
+            processing = False
+        else:
+            try:
+                process_width(line)
+            except (ValueError, IndexError) as e:
+                print(e, "warning: ignored unexpected line: %s" % line,
+                        file=sys.stderr, end="")
+    elif line == "WIDTH\n":
+        processing = True
+
+# All code points < 256 we treat as width 1.
+widths[0:256] = [1] * 256
+
+# Condense the list to contiguous ranges.
+cur_range = [-1, 1]
+all_ranges = []
+for i, width in enumerate(widths):
+    if width == cur_range[1]:
+        cur_range[0] = i
+    else:
+        all_ranges.append(cur_range)
+        cur_range = [i, width]
+
+# Output the arrays for generated_cpp_wcwidth.h
+print("/*  Generated by contrib/unicode/gen_wcwidth.py,",
+          "with the help of glibc's")
+print("    utf8_gen.py, using version %s" % unicode_version,
+          "of the Unicode standard.  */")
+print("\nstatic const cppchar_t wcwidth_range_ends[] = {", end="")
+for i, r in enumerate(all_ranges):
+    if i % 8:
+        print(" ", end="")
+    else:
+        print("\n  ", end="")
+    print("0x%x," % (r[0]), end="")
+print("\n};\n")
+print("static const unsigned char wcwidth_widths[] = {", end="")
+for i, r in enumerate(all_ranges):
+    if i % 24:
+        print(" ", end="")
+    else:
+        print("\n  ", end="")
+    print("%d," % r[1], end="")
+print("\n};")
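The condensing step in gen_wcwidth.py and the lookup it enables can be sketched in isolation. The Python below is a minimal illustration under stated assumptions, not the code GCC ships: `condense` and `lookup_width` are hypothetical names, the input is toy data rather than real Unicode widths, and this version also records the final run, which the script's own loop leaves to a default width.

```python
from bisect import bisect_left


def condense(widths):
    """Collapse a per-code-point width list into parallel range tables."""
    range_ends = []    # last code point of each run of equal widths
    range_widths = []  # the width shared by that run
    for code_point, width in enumerate(widths):
        if range_widths and range_widths[-1] == width:
            range_ends[-1] = code_point    # extend the current run
        else:
            range_ends.append(code_point)  # start a new run
            range_widths.append(width)
    return range_ends, range_widths


def lookup_width(range_ends, range_widths, code_point, default=1):
    """Binary-search the tables for the run containing code_point."""
    i = bisect_left(range_ends, code_point)
    return range_widths[i] if i < len(range_ends) else default


# Toy data, not real Unicode widths: code points 0..8.
toy_widths = [1, 1, 1, 0, 0, 2, 2, 2, 1]
ends, run_widths = condense(toy_widths)
print(ends)        # [2, 4, 7, 8]
print(run_widths)  # [1, 0, 2, 1]
print(lookup_width(ends, run_widths, 3))  # 0
```

Storing only the end of each run keeps the table small, and a code point past the last recorded end falls back to the default width.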
--- a/contrib/unicode/unicode-license.txt
+++ b/contrib/unicode/unicode-license.txt
+UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
+
+    Unicode Data Files include all data files under the directories
+http://www.unicode.org/Public/, http://www.unicode.org/reports/, and
+http://www.unicode.org/cldr/data/. Unicode Data Files do not include PDF
+online code charts under the directory http://www.unicode.org/Public/.
+Software includes any source code published in the Unicode Standard or under
+the directories http://www.unicode.org/Public/,
+http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.
+
+    NOTICE TO USER: Carefully read the following legal agreement. BY
+DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES
+("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND
+AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF
+YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA
+FILES OR SOFTWARE.
+
+    COPYRIGHT AND PERMISSION NOTICE
+
+    Copyright © 1991-2013 Unicode, Inc. All rights reserved. Distributed under
+the Terms of Use in http://www.unicode.org/copyright.html.
+
+    Permission is hereby granted, free of charge, to any person obtaining a
+copy of the Unicode data files and any associated documentation (the "Data
+Files") or Unicode software and any associated documentation (the "Software")
+to deal in the Data Files or Software without restriction, including without
+limitation the rights to use, copy, modify, merge, publish, distribute, and/or
+sell copies of the Data Files or Software, and to permit persons to whom the
+Data Files or Software are furnished to do so, provided that (a) the above
+copyright notice(s) and this permission notice appear with all copies of the
+Data Files or Software, (b) both the above copyright notice(s) and this
+permission notice appear in associated documentation, and (c) there is clear
+notice in each modified Data File or in the Software as well as in the
+documentation associated with the Data File(s) or Software that the data or
+software has been modified.
+
+    THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
+KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD
+PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN
+THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
+DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
+PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
+ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE
+DATA FILES OR SOFTWARE.
+
+    Except as contained in this notice, the name of a copyright holder shall
+not be used in advertising or otherwise to promote the sale, use or other
+dealings in these Data Files or Software without prior written authorization
+of the copyright holder.
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+
+	PR preprocessor/49973
+	* input.c (location_compute_display_column): New function to help with
+	multibyte awareness in diagnostics.
+	(test_cpp_utf8): New self-test.
+	(input_c_tests): Call the new test.
+	* input.h (location_compute_display_column): Declare.
+	* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
+	to all classes and functions.
+	(enum column_unit): New enum.
+	(class exploc_with_display_col): New class.
+	(class layout_point): Convert m_column member to array m_columns[2].
+	(layout_range::contains_point): Add col_unit argument.
+	(test_layout_range_for_single_point): Pass new argument.
+	(test_layout_range_for_single_line): Likewise.
+	(test_layout_range_for_multiple_lines): Likewise.
+	(line_bounds::convert_to_display_cols): New function.
+	(layout::get_state_at_point): Add col_unit argument.
+	(make_range): Use empty filename rather than dummy filename.
+	(get_line_width_without_trailing_whitespace): Rename to...
+	(get_line_bytes_without_trailing_whitespace): ...this.
+	(test_get_line_width_without_trailing_whitespace): Rename to...
+	(test_get_line_bytes_without_trailing_whitespace): ...this.
+	(class layout): m_exploc changed to exploc_with_display_col from
+	plain expanded_location.
+	(layout::get_linenum_width): New accessor member function.
+	(layout::get_x_offset_display): Likewise.
+	(layout::calculate_linenum_width): New subroutine for the constructor.
+	(layout::calculate_x_offset_display): Likewise.
+	(layout::layout): Use the new subroutines. Add multibyte awareness.
+	(layout::print_source_line): Add multibyte awareness.
+	(layout::print_line): Likewise.
+	(layout::print_annotation_line): Likewise.
+	(line_label::line_label): Likewise.
+	(layout::print_any_labels): Likewise.
+	(layout::annotation_line_showed_range_p): Likewise.
+	(get_printed_columns): Likewise.
+	(class line_label): Rename m_length to m_display_width.
+	(get_affected_columns): Rename to...
+	(get_affected_range): ...this; add col_unit argument and multibyte
+	awareness.
+	(class correction): Add m_affected_bytes and m_display_cols
+	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
+	awareness throughout.
+	(correction::insertion_p): Add multibyte awareness.
+	(correction::compute_display_cols): New function.
+	(correction::ensure_terminated): Use new member name m_byte_length.
+	(line_corrections::add_hint): Add multibyte awareness.
+	(layout::print_trailing_fixits): Likewise.
+	(layout::get_x_bound_for_row): Likewise.
+	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
+	with _utf8 suffix removed, testing multibyte awareness.
+	(test_one_liner_caret_and_range_utf8): Likewise.
+	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
+	(test_one_liner_fixit_insert_before_utf8): Likewise.
+	(test_one_liner_fixit_insert_after_utf8): Likewise.
+	(test_one_liner_fixit_remove_utf8): Likewise.
+	(test_one_liner_fixit_replace_utf8): Likewise.
+	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
+	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
+	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
+	(test_one_liner_many_fixits_1_utf8): Likewise.
+	(test_one_liner_many_fixits_2_utf8): Likewise.
+	(test_one_liner_labels_utf8): Likewise.
+	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
+	(test_overlapped_fixit_printing_utf8): Likewise.
+	(test_overlapped_fixit_printing): Adapt for changes to
+	get_affected_columns, get_printed_columns and class corrections.
+	(test_overlapped_fixit_printing_2): Likewise.
+	(test_linenum_sep): New constant.
+	(test_left_margin): Likewise.
+	(test_offset_impl): Helper function for new test.
+	(test_layout_x_offset_display_utf8): New test.
+	(diagnostic_show_locus_c_tests): Call new tests.
+
 2019-12-09  Eric Botcazou  <ebotcazou@adacore.com>

 	* tree.c (build_array_type_1): Add SET_CANONICAL parameter and compute
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -908,6 +908,22 @@ make_location (location_t caret, source_range src_range)
  return COMBINE_LOCATION_DATA (line_table, pure_loc, src_range, NULL);
 }

+/* An expanded_location stores the column in byte units.  This function
+   converts that column to display units.  That requires reading the associated
+   source line in order to calculate the display width.  If that cannot be done
+   for any reason, then returns the byte column as a fallback.  */
+int
+location_compute_display_column (expanded_location exploc)
+{
+  if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
+    return exploc.column;
+  char_span line = location_get_source_line (exploc.file, exploc.line);
+  /* If line is NULL, this function returns exploc.column which is the
+     desired fallback.  */
+  return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
+					    exploc.column);
+}
+
 /* Dump statistics to stderr about the memory usage of the line_table
   set of line maps.  This also displays some statistics about macro
   expansion.  */
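The byte-to-display conversion that location_compute_display_column relies on can be modelled outside GCC. The Python below is a rough sketch under stated assumptions: `char_width` stands in for cpp_wcwidth and uses Python's unicodedata module instead of GCC's generated table, so results may differ for some code points, and every function name here is hypothetical. It follows the behaviour described in this patch: invalid UTF-8 bytes count as width 1, and columns past the end of the line count as one display column each.

```python
import unicodedata


def char_width(ch):
    """Approximate display width of one code point (stand-in for cpp_wcwidth)."""
    if unicodedata.combining(ch):
        return 0  # combining marks occupy no column of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters occupy two columns
    return 1


def next_display_width(data, pos):
    """Return (width, bytes_consumed) for the character starting at data[pos]."""
    first = data[pos]
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        return 1, 1  # stray continuation or invalid lead byte
    try:
        ch = data[pos:pos + length].decode("utf-8")
    except UnicodeDecodeError:
        return 1, 1  # invalid or truncated sequence: one byte, width 1
    return char_width(ch), length


def byte_column_to_display_column(data, column):
    """Display columns occupied by the first `column` bytes of `data`."""
    offset = max(0, column - len(data))  # phantom bytes past the end
    prefix = data[:column - offset]
    pos = 0
    display_col = 0
    while pos < len(prefix):
        width, used = next_display_width(prefix, pos)
        display_col += width
        pos += used
    return display_col + offset


line = b"\xcf\x80 abc"  # U+03C0, a space, then "abc": 6 bytes
print(byte_column_to_display_column(line, 6))    # 5
print(byte_column_to_display_column(line, 106))  # 105
```

Truncating the buffer to the requested column before decoding means a column that lands inside a multibyte sequence counts the partial bytes one by one, which matches the fallback the self-test in this patch expects.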
@@ -3590,6 +3606,93 @@ test_line_offset_overflow ()
  ASSERT_NE (ordmap_a, ordmap_b);
 }

+void test_cpp_utf8 ()
+{
+  /* Verify that wcwidth of invalid UTF-8 or control bytes is 1.  */
+  {
+    int w_bad = cpp_display_width ("\xf0!\x9f!\x98!\x82!", 8);
+    ASSERT_EQ (8, w_bad);
+    int w_ctrl = cpp_display_width ("\r\t\n\v\0\1", 6);
+    ASSERT_EQ (6, w_ctrl);
+  }
+
+  /* Verify that wcwidth of valid UTF-8 is as expected.  */
+  {
+    const int w_pi = cpp_display_width ("\xcf\x80", 2);
+    ASSERT_EQ (1, w_pi);
+    const int w_emoji = cpp_display_width ("\xf0\x9f\x98\x82", 4);
+    ASSERT_EQ (2, w_emoji);
+    const int w_umlaut_precomposed = cpp_display_width ("\xc3\xbf", 2);
+    ASSERT_EQ (1, w_umlaut_precomposed);
+    const int w_umlaut_combining = cpp_display_width ("y\xcc\x88", 3);
+    ASSERT_EQ (1, w_umlaut_combining);
+    const int w_han = cpp_display_width ("\xe4\xb8\xba", 3);
+    ASSERT_EQ (2, w_han);
+    const int w_ascii = cpp_display_width ("GCC", 3);
+    ASSERT_EQ (3, w_ascii);
+    const int w_mixed = cpp_display_width ("\xcf\x80 = 3.14 \xf0\x9f\x98\x82"
+					   "\x9f! \xe4\xb8\xba y\xcc\x88", 24);
+    ASSERT_EQ (18, w_mixed);
+  }
+
+  /* Verify that cpp_byte_column_to_display_column can go past the end,
+     and similar edge cases.  */
+  {
+    const char *str
+      /* Display columns.
+         111111112345  */
+      = "\xcf\x80 abc";
+      /* 111122223456
+	 Byte columns.  */
+
+    ASSERT_EQ (5, cpp_display_width (str, 6));
+    ASSERT_EQ (105, cpp_byte_column_to_display_column (str, 6, 106));
+    ASSERT_EQ (10000, cpp_byte_column_to_display_column (NULL, 0, 10000));
+    ASSERT_EQ (0, cpp_byte_column_to_display_column (NULL, 10000, 0));
+  }
+
+  /* Verify that cpp_display_column_to_byte_column can go past the end,
+     and similar edge cases, and check invertibility.  */
+  {
+    const char *str
+      /* Display columns.
+	 000000000000000000000000000000000000011
+	 111111112222222234444444455555555678901  */
+      = "\xf0\x9f\x98\x82 \xf0\x9f\x98\x82 hello";
+      /* 000000000000000000000000000000000111111
+	 111122223333444456666777788889999012345
+	 Byte columns.  */
+    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 2));
+    ASSERT_EQ (15, cpp_display_column_to_byte_column (str, 15, 11));
+    ASSERT_EQ (115, cpp_display_column_to_byte_column (str, 15, 111));
+    ASSERT_EQ (10000, cpp_display_column_to_byte_column (NULL, 0, 10000));
+    ASSERT_EQ (0, cpp_display_column_to_byte_column (NULL, 10000, 0));
+
+    /* Verify that we do not interrupt a UTF-8 sequence.  */
+    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 1));
+
+    for (int byte_col = 1; byte_col <= 15; ++byte_col)
+      {
+	const int disp_col = cpp_byte_column_to_display_column (str, 15,
+								byte_col);
+	const int byte_col2 = cpp_display_column_to_byte_column (str, 15,
+								 disp_col);
+
+	/* If we ask for the display column in the middle of a UTF-8
+	   sequence, it will return the length of the partial sequence,
+	   matching the behavior of GCC before display column support.
+	   Otherwise check the round trip was successful.  */
+	if (byte_col < 4)
+	  ASSERT_EQ (byte_col, disp_col);
+	else if (byte_col >= 6 && byte_col < 9)
+	  ASSERT_EQ (3 + (byte_col - 5), disp_col);
+	else
+	  ASSERT_EQ (byte_col2, byte_col);
+      }
+  }
+
+}
+
 /* Run all of the selftests within this file.  */

 void
@@ -3631,6 +3734,8 @@ input_c_tests ()
  test_reading_source_line ();

  test_line_offset_overflow ();
+
+  test_cpp_utf8 ();
 }

 } // namespace selftest
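The inverse direction exercised by test_cpp_utf8 can be sketched the same way. This Python is illustrative only and assumes several things not in the patch: it uses the `surrogateescape` error handler to keep each invalid byte as a single placeholder character, `char_width` is a stand-in for cpp_wcwidth built on unicodedata, and both function names are hypothetical. It returns the least number of bytes that cover at least the requested number of display columns, and never stops in the middle of a UTF-8 sequence.

```python
import unicodedata


def char_width(ch):
    """Approximate display width (stand-in for cpp_wcwidth)."""
    if 0xDC80 <= ord(ch) <= 0xDCFF:
        return 1  # an escaped invalid byte: treat as width 1
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_column_to_byte_column(data, display_col):
    """Least byte count whose display width is at least display_col."""
    text = data.decode("utf-8", errors="surrogateescape")
    shown = 0  # display columns covered so far
    used = 0   # bytes consumed so far
    for ch in text:
        if shown >= display_col:
            break
        shown += char_width(ch)
        used += len(ch.encode("utf-8", errors="surrogateescape"))
    if shown < display_col:
        # Ran out of data: each missing column counts as one phantom byte.
        return used + (display_col - shown)
    return used


# Two width-2 emoji separated by a space, then " hello": 15 bytes in total.
line = b"\xf0\x9f\x98\x82 \xf0\x9f\x98\x82 hello"
print(display_column_to_byte_column(line, 1))    # 4
print(display_column_to_byte_column(line, 111))  # 115
```

Asking for display column 1 yields 4 bytes because the first character is four bytes long and two columns wide, so no shorter prefix ends on a character boundary.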

--- a/gcc/input.h
+++ b/gcc/input.h
@@ -38,6 +38,7 @@ STATIC_ASSERT (BUILTINS_LOCATION < RESERVED_LOCATION_COUNT);

 extern bool is_location_from_builtin_token (location_t);
 extern expanded_location expand_location (location_t);
+extern int location_compute_display_column (expanded_location);

 /* A class capturing the bounds of a buffer, to allow for run-time
   bounds-checking in a checked build.  */

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+
+	PR preprocessor/49973
+	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+	(test_show_locus): Tweak so that expected output is the same as
+	before the diagnostic-show-locus.c changes.
+	* gcc.dg/cpp/pr66415-1.c: Likewise.
+
 2019-12-09  Eric Botcazou  <ebotcazou@adacore.com>

 	* gnat.dg/lto23.adb: New test.

--- a/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
 /* PR c/66415 */
 /* { dg-do compile } */
 /* { dg-options "-Wformat -fdiagnostics-show-caret" } */
-/* { dg-set-compiler-env-var COLUMNS "82" } */
+/* { dg-set-compiler-env-var COLUMNS "83" } */

 void
 fn1 (void)

--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
@@ -174,7 +174,7 @@ test_show_locus (function *fun)

  /* Hardcode the "terminal width", to verify the behavior of
     very wide lines.  */
-  global_dc->caret_max_width = 70;
+  global_dc->caret_max_width = 71;

  if (0 == strcmp (fnname, "test_simple"))
    {

--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+
+	PR preprocessor/49973
+	* generated_cpp_wcwidth.h: New file generated by
+	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
+	* charset.c (compute_next_display_width): New function to help
+	implement display columns.
+	(cpp_byte_column_to_display_column): Likewise.
+	(cpp_display_column_to_byte_column): Likewise.
+	(cpp_wcwidth): Likewise.
+	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
+	(cpp_display_column_to_byte_column): Declare.
+	(cpp_wcwidth): Declare.
+	(cpp_display_width): New function.
+
 2019-11-14  Joseph Myers  <joseph@codesourcery.com>

 	* charset.c (narrow_str_to_charconst): Make CPP_UTF8CHAR constants

--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -2265,3 +2265,106 @@ cpp_string_location_reader::get_next ()
    m_loc += m_offset_per_column;
  return result;
 }
+
+/* Helper for cpp_byte_column_to_display_column and its inverse.  Given a
+   pointer to a UTF-8-encoded character, compute its display width.  *INBUFP
+   points on entry to the start of the UTF-8 encoding of the character, and
+   is updated to point just after the last byte of the encoding.  *INBYTESLEFTP
+   contains on entry the remaining size of the buffer into which *INBUFP
+   points, and this is also updated accordingly.  If *INBUFP does not
+   point to a valid UTF-8-encoded sequence, then it will be treated as a single
+   byte with display width 1.  */
+
+static inline int
+compute_next_display_width (const uchar **inbufp, size_t *inbytesleftp)
+{
+  cppchar_t c;
+  if (one_utf8_to_cppchar (inbufp, inbytesleftp, &c) != 0)
+    {
+      /* Input is not valid UTF-8.  This could be fine, e.g. in a
+	 string literal, so don't complain.  Just treat it as if it has a width
+	 of one.  */
+      ++*inbufp;
+      --*inbytesleftp;
+      return 1;
+    }
+
+  /* one_utf8_to_cppchar() has updated *INBUFP and *INBYTESLEFTP for us.  */
+  return cpp_wcwidth (c);
+}
+
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    how many display columns are occupied by the first COLUMN bytes.  COLUMN
+    may exceed DATA_LENGTH, in which case the phantom bytes at the end are
+    treated as if they have display width 1.  */
+
+int
+cpp_byte_column_to_display_column (const char *data, int data_length,
+				   int column)
+{
+  int display_col = 0;
+  const uchar *udata = (const uchar *) data;
+  const int offset = MAX (0, column - data_length);
+  size_t inbytesleft = column - offset;
+  while (inbytesleft)
+    display_col += compute_next_display_width (&udata, &inbytesleft);
+  return display_col + offset;
+}
+
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    the least number of bytes that will result in at least DISPLAY_COL display
+    columns.  The return value may exceed DATA_LENGTH if the entire string does
+    not occupy enough display columns.  */
+
+int
+cpp_display_column_to_byte_column (const char *data, int data_length,
+				   int display_col)
+{
+  int column = 0;
+  const uchar *udata = (const uchar *) data;
+  size_t inbytesleft = data_length;
+  while (column < display_col && inbytesleft)
+    column += compute_next_display_width (&udata, &inbytesleft);
+  return data_length - inbytesleft + MAX (0, display_col - column);
+}
+
+/* Our own version of wcwidth().  We don't use the actual wcwidth() in glibc,
+   because that will inspect the user's locale, and in particular in an ASCII
+   locale, it will not return anything useful for extended characters.  But GCC
+   in other respects (see e.g. _cpp_default_encoding()) behaves as if
+   everything is UTF-8.  We also make some tweaks that are useful for the way
+   GCC needs to use this data, e.g. tabs and other control characters should be
+   treated as having width 1.  The lookup tables are generated by
+   contrib/unicode/gen_wcwidth.py and were made by simply calling glibc
+   wcwidth() on all codepoints, then applying the small tweaks.  These tables
+   are not highly optimized, but for the present purpose of outputting
+   diagnostics, they are sufficient.  */
+
+#include "generated_cpp_wcwidth.h"
+int cpp_wcwidth (cppchar_t c)
+{
+  if (__builtin_expect (c <= wcwidth_range_ends[0], true))
+    return wcwidth_widths[0];
+
+  /* Binary search the tables.  */
+  int begin = 1;
+  static const int end
+      = sizeof wcwidth_range_ends / sizeof (*wcwidth_range_ends);
+  int len = end - begin;
+  do
+    {
+      int half = len / 2;
+      int middle = begin + half;
+      if (c > wcwidth_range_ends[middle])
+	{
+	  begin = middle + 1;
+	  len -= half + 1;
+	}
+      else
+	len = half;
+    } while (len);
+
+  if (__builtin_expect (begin != end, true))
+    return wcwidth_widths[begin];
+  return 1;
+}
--- a/libcpp/generated_cpp_wcwidth.h
+++ b/libcpp/generated_cpp_wcwidth.h
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -1320,4 +1320,15 @@ extern bool cpp_userdef_char_p
 extern const char * cpp_get_userdef_suffix
  (const cpp_token *);

+/* In charset.c */
+int cpp_byte_column_to_display_column (const char *data, int data_length,
+				       int column);
+inline int cpp_display_width (const char *data, int data_length)
+{
+    return cpp_byte_column_to_display_column (data, data_length, data_length);
+}
+int cpp_display_column_to_byte_column (const char *data, int data_length,
+				       int display_col);
+int cpp_wcwidth (cppchar_t c);
+
 #endif /* ! LIBCPP_CPPLIB_H */
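
For readers who want to experiment with the byte-column vs. display-column
distinction outside of libcpp, the following is a minimal standalone sketch
of the same idea.  It is not the code from this patch.  Every name prefixed
with toy_ is hypothetical, the width table is a coarse hand-written
approximation rather than the generated one, and the UTF-8 decoder is
deliberately simplified (it does not reject overlong encodings or
surrogates).  As in the patch, an undecodable byte counts as one column, and
bytes requested past the end of the buffer count as one column each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy table, NOT generated_cpp_wcwidth.h: entry I is the last
   code point of range I, and toy_widths[I] is the width of that range.  */
static const unsigned int toy_range_ends[]
  = { 0x10FF, 0x115F, 0x2E7F, 0x9FFF, 0x10FFFF };
static const unsigned char toy_widths[] = { 1, 2, 1, 2, 1 };

/* Binary search for the first range whose end is >= C.  */
static int
toy_wcwidth (unsigned int c)
{
  const size_t n = sizeof toy_range_ends / sizeof *toy_range_ends;
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c > toy_range_ends[mid])
	lo = mid + 1;
      else
	hi = mid;
    }
  return lo < n ? toy_widths[lo] : 1;
}

/* Simplified UTF-8 decoder.  Requires LEFT >= 1.  Returns the number of
   bytes consumed and stores the code point in *C, or returns 0 if the
   sequence is malformed or truncated.  */
static size_t
toy_decode (const unsigned char *s, size_t left, unsigned int *c)
{
  size_t n;
  if (s[0] < 0x80)
    {
      *c = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    n = 2, *c = s[0] & 0x1F;
  else if ((s[0] & 0xF0) == 0xE0)
    n = 3, *c = s[0] & 0x0F;
  else if ((s[0] & 0xF8) == 0xF0)
    n = 4, *c = s[0] & 0x07;
  else
    return 0;
  if (n > left)
    return 0;
  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
	return 0;
      *c = (*c << 6) | (s[i] & 0x3F);
    }
  return n;
}

/* Display columns occupied by the first COLUMN bytes of DATA, which is
   DATA_LENGTH bytes long.  Bytes past the end count as one column each.  */
int
toy_byte_to_display_column (const char *data, int data_length, int column)
{
  assert (data != NULL && data_length >= 0 && column >= 0);
  const unsigned char *s = (const unsigned char *) data;
  int phantom = column > data_length ? column - data_length : 0;
  size_t left = (size_t) (column - phantom);
  int display_col = 0;
  while (left)
    {
      unsigned int c;
      size_t n = toy_decode (s, left, &c);
      if (n == 0)
	{
	  /* Undecodable byte: consume it and count one column.  */
	  n = 1;
	  display_col += 1;
	}
      else
	display_col += toy_wcwidth (c);
      s += n;
      left -= n;
    }
  return display_col + phantom;
}
```

With this sketch, the four bytes of "a\xC3\xA9" "b" (a, U+00E9, b) occupy
three display columns, while the three bytes "\xE4\xB8\xAD" (U+4E2D) occupy
two.  This difference between byte offsets and display columns is what the
caret-placement changes in diagnostic-show-locus.c have to account for.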