Byte vs column awareness for diagnostic-show-locus.c (PR 49973)

contrib/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * unicode/from_glibc/unicode_utils.py: Support script from glibc (commit 464cd3) to extract character widths from Unicode data files. * unicode/from_glibc/utf8_gen.py: Likewise. * unicode/UnicodeData.txt: Unicode v. 12.1.0 data file. * unicode/EastAsianWidth.txt: Likewise. * unicode/PropList.txt: Likewise. * unicode/gen_wcwidth.py: New utility to generate libcpp/generated_cpp_wcwidth.h with help from the glibc support scripts and the Unicode data files. * unicode/unicode-license.txt: Added. * unicode/README: New explanatory file. libcpp/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * generated_cpp_wcwidth.h: New file generated by ../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function. * charset.c (compute_next_display_width): New function to help implement display columns. (cpp_byte_column_to_display_column): Likewise. (cpp_display_column_to_byte_column): Likewise. (cpp_wcwidth): Likewise. * include/cpplib.h (cpp_byte_column_to_display_column): Declare. (cpp_display_column_to_byte_column): Declare. (cpp_wcwidth): Declare. (cpp_display_width): New function. gcc/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * input.c (location_compute_display_column): New function to help with multibyte awareness in diagnostics. (test_cpp_utf8): New self-test. (input_c_tests): Call the new test. * input.h (location_compute_display_column): Declare. * diagnostic-show-locus.c: Pervasive changes to add multibyte awareness to all classes and functions. (enum column_unit): New enum. (class exploc_with_display_col): New class. (class layout_point): Convert m_column member to array m_columns[2]. (layout_range::contains_point): Add col_unit argument. (test_layout_range_for_single_point): Pass new argument. (test_layout_range_for_single_line): Likewise. (test_layout_range_for_multiple_lines): Likewise. (line_bounds::convert_to_display_cols): New function. (layout::get_state_at_point): Add col_unit argument. (make_range): Use empty filename rather than dummy filename. (get_line_width_without_trailing_whitespace): Rename to... (get_line_bytes_without_trailing_whitespace): ...this. (test_get_line_width_without_trailing_whitespace): Rename to... (test_get_line_bytes_without_trailing_whitespace): ...this. (class layout): m_exploc changed to exploc_with_display_col from plain expanded_location. (layout::get_linenum_width): New accessor member function. (layout::get_x_offset_display): Likewise. (layout::calculate_linenum_width): New subroutine for the constuctor. (layout::calculate_x_offset_display): Likewise. (layout::layout): Use the new subroutines. Add multibyte awareness. (layout::print_source_line): Add multibyte awareness. (layout::print_line): Likewise. (layout::print_annotation_line): Likewise. (line_label::line_label): Likewise. (layout::print_any_labels): Likewise. (layout::annotation_line_showed_range_p): Likewise. (get_printed_columns): Likewise. (class line_label): Rename m_length to m_display_width. (get_affected_columns): Rename to... (get_affected_range): ...this; add col_unit argument and multibyte awareness. (class correction): Add m_affected_bytes and m_display_cols members. Rename m_len to m_byte_length for clarity. Add multibyte awareness throughout. (correction::insertion_p): Add multibyte awareness. (correction::compute_display_cols): New function. (correction::ensure_terminated): Use new member name m_byte_length. (line_corrections::add_hint): Add multibyte awareness. (layout::print_trailing_fixits): Likewise. (layout::get_x_bound_for_row): Likewise. (test_one_liner_simple_caret_utf8): New self-test analogous to the one with _utf8 suffix removed, testing multibyte awareness. (test_one_liner_caret_and_range_utf8): Likewise. (test_one_liner_multiple_carets_and_ranges_utf8): Likewise. (test_one_liner_fixit_insert_before_utf8): Likewise. (test_one_liner_fixit_insert_after_utf8): Likewise. (test_one_liner_fixit_remove_utf8): Likewise. (test_one_liner_fixit_replace_utf8): Likewise. (test_one_liner_fixit_replace_non_equal_range_utf8): Likewise. (test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise. (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise. (test_one_liner_many_fixits_1_utf8): Likewise. (test_one_liner_many_fixits_2_utf8): Likewise. (test_one_liner_labels_utf8): Likewise. (test_diagnostic_show_locus_one_liner_utf8): Likewise. (test_overlapped_fixit_printing_utf8): Likewise. (test_overlapped_fixit_printing): Adapt for changes to get_affected_columns, get_printed_columns and class corrections. (test_overlapped_fixit_printing_2): Likewise. (test_linenum_sep): New constant. (test_left_margin): Likewise. (test_offset_impl): Helper function for new test. (test_layout_x_offset_display_utf8): New test. (diagnostic_show_locus_c_tests): Call new tests. gcc/testsuite/ChangeLog: 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * gcc.dg/plugin/diagnostic_plugin_test_show_locus.c (test_show_locus): Tweak so that expected output is the same as before the diagnostic-show-locus.c changes. * gcc.dg/cpp/pr66415-1.c: Likewise. From-SVN: r279137

Byte vs column awareness for diagnostic-show-locus.c (PR 49973)
contrib/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * unicode/from_glibc/unicode_utils.py: Support script from glibc (commit 464cd3) to extract character widths from Unicode data files. * unicode/from_glibc/utf8_gen.py: Likewise. * unicode/UnicodeData.txt: Unicode v. 12.1.0 data file. * unicode/EastAsianWidth.txt: Likewise. * unicode/PropList.txt: Likewise. * unicode/gen_wcwidth.py: New utility to generate libcpp/generated_cpp_wcwidth.h with help from the glibc support scripts and the Unicode data files. * unicode/unicode-license.txt: Added. * unicode/README: New explanatory file. libcpp/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * generated_cpp_wcwidth.h: New file generated by ../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function. * charset.c (compute_next_display_width): New function to help implement display columns. (cpp_byte_column_to_display_column): Likewise. (cpp_display_column_to_byte_column): Likewise. (cpp_wcwidth): Likewise. * include/cpplib.h (cpp_byte_column_to_display_column): Declare. (cpp_display_column_to_byte_column): Declare. (cpp_wcwidth): Declare. (cpp_display_width): New function. gcc/ChangeLog 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * input.c (location_compute_display_column): New function to help with multibyte awareness in diagnostics. (test_cpp_utf8): New self-test. (input_c_tests): Call the new test. * input.h (location_compute_display_column): Declare. * diagnostic-show-locus.c: Pervasive changes to add multibyte awareness to all classes and functions. (enum column_unit): New enum. (class exploc_with_display_col): New class. (class layout_point): Convert m_column member to array m_columns[2]. (layout_range::contains_point): Add col_unit argument. (test_layout_range_for_single_point): Pass new argument. (test_layout_range_for_single_line): Likewise. (test_layout_range_for_multiple_lines): Likewise. (line_bounds::convert_to_display_cols): New function. (layout::get_state_at_point): Add col_unit argument. (make_range): Use empty filename rather than dummy filename. (get_line_width_without_trailing_whitespace): Rename to... (get_line_bytes_without_trailing_whitespace): ...this. (test_get_line_width_without_trailing_whitespace): Rename to... (test_get_line_bytes_without_trailing_whitespace): ...this. (class layout): m_exploc changed to exploc_with_display_col from plain expanded_location. (layout::get_linenum_width): New accessor member function. (layout::get_x_offset_display): Likewise. (layout::calculate_linenum_width): New subroutine for the constuctor. (layout::calculate_x_offset_display): Likewise. (layout::layout): Use the new subroutines. Add multibyte awareness. (layout::print_source_line): Add multibyte awareness. (layout::print_line): Likewise. (layout::print_annotation_line): Likewise. (line_label::line_label): Likewise. (layout::print_any_labels): Likewise. (layout::annotation_line_showed_range_p): Likewise. (get_printed_columns): Likewise. (class line_label): Rename m_length to m_display_width. (get_affected_columns): Rename to... (get_affected_range): ...this; add col_unit argument and multibyte awareness. (class correction): Add m_affected_bytes and m_display_cols members. Rename m_len to m_byte_length for clarity. Add multibyte awareness throughout. (correction::insertion_p): Add multibyte awareness. (correction::compute_display_cols): New function. (correction::ensure_terminated): Use new member name m_byte_length. (line_corrections::add_hint): Add multibyte awareness. (layout::print_trailing_fixits): Likewise. (layout::get_x_bound_for_row): Likewise. (test_one_liner_simple_caret_utf8): New self-test analogous to the one with _utf8 suffix removed, testing multibyte awareness. (test_one_liner_caret_and_range_utf8): Likewise. (test_one_liner_multiple_carets_and_ranges_utf8): Likewise. (test_one_liner_fixit_insert_before_utf8): Likewise. (test_one_liner_fixit_insert_after_utf8): Likewise. (test_one_liner_fixit_remove_utf8): Likewise. (test_one_liner_fixit_replace_utf8): Likewise. (test_one_liner_fixit_replace_non_equal_range_utf8): Likewise. (test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise. (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise. (test_one_liner_many_fixits_1_utf8): Likewise. (test_one_liner_many_fixits_2_utf8): Likewise. (test_one_liner_labels_utf8): Likewise. (test_diagnostic_show_locus_one_liner_utf8): Likewise. (test_overlapped_fixit_printing_utf8): Likewise. (test_overlapped_fixit_printing): Adapt for changes to get_affected_columns, get_printed_columns and class corrections. (test_overlapped_fixit_printing_2): Likewise. (test_linenum_sep): New constant. (test_left_margin): Likewise. (test_offset_impl): Helper function for new test. (test_layout_x_offset_display_utf8): New test. (diagnostic_show_locus_c_tests): Call new tests. gcc/testsuite/ChangeLog: 2019-12-09 Lewis Hyatt <lhyatt@gmail.com> PR preprocessor/49973 * gcc.dg/plugin/diagnostic_plugin_test_show_locus.c (test_show_locus): Tweak so that expected output is the same as before the diagnostic-show-locus.c changes. * gcc.dg/cpp/pr66415-1.c: Likewise. From-SVN: r279137
ee925640 · Lewis Hyatt · David Malcolm · 763c9f4a · ee925640 · ee925640
Commit ee925640 authored Dec 09, 2019 by Lewis Hyatt Committed by David Malcolm Dec 09, 2019
20 changed files
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+	PR preprocessor/49973
+	* unicode/from_glibc/unicode_utils.py: Support script from
+	glibc (commit 464cd3) to extract character widths from Unicode data
+	files.
+	* unicode/from_glibc/utf8_gen.py: Likewise.
+	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
+	* unicode/EastAsianWidth.txt: Likewise.
+	* unicode/PropList.txt: Likewise.
+	* unicode/gen_wcwidth.py: New utility to generate
+	libcpp/generated_cpp_wcwidth.h with help from the glibc support
+	scripts and the Unicode data files.
+	* unicode/unicode-license.txt: Added.
+	* unicode/README: New explanatory file.
 2019-12-07  Richard Sandiford  <richard.sandiford@arm.com>
 	* texi2pod.pl: Handle @headitems in @multitables, printing them

--- a/contrib/unicode/EastAsianWidth.txt
+++ b/contrib/unicode/EastAsianWidth.txt
--- a/contrib/unicode/PropList.txt
+++ b/contrib/unicode/PropList.txt
--- a/contrib/unicode/README
+++ b/contrib/unicode/README
+This directory contains a mechanism for GCC to have its own internal
+implementation of wcwidth functionality.  (cpp_wcwidth () in libcpp/charset.c).
+The idea is to produce the necessary lookup table
+(../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the
+following files that are distributed by the Unicode Consortium:
+ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
+ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
+ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
+These three files have been added to source control in this directory;
+please see unicode-license.txt for the relevant copyright information.
+In order to keep in sync with glibc's wcwidth as much as possible, it is
+desirable for the logic that processes the Unicode data to be the same as
+glibc's.  To that end, we also put in this directory, in the from_glibc/
+directory, the glibc python code that implements their logic.  This code was
+copied verbatim from glibc, and it can be updated at any time from the glibc
+source code repository.  The files copied from that respository are:
+localedata/unicode-gen/unicode_utils.py
+localedata/unicode-gen/utf8_gen.py
+And the most recent versions added to GCC are from glibc git commit:
+2a764c6ee848dfe92cb2921ed3b14085f15d9e79
+Finally, the script gen_wcwidth.py found here contains the GCC-specific code to
+map glibc's output to the lookup tables we require.  This script should not need
+to change, unless there are structural changes to the Unicode data files or to
+the glibc code.
+The procedure to update GCC's wcwidth tables is the following:
+1.  Update the three Unicode data files from the above URLs.
+2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
+    the commit number above in this README.
+3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
+    (where X.Y is the version of the Unicode standard corresponding to the
+    Unicode data files being used, most recently, 12.1).
+After that, GCC's wcwidth will match the most recent glibc.
--- a/contrib/unicode/UnicodeData.txt
+++ b/contrib/unicode/UnicodeData.txt
--- a/contrib/unicode/from_glibc/unicode_utils.py
+++ b/contrib/unicode/from_glibc/unicode_utils.py
+# Utilities to generate Unicode data for glibc from upstream Unicode data.
+#
+# Copyright (C) 2014-2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+'''
+This module contains utilities used by the scripts to generate
+Unicode data for glibc from upstream Unicode data files.
+'''
+import sys
+import re
+# Common locale header.
+COMMENT_HEADER = """
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file.  The foregoing does not
+% affect the license of the GNU C Library as a whole.  It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+"""
+# Dictionary holding the entire contents of the UnicodeData.txt file
+#
+# Contents of this dictionary look like this:
+#
+# {0: {'category': 'Cc',
+#      'title': None,
+#      'digit': '',
+#      'name': '<control>',
+#      'bidi': 'BN',
+#      'combining': '0',
+#      'comment': '',
+#      'oldname': 'NULL',
+#      'decomposition': '',
+#      'upper': None,
+#      'mirrored': 'N',
+#      'lower': None,
+#      'decdigit': '',
+#      'numeric': ''},
+#      …
+# }
+UNICODE_ATTRIBUTES = {}
+# Dictionary holding the entire contents of the DerivedCoreProperties.txt file
+#
+# Contents of this dictionary look like this:
+#
+# {917504: ['Default_Ignorable_Code_Point'],
+#  917505: ['Case_Ignorable', 'Default_Ignorable_Code_Point'],
+#  …
+# }
+DERIVED_CORE_PROPERTIES = {}
+# Dictionary holding the entire contents of the EastAsianWidths.txt file
+#
+# Contents of this dictionary look like this:
+#
+# {0: 'N', … , 45430: 'W', …}
+EAST_ASIAN_WIDTHS = {}
+def fill_attribute(code_point, fields):
+    '''Stores in UNICODE_ATTRIBUTES[code_point] the values from the fields.
+    One entry in the UNICODE_ATTRIBUTES dictionary represents one line
+    in the UnicodeData.txt file.
+    '''
+    UNICODE_ATTRIBUTES[code_point] =  {
+        'name': fields[1],          # Character name
+        'category': fields[2],      # General category
+        'combining': fields[3],     # Canonical combining classes
+        'bidi': fields[4],          # Bidirectional category
+        'decomposition': fields[5], # Character decomposition mapping
+        'decdigit': fields[6],      # Decimal digit value
+        'digit': fields[7],         # Digit value
+        'numeric': fields[8],       # Numeric value
+        'mirrored': fields[9],      # mirrored
+        'oldname': fields[10],      # Old Unicode 1.0 name
+        'comment': fields[11],      # comment
+        # Uppercase mapping
+        'upper': int(fields[12], 16) if fields[12] else None,
+        # Lowercase mapping
+        'lower': int(fields[13], 16) if fields[13] else None,
+        # Titlecase mapping
+        'title': int(fields[14], 16) if fields[14] else None,
+    }
+def fill_attributes(filename):
+    '''Stores the entire contents of the UnicodeData.txt file
+    in the UNICODE_ATTRIBUTES dictionary.
+    A typical line for a single code point in UnicodeData.txt looks
+    like this:
+    0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
+    Code point ranges are indicated by pairs of lines like this:
+    4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
+    9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
+    '''
+    with open(filename, mode='r') as unicode_data_file:
+        fields_start = []
+        for line in unicode_data_file:
+            fields = line.strip().split(';')
+            if len(fields) != 15:
+                sys.stderr.write(
+                    'short line in file "%(f)s": %(l)s\n' %{
+                    'f': filename, 'l': line})
+                exit(1)
+            if fields[2] == 'Cs':
+                # Surrogates are UTF-16 artefacts,
+                # not real characters. Ignore them.
+                fields_start = []
+                continue
+            if fields[1].endswith(', First>'):
+                fields_start = fields
+                fields_start[1] = fields_start[1].split(',')[0][1:]
+                continue
+            if fields[1].endswith(', Last>'):
+                fields[1] = fields[1].split(',')[0][1:]
+                if fields[1:] != fields_start[1:]:
+                    sys.stderr.write(
+                        'broken code point range in file "%(f)s": %(l)s\n' %{
+                            'f': filename, 'l': line})
+                    exit(1)
+                for code_point in range(
+                        int(fields_start[0], 16),
+                        int(fields[0], 16)+1):
+                    fill_attribute(code_point, fields)
+                fields_start = []
+                continue
+            fill_attribute(int(fields[0], 16), fields)
+            fields_start = []
+def fill_derived_core_properties(filename):
+    '''Stores the entire contents of the DerivedCoreProperties.txt file
+    in the DERIVED_CORE_PROPERTIES dictionary.
+    Lines in DerivedCoreProperties.txt are either a code point range like
+    this:
+    0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
+    or a single code point like this:
+    00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
+    '''
+    with open(filename, mode='r') as derived_core_properties_file:
+        for line in derived_core_properties_file:
+            match = re.match(
+                r'^(?P<codepoint1>[0-9A-F]{4,6})'
+                + r'(?:\.\.(?P<codepoint2>[0-9A-F]{4,6}))?'
+                + r'\s*;\s*(?P<property>[a-zA-Z_]+)',
+                line)
+            if not match:
+                continue
+            start = match.group('codepoint1')
+            end = match.group('codepoint2')
+            if not end:
+                end = start
+            for code_point in range(int(start, 16), int(end, 16)+1):
+                prop = match.group('property')
+                if code_point in DERIVED_CORE_PROPERTIES:
+                    DERIVED_CORE_PROPERTIES[code_point].append(prop)
+                else:
+                    DERIVED_CORE_PROPERTIES[code_point] = [prop]
+def fill_east_asian_widths(filename):
+    '''Stores the entire contents of the EastAsianWidths.txt file
+    in the EAST_ASIAN_WIDTHS dictionary.
+    Lines in EastAsianWidths.txt are either a code point range like
+    this:
+    9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>
+    or a single code point like this:
+    A015;W           # Lm         YI SYLLABLE WU
+    '''
+    with open(filename, mode='r') as east_asian_widths_file:
+        for line in east_asian_widths_file:
+            match = re.match(
+                r'^(?P<codepoint1>[0-9A-F]{4,6})'
+                +r'(?:\.\.(?P<codepoint2>[0-9A-F]{4,6}))?'
+                +r'\s*;\s*(?P<property>[a-zA-Z]+)',
+                line)
+            if not match:
+                continue
+            start = match.group('codepoint1')
+            end = match.group('codepoint2')
+            if not end:
+                end = start
+            for code_point in range(int(start, 16), int(end, 16)+1):
+                EAST_ASIAN_WIDTHS[code_point] = match.group('property')
+def to_upper(code_point):
+    '''Returns the code point of the uppercase version
+    of the given code point'''
+    if (UNICODE_ATTRIBUTES[code_point]['name']
+        and UNICODE_ATTRIBUTES[code_point]['upper']):
+        return UNICODE_ATTRIBUTES[code_point]['upper']
+    else:
+        return code_point
+def to_lower(code_point):
+    '''Returns the code point of the lowercase version
+    of the given code point'''
+    if (UNICODE_ATTRIBUTES[code_point]['name']
+        and UNICODE_ATTRIBUTES[code_point]['lower']):
+        return UNICODE_ATTRIBUTES[code_point]['lower']
+    else:
+        return code_point
+def to_upper_turkish(code_point):
+    '''Returns the code point of the Turkish uppercase version
+    of the given code point'''
+    if code_point == 0x0069:
+        return 0x0130
+    return to_upper(code_point)
+def to_lower_turkish(code_point):
+    '''Returns the code point of the Turkish lowercase version
+    of the given code point'''
+    if code_point == 0x0049:
+        return 0x0131
+    return to_lower(code_point)
+def to_title(code_point):
+    '''Returns the code point of the titlecase version
+    of the given code point'''
+    if (UNICODE_ATTRIBUTES[code_point]['name']
+        and UNICODE_ATTRIBUTES[code_point]['title']):
+        return UNICODE_ATTRIBUTES[code_point]['title']
+    else:
+        return code_point
+def is_upper(code_point):
+    '''Checks whether the character with this code point is uppercase'''
+    return (to_lower(code_point) != code_point
+            or (code_point in DERIVED_CORE_PROPERTIES
+                and 'Uppercase' in DERIVED_CORE_PROPERTIES[code_point]))
+def is_lower(code_point):
+    '''Checks whether the character with this code point is lowercase'''
+    # Some characters are defined as “Lowercase” in
+    # DerivedCoreProperties.txt but do not have a mapping to upper
+    # case. For example, ꜰ U+A72F “LATIN LETTER SMALL CAPITAL F” is
+    # one of these.
+    return (to_upper(code_point) != code_point
+            # <U00DF> is lowercase, but without simple to_upper mapping.
+            or code_point == 0x00DF
+            or (code_point in DERIVED_CORE_PROPERTIES
+                and 'Lowercase' in DERIVED_CORE_PROPERTIES[code_point]))
+def is_alpha(code_point):
+    '''Checks whether the character with this code point is alphabetic'''
+    return ((code_point in DERIVED_CORE_PROPERTIES
+             and
+             'Alphabetic' in DERIVED_CORE_PROPERTIES[code_point])
+            or
+            # Consider all the non-ASCII digits as alphabetic.
+            # ISO C 99 forbids us to have them in category “digit”,
+            # but we want iswalnum to return true on them.
+            (UNICODE_ATTRIBUTES[code_point]['category'] == 'Nd'
+             and not (code_point >= 0x0030 and code_point <= 0x0039)))
+def is_digit(code_point):
+    '''Checks whether the character with this code point is a digit'''
+    if False:
+        return (UNICODE_ATTRIBUTES[code_point]['name']
+                and UNICODE_ATTRIBUTES[code_point]['category'] == 'Nd')
+        # Note: U+0BE7..U+0BEF and U+1369..U+1371 are digit systems without
+        # a zero.  Must add <0> in front of them by hand.
+    else:
+        # SUSV2 gives us some freedom for the "digit" category, but ISO C 99
+        # takes it away:
+        # 7.25.2.1.5:
+        #    The iswdigit function tests for any wide character that
+        #    corresponds to a decimal-digit character (as defined in 5.2.1).
+        # 5.2.1:
+        #    the 10 decimal digits 0 1 2 3 4 5 6 7 8 9
+        return (code_point >= 0x0030 and code_point <= 0x0039)
+def is_outdigit(code_point):
+    '''Checks whether the character with this code point is outdigit'''
+    return (code_point >= 0x0030 and code_point <= 0x0039)
+def is_blank(code_point):
+    '''Checks whether the character with this code point is blank'''
+    return (code_point == 0x0009 # '\t'
+            # Category Zs without mention of '<noBreak>'
+            or (UNICODE_ATTRIBUTES[code_point]['name']
+                and UNICODE_ATTRIBUTES[code_point]['category'] == 'Zs'
+                and '<noBreak>' not in
+                UNICODE_ATTRIBUTES[code_point]['decomposition']))
+def is_space(code_point):
+    '''Checks whether the character with this code point is a space'''
+    # Don’t make U+00A0 a space. Non-breaking space means that all programs
+    # should treat it like a punctuation character, not like a space.
+    return (code_point == 0x0020 # ' '
+            or code_point == 0x000C # '\f'
+            or code_point == 0x000A # '\n'
+            or code_point == 0x000D # '\r'
+            or code_point == 0x0009 # '\t'
+            or code_point == 0x000B # '\v'
+            # Categories Zl, Zp, and Zs without mention of "<noBreak>"
+            or (UNICODE_ATTRIBUTES[code_point]['name']
+                and
+                (UNICODE_ATTRIBUTES[code_point]['category'] in ['Zl', 'Zp']
+                 or
+                 (UNICODE_ATTRIBUTES[code_point]['category'] in ['Zs']
+                  and
+                  '<noBreak>' not in
+                  UNICODE_ATTRIBUTES[code_point]['decomposition']))))
+def is_cntrl(code_point):
+    '''Checks whether the character with this code point is
+    a control character'''
+    return (UNICODE_ATTRIBUTES[code_point]['name']
+            and (UNICODE_ATTRIBUTES[code_point]['name'] == '<control>'
+                 or
+                 UNICODE_ATTRIBUTES[code_point]['category'] in ['Zl', 'Zp']))
+def is_xdigit(code_point):
+    '''Checks whether the character with this code point is
+    a hexadecimal digit'''
+    if False:
+        return (is_digit(code_point)
+                or (code_point >= 0x0041 and code_point <= 0x0046)
+                or (code_point >= 0x0061 and code_point <= 0x0066))
+    else:
+        # SUSV2 gives us some freedom for the "xdigit" category, but ISO C 99
+        # takes it away:
+        # 7.25.2.1.12:
+        #    The iswxdigit function tests for any wide character that
+        #    corresponds to a hexadecimal-digit character (as defined
+        #    in 6.4.4.1).
+        # 6.4.4.1:
+        #    hexadecimal-digit: one of
+        #    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
+        return ((code_point >= 0x0030 and code_point  <= 0x0039)
+                or (code_point >= 0x0041 and code_point <= 0x0046)
+                or (code_point >= 0x0061 and code_point <= 0x0066))
+def is_graph(code_point):
+    '''Checks whether the character with this code point is
+    a graphical character'''
+    return (UNICODE_ATTRIBUTES[code_point]['name']
+            and UNICODE_ATTRIBUTES[code_point]['name'] != '<control>'
+            and not is_space(code_point))
+def is_print(code_point):
+    '''Checks whether the character with this code point is printable'''
+    return (UNICODE_ATTRIBUTES[code_point]['name']
+            and UNICODE_ATTRIBUTES[code_point]['name'] != '<control>'
+            and UNICODE_ATTRIBUTES[code_point]['category'] not in ['Zl', 'Zp'])
+def is_punct(code_point):
+    '''Checks whether the character with this code point is punctuation'''
+    if False:
+        return (UNICODE_ATTRIBUTES[code_point]['name']
+                and UNICODE_ATTRIBUTES[code_point]['category'].startswith('P'))
+    else:
+        # The traditional POSIX definition of punctuation is every graphic,
+        # non-alphanumeric character.
+        return (is_graph(code_point)
+                and not is_alpha(code_point)
+                and not is_digit(code_point))
+def is_combining(code_point):
+    '''Checks whether the character with this code point is
+    a combining character'''
+    # Up to Unicode 3.0.1 we took the Combining property from the PropList.txt
+    # file. In 3.0.1 it was identical to the union of the general categories
+    # "Mn", "Mc", "Me". In Unicode 3.1 this property has been dropped from the
+    # PropList.txt file, so we take the latter definition.
+    return (UNICODE_ATTRIBUTES[code_point]['name']
+            and
+            UNICODE_ATTRIBUTES[code_point]['category'] in ['Mn', 'Mc', 'Me'])
+def is_combining_level3(code_point):
+    '''Checks whether the character with this code point is
+    a combining level3 character'''
+    return (is_combining(code_point)
+            and
+            int(UNICODE_ATTRIBUTES[code_point]['combining']) in range(0, 200))
+def ucs_symbol(code_point):
+    '''Return the UCS symbol string for a Unicode character.'''
+    if code_point < 0x10000:
+        return '<U{:04X}>'.format(code_point)
+    else:
+        return '<U{:08X}>'.format(code_point)
+def ucs_symbol_range(code_point_low, code_point_high):
+    '''Returns a string UCS symbol string for a code point range.
+    Example:
+    <U0041>..<U005A>
+    '''
+    return ucs_symbol(code_point_low) + '..' + ucs_symbol(code_point_high)
+def verifications():
+    '''Tests whether the is_* functions observe the known restrictions'''
+    for code_point in sorted(UNICODE_ATTRIBUTES):
+        # toupper restriction: "Only characters specified for the keywords
+        # lower and upper shall be specified.
+        if (to_upper(code_point) != code_point
+            and not (is_lower(code_point) or is_upper(code_point))):
+            sys.stderr.write(
+                ('%(sym)s is not upper|lower '
+                 + 'but toupper(0x%(c)04X) = 0x%(uc)04X\n') %{
+                    'sym': ucs_symbol(code_point),
+                    'c': code_point,
+                    'uc': to_upper(code_point)})
+        # tolower restriction: "Only characters specified for the keywords
+        # lower and upper shall be specified.
+        if (to_lower(code_point) != code_point
+            and not (is_lower(code_point) or is_upper(code_point))):
+            sys.stderr.write(
+                ('%(sym)s is not upper|lower '
+                 + 'but tolower(0x%(c)04X) = 0x%(uc)04X\n') %{
+                    'sym': ucs_symbol(code_point),
+                    'c': code_point,
+                    'uc': to_lower(code_point)})
+        # alpha restriction: "Characters classified as either upper or lower
+        # shall automatically belong to this class.
+        if ((is_lower(code_point) or is_upper(code_point))
+             and not is_alpha(code_point)):
+            sys.stderr.write('%(sym)s is upper|lower but not alpha\n' %{
+                'sym': ucs_symbol(code_point)})
+        # alpha restriction: “No character specified for the keywords cntrl,
+        # digit, punct or space shall be specified.”
+        if (is_alpha(code_point) and is_cntrl(code_point)):
+            sys.stderr.write('%(sym)s is alpha and cntrl\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_alpha(code_point) and is_digit(code_point)):
+            sys.stderr.write('%(sym)s is alpha and digit\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_alpha(code_point) and is_punct(code_point)):
+            sys.stderr.write('%(sym)s is alpha and punct\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_alpha(code_point) and is_space(code_point)):
+            sys.stderr.write('%(sym)s is alpha and space\n' %{
+                'sym': ucs_symbol(code_point)})
+        # space restriction: “No character specified for the keywords upper,
+        # lower, alpha, digit, graph or xdigit shall be specified.”
+        # upper, lower, alpha already checked above.
+        if (is_space(code_point) and is_digit(code_point)):
+            sys.stderr.write('%(sym)s is space and digit\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_space(code_point) and is_graph(code_point)):
+            sys.stderr.write('%(sym)s is space and graph\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_space(code_point) and is_xdigit(code_point)):
+            sys.stderr.write('%(sym)s is space and xdigit\n' %{
+                'sym': ucs_symbol(code_point)})
+        # cntrl restriction: “No character specified for the keywords upper,
+        # lower, alpha, digit, punct, graph, print or xdigit shall be
+        # specified.”  upper, lower, alpha already checked above.
+        if (is_cntrl(code_point) and is_digit(code_point)):
+            sys.stderr.write('%(sym)s is cntrl and digit\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_cntrl(code_point) and is_punct(code_point)):
+            sys.stderr.write('%(sym)s is cntrl and punct\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_cntrl(code_point) and is_graph(code_point)):
+            sys.stderr.write('%(sym)s is cntrl and graph\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_cntrl(code_point) and is_print(code_point)):
+            sys.stderr.write('%(sym)s is cntrl and print\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_cntrl(code_point) and is_xdigit(code_point)):
+            sys.stderr.write('%(sym)s is cntrl and xdigit\n' %{
+                'sym': ucs_symbol(code_point)})
+        # punct restriction: “No character specified for the keywords upper,
+        # lower, alpha, digit, cntrl, xdigit or as the <space> character shall
+        # be specified.”  upper, lower, alpha, cntrl already checked above.
+        if (is_punct(code_point) and is_digit(code_point)):
+            sys.stderr.write('%(sym)s is punct and digit\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_punct(code_point) and is_xdigit(code_point)):
+            sys.stderr.write('%(sym)s is punct and xdigit\n' %{
+                'sym': ucs_symbol(code_point)})
+        if (is_punct(code_point) and code_point == 0x0020):
+            sys.stderr.write('%(sym)s is punct\n' %{
+                'sym': ucs_symbol(code_point)})
+        # graph restriction: “No character specified for the keyword cntrl
+        # shall be specified.”  Already checked above.
+        # print restriction: “No character specified for the keyword cntrl
+        # shall be specified.”  Already checked above.
+        # graph - print relation: differ only in the <space> character.
+        # How is this possible if there are more than one space character?!
+        # I think susv2/xbd/locale.html should speak of “space characters”,
+        # not “space character”.
+        if (is_print(code_point)
+            and not (is_graph(code_point) or is_space(code_point))):
+            sys.stderr.write('%(sym)s is print but not graph|<space>\n' %{
+                'sym': unicode_utils.ucs_symbol(code_point)})
+        if (not is_print(code_point)
+            and (is_graph(code_point) or code_point == 0x0020)):
+            sys.stderr.write('%(sym)s is graph|<space> but not print\n' %{
+                'sym': unicode_utils.ucs_symbol(code_point)})
--- a/contrib/unicode/from_glibc/utf8_gen.py
+++ b/contrib/unicode/from_glibc/utf8_gen.py
+#!/usr/bin/python3
+# -*- coding: utf-8 -*-
+# Copyright (C) 2014-2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+'''glibc/localedata/charmaps/UTF-8 file generator script
+This script generates a glibc/localedata/charmaps/UTF-8 file
+from Unicode data.
+Usage: python3 utf8_gen.py UnicodeData.txt EastAsianWidth.txt
+It will output UTF-8 file
+'''
+import argparse
+import sys
+import re
+import unicode_utils
+# Auxiliary tables for Hangul syllable names, see the Unicode 3.0 book,
+# sections 3.11 and 4.4.
+JAMO_INITIAL_SHORT_NAME = (
+    'G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '', 'J', 'JJ',
+    'C', 'K', 'T', 'P', 'H'
+)
+JAMO_MEDIAL_SHORT_NAME = (
+    'A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O', 'WA', 'WAE', 'OE',
+    'YO', 'U', 'WEO', 'WE', 'WI', 'YU', 'EU', 'YI', 'I'
+)
+JAMO_FINAL_SHORT_NAME = (
+    '', 'G', 'GG', 'GS', 'N', 'NI', 'NH', 'D', 'L', 'LG', 'LM', 'LB', 'LS',
+    'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S', 'SS', 'NG', 'J', 'C', 'K', 'T',
+    'P', 'H'
+)
+def process_range(start, end, outfile, name):
+    '''Writes a range of code points into the CHARMAP section of the
+    output file
+    '''
+    if 'Hangul Syllable' in name:
+        # from glibc/localedata/ChangeLog:
+        #
+        #  2000-09-24  Bruno Haible  <haible@clisp.cons.org>
+        #  * charmaps/UTF-8: Expand <Hangul Syllable> and <Private Use> ranges,
+        #  so they become printable and carry a width. Comment out surrogate
+        #  ranges. Add a WIDTH table
+        #
+        # So we expand the Hangul Syllables here:
+        for i in range(int(start, 16), int(end, 16)+1 ):
+            index2, index3 = divmod(i - 0xaC00, 28)
+            index1, index2 = divmod(index2, 21)
+            hangul_syllable_name = 'HANGUL SYLLABLE ' \
+                                   + JAMO_INITIAL_SHORT_NAME[index1] \
+                                   + JAMO_MEDIAL_SHORT_NAME[index2] \
+                                   + JAMO_FINAL_SHORT_NAME[index3]
+            outfile.write('{:<11s} {:<12s} {:s}\n'.format(
+                unicode_utils.ucs_symbol(i), convert_to_hex(i),
+                hangul_syllable_name))
+        return
+    # UnicodeData.txt file has contains code point ranges like this:
+    #
+    # 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
+    # 4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
+    #
+    # The glibc UTF-8 file splits ranges like these into shorter
+    # ranges of 64 code points each:
+    #
+    # <U3400>..<U343F>     /xe3/x90/x80         <CJK Ideograph Extension A>
+    # …
+    # <U4D80>..<U4DB5>     /xe4/xb6/x80         <CJK Ideograph Extension A>
+    for i in range(int(start, 16), int(end, 16), 64 ):
+        if i > (int(end, 16)-64):
+            outfile.write('{:s}..{:s} {:<12s} {:s}\n'.format(
+                    unicode_utils.ucs_symbol(i),
+                    unicode_utils.ucs_symbol(int(end,16)),
+                    convert_to_hex(i),
+                    name))
+            break
+        outfile.write('{:s}..{:s} {:<12s} {:s}\n'.format(
+                unicode_utils.ucs_symbol(i),
+                unicode_utils.ucs_symbol(i+63),
+                convert_to_hex(i),
+                name))
+def process_charmap(flines, outfile):
+    '''This function takes an array which contains *all* lines of
+    of UnicodeData.txt and write lines to outfile as used in the
+    CHARMAP
+    …
+    END CHARMAP
+    section of the UTF-8 file in glibc/localedata/charmaps/UTF-8.
+    Samples for input lines:
+    0010;<control>;Cc;0;BN;;;;;N;DATA LINK ESCAPE;;;;
+    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
+    4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
+    D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
+    DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
+    100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;
+    10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;
+    Samples for output lines (Unicode-Value UTF-8-HEX Unicode-Char-Name):
+    <U0010>     /x10 DATA LINK ESCAPE
+    <U3400>..<U343F>     /xe3/x90/x80 <CJK Ideograph Extension A>
+    %<UD800>     /xed/xa0/x80 <Non Private Use High Surrogate, First>
+    %<UDB7F>     /xed/xad/xbf <Non Private Use High Surrogate, Last>
+    <U0010FFC0>..<U0010FFFD>     /xf4/x8f/xbf/x80 <Plane 16 Private Use>
+    '''
+    fields_start = []
+    for line in flines:
+        fields = line.split(";")
+         # Some characters have “<control>” as their name. We try to
+         # use the “Unicode 1.0 Name” (10th field in
+         # UnicodeData.txt) for them.
+         #
+         # The Characters U+0080, U+0081, U+0084 and U+0099 have
+         # “<control>” as their name but do not even have aa
+         # ”Unicode 1.0 Name”. We could write code to take their
+         # alternate names from NameAliases.txt.
+        if fields[1] == "<control>" and fields[10]:
+            fields[1] = fields[10]
+        # Handling code point ranges like:
+        #
+        # 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
+        # 4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
+        if fields[1].endswith(', First>') and not 'Surrogate,' in fields[1]:
+            fields_start = fields
+            continue
+        if fields[1].endswith(', Last>') and not 'Surrogate,' in fields[1]:
+            process_range(fields_start[0], fields[0],
+                          outfile, fields[1][:-7]+'>')
+            fields_start = []
+            continue
+        fields_start = []
+        if 'Surrogate,' in fields[1]:
+            # Comment out the surrogates in the UTF-8 file.
+            # One could of course skip them completely but
+            # the original UTF-8 file in glibc had them as
+            # comments, so we keep these comment lines.
+            outfile.write('%')
+        outfile.write('{:<11s} {:<12s} {:s}\n'.format(
+                unicode_utils.ucs_symbol(int(fields[0], 16)),
+                convert_to_hex(int(fields[0], 16)),
+                fields[1]))
+def convert_to_hex(code_point):
+    '''Converts a code point to a hexadecimal UTF-8 representation
+    like /x**/x**/x**.'''
+    # Getting UTF8 of Unicode characters.
+    # In Python3, .encode('UTF-8') does not work for
+    # surrogates. Therefore, we use this conversion table
+    surrogates = {
+        0xD800: '/xed/xa0/x80',
+        0xDB7F: '/xed/xad/xbf',
+        0xDB80: '/xed/xae/x80',
+        0xDBFF: '/xed/xaf/xbf',
+        0xDC00: '/xed/xb0/x80',
+        0xDFFF: '/xed/xbf/xbf',
+    }
+    if code_point in surrogates:
+        return surrogates[code_point]
+    return ''.join([
+        '/x{:02x}'.format(c) for c in chr(code_point).encode('UTF-8')
+    ])
+def write_header_charmap(outfile):
+    '''Write the header on top of the CHARMAP section to the output file'''
+    outfile.write("<code_set_name> UTF-8\n")
+    outfile.write("<comment_char> %\n")
+    outfile.write("<escape_char> /\n")
+    outfile.write("<mb_cur_min> 1\n")
+    outfile.write("<mb_cur_max> 6\n\n")
+    outfile.write("% CHARMAP generated using utf8_gen.py\n")
+    outfile.write("% alias ISO-10646/UTF-8\n")
+    outfile.write("CHARMAP\n")
+def write_header_width(outfile, unicode_version):
+    '''Writes the header on top of the WIDTH section to the output file'''
+    outfile.write('% Character width according to Unicode '
+                  + '{:s}.\n'.format(unicode_version))
+    outfile.write('% - Default width is 1.\n')
+    outfile.write('% - Double-width characters have width 2; generated from\n')
+    outfile.write('%        "grep \'^[^;]*;[WF]\' EastAsianWidth.txt"\n')
+    outfile.write('% - Non-spacing characters have width 0; '
+                  + 'generated from PropList.txt or\n')
+    outfile.write('%   "grep \'^[^;]*;[^;]*;[^;]*;[^;]*;NSM;\' '
+                  + 'UnicodeData.txt"\n')
+    outfile.write('% - Format control characters have width 0; '
+                  + 'generated from\n')
+    outfile.write("%   \"grep '^[^;]*;[^;]*;Cf;' UnicodeData.txt\"\n")
+#   Not needed covered by Cf
+#    outfile.write("% - Zero width characters have width 0; generated from\n")
+#    outfile.write("%   \"grep '^[^;]*;ZERO WIDTH ' UnicodeData.txt\"\n")
+    outfile.write("WIDTH\n")
+def process_width(outfile, ulines, elines, plines):
+    '''ulines are lines from UnicodeData.txt, elines are lines from
+    EastAsianWidth.txt containing characters with width “W” or “F”,
+    plines are lines from PropList.txt which contain characters
+    with the property “Prepended_Concatenation_Mark”.
+    '''
+    width_dict = {}
+    for line in elines:
+        fields = line.split(";")
+        if not '..' in fields[0]:
+            code_points = (fields[0], fields[0])
+        else:
+            code_points = fields[0].split("..")
+        for key in range(int(code_points[0], 16),
+                         int(code_points[1], 16)+1):
+            width_dict[key] = 2
+    for line in ulines:
+        fields = line.split(";")
+        if fields[4] == "NSM" or fields[2] in ("Cf", "Me", "Mn"):
+            width_dict[int(fields[0], 16)] = 0
+    for line in plines:
+        # Characters with the property “Prepended_Concatenation_Mark”
+        # should have the width 1:
+        fields = line.split(";")
+        if not '..' in fields[0]:
+            code_points = (fields[0], fields[0])
+        else:
+            code_points = fields[0].split("..")
+        for key in range(int(code_points[0], 16),
+                         int(code_points[1], 16)+1):
+            del width_dict[key] # default width is 1
+    # handle special cases for compatibility
+    for key in list((0x00AD,)):
+        # https://www.cs.tut.fi/~jkorpela/shy.html
+        if key in width_dict:
+            del width_dict[key] # default width is 1
+    for key in list(range(0x1160, 0x1200)):
+        width_dict[key] = 0
+    for key in list(range(0x3248, 0x3250)):
+        # These are “A” which means we can decide whether to treat them
+        # as “W” or “N” based on context:
+        # http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0023.html
+        # For us, “W” seems better.
+        width_dict[key] = 2
+    for key in list(range(0x4DC0, 0x4E00)):
+        width_dict[key] = 2
+    same_width_lists = []
+    current_width_list = []
+    for key in sorted(width_dict):
+        if not current_width_list:
+            current_width_list = [key]
+        elif (key == current_width_list[-1] + 1
+              and width_dict[key] == width_dict[current_width_list[0]]):
+            current_width_list.append(key)
+        else:
+            same_width_lists.append(current_width_list)
+            current_width_list = [key]
+    if current_width_list:
+        same_width_lists.append(current_width_list)
+    for same_width_list in same_width_lists:
+        if len(same_width_list) == 1:
+            outfile.write('{:s}\t{:d}\n'.format(
+                unicode_utils.ucs_symbol(same_width_list[0]),
+                width_dict[same_width_list[0]]))
+        else:
+            outfile.write('{:s}...{:s}\t{:d}\n'.format(
+                unicode_utils.ucs_symbol(same_width_list[0]),
+                unicode_utils.ucs_symbol(same_width_list[-1]),
+                width_dict[same_width_list[0]]))
+if __name__ == "__main__":
+    PARSER = argparse.ArgumentParser(
+        description='''
+        Generate a UTF-8 file from UnicodeData.txt, EastAsianWidth.txt, and PropList.txt.
+        ''')
+    PARSER.add_argument(
+        '-u', '--unicode_data_file',
+        nargs='?',
+        type=str,
+        default='UnicodeData.txt',
+        help=('The UnicodeData.txt file to read, '
+              + 'default: %(default)s'))
+    PARSER.add_argument(
+        '-e', '--east_asian_with_file',
+        nargs='?',
+        type=str,
+        default='EastAsianWidth.txt',
+        help=('The EastAsianWidth.txt file to read, '
+              + 'default: %(default)s'))
+    PARSER.add_argument(
+        '-p', '--prop_list_file',
+        nargs='?',
+        type=str,
+        default='PropList.txt',
+        help=('The PropList.txt file to read, '
+              + 'default: %(default)s'))
+    PARSER.add_argument(
+        '--unicode_version',
+        nargs='?',
+        required=True,
+        type=str,
+        help='The Unicode version of the input files used.')
+    ARGS = PARSER.parse_args()
+    with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
+        UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
+    with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
+        EAST_ASIAN_WIDTH_LINES = []
+        for LINE in EAST_ASIAN_WIDTH_FILE:
+            # If characters from EastAasianWidth.txt which are from
+            # from reserved ranges (i.e. not yet assigned code points)
+            # are added to the WIDTH section of the UTF-8 file, then
+            # “make check” produces “Unknown Character” errors for
+            # these code points because such unassigned code points
+            # are not in the CHARMAP section of the UTF-8 file.
+            #
+            # Therefore, we skip all reserved code points when reading
+            # the EastAsianWidth.txt file.
+            if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', LINE):
+                continue
+            if re.match(r'^[^;]*;[WF]', LINE):
+                EAST_ASIAN_WIDTH_LINES.append(LINE.strip())
+    with open(ARGS.prop_list_file, mode='r') as PROP_LIST_FILE:
+        PROP_LIST_LINES = []
+        for LINE in PROP_LIST_FILE:
+            if re.match(r'^[^;]*;[\s]*Prepended_Concatenation_Mark', LINE):
+                PROP_LIST_LINES.append(LINE.strip())
+    with open('UTF-8', mode='w') as OUTFILE:
+        # Processing UnicodeData.txt and write CHARMAP to UTF-8 file
+        write_header_charmap(OUTFILE)
+        process_charmap(UNICODE_DATA_LINES, OUTFILE)
+        OUTFILE.write("END CHARMAP\n\n")
+        # Processing EastAsianWidth.txt and write WIDTH to UTF-8 file
+        write_header_width(OUTFILE, ARGS.unicode_version)
+        process_width(OUTFILE,
+                      UNICODE_DATA_LINES,
+                      EAST_ASIAN_WIDTH_LINES,
+                      PROP_LIST_LINES)
+        OUTFILE.write("END WIDTH\n")
--- a/contrib/unicode/gen_wcwidth.py
+++ b/contrib/unicode/gen_wcwidth.py
+#!/usr/bin/env python3
+#
+# Script to generate tables for cpp_wcwidth, leveraging glibc's utf8_gen.py.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.  */
+import sys
+import os
+if len(sys.argv) != 2:
+    print("usage: %s <unicode version>", file=sys.stderr)
+    sys.exit(1)
+unicode_version = sys.argv[1]
+# Parse a codepoint in the format output by glibc tools.
+def parse_ucn(s):
+    if not (s.startswith("<U") and s.endswith(">")):
+        raise ValueError
+    return int(s[2:-1], base=16)
+# Process a line of width output from utf_gen.py and update global array.
+widths = [1] * (1 + 0x10FFFF)
+def process_width(line):
+    # Example lines:
+    # <UA8FF>	0
+    # <UA926>...<UA92D>	0
+    s = line.split()
+    width = int(s[1])
+    r = s[0].split("...")
+    if len(r) == 1:
+        begin = parse_ucn(r[0])
+        end = begin + 1
+    elif len(r) == 2:
+        begin = parse_ucn(r[0])
+        end = parse_ucn(r[1]) + 1
+    else:
+        raise ValueError
+    widths[begin:end] = [width] * (end - begin)
+# To keep things simple, we use glibc utf8_gen.py as-is.  It only outputs to a
+# file named UTF-8, which is not configurable.  Then we parse this into the form
+# we want it.
+os.system("from_glibc/utf8_gen.py --unicode_version %s" % unicode_version)
+processing = False
+for line in open("UTF-8", "r"):
+    if processing:
+        if line == "END WIDTH\n":
+            processing = False
+        else:
+            try:
+                process_width(line)
+            except (ValueError, IndexError):
+                print(e, "warning: ignored unexpected line: %s" % line,
+                        file=sys.stderr, end="")
+    elif line == "WIDTH\n":
+        processing = True
+# All bytes < 256 we treat as width 1.
+widths[0:255] = [1] * 255
+# Condense the list to contiguous ranges.
+cur_range = [-1, 1]
+all_ranges = []
+for i, width in enumerate(widths):
+    if width == cur_range[1]:
+        cur_range[0] = i
+    else:
+        all_ranges.append(cur_range)
+        cur_range = [i, width]
+# Output the arrays for generated_cpp_wcwidth.h
+print("/*  Generated by contrib/unicode/gen_wcwidth.py,",
+          "with the help of glibc's")
+print("    utf8_gen.py, using version %s" % unicode_version,
+          "of the Unicode standard.  */")
+print("\nstatic const cppchar_t wcwidth_range_ends[] = {", end="")
+for i, r in enumerate(all_ranges):
+    if i % 8:
+        print(" ", end="")
+    else:
+        print("\n  ", end="")
+    print("0x%x," % (r[0]), end="")
+print("\n};\n")
+print("static const unsigned char wcwidth_widths[] = {", end="")
+for i, r in enumerate(all_ranges):
+    if i % 24:
+        print(" ", end="")
+    else:
+        print("\n  ", end="")
+    print("%d," % r[1], end="")
+print("\n};")
--- a/contrib/unicode/unicode-license.txt
+++ b/contrib/unicode/unicode-license.txt
+UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
+    Unicode Data Files include all data files under the directories
+http://www.unicode.org/Public/, http://www.unicode.org/reports/, and
+http://www.unicode.org/cldr/data/. Unicode Data Files do not include PDF
+online code charts under the directory http://www.unicode.org/Public/.
+Software includes any source code published in the Unicode Standard or under
+the directories http://www.unicode.org/Public/,
+http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.
+    NOTICE TO USER: Carefully read the following legal agreement. BY
+DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES
+("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND
+AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF
+YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA
+FILES OR SOFTWARE.
+    COPYRIGHT AND PERMISSION NOTICE
+    Copyright © 1991-2013 Unicode, Inc. All rights reserved. Distributed under
+the Terms of Use in http://www.unicode.org/copyright.html.
+    Permission is hereby granted, free of charge, to any person obtaining a
+copy of the Unicode data files and any associated documentation (the "Data
+Files") or Unicode software and any associated documentation (the "Software")
+to deal in the Data Files or Software without restriction, including without
+limitation the rights to use, copy, modify, merge, publish, distribute, and/or
+sell copies of the Data Files or Software, and to permit persons to whom the
+Data Files or Software are furnished to do so, provided that (a) the above
+copyright notice(s) and this permission notice appear with all copies of the
+Data Files or Software, (b) both the above copyright notice(s) and this
+permission notice appear in associated documentation, and (c) there is clear
+notice in each modified Data File or in the Software as well as in the
+documentation associated with the Data File(s) or Software that the data or
+software has been modified.
+    THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
+KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD
+PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN
+THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
+DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
+PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
+ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE
+DATA FILES OR SOFTWARE.
+    Except as contained in this notice, the name of a copyright holder shall
+not be used in advertising or otherwise to promote the sale, use or other
+dealings in these Data Files or Software without prior written authorization
+of the copyright holder.
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+	PR preprocessor/49973
+	* input.c (location_compute_display_column): New function to help with
+	multibyte awareness in diagnostics.
+	(test_cpp_utf8): New self-test.
+	(input_c_tests): Call the new test.
+	* input.h (location_compute_display_column): Declare.
+	* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
+	to all classes and functions.
+	(enum column_unit): New enum.
+	(class exploc_with_display_col): New class.
+	(class layout_point): Convert m_column member to array m_columns[2].
+	(layout_range::contains_point): Add col_unit argument.
+	(test_layout_range_for_single_point): Pass new argument.
+	(test_layout_range_for_single_line): Likewise.
+	(test_layout_range_for_multiple_lines): Likewise.
+	(line_bounds::convert_to_display_cols): New function.
+	(layout::get_state_at_point): Add col_unit argument.
+	(make_range): Use empty filename rather than dummy filename.
+	(get_line_width_without_trailing_whitespace): Rename to...
+	(get_line_bytes_without_trailing_whitespace): ...this.
+	(test_get_line_width_without_trailing_whitespace): Rename to...
+	(test_get_line_bytes_without_trailing_whitespace): ...this.
+	(class layout): m_exploc changed to exploc_with_display_col from
+	plain expanded_location.
+	(layout::get_linenum_width): New accessor member function.
+	(layout::get_x_offset_display): Likewise.
+	(layout::calculate_linenum_width): New subroutine for the constuctor.
+	(layout::calculate_x_offset_display): Likewise.
+	(layout::layout): Use the new subroutines. Add multibyte awareness.
+	(layout::print_source_line): Add multibyte awareness.
+	(layout::print_line): Likewise.
+	(layout::print_annotation_line): Likewise.
+	(line_label::line_label): Likewise.
+	(layout::print_any_labels): Likewise.
+	(layout::annotation_line_showed_range_p): Likewise.
+	(get_printed_columns): Likewise.
+	(class line_label): Rename m_length to m_display_width.
+	(get_affected_columns): Rename to...
+	(get_affected_range): ...this; add col_unit argument and multibyte
+	awareness.
+	(class correction): Add m_affected_bytes and m_display_cols
+	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
+	awareness throughout.
+	(correction::insertion_p): Add multibyte awareness.
+	(correction::compute_display_cols): New function.
+	(correction::ensure_terminated): Use new member name m_byte_length.
+	(line_corrections::add_hint): Add multibyte awareness.
+	(layout::print_trailing_fixits): Likewise.
+	(layout::get_x_bound_for_row): Likewise.
+	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
+	with _utf8 suffix removed, testing multibyte awareness.
+	(test_one_liner_caret_and_range_utf8): Likewise.
+	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
+	(test_one_liner_fixit_insert_before_utf8): Likewise.
+	(test_one_liner_fixit_insert_after_utf8): Likewise.
+	(test_one_liner_fixit_remove_utf8): Likewise.
+	(test_one_liner_fixit_replace_utf8): Likewise.
+	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
+	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
+	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
+	(test_one_liner_many_fixits_1_utf8): Likewise.
+	(test_one_liner_many_fixits_2_utf8): Likewise.
+	(test_one_liner_labels_utf8): Likewise.
+	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
+	(test_overlapped_fixit_printing_utf8): Likewise.
+	(test_overlapped_fixit_printing): Adapt for changes to
+	get_affected_columns, get_printed_columns and class corrections.
+	(test_overlapped_fixit_printing_2): Likewise.
+	(test_linenum_sep): New constant.
+	(test_left_margin): Likewise.
+	(test_offset_impl): Helper function for new test.
+	(test_layout_x_offset_display_utf8): New test.
+	(diagnostic_show_locus_c_tests): Call new tests.
 2019-12-09  Eric Botcazou  <ebotcazou@adacore.com>
 	* tree.c (build_array_type_1): Add SET_CANONICAL parameter and compute
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcc-rich-location.h"
 #include "selftest.h"
 #include "selftest-diagnostic.h"
+#include "cpplib.h"
 #ifdef HAVE_TERMIOS_H
 # include <termios.h>
@@ -112,18 +113,81 @@ class colorizer
  const char *m_stop_color;
 };
-/* A point within a layout_range; similar to an expanded_location,
+/* In order to handle multibyte sources properly, all of this logic needs to be
+   aware of the distinction between the number of bytes and the number of
+   display columns occupied by a character, which are not the same for non-ASCII
+   characters.  For example, the Unicode pi symbol, U+03C0, is encoded in UTF-8
+   as "\xcf\x80", and thus occupies 2 bytes of space while only occupying 1
+   display column when it is output.  A typical emoji, such as U+1F602 (in
+   UTF-8, "\xf0\x9f\x98\x82"), requires 4 bytes and has a display width of 2.
+   The below example line, which is also used for selftests below, shows how the
+   display column and byte column are related:
+     0000000001111111111222222   display
+     1234567890123456789012345   columns
+     SS_foo = P_bar.SS_fieldP;
+     0000000111111111222222223   byte
+     1356789012456789134567891   columns
+   Here SS represents the two display columns for the U+1F602 emoji, and P
+   represents the one display column for the U+03C0 pi symbol.  As an example, a
+   diagnostic pointing to the final P on this line is at byte column 29 and
+   display column 24.  This reflects the fact that the three extended characters
+   before the final P occupy cumulatively 5 more bytes than they do display
+   columns (a difference of 2 for each of the two SSs, and one for the other P).
+   One or the other of the two column units is more useful depending on the
+   context.  For instance, in order to output the caret at the correct location,
+   we need to count display columns; in order to colorize a source line, we need
+   to count the bytes.  All locations are provided to us as byte counts, which
+   we augment with the display column on demand so that it can be used when
+   needed.  This is not the most efficient way to do things since it requires
+   looping over the whole line each time, but it should be fine for the purpose
+   of outputting diagnostics.
+   In order to keep straight which units (byte or display) are in use at a
+   given time, the following enum lets us specify that explicitly.  */
+enum column_unit {
+  /* Measured in raw bytes.  */
+  CU_BYTES = 0,
+  /* Measured in display units.  */
+  CU_DISPLAY_COLS,
+  /* For arrays indexed by column_unit.  */
+  CU_NUM_UNITS
+};
+/* Utility class to augment an exploc with the corresponding display column.  */
+class exploc_with_display_col : public expanded_location
+{
+ public:
+  exploc_with_display_col (const expanded_location &exploc)
+    : expanded_location (exploc),
+      m_display_col (location_compute_display_column (exploc)) {}
+  int m_display_col;
+};
+/* A point within a layout_range; similar to an exploc_with_display_col,
   but after filtering on file.  */
 class layout_point
 {
 public:
  layout_point (const expanded_location &exploc)
-  : m_line (exploc.line),
+    : m_line (exploc.line)
-    m_column (exploc.column) {}
+  {
+    m_columns[CU_BYTES] = exploc.column;
+    m_columns[CU_DISPLAY_COLS] = location_compute_display_column (exploc);
+  }
  linenum_type m_line;
-  int m_column;
+  int m_columns[CU_NUM_UNITS];
 };
 /* A class for use by "class layout" below: a filtered location_range.  */
@@ -138,7 +202,8 @@ class layout_range
 		unsigned original_idx,
 		const range_label *label);
-  bool contains_point (linenum_type row, int column) const;
+  bool contains_point (linenum_type row, int column,
+		       enum column_unit col_unit) const;
  bool intersects_line_p (linenum_type row) const;
  layout_point m_start;
@@ -157,6 +222,17 @@ struct line_bounds
 {
  int m_first_non_ws;
  int m_last_non_ws;
+  void convert_to_display_cols (char_span line)
+  {
+    m_first_non_ws = cpp_byte_column_to_display_column (line.get_buffer (),
+							line.length (),
+							m_first_non_ws);
+    m_last_non_ws = cpp_byte_column_to_display_column (line.get_buffer (),
+						       line.length (),
+						       m_last_non_ws);
+  }
 };
 /* A range of contiguous source lines within a layout (e.g. "lines 5-10"
@@ -252,6 +328,9 @@ class layout
  int get_num_line_spans () const { return m_line_spans.length (); }
  const line_span *get_line_span (int idx) const { return &m_line_spans[idx]; }
+  int get_linenum_width () const { return m_linenum_width; }
+  int get_x_offset_display () const { return m_x_offset_display; }
  void print_gap_in_line_numbering ();
  bool print_heading_for_line_span_index_p (int line_span_idx) const;
@@ -262,7 +341,7 @@ class layout
 private:
  bool will_show_line_p (linenum_type row) const;
  void print_leading_fixits (linenum_type row);
-  void print_source_line (linenum_type row, const char *line, int line_width,
+  void print_source_line (linenum_type row, const char *line, int line_bytes,
 			  line_bounds *lbounds_out);
  bool should_print_annotation_line_p (linenum_type row) const;
  void start_annotation_line (char margin_char = ' ') const;
@@ -277,6 +356,8 @@ class layout
  bool validate_fixit_hint_p (const fixit_hint *hint);
  void calculate_line_spans ();
+  void calculate_linenum_width ();
+  void calculate_x_offset_display ();
  void print_newline ();
@@ -284,6 +365,7 @@ class layout
  get_state_at_point (/* Inputs.  */
 		      linenum_type row, int column,
 		      int first_non_ws, int last_non_ws,
+		      enum column_unit col_unit,
 		      /* Outputs.  */
 		      point_state *out_state);
@@ -298,7 +380,7 @@ class layout
  diagnostic_context *m_context;
  pretty_printer *m_pp;
  location_t m_primary_loc;
-  expanded_location m_exploc;
+  exploc_with_display_col m_exploc;
  colorizer m_colorizer;
  bool m_colorize_source_p;
  bool m_show_labels_p;
@@ -307,7 +389,7 @@ class layout
  auto_vec <const fixit_hint *> m_fixit_hints;
  auto_vec <line_span> m_line_spans;
  int m_linenum_width;
-  int m_x_offset;
+  int m_x_offset_display;
 };
 /* Implementation of "class colorizer".  */
@@ -472,10 +554,15 @@ layout_range::layout_range (const expanded_location *start_exploc,
   - 'w' indicates a point within the range
   - 'F' indicates the finish of the range (which is
 	 within it).
-   - 'a' indicates a subsequent point *after* the range.  */
+   - 'a' indicates a subsequent point *after* the range.
+   COL_UNIT controls whether we check the byte column or
+   the display column; one or the other is more convenient
+   depending on the context.  */
 bool
-layout_range::contains_point (linenum_type row, int column) const
+layout_range::contains_point (linenum_type row, int column,
+			      enum column_unit col_unit) const
 {
  gcc_assert (m_start.m_line <= m_finish.m_line);
  /* ...but the equivalent isn't true for the columns;
@@ -491,7 +578,7 @@ layout_range::contains_point (linenum_type row, int column) const
    /* On same line as start of range (corresponding
       to line 02 in example A and line 03 in example B).  */
    {
-      if (column < m_start.m_column)
+      if (column < m_start.m_columns[col_unit])
 	/* Points on the starting line of the range, but
 	   before the column in which it begins.  */
 	return false;
@@ -505,7 +592,7 @@ layout_range::contains_point (linenum_type row, int column) const
 	{
 	  /* This is a single-line range.  */
 	  gcc_assert (row == m_finish.m_line);
-	  return column <= m_finish.m_column;
+	  return column <= m_finish.m_columns[col_unit];
 	}
    }
@@ -530,7 +617,7 @@ layout_range::contains_point (linenum_type row, int column) const
  gcc_assert (row ==  m_finish.m_line);
-  return column <= m_finish.m_column;
+  return column <= m_finish.m_columns[col_unit];
 }
 /* Does this layout_range contain any part of line ROW?  */
@@ -548,15 +635,22 @@ layout_range::intersects_line_p (linenum_type row) const
 #if CHECKING_P
-/* A helper function for testing layout_range.  */
+/* Create some expanded locations for testing layout_range.  The filename
+   member of the explocs is set to the empty string.  This member will only be
+   inspected by the calls to location_compute_display_column() made from the
+   layout_point constructors.  That function will check for an empty filename
+   argument and not attempt to open it, rather treating the non-existent data
+   as if the display width were the same as the byte count.  Tests exercising a
+   real difference between byte count and display width are performed later,
+   e.g. in test_diagnostic_show_locus_one_liner_utf8().  */
 static layout_range
 make_range (int start_line, int start_col, int end_line, int end_col)
 {
  const expanded_location start_exploc
-    = {"test.c", start_line, start_col, NULL, false};
+    = {"", start_line, start_col, NULL, false};
  const expanded_location finish_exploc
-    = {"test.c", end_line, end_col, NULL, false};
+    = {"", end_line, end_col, NULL, false};
  return layout_range (&start_exploc, &finish_exploc, SHOW_RANGE_WITHOUT_CARET,
 		       &start_exploc, 0, NULL);
 }
@@ -574,20 +668,25 @@ test_layout_range_for_single_point ()
  /* Tests for layout_range::contains_point.  */
+  for (int i = 0; i != CU_NUM_UNITS; ++i)
+    {
+      const enum column_unit col_unit = (enum column_unit) i;
      /* Before the line.  */
-  ASSERT_FALSE (point.contains_point (6, 1));
+      ASSERT_FALSE (point.contains_point (6, 1, col_unit));
      /* On the line, but before start.  */
-  ASSERT_FALSE (point.contains_point (7, 9));
+      ASSERT_FALSE (point.contains_point (7, 9, col_unit));
      /* At the point.  */
-  ASSERT_TRUE (point.contains_point (7, 10));
+      ASSERT_TRUE (point.contains_point (7, 10, col_unit));
      /* On the line, after the point.  */
-  ASSERT_FALSE (point.contains_point (7, 11));
+      ASSERT_FALSE (point.contains_point (7, 11, col_unit));
      /* After the line.  */
-  ASSERT_FALSE (point.contains_point (8, 1));
+      ASSERT_FALSE (point.contains_point (8, 1, col_unit));
+    }
  /* Tests for layout_range::intersects_line_p.  */
  ASSERT_FALSE (point.intersects_line_p (6));
@@ -605,26 +704,31 @@ test_layout_range_for_single_line ()
  /* Tests for layout_range::contains_point.  */
+  for (int i = 0; i != CU_NUM_UNITS; ++i)
+    {
+      const enum column_unit col_unit = (enum column_unit) i;
      /* Before the line.  */
-  ASSERT_FALSE (example_a.contains_point (1, 1));
+      ASSERT_FALSE (example_a.contains_point (1, 1, col_unit));
      /* On the line, but before start.  */
-  ASSERT_FALSE (example_a.contains_point (2, 21));
+      ASSERT_FALSE (example_a.contains_point (2, 21, col_unit));
      /* On the line, at the start.  */
-  ASSERT_TRUE (example_a.contains_point (2, 22));
+      ASSERT_TRUE (example_a.contains_point (2, 22, col_unit));
      /* On the line, within the range.  */
-  ASSERT_TRUE (example_a.contains_point (2, 23));
+      ASSERT_TRUE (example_a.contains_point (2, 23, col_unit));
      /* On the line, at the end.  */
-  ASSERT_TRUE (example_a.contains_point (2, 38));
+      ASSERT_TRUE (example_a.contains_point (2, 38, col_unit));
      /* On the line, after the end.  */
-  ASSERT_FALSE (example_a.contains_point (2, 39));
+      ASSERT_FALSE (example_a.contains_point (2, 39, col_unit));
      /* After the line.  */
-  ASSERT_FALSE (example_a.contains_point (2, 39));
+      ASSERT_FALSE (example_a.contains_point (2, 39, col_unit));
+    }
  /* Tests for layout_range::intersects_line_p.  */
  ASSERT_FALSE (example_a.intersects_line_p (1));
@@ -642,40 +746,45 @@ test_layout_range_for_multiple_lines ()
  /* Tests for layout_range::contains_point.  */
+  for (int i = 0; i != CU_NUM_UNITS; ++i)
+    {
+      const enum column_unit col_unit = (enum column_unit) i;
      /* Before first line.  */
-  ASSERT_FALSE (example_b.contains_point (1, 1));
+      ASSERT_FALSE (example_b.contains_point (1, 1, col_unit));
      /* On the first line, but before start.  */
-  ASSERT_FALSE (example_b.contains_point (3, 13));
+      ASSERT_FALSE (example_b.contains_point (3, 13, col_unit));
      /* At the start.  */
-  ASSERT_TRUE (example_b.contains_point (3, 14));
+      ASSERT_TRUE (example_b.contains_point (3, 14, col_unit));
      /* On the first line, within the range.  */
-  ASSERT_TRUE (example_b.contains_point (3, 15));
+      ASSERT_TRUE (example_b.contains_point (3, 15, col_unit));
      /* On an interior line.
 	 The column number should not matter; try various boundary
 	 values.  */
-  ASSERT_TRUE (example_b.contains_point (4, 1));
+      ASSERT_TRUE (example_b.contains_point (4, 1, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 7));
+      ASSERT_TRUE (example_b.contains_point (4, 7, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 8));
+      ASSERT_TRUE (example_b.contains_point (4, 8, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 9));
+      ASSERT_TRUE (example_b.contains_point (4, 9, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 13));
+      ASSERT_TRUE (example_b.contains_point (4, 13, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 14));
+      ASSERT_TRUE (example_b.contains_point (4, 14, col_unit));
-  ASSERT_TRUE (example_b.contains_point (4, 15));
+      ASSERT_TRUE (example_b.contains_point (4, 15, col_unit));
      /* On the final line, before the end.  */
-  ASSERT_TRUE (example_b.contains_point (5, 7));
+      ASSERT_TRUE (example_b.contains_point (5, 7, col_unit));
      /* On the final line, at the end.  */
-  ASSERT_TRUE (example_b.contains_point (5, 8));
+      ASSERT_TRUE (example_b.contains_point (5, 8, col_unit));
      /* On the final line, after the end.  */
-  ASSERT_FALSE (example_b.contains_point (5, 9));
+      ASSERT_FALSE (example_b.contains_point (5, 9, col_unit));
      /* After the line.  */
-  ASSERT_FALSE (example_b.contains_point (6, 1));
+      ASSERT_FALSE (example_b.contains_point (6, 1, col_unit));
+    }
  /* Tests for layout_range::intersects_line_p.  */
  ASSERT_FALSE (example_b.intersects_line_p (2));
@@ -687,13 +796,13 @@ test_layout_range_for_multiple_lines ()
 #endif /* #if CHECKING_P */
-/* Given a source line LINE of length LINE_WIDTH, determine the width
+/* Given a source line LINE of length LINE_BYTES bytes, determine the length
-   without any trailing whitespace.  */
+   (still in bytes, not display cols) without any trailing whitespace.  */
 static int
-get_line_width_without_trailing_whitespace (const char *line, int line_width)
+get_line_bytes_without_trailing_whitespace (const char *line, int line_bytes)
 {
-  int result = line_width;
+  int result = line_bytes;
  while (result > 0)
    {
      char ch = line[result - 1];
@@ -703,7 +812,7 @@ get_line_width_without_trailing_whitespace (const char *line, int line_width)
 	break;
    }
  gcc_assert (result >= 0);
-  gcc_assert (result <= line_width);
+  gcc_assert (result <= line_bytes);
  gcc_assert (result == 0 ||
 	      (line[result - 1] != ' '
 	       && line[result -1] != '\t'
@@ -713,21 +822,21 @@ get_line_width_without_trailing_whitespace (const char *line, int line_width)
 #if CHECKING_P
-/* A helper function for testing get_line_width_without_trailing_whitespace.  */
+/* A helper function for testing get_line_bytes_without_trailing_whitespace.  */
 static void
-assert_eq (const char *line, int expected_width)
+assert_eq (const char *line, int expected_bytes)
 {
  int actual_value
-    = get_line_width_without_trailing_whitespace (line, strlen (line));
+    = get_line_bytes_without_trailing_whitespace (line, strlen (line));
-  ASSERT_EQ (actual_value, expected_width);
+  ASSERT_EQ (actual_value, expected_bytes);
 }
-/* Verify that get_line_width_without_trailing_whitespace is sane for
+/* Verify that get_line_bytes_without_trailing_whitespace is sane for
   various inputs.  It is not required to handle newlines.  */
 static void
-test_get_line_width_without_trailing_whitespace ()
+test_get_line_bytes_without_trailing_whitespace ()
 {
  assert_eq ("", 0);
  assert_eq (" ", 0);
@@ -835,7 +944,7 @@ fixit_cmp (const void *p_a, const void *p_b)
   sanely print, populating m_layout_ranges and m_fixit_hints.
   Determine the range of lines that we will print, splitting them
   up into an ordered list of disjoint spans of contiguous line numbers.
-   Determine m_x_offset, to ensure that the primary caret
+   Determine m_x_offset_display, to ensure that the primary caret
   will fit within the max_width provided by the diagnostic_context.  */
 layout::layout (diagnostic_context * context,
@@ -853,7 +962,7 @@ layout::layout (diagnostic_context * context,
  m_fixit_hints (richloc->get_num_fixit_hints ()),
  m_line_spans (1 + richloc->get_num_locations ()),
  m_linenum_width (0),
-  m_x_offset (0)
+  m_x_offset_display (0)
 {
  for (unsigned int idx = 0; idx < richloc->get_num_locations (); idx++)
    {
@@ -875,45 +984,16 @@ layout::layout (diagnostic_context * context,
  /* Sort m_fixit_hints.  */
  m_fixit_hints.qsort (fixit_cmp);
-  /* Populate m_line_spans.  */
+  /* Populate the indicated members.  */
  calculate_line_spans ();
+  calculate_linenum_width ();
-  /* Determine m_linenum_width.  */
+  calculate_x_offset_display ();
-  gcc_assert (m_line_spans.length () > 0);
-  const line_span *last_span = &m_line_spans[m_line_spans.length () - 1];
-  int highest_line = last_span->m_last_line;
-  if (highest_line < 0)
-    highest_line = 0;
-  m_linenum_width = num_digits (highest_line);
-  /* If we're showing jumps in the line-numbering, allow at least 3 chars.  */
-  if (m_line_spans.length () > 1)
-    m_linenum_width = MAX (m_linenum_width, 3);
-  /* If there's a minimum margin width, apply it (subtracting 1 for the space
-     after the line number.  */
-  m_linenum_width = MAX (m_linenum_width, context->min_margin_width - 1);
-  /* Adjust m_x_offset.
-     Center the primary caret to fit in max_width; all columns
-     will be adjusted accordingly.  */
-  size_t max_width = m_context->caret_max_width;
-  char_span line = location_get_source_line (m_exploc.file, m_exploc.line);
-  if (line && (size_t)m_exploc.column <= line.length ())
-    {
-      size_t right_margin = CARET_LINE_MARGIN;
-      size_t column = m_exploc.column;
-      if (m_show_line_numbers_p)
-	column += m_linenum_width + 2;
-      right_margin = MIN (line.length () - column, right_margin);
-      right_margin = max_width - right_margin;
-      if (line.length () >= max_width && column > right_margin)
-	m_x_offset = column - right_margin;
-      gcc_assert (m_x_offset >= 0);
-    }
  if (context->show_ruler_p)
-    show_ruler (m_x_offset + max_width);
+    show_ruler (m_x_offset_display + m_context->caret_max_width);
 }
 /* Attempt to add LOC_RANGE to m_layout_ranges, filtering them to
   those that we can sanely print.
@@ -1086,7 +1166,7 @@ layout::get_expanded_location (const line_span *line_span) const
 	{
 	  expanded_location exploc = m_exploc;
 	  exploc.line = lr->m_start.m_line;
-	  exploc.column = lr->m_start.m_column;
+	  exploc.column = lr->m_start.m_columns[CU_BYTES];
 	  return exploc;
 	}
    }
@@ -1251,25 +1331,122 @@ layout::calculate_line_spans ()
    }
 }
+/* Determine how many display columns need to be reserved for line numbers,
+   based on the largest line number that will be needed, and populate
+   m_linenum_width.  */
+void
+layout::calculate_linenum_width ()
+{
+  gcc_assert (m_line_spans.length () > 0);
+  const line_span *last_span = &m_line_spans[m_line_spans.length () - 1];
+  int highest_line = last_span->m_last_line;
+  if (highest_line < 0)
+    highest_line = 0;
+  m_linenum_width = num_digits (highest_line);
+  /* If we're showing jumps in the line-numbering, allow at least 3 chars.  */
+  if (m_line_spans.length () > 1)
+    m_linenum_width = MAX (m_linenum_width, 3);
+  /* If there's a minimum margin width, apply it (subtracting 1 for the space
+     after the line number.  */
+  m_linenum_width = MAX (m_linenum_width, m_context->min_margin_width - 1);
+}
+/* Calculate m_x_offset_display, which improves readability in case the source
+   line of interest is longer than the user's display.  All lines output will be
+   shifted to the left (so that their beginning is no longer displayed) by
+   m_x_offset_display display columns, so that the caret is in a reasonable
+   location.  */
+void
+layout::calculate_x_offset_display ()
+{
+  m_x_offset_display = 0;
+  const int max_width = m_context->caret_max_width;
+  if (!max_width)
+    {
+      /* Nothing to do, the width is not capped.  */
+      return;
+    }
+  const char_span line = location_get_source_line (m_exploc.file,
+						   m_exploc.line);
+  if (!line)
+    {
+      /* Nothing to do, we couldn't find the source line.  */
+      return;
+    }
+  int caret_display_column = m_exploc.m_display_col;
+  const int line_bytes
+    = get_line_bytes_without_trailing_whitespace (line.get_buffer (),
+						  line.length ());
+  int eol_display_column
+    = cpp_display_width (line.get_buffer (), line_bytes);
+  if (caret_display_column > eol_display_column
+      || !caret_display_column)
+    {
+      /* This does not make sense, so don't try to do anything in this case.  */
+      return;
+    }
+  /* Adjust caret and eol positions to include the left margin.  If we are
+     outputting line numbers, then the left margin is equal to m_linenum_width
+     plus three for the " | " which follows it.  Otherwise the left margin width
+     is equal to 1, because layout::print_source_line() will prefix each line
+     with a space.  */
+  const int source_display_cols = eol_display_column;
+  int left_margin_size = 1;
+  if (m_show_line_numbers_p)
+      left_margin_size = m_linenum_width + 3;
+  caret_display_column += left_margin_size;
+  eol_display_column += left_margin_size;
+  if (eol_display_column <= max_width)
+    {
+      /* Nothing to do, everything fits in the display.  */
+      return;
+    }
+  /* The line is too long for the display.  Calculate an offset such that the
+     caret is not too close to the right edge of the screen.  It will be
+     CARET_LINE_MARGIN display columns from the right edge, unless it is closer
+     than that to the end of the source line anyway.  */
+  int right_margin_size = CARET_LINE_MARGIN;
+  right_margin_size = MIN (eol_display_column - caret_display_column,
+			   right_margin_size);
+  if (right_margin_size + left_margin_size >= max_width)
+    {
+      /* The max_width is very small, so anything we try to do will not be very
+	 effective; just punt in this case and output with no offset.  */
+      return;
+    }
+  const int max_caret_display_column = max_width - right_margin_size;
+  if (caret_display_column > max_caret_display_column)
+    {
+      m_x_offset_display = caret_display_column - max_caret_display_column;
+      /* Make sure we don't offset the line into oblivion.  */
+      static const int min_cols_visible = 2;
+      if (source_display_cols - m_x_offset_display < min_cols_visible)
+	m_x_offset_display = 0;
+    }
+}
 /* Print line ROW of source code, potentially colorized at any ranges, and
   populate *LBOUNDS_OUT.
-   LINE is the source line (not necessarily 0-terminated) and LINE_WIDTH
+   LINE is the source line (not necessarily 0-terminated) and LINE_BYTES
-   is its width.  */
+   is its length in bytes.
+   This function deals only with byte offsets, not display columns, so
+   m_x_offset_display must be converted from display to byte units.  In
+   particular, LINE_BYTES and LBOUNDS_OUT are in bytes.  */
 void
-layout::print_source_line (linenum_type row, const char *line, int line_width,
+layout::print_source_line (linenum_type row, const char *line, int line_bytes,
 			   line_bounds *lbounds_out)
 {
  m_colorizer.set_normal_text ();
-  /* We will stop printing the source line at any trailing
-     whitespace.  */
-  line_width = get_line_width_without_trailing_whitespace (line,
-							   line_width);
-  line += m_x_offset;
  pp_emit_prefix (m_pp);
  if (m_show_line_numbers_p)
    {
      int width = num_digits (row);
@@ -1279,10 +1456,31 @@ layout::print_source_line (linenum_type row, const char *line, int line_width,
    }
  else
    pp_space (m_pp);
+  /* We will stop printing the source line at any trailing whitespace, and start
+     printing it as per m_x_offset_display.  */
+  line_bytes = get_line_bytes_without_trailing_whitespace (line,
+							   line_bytes);
+  int x_offset_bytes = 0;
+  if (m_x_offset_display)
+    {
+      x_offset_bytes = cpp_display_column_to_byte_column (line, line_bytes,
+							  m_x_offset_display);
+      /* In case the leading portion of the line that will be skipped over ends
+	 with a character with wcwidth > 1, then it is possible we skipped too
+	 much, so account for that by padding with spaces.  */
+      const int overage
+	= cpp_byte_column_to_display_column (line, line_bytes, x_offset_bytes)
+	- m_x_offset_display;
+      for (int column = 0; column < overage; ++column)
+	pp_space (m_pp);
+      line += x_offset_bytes;
+    }
+  /* Print the line.  */
  int first_non_ws = INT_MAX;
  int last_non_ws = 0;
-  int column;
+  for (int col_byte = 1 + x_offset_bytes; col_byte <= line_bytes; col_byte++)
-  for (column = 1 + m_x_offset; column <= line_width; column++)
    {
      /* Assuming colorization is enabled for the caret and underline
 	 characters, we may also colorize the associated characters
@@ -1300,8 +1498,9 @@ layout::print_source_line (linenum_type row, const char *line, int line_width,
 	{
 	  bool in_range_p;
 	  point_state state;
-	  in_range_p = get_state_at_point (row, column,
+	  in_range_p = get_state_at_point (row, col_byte,
 					   0, INT_MAX,
+					   CU_BYTES,
 					   &state);
 	  if (in_range_p)
 	    m_colorizer.set_range (state.range_idx);
@@ -1313,9 +1512,9 @@ layout::print_source_line (linenum_type row, const char *line, int line_width,
 	c = ' ';
      if (c != ' ')
 	{
-	  last_non_ws = column;
+	  last_non_ws = col_byte;
 	  if (first_non_ws == INT_MAX)
-	    first_non_ws = column;
+	    first_non_ws = col_byte;
 	}
      pp_character (m_pp, c);
      line++;
@@ -1365,24 +1564,26 @@ layout::start_annotation_line (char margin_char) const
 }
 /* Print a line consisting of the caret/underlines for the given
-   source line.  */
+   source line.  This function works with display columns, rather than byte
+   counts; in particular, LBOUNDS should be in display column units.  */
 void
 layout::print_annotation_line (linenum_type row, const line_bounds lbounds)
 {
-  int x_bound = get_x_bound_for_row (row, m_exploc.column,
+  int x_bound = get_x_bound_for_row (row, m_exploc.m_display_col,
 				     lbounds.m_last_non_ws);
  start_annotation_line ();
  pp_space (m_pp);
-  for (int column = 1 + m_x_offset; column < x_bound; column++)
+  for (int column = 1 + m_x_offset_display; column < x_bound; column++)
    {
      bool in_range_p;
      point_state state;
      in_range_p = get_state_at_point (row, column,
 				       lbounds.m_first_non_ws,
 				       lbounds.m_last_non_ws,
+				       CU_DISPLAY_COLS,
 				       &state);
      if (in_range_p)
 	{
@@ -1420,9 +1621,11 @@ class line_label
 public:
  line_label (int state_idx, int column, label_text text)
  : m_state_idx (state_idx), m_column (column),
-    m_text (text), m_length (strlen (text.m_buffer)),
+    m_text (text), m_label_line (0), m_has_vbar (true)
-    m_label_line (0), m_has_vbar (true)
+  {
-  {}
+    const int bytes = strlen (text.m_buffer);
+    m_display_width = cpp_display_width (text.m_buffer, bytes);
+  }
  /* Sorting is primarily by column, then by state index.  */
  static int comparator (const void *p1, const void *p2)
@@ -1441,7 +1644,7 @@ public:
  int m_state_idx;
  int m_column;
  label_text m_text;
-  size_t m_length;
+  size_t m_display_width;
  int m_label_line;
  bool m_has_vbar;
 };
@@ -1467,8 +1670,9 @@ layout::print_any_labels (linenum_type row)
 	  continue;
 	/* Reject labels that aren't fully visible due to clipping
-	   by m_x_offset.  */
+	   by m_x_offset_display.  */
-	if (range->m_caret.m_column <= m_x_offset)
+	const int disp_col = range->m_caret.m_columns[CU_DISPLAY_COLS];
+	if (disp_col <= m_x_offset_display)
 	  continue;
 	label_text text;
@@ -1480,7 +1684,7 @@ layout::print_any_labels (linenum_type row)
 	if (text.m_buffer == NULL)
 	  continue;
-	labels.safe_push (line_label (i, range->m_caret.m_column, text));
+	labels.safe_push (line_label (i, disp_col, text));
      }
  }
@@ -1530,7 +1734,7 @@ layout::print_any_labels (linenum_type row)
    FOR_EACH_VEC_ELT_REVERSE (labels, i, label)
      {
 	/* Would this label "touch" or overlap the next label?  */
-	if (label->m_column + label->m_length >= (size_t)next_column)
+	if (label->m_column + label->m_display_width >= (size_t)next_column)
 	  {
 	    max_label_line++;
@@ -1554,7 +1758,7 @@ layout::print_any_labels (linenum_type row)
      {
 	start_annotation_line ();
 	pp_space (m_pp);
-	int column = 1 + m_x_offset;
+	int column = 1 + m_x_offset_display;
 	line_label *label;
 	FOR_EACH_VEC_ELT (labels, i, label)
 	  {
@@ -1569,7 +1773,7 @@ layout::print_any_labels (linenum_type row)
 		m_colorizer.set_range (label->m_state_idx);
 		pp_string (m_pp, label->m_text.m_buffer);
 		m_colorizer.set_normal_text ();
-		column += label->m_length;
+		column += label->m_display_width;
 	      }
 	    else if (label->m_has_vbar)
 	      {
@@ -1636,7 +1840,7 @@ layout::print_leading_fixits (linenum_type row)
 /* Subroutine of layout::print_trailing_fixits.
   Determine if the annotation line printed for LINE contained
-   the exact range from START_COLUMN to FINISH_COLUMN.  */
+   the exact range from START_COLUMN to FINISH_COLUMN (in display units).  */
 bool
 layout::annotation_line_showed_range_p (linenum_type line, int start_column,
@@ -1646,9 +1850,9 @@ layout::annotation_line_showed_range_p (linenum_type line, int start_column,
  int i;
  FOR_EACH_VEC_ELT (m_layout_ranges, i, range)
    if (range->m_start.m_line == line
-	&& range->m_start.m_column == start_column
+	&& range->m_start.m_columns[CU_DISPLAY_COLS] == start_column
 	&& range->m_finish.m_line == line
-	&& range->m_finish.m_column == finish_column)
+	&& range->m_finish.m_columns[CU_DISPLAY_COLS] == finish_column)
      return true;
  return false;
 }
@@ -1735,7 +1939,7 @@ layout::annotation_line_showed_range_p (linenum_type line, int start_column,
   and is thus printed as desired.  */
-/* A range of columns within a line.  */
+/* A range of (byte or display) columns within a line.  */
 class column_range
 {
@@ -1755,32 +1959,51 @@ public:
  int finish;
 };
-/* Get the range of columns that HINT would affect.  */
+/* Get the range of bytes or display columns that HINT would affect.  */
 static column_range
-get_affected_columns (const fixit_hint *hint)
+get_affected_range (const fixit_hint *hint, enum column_unit col_unit)
 {
-  int start_column = LOCATION_COLUMN (hint->get_start_loc ());
+  expanded_location exploc_start = expand_location (hint->get_start_loc ());
-  int finish_column = LOCATION_COLUMN (hint->get_next_loc ()) - 1;
+  expanded_location exploc_finish = expand_location (hint->get_next_loc ());
+  --exploc_finish.column;
+  int start_column;
+  int finish_column;
+  if (col_unit == CU_DISPLAY_COLS)
+    {
+      start_column = location_compute_display_column (exploc_start);
+      if (hint->insertion_p ())
+	finish_column = start_column - 1;
+      else
+	finish_column = location_compute_display_column (exploc_finish);
+    }
+  else
+    {
+      start_column = exploc_start.column;
+      finish_column = exploc_finish.column;
+    }
  return column_range (start_column, finish_column);
 }
-/* Get the range of columns that would be printed for HINT.  */
+/* Get the range of display columns that would be printed for HINT.  */
 static column_range
 get_printed_columns (const fixit_hint *hint)
 {
-  int start_column = LOCATION_COLUMN (hint->get_start_loc ());
+  expanded_location exploc = expand_location (hint->get_start_loc ());
-  int final_hint_column = start_column + hint->get_length () - 1;
+  int start_column = location_compute_display_column (exploc);
+  int hint_width = cpp_display_width (hint->get_string (),
+				      hint->get_length ());
+  int final_hint_column = start_column + hint_width - 1;
  if (hint->insertion_p ())
    {
      return column_range (start_column, final_hint_column);
    }
  else
    {
-      int finish_column = LOCATION_COLUMN (hint->get_next_loc ()) - 1;
+      exploc = expand_location (hint->get_next_loc ());
+      --exploc.column;
+      int finish_column = location_compute_display_column (exploc);
      return column_range (start_column,
 			   MAX (finish_column, final_hint_column));
    }
@@ -1794,27 +2017,35 @@ get_printed_columns (const fixit_hint *hint)
 class correction
 {
 public:
-  correction (column_range affected_columns,
+  correction (column_range affected_bytes,
+	      column_range affected_columns,
 	      column_range printed_columns,
 	      const char *new_text, size_t new_text_len)
-  : m_affected_columns (affected_columns),
+  : m_affected_bytes (affected_bytes),
+    m_affected_columns (affected_columns),
    m_printed_columns (printed_columns),
    m_text (xstrdup (new_text)),
-    m_len (new_text_len),
+    m_byte_length (new_text_len),
    m_alloc_sz (new_text_len + 1)
  {
+    compute_display_cols ();
  }
  ~correction () { free (m_text); }
  bool insertion_p () const
  {
-    return m_affected_columns.start == m_affected_columns.finish + 1;
+    return m_affected_bytes.start == m_affected_bytes.finish + 1;
  }
  void ensure_capacity (size_t len);
  void ensure_terminated ();
+  void compute_display_cols ()
+  {
+    m_display_cols = cpp_display_width (m_text, m_byte_length);
+  }
  void overwrite (int dst_offset, const char_span &src_span)
  {
    gcc_assert (dst_offset >= 0);
@@ -1827,6 +2058,7 @@ public:
     is to be inserted, and finish is offset by the length of
     the replacement.
     If replace, then the range of columns affected.  */
+  column_range m_affected_bytes;
  column_range m_affected_columns;
  /* If insert, then start: the column before which the text
@@ -1837,7 +2069,8 @@ public:
  /* The text to be inserted/used as replacement.  */
  char *m_text;
-  size_t m_len;
+  size_t m_byte_length; /* Not including null-terminator.  */
+  int m_display_cols;
  size_t m_alloc_sz;
 };
@@ -1862,8 +2095,8 @@ void
 correction::ensure_terminated ()
 {
  /* 0-terminate the buffer.  */
-  gcc_assert (m_len < m_alloc_sz);
+  gcc_assert (m_byte_length < m_alloc_sz);
-  m_text[m_len] = '\0';
+  m_text[m_byte_length] = '\0';
 }
 /* A list of corrections affecting a particular line.
@@ -1925,7 +2158,8 @@ source_line::source_line (const char *filename, int line)
 void
 line_corrections::add_hint (const fixit_hint *hint)
 {
-  column_range affected_columns = get_affected_columns (hint);
+  column_range affected_bytes = get_affected_range (hint, CU_BYTES);
+  column_range affected_columns = get_affected_range (hint, CU_DISPLAY_COLS);
  column_range printed_columns = get_printed_columns (hint);
  /* Potentially consolidate.  */
@@ -1936,8 +2170,8 @@ line_corrections::add_hint (const fixit_hint *hint)
      /* The following consolidation code assumes that the fix-it hints
 	 have been sorted by start (done within layout's ctor).  */
-      gcc_assert (affected_columns.start
+      gcc_assert (affected_bytes.start
-		  >= last_correction->m_affected_columns.start);
+		  >= last_correction->m_affected_bytes.start);
      gcc_assert (printed_columns.start
 		  >= last_correction->m_printed_columns.start);
@@ -1949,8 +2183,8 @@ line_corrections::add_hint (const fixit_hint *hint)
 	     Attempt to inject a "replace" correction from immediately
 	     after the end of the last hint to immediately before the start
 	     of the next hint.  */
-	  column_range between (last_correction->m_affected_columns.finish + 1,
+	  column_range between (last_correction->m_affected_bytes.finish + 1,
-				printed_columns.start - 1);
+				affected_bytes.start - 1);
 	  /* Try to read the source.  */
 	  source_line line (m_filename, m_row);
@@ -1959,33 +2193,39 @@ line_corrections::add_hint (const fixit_hint *hint)
 	      /* Consolidate into the last correction:
 		 add a no-op "replace" of the "between" text, and
 		 add the text from the new hint.  */
-	      int old_len = last_correction->m_len;
+	      int old_byte_len = last_correction->m_byte_length;
-	      gcc_assert (old_len >= 0);
+	      gcc_assert (old_byte_len >= 0);
-	      int between_len = between.finish + 1 - between.start;
+	      int between_byte_len = between.finish + 1 - between.start;
-	      gcc_assert (between_len >= 0);
+	      gcc_assert (between_byte_len >= 0);
-	      int new_len = old_len + between_len + hint->get_length ();
+	      int new_byte_len
-	      gcc_assert (new_len >= 0);
+		= old_byte_len + between_byte_len + hint->get_length ();
-	      last_correction->ensure_capacity (new_len);
+	      gcc_assert (new_byte_len >= 0);
+	      last_correction->ensure_capacity (new_byte_len);
 	      last_correction->overwrite
-		(old_len,
+		(old_byte_len,
 		 line.as_span ().subspan (between.start - 1,
 					  between.finish + 1 - between.start));
-	      last_correction->overwrite (old_len + between_len,
+	      last_correction->overwrite (old_byte_len + between_byte_len,
 					  char_span (hint->get_string (),
 						     hint->get_length ()));
-	      last_correction->m_len = new_len;
+	      last_correction->m_byte_length = new_byte_len;
 	      last_correction->ensure_terminated ();
+	      last_correction->m_affected_bytes.finish
+		= affected_bytes.finish;
 	      last_correction->m_affected_columns.finish
 		= affected_columns.finish;
+	      int prev_display_cols = last_correction->m_display_cols;
+	      last_correction->compute_display_cols ();
 	      last_correction->m_printed_columns.finish
-		+= between_len + hint->get_length ();
+		+= last_correction->m_display_cols - prev_display_cols;
 	      return;
 	    }
 	}
    }
  /* If no consolidation happened, add a new correction instance.  */
-  m_corrections.safe_push (new correction (affected_columns,
+  m_corrections.safe_push (new correction (affected_bytes,
+					   affected_columns,
 					   printed_columns,
 					   hint->get_string (),
 					   hint->get_length ()));
@@ -2018,7 +2258,7 @@ layout::print_trailing_fixits (linenum_type row)
  /* Now print the corrections.  */
  unsigned i;
  correction *c;
-  int column = m_x_offset;
+  int column = m_x_offset_display;
  if (!corrections.m_corrections.is_empty ())
    start_annotation_line ();
@@ -2034,7 +2274,7 @@ layout::print_trailing_fixits (linenum_type row)
 	  m_colorizer.set_fixit_insert ();
 	  pp_string (m_pp, c->m_text);
 	  m_colorizer.set_normal_text ();
-	  column += c->m_len;
+	  column += c->m_display_cols;
 	}
      else
 	{
@@ -2046,7 +2286,7 @@ layout::print_trailing_fixits (linenum_type row)
 	  int finish_column = c->m_affected_columns.finish;
 	  if (!annotation_line_showed_range_p (row, start_column,
 					       finish_column)
-	      || c->m_len == 0)
+	      || c->m_byte_length == 0)
 	    {
 	      move_to_column (&column, start_column, true);
 	      m_colorizer.set_fixit_delete ();
@@ -2057,13 +2297,13 @@ layout::print_trailing_fixits (linenum_type row)
 	  /* Print the replacement text.  REPLACE also covers
 	     removals, so only do this extra work (potentially starting
 	     a new line) if we have actual replacement text.  */
-	  if (c->m_len > 0)
+	  if (c->m_byte_length > 0)
 	    {
 	      move_to_column (&column, start_column, true);
 	      m_colorizer.set_fixit_insert ();
 	      pp_string (m_pp, c->m_text);
 	      m_colorizer.set_normal_text ();
-	      column += c->m_len;
+	      column += c->m_display_cols;
 	    }
 	}
    }
@@ -2084,12 +2324,14 @@ layout::print_newline ()
 /* Return true if (ROW/COLUMN) is within a range of the layout.
   If it returns true, OUT_STATE is written to, with the
   range index, and whether we should draw the caret at
-   (ROW/COLUMN) (as opposed to an underline).  */
+   (ROW/COLUMN) (as opposed to an underline).  COL_UNIT controls
+   whether all inputs and outputs are in bytes or display column units.  */
 bool
 layout::get_state_at_point (/* Inputs.  */
 			    linenum_type row, int column,
 			    int first_non_ws, int last_non_ws,
+			    enum column_unit col_unit,
 			    /* Outputs.  */
 			    point_state *out_state)
 {
@@ -2102,7 +2344,7 @@ layout::get_state_at_point (/* Inputs.  */
 	   source colorization.  */
 	continue;
-      if (range->contains_point (row, column))
+      if (range->contains_point (row, column, col_unit))
 	{
 	  out_state->range_idx = i;
@@ -2110,7 +2352,7 @@ layout::get_state_at_point (/* Inputs.  */
 	  out_state->draw_caret_p = false;
 	  if (range->m_range_display_kind == SHOW_RANGE_WITH_CARET
 	      && row == range->m_caret.m_line
-	      && column == range->m_caret.m_column)
+	      && column == range->m_caret.m_columns[col_unit])
 	    out_state->draw_caret_p = true;
 	  /* Within a multiline range, don't display any underline
@@ -2130,11 +2372,11 @@ layout::get_state_at_point (/* Inputs.  */
 /* Helper function for use by layout::print_line when printing the
   annotation line under the source line.
-   Get the column beyond the rightmost one that could contain a caret or
+   Get the display column beyond the rightmost one that could contain a caret
-   range marker, given that we stop rendering at trailing whitespace.
+   or range marker, given that we stop rendering at trailing whitespace.
   ROW is the source line within the given file.
-   CARET_COLUMN is the column of range 0's caret.
+   CARET_COLUMN is the display column of range 0's caret.
-   LAST_NON_WS_COLUMN is the last column containing a non-whitespace
+   LAST_NON_WS_COLUMN is the last display column containing a non-whitespace
   character of source (as determined when printing the source line).  */
 int
@@ -2153,8 +2395,9 @@ layout::get_x_bound_for_row (linenum_type row, int caret_column,
 	    {
 	      /* On the final line within a range; ensure that
 		 we render up to the end of the range.  */
-	      if (result <= range->m_finish.m_column)
+	      const int disp_col = range->m_finish.m_columns[CU_DISPLAY_COLS];
-		result = range->m_finish.m_column + 1;
+	      if (result <= disp_col)
+		result = disp_col + 1;
 	    }
 	  else if (row < range->m_finish.m_line)
 	    {
@@ -2183,7 +2426,7 @@ layout::move_to_column (int *column, int dest_column, bool add_left_margin)
      print_newline ();
      if (add_left_margin)
 	start_annotation_line ();
-      *column = m_x_offset;
+      *column = m_x_offset_display;
    }
  while (*column < dest_column)
@@ -2204,7 +2447,7 @@ layout::show_ruler (int max_column) const
    {
      start_annotation_line ();
      pp_space (m_pp);
-      for (int column = 1 + m_x_offset; column <= max_column; column++)
+      for (int column = 1 + m_x_offset_display; column <= max_column; column++)
 	if (column % 10 == 0)
 	  pp_character (m_pp, '0' + (column / 100) % 10);
 	else
@@ -2215,7 +2458,7 @@ layout::show_ruler (int max_column) const
  /* Tens.  */
  start_annotation_line ();
  pp_space (m_pp);
-  for (int column = 1 + m_x_offset; column <= max_column; column++)
+  for (int column = 1 + m_x_offset_display; column <= max_column; column++)
    if (column % 10 == 0)
      pp_character (m_pp, '0' + (column / 10) % 10);
    else
@@ -2225,7 +2468,7 @@ layout::show_ruler (int max_column) const
  /* Units.  */
  start_annotation_line ();
  pp_space (m_pp);
-  for (int column = 1 + m_x_offset; column <= max_column; column++)
+  for (int column = 1 + m_x_offset_display; column <= max_column; column++)
    pp_character (m_pp, '0' + (column % 10));
  pp_newline (m_pp);
 }
@@ -2245,7 +2488,11 @@ layout::print_line (linenum_type row)
  print_leading_fixits (row);
  print_source_line (row, line.get_buffer (), line.length (), &lbounds);
  if (should_print_annotation_line_p (row))
+    {
+      if (lbounds.m_first_non_ws != INT_MAX)
+	lbounds.convert_to_display_cols (line);
      print_annotation_line (row, lbounds);
+    }
  if (m_show_labels_p)
    print_any_labels (row);
  print_trailing_fixits (row);
@@ -2339,6 +2586,178 @@ namespace selftest {
 /* Selftests for diagnostic_show_locus.  */
+/* For precise tests of the layout, make clear where the source line will
+   start.  test_left_margin sets the total byte count from the left side of the
+   screen to the start of source lines, after the line number and the separator,
+   which consists of the three characters " | ".  */
+static const int test_linenum_sep = 3;
+static const int test_left_margin = 7;
+/* Helper function for test_layout_x_offset_display_utf8().  */
+static void
+test_offset_impl (int caret_byte_col, int max_width,
+		  int expected_x_offset_display,
+		  int left_margin = test_left_margin)
+{
+  test_diagnostic_context dc;
+  dc.caret_max_width = max_width;
+  /* diagnostic_context::min_margin_width sets the minimum space reserved for
+     the line number plus one space after.  */
+  dc.min_margin_width = left_margin - test_linenum_sep + 1;
+  dc.show_line_numbers_p = true;
+  rich_location richloc (line_table,
+			 linemap_position_for_column (line_table,
+						      caret_byte_col));
+  layout test_layout (&dc, &richloc, DK_ERROR);
+  ASSERT_EQ (left_margin - test_linenum_sep,
+	     test_layout.get_linenum_width ());
+  ASSERT_EQ (expected_x_offset_display,
+	     test_layout.get_x_offset_display ());
+}
+/* Test that layout::calculate_x_offset_display() works.  */
+static void
+test_layout_x_offset_display_utf8 (const line_table_case &case_)
+{
+  const char *content
+    = "This line is very long, so that we can use it to test the logic for "
+      "clipping long lines.  Also this: \xf0\x9f\x98\x82\xf0\x9f\x98\x82 is a "
+      "pair of emojis that occupies 8 bytes and 4 display columns, starting at "
+      "column #102.\n";
+  /* Number of bytes in the line, subtracting one to remove the newline.  */
+  const int line_bytes = strlen (content) - 1;
+  /* Number of display columns occupied by the line; each of the 2 emojis
+     takes up 2 fewer display columns than it does bytes.  */
+  const int line_display_cols = line_bytes - 2*2;
+  /* The column of the first emoji.  Byte or display is the same as there are
+     no multibyte characters earlier on the line.  */
+  const int emoji_col = 102;
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
+  line_table_test ltt (case_);
+  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  location_t line_end = linemap_position_for_column (line_table, line_bytes);
+  /* Don't attempt to run the tests if column data might be unavailable.  */
+  if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
+    return;
+  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  ASSERT_EQ (1, LOCATION_LINE (line_end));
+  ASSERT_EQ (line_bytes, LOCATION_COLUMN (line_end));
+  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  ASSERT_EQ (line_display_cols,
+	     cpp_display_width (lspan.get_buffer (), lspan.length ()));
+  ASSERT_EQ (line_display_cols,
+	     location_compute_display_column (expand_location (line_end)));
+  ASSERT_EQ (0, memcmp (lspan.get_buffer () + (emoji_col - 1),
+			"\xf0\x9f\x98\x82\xf0\x9f\x98\x82", 8));
+  /* (caret_byte, max_width, expected_x_offset_display, [left_margin])  */
+  /* No constraint on the width -> no offset.  */
+  test_offset_impl (emoji_col, 0, 0);
+  /* Caret is before the beginning -> no offset.  */
+  test_offset_impl (0, 100, 0);
+  /* Caret is past the end of the line -> no offset.  */
+  test_offset_impl (line_bytes+1, 100, 0);
+  /* Line fits in the display -> no offset.  */
+  test_offset_impl (line_bytes, line_display_cols + test_left_margin, 0);
+  test_offset_impl (emoji_col, line_display_cols + test_left_margin, 0);
+  /* Line is too long for the display but caret location is OK
+     anyway -> no offset.  */
+  static const int small_width = 24;
+  test_offset_impl (1, small_width, 0);
+  /* Width constraint is very small -> no offset.  */
+  test_offset_impl (emoji_col, CARET_LINE_MARGIN, 0);
+  /* Line would be offset, but due to large line numbers, offsetting
+     would remove the whole line -> no offset.  */
+  static const int huge_left_margin = 100;
+  test_offset_impl (emoji_col, huge_left_margin, 0, huge_left_margin);
+  /* Line is the same length as the display, but the line number makes it too
+     long, so offset is required.  Caret is at the end so padding on the right
+     is not in effect.  */
+  for (int excess = 1; excess <= 3; ++excess)
+    test_offset_impl (line_bytes, line_display_cols + test_left_margin - excess,
+		      excess);
+  /* Line is much too long for the display, caret is near the end ->
+     offset should be such that the line fits in the display and caret
+     remains the same distance from the end that it was.  */
+  for (int caret_offset = 0, max_offset = MIN (CARET_LINE_MARGIN, 10);
+       caret_offset <= max_offset; ++caret_offset)
+    test_offset_impl (line_bytes - caret_offset, small_width,
+		      line_display_cols + test_left_margin - small_width);
+  /* As previous case but caret is closer to the middle; now we want it to end
+     up CARET_LINE_MARGIN bytes from the end.  */
+  ASSERT_GT (line_display_cols - emoji_col, CARET_LINE_MARGIN);
+  test_offset_impl (emoji_col, small_width,
+		    emoji_col + test_left_margin
+		    - (small_width - CARET_LINE_MARGIN));
+  /* Test that the source line is offset as expected when printed.  */
+  {
+    test_diagnostic_context dc;
+    dc.caret_max_width = small_width - 6;
+    dc.min_margin_width = test_left_margin - test_linenum_sep + 1;
+    dc.show_line_numbers_p = true;
+    dc.show_ruler_p = true;
+    rich_location richloc (line_table,
+			   linemap_position_for_column (line_table,
+							emoji_col));
+    layout test_layout (&dc, &richloc, DK_ERROR);
+    test_layout.print_line (1);
+    ASSERT_STREQ ("     |         1         \n"
+		  "     |         1         \n"
+		  "     | 234567890123456789\n"
+		  "   1 | \xf0\x9f\x98\x82\xf0\x9f\x98\x82 is a pair of emojis "
+		  "that occupies 8 bytes and 4 display columns, starting at "
+		  "column #102.\n"
+		  "     | ^\n\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Similar to the previous example, but now the offset called for would split
+     the first emoji in the middle of the UTF-8 sequence.  Check that we replace
+     it with a padding space in this case.  */
+  {
+    test_diagnostic_context dc;
+    dc.caret_max_width = small_width - 5;
+    dc.min_margin_width = test_left_margin - test_linenum_sep + 1;
+    dc.show_line_numbers_p = true;
+    dc.show_ruler_p = true;
+    rich_location richloc (line_table,
+			   linemap_position_for_column (line_table,
+							emoji_col + 2));
+    layout test_layout (&dc, &richloc, DK_ERROR);
+    test_layout.print_line (1);
+    ASSERT_STREQ ("     |        1         1 \n"
+		  "     |        1         2 \n"
+		  "     | 3456789012345678901\n"
+		  "   1 |  \xf0\x9f\x98\x82 is a pair of emojis "
+		  "that occupies 8 bytes and 4 display columns, starting at "
+		  "column #102.\n"
+		  "     |  ^\n\n",
+		  pp_formatted_text (dc.printer));
+  }
+}
 /* Verify that diagnostic_show_locus works sanely on UNKNOWN_LOCATION.  */
 static void
@@ -2965,97 +3384,642 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
  test_one_liner_labels ();
 }
-/* Verify that gcc_rich_location::add_location_if_nearby works.  */
+/* Version of all one-liner tests exercising multibyte awareness.  For
+   simplicity we stick to using two multibyte characters in the test, U+1F602
-static void
+   == "\xf0\x9f\x98\x82", which uses 4 bytes and 2 display columns, and U+03C0
-test_add_location_if_nearby (const line_table_case &case_)
+   == "\xcf\x80", which uses 2 bytes and 1 display column.  Note: all of the
-{
+   below asserts would be easier to read if we used UTF-8 directly in the
-  /* Create a tempfile and write some text to it.
+   string constants, but it seems better not to demand the host compiler
-     ...000000000111111111122222222223333333333.
+   support this, when it isn't otherwise necessary.  Instead, whenever an
-     ...123456789012345678901234567890123456789.  */
+   extended character appears in a string, we put a line break after it so that
-  const char *content
+   all succeeding characters can appear visually at the correct display column.
-    = ("struct same_line { double x; double y; ;\n" /* line 1.  */
-       "struct different_line\n"                    /* line 2.  */
-       "{\n"                                        /* line 3.  */
-       "  double x;\n"                              /* line 4.  */
-       "  double y;\n"                              /* line 5.  */
-       ";\n");                                      /* line 6.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
-  line_table_test ltt (case_);
-  const line_map_ordinary *ord_map
+   All of these work on the following 1-line source file:
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-  linemap_line_start (line_table, 1, 100);
+     .0000000001111111111222222   display
+     .1234567890123456789012345   columns
+     "SS_foo = P_bar.SS_fieldP;\n"
+     .0000000111111111222222223   byte
+     .1356789012456789134567891   columns
-  const location_t final_line_end
+   which is set up by test_diagnostic_show_locus_one_liner and calls
-    = linemap_position_for_line_and_column (line_table, ord_map, 6, 7);
+   them.  Here SS represents the two display columns for the U+1F602 emoji and
+   P represents the one display column for the U+03C0 pi symbol.  */
-  /* Don't attempt to run the tests if column data might be unavailable.  */
+/* Just a caret.  */
-  if (final_line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
-    return;
-  /* Test of add_location_if_nearby on the same line as the
+static void
-     primary location.  */
+test_one_liner_simple_caret_utf8 ()
-  {
+{
-    const location_t missing_close_brace_1_39
-      = linemap_position_for_line_and_column (line_table, ord_map, 1, 39);
-    const location_t matching_open_brace_1_18
-      = linemap_position_for_line_and_column (line_table, ord_map, 1, 18);
-    gcc_rich_location richloc (missing_close_brace_1_39);
-    bool added = richloc.add_location_if_nearby (matching_open_brace_1_18);
-    ASSERT_TRUE (added);
-    ASSERT_EQ (2, richloc.get_num_locations ());
  test_diagnostic_context dc;
+  location_t caret = linemap_position_for_column (line_table, 18);
+  rich_location richloc (line_table, caret);
  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
  ASSERT_STREQ ("\n"
-		  " struct same_line { double x; double y; ;\n"
+		" \xf0\x9f\x98\x82"
-		  "                  ~                    ^\n",
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"               ^\n",
 		pp_formatted_text (dc.printer));
-  }
-  /* Test of add_location_if_nearby on a different line to the
-     primary location.  */
-  {
-    const location_t missing_close_brace_6_1
-      = linemap_position_for_line_and_column (line_table, ord_map, 6, 1);
-    const location_t matching_open_brace_3_1
-      = linemap_position_for_line_and_column (line_table, ord_map, 3, 1);
-    gcc_rich_location richloc (missing_close_brace_6_1);
-    bool added = richloc.add_location_if_nearby (matching_open_brace_3_1);
-    ASSERT_FALSE (added);
-    ASSERT_EQ (1, richloc.get_num_locations ());
-  }
 }
-/* Verify that we print fixits even if they only affect lines
+/* Caret and range.  */
-   outside those covered by the ranges in the rich_location.  */
 static void
-test_diagnostic_show_locus_fixit_lines (const line_table_case &case_)
+test_one_liner_caret_and_range_utf8 ()
 {
-  /* Create a tempfile and write some text to it.
+  test_diagnostic_context dc;
-     ...000000000111111111122222222223333333333.
+  location_t caret = linemap_position_for_column (line_table, 18);
-     ...123456789012345678901234567890123456789.  */
+  location_t start = linemap_position_for_column (line_table, 12);
-  const char *content
+  location_t finish = linemap_position_for_column (line_table, 30);
-    = ("struct point { double x; double y; };\n" /* line 1.  */
+  location_t loc = make_location (caret, start, finish);
-       "struct point origin = {x: 0.0,\n"        /* line 2.  */
+  rich_location richloc (line_table, loc);
-       "                       y\n"              /* line 3.  */
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-       "\n"                                      /* line 4.  */
+  ASSERT_STREQ ("\n"
-       "\n"                                      /* line 5.  */
+		" \xf0\x9f\x98\x82"
-       "                        : 0.0};\n");     /* line 6.  */
+		   "_foo = \xcf\x80"
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
+			   "_bar.\xf0\x9f\x98\x82"
-  line_table_test ltt (case_);
+				  "_field\xcf\x80"
+					 ";\n"
+		"          ~~~~~^~~~~~~~~~\n",
+		pp_formatted_text (dc.printer));
+}
-  const line_map_ordinary *ord_map
+/* Multiple ranges and carets.  */
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-  linemap_line_start (line_table, 1, 100);
+static void
+test_one_liner_multiple_carets_and_ranges_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t foo
+    = make_location (linemap_position_for_column (line_table, 7),
+		     linemap_position_for_column (line_table, 1),
+		     linemap_position_for_column (line_table, 8));
+  dc.caret_chars[0] = 'A';
-  const location_t final_line_end
+  location_t bar
-    = linemap_position_for_line_and_column (line_table, ord_map, 6, 36);
+    = make_location (linemap_position_for_column (line_table, 16),
+		     linemap_position_for_column (line_table, 12),
+		     linemap_position_for_column (line_table, 17));
+  dc.caret_chars[1] = 'B';
+  location_t field
+    = make_location (linemap_position_for_column (line_table, 26),
+		     linemap_position_for_column (line_table, 19),
+		     linemap_position_for_column (line_table, 30));
+  dc.caret_chars[2] = 'C';
+  rich_location richloc (line_table, foo);
+  richloc.add_range (bar, SHOW_RANGE_WITH_CARET);
+  richloc.add_range (field, SHOW_RANGE_WITH_CARET);
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		" ~~~~A~   ~~~B~ ~~~~~C~~~\n",
+		pp_formatted_text (dc.printer));
+}
+/* Insertion fix-it hint: adding an "&" to the front of "P_bar.field". */
+static void
+test_one_liner_fixit_insert_before_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t caret = linemap_position_for_column (line_table, 12);
+  rich_location richloc (line_table, caret);
+  richloc.add_fixit_insert_before ("&");
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"          ^\n"
+		"          &\n",
+		pp_formatted_text (dc.printer));
+}
+/* Insertion fix-it hint: adding a "[0]" after "SS_foo". */
+static void
+test_one_liner_fixit_insert_after_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t start = linemap_position_for_column (line_table, 1);
+  location_t finish = linemap_position_for_column (line_table, 8);
+  location_t foo = make_location (start, start, finish);
+  rich_location richloc (line_table, foo);
+  richloc.add_fixit_insert_after ("[0]");
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		" ^~~~~~\n"
+		"       [0]\n",
+		pp_formatted_text (dc.printer));
+}
+/* Removal fix-it hint: removal of the ".SS_fieldP". */
+static void
+test_one_liner_fixit_remove_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t start = linemap_position_for_column (line_table, 18);
+  location_t finish = linemap_position_for_column (line_table, 30);
+  location_t dot = make_location (start, start, finish);
+  rich_location richloc (line_table, dot);
+  richloc.add_fixit_remove ();
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"               ^~~~~~~~~~\n"
+		"               ----------\n",
+		pp_formatted_text (dc.printer));
+}
+/* Replace fix-it hint: replacing "SS_fieldP" with "m_SSfieldP". */
+static void
+test_one_liner_fixit_replace_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t start = linemap_position_for_column (line_table, 19);
+  location_t finish = linemap_position_for_column (line_table, 30);
+  location_t field = make_location (start, start, finish);
+  rich_location richloc (line_table, field);
+  richloc.add_fixit_replace ("m_\xf0\x9f\x98\x82_field\xcf\x80");
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"                ^~~~~~~~~\n"
+		"                m_\xf0\x9f\x98\x82"
+				    "_field\xcf\x80\n",
+		pp_formatted_text (dc.printer));
+}
+/* Replace fix-it hint: replacing "SS_fieldP" with "m_SSfieldP",
+   but where the caret was elsewhere.  */
+static void
+test_one_liner_fixit_replace_non_equal_range_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t equals = linemap_position_for_column (line_table, 10);
+  location_t start = linemap_position_for_column (line_table, 19);
+  location_t finish = linemap_position_for_column (line_table, 30);
+  rich_location richloc (line_table, equals);
+  source_range range;
+  range.m_start = start;
+  range.m_finish = finish;
+  richloc.add_fixit_replace (range, "m_\xf0\x9f\x98\x82_field\xcf\x80");
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  /* The replacement range is not indicated in the annotation line, so
+     it should be indicated via an additional underline.  */
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"        ^\n"
+		"                ---------\n"
+		"                m_\xf0\x9f\x98\x82"
+				    "_field\xcf\x80\n",
+		pp_formatted_text (dc.printer));
+}
+/* Replace fix-it hint: replacing "SS_fieldP" with "m_SSfieldP",
+   where the caret was elsewhere, but where a secondary range
+   exactly covers "field".  */
+static void
+test_one_liner_fixit_replace_equal_secondary_range_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t equals = linemap_position_for_column (line_table, 10);
+  location_t start = linemap_position_for_column (line_table, 19);
+  location_t finish = linemap_position_for_column (line_table, 30);
+  rich_location richloc (line_table, equals);
+  location_t field = make_location (start, start, finish);
+  richloc.add_range (field);
+  richloc.add_fixit_replace (field, "m_\xf0\x9f\x98\x82_field\xcf\x80");
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  /* The replacement range is indicated in the annotation line,
+     so it shouldn't be indicated via an additional underline.  */
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"        ^       ~~~~~~~~~\n"
+		"                m_\xf0\x9f\x98\x82"
+				    "_field\xcf\x80\n",
+		pp_formatted_text (dc.printer));
+}
+/* Verify that we can use ad-hoc locations when adding fixits to a
+   rich_location.  */
+static void
+test_one_liner_fixit_validation_adhoc_locations_utf8 ()
+{
+  /* Generate a range that's too long to be packed, so must
+     be stored as an ad-hoc location (given the defaults
+     of 5 bits or 0 bits of packed range); 41 columns > 2**5.  */
+  const location_t c12 = linemap_position_for_column (line_table, 12);
+  const location_t c52 = linemap_position_for_column (line_table, 52);
+  const location_t loc = make_location (c12, c12, c52);
+  if (c52 > LINE_MAP_MAX_LOCATION_WITH_COLS)
+    return;
+  ASSERT_TRUE (IS_ADHOC_LOC (loc));
+  /* Insert.  */
+  {
+    rich_location richloc (line_table, loc);
+    richloc.add_fixit_insert_before (loc, "test");
+    /* It should not have been discarded by the validator.  */
+    ASSERT_EQ (1, richloc.get_num_fixit_hints ());
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " \xf0\x9f\x98\x82"
+		     "_foo = \xcf\x80"
+			     "_bar.\xf0\x9f\x98\x82"
+				    "_field\xcf\x80"
+					   ";\n"
+		  "          ^~~~~~~~~~~~~~~~                     \n"
+		  "          test\n",
+		pp_formatted_text (dc.printer));
+  }
+  /* Remove.  */
+  {
+    rich_location richloc (line_table, loc);
+    source_range range = source_range::from_locations (loc, c52);
+    richloc.add_fixit_remove (range);
+    /* It should not have been discarded by the validator.  */
+    ASSERT_EQ (1, richloc.get_num_fixit_hints ());
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " \xf0\x9f\x98\x82"
+		     "_foo = \xcf\x80"
+			     "_bar.\xf0\x9f\x98\x82"
+				    "_field\xcf\x80"
+					   ";\n"
+		  "          ^~~~~~~~~~~~~~~~                     \n"
+		  "          -------------------------------------\n",
+		pp_formatted_text (dc.printer));
+  }
+  /* Replace.  */
+  {
+    rich_location richloc (line_table, loc);
+    source_range range = source_range::from_locations (loc, c52);
+    richloc.add_fixit_replace (range, "test");
+    /* It should not have been discarded by the validator.  */
+    ASSERT_EQ (1, richloc.get_num_fixit_hints ());
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " \xf0\x9f\x98\x82"
+		     "_foo = \xcf\x80"
+			     "_bar.\xf0\x9f\x98\x82"
+				    "_field\xcf\x80"
+					   ";\n"
+		  "          ^~~~~~~~~~~~~~~~                     \n"
+		  "          test\n",
+		pp_formatted_text (dc.printer));
+  }
+}
+/* Test of consolidating insertions at the same location.  */
+static void
+test_one_liner_many_fixits_1_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t equals = linemap_position_for_column (line_table, 10);
+  rich_location richloc (line_table, equals);
+  for (int i = 0; i < 19; i++)
+    richloc.add_fixit_insert_before (i & 1 ? "@" : "\xcf\x80");
+  ASSERT_EQ (1, richloc.get_num_fixit_hints ());
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"        ^\n"
+		"        \xcf\x80@\xcf\x80@\xcf\x80@\xcf\x80@\xcf\x80@"
+		"\xcf\x80@\xcf\x80@\xcf\x80@\xcf\x80@\xcf\x80\n",
+		pp_formatted_text (dc.printer));
+}
+/* Ensure that we can add an arbitrary number of fix-it hints to a
+   rich_location, even if they are not consolidated.  */
+static void
+test_one_liner_many_fixits_2_utf8 ()
+{
+  test_diagnostic_context dc;
+  location_t equals = linemap_position_for_column (line_table, 10);
+  rich_location richloc (line_table, equals);
+  const int nlocs = 19;
+  int locs[nlocs] = {1, 5, 7, 9, 11, 14, 16, 18, 23, 25, 27, 29, 32,
+		     34, 36, 38, 40, 42, 44};
+  for (int i = 0; i != nlocs; ++i)
+    {
+      location_t loc = linemap_position_for_column (line_table, locs[i]);
+      richloc.add_fixit_insert_before (loc, i & 1 ? "@" : "\xcf\x80");
+    }
+  ASSERT_EQ (nlocs, richloc.get_num_fixit_hints ());
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+		" \xf0\x9f\x98\x82"
+		   "_foo = \xcf\x80"
+			   "_bar.\xf0\x9f\x98\x82"
+				  "_field\xcf\x80"
+					 ";\n"
+		"        ^\n"
+		" \xcf\x80 @ \xcf\x80 @ \xcf\x80 @ \xcf\x80 @  \xcf\x80 @"
+		" \xcf\x80 @ \xcf\x80 @ \xcf\x80 @ \xcf\x80 @ \xcf\x80\n",
+		pp_formatted_text (dc.printer));
+}
+/* Test of labeling the ranges within a rich_location.  */
+static void
+test_one_liner_labels_utf8 ()
+{
+  location_t foo
+    = make_location (linemap_position_for_column (line_table, 1),
+		     linemap_position_for_column (line_table, 1),
+		     linemap_position_for_column (line_table, 8));
+  location_t bar
+    = make_location (linemap_position_for_column (line_table, 12),
+		     linemap_position_for_column (line_table, 12),
+		     linemap_position_for_column (line_table, 17));
+  location_t field
+    = make_location (linemap_position_for_column (line_table, 19),
+		     linemap_position_for_column (line_table, 19),
+		     linemap_position_for_column (line_table, 30));
+  /* Example where all the labels fit on one line.  */
+  {
+    /* These three labels contain multibyte characters such that their byte
+       lengths are respectively (12, 10, 18), but their display widths are only
+       (6, 5, 9).  All three fit on the line when considering the display
+       widths, but not when considering the byte widths, so verify that we do
+       indeed put them all on one line.  */
+    text_range_label label0
+      ("\xcf\x80\xcf\x80\xcf\x80\xcf\x80\xcf\x80\xcf\x80");
+    text_range_label label1
+      ("\xf0\x9f\x98\x82\xf0\x9f\x98\x82\xcf\x80");
+    text_range_label label2
+      ("\xf0\x9f\x98\x82\xcf\x80\xf0\x9f\x98\x82\xf0\x9f\x98\x82\xcf\x80"
+       "\xcf\x80");
+    gcc_rich_location richloc (foo, &label0);
+    richloc.add_range (bar, SHOW_RANGE_WITHOUT_CARET, &label1);
+    richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2);
+    {
+      test_diagnostic_context dc;
+      diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+      ASSERT_STREQ ("\n"
+		    " \xf0\x9f\x98\x82"
+		       "_foo = \xcf\x80"
+			       "_bar.\xf0\x9f\x98\x82"
+				      "_field\xcf\x80"
+					     ";\n"
+		    " ^~~~~~   ~~~~~ ~~~~~~~~~\n"
+		    " |        |     |\n"
+		    " \xcf\x80\xcf\x80\xcf\x80\xcf\x80\xcf\x80\xcf\x80"
+			   "   \xf0\x9f\x98\x82\xf0\x9f\x98\x82\xcf\x80"
+				   " \xf0\x9f\x98\x82\xcf\x80\xf0\x9f\x98\x82"
+					 "\xf0\x9f\x98\x82\xcf\x80\xcf\x80\n",
+		    pp_formatted_text (dc.printer));
+    }
+  }
+  /* Example where the labels need extra lines.  */
+  {
+    text_range_label label0 ("label 0\xf0\x9f\x98\x82");
+    text_range_label label1 ("label 1\xcf\x80");
+    text_range_label label2 ("label 2\xcf\x80");
+    gcc_rich_location richloc (foo, &label0);
+    richloc.add_range (bar, SHOW_RANGE_WITHOUT_CARET, &label1);
+    richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2);
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " \xf0\x9f\x98\x82"
+		     "_foo = \xcf\x80"
+			     "_bar.\xf0\x9f\x98\x82"
+				    "_field\xcf\x80"
+					   ";\n"
+		  " ^~~~~~   ~~~~~ ~~~~~~~~~\n"
+		  " |        |     |\n"
+		  " |        |     label 2\xcf\x80\n"
+		  " |        label 1\xcf\x80\n"
+		  " label 0\xf0\x9f\x98\x82\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Example of boundary conditions: label 0 and 1 have just enough clearance,
+     but label 1 just touches label 2.  */
+  {
+    text_range_label label0 ("aaaaa\xf0\x9f\x98\x82\xcf\x80");
+    text_range_label label1 ("bb\xf0\x9f\x98\x82\xf0\x9f\x98\x82");
+    text_range_label label2 ("c");
+    gcc_rich_location richloc (foo, &label0);
+    richloc.add_range (bar, SHOW_RANGE_WITHOUT_CARET, &label1);
+    richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2);
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " \xf0\x9f\x98\x82"
+		     "_foo = \xcf\x80"
+			     "_bar.\xf0\x9f\x98\x82"
+				    "_field\xcf\x80"
+					   ";\n"
+		  " ^~~~~~   ~~~~~ ~~~~~~~~~\n"
+		  " |        |     |\n"
+		  " |        |     c\n"
+		  " aaaaa\xf0\x9f\x98\x82\xcf\x80"
+			   " bb\xf0\x9f\x98\x82\xf0\x9f\x98\x82\n",
+		  pp_formatted_text (dc.printer));
+  }
+}
+/* Run the various one-liner tests.  */
+static void
+test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.  */
+  const char *content
+    /* Display columns.
+       0000000000000000000000011111111111111111111111111111112222222222222
+       1111111122222222345678900000000123456666666677777777890123444444445  */
+    = "\xf0\x9f\x98\x82_foo = \xcf\x80_bar.\xf0\x9f\x98\x82_field\xcf\x80;\n";
+    /* 0000000000000000000001111111111111111111222222222222222222222233333
+       1111222233334444567890122223333456789999000011112222345678999900001
+       Byte columns.  */
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
+  line_table_test ltt (case_);
+  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  location_t line_end = linemap_position_for_column (line_table, 31);
+  /* Don't attempt to run the tests if column data might be unavailable.  */
+  if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
+    return;
+  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  ASSERT_EQ (1, LOCATION_LINE (line_end));
+  ASSERT_EQ (31, LOCATION_COLUMN (line_end));
+  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  ASSERT_EQ (25, cpp_display_width (lspan.get_buffer (), lspan.length ()));
+  ASSERT_EQ (25, location_compute_display_column (expand_location (line_end)));
+  test_one_liner_simple_caret_utf8 ();
+  test_one_liner_caret_and_range_utf8 ();
+  test_one_liner_multiple_carets_and_ranges_utf8 ();
+  test_one_liner_fixit_insert_before_utf8 ();
+  test_one_liner_fixit_insert_after_utf8 ();
+  test_one_liner_fixit_remove_utf8 ();
+  test_one_liner_fixit_replace_utf8 ();
+  test_one_liner_fixit_replace_non_equal_range_utf8 ();
+  test_one_liner_fixit_replace_equal_secondary_range_utf8 ();
+  test_one_liner_fixit_validation_adhoc_locations_utf8 ();
+  test_one_liner_many_fixits_1_utf8 ();
+  test_one_liner_many_fixits_2_utf8 ();
+  test_one_liner_labels_utf8 ();
+}
+/* Verify that gcc_rich_location::add_location_if_nearby works.  */
+static void
+test_add_location_if_nearby (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.
+     ...000000000111111111122222222223333333333.
+     ...123456789012345678901234567890123456789.  */
+  const char *content
+    = ("struct same_line { double x; double y; ;\n" /* line 1.  */
+       "struct different_line\n"                    /* line 2.  */
+       "{\n"                                        /* line 3.  */
+       "  double x;\n"                              /* line 4.  */
+       "  double y;\n"                              /* line 5.  */
+       ";\n");                                      /* line 6.  */
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
+  line_table_test ltt (case_);
+  const line_map_ordinary *ord_map
+    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
+					   tmp.get_filename (), 0));
+  linemap_line_start (line_table, 1, 100);
+  const location_t final_line_end
+    = linemap_position_for_line_and_column (line_table, ord_map, 6, 7);
+  /* Don't attempt to run the tests if column data might be unavailable.  */
+  if (final_line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
+    return;
+  /* Test of add_location_if_nearby on the same line as the
+     primary location.  */
+  {
+    const location_t missing_close_brace_1_39
+      = linemap_position_for_line_and_column (line_table, ord_map, 1, 39);
+    const location_t matching_open_brace_1_18
+      = linemap_position_for_line_and_column (line_table, ord_map, 1, 18);
+    gcc_rich_location richloc (missing_close_brace_1_39);
+    bool added = richloc.add_location_if_nearby (matching_open_brace_1_18);
+    ASSERT_TRUE (added);
+    ASSERT_EQ (2, richloc.get_num_locations ());
+    test_diagnostic_context dc;
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  " struct same_line { double x; double y; ;\n"
+		  "                  ~                    ^\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Test of add_location_if_nearby on a different line to the
+     primary location.  */
+  {
+    const location_t missing_close_brace_6_1
+      = linemap_position_for_line_and_column (line_table, ord_map, 6, 1);
+    const location_t matching_open_brace_3_1
+      = linemap_position_for_line_and_column (line_table, ord_map, 3, 1);
+    gcc_rich_location richloc (missing_close_brace_6_1);
+    bool added = richloc.add_location_if_nearby (matching_open_brace_3_1);
+    ASSERT_FALSE (added);
+    ASSERT_EQ (1, richloc.get_num_locations ());
+  }
+}
+/* Verify that we print fixits even if they only affect lines
+   outside those covered by the ranges in the rich_location.  */
+static void
+test_diagnostic_show_locus_fixit_lines (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.
+     ...000000000111111111122222222223333333333.
+     ...123456789012345678901234567890123456789.  */
+  const char *content
+    = ("struct point { double x; double y; };\n" /* line 1.  */
+       "struct point origin = {x: 0.0,\n"        /* line 2.  */
+       "                       y\n"              /* line 3.  */
+       "\n"                                      /* line 4.  */
+       "\n"                                      /* line 5.  */
+       "                        : 0.0};\n");     /* line 6.  */
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
+  line_table_test ltt (case_);
+  const line_map_ordinary *ord_map
+    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
+					   tmp.get_filename (), 0));
+  linemap_line_start (line_table, 1, 100);
+  const location_t final_line_end
+    = linemap_position_for_line_and_column (line_table, ord_map, 6, 36);
  /* Don't attempt to run the tests if column data might be unavailable.  */
  if (final_line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
@@ -3340,13 +4304,19 @@ test_overlapped_fixit_printing (const line_table_case &case_)
    /* Unit-test the line_corrections machinery.  */
    ASSERT_EQ (3, richloc.get_num_fixit_hints ());
    const fixit_hint *hint_0 = richloc.get_fixit_hint (0);
-    ASSERT_EQ (column_range (12, 12), get_affected_columns (hint_0));
+    ASSERT_EQ (column_range (12, 12), get_affected_range (hint_0, CU_BYTES));
+    ASSERT_EQ (column_range (12, 12),
+			   get_affected_range (hint_0, CU_DISPLAY_COLS));
    ASSERT_EQ (column_range (12, 22), get_printed_columns (hint_0));
    const fixit_hint *hint_1 = richloc.get_fixit_hint (1);
-    ASSERT_EQ (column_range (18, 18), get_affected_columns (hint_1));
+    ASSERT_EQ (column_range (18, 18), get_affected_range (hint_1, CU_BYTES));
+    ASSERT_EQ (column_range (18, 18),
+			   get_affected_range (hint_1, CU_DISPLAY_COLS));
    ASSERT_EQ (column_range (18, 20), get_printed_columns (hint_1));
    const fixit_hint *hint_2 = richloc.get_fixit_hint (2);
-    ASSERT_EQ (column_range (29, 28), get_affected_columns (hint_2));
+    ASSERT_EQ (column_range (29, 28), get_affected_range (hint_2, CU_BYTES));
+    ASSERT_EQ (column_range (29, 28),
+			   get_affected_range (hint_2, CU_DISPLAY_COLS));
    ASSERT_EQ (column_range (29, 29), get_printed_columns (hint_2));
    /* Add each hint in turn to a line_corrections instance,
@@ -3357,6 +4327,7 @@ test_overlapped_fixit_printing (const line_table_case &case_)
    /* The first replace hint by itself.  */
    lc.add_hint (hint_0);
    ASSERT_EQ (1, lc.m_corrections.length ());
+    ASSERT_EQ (column_range (12, 12), lc.m_corrections[0]->m_affected_bytes);
    ASSERT_EQ (column_range (12, 12), lc.m_corrections[0]->m_affected_columns);
    ASSERT_EQ (column_range (12, 22), lc.m_corrections[0]->m_printed_columns);
    ASSERT_STREQ ("const_cast<", lc.m_corrections[0]->m_text);
@@ -3366,6 +4337,7 @@ test_overlapped_fixit_printing (const line_table_case &case_)
    lc.add_hint (hint_1);
    ASSERT_EQ (1, lc.m_corrections.length ());
    ASSERT_STREQ ("const_cast<foo *> (", lc.m_corrections[0]->m_text);
+    ASSERT_EQ (column_range (12, 18), lc.m_corrections[0]->m_affected_bytes);
    ASSERT_EQ (column_range (12, 18), lc.m_corrections[0]->m_affected_columns);
    ASSERT_EQ (column_range (12, 30), lc.m_corrections[0]->m_printed_columns);
@@ -3375,6 +4347,7 @@ test_overlapped_fixit_printing (const line_table_case &case_)
    ASSERT_STREQ ("const_cast<foo *> (ptr->field)",
 		  lc.m_corrections[0]->m_text);
    ASSERT_EQ (1, lc.m_corrections.length ());
+    ASSERT_EQ (column_range (12, 28), lc.m_corrections[0]->m_affected_bytes);
    ASSERT_EQ (column_range (12, 28), lc.m_corrections[0]->m_affected_columns);
    ASSERT_EQ (column_range (12, 41), lc.m_corrections[0]->m_printed_columns);
  }
@@ -3477,6 +4450,246 @@ test_overlapped_fixit_printing (const line_table_case &case_)
  }
 }
+/* Multibyte-aware version of preceding tests.  See comments above
+   test_one_liner_simple_caret_utf8() too, we use the same two multibyte
+   characters here.  */
+static void
+test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.  */
+  const char *content
+    /* Display columns.
+       00000000000000000000000111111111111111111111111222222222222222223
+       12344444444555555556789012344444444555555556789012345678999999990  */
+    = "  f\xf0\x9f\x98\x82 *f = (f\xf0\x9f\x98\x82 *)ptr->field\xcf\x80;\n";
+    /* 00000000000000000000011111111111111111111112222222222333333333333
+       12344445555666677778901234566667777888899990123456789012333344445
+       Byte columns.  */
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
+  line_table_test ltt (case_);
+  const line_map_ordinary *ord_map
+    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
+					   tmp.get_filename (), 0));
+  linemap_line_start (line_table, 1, 100);
+  const location_t final_line_end
+    = linemap_position_for_line_and_column (line_table, ord_map, 6, 50);
+  /* Don't attempt to run the tests if column data might be unavailable.  */
+  if (final_line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
+    return;
+  /* A test for converting a C-style cast to a C++-style cast.  */
+  const location_t open_paren
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, 14);
+  const location_t close_paren
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, 22);
+  const location_t expr_start
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, 23);
+  const location_t expr_finish
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, 34);
+  const location_t expr = make_location (expr_start, expr_start, expr_finish);
+  /* Various examples of fix-it hints that aren't themselves consolidated,
+     but for which the *printing* may need consolidation.  */
+  /* Example where 3 fix-it hints are printed as one.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_replace (open_paren, "const_cast<");
+    richloc.add_fixit_replace (close_paren, "> (");
+    richloc.add_fixit_insert_after (")");
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            ------------------\n"
+		  "            const_cast<f\xf0\x9f\x98\x82"
+					    " *> (ptr->field\xcf\x80"
+							    ")\n",
+		  pp_formatted_text (dc.printer));
+    /* Unit-test the line_corrections machinery.  */
+    ASSERT_EQ (3, richloc.get_num_fixit_hints ());
+    const fixit_hint *hint_0 = richloc.get_fixit_hint (0);
+    ASSERT_EQ (column_range (14, 14), get_affected_range (hint_0, CU_BYTES));
+    ASSERT_EQ (column_range (12, 12),
+			   get_affected_range (hint_0, CU_DISPLAY_COLS));
+    ASSERT_EQ (column_range (12, 22), get_printed_columns (hint_0));
+    const fixit_hint *hint_1 = richloc.get_fixit_hint (1);
+    ASSERT_EQ (column_range (22, 22), get_affected_range (hint_1, CU_BYTES));
+    ASSERT_EQ (column_range (18, 18),
+			   get_affected_range (hint_1, CU_DISPLAY_COLS));
+    ASSERT_EQ (column_range (18, 20), get_printed_columns (hint_1));
+    const fixit_hint *hint_2 = richloc.get_fixit_hint (2);
+    ASSERT_EQ (column_range (35, 34), get_affected_range (hint_2, CU_BYTES));
+    ASSERT_EQ (column_range (30, 29),
+			   get_affected_range (hint_2, CU_DISPLAY_COLS));
+    ASSERT_EQ (column_range (30, 30), get_printed_columns (hint_2));
+    /* Add each hint in turn to a line_corrections instance,
+       and verify that they are consolidated into one correction instance
+       as expected.  */
+    line_corrections lc (tmp.get_filename (), 1);
+    /* The first replace hint by itself.  */
+    lc.add_hint (hint_0);
+    ASSERT_EQ (1, lc.m_corrections.length ());
+    ASSERT_EQ (column_range (14, 14), lc.m_corrections[0]->m_affected_bytes);
+    ASSERT_EQ (column_range (12, 12), lc.m_corrections[0]->m_affected_columns);
+    ASSERT_EQ (column_range (12, 22), lc.m_corrections[0]->m_printed_columns);
+    ASSERT_STREQ ("const_cast<", lc.m_corrections[0]->m_text);
+    /* After the second replacement hint, they are printed together
+       as a replacement (along with the text between them).  */
+    lc.add_hint (hint_1);
+    ASSERT_EQ (1, lc.m_corrections.length ());
+    ASSERT_STREQ ("const_cast<f\xf0\x9f\x98\x82 *> (",
+		  lc.m_corrections[0]->m_text);
+    ASSERT_EQ (column_range (14, 22), lc.m_corrections[0]->m_affected_bytes);
+    ASSERT_EQ (column_range (12, 18), lc.m_corrections[0]->m_affected_columns);
+    ASSERT_EQ (column_range (12, 30), lc.m_corrections[0]->m_printed_columns);
+    /* After the final insertion hint, they are all printed together
+       as a replacement (along with the text between them).  */
+    lc.add_hint (hint_2);
+    ASSERT_STREQ ("const_cast<f\xf0\x9f\x98\x82 *> (ptr->field\xcf\x80)",
+		  lc.m_corrections[0]->m_text);
+    ASSERT_EQ (1, lc.m_corrections.length ());
+    ASSERT_EQ (column_range (14, 34), lc.m_corrections[0]->m_affected_bytes);
+    ASSERT_EQ (column_range (12, 29), lc.m_corrections[0]->m_affected_columns);
+    ASSERT_EQ (column_range (12, 42), lc.m_corrections[0]->m_printed_columns);
+  }
+  /* Example where two are consolidated during printing.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_replace (open_paren, "CAST (");
+    richloc.add_fixit_replace (close_paren, ") (");
+    richloc.add_fixit_insert_after (")");
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            -\n"
+		  "            CAST (-\n"
+		  "                  ) (         )\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Example where none are consolidated during printing.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_replace (open_paren, "CST (");
+    richloc.add_fixit_replace (close_paren, ") (");
+    richloc.add_fixit_insert_after (")");
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            -\n"
+		  "            CST ( -\n"
+		  "                  ) (         )\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Example of deletion fix-it hints.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_insert_before (open_paren, "(bar\xf0\x9f\x98\x82 *)");
+    source_range victim = {open_paren, close_paren};
+    richloc.add_fixit_remove (victim);
+    /* This case is actually handled by fixit-consolidation,
+       rather than by line_corrections.  */
+    ASSERT_EQ (1, richloc.get_num_fixit_hints ());
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            -------\n"
+		  "            (bar\xf0\x9f\x98\x82"
+				    " *)\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Example of deletion fix-it hints that would overlap.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_insert_before (open_paren, "(long\xf0\x9f\x98\x82 *)");
+    source_range victim = {expr_start, expr_finish};
+    richloc.add_fixit_remove (victim);
+    /* These fixits are not consolidated.  */
+    ASSERT_EQ (2, richloc.get_num_fixit_hints ());
+    /* But the corrections are.  */
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            ------------------\n"
+		  "            (long\xf0\x9f\x98\x82"
+				     " *)(f\xf0\x9f\x98\x82"
+					    " *)\n",
+		  pp_formatted_text (dc.printer));
+  }
+  /* Example of insertion fix-it hints that would overlap.  */
+  {
+    test_diagnostic_context dc;
+    rich_location richloc (line_table, expr);
+    richloc.add_fixit_insert_before
+      (open_paren, "L\xf0\x9f\x98\x82NGER THAN THE CAST");
+    richloc.add_fixit_insert_after (close_paren, "TEST");
+    /* The first insertion is long enough that if printed naively,
+       it would overlap with the second.
+       Verify that they are printed as a single replacement.  */
+    diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+    ASSERT_STREQ ("\n"
+		  "   f\xf0\x9f\x98\x82"
+			" *f = (f\xf0\x9f\x98\x82"
+				  " *)ptr->field\xcf\x80"
+						";\n"
+		  "                   ^~~~~~~~~~~\n"
+		  "            -------\n"
+		  "            L\xf0\x9f\x98\x82"
+				 "NGER THAN THE CAST(f\xf0\x9f\x98\x82"
+						       " *)TEST\n",
+		  pp_formatted_text (dc.printer));
+  }
+}
 /* Verify that the line_corrections machinery correctly prints
   overlapping fixit-hints that have been added in the wrong
   order.
@@ -3526,10 +4739,10 @@ test_overlapped_fixit_printing_2 (const line_table_case &case_)
    /* These fixits should be accepted; they can't be consolidated.  */
    ASSERT_EQ (2, richloc.get_num_fixit_hints ());
    const fixit_hint *hint_0 = richloc.get_fixit_hint (0);
-    ASSERT_EQ (column_range (23, 22), get_affected_columns (hint_0));
+    ASSERT_EQ (column_range (23, 22), get_affected_range (hint_0, CU_BYTES));
    ASSERT_EQ (column_range (23, 23), get_printed_columns (hint_0));
    const fixit_hint *hint_1 = richloc.get_fixit_hint (1);
-    ASSERT_EQ (column_range (21, 20), get_affected_columns (hint_1));
+    ASSERT_EQ (column_range (21, 20), get_affected_range (hint_1, CU_BYTES));
    ASSERT_EQ (column_range (21, 21), get_printed_columns (hint_1));
    /* Verify that they're printed correctly.  */
@@ -3851,15 +5064,19 @@ diagnostic_show_locus_c_tests ()
  test_layout_range_for_single_line ();
  test_layout_range_for_multiple_lines ();
-  test_get_line_width_without_trailing_whitespace ();
+  for_each_line_table_case (test_layout_x_offset_display_utf8);
+  test_get_line_bytes_without_trailing_whitespace ();
  test_diagnostic_show_locus_unknown_location ();
  for_each_line_table_case (test_diagnostic_show_locus_one_liner);
+  for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8);
  for_each_line_table_case (test_add_location_if_nearby);
  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines);
  for_each_line_table_case (test_fixit_consolidation);
  for_each_line_table_case (test_overlapped_fixit_printing);
+  for_each_line_table_case (test_overlapped_fixit_printing_utf8);
  for_each_line_table_case (test_overlapped_fixit_printing_2);
  for_each_line_table_case (test_fixit_insert_containing_newline);
  for_each_line_table_case (test_fixit_insert_containing_newline_2);

--- a/gcc/input.c
+++ b/gcc/input.c
@@ -908,6 +908,22 @@ make_location (location_t caret, source_range src_range)
  return COMBINE_LOCATION_DATA (line_table, pure_loc, src_range, NULL);
 }
+/* An expanded_location stores the column in byte units.  This function
+   converts that column to display units.  That requires reading the associated
+   source line in order to calculate the display width.  If that cannot be done
+   for any reason, then returns the byte column as a fallback.  */
+int
+location_compute_display_column (expanded_location exploc)
+{
+  if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
+    return exploc.column;
+  char_span line = location_get_source_line (exploc.file, exploc.line);
+  /* If line is NULL, this function returns exploc.column which is the
+     desired fallback.  */
+  return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
+					    exploc.column);
+}
 /* Dump statistics to stderr about the memory usage of the line_table
   set of line maps.  This also displays some statistics about macro
   expansion.  */
@@ -3590,6 +3606,93 @@ test_line_offset_overflow ()
  ASSERT_NE (ordmap_a, ordmap_b);
 }
+void test_cpp_utf8 ()
+{
+  /* Verify that wcwidth of invalid UTF-8 or control bytes is 1.  */
+  {
+    int w_bad = cpp_display_width ("\xf0!\x9f!\x98!\x82!", 8);
+    ASSERT_EQ (8, w_bad);
+    int w_ctrl = cpp_display_width ("\r\t\n\v\0\1", 6);
+    ASSERT_EQ (6, w_ctrl);
+  }
+  /* Verify that wcwidth of valid UTF-8 is as expected.  */
+  {
+    const int w_pi = cpp_display_width ("\xcf\x80", 2);
+    ASSERT_EQ (1, w_pi);
+    const int w_emoji = cpp_display_width ("\xf0\x9f\x98\x82", 4);
+    ASSERT_EQ (2, w_emoji);
+    const int w_umlaut_precomposed = cpp_display_width ("\xc3\xbf", 2);
+    ASSERT_EQ (1, w_umlaut_precomposed);
+    const int w_umlaut_combining = cpp_display_width ("y\xcc\x88", 3);
+    ASSERT_EQ (1, w_umlaut_combining);
+    const int w_han = cpp_display_width ("\xe4\xb8\xba", 3);
+    ASSERT_EQ (2, w_han);
+    const int w_ascii = cpp_display_width ("GCC", 3);
+    ASSERT_EQ (3, w_ascii);
+    const int w_mixed = cpp_display_width ("\xcf\x80 = 3.14 \xf0\x9f\x98\x82"
+					   "\x9f! \xe4\xb8\xba y\xcc\x88", 24);
+    ASSERT_EQ (18, w_mixed);
+  }
+  /* Verify that cpp_byte_column_to_display_column can go past the end,
+     and similar edge cases.  */
+  {
+    const char *str
+      /* Display columns.
+         111111112345  */
+      = "\xcf\x80 abc";
+      /* 111122223456
+	 Byte columns.  */
+    ASSERT_EQ (5, cpp_display_width (str, 6));
+    ASSERT_EQ (105, cpp_byte_column_to_display_column (str, 6, 106));
+    ASSERT_EQ (10000, cpp_byte_column_to_display_column (NULL, 0, 10000));
+    ASSERT_EQ (0, cpp_byte_column_to_display_column (NULL, 10000, 0));
+  }
+  /* Verify that cpp_display_column_to_byte_column can go past the end,
+     and similar edge cases, and check invertibility.  */
+  {
+    const char *str
+      /* Display columns.
+	 000000000000000000000000000000000000011
+	 111111112222222234444444455555555678901  */
+      = "\xf0\x9f\x98\x82 \xf0\x9f\x98\x82 hello";
+      /* 000000000000000000000000000000000111111
+	 111122223333444456666777788889999012345
+	 Byte columns.  */
+    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 2));
+    ASSERT_EQ (15, cpp_display_column_to_byte_column (str, 15, 11));
+    ASSERT_EQ (115, cpp_display_column_to_byte_column (str, 15, 111));
+    ASSERT_EQ (10000, cpp_display_column_to_byte_column (NULL, 0, 10000));
+    ASSERT_EQ (0, cpp_display_column_to_byte_column (NULL, 10000, 0));
+    /* Verify that we do not interrupt a UTF-8 sequence.  */
+    ASSERT_EQ (4, cpp_display_column_to_byte_column (str, 15, 1));
+    for (int byte_col = 1; byte_col <= 15; ++byte_col)
+      {
+	const int disp_col = cpp_byte_column_to_display_column (str, 15,
+								byte_col);
+	const int byte_col2 = cpp_display_column_to_byte_column (str, 15,
+								 disp_col);
+	/* If we ask for the display column in the middle of a UTF-8
+	   sequence, it will return the length of the partial sequence,
+	   matching the behavior of GCC before display column support.
+	   Otherwise check the round trip was successful.  */
+	if (byte_col < 4)
+	  ASSERT_EQ (byte_col, disp_col);
+	else if (byte_col >= 6 && byte_col < 9)
+	  ASSERT_EQ (3 + (byte_col - 5), disp_col);
+	else
+	  ASSERT_EQ (byte_col2, byte_col);
+      }
+  }
+}
 /* Run all of the selftests within this file.  */
 void
@@ -3631,6 +3734,8 @@ input_c_tests ()
  test_reading_source_line ();
  test_line_offset_overflow ();
+  test_cpp_utf8 ();
 }
 } // namespace selftest

--- a/gcc/input.h
+++ b/gcc/input.h
@@ -38,6 +38,7 @@ STATIC_ASSERT (BUILTINS_LOCATION < RESERVED_LOCATION_COUNT);
 extern bool is_location_from_builtin_token (location_t);
 extern expanded_location expand_location (location_t);
+extern int location_compute_display_column (expanded_location);
 /* A class capturing the bounds of a buffer, to allow for run-time
   bounds-checking in a checked build.  */

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+	PR preprocessor/49973
+	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+	(test_show_locus): Tweak so that expected output is the same as
+	before the diagnostic-show-locus.c changes.
+	* gcc.dg/cpp/pr66415-1.c: Likewise.
 2019-12-09  Eric Botcazou  <ebotcazou@adacore.com>
 	* gnat.dg/lto23.adb: New test.

--- a/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr66415-1.c
 /* PR c/66415 */
 /* { dg-do compile } */
 /* { dg-options "-Wformat -fdiagnostics-show-caret" } */
-/* { dg-set-compiler-env-var COLUMNS "82" } */
+/* { dg-set-compiler-env-var COLUMNS "83" } */
 void
 fn1 (void)

--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
@@ -174,7 +174,7 @@ test_show_locus (function *fun)
  /* Hardcode the "terminal width", to verify the behavior of
     very wide lines.  */
-  global_dc->caret_max_width = 70;
+  global_dc->caret_max_width = 71;
  if (0 == strcmp (fnname, "test_simple"))
    {

--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
+2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>
+	PR preprocessor/49973
+	* generated_cpp_wcwidth.h: New file generated by
+	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
+	* charset.c (compute_next_display_width): New function to help
+	implement display columns.
+	(cpp_byte_column_to_display_column): Likewise.
+	(cpp_display_column_to_byte_column): Likewise.
+	(cpp_wcwidth): Likewise.
+	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
+	(cpp_display_column_to_byte_column): Declare.
+	(cpp_wcwidth): Declare.
+	(cpp_display_width): New function.
 2019-11-14  Joseph Myers  <joseph@codesourcery.com>
 	* charset.c (narrow_str_to_charconst): Make CPP_UTF8CHAR constants

--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -2265,3 +2265,106 @@ cpp_string_location_reader::get_next ()
    m_loc += m_offset_per_column;
  return result;
 }
+/* Helper for cpp_byte_column_to_display_column and its inverse.  Given a
+   pointer to a UTF-8-encoded character, compute its display width.  *INBUFP
+   points on entry to the start of the UTF-8 encoding of the character, and
+   is updated to point just after the last byte of the encoding.  *INBYTESLEFTP
+   contains on entry the remaining size of the buffer into which *INBUFP
+   points, and this is also updated accordingly.  If *INBUFP does not
+   point to a valid UTF-8-encoded sequence, then it will be treated as a single
+   byte with display width 1.  */
+static inline int
+compute_next_display_width (const uchar **inbufp, size_t *inbytesleftp)
+{
+  cppchar_t c;
+  if (one_utf8_to_cppchar (inbufp, inbytesleftp, &c) != 0)
+    {
+      /* Input is not convertible to UTF-8.  This could be fine, e.g. in a
+	 string literal, so don't complain.  Just treat it as if it has a width
+	 of one.  */
+      ++*inbufp;
+      --*inbytesleftp;
+      return 1;
+    }
+  /*  one_utf8_to_cppchar() has updated inbufp and inbytesleftp for us.  */
+  return cpp_wcwidth (c);
+}
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    how many display columns are occupied by the first COLUMN bytes.  COLUMN
+    may exceed DATA_LENGTH, in which case the phantom bytes at the end are
+    treated as if they have display width 1.  */
+int
+cpp_byte_column_to_display_column (const char *data, int data_length,
+				   int column)
+{
+  int display_col = 0;
+  const uchar *udata = (const uchar *) data;
+  const int offset = MAX (0, column - data_length);
+  size_t inbytesleft = column - offset;
+  while (inbytesleft)
+    display_col += compute_next_display_width (&udata, &inbytesleft);
+  return display_col + offset;
+}
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    the least number of bytes that will result in at least DISPLAY_COL display
+    columns.  The return value may exceed DATA_LENGTH if the entire string does
+    not occupy enough display columns.  */
+int
+cpp_display_column_to_byte_column (const char *data, int data_length,
+				   int display_col)
+{
+  int column = 0;
+  const uchar *udata = (const uchar *) data;
+  size_t inbytesleft = data_length;
+  while (column < display_col && inbytesleft)
+      column += compute_next_display_width (&udata, &inbytesleft);
+  return data_length - inbytesleft + MAX (0, display_col - column);
+}
+/* Our own version of wcwidth().  We don't use the actual wcwidth() in glibc,
+   because that will inspect the user's locale, and in particular in an ASCII
+   locale, it will not return anything useful for extended characters.  But GCC
+   in other respects (see e.g. _cpp_default_encoding()) behaves as if
+   everything is UTF-8.  We also make some tweaks that are useful for the way
+   GCC needs to use this data, e.g. tabs and other control characters should be
+   treated as having width 1.  The lookup tables are generated from
+   contrib/unicode/gen_wcwidth.py and were made by simply calling glibc
+   wcwidth() on all codepoints, then applying the small tweaks.  These tables
+   are not highly optimized, but for the present purpose of outputting
+   diagnostics, they are sufficient.  */
+#include "generated_cpp_wcwidth.h"
+int cpp_wcwidth (cppchar_t c)
+{
+  if (__builtin_expect (c <= wcwidth_range_ends[0], true))
+    return wcwidth_widths[0];
+  /* Binary search the tables.  */
+  int begin = 1;
+  static const int end
+      = sizeof wcwidth_range_ends / sizeof (*wcwidth_range_ends);
+  int len = end - begin;
+  do
+    {
+      int half = len/2;
+      int middle = begin + half;
+      if (c > wcwidth_range_ends[middle])
+	{
+	  begin = middle + 1;
+	  len -= half + 1;
+	}
+      else
+	len = half;
+    } while (len);
+  if (__builtin_expect (begin != end, true))
+    return wcwidth_widths[begin];
+  return 1;
+}
--- a/libcpp/generated_cpp_wcwidth.h
+++ b/libcpp/generated_cpp_wcwidth.h
+/*  Generated by contrib/unicode/gen_wcwidth.py, with the help of glibc's
+    utf8_gen.py, using version 12.1.0 of the Unicode standard.  */
+static const cppchar_t wcwidth_range_ends[] = {
+  0x2ff, 0x36f, 0x482, 0x489, 0x590, 0x5bd, 0x5be, 0x5bf,
+  0x5c0, 0x5c2, 0x5c3, 0x5c5, 0x5c6, 0x5c7, 0x60f, 0x61a,
+  0x61b, 0x61c, 0x64a, 0x65f, 0x66f, 0x670, 0x6d5, 0x6dc,
+  0x6de, 0x6e4, 0x6e6, 0x6e8, 0x6e9, 0x6ed, 0x710, 0x711,
+  0x72f, 0x74a, 0x7a5, 0x7b0, 0x7ea, 0x7f3, 0x7fc, 0x7fd,
+  0x815, 0x819, 0x81a, 0x823, 0x824, 0x827, 0x828, 0x82d,
+  0x858, 0x85b, 0x8d2, 0x8e1, 0x8e2, 0x902, 0x939, 0x93a,
+  0x93b, 0x93c, 0x940, 0x948, 0x94c, 0x94d, 0x950, 0x957,
+  0x961, 0x963, 0x980, 0x981, 0x9bb, 0x9bc, 0x9c0, 0x9c4,
+  0x9cc, 0x9cd, 0x9e1, 0x9e3, 0x9fd, 0x9fe, 0xa00, 0xa02,
+  0xa3b, 0xa3c, 0xa40, 0xa42, 0xa46, 0xa48, 0xa4a, 0xa4d,
+  0xa50, 0xa51, 0xa6f, 0xa71, 0xa74, 0xa75, 0xa80, 0xa82,
+  0xabb, 0xabc, 0xac0, 0xac5, 0xac6, 0xac8, 0xacc, 0xacd,
+  0xae1, 0xae3, 0xaf9, 0xaff, 0xb00, 0xb01, 0xb3b, 0xb3c,
+  0xb3e, 0xb3f, 0xb40, 0xb44, 0xb4c, 0xb4d, 0xb55, 0xb56,
+  0xb61, 0xb63, 0xb81, 0xb82, 0xbbf, 0xbc0, 0xbcc, 0xbcd,
+  0xbff, 0xc00, 0xc03, 0xc04, 0xc3d, 0xc40, 0xc45, 0xc48,
+  0xc49, 0xc4d, 0xc54, 0xc56, 0xc61, 0xc63, 0xc80, 0xc81,
+  0xcbb, 0xcbc, 0xcbe, 0xcbf, 0xcc5, 0xcc6, 0xccb, 0xccd,
+  0xce1, 0xce3, 0xcff, 0xd01, 0xd3a, 0xd3c, 0xd40, 0xd44,
+  0xd4c, 0xd4d, 0xd61, 0xd63, 0xdc9, 0xdca, 0xdd1, 0xdd4,
+  0xdd5, 0xdd6, 0xe30, 0xe31, 0xe33, 0xe3a, 0xe46, 0xe4e,
+  0xeb0, 0xeb1, 0xeb3, 0xebc, 0xec7, 0xecd, 0xf17, 0xf19,
+  0xf34, 0xf35, 0xf36, 0xf37, 0xf38, 0xf39, 0xf70, 0xf7e,
+  0xf7f, 0xf84, 0xf85, 0xf87, 0xf8c, 0xf97, 0xf98, 0xfbc,
+  0xfc5, 0xfc6, 0x102c, 0x1030, 0x1031, 0x1037, 0x1038, 0x103a,
+  0x103c, 0x103e, 0x1057, 0x1059, 0x105d, 0x1060, 0x1070, 0x1074,
+  0x1081, 0x1082, 0x1084, 0x1086, 0x108c, 0x108d, 0x109c, 0x109d,
+  0x10ff, 0x115f, 0x11ff, 0x135c, 0x135f, 0x1711, 0x1714, 0x1731,
+  0x1734, 0x1751, 0x1753, 0x1771, 0x1773, 0x17b3, 0x17b5, 0x17b6,
+  0x17bd, 0x17c5, 0x17c6, 0x17c8, 0x17d3, 0x17dc, 0x17dd, 0x180a,
+  0x180e, 0x1884, 0x1886, 0x18a8, 0x18a9, 0x191f, 0x1922, 0x1926,
+  0x1928, 0x1931, 0x1932, 0x1938, 0x193b, 0x1a16, 0x1a18, 0x1a1a,
+  0x1a1b, 0x1a55, 0x1a56, 0x1a57, 0x1a5e, 0x1a5f, 0x1a60, 0x1a61,
+  0x1a62, 0x1a64, 0x1a6c, 0x1a72, 0x1a7c, 0x1a7e, 0x1a7f, 0x1aaf,
+  0x1abe, 0x1aff, 0x1b03, 0x1b33, 0x1b34, 0x1b35, 0x1b3a, 0x1b3b,
+  0x1b3c, 0x1b41, 0x1b42, 0x1b6a, 0x1b73, 0x1b7f, 0x1b81, 0x1ba1,
+  0x1ba5, 0x1ba7, 0x1ba9, 0x1baa, 0x1bad, 0x1be5, 0x1be6, 0x1be7,
+  0x1be9, 0x1bec, 0x1bed, 0x1bee, 0x1bf1, 0x1c2b, 0x1c33, 0x1c35,
+  0x1c37, 0x1ccf, 0x1cd2, 0x1cd3, 0x1ce0, 0x1ce1, 0x1ce8, 0x1cec,
+  0x1ced, 0x1cf3, 0x1cf4, 0x1cf7, 0x1cf9, 0x1dbf, 0x1df9, 0x1dfa,
+  0x1dff, 0x200a, 0x200f, 0x2029, 0x202e, 0x205f, 0x2064, 0x2065,
+  0x206f, 0x20cf, 0x20f0, 0x2319, 0x231b, 0x2328, 0x232a, 0x23e8,
+  0x23ec, 0x23ef, 0x23f0, 0x23f2, 0x23f3, 0x25fc, 0x25fe, 0x2613,
+  0x2615, 0x2647, 0x2653, 0x267e, 0x267f, 0x2692, 0x2693, 0x26a0,
+  0x26a1, 0x26a9, 0x26ab, 0x26bc, 0x26be, 0x26c3, 0x26c5, 0x26cd,
+  0x26ce, 0x26d3, 0x26d4, 0x26e9, 0x26ea, 0x26f1, 0x26f3, 0x26f4,
+  0x26f5, 0x26f9, 0x26fa, 0x26fc, 0x26fd, 0x2704, 0x2705, 0x2709,
+  0x270b, 0x2727, 0x2728, 0x274b, 0x274c, 0x274d, 0x274e, 0x2752,
+  0x2755, 0x2756, 0x2757, 0x2794, 0x2797, 0x27af, 0x27b0, 0x27be,
+  0x27bf, 0x2b1a, 0x2b1c, 0x2b4f, 0x2b50, 0x2b54, 0x2b55, 0x2cee,
+  0x2cf1, 0x2d7e, 0x2d7f, 0x2ddf, 0x2dff, 0x2e7f, 0x2e99, 0x2e9a,
+  0x2ef3, 0x2eff, 0x2fd5, 0x2fef, 0x2ffb, 0x2fff, 0x3029, 0x302d,
+  0x303e, 0x3040, 0x3096, 0x3098, 0x309a, 0x30ff, 0x3104, 0x312f,
+  0x3130, 0x318e, 0x318f, 0x31ba, 0x31bf, 0x31e3, 0x31ef, 0x321e,
+  0x321f, 0x4db5, 0x4dbf, 0x9fef, 0x9fff, 0xa48c, 0xa48f, 0xa4c6,
+  0xa66e, 0xa672, 0xa673, 0xa67d, 0xa69d, 0xa69f, 0xa6ef, 0xa6f1,
+  0xa801, 0xa802, 0xa805, 0xa806, 0xa80a, 0xa80b, 0xa824, 0xa826,
+  0xa8c3, 0xa8c5, 0xa8df, 0xa8f1, 0xa8fe, 0xa8ff, 0xa925, 0xa92d,
+  0xa946, 0xa951, 0xa95f, 0xa97c, 0xa97f, 0xa982, 0xa9b2, 0xa9b3,
+  0xa9b5, 0xa9b9, 0xa9bb, 0xa9bd, 0xa9e4, 0xa9e5, 0xaa28, 0xaa2e,
+  0xaa30, 0xaa32, 0xaa34, 0xaa36, 0xaa42, 0xaa43, 0xaa4b, 0xaa4c,
+  0xaa7b, 0xaa7c, 0xaaaf, 0xaab0, 0xaab1, 0xaab4, 0xaab6, 0xaab8,
+  0xaabd, 0xaabf, 0xaac0, 0xaac1, 0xaaeb, 0xaaed, 0xaaf5, 0xaaf6,
+  0xabe4, 0xabe5, 0xabe7, 0xabe8, 0xabec, 0xabed, 0xabff, 0xd7a3,
+  0xf8ff, 0xfa6d, 0xfa6f, 0xfad9, 0xfb1d, 0xfb1e, 0xfdff, 0xfe0f,
+  0xfe19, 0xfe1f, 0xfe2f, 0xfe52, 0xfe53, 0xfe66, 0xfe67, 0xfe6b,
+  0xfefe, 0xfeff, 0xff00, 0xff60, 0xffdf, 0xffe6, 0xfff8, 0xfffb,
+  0x101fc, 0x101fd, 0x102df, 0x102e0, 0x10375, 0x1037a, 0x10a00, 0x10a03,
+  0x10a04, 0x10a06, 0x10a0b, 0x10a0f, 0x10a37, 0x10a3a, 0x10a3e, 0x10a3f,
+  0x10ae4, 0x10ae6, 0x10d23, 0x10d27, 0x10f45, 0x10f50, 0x11000, 0x11001,
+  0x11037, 0x11046, 0x1107e, 0x11081, 0x110b2, 0x110b6, 0x110b8, 0x110ba,
+  0x110ff, 0x11102, 0x11126, 0x1112b, 0x1112c, 0x11134, 0x11172, 0x11173,
+  0x1117f, 0x11181, 0x111b5, 0x111be, 0x111c8, 0x111cc, 0x1122e, 0x11231,
+  0x11233, 0x11234, 0x11235, 0x11237, 0x1123d, 0x1123e, 0x112de, 0x112df,
+  0x112e2, 0x112ea, 0x112ff, 0x11301, 0x1133a, 0x1133c, 0x1133f, 0x11340,
+  0x11365, 0x1136c, 0x1136f, 0x11374, 0x11437, 0x1143f, 0x11441, 0x11444,
+  0x11445, 0x11446, 0x1145d, 0x1145e, 0x114b2, 0x114b8, 0x114b9, 0x114ba,
+  0x114be, 0x114c0, 0x114c1, 0x114c3, 0x115b1, 0x115b5, 0x115bb, 0x115bd,
+  0x115be, 0x115c0, 0x115db, 0x115dd, 0x11632, 0x1163a, 0x1163c, 0x1163d,
+  0x1163e, 0x11640, 0x116aa, 0x116ab, 0x116ac, 0x116ad, 0x116af, 0x116b5,
+  0x116b6, 0x116b7, 0x1171c, 0x1171f, 0x11721, 0x11725, 0x11726, 0x1172b,
+  0x1182e, 0x11837, 0x11838, 0x1183a, 0x119d3, 0x119d7, 0x119d9, 0x119db,
+  0x119df, 0x119e0, 0x11a00, 0x11a0a, 0x11a32, 0x11a38, 0x11a3a, 0x11a3e,
+  0x11a46, 0x11a47, 0x11a50, 0x11a56, 0x11a58, 0x11a5b, 0x11a89, 0x11a96,
+  0x11a97, 0x11a99, 0x11c2f, 0x11c36, 0x11c37, 0x11c3d, 0x11c3e, 0x11c3f,
+  0x11c91, 0x11ca7, 0x11ca9, 0x11cb0, 0x11cb1, 0x11cb3, 0x11cb4, 0x11cb6,
+  0x11d30, 0x11d36, 0x11d39, 0x11d3a, 0x11d3b, 0x11d3d, 0x11d3e, 0x11d45,
+  0x11d46, 0x11d47, 0x11d8f, 0x11d91, 0x11d94, 0x11d95, 0x11d96, 0x11d97,
+  0x11ef2, 0x11ef4, 0x1342f, 0x13438, 0x16aef, 0x16af4, 0x16b2f, 0x16b36,
+  0x16f4e, 0x16f4f, 0x16f8e, 0x16f92, 0x16fdf, 0x16fe3, 0x16fff, 0x187f7,
+  0x187ff, 0x18af2, 0x1afff, 0x1b11e, 0x1b14f, 0x1b152, 0x1b163, 0x1b167,
+  0x1b16f, 0x1b2fb, 0x1bc9c, 0x1bc9e, 0x1bc9f, 0x1bca3, 0x1d166, 0x1d169,
+  0x1d172, 0x1d182, 0x1d184, 0x1d18b, 0x1d1a9, 0x1d1ad, 0x1d241, 0x1d244,
+  0x1d9ff, 0x1da36, 0x1da3a, 0x1da6c, 0x1da74, 0x1da75, 0x1da83, 0x1da84,
+  0x1da9a, 0x1da9f, 0x1daa0, 0x1daaf, 0x1dfff, 0x1e006, 0x1e007, 0x1e018,
+  0x1e01a, 0x1e021, 0x1e022, 0x1e024, 0x1e025, 0x1e02a, 0x1e12f, 0x1e136,
+  0x1e2eb, 0x1e2ef, 0x1e8cf, 0x1e8d6, 0x1e943, 0x1e94a, 0x1f003, 0x1f004,
+  0x1f0ce, 0x1f0cf, 0x1f18d, 0x1f18e, 0x1f190, 0x1f19a, 0x1f1ff, 0x1f202,
+  0x1f20f, 0x1f23b, 0x1f23f, 0x1f248, 0x1f24f, 0x1f251, 0x1f25f, 0x1f265,
+  0x1f2ff, 0x1f320, 0x1f32c, 0x1f335, 0x1f336, 0x1f37c, 0x1f37d, 0x1f393,
+  0x1f39f, 0x1f3ca, 0x1f3ce, 0x1f3d3, 0x1f3df, 0x1f3f0, 0x1f3f3, 0x1f3f4,
+  0x1f3f7, 0x1f43e, 0x1f43f, 0x1f440, 0x1f441, 0x1f4fc, 0x1f4fe, 0x1f53d,
+  0x1f54a, 0x1f54e, 0x1f54f, 0x1f567, 0x1f579, 0x1f57a, 0x1f594, 0x1f596,
+  0x1f5a3, 0x1f5a4, 0x1f5fa, 0x1f64f, 0x1f67f, 0x1f6c5, 0x1f6cb, 0x1f6cc,
+  0x1f6cf, 0x1f6d2, 0x1f6d4, 0x1f6d5, 0x1f6ea, 0x1f6ec, 0x1f6f3, 0x1f6fa,
+  0x1f7df, 0x1f7eb, 0x1f90c, 0x1f971, 0x1f972, 0x1f976, 0x1f979, 0x1f9a2,
+  0x1f9a4, 0x1f9aa, 0x1f9ad, 0x1f9ca, 0x1f9cc, 0x1f9ff, 0x1fa6f, 0x1fa73,
+  0x1fa77, 0x1fa7a, 0x1fa7f, 0x1fa82, 0x1fa8f, 0x1fa95, 0x1ffff, 0x2a6d6,
+  0x2a6ff, 0x2b734, 0x2b73f, 0x2b81d, 0x2b81f, 0x2cea1, 0x2ceaf, 0x2ebe0,
+  0x2f7ff, 0x2fa1d, 0xe0000, 0xe0001, 0xe001f, 0xe007f, 0xe00ff, 0xe01ef,
+};
+static const unsigned char wcwidth_widths[] = {
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+  0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+  0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+  0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+  0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+  0, 1, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
+  2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
+  2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
+  0, 1, 0, 1, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 0, 2, 1, 2, 1, 0, 2, 1, 2,
+  1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 1, 2, 1, 2, 1, 0, 1, 0,
+  2, 1, 0, 2, 1, 2, 1, 2, 1, 0, 1, 2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
+  1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
+  1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
+  1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
+  1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
+  1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 0, 1, 0,
+};
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -1320,4 +1320,15 @@ extern bool cpp_userdef_char_p
 extern const char * cpp_get_userdef_suffix
  (const cpp_token *);
+/* In charset.c */
+int cpp_byte_column_to_display_column (const char *data, int data_length,
+				       int column);
+inline int cpp_display_width (const char *data, int data_length)
+{
+    return cpp_byte_column_to_display_column (data, data_length, data_length);
+}
+int cpp_display_column_to_byte_column (const char *data, int data_length,
+				       int display_col);
+int cpp_wcwidth (cppchar_t c);
 #endif /* ! LIBCPP_CPPLIB_H */