Commit 28514dda by Yufeng Zhang

[AArch64, ILP32] 2/6 More backend changes and support for small absolute and small PIC addressing models

gcc/

	* config/aarch64/aarch64.c (POINTER_BYTES): New define.
	(aarch64_load_symref_appropriately): In the case of
	SYMBOL_SMALL_ABSOLUTE, use the mode of 'dest' instead of Pmode
	to generate the new rtx; likewise for the SYMBOL_SMALL_GOT case.
	(aarch64_expand_mov_immediate): In the case of SYMBOL_FORCE_TO_MEM,
	change to pass 'ptr_mode' to force_const_mem and zero-extend 'mem'
	if 'mode' is not 'ptr_mode'.
	(aarch64_output_mi_thunk): Add an assertion on the alignment of
	'vcall_offset'; change to call aarch64_emit_move differently depending
	on whether 'Pmode' equals 'ptr_mode'; use 'POINTER_BYTES'
	to calculate the upper bound of 'vcall_offset'.
	(aarch64_cannot_force_const_mem): Change to also return true if
	mode != ptr_mode.
	(aarch64_legitimize_reload_address): In the case of large
	displacements, add new local variable 'xmode' and an assertion
	based on it; change to use 'xmode' to generate the new rtx and
	reload.
	(aarch64_asm_trampoline_template): Change to generate the template
	differently depending on whether TARGET_ILP32 is set; change to use
	'POINTER_BYTES' in the argument passed to assemble_aligned_integer.
	(aarch64_trampoline_size): Remove.
	(aarch64_trampoline_init): Add new local constant 'tramp_code_sz'
	and replace immediate literals with it.  Change to use 'ptr_mode'
	instead of 'DImode' and call convert_memory_address if the mode
	of 'fnaddr' is not 'ptr_mode'.
	(aarch64_elf_asm_constructor): Change to use assemble_aligned_integer
	to output symbol.
	(aarch64_elf_asm_destructor): Likewise.
	* config/aarch64/aarch64.h (TRAMPOLINE_SIZE): Change to be dependent
	on TARGET_ILP32 instead of aarch64_trampoline_size.
	* config/aarch64/aarch64.md (movsi_aarch64): Add new alternatives
	of 'mov' between WSP and W registers as well as 'adr' and 'adrp'.
	(loadwb_pair<GPI:mode>_<PTR:mode>): Rename to ...
	(loadwb_pair<GPI:mode>_<P:mode>): ... this.  Replace PTR with P.
	(storewb_pair<GPI:mode>_<PTR:mode>): Likewise; rename to ...
	(storewb_pair<GPI:mode>_<P:mode>): ... this.
	(add_losym): Change to 'define_expand' and call gen_add_losym_<mode>
	depending on the value of 'mode'.
	(add_losym_<mode>): New.
	(ldr_got_small_<mode>): New, based on ldr_got_small.
	(ldr_got_small): Remove.
	(ldr_got_small_sidi): New.
	* config/aarch64/iterators.md (P): New.
	(PTR): Change to 'ptr_mode' in the condition.

From-SVN: r201165
Parent: 17a819cb
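
For background: on AArch64, Pmode is the mode of a hardware address while ptr_mode is the mode of a C pointer. Under LP64 the two coincide, so the port could hard-code DImode; under ILP32 they differ, and that distinction is what the changes below thread through the backend. An illustrative summary (assuming the standard LP64/ILP32 definitions, not part of the original log):

    /* Illustrative only.
       ABI     Pmode    ptr_mode   POINTER_SIZE   POINTER_BYTES
       LP64    DImode   DImode     64             8
       ILP32   DImode   SImode     32             4  */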
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -48,6 +48,9 @@
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 
+/* Defined for convenience.  */
+#define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
+
 /* Classifies an address.
 
    ADDRESS_REG_IMM
...@@ -543,13 +546,16 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, ...@@ -543,13 +546,16 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
{ {
case SYMBOL_SMALL_ABSOLUTE: case SYMBOL_SMALL_ABSOLUTE:
{ {
/* In ILP32, the mode of dest can be either SImode or DImode. */
rtx tmp_reg = dest; rtx tmp_reg = dest;
enum machine_mode mode = GET_MODE (dest);
gcc_assert (mode == Pmode || mode == ptr_mode);
if (can_create_pseudo_p ()) if (can_create_pseudo_p ())
{ tmp_reg = gen_reg_rtx (mode);
tmp_reg = gen_reg_rtx (Pmode);
}
emit_move_insn (tmp_reg, gen_rtx_HIGH (Pmode, imm)); emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
emit_insn (gen_add_losym (dest, tmp_reg, imm)); emit_insn (gen_add_losym (dest, tmp_reg, imm));
return; return;
} }
@@ -560,11 +566,33 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
     case SYMBOL_SMALL_GOT:
       {
+	/* In ILP32, the mode of dest can be either SImode or DImode,
+	   while the got entry is always of SImode size.  The mode of
+	   dest depends on how dest is used: if dest is assigned to a
+	   pointer (e.g. in the memory), it has SImode; it may have
+	   DImode if dest is dereferenced to access the memory.
+	   This is why we have to handle three different ldr_got_small
+	   patterns here (two patterns for ILP32).  */
 	rtx tmp_reg = dest;
+	enum machine_mode mode = GET_MODE (dest);
+
 	if (can_create_pseudo_p ())
-	  tmp_reg = gen_reg_rtx (Pmode);
-	emit_move_insn (tmp_reg, gen_rtx_HIGH (Pmode, imm));
-	emit_insn (gen_ldr_got_small (dest, tmp_reg, imm));
+	  tmp_reg = gen_reg_rtx (mode);
+
+	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
+	if (mode == ptr_mode)
+	  {
+	    if (mode == DImode)
+	      emit_insn (gen_ldr_got_small_di (dest, tmp_reg, imm));
+	    else
+	      emit_insn (gen_ldr_got_small_si (dest, tmp_reg, imm));
+	  }
+	else
+	  {
+	    gcc_assert (mode == Pmode);
+	    emit_insn (gen_ldr_got_small_sidi (dest, tmp_reg, imm));
+	  }
+
 	return;
       }
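
The new comment is the crux of the ILP32 GOT handling: the GOT entry is always 32 bits wide, but the loaded value may be wanted either as a 32-bit pointer value (SImode) or as a 64-bit base address (DImode). A hypothetical C fragment showing the two uses (function names invented for illustration):

    extern int x;

    int *take_address (void) { return &x; }  /* value kept as a 32-bit pointer:
						SImode dest, ldr_got_small_si  */
    int  load_value (void)   { return x; }   /* address feeds a memory access:
						DImode base register, so the GOT
						value is zero-extended by
						ldr_got_small_sidi  */

In LP64, both cases go through ldr_got_small_di.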
@@ -885,8 +913,10 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  aarch64_emit_move (dest, base);
 	  return;
 	}
-      mem = force_const_mem (mode, imm);
+      mem = force_const_mem (ptr_mode, imm);
       gcc_assert (mem);
+      if (mode != ptr_mode)
+	mem = gen_rtx_ZERO_EXTEND (mode, mem);
       emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
       return;
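
Under ILP32 this spills the symbolic constant to the literal pool as a 4-byte, 32-bit-relocated entry and zero-extends it at the use, rather than emitting an 8-byte entry that would need the 64-bit relocation the comment in aarch64_cannot_force_const_mem below says ILP32 must avoid. A sketch of the expected output (hypothetical assembly, not taken from the patch):

	adrp	x0, .LC0
	ldr	w0, [x0, #:lo12:.LC0]	// 4-byte pool entry; writing w0
					// zero-extends into x0
	...
.LC0:
	.word	sym			// 32-bit data relocation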
@@ -2518,7 +2548,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 	aarch64_add_constant (this_regno, IP1_REGNUM, delta);
       else
 	{
-	  gcc_assert ((vcall_offset & 0x7) == 0);
+	  gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
 
 	  this_rtx = gen_rtx_REG (Pmode, this_regno);
 	  temp0 = gen_rtx_REG (Pmode, IP0_REGNUM);
@@ -2534,9 +2564,14 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 	    aarch64_add_constant (this_regno, IP1_REGNUM, delta);
 	  }
 
-	  aarch64_emit_move (temp0, gen_rtx_MEM (Pmode, addr));
+	  if (Pmode == ptr_mode)
+	    aarch64_emit_move (temp0, gen_rtx_MEM (ptr_mode, addr));
+	  else
+	    aarch64_emit_move (temp0,
+			       gen_rtx_ZERO_EXTEND (Pmode,
+						    gen_rtx_MEM (ptr_mode, addr)));
 
-	  if (vcall_offset >= -256 && vcall_offset < 32768)
+	  if (vcall_offset >= -256 && vcall_offset < 4096 * POINTER_BYTES)
 	    addr = plus_constant (Pmode, temp0, vcall_offset);
 	  else
 	    {
@@ -2544,7 +2579,13 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 	      addr = gen_rtx_PLUS (Pmode, temp0, temp1);
 	    }
 
-	  aarch64_emit_move (temp1, gen_rtx_MEM (Pmode, addr));
+	  if (Pmode == ptr_mode)
+	    aarch64_emit_move (temp1, gen_rtx_MEM (ptr_mode, addr));
+	  else
+	    aarch64_emit_move (temp1,
+			       gen_rtx_SIGN_EXTEND (Pmode,
+						    gen_rtx_MEM (ptr_mode, addr)));
+
 	  emit_insn (gen_add2_insn (this_rtx, temp1));
 	}
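
The new bound generalizes the old constant 32768, which was really 4096 pointer-sized slots at 8 bytes each:

    LP64:   -256 <= vcall_offset < 4096 * 8 = 32768
    ILP32:  -256 <= vcall_offset < 4096 * 4 = 16384

so the vtable offset stays reachable with a single pointer-width load (a reading of the change; the log itself only mentions the POINTER_BYTES rewrite).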
@@ -2722,8 +2763,15 @@ aarch64_cannot_force_const_mem (enum machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   split_const (x, &base, &offset);
   if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
-    return (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)
-	    != SYMBOL_FORCE_TO_MEM);
+    {
+      if (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)
+	  != SYMBOL_FORCE_TO_MEM)
+	return true;
+      else
+	/* Avoid generating a 64-bit relocation in ILP32; leave
+	   to aarch64_expand_mov_immediate to handle it properly.  */
+	return mode != ptr_mode;
+    }
 
   return aarch64_tls_referenced_p (x);
 }
@@ -3918,6 +3966,10 @@ aarch64_legitimize_reload_address (rtx *x_p,
       HOST_WIDE_INT high = val - low;
       HOST_WIDE_INT offs;
       rtx cst;
+      enum machine_mode xmode = GET_MODE (x);
+
+      /* In ILP32, xmode can be either DImode or SImode.  */
+      gcc_assert (xmode == DImode || xmode == SImode);
 
       /* Reload non-zero BLKmode offsets.  This is because we cannot ascertain
 	 BLKmode alignment.  */
@@ -3951,16 +4003,16 @@ aarch64_legitimize_reload_address (rtx *x_p,
       cst = GEN_INT (high);
       if (!aarch64_uimm12_shift (high))
-	cst = force_const_mem (Pmode, cst);
+	cst = force_const_mem (xmode, cst);
 
       /* Reload high part into base reg, leaving the low part
 	 in the mem instruction.  */
-      x = gen_rtx_PLUS (Pmode,
-			gen_rtx_PLUS (Pmode, XEXP (x, 0), cst),
+      x = gen_rtx_PLUS (xmode,
+			gen_rtx_PLUS (xmode, XEXP (x, 0), cst),
 			GEN_INT (low));
 
       push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-		   BASE_REG_CLASS, Pmode, VOIDmode, 0, 0,
+		   BASE_REG_CLASS, xmode, VOIDmode, 0, 0,
		   opnum, (enum reload_type) type);
       return x;
     }
@@ -4108,41 +4160,47 @@ aarch64_return_addr (int count, rtx frame ATTRIBUTE_UNUSED)
 static void
 aarch64_asm_trampoline_template (FILE *f)
 {
-  asm_fprintf (f, "\tldr\t%s, .+16\n", reg_names [IP1_REGNUM]);
-  asm_fprintf (f, "\tldr\t%s, .+20\n", reg_names [STATIC_CHAIN_REGNUM]);
+  if (TARGET_ILP32)
+    {
+      asm_fprintf (f, "\tldr\tw%d, .+16\n", IP1_REGNUM - R0_REGNUM);
+      asm_fprintf (f, "\tldr\tw%d, .+16\n", STATIC_CHAIN_REGNUM - R0_REGNUM);
+    }
+  else
+    {
+      asm_fprintf (f, "\tldr\t%s, .+16\n", reg_names [IP1_REGNUM]);
+      asm_fprintf (f, "\tldr\t%s, .+20\n", reg_names [STATIC_CHAIN_REGNUM]);
+    }
   asm_fprintf (f, "\tbr\t%s\n", reg_names [IP1_REGNUM]);
   assemble_aligned_integer (4, const0_rtx);
-  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
-  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
-}
-
-unsigned
-aarch64_trampoline_size (void)
-{
-  return 32;  /* 3 insns + padding + 2 dwords.  */
+  assemble_aligned_integer (POINTER_BYTES, const0_rtx);
+  assemble_aligned_integer (POINTER_BYTES, const0_rtx);
 }
 
 static void
 aarch64_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
   rtx fnaddr, mem, a_tramp;
+  const int tramp_code_sz = 16;
 
   /* Don't need to copy the trailing D-words, we fill those in below.  */
   emit_block_move (m_tramp, assemble_trampoline_template (),
-		   GEN_INT (TRAMPOLINE_SIZE - 16), BLOCK_OP_NORMAL);
-  mem = adjust_address (m_tramp, DImode, 16);
+		   GEN_INT (tramp_code_sz), BLOCK_OP_NORMAL);
+  mem = adjust_address (m_tramp, ptr_mode, tramp_code_sz);
   fnaddr = XEXP (DECL_RTL (fndecl), 0);
+  if (GET_MODE (fnaddr) != ptr_mode)
+    fnaddr = convert_memory_address (ptr_mode, fnaddr);
   emit_move_insn (mem, fnaddr);
 
-  mem = adjust_address (m_tramp, DImode, 24);
+  mem = adjust_address (m_tramp, ptr_mode, tramp_code_sz + POINTER_BYTES);
   emit_move_insn (mem, chain_value);
 
   /* XXX We should really define a "clear_cache" pattern and use
      gen_clear_cache().  */
   a_tramp = XEXP (m_tramp, 0);
   emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
-		     LCT_NORMAL, VOIDmode, 2, a_tramp, Pmode,
-		     plus_constant (Pmode, a_tramp, TRAMPOLINE_SIZE), Pmode);
+		     LCT_NORMAL, VOIDmode, 2, a_tramp, ptr_mode,
+		     plus_constant (ptr_mode, a_tramp, TRAMPOLINE_SIZE),
+		     ptr_mode);
 }
 
 static unsigned char
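
Putting the template and the init code together, the trampoline layout implied by tramp_code_sz = 16 and POINTER_BYTES is (a sketch reconstructed from the code above, not an authoritative ABI diagram):

    offset  0..11   three 4-byte instructions (ldr, ldr, br)
    offset 12..15   4 bytes of padding (assemble_aligned_integer (4, ...))
    offset 16       target function address, POINTER_BYTES wide
    offset 16+PB    static chain value, POINTER_BYTES wide

    total: 16 + 2*4 = 24 bytes under ILP32, 16 + 2*8 = 32 under LP64

which matches the TRAMPOLINE_SIZE definition in aarch64.h below.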
@@ -4197,9 +4255,7 @@ aarch64_elf_asm_constructor (rtx symbol, int priority)
       s = get_section (buf, SECTION_WRITE, NULL);
       switch_to_section (s);
       assemble_align (POINTER_SIZE);
-      fputs ("\t.dword\t", asm_out_file);
-      output_addr_const (asm_out_file, symbol);
-      fputc ('\n', asm_out_file);
+      assemble_aligned_integer (POINTER_BYTES, symbol);
     }
 }
@@ -4216,9 +4272,7 @@ aarch64_elf_asm_destructor (rtx symbol, int priority)
       s = get_section (buf, SECTION_WRITE, NULL);
       switch_to_section (s);
       assemble_align (POINTER_SIZE);
-      fputs ("\t.dword\t", asm_out_file);
-      output_addr_const (asm_out_file, symbol);
-      fputc ('\n', asm_out_file);
+      assemble_aligned_integer (POINTER_BYTES, symbol);
     }
 }
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -740,7 +740,8 @@ do { \
 #define RETURN_ADDR_RTX aarch64_return_addr
 
-#define TRAMPOLINE_SIZE aarch64_trampoline_size ()
+/* 3 insns + padding + 2 pointer-sized entries.  */
+#define TRAMPOLINE_SIZE (TARGET_ILP32 ? 24 : 32)
 
 /* Trampolines contain dwords, so must be dword aligned.  */
 #define TRAMPOLINE_ALIGNMENT  64
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -827,23 +827,27 @@
 )
 
 (define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,*w,m, m,*w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,M,m, m,rZ,*w,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m, m,r,r ,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
    mov\\t%w0, %w1
+   mov\\t%w0, %w1
+   mov\\t%w0, %w1
    mov\\t%w0, %1
    ldr\\t%w0, %1
    ldr\\t%s0, %1
   str\\t%w1, %0
    str\\t%s1, %0
+   adr\\t%x0, %a1
+   adrp\\t%x0, %A1
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,yes,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
@@ -1108,17 +1112,17 @@
 ;; Load pair with writeback.  This is primarily used in function epilogues
 ;; when restoring [fp,lr]
 
-(define_insn "loadwb_pair<GPI:mode>_<PTR:mode>"
+(define_insn "loadwb_pair<GPI:mode>_<P:mode>"
   [(parallel
-    [(set (match_operand:PTR 0 "register_operand" "=k")
-          (plus:PTR (match_operand:PTR 1 "register_operand" "0")
-                    (match_operand:PTR 4 "const_int_operand" "n")))
+    [(set (match_operand:P 0 "register_operand" "=k")
+          (plus:P (match_operand:P 1 "register_operand" "0")
+                  (match_operand:P 4 "const_int_operand" "n")))
      (set (match_operand:GPI 2 "register_operand" "=r")
-          (mem:GPI (plus:PTR (match_dup 1)
-                             (match_dup 4))))
+          (mem:GPI (plus:P (match_dup 1)
+                           (match_dup 4))))
      (set (match_operand:GPI 3 "register_operand" "=r")
-          (mem:GPI (plus:PTR (match_dup 1)
-                             (match_operand:PTR 5 "const_int_operand" "n"))))])]
+          (mem:GPI (plus:P (match_dup 1)
+                           (match_operand:P 5 "const_int_operand" "n"))))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (<GPI:MODE>mode)"
   "ldp\\t%<w>2, %<w>3, [%1], %4"
   [(set_attr "v8type" "load2")
@@ -1127,16 +1131,16 @@
 ;; Store pair with writeback.  This is primarily used in function prologues
 ;; when saving [fp,lr]
 
-(define_insn "storewb_pair<GPI:mode>_<PTR:mode>"
+(define_insn "storewb_pair<GPI:mode>_<P:mode>"
   [(parallel
-    [(set (match_operand:PTR 0 "register_operand" "=&k")
-          (plus:PTR (match_operand:PTR 1 "register_operand" "0")
-                    (match_operand:PTR 4 "const_int_operand" "n")))
-     (set (mem:GPI (plus:PTR (match_dup 0)
-                             (match_dup 4)))
+    [(set (match_operand:P 0 "register_operand" "=&k")
+          (plus:P (match_operand:P 1 "register_operand" "0")
+                  (match_operand:P 4 "const_int_operand" "n")))
+     (set (mem:GPI (plus:P (match_dup 0)
+                           (match_dup 4)))
           (match_operand:GPI 2 "register_operand" "r"))
-     (set (mem:GPI (plus:PTR (match_dup 0)
-                             (match_operand:PTR 5 "const_int_operand" "n")))
+     (set (mem:GPI (plus:P (match_dup 0)
+                           (match_operand:P 5 "const_int_operand" "n")))
          (match_operand:GPI 3 "register_operand" "r"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (<GPI:MODE>mode)"
   "stp\\t%<w>2, %<w>3, [%0, %4]!"
@@ -3729,25 +3733,53 @@
 ;; and lo_sum's to be used with the labels defining the jump tables in
 ;; rodata section.
 
-(define_insn "add_losym"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(lo_sum:DI (match_operand:DI 1 "register_operand" "r")
-		   (match_operand 2 "aarch64_valid_symref" "S")))]
+(define_expand "add_losym"
+  [(set (match_operand 0 "register_operand" "=r")
+	(lo_sum (match_operand 1 "register_operand" "r")
+		(match_operand 2 "aarch64_valid_symref" "S")))]
   ""
-  "add\\t%0, %1, :lo12:%a2"
+{
+  enum machine_mode mode = GET_MODE (operands[0]);
+
+  emit_insn ((mode == DImode
+	      ? gen_add_losym_di
+	      : gen_add_losym_si) (operands[0],
+				   operands[1],
+				   operands[2]));
+  DONE;
+})
+
+(define_insn "add_losym_<mode>"
+  [(set (match_operand:P 0 "register_operand" "=r")
+	(lo_sum:P (match_operand:P 1 "register_operand" "r")
+		  (match_operand 2 "aarch64_valid_symref" "S")))]
+  ""
+  "add\\t%<w>0, %<w>1, :lo12:%a2"
   [(set_attr "v8type" "alu")
-   (set_attr "mode" "DI")]
+   (set_attr "mode" "<MODE>")]
+)
+
+(define_insn "ldr_got_small_<mode>"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+	(unspec:PTR [(mem:PTR (lo_sum:PTR
+			      (match_operand:PTR 1 "register_operand" "r")
+			      (match_operand:PTR 2 "aarch64_valid_symref" "S")))]
+		    UNSPEC_GOTSMALLPIC))]
+  ""
+  "ldr\\t%<w>0, [%1, #:got_lo12:%a2]"
+  [(set_attr "v8type" "load1")
+   (set_attr "mode" "<MODE>")]
 )
 
-(define_insn "ldr_got_small"
+(define_insn "ldr_got_small_sidi"
   [(set (match_operand:DI 0 "register_operand" "=r")
-	(unspec:DI [(mem:DI (lo_sum:DI
-			     (match_operand:DI 1 "register_operand" "r")
-			     (match_operand:DI 2 "aarch64_valid_symref" "S")))]
-		   UNSPEC_GOTSMALLPIC))]
-  ""
-  "ldr\\t%0, [%1, #:got_lo12:%a2]"
+	(zero_extend:DI
+	 (unspec:SI [(mem:SI (lo_sum:DI
+			     (match_operand:DI 1 "register_operand" "r")
+			     (match_operand:DI 2 "aarch64_valid_symref" "S")))]
+		    UNSPEC_GOTSMALLPIC)))]
+  "TARGET_ILP32"
+  "ldr\\t%w0, [%1, #:got_lo12:%a2]"
   [(set_attr "v8type" "load1")
    (set_attr "mode" "DI")]
 )
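
Together these patterns give the usual two-instruction addressing sequences for the small code models, with the final register width following the mode. Sketches of typical output (illustrative):

	// small absolute (add_losym_di / add_losym_si):
	adrp	x0, sym
	add	x0, x0, :lo12:sym	// SImode form: add w0, w0, :lo12:sym

	// small PIC (ldr_got_small_di / _si / _sidi):
	adrp	x0, :got:sym
	ldr	x0, [x0, #:got_lo12:sym]	// ILP32 forms load the 4-byte GOT
						// entry: ldr w0, [x0, #:got_lo12:sym]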
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -76,9 +76,15 @@
 ;; Vector modes for moves.
 (define_mode_iterator VDQM [V8QI V16QI V4HI V8HI V2SI V4SI])
 
+;; This mode iterator allows :P to be used for patterns that operate on
+;; addresses in different modes.  In LP64, only DI will match, while in
+;; ILP32, either can match.
+(define_mode_iterator P [(SI "ptr_mode == SImode || Pmode == SImode")
+			 (DI "ptr_mode == DImode || Pmode == DImode")])
+
 ;; This mode iterator allows :PTR to be used for patterns that operate on
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
-(define_mode_iterator PTR [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+(define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
 
 ;; Vector Float modes.
 (define_mode_iterator VDQF [V2SF V4SF V2DF])
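
The division of labour between the two iterators mirrors the Pmode/ptr_mode split: P matches any mode that can hold an address or a pointer, while PTR now tracks only the C pointer size. Concretely (an illustration of the expansions, assuming standard mode-iterator semantics):

    ;; LP64  (Pmode == ptr_mode == DImode):
    ;;   P   -> DI only          e.g. add_losym_di
    ;;   PTR -> DI only          e.g. ldr_got_small_di
    ;; ILP32 (Pmode == DImode, ptr_mode == SImode):
    ;;   P   -> both SI and DI   e.g. add_losym_si and add_losym_di
    ;;   PTR -> SI only          e.g. ldr_got_small_si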