Commit 1133125e by Harsha Jagasia

config.gcc: Add support for --with-cpu option for bdver1.

2010-05-14  Harsha Jagasia  <harsha.jagasia@amd.com>

	* config.gcc: Add support for --with-cpu option for bdver1.
	* config/i386/i386.h (TARGET_BDVER1): New macro.
	(ix86_tune_indices): Change SSE_UNALIGNED_MOVE_OPTIMAL
	to SSE_UNALIGNED_LOAD_OPTIMAL. Add SSE_UNALIGNED_STORE_OPTIMAL.
	(ix86_tune_features): Change SSE_UNALIGNED_MOVE_OPTIMAL
	to SSE_UNALIGNED_LOAD_OPTIMAL. Add SSE_UNALIGNED_STORE_OPTIMAL.
	Add SSE_PACKED_SINGLE_INSN_OPTIMAL.
	(TARGET_CPU_DEFAULT_NAMES): Add bdver1.
	(processor_type): Add PROCESSOR_BDVER1.
	* config/i386/i386.md: Add bdver1 as a new cpu attribute to match
	processor_type in config/i386/i386.h.
	Add check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to emit
	movaps <reg, reg> instead of movapd <reg, reg> when replacing
	movsd <reg, reg> or movss <reg, reg> for SSE and AVX.
	Add check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
	to emit packed single xor instead of packed double/packed integer
	xor for SSE and AVX when moving a zero value.
	* config/i386/sse.md: Add check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
	to emit movaps instead of movapd/movdqa for SSE and AVX.
	Add check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to emit packed single
	logical operations, i.e. and, or and xor, instead of packed double
	logical operations for SSE and AVX.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	PROCESSOR_BDVER1.
	* config/i386/driver-i386.c: Turn on -mtune=native for BDVER1.
	(has_fma4, has_xop): New.
	* config/i386/i386.c (bdver1_cost): New variable.
	(m_BDVER1): New macro.
	(m_AMD_MULTIPLE): Add m_BDVER1.
	(x86_tune_use_leave, x86_tune_push_memory, x86_tune_unroll_strlen,
	 x86_tune_deep_branch_prediction, x86_tune_use_sahf, x86_tune_movx,
	 x86_tune_use_simode_fiop, x86_tune_promote_qimode, 
	x86_tune_add_esp_8, x86_tune_sub_esp_4, x86_tune_sub_esp_8,
	 x86_tune_integer_dfmode_moves, x86_tune_partial_reg_dependency,
	 x86_tune_sse_partial_reg_dependency, x86_tune_sse_unaligned_load_optimal,
	 x86_tune_sse_unaligned_store_optimal, x86_tune_sse_typeless_stores,
	 x86_tune_memory_mismatch_stall, x86_tune_use_ffreep,
	 x86_tune_inter_unit_moves, x86_tune_inter_unit_conversions,
	 x86_tune_use_bt, x86_tune_pad_returns, x86_tune_slow_imul_imm32_mem,
	 x86_tune_slow_imul_imm8, x86_tune_fuse_cmp_and_branch): 
	Enable/disable for bdver1.
	(processor_target_table): Add bdver1_cost.
	(cpu_names): Add bdver1.
	(override_options): Set up PROCESSOR_BDVER1 for bdver1 entry in
	 processor_alias_table.
	(ix86_expand_vector_move_misalign): Change 
	 TARGET_SSE_UNALIGNED_MOVE_OPTIMAL to TARGET_SSE_UNALIGNED_LOAD_OPTIMAL.
	 Check for TARGET_SSE_UNALIGNED_STORE_OPTIMAL.
	 Check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to emit movups instead
	 of movupd/movdqu for SSE and AVX.
	(ix86_issue_rate): Add PROCESSOR_BDVER1.
	(ix86_adjust_cost): Add code for bdver1.
	(standard_sse_constant_opcode): Add check for
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to emit packed single xor instead
	of packed double xor for SSE and AVX.

From-SVN: r159399
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1139,7 +1139,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i
 	need_64bit_hwint=yes
 	need_64bit_isa=yes
 	case X"${with_cpu}" in
-	Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+	Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xbdver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 		;;
 	X)
 		if test x$with_cpu_64 = x; then
@@ -1148,7 +1148,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i
 		;;
 	*)
 		echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-		echo "generic atom core2 nocona x86-64 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+		echo "generic atom core2 nocona x86-64 bdver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 		exit 1
 		;;
 	esac
@@ -1266,7 +1266,7 @@ i[34567]86-*-solaris2*)
 	need_64bit_isa=yes
 	use_gcc_stdint=wrap
 	case X"${with_cpu}" in
-	Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+	Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xbdver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 		;;
 	X)
 		if test x$with_cpu_64 = x; then
@@ -1275,7 +1275,7 @@ i[34567]86-*-solaris2*)
 		;;
 	*)
 		echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-		echo "generic atom core2 nocona x86-64 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+		echo "generic atom core2 nocona x86-64 bdver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 		exit 1
 		;;
 	esac
@@ -1346,7 +1346,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
 	if test x$enable_targets = xall; then
 		tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xnocona|Xx86-64|Xbdver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1355,7 +1355,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 nocona x86-64 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 nocona x86-64 bdver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -2626,6 +2626,10 @@ case ${target} in
 		;;
 	i686-*-* | i786-*-*)
 		case ${target_noncanonical} in
+			bdver1-*)
+				arch=bdver1
+				cpu=bdver1
+				;;
 			amdfam10-*|barcelona-*)
 				arch=amdfam10
 				cpu=amdfam10
@@ -2703,6 +2707,10 @@ case ${target} in
 		;;
 	x86_64-*-*)
 		case ${target_noncanonical} in
+			bdver1-*)
+				arch=bdver1
+				cpu=bdver1
+				;;
 			amdfam10-*|barcelona-*)
 				arch=amdfam10
 				cpu=amdfam10
@@ -3109,8 +3117,8 @@ case "${target}" in
 			;;
 		"" | x86-64 | generic | native \
 		| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-		| opteron-sse3 | athlon-fx | amdfam10 | barcelona \
-		| nocona | core2 | atom)
+		| opteron-sse3 | athlon-fx | bdver1 | amdfam10 \
+		| barcelona | nocona | core2 | atom)
 			# OK
 			;;
 		*)
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -396,6 +396,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
   unsigned int has_movbe = 0, has_sse4_1 = 0, has_sse4_2 = 0;
   unsigned int has_popcnt = 0, has_aes = 0, has_avx = 0;
   unsigned int has_pclmul = 0, has_abm = 0, has_lwp = 0;
+  unsigned int has_fma4 = 0, has_xop = 0;
 
   bool arch;
@@ -460,6 +461,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       has_sse4a = ecx & bit_SSE4a;
       has_abm = ecx & bit_ABM;
       has_lwp = ecx & bit_LWP;
+      has_fma4 = ecx & bit_FMA4;
+      has_xop = ecx & bit_XOP;
 
       has_longmode = edx & bit_LM;
       has_3dnowp = edx & bit_3DNOWP;
@@ -490,6 +493,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       if (name == SIG_GEODE)
	processor = PROCESSOR_GEODE;
+      else if (has_xop)
+	processor = PROCESSOR_BDVER1;
       else if (has_sse4a)
	processor = PROCESSOR_AMDFAM10;
       else if (has_sse2 || has_longmode)
@@ -629,6 +634,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
     case PROCESSOR_AMDFAM10:
       cpu = "amdfam10";
       break;
+    case PROCESSOR_BDVER1:
+      cpu = "bdver1";
+      break;
     default:
       /* Use something reasonable.  */
@@ -674,6 +682,10 @@ const char *host_detect_local_cpu (int argc, const char **argv)
	options = concat (options, " -mabm", NULL);
       if (has_lwp)
	options = concat (options, " -mlwp", NULL);
+      if (has_fma4)
+	options = concat (options, " -mfma4", NULL);
+      if (has_xop)
+	options = concat (options, " -mxop", NULL);
       if (has_avx)
	options = concat (options, " -mavx", NULL);
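For context, the detection added above follows the standard CPUID pattern: extended leaf 0x80000001 reports AMD's feature bits in ECX, and GCC's cpuid.h defines bit_XOP as (1 << 11) and bit_FMA4 as (1 << 16). A minimal standalone sketch of the same check, using the __get_cpuid helper from <cpuid.h> (illustrative only, not part of this patch):

/* Standalone sketch of the feature test driver-i386.c performs:
   query extended leaf 0x80000001 and test the XOP/FMA4 bits in ECX.
   Bit positions follow GCC's cpuid.h (bit_XOP = 1 << 11,
   bit_FMA4 = 1 << 16).  */
#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid returns 0 if the requested leaf is unsupported.  */
  if (__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
    {
      if (ecx & (1 << 16))
	puts ("FMA4 available (-mfma4)");
      if (ecx & (1 << 11))
	puts ("XOP available (-mxop)");
    }
  return 0;
}

Since bdver1 is the first processor with XOP, testing has_xop before has_sse4a is what lets -mtune=native distinguish it from AMDFAM10 above.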
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -107,6 +107,10 @@ ix86_target_macros_internal (int isa_flag,
       def_or_undef (parse_in, "__amdfam10");
       def_or_undef (parse_in, "__amdfam10__");
       break;
+    case PROCESSOR_BDVER1:
+      def_or_undef (parse_in, "__bdver1");
+      def_or_undef (parse_in, "__bdver1__");
+      break;
     case PROCESSOR_PENTIUM4:
       def_or_undef (parse_in, "__pentium4");
       def_or_undef (parse_in, "__pentium4__");
@@ -182,6 +186,9 @@ ix86_target_macros_internal (int isa_flag,
     case PROCESSOR_AMDFAM10:
       def_or_undef (parse_in, "__tune_amdfam10__");
       break;
+    case PROCESSOR_BDVER1:
+      def_or_undef (parse_in, "__tune_bdver1__");
+      break;
     case PROCESSOR_PENTIUM4:
       def_or_undef (parse_in, "__tune_pentium4__");
       break;
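The practical effect of these hunks: code compiled with -march=bdver1 sees __bdver1/__bdver1__ defined, and -mtune=bdver1 (or -march, which implies it) sets __tune_bdver1__. A hedged sketch of how such macros are typically consumed; only the macro names come from the patch, the values below are made up:

/* Illustrative use of the new predefined macros.  */
#ifdef __bdver1__
# define ISA_HINT "built with -march=bdver1 (XOP/FMA4 available)"
#else
# define ISA_HINT "generic x86 build"
#endif

#ifdef __tune_bdver1__
enum { UNROLL = 4 };	/* hypothetical bdver1 tuning choice */
#else
enum { UNROLL = 2 };
#endif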
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -819,6 +819,93 @@ struct processor_costs amdfam10_cost = {
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+static const
+struct processor_costs bdver1_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (2),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (3),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/* HI */
+   COSTS_N_INSNS (3),			/* SI */
+   COSTS_N_INSNS (4),			/* DI */
+   COSTS_N_INSNS (5)},			/* other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/* HI */
+   COSTS_N_INSNS (51),			/* SI */
+   COSTS_N_INSNS (83),			/* DI */
+   COSTS_N_INSNS (83)},			/* other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,					/* cost for loading QImode using movzbl */
+  {3, 4, 3},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {3, 4, 3},				/* cost of storing integer registers */
+  4,					/* cost of reg,reg fld/fst */
+  {4, 4, 12},				/* cost of loading fp registers
+					   in SFmode, DFmode and XFmode */
+  {6, 6, 8},				/* cost of storing fp registers
+					   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {3, 3},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 3},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 5},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  3,					/* MMX or SSE register to integer */
+					/* On K8:
+					    MOVD reg64, xmmreg Double FSTORE 4
+					    MOVD reg32, xmmreg Double FSTORE 4
+					   On AMDFAM10:
+					    MOVD reg64, xmmreg Double FADD 3
+								1/1  1/1
+					    MOVD reg32, xmmreg Double FADD 3
+								1/1  1/1 */
+  64,					/* size of l1 cache.  */
+  1024,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (4),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (4),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (19),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (35),			/* cost of FSQRT instruction.  */
+  /* BDVER1 has optimized REP instruction for medium sized blocks, but for
+     very small blocks it is better to use loop.  For large blocks, libcall
+     can do nontemporary accesses and beat inline considerably.  */
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  4,					/* scalar_stmt_cost.  */
+  2,					/* scalar load_cost.  */
+  2,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  2,					/* vec_align_load_cost.  */
+  2,					/* vec_unalign_load_cost.  */
+  2,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 static const
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
@@ -1276,7 +1363,8 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_ATHLON  (1<<PROCESSOR_ATHLON)
 #define m_ATHLON_K8  (m_K8 | m_ATHLON)
 #define m_AMDFAM10  (1<<PROCESSOR_AMDFAM10)
-#define m_AMD_MULTIPLE  (m_K8 | m_ATHLON | m_AMDFAM10)
+#define m_BDVER1  (1<<PROCESSOR_BDVER1)
+#define m_AMD_MULTIPLE  (m_K8 | m_ATHLON | m_AMDFAM10 | m_BDVER1)
 
 #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
 #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
@@ -1321,7 +1409,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_ATOM | m_PPRO | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_PENT4
+  m_ATOM | m_PPRO | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER1 | m_PENT4
   | m_NOCONA | m_CORE2 | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
@@ -1425,10 +1513,16 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
      while enabling it on K8 brings roughly 2.4% regression that can be partly
      masked by careful scheduling of moves.  */
   m_ATOM | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC
-  | m_AMDFAM10,
+  | m_AMDFAM10 | m_BDVER1,
 
-  /* X86_TUNE_SSE_UNALIGNED_MOVE_OPTIMAL */
-  m_AMDFAM10,
+  /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
+  m_AMDFAM10 | m_BDVER1,
+
+  /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL */
+  m_BDVER1,
+
+  /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL */
+  m_BDVER1,
 
   /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
      are resolved on SSE register parts instead of whole registers, so we may
@@ -1461,7 +1555,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   ~(m_AMD_MULTIPLE | m_GENERIC),
 
   /* X86_TUNE_INTER_UNIT_CONVERSIONS */
-  ~(m_AMDFAM10),
+  ~(m_AMDFAM10 | m_BDVER1),
 
   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
      than 4 branch instructions in the 16 byte window.  */
@@ -1497,11 +1591,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_K8 | m_GENERIC64 | m_AMDFAM10,
+  m_K8 | m_GENERIC64 | m_AMDFAM10 | m_BDVER1,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_K8 | m_GENERIC64 | m_AMDFAM10,
+  m_K8 | m_GENERIC64 | m_AMDFAM10 | m_BDVER1,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -1527,7 +1621,7 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
      with a subsequent conditional jump instruction into a single
     compare-and-branch uop.  */
-  m_CORE2,
+  m_CORE2 | m_BDVER1,
 
   /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit.  This flag
      will impact LEA instruction selection.  */
@@ -2067,6 +2161,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {&generic32_cost, 16, 7, 16, 7, 16},
   {&generic64_cost, 16, 10, 16, 10, 16},
   {&amdfam10_cost, 32, 24, 32, 7, 32},
+  {&bdver1_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 7, 16, 7, 16}
 };
 
@@ -2093,7 +2188,8 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
   "athlon",
   "athlon-4",
   "k8",
-  "amdfam10"
+  "amdfam10",
+  "bdver1"
 };
 
 /* Implement TARGET_HANDLE_OPTION.  */
@@ -2751,6 +2847,11 @@ override_options (bool main_args_p)
       {"barcelona", PROCESSOR_AMDFAM10, CPU_AMDFAM10,
	PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
	| PTA_SSE2 | PTA_SSE3 | PTA_SSE4A | PTA_CX16 | PTA_ABM},
+      {"bdver1", PROCESSOR_BDVER1, CPU_BDVER1,
+	PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
+	| PTA_SSE2 | PTA_SSE3 | PTA_SSE4A | PTA_CX16 | PTA_ABM
+	| PTA_SSSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_AES
+	| PTA_PCLMUL | PTA_AVX | PTA_FMA4 | PTA_XOP | PTA_LWP},
       {"generic32", PROCESSOR_GENERIC32, CPU_PENTIUMPRO,
	0 /* flags are only used for -march switch.  */ },
       {"generic64", PROCESSOR_GENERIC64, CPU_GENERIC64,
@@ -7469,15 +7570,27 @@ standard_sse_constant_opcode (rtx insn, rtx x)
     case MODE_V4SF:
       return TARGET_AVX ? "vxorps\t%0, %0, %0" : "xorps\t%0, %0";
     case MODE_V2DF:
-      return TARGET_AVX ? "vxorpd\t%0, %0, %0" : "xorpd\t%0, %0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return TARGET_AVX ? "vxorps\t%0, %0, %0" : "xorps\t%0, %0";
+      else
+	return TARGET_AVX ? "vxorpd\t%0, %0, %0" : "xorpd\t%0, %0";
     case MODE_TI:
-      return TARGET_AVX ? "vpxor\t%0, %0, %0" : "pxor\t%0, %0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return TARGET_AVX ? "vxorps\t%0, %0, %0" : "xorps\t%0, %0";
+      else
+	return TARGET_AVX ? "vpxor\t%0, %0, %0" : "pxor\t%0, %0";
     case MODE_V8SF:
       return "vxorps\t%x0, %x0, %x0";
     case MODE_V4DF:
-      return "vxorpd\t%x0, %x0, %x0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "vxorps\t%x0, %x0, %x0";
+      else
+	return "vxorpd\t%x0, %x0, %x0";
     case MODE_OI:
-      return "vpxor\t%x0, %x0, %x0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "vxorps\t%x0, %x0, %x0";
+      else
+	return "vpxor\t%x0, %x0, %x0";
     default:
       break;
     }
@@ -13233,6 +13346,14 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
       switch (GET_MODE_SIZE (mode))
	{
	case 16:
+	  /* If we're optimizing for size, movups is the smallest.  */
+	  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	    {
+	      op0 = gen_lowpart (V4SFmode, op0);
+	      op1 = gen_lowpart (V4SFmode, op1);
+	      emit_insn (gen_avx_movups (op0, op1));
+	      return;
+	    }
	  op0 = gen_lowpart (V16QImode, op0);
	  op1 = gen_lowpart (V16QImode, op1);
	  emit_insn (gen_avx_movdqu (op0, op1));
@@ -13259,6 +13380,13 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
	  emit_insn (gen_avx_movups256 (op0, op1));
	  break;
	case V2DFmode:
+	  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	    {
+	      op0 = gen_lowpart (V4SFmode, op0);
+	      op1 = gen_lowpart (V4SFmode, op1);
+	      emit_insn (gen_avx_movups (op0, op1));
+	      return;
+	    }
	  emit_insn (gen_avx_movupd (op0, op1));
	  break;
	case V4DFmode:
@@ -13279,7 +13407,8 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
   if (MEM_P (op1))
     {
       /* If we're optimizing for size, movups is the smallest.  */
-      if (optimize_insn_for_size_p ())
+      if (optimize_insn_for_size_p ()
+	  || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
	{
	  op0 = gen_lowpart (V4SFmode, op0);
	  op1 = gen_lowpart (V4SFmode, op1);
@@ -13302,13 +13431,13 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
	{
	  rtx zero;
 
-	  if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
+	  if (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL)
	    {
	      op0 = gen_lowpart (V2DFmode, op0);
	      op1 = gen_lowpart (V2DFmode, op1);
	      emit_insn (gen_sse2_movupd (op0, op1));
	      return;
	    }
 
	  /* When SSE registers are split into halves, we can avoid
	     writing to the top half twice.  */
@@ -13337,12 +13466,12 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
	}
       else
	{
-	  if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
+	  if (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL)
	    {
	      op0 = gen_lowpart (V4SFmode, op0);
	      op1 = gen_lowpart (V4SFmode, op1);
	      emit_insn (gen_sse_movups (op0, op1));
	      return;
	    }
 
	  if (TARGET_SSE_PARTIAL_REG_DEPENDENCY)
@@ -13361,7 +13490,8 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
   else if (MEM_P (op0))
     {
       /* If we're optimizing for size, movups is the smallest.  */
-      if (optimize_insn_for_size_p ())
+      if (optimize_insn_for_size_p ()
+	  || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
	{
	  op0 = gen_lowpart (V4SFmode, op0);
	  op1 = gen_lowpart (V4SFmode, op1);
@@ -13382,19 +13512,37 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
       if (TARGET_SSE2 && mode == V2DFmode)
	{
-	  m = adjust_address (op0, DFmode, 0);
-	  emit_insn (gen_sse2_storelpd (m, op1));
-	  m = adjust_address (op0, DFmode, 8);
-	  emit_insn (gen_sse2_storehpd (m, op1));
+	  if (TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+	    {
+	      op0 = gen_lowpart (V2DFmode, op0);
+	      op1 = gen_lowpart (V2DFmode, op1);
+	      emit_insn (gen_sse2_movupd (op0, op1));
+	    }
+	  else
+	    {
+	      m = adjust_address (op0, DFmode, 0);
+	      emit_insn (gen_sse2_storelpd (m, op1));
+	      m = adjust_address (op0, DFmode, 8);
+	      emit_insn (gen_sse2_storehpd (m, op1));
+	    }
	}
       else
	{
	  if (mode != V4SFmode)
	    op1 = gen_lowpart (V4SFmode, op1);
+
-	  m = adjust_address (op0, V2SFmode, 0);
-	  emit_insn (gen_sse_storelps (m, op1));
-	  m = adjust_address (op0, V2SFmode, 8);
-	  emit_insn (gen_sse_storehps (m, op1));
+	  if (TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+	    {
+	      op0 = gen_lowpart (V4SFmode, op0);
+	      emit_insn (gen_sse_movups (op0, op1));
+	    }
+	  else
+	    {
+	      m = adjust_address (op0, V2SFmode, 0);
+	      emit_insn (gen_sse_storelps (m, op1));
+	      m = adjust_address (op0, V2SFmode, 8);
+	      emit_insn (gen_sse_storehps (m, op1));
+	    }
	}
     }
   else
@@ -19714,6 +19862,7 @@ ix86_issue_rate (void)
     case PROCESSOR_NOCONA:
     case PROCESSOR_GENERIC32:
     case PROCESSOR_GENERIC64:
+    case PROCESSOR_BDVER1:
       return 3;
 
     case PROCESSOR_CORE2:
@@ -19903,6 +20052,7 @@ ix86_adjust_cost (rtx insn, rtx link, rtx dep_insn, int cost)
     case PROCESSOR_ATHLON:
     case PROCESSOR_K8:
     case PROCESSOR_AMDFAM10:
+    case PROCESSOR_BDVER1:
     case PROCESSOR_ATOM:
     case PROCESSOR_GENERIC32:
     case PROCESSOR_GENERIC64:
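A note on why the TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL substitutions above are safe: for whole-register moves and bitwise logic, the packed-single, packed-double and packed-integer forms (movaps/movapd/movdqa, xorps/xorpd/pxor) produce identical bit patterns; they differ only in encoding length and, on some cores, bypass domain, and bdver1 prefers the single-precision encoding. A user-level sketch of the same identity with SSE2 intrinsics (assumes <emmintrin.h>; illustrative, not from the patch):

#include <emmintrin.h>

/* Bitwise XOR on double data routed through the single-precision
   encoding (xorps rather than xorpd).  The casts are free: they only
   reinterpret the register, so the result bits match _mm_xor_pd
   exactly.  */
static inline __m128d
xor_pd_via_ps (__m128d a, __m128d b)
{
  __m128 r = _mm_xor_ps (_mm_castpd_ps (a), _mm_castpd_ps (b));
  return _mm_castps_pd (r);
}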
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -240,6 +240,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
 #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
+#define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
 
 /* Feature tests against the various tunings.  */
@@ -277,7 +278,9 @@ enum ix86_tune_indices {
   X86_TUNE_INTEGER_DFMODE_MOVES,
   X86_TUNE_PARTIAL_REG_DEPENDENCY,
   X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY,
-  X86_TUNE_SSE_UNALIGNED_MOVE_OPTIMAL,
+  X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL,
+  X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL,
+  X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL,
   X86_TUNE_SSE_SPLIT_REGS,
   X86_TUNE_SSE_TYPELESS_STORES,
   X86_TUNE_SSE_LOAD0_BY_PXOR,
@@ -352,8 +355,12 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
	ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY]
 #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \
	ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY]
-#define TARGET_SSE_UNALIGNED_MOVE_OPTIMAL \
-	ix86_tune_features[X86_TUNE_SSE_UNALIGNED_MOVE_OPTIMAL]
+#define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+	ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL]
+#define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \
+	ix86_tune_features[X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL]
+#define TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL \
+	ix86_tune_features[X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL]
 #define TARGET_SSE_SPLIT_REGS	ix86_tune_features[X86_TUNE_SSE_SPLIT_REGS]
 #define TARGET_SSE_TYPELESS_STORES \
	ix86_tune_features[X86_TUNE_SSE_TYPELESS_STORES]
@@ -591,6 +598,7 @@ enum target_cpu_default
   TARGET_CPU_DEFAULT_athlon_sse,
   TARGET_CPU_DEFAULT_k8,
   TARGET_CPU_DEFAULT_amdfam10,
+  TARGET_CPU_DEFAULT_bdver1,
 
   TARGET_CPU_DEFAULT_max
 };
@@ -2193,6 +2201,7 @@ enum processor_type
   PROCESSOR_GENERIC32,
   PROCESSOR_GENERIC64,
   PROCESSOR_AMDFAM10,
+  PROCESSOR_BDVER1,
   PROCESSOR_ATOM,
 
   PROCESSOR_max
 };
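For orientation, the tune-feature plumbing these macros rely on is a per-processor bitmask table that gets flattened into a boolean array once ix86_tune is known; the TARGET_* macros then just index that array. A reduced, self-contained model of the mechanism (all names and bit assignments below are illustrative, not GCC's real declarations):

/* Reduced model of the ix86_tune_features machinery: each feature
   holds a mask of processors it is enabled for; once the tuned
   processor is chosen, the masks collapse to booleans.  */
enum tune_index
{
  TUNE_SSE_UNALIGNED_LOAD_OPTIMAL,
  TUNE_SSE_UNALIGNED_STORE_OPTIMAL,
  TUNE_LAST
};

enum { CPU_AMDFAM10_BIT = 1 << 0, CPU_BDVER1_BIT = 1 << 1 };

static const unsigned int initial_tune_features[TUNE_LAST] = {
  /* TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */  CPU_AMDFAM10_BIT | CPU_BDVER1_BIT,
  /* TUNE_SSE_UNALIGNED_STORE_OPTIMAL */ CPU_BDVER1_BIT
};

static unsigned char tune_features[TUNE_LAST];

static void
init_tune_features (unsigned int cpu_mask)
{
  int i;
  for (i = 0; i < TUNE_LAST; i++)
    tune_features[i] = (initial_tune_features[i] & cpu_mask) != 0;
}

This is why adding a processor is mostly a matter of OR-ing its m_* bit into the right table entries, as the i386.c hunks above do.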
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -343,7 +343,7 @@
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,atom,
-		    generic64,amdfam10"
+		    generic64,amdfam10,bdver1"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
@@ -3113,9 +3113,15 @@
     case MODE_V4SF:
       return "%vxorps\t%0, %d0";
     case MODE_V2DF:
-      return "%vxorpd\t%0, %d0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vxorps\t%0, %d0";
+      else
+	return "%vxorpd\t%0, %d0";
     case MODE_TI:
-      return "%vpxor\t%0, %d0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vxorps\t%0, %d0";
+      else
+	return "%vpxor\t%0, %d0";
     default:
       gcc_unreachable ();
     }
@@ -3127,9 +3133,15 @@
     case MODE_V4SF:
       return "%vmovaps\t{%1, %0|%0, %1}";
     case MODE_V2DF:
-      return "%vmovapd\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "%vmovapd\t{%1, %0|%0, %1}";
     case MODE_TI:
-      return "%vmovdqa\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "%vmovdqa\t{%1, %0|%0, %1}";
     case MODE_DI:
       return "%vmovq\t{%1, %0|%0, %1}";
     case MODE_DF:
@@ -3263,9 +3275,15 @@
     case MODE_V4SF:
       return "%vxorps\t%0, %d0";
     case MODE_V2DF:
-      return "%vxorpd\t%0, %d0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vxorps\t%0, %d0";
+      else
+	return "%vxorpd\t%0, %d0";
     case MODE_TI:
-      return "%vpxor\t%0, %d0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vxorps\t%0, %d0";
+      else
+	return "%vpxor\t%0, %d0";
     default:
       gcc_unreachable ();
     }
@@ -3277,9 +3295,15 @@
     case MODE_V4SF:
       return "%vmovaps\t{%1, %0|%0, %1}";
     case MODE_V2DF:
-      return "%vmovapd\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "%vmovapd\t{%1, %0|%0, %1}";
     case MODE_TI:
-      return "%vmovdqa\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "%vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "%vmovdqa\t{%1, %0|%0, %1}";
     case MODE_DI:
       return "%vmovq\t{%1, %0|%0, %1}";
     case MODE_DF:
@@ -3403,9 +3427,15 @@
     case MODE_V4SF:
       return "xorps\t%0, %0";
     case MODE_V2DF:
-      return "xorpd\t%0, %0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "xorps\t%0, %0";
+      else
+	return "xorpd\t%0, %0";
     case MODE_TI:
-      return "pxor\t%0, %0";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "xorps\t%0, %0";
+      else
+	return "pxor\t%0, %0";
     default:
       gcc_unreachable ();
     }
@@ -3417,9 +3447,15 @@
     case MODE_V4SF:
       return "movaps\t{%1, %0|%0, %1}";
     case MODE_V2DF:
-      return "movapd\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "movaps\t{%1, %0|%0, %1}";
+      else
+	return "movapd\t{%1, %0|%0, %1}";
     case MODE_TI:
-      return "movdqa\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "movaps\t{%1, %0|%0, %1}";
+      else
+	return "movdqa\t{%1, %0|%0, %1}";
     case MODE_DI:
       return "movq\t{%1, %0|%0, %1}";
     case MODE_DF:
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -194,9 +194,15 @@
       return "vmovaps\t{%1, %0|%0, %1}";
     case MODE_V4DF:
     case MODE_V2DF:
-      return "vmovapd\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "vmovapd\t{%1, %0|%0, %1}";
     default:
-      return "vmovdqa\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "vmovaps\t{%1, %0|%0, %1}";
+      else
+	return "vmovdqa\t{%1, %0|%0, %1}";
     }
   default:
     gcc_unreachable ();
@@ -236,9 +242,15 @@
     case MODE_V4SF:
       return "movaps\t{%1, %0|%0, %1}";
     case MODE_V2DF:
-      return "movapd\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "movaps\t{%1, %0|%0, %1}";
+      else
+	return "movapd\t{%1, %0|%0, %1}";
     default:
-      return "movdqa\t{%1, %0|%0, %1}";
+      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+	return "movaps\t{%1, %0|%0, %1}";
+      else
+	return "movdqa\t{%1, %0|%0, %1}";
     }
   default:
     gcc_unreachable ();
@@ -1611,7 +1623,12 @@
	(match_operand:AVXMODEF2P 2 "nonimmediate_operand" "xm")))]
   "AVX_VEC_FLOAT_MODE_P (<MODE>mode)
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "v<logic>p<avxmodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}"
+{
+  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+    return "v<logic>ps\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "v<logic>p<avxmodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<avxvecmode>")])
@@ -1631,7 +1648,12 @@
	(match_operand:SSEMODEF2P 2 "nonimmediate_operand" "xm")))]
   "SSE_VEC_FLOAT_MODE_P (<MODE>mode)
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "<logic>p<ssemodesuffixf2c>\t{%2, %0|%0, %2}"
+{
+  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+    return "<logic>ps\t{%2, %0|%0, %2}";
+  else
+    return "<logic>p<ssemodesuffixf2c>\t{%2, %0|%0, %2}";
+}
   [(set_attr "type" "sselog")
    (set_attr "mode" "<MODE>")])
@@ -1687,7 +1709,12 @@
	  (match_operand:MODEF 1 "register_operand" "x")
	  (match_operand:MODEF 2 "register_operand" "x")))]
   "AVX_FLOAT_MODE_P (<MODE>mode)"
-  "v<logic>p<ssemodefsuffix>\t{%2, %1, %0|%0, %1, %2}"
+{
+  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+    return "v<logic>ps\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "v<logic>p<ssemodefsuffix>\t{%2, %1, %0|%0, %1, %2}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<ssevecmode>")])
@@ -1698,7 +1725,12 @@
	  (match_operand:MODEF 1 "register_operand" "0")
	  (match_operand:MODEF 2 "register_operand" "x")))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "<logic>p<ssemodefsuffix>\t{%2, %0|%0, %2}"
+{
+  if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+    return "<logic>ps\t{%2, %0|%0, %2}";
+  else
+    return "<logic>p<ssemodefsuffix>\t{%2, %0|%0, %2}";
+}
   [(set_attr "type" "sselog")
    (set_attr "mode" "<ssevecmode>")])