wenyuanbo / tic
Commit 52c8db5b, authored Jul 27, 2017 by Jian Weng; committed by Tianqi Chen, Jul 27, 2017
[TUTORIAL] gemm tutorial image add! (#276)

* image add!
* image path move to web-data
Parent: 095ce875

Showing 1 changed file with 14 additions and 46 deletions
tutorials/python/opt_gemm.py
@@ -114,54 +114,20 @@ print('Opt2: %f' % evaluator(a, b, c).mean)
 # -------------
 # Another important trick is array packing. This trick is to reorder the storage dimension of the
 # array to convert the continuous access pattern on certain dimension to a sequential pattern after
-# flattening. For the convienience of drawing a figure, we use 4x4 blocking as an example to
-# demonstrate array packing:
+# flattening.
+#
+# .. image:: https://github.com/dmlc/web-data/raw/master/tvm/tutorial/array-packing.png
+#      :align: center
+#      :scale: 100%
 #
-# First we observe memory access pattern of AB=C:
-# A: B: C:
-# ---- ---- ---- ---- |||| **** **** **** **** ++++ **** **** **** ****
-# ---- ---- ---- ---- |||| **** **** **** **** ++++ **** **** **** ****
-# ---- ---- ---- ---- |||| **** **** **** **** ++++ **** **** **** ****
-# ---- ---- ---- ---- |||| **** **** **** **** ++++ **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# **** **** **** **** |||| **** **** **** **** **** **** **** **** ****
-# We access A sequentially, but for B, we access it continuous on dimension of rows. Thus, what we
-# want to do is to put this dimension to the inner most dimension. For 1x1 blocking, it is simply
-# to transpose the matrix B. However, here is 4x4 case, array B is packed in this fashion:
-# B:
-# 0123 4567 89AB CDEF 0: 1234 1: 1234 2: 1234 3: 1234
-# 0 |||| **** **** **** 0 |||| **** **** ****
-# 1 |||| **** **** **** 1 |||| **** **** ****
-# 2 |||| **** **** **** 2 |||| **** **** ****
-# 3 |||| **** **** **** 3 |||| **** **** ****
-# 4 |||| **** **** **** 4 |||| **** **** ****
-# 5 |||| **** **** **** 5 |||| **** **** ****
-# 6 |||| **** **** **** 6 |||| **** **** ****
-# 7 |||| **** **** **** -> 7 |||| **** **** ****
-# 8 |||| **** **** **** 8 |||| **** **** ****
-# 9 |||| **** **** **** 9 |||| **** **** ****
-# A |||| **** **** **** A |||| **** **** ****
-# B |||| **** **** **** B |||| **** **** ****
-# C |||| **** **** **** C |||| **** **** ****
-# D |||| **** **** **** D |||| **** **** ****
-# E |||| **** **** **** E |||| **** **** ****
-# F |||| **** **** **** F |||| **** **** ****
 ###################################################################################################
-# We reorder a 16x16 array to a [16/4][16][4] array so that the access pattern of B will be
-# sequential when grabing the corresponding value from the packed array.
+# Just as it is shown in the figure above, after blocking the computations, we can observe the array
+# access pattern of B (after flattening), which is regular but discontinuous. We expect that after
+# some transformation we can get continuous access pattern. We can reorder a [16][16] array to
+# a [16/4][16][4] array, so that the access pattern of B will be sequential when grabing
+# the corresponding value from the packed array.
 #
 # We have to re-write the algorithm slightly.
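
Reviewer's sketch, not part of the commit: the packing described in the comment above can be checked with plain NumPy instead of TVM. Only the [16][16] to [16/4][16][4] shapes come from the tutorial text; the names `N`, `BN`, `packed`, and the two offset helpers are illustrative assumptions, and float64 is used only to keep the comparison simple.

```python
import numpy as np

N = 16   # matrix side, as in the [16][16] example
BN = 4   # block width, as in [16/4][16][4]

rng = np.random.default_rng(0)
A = rng.random((N, N))
B = rng.random((N, N))

# Pack B so that packed[j_outer, k, j_inner] == B[k, j_outer * BN + j_inner].
packed = np.ascontiguousarray(B.reshape(N, N // BN, BN).transpose(1, 0, 2))

def flat_offsets_unpacked(j_outer):
    """Flat offsets read from row-major B when walking one column block."""
    return [k * N + j_outer * BN + j for k in range(N) for j in range(BN)]

def flat_offsets_packed(j_outer):
    """Flat offsets read from the packed array for the same column block."""
    return [(j_outer * N + k) * BN + j for k in range(N) for j in range(BN)]

# Unpacked access is regular but jumps by N after every BN elements;
# packed access is one contiguous run of N * BN elements.
unpacked = flat_offsets_unpacked(1)
contiguous = flat_offsets_packed(1)

# The matrix product computed from the packed layout matches A @ B.
C_packed = np.einsum("ik,okj->ioj", A, packed).reshape(N, N)
```

With these definitions, `contiguous` is a single unbroken range of offsets while `unpacked` has a stride-`N` jump after each group of `BN` values, which is the "regular but discontinuous" pattern the comment refers to.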
@@ -186,12 +152,14 @@ print('Opt3: %f' % evaluator(a, b, c).mean)
 ##################################################################################################
 # Summary
 # -------
-# After applying three main tricks, we can getnerly 90% performance of numpy. Further observation is
+# After applying three main tricks, we can almost 90% performance of numpy. Further observation is
 # required to catch up with the performance of numpy.
 #
 # TODO(Jian Weng): Catch up with the performance of numpy.
+_a = a.asnumpy()
+_b = b.asnumpy()
 now = time.clock()
-answer = numpy.dot(a.asnumpy(), b.asnumpy())
+answer = numpy.dot(_a, _b)
 print("Numpy: %f" % (time.clock() - now))