Unverified Commit 7f8123c4 by Sayak Kundu Committed by GitHub

Merge pull request #34 from TILOS-AI-Institute/flow_scripts

Flow scripts
parents 8cd04019 dcd039a9
......@@ -5,11 +5,11 @@ We thank Google engineers for Q&A in a shared document, as well as live discussi
that have explained aspects of several of the following code elements used in Circuit Training.
All errors of understanding and implementation are the authors'.
We will rectify such errors as soon as possible after being made aware of them.
- [Gridding](./CodeElements/Gridding/) determines a dissection of the layout canvas into some number of rows (*n_rows*) and some number of columns (*n_cols*) of _gridcells_. In Circuit Training, the purpose of gridding is to control the size of the macro placement solution space,
- [Gridding](../../CodeElements/Gridding/) determines a dissection of the layout canvas into some number of rows (*n_rows*) and some number of columns (*n_cols*) of _gridcells_. In Circuit Training, the purpose of gridding is to control the size of the macro placement solution space,
thus allowing RL to train within reasonable runtimes. Gridding enables hard macros to find locations consistent with high solution quality, while allowing soft macros (standard-cell clusters) to also find good locations.
- [Grouping](./CodeElements/Grouping/) is to ensure that closely-related standard-cell logic,
- [Grouping](../../CodeElements/Grouping/) is to ensure that closely-related standard-cell logic,
which connects to the same macro or the same clump of IOs (denoted as an IO cluster), belongs to the same standard-cell cluster.
- [Hypergraph clustering](./CodeElements/Clustering/) clusters millions of standard cells into a few thousand clusters. In Circuit Training, the purpose of clustering is to enable an approximate but fast standard cell placement that facilitates policy network optimization.
- [Hypergraph clustering](../../CodeElements/Clustering/) clusters millions of standard cells into a few thousand clusters. In Circuit Training, the purpose of clustering is to enable an approximate but fast standard cell placement that facilitates policy network optimization.
We are glad to see [grouping (clustering)](https://github.com/google-research/circuit_training/tree/main/circuit_training/grouping) added to the Circuit Training GitHub.
However, these [grouping (clustering)](https://github.com/google-research/circuit_training/tree/main/circuit_training/grouping) scripts still rely on the wrapper functions of the plc client, which is a black box for the community. In this doc, we document the implementation details of gridding, grouping and clustering. We implement all the code elements from scratch using Python scripts, and our results exactly match those of Circuit Training.
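To make the gridding description above concrete, here is a minimal sketch of how a point on the canvas could be mapped to a gridcell once *n_rows* and *n_cols* are chosen. It assumes a uniform grid with the origin at the lower-left corner; the function name `gridcell_index` and its parameters are hypothetical and are not taken from Circuit Training.

```python
def gridcell_index(x, y, canvas_width, canvas_height, n_rows, n_cols):
    """Return (row, col) of the gridcell that contains point (x, y).

    Assumes a uniform n_rows x n_cols dissection of the canvas with the
    origin at the lower-left corner (an assumption for illustration).
    """
    cell_width = canvas_width / n_cols
    cell_height = canvas_height / n_rows
    # Clamp so that points on the upper/right boundary fall in the last cell.
    col = min(int(x // cell_width), n_cols - 1)
    row = min(int(y // cell_height), n_rows - 1)
    return row, col


def num_gridcells(n_rows, n_cols):
    """Number of candidate gridcells, which bounds the per-macro choices."""
    return n_rows * n_cols
```

Under these assumptions, a coarser grid (smaller `n_rows * n_cols`) gives each macro fewer candidate locations, which is the sense in which gridding controls the size of the solution space.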
......
This source diff could not be displayed because it is too large. You can view the blob instead.
module fakeram_32x32_dp
(
QA,
CLKA,
CENA,
AA,
CLKB,
CENB,
AB,
DB,
STOV,
EMAA,
EMASA,
EMAB,
RET1N
);
input CLKA;
input CLKB;
input CENA;
input [4:0] AA;
output [31:0] QA;
input CENB;
input [4:0] AB;
input [31:0] DB;
input STOV;
input [2:0] EMAA;
input EMASA;
input [2:0] EMAB;
input RET1N;
assign STOV = 1'b0;
assign EMASA = 1'b0;
assign EMAA = 3'b010; // Extra margin adjustment A: Default for 0.8V
assign EMAB = 3'b010; // Extra margin adjustment B: Default for 0.8V
assign RET1N = 1'b1;
wire [31:0] QB;
wire [31:0] QA1;
fakeram45_32x32 rmod_a
(
.rd_out(QA1),
.addr_in(AA),
.we_in(~CENA),
.wd_in(DB), //dummy
.w_mask_in(DB), //dummy
.clk(CLKA),
.ce_in(CENA)
);
fakeram45_32x32 rmod_b
(
.rd_out(QB), //dummy
.addr_in(AB),
.we_in(CENB),
.wd_in(DB),
.w_mask_in(DB),
.clk(CLKB),
.ce_in(CENB)
);
genvar k;
generate
for (k = 0; k < 32; k=k+1) begin
assign QA[k] = (~CENB & QB[k]) | (CENB & QA1[k]);
end
endgenerate
endmodule
......@@ -43,8 +43,8 @@ proc extract_from_power_rpt {power_rpt} {
}
proc extract_cell_area {} {
set macro_area [expr [join [dbget [dbget top.insts.cell.name *ram* -p2 ].area ] +]]
set std_cell_area [expr [join [dbget [dbget top.insts.cell.name *ram* -v -p2 ].area ] +]]
set macro_area [expr [join [dbget [dbget top.insts.cell.subClass block -p2 ].area ] +]]
set std_cell_area [expr [join [dbget [dbget top.insts.cell.subClass block -v -p2 ].area ] +]]
return [list $macro_area $std_cell_area]
}
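The hunk above switches from matching macro instances by name (`*ram*`) to classifying them by cell subclass (`block`), with the `-v` variant apparently selecting everything else. A minimal Python sketch of the same partition follows; the `(sub_class, area)` tuples are a hypothetical stand-in for the database query and are not part of the original flow.

```python
def split_cell_area(instances):
    """Return (macro_area, std_cell_area) for an iterable of (sub_class, area).

    Instances whose sub_class is "block" are counted as macros; every other
    instance is counted as standard-cell area. The input representation is
    an assumption made for illustration.
    """
    macro_area = 0.0
    std_cell_area = 0.0
    for sub_class, area in instances:
        if sub_class == "block":
            macro_area += area
        else:
            std_cell_area += area
    return macro_area, std_cell_area
```

Classifying by subclass avoids depending on a naming convention, so a macro whose cell name does not contain "ram" would still be counted as macro area.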
......
......@@ -8,6 +8,8 @@ setenv SYN_HANDOFF $argv[1]
if ($#argv == 3) then
setenv pb_netlist $argv[2]
setenv plc_file $argv[3]
else if ($#argv == 2) then
setenv pl_file $argv[2]
else
echo "Required clustered netlist and plc file to generate macro placed defs"
endif
......
......@@ -56,9 +56,12 @@ if {[info exist ::env(pb_netlist)] &&
exec /home/sakundu/.conda/envs/py-tf/bin/python3.8 \
../../../../util/plc_pb_to_placement_tcl.py $::env(plc_file) $::env(pb_netlist) \
"macro_place.tcl" $origin_x $origin_y
source macro_place.tcl
} elseif { [info exist ::env(pl_file)] && [file exist $::env(pl_file)] } {
source ../../../../util/place_from_pl.tcl
place_macro_from_pl $::env(pl_file)
}
source macro_place.tcl
if { [info exist ::env(run_refine_macro_place)] && $::env(run_refine_macro_place) == 1 } {
dbset [dbget top.insts.cell.subClass block -p2 ].pStatus placed
......
......@@ -129,6 +129,8 @@ proc write_node_port { port_ptr fp {origin_x 0} {origin_y 0} } {
if {$side == "top" || $side == "bottom"} {
set X [expr $X - $origin_x]
} elseif { $side == "right" } {
set X [expr $X - 2*$origin_x]
}
print_float $fp "x" $X
......@@ -139,7 +141,10 @@ proc write_node_port { port_ptr fp {origin_x 0} {origin_y 0} } {
set Y [expr ($Y1 + $Y2)/2]
if {$side == "left" || $side == "right"} {
set Y [expr $Y - $origin_y]
} elseif { $side == "top" } {
set Y [expr $Y - 2*$origin_y]
}
print_float $fp "y" $Y
puts $fp "}"
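The side-dependent offsets in the two hunks above can be summarized in a small sketch. It mirrors only the branches visible in the diff; the function name `shift_port` is hypothetical, and the interpretation (ports being moved onto a canvas that is shrunk by the origin offset on each side) is an assumption.

```python
def shift_port(side, x, y, origin_x=0.0, origin_y=0.0):
    """Apply the side-dependent origin offsets shown in the diff to a port.

    side is one of "left", "right", "top", "bottom".
    """
    # X: top/bottom ports shift by one offset, right-side ports by two.
    if side in ("top", "bottom"):
        x -= origin_x
    elif side == "right":
        x -= 2 * origin_x
    # Y: left/right ports shift by one offset, top-side ports by two.
    if side in ("left", "right"):
        y -= origin_y
    elif side == "top":
        y -= 2 * origin_y
    return x, y
```

For example, with `origin_x=10` and `origin_y=5`, a port on the right edge at `(100, 50)` moves to `(80, 45)`, while a port on the left edge keeps its X coordinate.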
......
setPinAssignMode -pinEditInBatch true
set group_width [dbget top.fplan.box_sizex]
set group_height [dbget top.fplan.box_sizey]
set NUM_GROUPS 4
set NUM_TILES 16
proc place_tcdm_top_bus {bus direction index width start end layers} {
global NUM_TILES
global group_height
# Number of ports per tile
set ports_per_tile [expr $width / $NUM_TILES]
# Which ports are we placing?
set ports []
# Shuffle the tiles to have the ports organized from left to right
set tile_shuffle [list 0 1 8 9 2 3 10 11 4 5 12 13 6 7 14 15]
if {$direction == "in"} {
for {set tile 0} {$tile < $NUM_TILES} {incr tile} {
set tile_sh [lindex $tile_shuffle $tile]
for {set idx 0} {$idx < $ports_per_tile} {incr idx} {
set act_idx [expr $index*$width + $ports_per_tile*$tile_sh + $ports_per_tile - 1 - $idx]
set port_name [lindex [dbget top.terms.name ${bus}_i[$act_idx] -e] 0]
if { $port_name != "" } {
lappend ports $port_name
}
}
foreach_in_collection port [get_ports ${bus}_valid_i[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
foreach_in_collection port [get_ports ${bus}_ready_o[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
}
} else {
for {set tile 0} {$tile < $NUM_TILES} {incr tile} {
set tile_sh [lindex $tile_shuffle $tile]
for {set idx 0} {$idx < $ports_per_tile} {incr idx} {
set act_idx [expr $index*$width + $ports_per_tile*$tile_sh + $ports_per_tile - 1 - $idx]
set port_name [lindex [dbget top.terms.name ${bus}_o[$act_idx] -e] 0]
if { $port_name != "" } {
lappend ports $port_name
}
}
foreach_in_collection port [get_ports ${bus}_valid_o[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
foreach_in_collection port [get_ports ${bus}_ready_i[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
}
}
set num_ports [llength $ports]
set offset [expr ($end - $start)/$num_ports]
set y [dbget top.fplan.box_ury]
set pt1 [list $start $y]
set pt2 [list $end $y]
editPin -pin $ports -edge 1 -start $pt1 -end $pt2 -fixedPin -layer $layers -spreadDirection clockwise -pattern fill_optimised
}
proc place_tcdm_left_bus {bus direction index width start end layers} {
global NUM_TILES
global group_height
# Number of ports per tile
set ports_per_tile [expr $width / $NUM_TILES]
# Shuffle the tiles to have the ports organized from top to bottom
set tile_shuffle [list 8 9 12 13 10 11 14 15 2 3 6 7 0 1 4 5]
# Which ports are we placing?
set ports []
if {$direction == "in"} {
for {set tile 0} {$tile < $NUM_TILES} {incr tile} {
set tile_sh [lindex $tile_shuffle $tile]
for {set idx 0} {$idx < $ports_per_tile} {incr idx} {
set act_idx [expr $index*$width + $ports_per_tile*$tile_sh + $ports_per_tile - 1 - $idx]
set port_name [lindex [dbget top.terms.name ${bus}_i[$act_idx] -e] 0]
if { $port_name != "" } {
lappend ports $port_name
}
}
foreach_in_collection port [get_ports ${bus}_valid_i[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
foreach_in_collection port [get_ports ${bus}_ready_o[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
}
} else {
for {set tile 0} {$tile < $NUM_TILES} {incr tile} {
set tile_sh [lindex $tile_shuffle $tile]
for {set idx 0} {$idx < $ports_per_tile} {incr idx} {
set act_idx [expr $index*$width + $ports_per_tile*$tile_sh + $ports_per_tile - 1 - $idx]
set port_name [lindex [dbget top.terms.name ${bus}_o[$act_idx] -e] 0]
if { $port_name != "" } {
lappend ports $port_name
}
}
foreach_in_collection port [get_ports ${bus}_valid_o[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
foreach_in_collection port [get_ports ${bus}_ready_i[[expr $tile_sh + $index*$NUM_TILES]]] {
lappend ports [get_object_name $port]
}
}
}
set num_ports [llength $ports]
set offset [expr ($end - $start)/$num_ports]
set x [dbget top.fplan.box_urx]
set pt1 [list $x $start]
set pt2 [list $x $end]
editPin -pin $ports -edge 2 -start $pt1 -end $pt2 -fixedPin -layer $layers -spreadDirection counterclockwise -pattern fill_optimised
}
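The index arithmetic shared by `place_tcdm_top_bus` and `place_tcdm_left_bus` can be checked outside the tool with a short sketch. It reproduces only the data-bit ordering (`act_idx`); the valid/ready ports that the Tcl appends after each tile are omitted, and the function name `bus_bit_order` is hypothetical.

```python
# Tile order used by place_tcdm_top_bus in the script above.
TOP_TILE_SHUFFLE = [0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6, 7, 14, 15]


def bus_bit_order(index, width, tile_shuffle, num_tiles=16):
    """Return the bus bit indices in the order the data ports are collected.

    Mirrors: index*width + ports_per_tile*tile_sh + ports_per_tile - 1 - idx
    """
    ports_per_tile = width // num_tiles  # integer division, as in Tcl expr
    order = []
    for tile in range(num_tiles):
        tile_sh = tile_shuffle[tile]
        for idx in range(ports_per_tile):
            order.append(
                index * width + ports_per_tile * tile_sh + ports_per_tile - 1 - idx
            )
    return order
```

Within each tile the bits are visited from most significant to least significant, and the shuffle list decides which tile's bits come next along the edge.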
set master_req_bus_length [expr [llength [get_object_name [get_ports tcdm_master_req_o*]]] / ($NUM_GROUPS - 1)]
set master_resp_bus_length [expr [llength [get_object_name [get_ports tcdm_master_resp_i*]]] / ($NUM_GROUPS - 1)]
set slave_req_bus_length [expr [llength [get_object_name [get_ports tcdm_slave_req_i*]]] / ($NUM_GROUPS - 1)]
set slave_resp_bus_length [expr [llength [get_object_name [get_ports tcdm_slave_resp_o*]]] / ($NUM_GROUPS - 1)]
# North
place_tcdm_top_bus tcdm_master_req out 1 $master_req_bus_length [expr 0.35*$group_width] [expr 0.51*$group_width] [list metal4 metal6]
place_tcdm_top_bus tcdm_slave_req in 1 $slave_req_bus_length [expr 0.35*$group_width] [expr 0.51*$group_width] [list metal6 metal8]
place_tcdm_top_bus tcdm_slave_resp out 1 $slave_resp_bus_length [expr 0.51*$group_width] [expr 0.67*$group_width] [list metal6 metal8]
place_tcdm_top_bus tcdm_master_resp in 1 $master_resp_bus_length [expr 0.51*$group_width] [expr 0.67*$group_width] [list metal4 metal6]
# Northeast
place_tcdm_top_bus tcdm_slave_req in 2 $slave_req_bus_length [expr 0.67*$group_width] [expr 0.83*$group_width] [list metal4 metal6]
place_tcdm_top_bus tcdm_master_resp in 2 $master_resp_bus_length [expr 0.67*$group_width] [expr 0.83*$group_width] [list metal6 metal8]
# set ports []
# lappend ports [get_object_name [get_ports clk_i* ]]
# lappend ports [get_object_name [get_ports rst_ni*]]
# set ports [list $ports [get_object_name [get_ports scan*]]]
# lappend ports [get_object_name [get_ports testmode*]]
# puts "Ports are $ports"
set ports {clk_i rst_ni scan_enable_i scan_data_i scan_data_o testmode_i dma_meta_o[0] dma_meta_o[1] group_id_i[0] group_id_i[1]}
set y [dbget top.fplan.box_ury]
set x1 [expr 0.45*$group_width]
set x2 [expr 0.55*$group_width]
set pt1 [list $x1 $y]
set pt2 [list $x2 $y]
editPin -pin $ports -edge 1 -start $pt1 -end $pt2 -fixedPin -layer [list metal4 metal6] -spreadDirection clockwise -pattern fill_optimised
# Northeast
place_tcdm_left_bus tcdm_master_req out 2 $master_req_bus_length [expr 0.17*$group_height] [expr 0.33*$group_height] [list metal3 metal5]
place_tcdm_left_bus tcdm_slave_resp out 2 $slave_resp_bus_length [expr 0.17*$group_height] [expr 0.33*$group_height] [list metal5 metal7]
# East
place_tcdm_left_bus tcdm_master_req out 0 $master_req_bus_length [expr 0.33*$group_height] [expr 0.49*$group_height] [list metal3 metal5]
place_tcdm_left_bus tcdm_slave_req in 0 $slave_req_bus_length [expr 0.33*$group_height] [expr 0.49*$group_height] [list metal5 metal7]
place_tcdm_left_bus tcdm_master_resp in 0 $master_resp_bus_length [expr 0.49*$group_height] [expr 0.65*$group_height] [list metal3 metal5]
place_tcdm_left_bus tcdm_slave_resp out 0 $slave_resp_bus_length [expr 0.49*$group_height] [expr 0.65*$group_height] [list metal5 metal7]
set ports [dbget top.terms.name *dma_req*]
set x [dbget top.fplan.box_urx]
set y1 [expr 0.66*$group_height]
set y2 [expr 0.80*$group_height]
set pt1 [list $x $y1]
set pt2 [list $x $y2]
editPin -pin $ports -edge 2 -start $pt1 -end $pt2 -fixedPin -layer [list metal3 metal5] -spreadDirection counterclockwise -pattern fill_optimised
set ports [dbget top.terms.name *wake_up*]
set x [dbget top.fplan.box_llx]
set y1 [expr 0.17*$group_height]
set y2 [expr 0.32*$group_height]
set pt1 [list $x $y1]
set pt2 [list $x $y2]
editPin -pin $ports -edge 0 -start $pt1 -end $pt2 -fixedPin -layer [list metal3 metal5] -spreadDirection clockwise -pattern fill_optimised
set ports [dbget top.terms.name -regexp ".*axi.*|.*ro_cache.*"]
set x [dbget top.fplan.box_llx]
set y1 [expr 0.33*$group_height]
set y2 [expr 0.67*$group_height]
set pt1 [list $x $y1]
set pt2 [list $x $y2]
editPin -pin $ports -edge 0 -start $pt1 -end $pt2 -fixedPin -layer [list metal3 metal5] -spreadDirection clockwise -pattern fill_optimised
setPinAssignMode -pinEditInBatch false
......@@ -21,22 +21,28 @@ proc get_orient { tmp_orient } {
return $orient
}
proc place_macro_from_pl {file_path} {
proc place_macro_from_pl {file_path {place_std 0}} {
set dbu 100
set fp [open $file_path r]
while { [gets $fp line] >= 0} {
if {[llength $line] == 7} {
if {[llength $line] >= 6 && [llength $line] <= 7} {
set inst_name [lindex $line 0]
puts "$inst_name"
set pt_x [expr [lindex $line 1]/$dbu]
set pt_y [expr [lindex $line 2]/$dbu]
set tmp_orient [expr [lindex $line 5]]
set orient [get_orient $tmp_orient]
#puts "$inst_name"
if {[dbget top.insts.name $inst_name -e] == ""} {
puts "\[ERROR\] $inst_name does not exist."
#puts "\[ERROR\] $inst_name does not exist."
} else {
placeInstance $inst_name $pt_x $pt_y $orient -fixed
set pt_x [expr [lindex $line 1]/$dbu]
set pt_y [expr [lindex $line 2]/$dbu]
set tmp_orient [expr [lindex $line 5]]
set orient [get_orient $tmp_orient]
if {[dbget [dbget top.insts.name $inst_name -p ].cell.subClass block -e] != "" } {
puts "Placing $inst_name"
placeInstance $inst_name $pt_x $pt_y $orient -placed
}
if { $place_std == 1 && [dbget [dbget top.insts.name $inst_name -p ].cell.subClass core -e] != "" } {
placeInstance $inst_name $pt_x $pt_y
}
}
}
}
}
\ No newline at end of file
}
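The line filtering and unit scaling in `place_macro_from_pl` can be sketched in Python as follows. This covers only the parsing step: the orientation mapping done by `get_orient` is not shown in the diff, so the raw token is returned unchanged. The function name `parse_pl_line` and the sample line used below are hypothetical. Note also that Tcl's `expr` performs integer division when both operands are integers, whereas this sketch uses floating-point division.

```python
def parse_pl_line(line, dbu=100):
    """Parse one placement line into (name, x, y, orient_token).

    Returns None for lines that do not have 6 or 7 whitespace-separated
    fields, matching the length check in the updated Tcl procedure.
    """
    fields = line.split()
    if not 6 <= len(fields) <= 7:
        return None
    name = fields[0]
    x = float(fields[1]) / dbu  # database units to microns (assumed meaning of dbu)
    y = float(fields[2]) / dbu
    orient_token = fields[5]  # left uninterpreted; get_orient is not shown
    return name, x, y, orient_token
```

A caller would then decide, as the Tcl does, whether the named instance is a macro (subclass `block`) or a standard cell (subclass `core`) before placing it.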
......@@ -11,7 +11,7 @@ if [ $PHY_SYNTH -eq 1 ]; then
export HMETIS_DIR="/home/zf4_projects/DREAMPlace/sakundu/GB/CT/hmetis-1.5-linux"
export PLC_WRAPPER_MAIN="/home/zf4_projects/DREAMPlace/sakundu/GB/CT/plc_wrapper_main"
#export CT_PATH="${PROJ_DIR}/../../../GB/CT/circuit_training"
#export CT_PATH="/home/zf4_projects/DREAMPlace/sakundu/ABK_MP/CT/09092022/circuit_training"
export CT_PATH="/home/zf4_projects/macro_placer/google_brain/TILOS_repo/grouping/circuit_training"
export CT_PATH="/home/zf4_projects/DREAMPlace/sakundu/ABK_MP/CT/09092022/circuit_training"
#export CT_PATH="/home/zf4_projects/macro_placer/google_brain/TILOS_repo/grouping/circuit_training"
bash -i ../../../../util/run_grp.sh 2>&1 | tee log/grouping.log
fi
......@@ -26,6 +26,8 @@ The list of available [testcases](./Testcases) is as follows.
- [RTL files for Mempool group design](./Testcases/mempool/)
- NVDLA (RTL)
- [RTL files for NVDLA Partition *c*](./Testcases/nvdla/)
- BlackParrot (RTL)
- [RTL files for BlackParrot](./Testcases/bp_quad)
In the [Nature Paper](https://www.nature.com/articles/s41586-021-03544-w), the authors report results for an Ariane design with 133 memory (256x16, single-ported SRAM) macros. We observe that synthesizing from the available Ariane RTL in the [lowRISC](https://github.com/lowRISC/ariane) GitHub repository using 256x16 memories results in an Ariane design that has 136 memory macros. We outline the steps to instantiate the memories for Ariane 136 [here](./Testcases/ariane136/), and we show how we convert the Ariane 136 design to an Ariane 133 design that matches Google's memory macro count [here](./Testcases/ariane133/).
......@@ -55,10 +57,20 @@ We provide flop count, macro type and macro count for all the testcases in the t
<td class="tg-0lax">(256x32-bit SRAM) x 16 + (64x64-bit SRAM) x 4</td>
</tr>
<tr>
<td class="tg-0lax"><a href="./Testcases/mempool">MemPool Group</a></td>
<td class="tg-0lax">360724</td>
<td class="tg-0lax">(256x32-bit SRAM) x 256 + (64x64-bit SRAM) x 64 + (128x256-bit SRAM) x 2 + (128x32-bit SRAM) x 2</td>
</tr>
<tr>
<td class="tg-0lax"><a href="./Testcases/nvdla">NVDLA</a></td>
<td class="tg-0lax">45295</td>
<td class="tg-0lax">(256x64-bit SRAM) x 128</td>
</tr>
<tr>
<td class="tg-0lax"><a href="./Testcases/bp_quad">BlackParrot</a></td>
<td class="tg-0lax">214441</td>
<td class="tg-0lax">(512x64-bit SRAM) x 128 + (64x62-bit SRAM) x 32 + (32x32-bit SRAM) x 32 + (64x124-bit SRAM) x 16 + (128x16-bit SRAM) x 8 + (256x48-bit SRAM) x 4</td>
</tr>
</tbody>
</table>
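As a quick sanity check on the macro column, the per-type counts can be summed directly from the description strings. The sketch below is a hypothetical helper, not part of the repository; it relies only on the `(<size> SRAM) x <count>` pattern used in the table.

```python
import re


def total_macros(description):
    """Sum the instance counts in a string like '(256x64-bit SRAM) x 128 + ...'."""
    # Each count follows a closing parenthesis and an 'x' separator.
    return sum(int(n) for n in re.findall(r"\)\s*x\s*(\d+)", description))


blackparrot = (
    "(512x64-bit SRAM) x 128 + (64x62-bit SRAM) x 32 + (32x32-bit SRAM) x 32 + "
    "(64x124-bit SRAM) x 16 + (128x16-bit SRAM) x 8 + (256x48-bit SRAM) x 4"
)
mempool_group = (
    "(256x32-bit SRAM) x 256 + (64x64-bit SRAM) x 64 + "
    "(128x256-bit SRAM) x 2 + (128x32-bit SRAM) x 2"
)
```

With the strings from the table, `total_macros(blackparrot)` gives 220 and `total_macros(mempool_group)` gives 324.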
......@@ -165,6 +177,21 @@ In the following table, we provide the status details of each testcase on each o
<td class="tg-0lax">N/A</td>
</tr>
<tr>
<td class="tg-0lax">MemPool Group</td>
<td class="tg-0lax"><a href="./Flows/NanGate45/mempool_group">Link</a></td>
<td class="tg-0lax"><a href="./Flows/NanGate45/mempool_group">Link</a></td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
</tr>
<tr>
<td class="tg-0lax">NVDLA</td>
<td class="tg-0lax"><a href="./Flows/NanGate45/nvdla">Link</a></td>
<td class="tg-0lax"><a href="./Flows/NanGate45/nvdla">Link</a></td>
......@@ -179,6 +206,21 @@ In the following table, we provide the status details of each testcase on each o
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
</tr>
<tr>
<td class="tg-0lax">BlackParrot</td>
<td class="tg-0lax"><a href="./Flows/NanGate45/bp_quad">Link</a></td>
<td class="tg-0lax"><a href="./Flows/NanGate45/bp_quad">Link</a></td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
<td class="tg-0lax">N/A</td>
</tr>
</tbody>
</table>
......
# Netlist preparation of BlackParrot quad core
BlackParrot is a RISC-V multicore design. We use the Verilog netlist of the BlackParrot quad-core design from the [OpenROAD GitHub](https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/tree/master/flow/designs/src/black_parrot) repo. We will provide a detailed setup for preparing the netlist from the [BlackParrot GitHub](https://github.com/black-parrot/black-parrot) repo.
\ No newline at end of file
# Netlist preparation of mempool tile
# Netlist preparation of MemPool tile and MemPool group
MemPool tile is part of MemPool, an open-source many-core system targeting image processing applications. We downloaded the netlist from the [mempool](https://github.com/pulp-platform/mempool) GitHub repository. All the required SystemVerilog files are copied into the *./rtl* directory.
\ No newline at end of file