
# **MacroPlacement**
**MacroPlacement** is an open, transparent effort to provide a public, baseline implementation of [Google Brain's Circuit Training](https://github.com/google-research/circuit_training) (Morpheus) deep RL-based placement method. We will provide (1) testcases in open enablements, along with multiple EDA tool flows; (2) implementations of missing or binarized elements of Circuit Training; (3) reproducible example macro placement solutions produced by our implementation; and (4) post-routing results obtained by full completion of the synthesis-place-and-route flow using both proprietary and open-source tools.

## **Our Latest Progress**
 - [Our Progress: A Chronology](https://tilos-ai-institute.github.io/MacroPlacement/Docs/OurProgress/) provides the latest updates and is periodically synced with [this Google Doc](https://docs.google.com/document/d/1HHZNcid5CZvvRqj_njzF7hBhtNSpmRn3fCYniWNYBiY/edit).
 - Our [Proxy Cost](https://tilos-ai-institute.github.io/MacroPlacement/Docs/ProxyCost/) documentation gives implementation details to enable reproduction of the wirelength, density and congestion costs used by [Circuit Training](https://github.com/google-research/circuit_training).

## **Table of Contents**
  <!-- - [Reproducible Example Solutions](#reproducible-example-solutions) -->
  - [Testcases](#testcases) contains open-source designs such as Ariane, MemPool and NVDLA.
  - [Enablements](#enablements) contains PDKs for open-source enablements such as NanGate45, ASAP7 and SKY130HD with FakeStack. Memories required by the designs are also included.
  - [Flows](#flows) contains tool setups and runscripts for both proprietary and open-source SP&R tools such as Cadence Genus/Innovus and OpenROAD.
  - [Code Elements](#code-elements) contains implementations of engines required by the Circuit Training flow, such as Clustering, Grouping, Gridding and format translators.
  - [Baseline for Circuit Training](#baseline-for-circuit-training) provides a baseline for [Google Brain's Circuit Training](https://github.com/google-research/circuit_training).
  - [FAQ](#faq)
  - [Related Links](#related-links)

## **Testcases**  
The list of available [testcases](./Testcases) is as follows.
- Ariane (RTL)
  - [RTL files for Ariane design with 136 macros](./Testcases/ariane136/), which are generated by instantiating 16-bit memories in the Ariane netlist available in the [lowRISC](https://github.com/lowRISC/ariane) GitHub repository.
  - [RTL files for Ariane design with 133 macros](./Testcases/ariane133/), which are generated by updating the memory connections of the 136 macro version.
- MemPool (RTL)
  - [RTL files for Mempool tile design](./Testcases/mempool/)
  - [RTL files for Mempool group design](./Testcases/mempool/)
- NVDLA (RTL)
  - [RTL files for NVDLA Partition *c*](./Testcases/nvdla/)
  
In the [Nature paper](https://www.nature.com/articles/s41586-021-03544-w), the authors report results for an Ariane design with 133 memory (256x16, single-ported SRAM) macros. We observe that synthesizing from the available Ariane RTL in the [lowRISC](https://github.com/lowRISC/ariane) GitHub repository using 256x16 memories results in an Ariane design that has 136 memory macros. We outline the steps to instantiate the memories for Ariane 136 [here](./Testcases/ariane136/), and we show how we convert the Ariane 136 design to an Ariane 133 design that matches Google's memory macro count [here](./Testcases/ariane133/). 
  
We provide the flop count, macro type and macro count for all the testcases in the following table. 
<table class="tg">
<thead>
  <tr>
    <th class="tg-0lax">Testcase</th>
    <th class="tg-0lax">Flop Count</th>
    <th class="tg-0lax">Macro Details (macro type x macro count)</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0lax"><a href="./Testcases/ariane136">Ariane136</a></td>
    <td class="tg-0lax">19839</td>
    <td class="tg-0lax">(256x16-bit SRAM) x 136</td>
  </tr>
  <tr>
    <td class="tg-0lax"><a href="./Testcases/ariane133">Ariane133</a></td>
    <td class="tg-0lax">19807</td>
    <td class="tg-0lax">(256x16-bit SRAM) x 133</td>
  </tr>
  <tr>
    <td class="tg-0lax"><a href="./Testcases/mempool">MemPool tile</a></td>
    <td class="tg-0lax">18278</td>
    <td class="tg-0lax">(256x32-bit SRAM) x 16 + (64x64-bit SRAM) x 4</td>
  </tr>
  <tr>
    <td class="tg-0lax"><a href="./Testcases/nvdla">NVDLA</a></td>
    <td class="tg-0lax">45295</td>
    <td class="tg-0lax">(256x64-bit SRAM) x 128</td>
  </tr>
</tbody>
</table>

All the testcases are available in the [Testcases](./Testcases/) directory. Details of the sub-directories are:  
  - *rtl*: directory contains all the RTL files required to synthesize the design.
  - *sv2v*: If the main repository contains multiple Verilog or SystemVerilog files, then we convert them to a single synthesizable Verilog RTL file. This is available in the *sv2v* sub-directory.

## **Enablements**
The list of available enablements is as follows.
- [NanGate45](./Enablements/NanGate45/)
- [ASAP7](./Enablements/ASAP7/)
- [SKY130HD FakeStack](./Enablements/SKY130HD/)
  
Open-source enablements NanGate45, ASAP7 and SKY130HD are utilized in our SP&R flow. All the enablements are available under the [Enablements](./Enablements) directory. Details of the sub-directories are:
 - *lib* directory contains all the required liberty files for standard cells and hard macros.
 - *lef* directory contains all the required lef files.
 - *qrc* directory contains all the required qrc tech files.
  
We also provide the steps to generate the fakeram models for each of the enablements based on the required memory configurations.


## **Flows**
We provide multiple flows for each of the testcases and enablements. They are: (1) a logical synthesis-based SP&R flow using Cadence Genus and Innovus ([Flow-1](./Flows/figures/flow-1.PNG)), (2) a physical synthesis-based SP&R flow using Cadence Genus iSpatial and Innovus ([Flow-2](./Flows/figures/flow-2.PNG)), (3) a logical synthesis-based SP&R flow using Yosys and OpenROAD ([Flow-3](./Flows/figures/flow-3.PNG)), and (4) creation of input data for physical synthesis-based Circuit Training using Genus iSpatial ([Flow-4](./Flows/figures/flow-4.PNG)).

The details of each flow are given below.
- **Flow-1:**  
  <img src="./Flows/figures/flow-1.PNG" alt="Flow-1" width="800"/>
- **Flow-2:**  
  <img src="./Flows/figures/flow-2.PNG" alt="Flow-2" width="800"/>    
- **Flow-3:**  
  <img src="./Flows/figures/flow-3.PNG" alt="Flow-3" width="800"/>  
- **Flow-4:**  
  <img src="./Flows/figures/flow-4.PNG" alt="Flow-4" width="800"/>  


In the following table, we provide the status details of each testcase on each of the enablements for the different flows.
<table class="tg">
<thead>
  <tr>
    <th class="tg-0lax" rowspan="2">Test Cases</th>
    <th class="tg-0lax" colspan="4">NanGate45</th>
    <th class="tg-0lax" colspan="4">ASAP7</th>
    <th class="tg-0lax" colspan="4">SKY130HD FakeStack</th>
  </tr>
  <tr>
    <th class="tg-0lax">Flow-1</th>
    <th class="tg-0lax">Flow-2</th>
    <th class="tg-0lax">Flow-3</th>
    <th class="tg-0lax">Flow-4</th>
    <th class="tg-0lax">Flow-1</th>
    <th class="tg-0lax">Flow-2</th>
    <th class="tg-0lax">Flow-3</th>
    <th class="tg-0lax">Flow-4</th>
    <th class="tg-0lax">Flow-1</th>
    <th class="tg-0lax">Flow-2</th>
    <th class="tg-0lax">Flow-3</th>
    <th class="tg-0lax">Flow-4</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0lax">Ariane 136</td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane136">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane136">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane136">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/ariane136">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/ariane136">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane136">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane136">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane136">Link</a></td>
    <td class="tg-0lax">N/A</td>
  </tr>
  <tr>
    <td class="tg-0lax">Ariane 133</td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane133">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane133">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/ariane133">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/ariane133">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/ariane133">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane133">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane133">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/ariane133">Link</a></td>
    <td class="tg-0lax">N/A</td>
  </tr>
  <tr>
    <td class="tg-0lax">MemPool tile</td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/mempool_tile">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/mempool_tile">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/mempool_tile">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/mempool_tile">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/mempool_tile">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/mempool_tile">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/mempool_tile">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/mempool_tile">Link</a></td>
    <td class="tg-0lax">N/A</td>
  </tr>
  <tr>
    <td class="tg-0lax">NVDLA</td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/nvdla">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/NanGate45/nvdla">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/nvdla">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/ASAP7/nvdla">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/nvdla">Link</a></td>
    <td class="tg-0lax"><a href="./Flows/SKY130HD/nvdla">Link</a></td>
    <td class="tg-0lax">N/A</td>
    <td class="tg-0lax">N/A</td>
  </tr>
</tbody>
</table>


The directory structure is: *./Flows/\<enablement\>/\<testcase\>/\<constraint\|def\|netlist\|scripts\|run\>/*. Details of the sub-directories for each testcase on each enablement are as follows.
- *constraint* directory contains the *.sdc* file.
- *def* directory contains the def file with pin placement and die area information.
- *scripts* directory contains required scripts to run SP&R using the Cadence and OpenROAD tools.
- *netlist* directory contains the synthesized netlist. We provide a synthesized netlist that can be used to run P&R.
- *run* directory is where the scripts provided in the *scripts* directory are run.
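
As a quick illustration of the layout described above, the following minimal sketch builds the sub-directory paths for one enablement and testcase. The helper name `flow_dirs` and its `root` parameter are hypothetical and are not part of this repository; only the directory names come from the structure listed above.

```python
from pathlib import Path

# Sub-directory names taken from the directory structure described above.
SUBDIRS = ("constraint", "def", "netlist", "scripts", "run")


def flow_dirs(enablement, testcase, root="."):
    """Return a dict mapping each sub-directory name to its path.

    Hypothetical helper for illustration only; it does not check that
    the directories actually exist on disk.
    """
    base = Path(root) / "Flows" / enablement / testcase
    return {name: base / name for name in SUBDIRS}


dirs = flow_dirs("NanGate45", "ariane133")
print(dirs["scripts"].as_posix())
```

Not every enablement and testcase combination supports every flow, so the status table above remains the reference for which directories are populated.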


## **Code Elements**
The code elements below are the most crucial undocumented portions of Circuit Training. We thank Google 
engineers for Q&A in a shared document, as well as live discussions on May 19, 2022, 
that have explained aspects of several of the following code elements used in Circuit Training. 
All errors of understanding and implementation are the authors'. 
We will rectify such errors as soon as possible after being made aware of them.


- [Gridding](./CodeElements/Gridding/) determines a dissection of the layout canvas into some number of rows (n_rows) and some number of columns (n_cols) of gridcells. In Circuit Training, the purpose of gridding is to control the size of the macro placement solution space, 
thus allowing RL to train within reasonable runtimes. Gridding enables hard macros to find locations consistent with high solution quality, 
while allowing soft macros (standard-cell clusters) to also find good locations. 
- [Grouping](./CodeElements/Grouping/) ensures that closely-related logic is kept close to hard macros and to clumps of IOs. The clumps of IOs are induced by IO locations with respect to the row and column coordinates in the gridded layout canvas.
- [Hypergraph clustering](./CodeElements/Clustering/) clusters millions of standard cells into a few thousand clusters.  In Circuit Training, the purpose of clustering is to enable an approximate but fast standard cell placement that facilitates policy network optimization.
- [Force-directed placement](./CodeElements/FDPlacement/) places the center of each standard cell cluster onto  centers of gridcells generated by [Gridding](./CodeElements/Gridding/).
- [Simulated annealing](./CodeElements/SimulatedAnnealing/) places the center of each macro onto centers of gridcells generated by [Gridding](./CodeElements/Gridding/).  In Circuit Training,  simulated annealing is used as a baseline to show the relative sample efficiency of RL.
- [LEF/DEF and Bookshelf (OpenDB, RosettaStone) translators](./CodeElements/FormatTranslators/) ease the translation between different representations of the same netlist.
- [Plc client](./CodeElements/Plc_client/) implements all three components of the proxy cost function: wirelength cost, density cost and congestion cost.
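
To make the role of the three proxy cost components concrete, here is a minimal sketch that combines them into a single scalar. It assumes the components are combined as a weighted sum; the function name `proxy_cost`, the weight parameters and the numeric values are all illustrative assumptions, not values or interfaces taken from this repository or from Circuit Training. See the [Proxy Cost](https://tilos-ai-institute.github.io/MacroPlacement/Docs/ProxyCost/) documentation for how each component is actually computed.

```python
def proxy_cost(wirelength_cost, density_cost, congestion_cost,
               *, density_weight, congestion_weight):
    """Combine the three cost components into one scalar.

    Assumption: a simple weighted sum. The weights are required keyword
    arguments so that no particular default values are implied.
    """
    return (wirelength_cost
            + density_weight * density_cost
            + congestion_weight * congestion_cost)


# Made-up component values and placeholder weights, for illustration only.
example = proxy_cost(0.10, 0.40, 0.60,
                     density_weight=0.5, congestion_weight=0.5)
print(f"{example:.2f}")
```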


<!--## **Reproducible Example Solutions** -->

## **Baseline for Circuit Training**
We provide a competitive baseline for [Google Brain's Circuit Training](https://github.com/google-research/circuit_training) by placing macros manually, following rules similar to those of the RL agent. The example for Ariane133 implemented on NanGate45 is shown [here](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/Flows/NanGate45/ariane133). We generate the manual macro placement in two steps:  
(1) we call the [gridding](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/CodeElements/Gridding) scripts to generate grid cells (27 x 27 in our case); (2) we manually place macros at the centers of grid cells.
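
The second step can be sketched as snapping a chosen location to the center of the grid cell that contains it. This is a minimal sketch assuming a uniform grid with the origin at the lower-left corner of the canvas; the function names and the canvas dimensions are hypothetical, and only the 27 x 27 grid size comes from the text above.

```python
def gridcell_center(row, col, canvas_width, canvas_height, n_rows=27, n_cols=27):
    """Return the (x, y) center of the grid cell at (row, col).

    Assumes uniform grid cells and an origin at the lower-left corner.
    """
    cell_w = canvas_width / n_cols
    cell_h = canvas_height / n_rows
    return ((col + 0.5) * cell_w, (row + 0.5) * cell_h)


def snap_to_gridcell(x, y, canvas_width, canvas_height, n_rows=27, n_cols=27):
    """Snap a point to the center of the grid cell that contains it."""
    cell_w = canvas_width / n_cols
    cell_h = canvas_height / n_rows
    # Clamp so that points on or outside the canvas boundary stay in range.
    col = max(0, min(int(x // cell_w), n_cols - 1))
    row = max(0, min(int(y // cell_h), n_rows - 1))
    return gridcell_center(row, col, canvas_width, canvas_height, n_rows, n_cols)


# Hypothetical 1350 x 1350 canvas, which gives 50 x 50 grid cells.
print(snap_to_gridcell(120.0, 40.0, 1350.0, 1350.0))  # → (125.0, 25.0)
```

The sketch places a point, not a macro outline; a real placement would also need to check that the macro fits within the canvas and does not overlap other macros.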


## **FAQ**
**Why are you doing this?**
- The challenges of data and benchmarking in EDA research have, in our view, been contributing factors in the controversy regarding the Nature work. The mission of the [TILOS AI Institute](https://tilos.ai/) includes finding solutions to these challenges -- in high-stakes applied optimization domains (such as IC EDA), and at community-scale. We hope that our effort will become an existence proof for transparency, reproducibility, and democratization of research in EDA. [We applaud and thank Cadence Design Systems for allowing their tool runscripts to be shared openly by researchers, enabling reproducibility of results obtained via use of Cadence tools.]
- We do understand that Google has been working hard to complete the open-sourcing of Morpheus, and that this effort continues today. However, as pointed out in [this Doc](https://docs.google.com/document/d/1vkPRgJEiLIyT22AkQNAxO8JtIKiL95diVdJ_O4AFtJ8/edit?usp=sharing), it has been more than a year since "Data and Code Availability" was committed with publication of the [Nature paper](https://www.nature.com/articles/s41586-021-03544-w). We consider our work a "backstop" or "safety net" for Google's internal efforts, and a platform for researchers to build on. 

**What can others contribute?**
- Our shopping list (updated August 2022) includes the following. Please join in!  
  - simulated annealing on the gridded canvas: documentation and implementation
  - force-directed placement: documentation and implementation
  - donated cloud resources (credits) for experimental studies
  - relevant testcases with reference implementations and implementation flows (Cadence, OpenROAD preferred since scripts can be shared)
  - improved "fakeram" generator for the ASAP7 research PDK

**What is your timeline?**
- We showed our [progress](https://open-source-eda-birds-of-a-feather.github.io/doc/slides/MacroPlacement-SpecPart-DAC-BOF-v5.pdf) at the Open-Source EDA and Benchmarking Summit birds-of-a-feather [meeting](https://open-source-eda-birds-of-a-feather.github.io/) on July 12 at DAC-2022.
- We are now (late August 2022) studying benefits and limitations of the CT methodology itself, as noted in [this Doc](https://docs.google.com/document/d/1c-uweo3DHiCWZyBzAdNCqqcOrAbKq1sVIfY0_4bFCYE/edit).


## **Related Links**
- F.-C. Chang, Y.-W. Tseng, Y.-W. Yu, S.-R. Lee, A. Cioba, et al., 
"Flexible multiple-objective reinforcement learning for chip placement",
*arXiv:2204.06407*, 2022. \[[paper](https://arxiv.org/pdf/2204.06407.pdf)\]
- S. Yue, E. M. Songhori, J. W. Jiang, T. Boyd, A. Goldie, A. Mirhoseini and S. Guadarrama, "Scalability and Generalization of Circuit Training for Chip Floorplanning", *ISPD*, 2022. \[[paper](https://dl.acm.org/doi/abs/10.1145/3505170.3511478)\]\[[ppt](http://www.ispd.cc/slides/2021/protected/2_2_Goldie_Mirhoseini.pdf)\]
- R. Cheng and J. Yan, "On joint learning for solving placement and routing in chip design",
*Proc. NeurIPS*, 2021. \[[paper](https://arxiv.org/pdf/2111.00234v1.pdf)\] \[[code](https://github.com/Thinklab-SJTU/EDA-AI)\]
- S. Guadarrama, S. Yue, T. Boyd, J. Jiang,  E. Songhori, et al.,
"Circuit training: an open-source framework for generating chip floor plans with distributed deep reinforcement learning", 2021. \[[code](https://github.com/google-research/circuit_training)\]
- A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, et al.,
"A graph placement methodology for fast chip design", *Nature*, 594(7862) (2021), pp. 207-212.
\[[paper](https://www.nature.com/articles/s41586-021-03544-w)\]
- A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, et al.,
"Chip Placement with Deep Reinforcement Learning",
*arXiv:2004.10746*, 2020. \[[paper](https://arxiv.org/pdf/2004.10746.pdf)\]
- Z. Jiang, E. Songhori, S. Wang, A. Goldie, A. Mirhoseini, et al., "Delving into Macro Placement with Reinforcement Learning", *MLCAD*, 2021. \[[paper](https://arxiv.org/pdf/2109.02587)\]
- A Gentle Introduction to Graph Neural Networks. \[[link](https://distill.pub/2021/gnn-intro/)\]
- TILOS AI Institute. \[[link](https://tilos.ai/)\]