**Hypergraph clustering** is, in our view, one of the most crucial undocumented
portions of Circuit Training.
## **I. Information provided by Google.**
## **Information provided by Google.**
The Methods section of the [Nature paper](https://www.nature.com/articles/s41586-021-03544-w.epdf?sharing_token=tYaxh2mR5EozfsSL0WHZLdRgN0jAjWel9jnR3ZoTv0PW0K0NmVrRsFPaMa9Y5We9O4Hqf_liatg-lvhiVcYpHL_YQpqkurA31sxqtmA-E1yNUWVMMVSBxWSp7ZFFIWawYQYnEXoBE4esRDSWqubhDFWUPyI5wK_5B_YIO-D_kS8%3D) provides the following information.
* “(1) We group millions of standard cells into a few thousand clusters using hMETIS, a partitioning technique based
...
...
@@ -44,14 +44,14 @@ Finally, the Methods section of the [Nature paper](https://www.nature.com/articl
## **II. What *exactly* is the Hypergraph, and how is it partitioned?**
## **What *exactly* is the Hypergraph, and how is it partitioned?**
From the above information sources, the description of the [Grouping](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Grouping/README.md) process, and information provided by Google engineers, we are fairly certain of the following.
* (1) Clustering uses the hMETIS partitioner, which is run in “multiway” mode.
More specifically, hMETIS is **always** invoked with *npart* more than 500, with unit vertex weights.
The hyperparameters given in Extended Data Table 3 of the [Nature paper](https://www.nature.com/articles/s41586-021-03544-w.epdf?sharing_token=tYaxh2mR5EozfsSL0WHZLdRgN0jAjWel9jnR3ZoTv0PW0K0NmVrRsFPaMa9Y5We9O4Hqf_liatg-lvhiVcYpHL_YQpqkurA31sxqtmA-E1yNUWVMMVSBxWSp7ZFFIWawYQYnEXoBE4esRDSWqubhDFWUPyI5wK_5B_YIO-D_kS8%3D) are used.
(Additionally, Circuit Training explicitly sets reconst=1 and dbglvl=0.)
* (2) The hypergraph that is fed to hMETIS consists of macros, macro pins, IO ports and standard cells.
* (2) The hypergraph that is fed to hMETIS consists of macros, macro pins, IO ports, and standard cells.
The "fixed" file generated by [Grouping](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Grouping/README.md) process, is also fed as .fix input file to hMETIS.
...
...
@@ -74,64 +74,64 @@ in these clusters corresponds to an entry of the .fix file. The cluster id start
* The number of individual standard cells in the hypergraph that is actually partitioned by hMETIS is 200,000 - (100 * 300) - (20 * 50) = 169,000.
* Suppose that each macro has 64 macro pins. The hypergraph that is actually partitioned by hMETIS has
200,000 + 100 + 1000 + 100 * 64 = 207, 500 vertices. Although there are both macro pins and macros in the hypergraph, all the nets related to macros are connected to macro pins and there is no hyperedges related to macros. Each hyperedge in the hypergraph cooresponds to a net in the netlist. Note that Circuit Training assumes that there is only one output pin for each standard cell, thus there is only one hyperedge {**A**, **B**, **C**, **D**, **E**} for the following case.
200,000 + 100 + 1000 + 100 * 64 = 207,500 vertices. Although there are both macro pins and macros in the hypergraph, all the nets related to macros are connected to macro pins and there are no hyperedges incident to macros. Each hyperedge in the hypergraph corresponds to a net in the netlist. Note that Circuit Training assumes that there is only one output pin for each standard cell, thus there is only one hyperedge {**A**, **B**, **C**, **D**, **E**} for the following case.
<palign="center">
<imgsrc="./net_model.png"width="600"/>
<imgsrc="./images/net_model.png"width="600"/>
</p>
<palign="center">
Figure 1. Illustration of net model in Circuit Training.
Figure 3. Illustration of net model used in Circuit Training.
</p>
**nparts* = 500 + 120 = 620 is used when applying hMETIS to this hypergraph.
## **III. Break up clusters that span a distance larger than *breakup_threshold***
## **Break up clusters that span a distance larger than *breakup_threshold***
After partitioning the hypergraph, we can have *nparts* clusters.
Then Circuit Training break up clusters that span a distance larger than *breakup_threshold*.
Then Circuit Training breaks up clusters that span a distance larger than *breakup_threshold*.
Here *breakup_threshold = sqrt(canvas_width * canvas_height / 16)*.
For each cluster *c*, the breakup process is as following:
For each cluster *c*, the breakup process is as follows:
**cluster_x, cluster_y = c.GetWeightedCenter()*. Here the weighted center of cluster *c* is the average location of all the standard cells in the cluster, weighted according to their area.
*Use (*cluster_x*, *cluster_y*) as the origin and *breakup_threshold* as the step, to divide the bounding box of *c* into different regions.
*The elements (macro pins, macros, ports and standard cells) in each region form a new cluster.
The following figure shows an example: the left part shows the cluster *c<sub>1</sub>* before breakup process and the blue dot is the weighted center of *c<sub>1</sub>*; the right part shows the clusters after breakupup process. The "center" cluster still has the cluster id of 1.
**cluster_x, cluster_y = c.GetWeightedCenter()*. Here the weighted center of cluster *c* is the average location of all the *standard cells* in the cluster, weighted according to their area.
*use (*cluster_x*, *cluster_y*) as the origin and *breakup_threshold* as the step, to divide the bounding box of *c* into different regions.
*the elements (macro pins, macros, ports and standard cells) in each region form a new cluster.
The following figure shows an example: the left part shows the cluster *c<sub>1</sub>* before the breakup process, and the blue dot is the weighted center of *c<sub>1</sub>*; the right part shows the clusters after the breakup process. The "center" cluster still has the cluster id of 1.
<palign="center">
<imgsrc="./breakup.png"width="1600"/>
<imgsrc="./images/breakup.png"width="1600"/>
</p>
<palign="center">
Figure 2. Illustration of breaking up a cluster.
Figure 4. Illustration of breaking up a cluster.
</p>
Note that the netlist is generated by physical-aware synthesis, we know the (x, y) coordinate for each instance.
Note that since the netlist is generated by physical-aware synthesis, we know the (x, y) coordinate for each instance.
## **IV. Recursively merge small adjacent clusters**
## **Recursively merge small adjacent clusters**
After breaking up clusters which span large distance, there may be some small clusters with only tens of standard cells.
In this step, Circuit Training recursively merges small clusters to the most adjacent cluster if they are within a certain
distance *closeness* (*breakup_threshold* / 2.0), thus reducing number of clusters. A cluster is claimed as a small cluster
distance *closeness* (*breakup_threshold* / 2.0), thus reducing number of clusters. A cluster is defined to be a small cluster
if the number of elements (macro pins,
macros, IO ports and standard cells) is less than or equal to *max_num_nodes*, where *max_num_nodes* = *number_of_vertices* // *number_of_clusters_after_breakup* // 4. The merging process is as following:
* flag = False
*While (flag == False):
*Create adjacency matrix *adj_matrix* where *adj_matrix\[i\]\[j\]* represents the number of connections between cluster *c<sub>i</sub>* and cluster *c<sub>j</sub>*. For example, in the Figure 1, suppose *A*, *B*, *C*, *D* and *E* respectively belong to cluster *c<sub>1</sub>*, ..., *c<sub>5</sub>*, we have *adj_matrix\[1\]\[2\]* = 1, *adj_matrix\[1\]\[3\]* = 1, ...., *adj_matrix\[5\]\[3\]* = 1 and *adj_matrix\[5\]\[4\]* = 1. We want to emphasize that although there is no hyperedges related to macros in the hypergraph, *adj_matrix* considers the "virtual" connections between macros and macro pins. That is to say, if a macro and its macros pins belong to different clusters, for example, macro A in cluster *c<sub>1</sub>* and its macro pins in cluster *c<sub>2</sub>*, we have *adj_matrix\[1\]\[2\]* = 1 and *adj_matrix\[2\]\[1\]* = 1.
*Calculate the weighted center for each cluster. (see the breakup section for details)
*while (flag == False):
*create adjacency matrix *adj_matrix* where *adj_matrix\[i\]\[j\]* represents the number of connections between cluster *c<sub>i</sub>* and cluster *c<sub>j</sub>*. For example, in the Figure 1, suppose *A*, *B*, *C*, *D* and *E* respectively belong to cluster *c<sub>1</sub>*, ..., *c<sub>5</sub>*, we have *adj_matrix\[1\]\[2\]* = 1, *adj_matrix\[1\]\[3\]* = 1, ...., *adj_matrix\[5\]\[3\]* = 1 and *adj_matrix\[5\]\[4\]* = 1. We want to emphasize that although there are no hyperedges incident to macros in the hypergraph, *adj_matrix* considers the "virtual" connections between macros and macro pins. That is to say, if a macro and its macros pins belong to different clusters, for example, macro A in cluster *c<sub>1</sub>* and its macro pins in cluster *c<sub>2</sub>*, we have *adj_matrix\[1\]\[2\]* = 1 and *adj_matrix\[2\]\[1\]* = 1.
*calculate the weighted center for each cluster. (See "Break Up Clusters" above for details.)
* flag = True
*For each cluster *c*
*If *c* is not a small cluster
*for each cluster *c*
*if *c* is not a small cluster
* Continue
*Find all the clusters *close_clusters* which is close to *c*, i.e., the Manhattan distance between their weighted centers and the weighted center of *c* is less than or equal to *closeness*
*If there is no clusters close to *c*
*find all the clusters *close_clusters* which are close to *c*, i.e., the Manhattan distance between their weighted centers and the weighted center of *c* is less than or equal to *closeness*
*if there is no cluster close to *c*
* Continue
*Find the most adjacent cluster *adj_cluster* of *c* in *close_clusters*, i.e., maximize *adj_matrix\[c\]\[adj_cluster\]*
*Merge *c* to *adj_cluster*
*If *adj_cluster* is a small cluster
*find the most adjacent cluster *adj_cluster* of *c* in *close_clusters*, i.e., maximize *adj_matrix\[c\]\[adj_cluster\]*
*merge *c* to *adj_cluster*
*if *adj_cluster* is a small cluster
* flag = False
## **V. Pending Clarifications**
## **Pending Clarifications**
We call readers’ attention to the existence of significant aspects that are still pending clarification here.
While [Gridding](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Gridding/README.md) and
[Grouping](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Grouping/README.md) are hopefully well-understood,
...
...
@@ -141,17 +141,17 @@ we are still in the process of documenting and implementing such aspects as the
All methodologies that span synthesis and placement (of which we are aware) must make a fundamental decision with respect to the netlist that is produced by logic synthesis, as that netlist is passed on to placement: (A) delete buffers and inverters to avoid biasing the ensuing placement (spatial embedding) with the synthesis tool’s fanout clustering, or (B) leave these buffers and inverters in the netlist to maintain netlist area and electrical rules (load, fanout) sensibility. We do not yet know Google’s choice in this regard. Our experimental runscripts will therefore support both (A) and (B).
***[June 13]*****Update to Pending clarification #3:*** We are glad to see [grouping (clustering)](https://github.com/google-research/circuit_training/tree/main/circuit_training/grouping) added to the Circuit Training GitHub. The new scripts refer to (x,y) coordinates of nodes in the netlist, which leads to further pending clarifications (noted [here](https://github.com/google-research/circuit_training/issues/25)). The solution space for how the input to hypergraph clustering is obtained has expanded. A first level of options is whether **(A) a non-physical synthesis tool** (e.g., Genus, DesignCompiler or Yosys), or **(B) a physical synthesis tool** (e.g., Genus iSpatial or DesignCompiler Topological (Yosys cannot perform physical synthesis)), is used to obtain the netlist from starting RTL and constraints. In the regime of (B), to our understanding the commercial physical synthesis tools are invoked with a starting .def that includes macro placement. Thus, we plan to also enable a second level of sub-options for determining this macro placement: **(B.1)** use the auto-macro placement result from the physical synthesis tool, and **(B.2)** use a human PD expert (or, [OpenROAD RTL-MP](https://github.com/The-OpenROAD-Project/OpenROAD/tree/master/src/mpl2)) macro placement.
***[June 13]*****Update to Pending clarification #3:*** We are glad to see [grouping (clustering)](https://github.com/google-research/circuit_training/tree/main/circuit_training/grouping) added to the Circuit Training GitHub. The new scripts refer to (x,y) coordinates of nodes in the netlist, which leads to further pending clarifications (noted [here](https://github.com/google-research/circuit_training/issues/25)). The solution space for how the input to hypergraph clustering is obtained has expanded. A first level of options is whether **(A) a non-physical synthesis tool** (e.g., Genus, DesignCompiler or Yosys), or **(B) a physical synthesis tool** (e.g., Genus iSpatial or DesignCompiler Topological (Yosys cannot perform physical synthesis)), is used to obtain the netlist from starting RTL and constraints. In the regime of (B), to our understanding the commercial physical synthesis tools are invoked with a starting .def that includes macro placement. Thus, we plan to also enable a second level of sub-options for determining this macro placement: **(B.1)** use the auto-macro placement result from the physical synthesis tool, and **(B.2)** use a human PD expert (or, [OpenROAD RTL-MP](https://github.com/The-OpenROAD-Project/OpenROAD/tree/master/src/mpl2)) macro placement. Some initial progress toward these clarifications has been posted as [Our Progress](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/Docs/OurProgress).
## **VI. Our Implementation of Hypergraph Clustering.**
## **Our Implementation of Hypergraph Clustering.**
Our implementation of hypergraph clustering takes the synthesized netlist and a .def file with placed IO ports as input,
then generates the clustered netlist (in lef/def format) using hMETIS (1998 binary).
In default mode, our implementation will generate the clustered netlist in protocol buffer format and cooresponding plc file.
We implement the entire flow based on [OpenROAD APIs](https://github.com/the-openroad-project).
**Please refer to [the OpenROAD repo](https://github.com/the-openroad-project) for explanation of each Tcl command.**
We implement the entire flow based on [OpenROAD APIs](https://github.com/ravi-varadarajan/OpenROAD.git).
**Please refer to [the OpenROAD repo](https://github.com/ravi-varadarajan/OpenROAD.git) for explanation of each Tcl command.**
Please note that [The OpenROAD Project](https://github.com/the-openroad-project) does not
Please note that [The OpenROAD Project](https://github.com/ravi-varadarajan/OpenROAD.git) does not
distribute any compiled binaries. You need to build your own OpenROAD binary before you run our scripts.
Input file: [setup.tcl](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Clustering/test/setup.tcl)(you can follow the example to set up your own design) and [FixFile](https://github.com/TILOS-AI-Institute/MacroPlacement/blob/main/CodeElements/Clustering/test/fix_files_grouping/ariane.fix.old)(This file is generated by our [Grouping](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/CodeElements/Grouping) scripts)
...
...
@@ -160,6 +160,8 @@ Output_files: the clustered netlist in protocol buffer format and cooresponding
Note that the [example](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/CodeElements/Clustering/test) that we provide is the ariane design implemented in NanGate45. The netlist and corresponding def file with placed instances are generated by [Genus iSpatial](https://github.com/TILOS-AI-Institute/MacroPlacement/tree/main/Flows/NanGate45/ariane133) flow. Here the macro placement is automatically done by the Genus and Innovus tools,