Commit 5c4bdf72 by sakundu

Added new FAQ: What are the outcomes of CT when the training is continued until convergence?

Signed-off-by: sakundu <sakundu@ucsd.edu>
parent f149d491
@@ -6376,18 +6376,182 @@ Below are screenshots of Ariane-NG45-68%-1.3ns for (in order, top-down) CMP + P&
- Left to right: CT macro placement from the ISPD-2023 paper, with P&R using Innovus 19.1, 20.1 and 21.1. (21.1 is the same as in Figure 3 of our paper.)
<p align="center"> <p align="center">
<img width="200" src="./images/Ariane_Place_CT_19.1.png" alg="Place_CT_19.1"> <img height="200" src="./images/Ariane_Place_CT_19.1.png" alg="Place_CT_19.1">
<img width="200" src="./images/Ariane_Place_CT_20.1.png" alg="Place_CT_20.1"> <img height="200" src="./images/Ariane_Place_CT_20.1.png" alg="Place_CT_20.1">
<img width="200" src="./images/Ariane_Place_CT_21.1.png" alg="Place_CT_21.1"> <img height="200" src="./images/Ariane_Place_CT_21.1.png" alg="Place_CT_21.1">
</p> </p>
<p align="center"> <p align="center">
<img width="200" src="./images/Ariane_Route_CT_19.1.png" alg="Route_CT_19.1"> <img height="200" src="./images/Ariane_Route_CT_19.1.png" alg="Route_CT_19.1">
<img width="200" src="./images/Ariane_Route_CT_20.1.png" alg="Route_CT_20.1"> <img height="200" src="./images/Ariane_Route_CT_20.1.png" alg="Route_CT_20.1">
<img width="200" src="./images/Ariane_Route_CT_21.1.png" alg="Route_CT_21.1"> <img height="200" src="./images/Ariane_Route_CT_21.1.png" alg="Route_CT_21.1">
</p> </p>
<a id="Question17"></a>
**<span style="color:blue">Question 17.</span>** What are the outcomes of CT when the training is continued until convergence?
To put this question in perspective, training “until convergence” is not described in any of the guidelines provided by the CT GitHub repo for reproducing the results in the Nature paper. For the [ISPD 2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf), we adhere to the guidelines given in the [CT GitHub repo](https://github.com/google-research/circuit_training/blob/main/docs/ARIANE.md#train-job), use the same number of training iterations for Ariane that Google engineers demonstrate there, and obtain results that closely align with Google's outcomes for Ariane. (See FAQs #4 and #13.)
We run CT training for an extended number of iterations (600, i.e., three times the default of 200) for each of Ariane, BlackParrot and MemPool Group on NG45, and make the following observations.
- For Ariane, the proxy cost improves from 0.857 to 0.809 ([link](https://tensorboard.dev/experiment/XXSQ4NUoTkm0T7TLcqR9vg/) to the new tensorboard). However, the Nature Table 1 metrics are very similar: routed wirelength improves from 4,894 mm to 4,739 mm; total power degrades from 828.7 mW to 829.4 mW; worst negative slack and total negative slack respectively degrade from -79 ps to -85 ps, and from -25.8 ns to -62.7 ns. The final proxy cost and the Nature Table 1 metrics achieved through training until convergence are still not better than those achieved by SA.
- For BlackParrot, the proxy cost improves significantly from 1.021 to 0.889 ([link](https://tensorboard.dev/experiment/9d2VsLUkSLqI5s5OmbZtRw/) to the new tensorboard). Routed wirelength improves significantly from 36,845 mm to 30,929 mm, and total power improves from 4,627.4 mW to 4,547.8 mW. However, the worst negative slack and total negative slack respectively degrade from -185 ps to -199 ps, and from -1040.8 ns to -1263.4 ns. The final proxy cost achieved by CT is better than that achieved by SA, but the Nature Table 1 metrics are still similar to those achieved by SA.
- For MemPool Group, CT diverges and never converges; thus, the final proxy cost is unchanged ([link](https://tensorboard.dev/experiment/w4txHNhAReCOV77LqvqkgQ/#scalars) to tensorboard). This confirms that the CT code does not guarantee convergence.
- Note 1: We have not studied what happens if SA is given triple the runtime used in our previously reported experiments.
- Note 2: Our new data underscore the poor correlation between proxy cost and ground-truth metrics noted in Section 5.2.3 of the [ISPD-2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf).
Our new data from using triple the CT training budget indicate that, compared to the configurations explored in the ISPD-2023 paper, extended training improves the proxy cost for Ariane but does not significantly improve its chip metrics, and leaves MemPool Group unchanged since CT diverges. For BlackParrot, routed wirelength improves significantly while other chip metrics are similar to what we previously reported. **Overall, training until convergence does not qualitatively change comparisons to results of Simulated Annealing and human macro placements reported in the [ISPD 2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf)**.
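As a quick sanity check on the deltas quoted above, here is a minimal Python sketch (our own illustration; all values are copied from the bullets) that recomputes the percentage changes in proxy cost and Nature Table 1 metrics between the ISPD-2023 runs and the 600-iteration runs.

```python
# Minimal sketch: percentage change from the ISPD-2023 CT runs to the
# extended (600-iteration) runs, using the values quoted in the bullets above.
# For WNS/TNS (negative values), a negative change means timing got worse.

def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / abs(before)

# (metric, ISPD-2023 value, 600-iteration value)
runs = {
    "Ariane": [
        ("proxy cost",       0.857,   0.809),
        ("routed WL (mm)",   4894,    4739),
        ("total power (mW)", 828.7,   829.4),
        ("WNS (ps)",         -79,     -85),
        ("TNS (ns)",         -25.8,   -62.7),
    ],
    "BlackParrot": [
        ("proxy cost",       1.021,   0.889),
        ("routed WL (mm)",   36845,   30929),
        ("total power (mW)", 4627.4,  4547.8),
        ("WNS (ps)",         -185,    -199),
        ("TNS (ns)",         -1040.8, -1263.4),
    ],
}

for design, rows in runs.items():
    print(design)
    for metric, before, after in rows:
        print(f"  {metric:18s} {before:>9} -> {after:>9}  "
              f"({pct_change(before, after):+.1f}%)")
```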
The subsequent tables and figures present the Nature Table 1 metrics of Ariane and BlackParrot on NG45, for macro placement solutions generated by CT training until convergence. (For MemPool Group, using triple the default number of CT iterations did not change the final proxy cost.)
<table>
<thead>
<tr>
<th colspan="10"><p align="center">Ariane133-NG45-68%-1.3ns CT result (<a href="https://tensorboard.dev/experiment/XXSQ4NUoTkm0T7TLcqR9vg/">Link</a> to tensorboard)</p></th>
</tr>
</thead>
<tbody>
<tr>
<td>Physical Design Stage</td>
<td>Core Area (um^2)</td>
<td>Standard Cell Area (um^2)</td>
<td>Macro Area (um^2)</td>
<td>Total Power (mW)</td>
<td>Wirelength (um)</td>
<td>WS (ns)</td>
<td>TNS (ns)</td>
<td>Congestion (H)</td>
<td>Congestion (V)</td>
</tr>
<tr>
<td>preCTS</td>
<td>1814274</td>
<td>242539</td>
<td>1018356</td>
<td>787.798</td>
<td>4577259</td>
<td>-0.095</td>
<td>-121.911</td>
<td>0.04%</td>
<td>0.11%</td>
</tr>
<tr>
<td>postCTS</td>
<td>1814274</td>
<td>244220</td>
<td>1018356</td>
<td>830.273</td>
<td>4610696</td>
<td>-0.07</td>
<td>-41.635</td>
<td>0.05%</td>
<td>0.13%</td>
</tr>
<tr>
<td>postRoute</td>
<td>1814274</td>
<td>244220</td>
<td>1018356</td>
<td>828.935</td>
<td>4734768</td>
<td>-0.095</td>
<td>-90.160</td>
<td></td>
<td></td>
</tr>
<tr>
<td>postRouteOpt</td>
<td>1814274</td>
<td>244666</td>
<td>1018356</td>
<td>829.419</td>
<td>4739136</td>
<td>-0.085</td>
<td>-62.685</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p align="center">
<img height="300" src="./images/Ariane_NG45_Place_itr_600.png" alg="Ariane_NG45_Place_itr600">
<img height="300" src="./images/Ariane_NG45_Route_itr_600.png" alg="Ariane_NG45_Route_itr600">
</p>
<table>
<thead>
<tr>
<th colspan="10"><p align="center">BlackParrot (Quad-Core)-NG45-68%-1.3ns CT result (<a href="https://tensorboard.dev/experiment/9d2VsLUkSLqI5s5OmbZtRw/">Link</a> to tensorboard)</p></th>
</tr>
</thead>
<tbody>
<tr>
<td>Physical Design Stage</td>
<td>Core Area (um^2)</td>
<td>Standard Cell Area (um^2)</td>
<td>Macro Area (um^2)</td>
<td>Total Power (mW)</td>
<td>Wirelength (um)</td>
<td>WS (ns)</td>
<td>TNS (ns)</td>
<td>Congestion (H)</td>
<td>Congestion (V)</td>
</tr>
<tr>
<td>preCTS</td>
<td>8449457</td>
<td>1922798</td>
<td>3917822</td>
<td>4185.939</td>
<td>29820259</td>
<td>-0.179</td>
<td>-648.911</td>
<td>0.10%</td>
<td>0.26%</td>
</tr>
<tr>
<td>postCTS</td>
<td>8449457</td>
<td>1935706</td>
<td>3917822</td>
<td>4563.875</td>
<td>29956480</td>
<td>-0.138</td>
<td>-355.347</td>
<td>0.12%</td>
<td>0.28%</td>
</tr>
<tr>
<td>postRoute</td>
<td>8449457</td>
<td>1935706</td>
<td>3917822</td>
<td>4542.299</td>
<td>30893195</td>
<td>-0.188</td>
<td>-2280.100</td>
<td></td>
<td></td>
</tr>
<tr>
<td>postRouteOpt</td>
<td>8449457</td>
<td>1940957</td>
<td>3917822</td>
<td>4547.832</td>
<td>30928844</td>
<td>-0.199</td>
<td>-1263.400</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p align="center">
<img height="300" src="./images/BP_NG45_Place_itr_600.png" alg="BP_NG45_Place_itr600">
<img height="300" src="./images/BP_NG45_Route_itr_600.png" alg="BP_NG45_Route_itr600">
</p>
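Since “convergence” above is judged from the proxy-cost curves in the linked tensorboards, a simple windowed plateau check over the exported event files can flag whether extended training is still making progress. The sketch below is our own illustration, not part of the CT repo: the `proxy_cost` tag name, the log-directory path, and the window/tolerance values are placeholders (inspect `acc.Tags()["scalars"]` to find the actual tag names in a given log directory).

```python
# Sketch: flag whether a logged training scalar (e.g., proxy cost) has plateaued.
# Assumptions: "proxy_cost" is a placeholder tag name; the 50-iteration window
# and 2% relative tolerance are arbitrary illustrative choices.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def has_plateaued(logdir: str, tag: str = "proxy_cost",
                  window: int = 50, rel_tol: float = 0.02) -> bool:
    acc = EventAccumulator(logdir)
    acc.Reload()                                  # parse the event files
    values = [e.value for e in acc.Scalars(tag)]  # one value per logged step
    if len(values) < 2 * window:
        return False                              # too few points to judge
    prev = sum(values[-2 * window:-window]) / window
    last = sum(values[-window:]) / window
    # "Plateaued" = the windowed mean moved by less than rel_tol (relative).
    return abs(last - prev) <= rel_tol * abs(prev)


if __name__ == "__main__":
    # Hypothetical log directory for the 600-iteration Ariane run.
    print(has_plateaued("./logs/ariane_ng45_itr600"))
```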
## **Pinned (to bottom) question list:**
@@ -6407,4 +6571,5 @@ Below are screenshots of Ariane-NG45-68%-1.3ns for (in order, top-down) CMP + P&
**<span style="color:blue">[Question 13](#Question13).</span>** How good are human macro placements relative to Circuit Training? **<span style="color:blue">[Question 13](#Question13).</span>** How good are human macro placements relative to Circuit Training?
**<span style="color:blue">[Question 14](#Question14).</span>** What is the impact on CT results when DREAMPlace is used instead of force-directed placement? **<span style="color:blue">[Question 14](#Question14).</span>** What is the impact on CT results when DREAMPlace is used instead of force-directed placement?
**<span style="color:blue">[Question 15](#Question15).</span>** Should we factor in density cost while using DREAMPlace for CT? **<span style="color:blue">[Question 15](#Question15).</span>** Should we factor in density cost while using DREAMPlace for CT?
**<span style="color:blue">[Question 16](#Question16).</span>** Why does your study (and, ISPD-2023 paper) use Cadence CMP 21.1, which was not available to Google engineers when they wrote the Nature paper? **<span style="color:blue">[Question 16](#Question16).</span>** Why does your study (and, [ISPD-2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf)) use Cadence CMP 21.1, which was not available to Google engineers when they wrote the Nature paper?
\ No newline at end of file **<span style="color:blue">[Question 17](#Question17).</span>** What are the outcomes of CT when the training is continued until convergence?
\ No newline at end of file
@@ -167,6 +167,18 @@ We used Innovus version 21.1 since it was the latest version of our place-and-ro
- Using the latest version of CMP was also natural, given our starting assumption that RL from *Nature* would outperform the commercial state-of-the-art.
- We have now run further experiments using older versions of CMP and Innovus. The macro placements produced by CMP across versions 19.1, 20.1 and 21.1 lead to the same qualitative conclusions. Details are given [here](./Docs/OurProgress#Question16).
**15. What are the outcomes of CT when the training is continued until convergence?**
To put this question in perspective, training “until convergence” is not described in any of the guidelines provided by the CT GitHub repo for reproducing the results in the Nature paper. For the [ISPD 2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf), we adhere to the guidelines given in the [CT GitHub repo](https://github.com/google-research/circuit_training/blob/main/docs/ARIANE.md#train-job), use the same number of training iterations for Ariane that Google engineers demonstrate there, and obtain results that closely align with Google's outcomes for Ariane. (See FAQs #4 and #13.)
The CT code **does not guarantee** convergence. That said, we have run CT training for an extended number of iterations (600, which is three times our default [value of 200](https://github.com/google-research/circuit_training/blob/main/docs/ARIANE.md#train-job)) for each of Ariane, BlackParrot and MemPool Group on NG45. For MemPool Group, CT diverges (tensorboard [link](https://tensorboard.dev/experiment/w4txHNhAReCOV77LqvqkgQ/#scalars)).
When convergence can be attained, the impact on key chip metrics is mixed. For Ariane, the chip metrics remain similar. For BlackParrot, the routed wirelength improves significantly, but TNS and WNS degrade. For both Ariane and BlackParrot, the proxy cost improves (significantly so for BlackParrot), but this improvement does not correlate with the timing metrics. For more details, see [here](./Docs/OurProgress#Question17).
**In sum, training until convergence worsens some key chip metrics while improving others, highlighting the poor correlation between proxy cost and chip metrics. Overall, training until convergence does not qualitatively change comparisons to results of Simulated Annealing and human macro placements reported in the [ISPD 2023 paper](https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.pdf).**
**Note:** We have not studied what happens if SA is given triple the runtime used in our reported experiments.
## **Testcases**
The list of available [testcases](./Testcases) is as follows.
- Ariane (RTL)
...