# **LSTP: A Logic Synthesis Timing Predictor**

**Haisheng Zheng** ♠ Zhuolun He ♠♡ Fangzhou Liu ♠♡ Zehua Pei ♠♡ Bei Yu♡

- ♣ Shanghai Artificial Intelligence Laboratory
- <sup>♡</sup> The Chinese University of Hong Kong







# Outline



1 Introduction

2 Algorithm

3 Experiments

# Introduction

# Background



#### **Logic Synthesis is Critical:**

- Architecture exploration relies on the acquisition of metrics reported by logic synthesis <sup>[1]</sup>.
- Logic synthesis quality determines the best achievable design space for subsequent procedures <sup>[2]</sup>.

Can we efficiently predict the desired metrics without actually running expensive logic synthesis?


<sup>[1]</sup> Chen Bai et al. (2021). "BOOM-Explorer: RISC-V BOOM Microarchitecture Design Space Exploration Framework". In: *Proc. ICCAD*.

<sup>[2]</sup> Ceyu Xu et al. (2022). "SNS's Not a Synthesizer: A Deep-Learning-Based Synthesis Predictor". In: *Proc. ISCA*.

#### Obtain the Performance of a Circuit: Previous Work



| Work              | Target                                   | Algorithm                    |
|-------------------|------------------------------------------|------------------------------|
| D-SAGE [3]        | Timing                                   | Graph Neural Network         |
| Yu *et al.* [4]   | Timing, Area                             | Long Short-Term Memory       |
| PowerNet [5]      | Dynamic IR Drop                          | Convolutional Neural Network |
| GRANNITE [6]      | Power                                    | Graph Neural Network         |
| Deep H-GCN [7]    | Analog Circuit Degradation (i.e., Aging) | Graph Neural Network         |
| De *et al.* [8]   | Timing                                   | Machine Learning Methods     |
| SNS [2]           | Timing, Area, Power                      | Transformer                  |

<sup>[2]</sup> Ceyu Xu et al. (2022). "SNS's Not a Synthesizer: A Deep-Learning-Based Synthesis Predictor". In: Proc. ISCA.

<sup>[3]</sup> Ecenur Ustun et al. (2020). "Accurate Operation Delay Prediction for FPGA HLS Using Graph Neural Networks". In: Proc. ICCAD.

<sup>[4]</sup> Cunxi Yu et al. (2020). "Decision Making in Synthesis Cross Technologies Using LSTMs and Transfer Learning". In: Proc. MLCAD.

<sup>[5]</sup> Zhiyao Xie et al. (2020). "PowerNet: Transferable Dynamic IR Drop Estimation via Maximum Convolutional Neural Network". In: *Proc. ASPDAC*.

<sup>[6]</sup> Yanqing Zhang et al. (2020). "GRANNITE: Graph Neural Network Inference for Transferable Power Estimation". In: Proc. DAC.

<sup>[7]</sup> Tinghuan Chen et al. (2021). "Deep H-GCN: Fast Analog IC Aging-Induced Degradation Estimation". In: IEEE TCAD.

<sup>[8]</sup> Sayandip De et al. (2022). "Delay Prediction for ASIC HLS: Comparing Graph-Based and Non-Graph-Based Learning Models". In: IEEE TCAD.

# Logic Synthesis Recipes are NOT One-Size-Fits-All





Comparison of Different Top Percentages.

• OpenABC-D<sup>[10]</sup> has pointed out quantitatively that the similarity between the best synthesis recipes for a set of benchmark circuits is less than 30%.

<sup>[10]</sup> Animesh Basak Chowdhury et al. (2021). "OpenABC-D: A Large-Scale Dataset for Machine Learning Guided Integrated Circuit Synthesis". In: arXiv preprint.

# **Optimization Sequence Quality Improvement**



#### **Previous Works:**

- Yu *et al.* <sup>[11]</sup> propose to train a Convolutional Neural Network (CNN) to predict the quality of an optimization sequence.
- Reinforcement Learning (RL) is leveraged <sup>[12][13]</sup> to generate fixed-length optimization sequences.

<sup>[11]</sup> Cunxi Yu et al. (2018). "Developing Synthesis Flows without Human Knowledge". In: *Proc. DAC*.

<sup>[12]</sup> Winston Haaswijk et al. (2018). "Deep Learning for Logic Optimization Algorithms". In: *Proc. ISCAS*.

<sup>[13]</sup> Keren Zhu et al. (2020). "Exploring Logic Optimizations with Reinforcement Learning and Graph Convolutional Network". In: *Proc. MLCAD*.

#### **Problem Formulation**



#### **Logic Synthesis Timing Prediction:**

• Given a gate-level netlist as an And-Inverter Graph (AIG) representing a set of Boolean functions, and a sequence of subgraph optimization procedures for that AIG, design a learning methodology that automatically predicts the final timing after the optimization procedures are applied to the AIG.

# Algorithm





Overall Flow of LSTP.

- RTL-Analyzer compiles the input design and transforms it into an AIG representation.
- ACCNN is a customized GNN for node sampling and feature extraction on the AIG circuit.
- **SeqEncoder** is a Transformer encoder that extracts features from the optimization sequence.
- MLP aggregates the optimization-sequence features and the circuit features to predict the timing of the input design.
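The data flow of the final stage can be sketched in a few lines. This is only an illustration under assumptions: the slides do not say how the two feature vectors are fused, so concatenation is assumed here, and the function names (`linear`, `relu`, `predict_timing`), the layer sizes, and all weights are hypothetical toy values rather than anything from LSTP.

```python
from typing import List, Sequence


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def predict_timing(circuit_feat: Sequence[float],
                   seq_feat: Sequence[float],
                   w1, b1, w2, b2) -> float:
    """Fuse both feature vectors and regress a single timing value."""
    fused = list(circuit_feat) + list(seq_feat)  # assumed fusion: concatenation
    hidden = relu(linear(fused, w1, b1))
    return linear(hidden, w2, b2)[0]


# Toy stand-ins for the ACCNN and SeqEncoder outputs (hypothetical values).
circuit_feat = [0.5, 1.0]
seq_feat = [2.0]
w1 = [[1.0, 0.0, 1.0], [0.0, -1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[2.0, 3.0]]
b2 = [0.25]

timing = predict_timing(circuit_feat, seq_feat, w1, b1, w2, b2)
print(timing)
```

In a trained model the weights would be learned jointly with the two encoders; the point here is only that the regressor sees both the circuit and the sequence.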

# **RTL-Analyzer**





Visualization of RTL-Analyzer.

#### **ACCNN: Cascaded Cones**



#### Motivation:

- The delay of a circuit depends on the number of hops on the longest path from the primary inputs (PIs) to the primary outputs (POs).
- We wish to design an algorithm that effectively exploits the graph representation of the longest path in the AIG.
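The "number of hops on the longest path" is the level (depth) of the AIG. A minimal sketch of computing it with a topological sweep follows; the dictionary layout (`fanins` mapping every node, including PIs, to its list of fanin nodes) and the node names are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List


def aig_levels(fanins: Dict[str, List[str]]) -> Dict[str, int]:
    """Level of a node = hops on the longest path from any PI to it.

    Every node must appear as a key; PIs map to an empty list.
    """
    fanouts: Dict[str, List[str]] = {n: [] for n in fanins}
    pending: Dict[str, int] = {}
    for node, preds in fanins.items():
        pending[node] = len(preds)
        for p in preds:
            fanouts[p].append(node)

    levels = {n: 0 for n in fanins}
    queue = deque(n for n, count in pending.items() if count == 0)
    while queue:  # Kahn-style topological traversal, no recursion
        node = queue.popleft()
        for succ in fanouts[node]:
            levels[succ] = max(levels[succ], levels[node] + 1)
            pending[succ] -= 1
            if pending[succ] == 0:
                queue.append(succ)
    return levels


# Tiny example: n1 = AND(a, b), n2 = AND(n1, c), n3 = AND(n2, a).
fanins = {"a": [], "b": [], "c": [],
          "n1": ["a", "b"], "n2": ["n1", "c"], "n3": ["n2", "a"]}
levels = aig_levels(fanins)
depth = max(levels.values())
print(levels, depth)
```

The iterative sweep avoids Python's recursion limit, which matters for deep circuits such as designs with more than a thousand levels.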

#### **ACCNN: Cascaded Cones**





A Visual Illustration of Sampling Cascaded Cones.

### A Random Walk-Based Approach to Sample Cascaded Cones Within the Circuit:

- Each path originates at a primary input (PI) and ends at a primary output (PO).
- The output of a flip-flop is treated as a PI.
- The input of a flip-flop is treated as a PO.
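A single PI-to-PO random walk can be sketched as below. This shows only the walk itself; how LSTP assembles cascaded cones from the walks is not specified on the slide. The `fanouts` dictionary, the node names, and the `sample_path` helper are hypothetical, and the graph is assumed to be acyclic with flip-flops already split so that their outputs appear as PIs and their inputs as POs.

```python
import random
from typing import Dict, List, Optional, Set


def sample_path(fanouts: Dict[str, List[str]],
                start_pi: str,
                pos: Set[str],
                rng: random.Random) -> Optional[List[str]]:
    """Walk forward from a PI, picking a random fanout until a PO is hit."""
    path = [start_pi]
    node = start_pi
    while node not in pos:
        successors = fanouts.get(node, [])
        if not successors:
            return None  # dangling node: no PO reachable on this walk
        node = rng.choice(successors)
        path.append(node)
    return path


# Toy combinational circuit: two AND nodes feeding a third, then one PO.
fanouts = {"a": ["n1"], "b": ["n1", "n2"], "c": ["n2"],
           "n1": ["n3"], "n2": ["n3"], "n3": ["out"]}
rng = random.Random(0)  # seeded for repeatability
path = sample_path(fanouts, "b", {"out"}, rng)
print(path)
```

Repeating the walk from different PIs with different seeds yields a set of paths that can then be grouped by the nodes they share.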

#### **ACCNN: Netlist Feature Extraction**



#### Motivation:

- We aim for a model that, similar to logic simulation, efficiently propagates information step by step along the sampled paths.
- ABGNN<sup>[14]</sup> serves this purpose well.

#### **ACCNN: Netlist Feature Extraction**





A Visual Illustration of ACCNN.

• Feature Aggregation:

$$a_{\{i:\mathcal{D}(i,v)=\Delta-k\}}^{(k)} = \text{AGGREGATE}\left(\{h_u^{(k-1)} : u \in N(i)\}\right). \tag{1}$$

• Update Function:

$$h_{\{i:\mathcal{D}(i,v)=\Delta-k\}}^{(k)} = \text{COMBINE}\left(a_i^{(k)}, h_i^{(0)}\right). \tag{2}$$
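The level-by-level schedule in Eqs. (1) and (2) can be illustrated with scalar features. Several things here are assumptions, since the slide does not define them: $\mathcal{D}(i,v)$ is read as the hop distance from node $i$ to the cone output $v$, $\Delta$ as the largest such distance, AGGREGATE as a mean over fanins, and COMBINE as addition. Each fanin contributes its most recent state. The `propagate` helper and the toy cone are hypothetical.

```python
from typing import Dict, List


def propagate(fanins: Dict[str, List[str]],
              dist_to_root: Dict[str, int],
              h0: Dict[str, float]) -> Dict[str, float]:
    """Update nodes from the farthest level toward the cone output."""
    h = dict(h0)
    delta = max(dist_to_root.values())
    for k in range(1, delta + 1):
        for node, dist in dist_to_root.items():
            # Only nodes at distance delta - k are updated in step k.
            if dist != delta - k or not fanins.get(node):
                continue
            preds = fanins[node]
            a = sum(h[u] for u in preds) / len(preds)  # assumed AGGREGATE: mean
            h[node] = a + h0[node]                     # assumed COMBINE: add
    return h


# Toy cone: n1 = AND(a, b), v = AND(n1, c); v is the cone output.
fanins = {"a": [], "b": [], "c": [], "n1": ["a", "b"], "v": ["n1", "c"]}
dist_to_root = {"a": 2, "b": 2, "n1": 1, "c": 1, "v": 0}
h0 = {"a": 1.0, "b": 3.0, "c": 5.0, "n1": 0.0, "v": 0.0}

h = propagate(fanins, dist_to_root, h0)
print(h["n1"], h["v"])
```

Like logic simulation, each node is evaluated once, after all of its fanins have settled, so information reaches the output in a single sweep.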

# **Optimization Sequence Feature Extraction: Motivation**





- (a)–(d) Equivalent Factored Forms; (e) Area/Delay Trade-Off for the Trees.
- Boolean Expression: ab + acd + acef + acegh.
- Assuming zero arrival time for all PIs, unit area (A = 1), unit delay (D = 1).
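The trade-off can be reproduced numerically for two equivalent forms of the expression above. The two forms below (a flat sum of products and a fully factored chain) are chosen for illustration and are not necessarily the forms (a) to (d) in the figure. The cost model is an assumption: every two-input AND/OR gate costs unit area and unit delay, and shared sub-expressions are not merged.

```python
from itertools import product


def AND(x, y):
    return ("and", x, y)


def OR(x, y):
    return ("or", x, y)


def area(e) -> int:
    """Number of two-input gates in the expression tree."""
    return 0 if isinstance(e, str) else 1 + area(e[1]) + area(e[2])


def delay(e) -> int:
    """Gate levels on the longest input-to-output path."""
    return 0 if isinstance(e, str) else 1 + max(delay(e[1]), delay(e[2]))


def evaluate(e, env) -> bool:
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    lv, rv = evaluate(left, env), evaluate(right, env)
    return (lv and rv) if op == "and" else (lv or rv)


def equivalent(e1, e2, variables) -> bool:
    """Exhaustive truth-table comparison."""
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        if evaluate(e1, env) != evaluate(e2, env):
            return False
    return True


ac = AND("a", "c")
# ab + acd + acef + acegh as a flat sum of products.
flat = OR(OR(AND("a", "b"), AND(ac, "d")),
          OR(AND(ac, AND("e", "f")), AND(AND(ac, "e"), AND("g", "h"))))
# a(b + c(d + e(f + gh))), fully factored.
factored = AND("a", OR("b", AND("c", OR("d", AND("e", OR("f", AND("g", "h")))))))

same = equivalent(flat, factored, list("abcdefgh"))
print(same, area(flat), delay(flat), area(factored), delay(factored))
```

Under this cost model the flat form is faster but larger, while the factored form is smaller but slower, which is the kind of trade-off the optimization passes navigate.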

# **Optimization Sequence Feature Extraction: Motivation**





- (a)–(d) Equivalent Factored Forms; (e) Area/Delay Trade-Off for the Trees.
- It is hardly possible for designers to determine the effect of optimization sequences for different designs.
- We need a model that takes into account optimization sequence ordering and position.

# **SeqEncoder: Optimization Sequence Feature Extraction**





A Visual Illustration of *SeqEncoder*.

**Transformer**<sup>[15]</sup> is one such model:

$$\text{Attention}(\vec{Q}, \vec{K}, \vec{V}) = \operatorname{softmax}\left(\frac{\vec{Q}\vec{K}^{\top}}{\sqrt{d_k}}\right)\vec{V}. \tag{3}$$
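Scaled dot-product attention is short enough to write out with plain lists. This is a generic sketch of the standard formula, not LSTP's implementation; the matrices are toy values, and `d_k` is taken as the key dimension.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    out: Matrix = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


# A zero query scores every key equally, so the output is the mean of V.
Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
result = attention(Q, K, V)
print(result)
```

Because every position attends to every other position, the encoder can relate an optimization pass to passes that occur much earlier or later in the sequence.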

# **SeqEncoder: Optimization Sequence Feature Extraction**





A Visual Illustration of *SeqEncoder*.

#### **Optimization Methods**

Balancing, Reconfiguration, Replacing and Rewriting.

# **SeqEncoder: Optimization Sequence Feature Extraction**





A Visual Illustration of *SeqEncoder*.

**SeqEncoder** extracts features from optimization sequences of length at most 20. Sequences shorter than 20 are padded:

- Zero padding → 'empty optimization'.
- Example: [empty, rewrite, balance, ..., resub, restructure, refactor].
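A padding step consistent with the example above might look like this. The integer IDs, the `VOCAB` table, and the `encode_sequence` helper are assumptions for illustration; padding is placed at the front because the example sequence starts with `empty`.

```python
from typing import Dict, List

MAX_LEN = 20

# Hypothetical token IDs; 0 is reserved for the "empty optimization".
VOCAB: Dict[str, int] = {
    "empty": 0,
    "rewrite": 1,
    "balance": 2,
    "resub": 3,
    "restructure": 4,
    "refactor": 5,
}


def encode_sequence(ops: List[str], max_len: int = MAX_LEN) -> List[int]:
    """Left-pad with 'empty' up to max_len, then map names to IDs."""
    if len(ops) > max_len:
        raise ValueError(f"sequence longer than {max_len} passes")
    padded = ["empty"] * (max_len - len(ops)) + list(ops)
    return [VOCAB[op] for op in padded]


ids = encode_sequence(["rewrite", "balance", "refactor"])
print(ids)
```

A fixed length lets sequences of different sizes be batched together, with the `empty` token acting as a no-op.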

# Experiments

### **Experimental Setup**



- Developed the timing prediction framework in Python.
  - Tools: Yosys, ABC.
  - Libraries: PyTorch Geometric, PyTorch, NetworkX.
- Dataset: Open-Source Designs.
- Mean Absolute Percentage Error (MAPE):

$$\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{Y}_i - Y_i}{Y_i} \right|. \tag{4}$$
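The metric is straightforward to compute. In the sketch below, $\hat{Y}_i$ is taken to be the predicted timing and $Y_i$ the measured one, which is the usual convention; the `mape` helper and the sample numbers are made up for illustration.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((p - a) / a) for p, a in zip(predicted, actual))
    return 100.0 / len(actual) * total


# Two hypothetical designs: one over-predicted by 10%, one under by 10%.
example = mape([1.10, 1.80], [1.00, 2.00])
print(round(example, 2))
```

Because each error is normalized by the true value, small and large designs contribute on the same scale.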

#### Performance of LSTP



Table: Evaluation Accuracy (MAPE).

| Name       | # PI  | # PO  | # Node | # Level | SNS [2] MAPE | SNS Runtime (s) | LSTP MAPE | LSTP Runtime (s) |
|------------|-------|-------|--------|---------|--------------|-----------------|-----------|------------------|
| aes        | 683   | 529   | 39215  | 44      | 50.21%   | 2.85        | 25.44% | 3.38        |
| des3_area  | 303   | 64    | 7766   | 47      | 53.84%   | 2.16        | 20.29% | 0.70        |
| dft        | 37597 | 37417 | 488165 | 83      | 86.90%   | 27.18       | 33.56% | 55.53       |
| fpu        | 632   | 409   | 55935  | 1522    | 26.11%   | 4.96        | 3.35%  | 6.97        |
| idft       | 37603 | 37419 | 481184 | 82      | 5.07%    | 16.16       | 8.18%  | 54.04       |
| mem_ctrl   | 1187  | 962   | 29814  | 56      | 23.21%   | 32.02       | 19.22% | 3.71        |
| sasc       | 135   | 125   | 1214   | 15      | 21.44%   | 2.38        | 2.48%  | 0.12        |
| ss_pcm     | 104   | 90    | 762    | 13      | 67.82%   | 2.30        | 6.46%  | 0.09        |
| tinyRocket | 4561  | 4181  | 99775  | 156     | 37.31%   | 81.87       | 10.81% | 11.86       |
| usb_phy    | 132   | 90    | 893    | 16      | 14.40%   | 2.65        | 7.75%  | 0.10        |
| Average    |       |       |        |         | 38.63%   | 17.45       | 13.75% | 13.65       |

• LSTP achieves a lower MAPE than SNS on every test case except idft (13.75% vs. 38.63% on average).

<sup>[2]</sup> Ceyu Xu et al. (2022). "SNS's Not a Synthesizer: A Deep-Learning-Based Synthesis Predictor". In: *Proc. ISCA*.

# LSTP for Optimization Sequence Generation



Table: Comparison of Timing Minimums.

| Name       | Initial (ns) | resyn2-2 (ns) | resyn2-2 Improvement | LSTP (ns) | LSTP Improvement |
|------------|--------------|---------------|----------------------|-----------|------------------|
| aes        | 1.58         | 1.37          | 13.29%      | 1.24      | 21.52%      |
| des3_area  | 2.66         | 3.74          | -40.60%     | 3.33      | -25.19%     |
| dft        | 5.82         | 6.35          | -9.11%      | 4.94      | 15.12%      |
| fpu        | 51           | 41.5          | 18.63%      | 40.34     | 20.90%      |
| idft       | 5.82         | 6.35          | -9.11%      | 5.54      | 4.81%       |
| mem_ctrl   | 6.74         | 3.03          | 55.04%      | 2.94      | 56.38%      |
| sasc       | 0.89         | 0.69          | 22.47%      | 0.49      | 44.94%      |
| ss_pcm     | 0.66         | 0.58          | 12.12%      | 0.48      | 27.27%      |
| tinyRocket | 78.1         | 12            | 84.64%      | 10.39     | 86.70%      |
| usb_phy    | 0.41         | 0.42          | -2.44%      | 0.32      | 21.95%      |
| Average    |              |               | 14.49%      |           | 27.44%      |

• LSTP can be used to find a *Better Optimization Sequence*: the sequences it selects improve timing by 27.44% on average, versus 14.49% for resyn2-2.
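The improvement columns are consistent with the relative reduction in delay with respect to the initial design. The formula below is inferred from the table values rather than stated on the slide, and the `improvement` helper is hypothetical.

```python
def improvement(initial_ns: float, optimized_ns: float) -> float:
    """Relative timing reduction in percent; negative means slower."""
    return 100.0 * (initial_ns - optimized_ns) / initial_ns


# (initial, resyn2-2, LSTP) timing in ns, copied from two table rows.
rows = {
    "aes": (1.58, 1.37, 1.24),
    "des3_area": (2.66, 3.74, 3.33),
}
for name, (initial, resyn, lstp) in rows.items():
    print(name,
          round(improvement(initial, resyn), 2),
          round(improvement(initial, lstp), 2))
```

A negative value, as for des3_area, means the optimized netlist is slower than the initial one.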

#### Conclusion



#### In this paper, we proposed:

- A machine-learning-driven logic synthesis timing predictor.
- A specialized GNN to sample and learn the intrinsic features of the circuit AIG.
- An appropriate neural model to capture the complex interaction between optimization passes and their effects on the input netlist.

# THANK YOU!