EDA² | ISEDA 2025 | Technical Session

Technical Session 12

16:00-18:00 | May 10, 2025 @ Sleeping Beauty 1/2

Advanced Floorplanning and Macro Placement

Session Chair: Hailong Yao, University of Science and Technology Beijing

IncreDFlip: Incremental Dataflow-Driven Macro Flipping for Efficient Macro Placement Refinement

Invited Speaker: Xiaotian Zhao, Shanghai Jiao Tong University

Abstract: Macro flipping is a simple yet effective action for improving wirelength and timing during chip floorplanning. However, in modern macro or mixed-size placers, flipping is often just one tool among many in a broad toolkit aimed at optimizing wirelength within certain thresholds. This approach has created an expansive search space due to the growing number of macros, leading to less-than-ideal macro placement outcomes because the significance of macro flipping is frequently underestimated. In this paper, we highlight the importance of incremental macro flipping and introduce IncreDFlip, a methodology that leverages dataflow information to narrow the search space and utilizes dataflow decomposition from the synthesized netlist to guide flipping decisions. Drawing inspiration from human floorplanning strategies, we further propose fine-tuning techniques to enhance dataflow-driven flipping actions. The combined approach achieves an average improvement of 4.87% in total routed wirelength compared to non-flipped macro placement results, outperforming flipping actions in state-of-the-art macro and mixed-size placers, including DREAMPlace 4.1, AutoDMP, and a commercial tool by 5.03%, 4.48%, and 2.06%, respectively. Additionally, it offers a 4.30% improvement in routed wirelength over the flipping-aware dataflow-driven placer, Hier-RTLMP. Furthermore, our method delivers an average improvement of 14.00% in worst negative slack (WNS) and 32.48% in total negative slack (TNS) after routing, along with a 43.59% reduction in runtime compared to Hier-RTLMP.
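To illustrate why flipping alone can shorten wires, here is a minimal sketch that mirrors a macro's pins about its vertical axis and keeps a flip only if the half-perimeter wirelength (HPWL) drops. This is a generic greedy baseline, not the dataflow-driven method of IncreDFlip; the data model (`macros`, `nets`, `greedy_flip`) is hypothetical and chosen only for this example.

```python
# Hypothetical data model (an assumption, not from the paper):
#   macros[name] = {"x": lower-left x, "y": lower-left y, "w": width,
#                   "pins": {pin_name: (x_offset, y_offset)}}
#   nets = list of nets, each a list of (macro_name, pin_name) pairs

def pin_xy(macro, pin, flipped):
    """Absolute pin location; a horizontal flip mirrors the x offset."""
    ox, oy = macro["pins"][pin]
    if flipped:
        ox = macro["w"] - ox
    return macro["x"] + ox, macro["y"] + oy

def hpwl(nets, macros, flipped):
    """Sum of bounding-box half-perimeters over all nets."""
    total = 0.0
    for net in nets:
        pts = [pin_xy(macros[m], p, m in flipped) for m, p in net]
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def greedy_flip(nets, macros):
    """Try flipping each macro once; keep the flip if HPWL improves."""
    flipped = set()
    best = hpwl(nets, macros, flipped)
    for name in macros:
        trial = flipped | {name}
        cost = hpwl(nets, macros, trial)
        if cost < best:
            flipped, best = trial, cost
    return flipped, best

macros = {
    "A": {"x": 0, "y": 0, "w": 10, "pins": {"p": (1, 5)}},
    "B": {"x": 20, "y": 0, "w": 10, "pins": {"q": (9, 5)}},
}
nets = [[("A", "p"), ("B", "q")]]
flipped, cost = greedy_flip(nets, macros)
```

In this toy case both pins initially face away from each other, so flipping both macros brings them to the facing edges. A real flow would measure routed wirelength and timing rather than HPWL.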

Multi-Row Standard Cell Layout Synthesis with Enhanced Scalability

Presenter: Kairong Guo, Peking University

Abstract: Multi-row standard cells are widely adopted in advanced technology nodes, especially for complicated and large cells like multi-bit flip-flops (MBFFs). Due to reduced cell heights and routing tracks, designing standard cell layouts in advanced technology nodes becomes increasingly challenging. Automatic standard cell layout synthesis is being actively explored. However, existing methods face scalability issues when synthesizing large-scale multi-row cells. In this paper, we propose a multi-row cell layout synthesis flow that addresses this scalability issue through a hierarchical approach, including transistor clustering, SMT-based row assignment, transistor-level and cluster-level placement, and genetic-algorithm-based sequential routing, which collectively enable efficient handling of large-scale designs. Experimental results on an industrial 7nm FinFET library demonstrate the ability to handle designs with up to 152 transistors, achieving area reductions up to 23% on large MBFF cells and reducing runtime by up to 36× compared to prior methods.

Point-Cap: An Efficient Model for Chip-scale Interconnect Capacitance Extraction

Presenter: Weizhe Zhang, Chinese University of Hong Kong, Shenzhen

Abstract: In this paper, we present a PointNet++-based method for capacitance extraction (Point-Cap) of chip-scale interconnects with high efficiency and accuracy. By modeling the layout structure as point-cloud-like data, the process of gathering features of conductors to the net level can be done efficiently and automatically by our model and then utilized to predict precise total capacitance and coupling capacitance. Compared to the previous state-of-the-art work, GNN-Cap, Point-Cap reduces the average relative errors in the total capacitance and coupling capacitance calculations by 28.8% and 38.6%, respectively.

NOIP: Node Overlay Initial Partitioning Technique for Hypergraph Partitioning Problem

Presenter: Jing Tang, Xidian University

Abstract: The hypergraph partitioning problem with the goal of minimizing the cut-size has extensive applications in the field of EDA. Because the problem is NP-hard, the industry commonly uses heuristic-based multilevel partitioning schemes to solve it. However, in some well-known partitioners, such as hMETIS and KaHyPar, each stage of the partitioning process is relatively independent and fails to make full use of the existing partitioning information. We propose a novel initial partitioning algorithm named Node Overlay Initial Partitioning (NOIP). This algorithm uses the existing partitioning results in the initial partitioning to find the fixed vertices at the maximum degree of overlay, and these fixed vertices are used to guide the subsequent initial partitioning. Through experiments based on Titan23, the partitioner integrated with NOIP can achieve a cut-size 0.86 times that of hMETIS within comparable time. Meanwhile, its characteristic of making full use of existing information during the partitioning process also brings a new idea to the multilevel partitioning scheme, which enables the partitioner to achieve better results without consuming more time and resources.
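As a rough illustration of the two quantities involved, the sketch below computes the cut-size of a bipartition and then overlays several existing bipartitions to find vertices that always land in the same block. Treating "maximum degree of overlay" as full agreement across all partitions is an assumption made for this example, not necessarily the definition used in NOIP; the function names are hypothetical.

```python
def cut_size(hyperedges, part):
    """Number of hyperedges whose vertices span more than one block."""
    return sum(1 for e in hyperedges if len({part[v] for v in e}) > 1)

def align(ref, part):
    """Block labels are arbitrary, so swap 0/1 if that matches ref better."""
    agree = sum(1 for a, b in zip(ref, part) if a == b)
    return part if 2 * agree >= len(ref) else [1 - b for b in part]

def overlay_fixed_vertices(partitions):
    """Vertices assigned to the same block in every (aligned) partition.

    Assumption: full agreement stands in for the paper's overlay criterion.
    Returns {vertex: block} for the vertices that could be fixed.
    """
    ref = partitions[0]
    aligned = [ref] + [align(ref, p) for p in partitions[1:]]
    fixed = {}
    for v in range(len(ref)):
        if len({p[v] for p in aligned}) == 1:
            fixed[v] = ref[v]
    return fixed

hyperedges = [[0, 1], [1, 2], [2, 3], [0, 3]]
partitions = [[0, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]]
cut = cut_size(hyperedges, partitions[0])
fixed = overlay_fixed_vertices(partitions)
```

Here vertices 0 and 2 agree across all three partitions once labels are aligned, so they would be the candidates to fix before rerunning initial partitioning.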

DieRouter+: Enhancing Die-Level Routing with SOCP and Scheduler-Driven DP

Presenter: Qifu Hu, Shandong Massive Information Technology Research Institute

Abstract: Verifying functional correctness is a key challenge in VLSI design. FPGA prototyping balances runtime and cost in logic verification and is widely used for large-scale circuit verification. To accommodate the increasing complexity of circuits, it extends from a single FPGA to a Multi-FPGA System (MFS), a network of interconnected FPGA dies. Die-level routing, which determines a routing topology and its corresponding Time-Division Multiplexing (TDM) assignment scheme, is crucial for maximizing the operating frequency of MFS. This paper presents DieRouter+, an improved die-level router built upon DieRouter, the first published solution to this problem. DieRouter+ introduces three key innovations: (1) a simpler yet more effective initial routing method based on shortest path trees, (2) a Second-Order Cone Programming formulation of an extended relaxed TDM assignment problem to compute optimal continuous TDM ratios, thereby improving the optimality of the legalized TDM assignment derived from these ratios, and (3) a scheduler-driven Dynamic Programming (DP) based legalization technique that adaptively schedules state evaluations, reducing the number of state evaluations by about 50% compared to the original DP in DieRouter. We evaluate DieRouter+ on 10 benchmark test cases from the 2023 EDA Elite Design Challenge. It outperforms both the competition’s 1st-place method and DieRouter, reducing the maximum net delay by 10.12% and 5.71% on average for 5 challenging cases with up to millions of nets, while maintaining comparable performance on easier cases with at most a few thousand nets. Additionally, for large-scale instances, DieRouter+ achieves a 40% speedup over DieRouter. These results confirm the effectiveness of DieRouter+ in optimizing die-level routing.
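The shortest-path-tree idea behind the initial routing step can be sketched as follows: run Dijkstra from a net's source die and take the union of tree paths to its sink dies as the net's topology. This is a minimal sketch under assumptions (a connected die graph, hypothetical per-edge weights, hypothetical function names); it does not reproduce DieRouter+'s cost model, TDM assignment, or legalization.

```python
import heapq

def shortest_path_tree(adj, source):
    """Dijkstra from source; returns parent pointers and distances.

    adj[u] is a list of (neighbor, weight) pairs; weights are assumed
    non-negative and are placeholders for whatever cost a router uses.
    """
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent, dist

def route_net(adj, source, sinks):
    """Directed die-to-die edges used to reach every sink (graph assumed connected)."""
    parent, _ = shortest_path_tree(adj, source)
    edges = set()
    for node in sinks:
        while parent[node] is not None:
            edges.add((parent[node], node))
            node = parent[node]
    return edges

# Four dies; die 0 drives sinks on dies 2 and 3.
adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 1), (3, 5)],
    2: [(0, 4), (1, 1), (3, 1)],
    3: [(1, 5), (2, 1)],
}
topology = route_net(adj, 0, [3, 2])
```

Because both sinks share the prefix 0 → 1 → 2, the tree reuses those die-to-die links instead of routing each sink separately.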