Architecture and Automation for Low-cost Safety-critical Systems

Cost pressure is driving vendors of safety-critical systems to integrate previously distributed systems. On-Demand Redundancy (ODR) supports such integration while keeping costs low by allowing safety-critical and non-critical tasks, traditionally isolated to limit interference, to execute on shared resources. However, some workloads show little benefit from such system-level techniques; in these cases, microarchitectural reliability techniques that enable processors to partially or completely self-check may be the best way to achieve low-cost safety. Our research explores system- and microarchitecture-level reliability techniques, how they interact, and how they are best combined to achieve cost-effective fault coverage.

Selected publications

Jonah Caplan, Maria Isabel Mera, Peter Milder, Brett H. Meyer, "Trade-offs in Execution Signature Compression for Reliable Processor Systems," in the Proceedings of the Conference on Design, Automation, and Test in Europe, DATE '14, March 2014.

Brett H. Meyer, Nishant George, Benton Calhoun, John Lach, Kevin Skadron, "Reducing the Cost of Redundant Execution in Safety-Critical Systems using Relaxed Dedication," DATE'11, March 2011.

Brett H. Meyer, Benton Calhoun, John Lach, Kevin Skadron, "Cost-effective Safety and Fault Localization using Distributed Temporal Redundancy," CASES'11, October 2011.

Lifetime and Yield Optimization

Design-space Exploration for Embedded MPSoCs

As manufacturing processes scale, designers increasingly depend on design techniques that mitigate manufacturing defects and permanent failures. One advantage of managing failure at the system level is that once the location of a failure has been identified, the cause can be abstracted away: whether a component is defective or fails in the field, in many cases the same techniques and resources can be used to avert system failure. In this project, we investigate the allocation of slack (under-utilization in execution and storage resources) for the purpose of system lifetime extension and manufacturing yield improvement.
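The idea of using slack to absorb a failure can be illustrated with a minimal sketch. It assumes a toy model that is not taken from the publications below: each component has an integer capacity, each task an integer load, and a failure is survived if the failed component's tasks fit into the slack of the remaining components. The function name `survives_failure` and the first-fit-decreasing heuristic are illustrative choices only.

```python
def survives_failure(capacities, loads, failed):
    """Return True if the tasks of `failed` fit into slack on other components.

    capacities: dict mapping component name -> capacity (hypothetical units)
    loads:      dict mapping component name -> list of task loads
    failed:     name of the component that is defective or has failed

    Uses first-fit decreasing, a heuristic: a False result means this
    heuristic found no remapping, not that none exists.
    """
    # Slack is the unused capacity on each surviving component.
    slack = {c: capacities[c] - sum(loads[c]) for c in capacities if c != failed}

    # Place the largest displaced tasks first.
    for task in sorted(loads[failed], reverse=True):
        target = next((c for c in slack if slack[c] >= task), None)
        if target is None:
            return False
        slack[target] -= task
    return True


if __name__ == "__main__":
    capacities = {"p0": 100, "p1": 100, "p2": 100}
    loads = {"p0": [50, 30], "p1": [40], "p2": [60]}
    print(survives_failure(capacities, loads, "p0"))
```

The same check applies whether `p0` is defective at manufacturing time or fails in the field, which is the abstraction the paragraph above describes.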

Selected publications

Brett H. Meyer, Adam S. Hartman, Donald E. Thomas, "Cost-effective Lifetime and Yield Optimization for NoC-based MPSoCs," in ACM Transactions on Design Automation of Electronic Systems (TODAES), 19(2), April 2014.

Brett H. Meyer, Adam S. Hartman, Donald E. Thomas, "Architecture and Automation Insights for System-Level Lifetime and Yield Optimization in NoC-based MPSoCs," DFM&Y'09, July 2009.

Brett H. Meyer, Adam S. Hartman, Donald E. Thomas, "Cost-effective Slack Allocation for Lifetime Improvement in NoC-based MPSoCs," DATE'10, March 2010.

Brett H. Meyer, Adam S. Hartman, Donald E. Thomas, "Slack Allocation for Yield Improvement in NoC-based MPSoCs," ISQED'10, March 2010.

Yield Improvement for Parallel Architectures

Recent research has suggested that as more processor cores are incorporated on a single chip, the appropriate granularity of redundancy for failure and defect mitigation is the system level. We leverage this fact above, but have observed that some systems benefit from a combination of system-level and microarchitectural redundancy. In this project, we investigate the relationship between parallel application, parallel architecture (single-instruction, multiple-thread architectures in particular), and redundancy allocation, based on the observation that as the demand for types of parallel resources changes (e.g., from many narrow cores to few wide cores), so ought the mix of redundant components (e.g., from redundant cores to cores with redundant lanes).
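The trade-off between spare cores and spare lanes can be sketched with a simple k-of-n yield calculation. This is an illustrative model under strong assumptions, not the model used in the publications below: defects are independent, a core is reduced to its lanes, and a core works if enough of its lanes are defect-free. The function names and parameters are hypothetical.

```python
from math import comb


def k_of_n_yield(n, k, p):
    """Probability that at least k of n independent components are defect-free,
    where each component is defect-free with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chip_yield(cores_needed, spare_cores, lanes_needed, spare_lanes, lane_yield):
    """Yield of a chip with redundancy at two granularities (assumed model).

    A core works if at least `lanes_needed` of its lanes are good;
    the chip works if at least `cores_needed` of its cores work.
    """
    core_yield = k_of_n_yield(lanes_needed + spare_lanes, lanes_needed, lane_yield)
    return k_of_n_yield(cores_needed + spare_cores, cores_needed, core_yield)


if __name__ == "__main__":
    # Compare one spare core against one spare lane per core (made-up numbers).
    print(chip_yield(4, 1, 8, 0, 0.99))
    print(chip_yield(4, 0, 8, 1, 0.99))
```

Sweeping the number and width of cores in a model like this shows how the preferred mix of spare cores and spare lanes can shift with the architecture.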

Selected publications

Daniel A. Epstein, Kevin Skadron, Brett H. Meyer, "Multi-Granularity Redundancy in Multi-Core SIMT," DFM&Y'12, June 2012.

Daniel A. Epstein, Kevin Skadron, Brett H. Meyer, "SIMD Performance and Yield Optimization with Multi-granularity Redundancy," in the Work-in-Progress Session at the 49th IEEE/ACM Design Automation Conference, June 2012.

VoltSpot: Power-delivery Network Modeling and Optimization

In future CMOS technology nodes, threshold and supply voltages are not scaling down as fast as device density is increasing. Higher current density and total current place greater demands on the power-delivery network (PDN); current-related chip phenomena such as electromigration (EM), resistive voltage (IR) drop, and inductive transient (Ldi/dt) noise all get worse with higher current and larger current swings.
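The dependence of these effects on current follows from two textbook relations: the voltage across a resistance is I times R, and the voltage across an inductance is L times di/dt. The sketch below evaluates both for a single lumped resistance and inductance. It is a back-of-the-envelope illustration, not VoltSpot's model, and all names and numeric values are made up for the example.

```python
def ir_drop(current_a, resistance_ohm):
    """Static voltage drop (volts) across a lumped PDN resistance: V = I * R."""
    return current_a * resistance_ohm


def ldidt_noise(inductance_h, delta_current_a, delta_time_s):
    """Transient voltage noise (volts) across a lumped inductance: V = L * di/dt,
    approximating di/dt by a current step over a time interval."""
    return inductance_h * delta_current_a / delta_time_s


if __name__ == "__main__":
    # Illustrative values only: 50 A through 1 milliohm,
    # and a 10 A current swing in 1 ns through 10 pH.
    print(ir_drop(50.0, 1e-3))
    print(ldidt_noise(10e-12, 10.0, 1e-9))
```

Both quantities scale linearly with current or current swing, so they consume a growing share of the noise margin when supply voltage stays roughly constant while current rises.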

VoltSpot is an architecture-level model of the on-chip PDN, including C4 pads, with a simple interface for use in other architecture-level tools. When integrated with a performance simulator (such as gem5) and a power estimation tool (such as McPAT), VoltSpot gives architects the tools necessary to explore the effect of PDN design choices, including PDN metal width and the allocation of C4 pads to VDD, GND, and I/O. VoltSpot also supports the exploration of run-time IR drop and Ldi/dt noise prediction, avoidance, and mitigation.

Selected publications

Ke Wang, Brett H. Meyer, Runjie Zhang, Kevin Skadron, Mircea Stan, "Managing C4 Placement for Transient Voltage Noise Minimization," in the Proceedings of the Design Automation Conference, DAC '14, June 2014.

Ke Wang, Brett H. Meyer, Runjie Zhang, Kevin Skadron, Mircea Stan, "Walking Pads: Fast Power-supply Pad-placement Optimization," in the Proceedings of the 19th Asia and South Pacific Design Automation Conference, ASP-DAC '14, January 2014. (Best Paper Candidate)

Runjie Zhang, Brett H. Meyer, Wei Huang, Kevin Skadron, Mircea R. Stan, "Some Limits of Power Delivery in Multicore Era," in the Proceedings of the 4th Workshop on Energy-Efficient Design, WEED'12, June 2012.

ArchFP: Architectural Floorplanning for Early Design Analysis

ArchFP is a simple, easy-to-use, architect-directed floorplanning tool. Floorplanning tools grew out of a need to automate the placement of standard cells and modules in large, complex designs. Floorplans are often needed in order to estimate design area, performance, power, temperature, and therefore reliability. System architects need a way to generate floorplans for the same reason, but system-level floorplans, which often consist of only a handful of blocks placed in some regular way (e.g., tiled cores), are poorly explored by tools designed to manage the complexity of thousands of blocks. ArchFP gives system architects a tool that leverages their knowledge of the system and quickly produces a floorplan that can be used for further analysis.
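To make the notion of a regular, architect-directed floorplan concrete, the following sketch lays out identical tiles on a grid and reports the bounding-box area. It does not use or reproduce ArchFP's interface; the function names, the tuple layout, and the dimensions are all hypothetical.

```python
def tile_floorplan(rows, cols, tile_w, tile_h):
    """Return one (name, x, y, width, height) tuple per tile,
    placed on a rows-by-cols grid with the origin at the lower-left corner."""
    return [
        (f"core_{r}_{c}", c * tile_w, r * tile_h, tile_w, tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def bounding_area(blocks):
    """Area of the smallest axis-aligned rectangle anchored at the origin
    that contains every block."""
    width = max(x + w for _, x, _, w, _ in blocks)
    height = max(y + h for _, _, y, _, h in blocks)
    return width * height


if __name__ == "__main__":
    blocks = tile_floorplan(2, 2, 3, 2)  # four 3x2 tiles (arbitrary units)
    for block in blocks:
        print(block)
    print(bounding_area(blocks))
```

A block list of this kind is the sort of input that area, power, and thermal estimates can be computed from early in the design process.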

Selected publications

Greg Faust, Brett H. Meyer, and Kevin Skadron, "Rapid Prototyping of CMP Floorplans: A Technical Report," Tech. Report CS-2012-02, UVA Dept. of Computer Science, March 2012.

Gregory G. Faust, Runjie Zhang, Kevin Skadron, Mircea R. Stan and Brett H. Meyer, "ArchFP: Rapid Prototyping of pre-RTL Floorplans," VLSI-SOC'12, October 2012.