CHIPS: Custom Hardware Instruction Processor Synthesis

Atasu, Kubilay; Özturan, Can; Dündar, Günhan; Mencer, Oskar; Luk, Wayne

doi:10.1109/tcad.2008.915536

Atasu K., Özturan C., Dündar G., Mencer O., Luk W.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.27, no.3, pp.528-541, 2008 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 27 Issue: 3
Publication Date: 2008
DOI: 10.1109/tcad.2008.915536
Journal Name: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Page Numbers: pp.528-541
Keywords: Application-specific instruction-set processors (ASIPs), Custom instructions, Customizable processors, Extensible processors, Integer linear programming (ILP), Optimization algorithms
Boğaziçi University Affiliated: Yes

Abstract

This paper describes an integer-linear-programming (ILP)-based system called Custom Hardware Instruction Processor Synthesis (CHIPS) that identifies custom instructions for critical code segments, given the available data bandwidth and transfer latencies between custom logic and a baseline processor with architecturally visible state registers. Our approach enables designers to optionally constrain the number of input and output operands for custom instructions. We describe a design flow to identify promising area, performance, and code-size tradeoffs. We study the effect of input/output constraints, register-file ports, and compiler transformations such as if-conversion. Our experiments show that, in most cases, the solutions with the highest performance are identified when the input/output constraints are removed. However, input/output constraints help our algorithms identify frequently used code segments, reducing the overall area overhead. Results for 11 benchmarks covering cryptography and multimedia are shown, with speed-ups between 1.7 and 6.6 times, code-size reductions between 6% and 72%, and area costs ranging between 12 and 256 adders for maximum speed-up. Our ILP-based approach scales well: benchmarks with basic blocks consisting of more than 1000 instructions can be optimally solved, most of the time within a few seconds. © 2008 IEEE.