A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

Hossein Valavi, Peter Jeffrey Ramadge, Eric Nestler, Naveen Verma

Research output: Contribution to journal › Article

Abstract

Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation to enhance compute SNR and, thus, scalability. The architecture supports analog-input-activation (IA)/binary-weight operation in the first layer (FL) and binary-IA/binary-weight operation in the hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8 × 8 = 64 in-memory-computing neuron tiles, supporting up to 512 HL neurons with 3 × 3 × 512 inputs each and 64 FL neurons with 3 × 3 × 3 inputs each, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure with 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18,876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch normalization. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to that of ideal digital computing.
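Because the hidden layers use binary activations and binary weights, each multiply in the matrix-vector product reduces to an XNOR, and the charge-domain accumulation across the per-cell MOM capacitors ideally settles at the fraction of matching bits times the supply rail. The following Python sketch is a behavioral model of that computation under idealized charge sharing (equal capacitors, no noise or mismatch); the function name, the ±1 encoding, and the 1-V rail are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def binary_neuron_charge_model(ia, w, v_dd=1.0):
    """Idealized charge-domain dot product for one binary HL neuron.

    With binary inputs and weights, each multiply is an XNOR; sharing
    charge across N equal per-bit-cell capacitors makes the output
    settle at (fraction of matching bits) * V_DD.

    ia, w : arrays of +/-1 (binary input activations and weights)
    v_dd  : supply voltage the capacitor rail swings over (assumed 1 V)
    """
    xnor = (ia == w)          # +1*+1 or -1*-1 -> match (1), else 0
    return xnor.mean() * v_dd # ideal charge sharing over equal caps

# Example: one hidden-layer neuron at the maximum supported filter
# size of 3 x 3 x 512 inputs.
rng = np.random.default_rng(0)
n = 3 * 3 * 512
ia = rng.choice([-1, 1], size=n)
w = rng.choice([-1, 1], size=n)
v_out = binary_neuron_charge_model(ia, w)

# The digital dot product relates to the analog voltage by
# dot = 2*matches - n = n * (2*v_out/v_dd - 1).
dot_digital = int(ia @ w)
dot_from_v = round(n * (2 * v_out - 1))
assert dot_digital == dot_from_v
```

As a rough sanity check on the headline numbers (assuming throughput and efficiency refer to the same operating point), dividing HL throughput by HL energy efficiency gives the implied compute power, 18,876 GOPS / 866 TOPS/W ≈ 21.8 mW, and the quoted 1498 GOPS/mm² implies roughly 18,876 / 1498 ≈ 12.6 mm² of active area.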

Original language: English (US)
Article number: 8660469
Pages (from-to): 1789-1799
Number of pages: 11
Journal: IEEE Journal of Solid-State Circuits
Volume: 54
Issue number: 6
DOI: https://doi.org/10.1109/JSSC.2019.2899730
State: Published - Jun 1 2019

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

@article{7a353c5582d242569d26945fa796ada0,
title = "A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute",
abstract = "Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation to enhance compute SNR and, thus, scalability. The architecture supports analog-input-activation (IA)/binary-weight operation in the first layer (FL) and binary-IA/binary-weight operation in the hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8 × 8 = 64 in-memory-computing neuron tiles, supporting up to 512 HL neurons with 3 × 3 × 512 inputs each and 64 FL neurons with 3 × 3 × 3 inputs each, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure with 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18,876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch normalization. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to that of ideal digital computing.",
author = "Hossein Valavi and Ramadge, {Peter Jeffrey} and Eric Nestler and Naveen Verma",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/JSSC.2019.2899730",
language = "English (US)",
volume = "54",
pages = "1789--1799",
journal = "IEEE Journal of Solid-State Circuits",
issn = "0018-9200",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",
}

A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute. / Valavi, Hossein; Ramadge, Peter Jeffrey; Nestler, Eric; Verma, Naveen.

In: IEEE Journal of Solid-State Circuits, Vol. 54, No. 6, 8660469, 01.06.2019, pp. 1789-1799.

Research output: Contribution to journal › Article

TY - JOUR

T1 - A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

AU - Valavi, Hossein

AU - Ramadge, Peter Jeffrey

AU - Nestler, Eric

AU - Verma, Naveen

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation to enhance compute SNR and, thus, scalability. The architecture supports analog-input-activation (IA)/binary-weight operation in the first layer (FL) and binary-IA/binary-weight operation in the hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8 × 8 = 64 in-memory-computing neuron tiles, supporting up to 512 HL neurons with 3 × 3 × 512 inputs each and 64 FL neurons with 3 × 3 × 3 inputs each, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure with 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18,876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch normalization. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to that of ideal digital computing.

AB - Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation to enhance compute SNR and, thus, scalability. The architecture supports analog-input-activation (IA)/binary-weight operation in the first layer (FL) and binary-IA/binary-weight operation in the hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8 × 8 = 64 in-memory-computing neuron tiles, supporting up to 512 HL neurons with 3 × 3 × 512 inputs each and 64 FL neurons with 3 × 3 × 3 inputs each, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure with 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18,876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch normalization. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to that of ideal digital computing.

UR - http://www.scopus.com/inward/record.url?scp=85066442557&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066442557&partnerID=8YFLogxK

DO - 10.1109/JSSC.2019.2899730

M3 - Article

VL - 54

SP - 1789

EP - 1799

JO - IEEE Journal of Solid-State Circuits

JF - IEEE Journal of Solid-State Circuits

SN - 0018-9200

IS - 6

M1 - 8660469

ER -