

Analog AI @ IBM Research–Almaden San Jose, CA USA

### Accelerating Deep Learning with Analog Memory: A Device, Circuit and Systems Approach

Pritish Narayanan, Geoffrey W. Burr, Stefano Ambrogio, Hsinyu (Sidney) Tsai, Charles Mackin, and An Chen

IBM Almaden – San Jose, CA USA May 29, 2019



© 2019 International Business Machines Corporation

## The power of deep neural networks (DNN)



Deep neural networks can solve some problems with beyond-human-level accuracy.

- Image recognition
- Speech recognition
- Machine translation: "Uno no es lo que es por lo que escribe, sino por lo que ha leído" → "You are not what you write, but what you have read"

www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html

IBM Research AI Hardware Center Analog AI @ IBM Research–Almaden May 29, 2019 Pritish **Narayanan**  ibm.biz/analog\_AI ibm.biz/AI\_hardwar


### **Deep Neural Networks**

Input data (images, raw speech data, etc.) is fed into the neural network, whose connections carry synaptic weights. Example: the "MNIST" handwritten-digit database (~1998 → check-reading ATMs).

- **Forward inference** with a fully trained network: "This is a seven."
  - Hardware opportunity: efficient, **low-power** deployment → IBM *TrueNorth*
- **Training** of an UN-trained network: "Um... I have no idea?" → "This is a **seven**."
  - Hardware opportunity: train & use big networks FASTER and at LOWER POWER.

A Deep Neural Network contains multiple layers, each layer containing many **neurons**, and each neuron driven through many synaptic weight connections from other neurons.
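The layer/neuron/weight structure just described can be sketched as a forward pass in NumPy. This is a minimal illustration, not the authors' network: the layer sizes 784-250-125-10, the sigmoid activation, and the random (untrained) weights are all assumptions, so the predicted class is meaningless.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    # Logistic activation; the slides do not specify one, this is an assumption
    return 1.0 / (1.0 + np.exp(-z))

def forward(x: np.ndarray, weights: list) -> np.ndarray:
    """Propagate one input vector through a stack of fully connected layers.

    weights[k] has shape (n_in, n_out); entry [i, j] is the synaptic weight
    from neuron i of one layer to neuron j of the next.
    """
    a = x
    for W in weights:
        z = a @ W          # every neuron j sums a_i * W[i, j] over all its inputs i
        a = sigmoid(z)     # non-linear activation of each neuron
    return a

rng = np.random.default_rng(0)
sizes = [784, 250, 125, 10]   # hypothetical MNIST-style sizes (28x28 input, 10 digit classes)
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = rng.random(sizes[0])      # stand-in for a flattened, normalized 28x28 image
y = forward(x, weights)
print(y.shape, int(np.argmax(y)))   # one output per class; argmax is the "predicted" digit
```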



### **Computation needed for DNN: "Multiply-accumulate"**
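Each neuron's weighted sum is a chain of multiply-accumulate (MAC) operations, and a fully connected layer with n_in inputs and n_out neurons needs n_in × n_out of them per input. A short sketch of both points; the layer sizes are hypothetical and the function names are illustrative.

```python
def mac(x: list[float], w: list[float]) -> float:
    """One neuron's multiply-accumulate: sum_i x_i * w_i."""
    acc = 0.0
    for xi, wi in zip(x, w):
        acc += xi * wi        # one multiply-accumulate (MAC) operation
    return acc

def macs_per_layer(sizes: list[int]) -> list[int]:
    """MACs per fully connected layer for one input: each of the n_out
    neurons performs n_in MACs."""
    return [n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

print(mac([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]))   # 4.5
sizes = [784, 250, 125, 10]                    # hypothetical layer sizes
per_layer = macs_per_layer(sizes)
print(per_layer, sum(per_layer))               # [196000, 31250, 1250] 228500
```

Even this small network needs over 200,000 MACs per image, which is why MAC throughput dominates DNN hardware design.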






Non-Volatile Memory (NVM) technologies include:

- MRAM (Magnetic RAM)
- PCM (Phase-Change Memory)
- RRAM (Resistive RAM)

Like conventional memory (SRAM/DRAM/Flash), an NVM is addressed one row at a time to retrieve previously stored digital data.




### **Multiply-accumulate with Analog Memory**
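An idealized sketch of why an analog crossbar performs the MAC "at the data": Ohm's law gives a current G·V through each device, and Kirchhoff's current law sums those currents along each column wire, so driving all rows at once yields every column's weighted sum in one step. The array size, conductance range, and voltages are hypothetical, and wire resistance, device non-linearity, and peripheral circuits are ignored.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols = 4, 3
G = rng.uniform(1e-6, 20e-6, size=(n_rows, n_cols))   # conductances in siemens (hypothetical)
V = np.array([0.1, 0.0, 0.2, 0.05])                   # read voltages on the rows, in volts

# Conventional memory: read one row at a time, then multiply and add in digital logic
I_digital = np.zeros(n_cols)
for i in range(n_rows):           # n_rows separate read operations
    row = G[i]                    # retrieve the stored values of row i
    I_digital += V[i] * row       # multiply and accumulate outside the array

# Analog crossbar: drive all rows at once; each device passes I_ij = G_ij * V_i
# and each column wire sums them, I_j = sum_i V_i * G_ij
I_analog = V @ G                  # one parallel step for the whole array

print(np.allclose(I_digital, I_analog))
```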






# **DNN in-situ training using analog memory**


1) Forward inference: excitations (x) read weights W
2) Backpropagate errors: deltas (δ) read weights W<sup>T</sup>
3) Weight update: combine **x** and δ → $\Delta W_{ij} \propto x_i \, \delta_j$
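The three steps can be sketched for a single layer stored as a crossbar, with rows driven by inputs and columns collecting outputs. This assumes a linear layer with a squared-error loss and a hypothetical learning rate; the function name `train_step` is illustrative, not an API from the papers.

```python
import numpy as np

def train_step(W: np.ndarray, x: np.ndarray, target: np.ndarray, lr: float):
    """One training step for a single linear layer stored as a crossbar.

    W has shape (n_in, n_out): rows are driven by inputs, columns collect outputs.
    Assumes a linear layer with squared-error loss (not specified on the slide).
    """
    # 1) Forward inference: excitations x drive the rows; each column j
    #    integrates y_j = sum_i x_i * W[i, j]
    y = x @ W
    delta = target - y                 # output error (the deltas)
    # 2) Backpropagate errors: deltas drive the columns and each row i integrates
    #    sum_j W[i, j] * delta_j, i.e. the same array read in the transpose direction
    delta_in = W @ delta               # would feed the previous layer in a deeper network
    # 3) Weight update: Delta W[i, j] proportional to x_i * delta_j (an outer product)
    W_new = W + lr * np.outer(x, delta)
    return W_new, y, delta_in

rng = np.random.default_rng(2)
W = rng.normal(0.0, 0.1, size=(5, 3))
x = rng.random(5)
target = np.array([1.0, 0.0, -1.0])
lr = 0.1                               # hypothetical learning rate

err_before = np.linalg.norm(target - x @ W)
for _ in range(100):
    W, _, _ = train_step(W, x, target, lr)
err_after = np.linalg.norm(target - x @ W)
print(f"error: {err_before:.3f} -> {err_after:.4f}")
```

The update in step 3 only needs the two vectors already present at the edges of the array (x on the rows, δ on the columns), which is what makes it attractive to apply in place.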




# Value Proposition (vs. a GPU)



Three requirements:

- **Accuracy**: essential that final Deep-NN accuracy be indistinguishable from GPUs (the hardest technical challenge)
- **Low Power**: inherent in the physics, but possible to lose in the engineering...
- **Faster**: circuitry must be massively parallel

Where the combinations land:

- All three (sweet spot): rather than buy GPUs, people buy this chip instead for training of Deep-NNs
- Accuracy + Low Power: still of interest for power-constrained situations, e.g., learning-in-cars
- Accuracy + Faster: still of interest for some situations, e.g., learning-in-server-room
- Any other combination (a single property alone, or Low Power + Faster without Accuracy): of zero interest

### High DNN accuracy despite imperfect PCM devices





**Problem:** Conductance changes in PCM are ...

- uni-directional
- stochastic
- non-linear  $\rightarrow$  asymmetric
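A toy phenomenological model of the three problems listed above. It is an assumption for illustration, not the device model from the cited papers: each partial-SET pulse adds a random, never-negative step whose nominal size shrinks as conductance approaches a saturation value. All constants are hypothetical.

```python
import numpy as np

G_MAX = 25.0     # saturation conductance in microsiemens (hypothetical)
ALPHA = 0.1      # fraction of the remaining headroom gained per pulse (hypothetical)
NOISE = 0.3      # relative spread of each step (hypothetical)

def set_pulse(g: float, rng: np.random.Generator) -> float:
    """Apply one partial-SET pulse under a toy model.

    - uni-directional: the step is never negative, so conductance can only grow;
      decreasing it needs a separate, abrupt RESET (not modeled), hence asymmetry
    - non-linear: the nominal step ALPHA * (G_MAX - g) shrinks as g approaches G_MAX
    - stochastic: each step is scaled by a random factor around 1
    """
    nominal = ALPHA * (G_MAX - g)
    step = max(0.0, nominal * rng.normal(1.0, NOISE))
    return min(g + step, G_MAX)

rng = np.random.default_rng(3)
g = 0.0
trace = [g]
for _ in range(30):
    g = set_pulse(g, rng)
    trace.append(g)
steps = np.diff(trace)
print(f"mean of first 5 steps: {steps[:5].mean():.2f} uS, last 5: {steps[-5:].mean():.2f} uS")
```

Identical pulses therefore produce very different conductance changes depending on the device's current state, which is the opposite of the "gentle, symmetric" behavior training would like.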



#### What do we really want?

#### For training...

- Gentle, symmetric conductance changes

### Our published results in DNN training w/ PCM

- **2014** – IEDM → **82%** w/ "mixed-hardware-software" experiment
- **2018** – *Nature* → **98%** (i.e., software-equivalent!) w/ new unit cell


### Novel 2T2R + 3T1C unit cell



- Symmetry
  - Weight update performed on g<sup>+</sup> only
  - g<sup>-</sup> shared among many columns (e.g., 128 columns)
- Dynamic range
  - Gain factor F (e.g., F = 3)
- Non-volatility
  - Weight transferred to PCMs infrequently (every 1000s of images)
- "CMOS variabilities" → counteracted by "polarity inversion" technique
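A toy model of how the unit cell's pieces could combine into one weight. The composition W = F·(G+ − G−) + (g+ − g−), with the capacitor-based pair g+/g− from the slide and a PCM pair G+/G−, is an assumption made for illustration. The class and method names are hypothetical, and the transfer is idealized: the PCM pair is programmed exactly and polarity inversion is ignored.

```python
from dataclasses import dataclass

@dataclass
class UnitCell:
    """Toy model of one weight: a 2T2R PCM pair plus a 3T1C capacitor cell."""
    G_plus: float       # PCM conductance, more significant pair
    G_minus: float
    g_plus: float       # capacitor-based conductance, updated during training
    g_minus: float      # reference (in hardware, shared among many columns)
    F: float = 3.0      # gain factor on the PCM pair (slide example: F = 3)

    def weight(self) -> float:
        # Assumed composition: W = F * (G+ - G-) + (g+ - g-)
        return self.F * (self.G_plus - self.G_minus) + (self.g_plus - self.g_minus)

    def update(self, dw: float) -> None:
        # Symmetry: training updates touch only the capacitor conductance g+
        self.g_plus += dw

    def transfer(self) -> None:
        """Occasionally move the capacitor contribution into the PCM pair.

        Idealized: the correction is placed entirely on G+ or G- (each only
        increases, matching uni-directional PCM), and g+ is reset to g-.
        """
        delta = (self.g_plus - self.g_minus) / self.F
        if delta >= 0:
            self.G_plus += delta
        else:
            self.G_minus -= delta   # delta < 0, so G- grows
        self.g_plus = self.g_minus

cell = UnitCell(G_plus=10.0, G_minus=8.0, g_plus=5.0, g_minus=5.0)
w0 = cell.weight()
for dw in (0.4, -0.1, 0.3):          # many small updates between transfers
    cell.update(dw)
w_before = cell.weight()
cell.transfer()
print(w0, w_before, cell.weight())   # the weight survives the transfer
```

Keeping frequent, small updates on the volatile capacitor and only occasionally committing them to PCM is what lets the cell combine symmetric updates with non-volatile storage.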

S. Ambrogio et al., *Nature*, 558, 60 (2018)

### Accuracy on MNIST and MNIST with noise







S. Ambrogio et al., *Nature*, 558, 60 (2018)






### High DNN accuracy despite imperfect PCM devices

#### What do we really want?

#### For training...

- Gentle, symmetric conductance changes

#### For inference...

- Precise tuning
- High yield
- No change over time
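For PCM, "no change over time" is challenged mainly by conductance drift. A widely used empirical description, not given on the slide and used here as an assumption, is the power law G(t) = G(t0)·(t/t0)^(−ν). The sketch, with hypothetical conductances and drift exponents, shows why a uniform ν could be undone by one global rescaling of the column outputs, while a device-to-device spread in ν cannot.

```python
import numpy as np

def drift(G0: np.ndarray, t: float, nu: np.ndarray, t0: float = 1.0) -> np.ndarray:
    """Empirical power-law drift G(t) = G0 * (t / t0) ** (-nu)."""
    return G0 * (t / t0) ** (-nu)

rng = np.random.default_rng(4)
G0 = rng.uniform(1.0, 20.0, size=(64, 8))  # programmed conductances, microsiemens (hypothetical)
x = rng.random(64)                          # input excitations
y0 = x @ G0                                 # column outputs right after programming

t = 1e4   # seconds after programming (t0 = 1 s)
# Case 1: every device drifts with the same exponent -> outputs shrink by one common factor
y_uniform = x @ drift(G0, t, np.full_like(G0, 0.05))
# Case 2: the exponent varies from device to device -> outputs are distorted, not just scaled
y_spread = x @ drift(G0, t, rng.normal(0.05, 0.02, size=G0.shape))

def residual_after_global_rescale(y: np.ndarray) -> float:
    """Relative error left after the best single scale factor is applied."""
    scale = (y @ y0) / (y @ y)      # least-squares fit of y0 ~ scale * y
    return float(np.linalg.norm(scale * y - y0) / np.linalg.norm(y0))

print(residual_after_global_rescale(y_uniform))  # ~0: fully correctable
print(residual_after_global_rescale(y_spread))   # > 0: not fully correctable
```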

#### Our recent results in DNN inference w/ PCM

- **2019** – *Adv. Electr. Mater.* → programming schemes for weights built from 4 PCM devices (simulations)
- **2019** – VLSI Tech. Symp. → software-equivalent accuracy in a "mixed-hardware-software" experiment with Long Short-Term Memory (LSTM) networks (T8-1: Wed. June 12<sup>th</sup>, 10:30am)


### **Programming strategies for multi-PCM weights**



- Minimize computation expense
- Minimize area cost
- 2 bits per weight (p, s)
- Program entire row in parallel
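Why spread one weight across several PCM devices of different significance? The sketch below is a hypothetical two-stage scheme, not the programming strategy of Mackin et al.: it programs the more significant pair open-loop, reads back the result, and places the residual error on the less significant pair. The weight formula W = F·(G+ − G−) + (g+ − g−), the gain-error variability model, F, and G_MAX are all assumptions.

```python
import numpy as np

F = 3.0        # significance factor between the two PCM pairs (hypothetical)
G_MAX = 20.0   # maximum programmable conductance per device, microsiemens (hypothetical)

def program(target: float, rng: np.random.Generator) -> float:
    """Toy open-loop programming of one PCM device: a device-specific gain
    error (device-to-device variability), clipped to [0, G_MAX]."""
    gain = rng.normal(1.0, 0.1)
    return float(np.clip(target * gain, 0.0, G_MAX))

def program_weight(w: float, rng: np.random.Generator) -> tuple[float, float]:
    """Hypothetical two-stage scheme for W = F*(G+ - G-) + (g+ - g-).
    Returns (|error| after the first pair only, |error| after both pairs)."""
    # Stage 1: coarse value on the more significant pair (one device of the pair per sign)
    coarse = w / F
    Gp = program(max(coarse, 0.0), rng)
    Gm = program(max(-coarse, 0.0), rng)
    w1 = F * (Gp - Gm)
    # Stage 2: read back the achieved value, put the residual on the less significant pair
    r = w - w1
    gp = program(max(r, 0.0), rng)
    gm = program(max(-r, 0.0), rng)
    w2 = w1 + (gp - gm)
    return abs(w - w1), abs(w - w2)

rng = np.random.default_rng(5)
targets = rng.uniform(-40.0, 40.0, size=1000)
errs = np.array([program_weight(w, rng) for w in targets])
print(f"mean |error| after first pair: {errs[:, 0].mean():.3f}")
print(f"mean |error| after both pairs: {errs[:, 1].mean():.3f}")
```

Because the second pair only has to represent a small residual, its own programming error is small in absolute terms, so the combined weight is more precise than either pair alone.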



C. Mackin et al., *Adv. Electr. Mater.*, 1900026 (2019)

### Impact on Network Accuracy

- Two different types of networks
- Multiple parameters
- Software-equivalent accuracy despite NVM variability

Device parameters: $\mu_{G_{max}}$, $\sigma_{G_{max}}$, $\mu_{S_G}$, $\sigma_{S_G}$

### IBM Research worldwide team: A comprehensive approach to Analog AI



### **IBM Research AI Hardware Center**


The IBM Research AI Hardware Center is a global research hub headquartered in Albany, New York. The center is focused on enabling next-generation chips and systems that support the tremendous processing power and unprecedented speed that AI requires to realize its full potential.


### *ibm.biz/AI\_hardware*

### www.ibm.com/blogs/research/2019/02/ai-hardware-center/


### Where are we on the Roadmap?




AI roadmap from IBM AI Hardware Center announcement

www.ibm.com/blogs/research/2019/02/ai-hardware-center/

H.-Y. Chang et al., *IBM J. R&D,* invited paper, accepted May 2019


### How can we further improve energy efficiency w/ NVM devices?

- 1) **Reduce average NVM conductance** → reduces array currents during multiply-accumulates
  - → Current focus of various material and device design efforts
- 2) Reduce technology node (90nm → 14nm)
  - Benefits even just from scaling of routing energy

Area efficiency for inference: 10-70 TOPs/sec/mm<sup>2</sup> (vs. ~0.3 TOPs/sec/mm<sup>2</sup> for TPU v1, per "In-Datacenter Performance Analysis of a Tensor Processing Unit")

> H.-Y. Chang et al., IBM J. R&D, invited paper, accepted May 2019
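The first lever follows from Ohm's law: each device carries I = G·V, so it dissipates V²·G, and array power scales linearly with average conductance. A back-of-the-envelope sketch with a hypothetical array size, read voltage, and conductances, assuming all rows are driven at full voltage with ideal wires; the script also works out the area-efficiency ratio implied by the numbers above.

```python
def read_power(n_rows: int, n_cols: int, g_avg: float, v_read: float) -> float:
    """Power dissipated in the devices when every row is driven at v_read:
    each device carries I = G * V (Ohm's law), so P = V^2 * G per device."""
    return n_rows * n_cols * g_avg * v_read ** 2

# Hypothetical 512 x 512 array read at 0.2 V
p_ref = read_power(512, 512, g_avg=10e-6, v_read=0.2)   # 10 uS average conductance
p_low = read_power(512, 512, g_avg=1e-6, v_read=0.2)    # 1 uS average conductance
print(f"{p_ref * 1e3:.1f} mW -> {p_low * 1e3:.1f} mW")

# Area-efficiency ratio implied by 10-70 vs. ~0.3 TOPs/sec/mm^2
for tops in (10, 70):
    print(f"{tops} / 0.3 = {tops / 0.3:.0f}x")
```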





# Conclusion

- NVM-based crossbar arrays can accelerate Deep Machine Learning compared to GPUs
  - Multiply-accumulate performed at the data → saves power and time
  - But conventional NVM devices (like PCM) are imperfect...
- Recent training results
  - Mixed-hardware-software experiments → software-equivalent training accuracy
    - 2T2R+3T1C unit cell
    - "Polarity inversion" technique
    - MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100 tested (S. Ambrogio et al., *Nature*, 558, 60 (2018))
- Recent inference results
  - Programming strategies for 4-PCM-based weights
  - Mixed-hardware-software experiments on LSTM
  - (C. Mackin et al., *Adv. Electr. Mater.*, 1900026 (2019)); (H. Tsai et al., *VLSI Tech. Symp.* (2019))
- Recent power projections based on real circuit designs
  - 100x better energy efficiency (+ 100x speedup) on fully-connected layers (for LSTM and other networks)
  - (H.-Y. Chang et al., *IBM J. R&D* (2019))

### pnaraya@us.ibm.com


### **Acknowledgements**

- Geoffrey Burr
- Bob Shelby
- Pritish Narayanan
- Stefano Ambrogio
- An Chen
- Charles Mackin
- Hosokawa
- Scott Lewis
- Hsinyu Tsai

### **Management Support**

- Vijay Narayanan
- Heike Riel
- Matthew BrightSky
- Kumar
- Spike Narayan
- Winfried Wilcke
- Bulent Kurdi
- Haensch

### **NVM-for-Machine Learning: Recent & upcoming papers**

1. G. W. Burr, R. M. Shelby, et al., "Neuromorphic computing using non-volatile memory," *Advances in Physics X*, 2(1), 89-124 (2017).
   - Review of the NVM-for-neuromorphic field as a whole
2. P. Narayanan, A. Fumarola, et al., "Towards on-chip acceleration of the backpropagation algorithm using non-volatile memory," *IBM Journal of Research and Development*, 61(4/5), 11:1-11 (2017).
   - Summarizes the circuit design challenges
3. H. Tsai, S. Ambrogio, et al., "Recent progress in analog memory-based accelerators for Deep Learning," *Journal of Physics D*, 51(28), 283001 (2018).
   - Review & overview paper
4. S. Ambrogio, P. Narayanan, et al., "Equivalent-accuracy Neuromorphic Hardware Acceleration of Neural Network Training using Analog Memory," *Nature*, 558(7708), 60 (2018).
   - Demonstrates software-equivalent accuracy on training of fully-connected networks w/ PCM-based mixed hardware-software experiment
5. G. Cristiano, M. Giordano, et al., "Perspective on training fully connected networks with resistive memories: Device requirements for multiple conductances of varying significance," *Journal of Applied Physics*, 124(15), 151901 (2018).
   - How does our multiple-conductance idea change the specifications for NVM devices needed for training?
6. C. Mackin, H. Tsai, et al., "Weight Programming in DNN Analog Hardware Accelerators in the Presence of NVM Variability," *Advanced Electronic Materials*, 1900026 (2019).
   - How to accurately program multiple-conductance weights using NVM devices with device-to-device variability?
7. H. Tsai, S. Ambrogio, et al., "Inference of Long-Short Term Memory networks at software-equivalent accuracy using 2.5M analog Phase Change Memory devices," *VLSI Technology Symposium*, to be given (2019).
   - Demonstrates software-equivalent accuracy on inference of LSTM networks w/ PCM-based mixed hardware-software experiment
8. H.-Y. Chang, P. Narayanan, et al., "AI hardware acceleration with analog memory: micro-architectures for low energy at high speed," *IBM Journal of Research and Development*, to appear (2019).
   - Micro-architectural approaches that lead to both high energy efficiency AND large DNN acceleration


### **NVM-for-Machine Learning: IBM Collaborators**

9. S. Kim et al., "Analog CMOS-based Resistive Processing Unit for Deep Neural Network Training," arXiv preprint 1706.06620.
10. T. Gokmen et al., "Acceleration of deep neural network training with resistive cross-point devices: design considerations," *Frontiers in Neuroscience*, vol. 10, p. 333, Jul. 2016.
11. Y. Li et al., "Capacitor-based Cross-point Array for Analog Neural Network with Record Symmetry and Linearity," *VLSI Technology Symposium*, 2018.
12. M. Le Gallo et al., "Mixed-precision training of deep neural networks using computational memory," arXiv preprint 1712.01192.
13. I. Boybat et al., "Neuromorphic computing with multi-memristive synapses," *Nature Communications*, vol. 9(1), p. 2514, June 2018.
14. A. Sebastian et al., "Temporal correlation detection using Computational Phase Change Memory," *Nature Communications*, vol. 8, p. 1115, Oct. 2017.
15. S. R. Nandakumar et al., "Supervised learning in spiking neural networks with MLC PCM synapses," *Device Research Conference*, 2017.
16. Gong et al., "Signal and Noise Extraction from Analog Memory Elements for Neuromorphic Computing," *Nature Communications*, vol. 9(1), p. 2102, May 2018.
17. M. Salinga et al., "Monatomic phase change memory," *Nature Materials*, vol. 17, pp. 681-695, June 2018.
18. I. Giannopoulos et al., "8-bit Precision In-Memory Multiplication with Projected Phase-Change Memory," *IEDM* 2018.
19. J. Tang et al., "ECRAM as Scalable Synaptic Cell for High-Speed, Low-Power Neuromorphic Computing," *IEDM* 2018.

### NVM-for-Machine Learning: (Some) Non-IBM Work



- S. B. Eryilmaz et al., "Neuromorphic architectures with electronic synapses," International Symposium on Quality Electronic Design (ISQED), Mar. 2016.
- S. B. Eryilmaz et al., "Device and system level design considerations for analog-non-volatile-memory based neuromorphic architectures," IEEE International Electron Devices Meeting (IEDM) 2015, pp. 4.1.1-4.1.4.
- S. Yu, "Neuro-Inspired Computing With Emerging Nonvolatile Memorys," *Proceedings of the IEEE*, vol. 106, no. 2, pp. 260-285, Feb. 2018.
- P. Y. Chen et al., "NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures," IEEE International Electron Devices Meeting (IEDM) 2017, pp. 6.1.1-6.1.4.
- E. J. Fuller et al., "Li-Ion Synaptic Transistor for Low Power Analog Computing," *Advanced Materials*, 29(4), 2017.
- S. Agarwal *et al.*, "Achieving ideal accuracies in analog neuromorphic computing using periodic carry," Symposium on VLSI Technology, 2017, pp. T174-T175.
- CrossSim: https://cross-sim.sandia.gov/
- X. Guo *et al.*, "Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology," IEEE International Electron Devices Meeting (IEDM) 2017, pp. 6.5.1-6.5.4.
- M. Prezioso et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," *Nature*, vol. 521, pp. 61-64, 2015.
- K. Moon *et al.*, "High density neuromorphic system with Mo/Pr<sub>0.7</sub>Ca<sub>0.3</sub>MnO<sub>3</sub> synapse and NbO<sub>2</sub> IMT oscillator neuron," IEEE International Electron Devices Meeting (IEDM) 2015, pp. 17.6.1-17.6.4.
- C. Li et al., "Analogue signal and image processing with large memristor crossbars," *Nature Electronics*, vol. 1, pp. 52-59, 2018.
- S. Ambrogio et al., "Spike-timing dependent plasticity in a transistor-selected resistive switching memory," *Nanotechnology*, 24, 384012, 2013.
- E. Vianello et al., "Resistive Memories for Spike-Based Neuromorphic Circuits," 2017 IEEE International Memory Workshop (IMW), 2017.
