Hardware for Deep Learning. Part 4: ASIC

This is the part about ASICs from the "Hardware for Deep Learning" series. The content of the series is here. As of the beginning of 2021, ASICs are now the only real alternative to GPUs for 1) deep learning…

AWS

Amazon has its own solutions for both training and inference.

AWS Inferentia

AWS Inferentia was announced in November 2018. It was designed by Annapurna Labs, a subsidiary of Amazon.

Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic-array matrix-multiply engine (like Google's TPU). NeuronCores are also equipped with a large on-chip cache (though the exact size is not disclosed). [source]
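
To make the engine idea concrete, here is a toy NumPy simulation of an output-stationary systolic array: a sketch of the general technique, not Inferentia's (undisclosed) microarchitecture.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy simulation of an output-stationary systolic array computing A @ B.

    PE (i, j) accumulates C[i, j]. A-values stream in from the left edge,
    B-values from the top edge, skewed in time so that matching operands
    meet at the right PE on the right cycle.
    """
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    a_reg = np.zeros((n, m))  # A-value currently held by each PE
    b_reg = np.zeros((n, m))  # B-value currently held by each PE
    for t in range(n + m + k - 2):  # enough cycles to drain the pipeline
        # Each cycle, A-values move one PE to the right, B-values one PE down.
        a_reg = np.roll(a_reg, 1, axis=1); a_reg[:, 0] = 0
        b_reg = np.roll(b_reg, 1, axis=0); b_reg[0, :] = 0
        # Feed skewed inputs at the edges: row i and column j lag by i and j cycles.
        for i in range(n):
            if 0 <= t - i < k:
                a_reg[i, 0] = A[i, t - i]
        for j in range(m):
            if 0 <= t - j < k:
                b_reg[0, j] = B[t - j, j]
        # Every PE performs one multiply-accumulate per cycle.
        C += a_reg * b_reg
    return C

A, B = np.random.rand(3, 4), np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The appeal of this structure is that every processing element does useful work on every cycle while operands pass only between neighbors, which is what makes such engines so area- and power-efficient for matrix multiplication.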

AWS Inferentia supports FP16, BF16, and INT8 data types. Furthermore, Inferentia can take a 32-bit trained model and automatically run it at the speed of a 16-bit model using BF16.
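
This works because BF16 keeps FP32's sign bit and 8-bit exponent (so the dynamic range of the 32-bit model is preserved) and simply drops mantissa precision. A minimal NumPy sketch of the conversion, assuming plain truncation (real hardware may round to nearest):

```python
import numpy as np

def fp32_to_bf16(x):
    """Emulate FP32 -> BF16 by zeroing the low 16 bits of each float.

    The result keeps FP32's exponent but only 7 mantissa bits,
    i.e. roughly 2-3 decimal digits of precision.
    """
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

print(fp32_to_bf16(0.1234567))  # [0.12304688]: same magnitude, fewer digits
```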

Each chip can deliver 64 TFLOPS on FP16 and BF16, and 128 TOPS on INT8 data. (source)
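
A quick sanity check on these figures (assuming, though it is not confirmed, that the headline numbers split evenly across the four NeuronCores):

```python
chip_fp16_tflops = 64   # FP16/BF16, per chip
chip_int8_tops = 128    # INT8, per chip
neuron_cores = 4

print(chip_fp16_tflops / neuron_cores)    # 16.0 TFLOPS per NeuronCore
print(chip_int8_tops / neuron_cores)      # 32.0 TOPS per NeuronCore
print(chip_int8_tops / chip_fp16_tflops)  # 2.0: INT8 doubles the op rate
```

The 2x ratio is the usual pattern: halving the operand width doubles the number of operations the same multiply-accumulate hardware can perform per cycle.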

You can have up to 16 Inferentia chips per EC2 Inf1 instance. Inferentia is optimized to maximize throughput at small batch sizes, which benefits applications with strict latency requirements.
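
To see why small-batch throughput matters for latency, consider a back-of-the-envelope sketch (the numbers below are invented for illustration, not Inferentia measurements):

```python
def request_latency_ms(batch_size, inferences_per_second):
    """A request queued into a batch waits for the whole batch to complete."""
    return 1000.0 * batch_size / inferences_per_second

# An accelerator that needs batch 64 to reach its peak rate:
print(request_latency_ms(64, 10_000))  # 6.4 ms per request
# One that already runs near peak at batch 4 can serve much stricter SLAs:
print(request_latency_ms(4, 8_000))    # 0.5 ms per request
```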

The AWS Neuron SDK consists of a compiler, runtime, and profiling tools. It enables complex neural net models, created and trained in popular frameworks such as TensorFlow, PyTorch, and MXNet, to be executed on Inf1 instances. AWS Neuron can also split large models for execution across multiple Inferentia chips using a high-speed physical chip-to-chip interconnect.
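
As a sketch, compiling a trained PyTorch model for Inf1 looks roughly like this (based on the torch-neuron package; treat the exact calls as illustrative rather than authoritative):

```python
import torch
import torch_neuron  # registers the Neuron backend with PyTorch (pip: torch-neuron)
from torchvision import models

# Any trained model plus an example input for tracing.
model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# Ahead-of-time compilation for NeuronCores; operators the compiler
# does not support are automatically left on the CPU.
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# The compiled artifact is a TorchScript module; on an Inf1 instance,
# loading it with torch.jit.load runs it on the Inferentia chips.
model_neuron.save('resnet50_neuron.pt')
```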

Technical details on Inferentia are very scarce.

AWS Trainium

On December 1st, 2020, Amazon announced its AWS Trainium chip.

AWS Trainium is the second custom machine learning chip designed by AWS and it's targeted at training models in the cloud.

AWS Trainium shares the same AWS Neuron SDK as AWS Inferentia, so it's integrated with TensorFlow, PyTorch, and MXNet.

AWS Trainium will be available in 2021.

For now, almost no technical details are available.

Huawei Ascend

Huawei has its own solutions for both training and inference as well. The lineup of AI products is pretty vast, but we'll focus on accelerator cards…
Grigory Sapunov