A Skeleton Key for AI Hardware Experimentation

For those at the intersection of AI hardware and software, the open source Apache TVM effort is already well known and in use among a number of chipmakers and developers. It is a machine learning compiler framework that can target devices from the edge to the datacenter with optimized configurations, no matter the target hardware.

If it wasn't already in use by AMD, Qualcomm, Arm, Xilinx, Amazon, and many others, it might smack of that "magic compiler" mojo some of the AI chip startups began with a few years ago. The idea that machine learning models don't need to be uniquely hand-tailored to individual hardware devices expands the potential for hardware startups and established vendors alike. And now might be its time to really shine as a standard base that lets new AI hardware roll into production without placing a heavy burden on users to adopt an architecture-specific approach. There are, after all, plenty of devices for ML acceleration to choose from.

"There has been a proliferation of hardware targets and that has been fragmented, and so too has the software ecosystem around those. There's TensorFlow, Keras, PyTorch, and so on, not to mention the increasingly complex interplay between ML models, software frameworks, and hardware," says Luis Ceze, professor at the University of Washington and co-founder and co-CEO of the TVM-driven startup OctoML. "The way these software stacks work now are use case specific or hardware specific (cuDNN, ROCm, etc.) and these are all optimized by hand via an army of low-level engineers scrubbing linear algebra codes that connect the ML operator in the models into the hardware with a tuned hardware library." This has worked well for the vendors, but it is limiting when it comes to capitalizing on the "Cambrian Explosion" of hardware devices.

In essence, TVM is a compiler plus runtime stack with a collection of intermediate representations that translate models expressed in high-level frameworks (TensorFlow, PyTorch, etc.) into something that can be "re-targetable" to different hardware architectures, anything from server-class GPUs to low-end mobile CPUs or even MIPS or RISC-V. This is…
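The re-targeting idea can be pictured with a toy sketch. This is not TVM's API: the `from_framework` importer, the `build` function, the `KERNELS` table, and the target names `generic_cpu` and `toy_accelerator` are all hypothetical names invented for illustration. It only shows the general shape, assuming a model is first lowered to a hardware-neutral intermediate representation and then bound to target-specific kernels.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical hardware-neutral IR: an ordered list of (operator name, constant).
IR = List[Tuple[str, float]]


def from_framework(layers: List[dict]) -> IR:
    """Stand-in for a framework importer: high-level model description -> IR."""
    return [(layer["type"], float(layer.get("value", 0.0))) for layer in layers]


# Each illustrative "target" supplies its own kernel for every operator.
KERNELS: Dict[str, Dict[str, Callable[[Vector, float], Vector]]] = {
    "generic_cpu": {
        "scale": lambda x, c: [v * c for v in x],
        "bias": lambda x, c: [v + c for v in x],
        "relu": lambda x, _: [max(v, 0.0) for v in x],
    },
    "toy_accelerator": {
        # Same semantics, different implementation style (map-based kernels).
        "scale": lambda x, c: list(map(lambda v: v * c, x)),
        "bias": lambda x, c: list(map(lambda v: v + c, x)),
        "relu": lambda x, _: list(map(lambda v: v if v > 0.0 else 0.0, x)),
    },
}


def build(ir: IR, target: str) -> Callable[[Vector], Vector]:
    """Bind each IR operator to the chosen target's kernel and return a runner."""
    if target not in KERNELS:
        raise ValueError(f"unknown target: {target}")
    kernels = KERNELS[target]
    steps = [(kernels[op], arg) for op, arg in ir]  # resolved once, at build time

    def run(x: Vector) -> Vector:
        for kernel, arg in steps:
            x = kernel(x, arg)
        return x

    return run


# One model description, imported once, built for two different targets.
model = [
    {"type": "scale", "value": 2.0},
    {"type": "bias", "value": -3.0},
    {"type": "relu"},
]
ir = from_framework(model)
outputs = {target: build(ir, target)([0.5, 1.0, 4.0]) for target in KERNELS}
print(outputs)
```

The point of the sketch is that the model is described once and only the kernel table changes per target. A real compiler stack would also optimize the intermediate representation and generate code rather than look up prewritten functions.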