This podcast reviews Google's seminal 2017 research paper, “Attention Is All You Need,” which we see as the seed of the current round of innovation in generative AI. The simplified transformer architecture described in this paper allows for more efficient training while producing higher-quality results.
We start the conversation by examining why an architecture such as the transformer is needed in AI use cases. Roman recalls the open-source software project Apache TVM (Tensor Virtual Machine), an abstraction layer that decouples deep learning applications from hardware implementations.
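As a rough illustration of the kind of decoupling Roman describes, here is a minimal sketch of TVM's Relay compilation flow. The model file name, input name, and shapes are placeholders for illustration only, not anything discussed in the episode.

```python
# A minimal sketch of TVM's Relay flow, assuming a trained model exported to
# ONNX as "model.onnx" (hypothetical file; input name and shape are assumed).
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Import the model into Relay, TVM's hardware-independent intermediate representation.
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}  # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a specific backend. Swapping the target string (e.g. "llvm" for
# CPU, "cuda" for NVIDIA GPUs) is the point of the abstraction layer:
# the application-level code above does not change.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run the compiled module on the chosen device.
dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
```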
A standard architecture such as the transformer offers a chance to standardize kernel execution, allowing both hardware and software to be optimized for acceleration and cost reduction.
“Running around, reinventing the wheel all the time is just not good from any perspective,” adds Roman.
Roman suggests we need a standard AI kernel, similar to a Java Virtual Machine (JVM). The JVM provides an abstraction layer: on one side, language developers are free to support applications and compile them all down to standard JVM bytecode; on the other, hardware developers can support execution of that bytecode directly in hardware.
An open-standard AI kernel could similarly unlock more innovation at the application, library, and hardware levels, helping us further improve performance while driving costs down.
We then jump into a quick overview of Google's 2017 paper and discuss the transformer architecture, which radically simplifies neural networks while also improving parallelism. This delivers better results without greatly increasing computational cost. Roman points out that the architecture wasn't designed for generative AI but specifically for machine translation, and it is easier to understand in that context: like a human translator, you parse the source text and then generate the translated text.
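To make “simplifies neural networks while improving parallelism” concrete, here is a minimal NumPy sketch of the scaled dot-product attention operation the paper is built around. The shapes and random inputs are arbitrary; they only illustrate that an entire sequence is processed with a few matrix multiplications rather than token by token.

```python
# A minimal sketch of scaled dot-product attention, the core Transformer
# operation; shapes and values here are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # weighted sum of values

# Every token attends to every other token in one batched computation,
# which is why the architecture parallelizes well on modern hardware.
tokens, d_k = 4, 8
Q = np.random.rand(tokens, d_k)
K = np.random.rand(tokens, d_k)
V = np.random.rand(tokens, d_k)
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```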
It turns out that this simplified architecture is useful for a wide range of applications, from generative AI to computer vision. The industry is already beginning to standardize on it, and there's an opportunity to use it as the basis for an abstraction layer between software and hardware.
I recommend checking out the video replay of this talk if this topic interests you. Also, please subscribe to the AIFoundry.org calendar to keep informed of our upcoming podcast episodes and community AI Hack Labs.