Building an open ecosystem for AI requires asking questions, and only a rough consensus on the answers
During last week’s podcast, “The Search for the Transformer Model Format,” someone in the audience asked a very understandable question: “Why can’t we just use what’s already here? Why do we need a new standard?” You can listen to the question, and the answer Roman and I gave, here. The nature of these kinds of questions is what inspired us to write this blog.
It's typical for people to ask “why not just…?” questions. They come from folks who want to evolve from what they already know or have invested in, and to avoid disruption. This thinking makes sense in mature markets and for technology products with significant adoption. However, we at AIFoundry.org feel that these are still the early days of Generative AI, for reasons we will discuss later in this blog. So we believe it is essential to be extremely curious, to question the assumptions people are making and the restrictions they think they are seeing. It's important to ask both the “why” and the “why not” questions.
AI has been an academic topic in Computer Science for decades. It's only with the recent arrival of Generative AI that it is now receiving new attention and excitement. Until now, AI products were rarely expected to meet requirements for stability, availability, or efficiency. AI simmered inside academia and big-tech research labs. As a result, the machine learning community had little in common with the software engineering and DevOps communities.
Now, the inventors of AI find themselves having to serve end-user adoption. They are on the hard road of turning their inventions into innovations. This means adopting the practices and expectations of the mature engineering world, integrating into existing processes and environments, and satisfying customers’ security and operability concerns – all new requirements.
When you go to AI meetups and talk to startup founders or investors about AI these days, you’ll hear the typical positioning: “the DockerHub for AI models” or “the Kubernetes for AI infrastructure.” It seems the best anybody can do is explain their approach in the language of the last generation’s ideas. We would argue that this thinking limits the imagination, much as when people used to call automobiles horseless carriages. This is why we advocate for questioning everything about AI, not just seeing how we could make do with the current state of technology.
Nothing is settled in AI yet, so everything is still fair game. The main difference from previous tech waves is that generative AI is advancing extremely fast on every front at once. Previous technologies took years to grow communities and gain adoption; the technologies and products had time to grow with their markets.
AI, on the other hand, is under pressure to be adopted, scaled, supported, and stable, even though some of its essential building blocks depend on a handful of individuals for their development. Take, for example, llama.cpp, the widely adopted inference engine that makes it possible to run LLMs on smaller systems such as laptops. Its actual committers are a small, insular group with its own internal drama, and unlike mature open source projects such as PostgreSQL, it lacks a foundation or other organization to ensure its ongoing survival.
We want this chaotic market to stabilize before it crumbles. Many try to apply best practices from past approaches to avoid the pain of learning them from scratch. It would be great if there were a well-developed ecosystem of support for AI, like the one Docker provided for containers and Red Hat built for Linux, but we have yet to arrive there.
To speed up innovation while reducing chaos, we do need some areas of agreement, or at least rough consensus: common standards, ways to collaborate, knowledge transfer, and shared ecosystem elements that allow innovation to happen in parallel, in interconnected ways.
That’s why, when we hear questions like “Why don’t you just use PyTorch for inference?”, what comes to mind are the industry’s complaints about how much work it takes to optimize hardware implementations for inference with Python. If you have the money and time, you can certainly take your turn optimizing CUDA, leaving aside your dependency on Nvidia for now. But what if we take a step back, challenge why PyTorch has won so far, and ask whether better alternatives might exist for other use cases, or even provide broader coverage across the industry?
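To make the contrast concrete, here is a minimal sketch (not from the podcast, and using a hypothetical model path) of the two routes the “why not just PyTorch?” question glosses over: full PyTorch inference through Hugging Face transformers, versus a quantized GGUF model running through llama.cpp’s Python bindings.

```python
# Route 1: PyTorch via Hugging Face transformers -- flexible and ubiquitous,
# but the heavy Python/CUDA stack is exactly what teams end up optimizing
# when they take inference to production.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Why not just use PyTorch for inference?", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))

# Route 2: llama.cpp via the llama-cpp-python bindings -- a quantized GGUF
# file running on a laptop CPU, with no PyTorch or CUDA involved.
from llama_cpp import Llama

llm = Llama(model_path="models/example-7b.Q4_K_M.gguf")  # hypothetical path
out = llm("Why not just use PyTorch for inference?", max_tokens=32)
print(out["choices"][0]["text"])
```

Neither snippet is a verdict. The point is that the “obvious” default and the alternative are only a few lines apart, which is exactly why the “why not just” question deserves a considered answer rather than a reflex.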
We’ll leave you with one last thought: when you find yourself asking “why” or “why not” questions, consider giving weight to the answers that allow more people to contribute and adopt, and that accelerate innovation and openness. Don’t just default to the first emerging status quo, the one that gives early monopolists the power to prematurely build moats.
If this kind of discussion interests you, we invite you to join our future AIFoundry.org podcasts as scheduled in the calendar below. We hope to see you there!
AIFoundry.org Lu.ma calendar: https://lu.ma/aifoundryorg
Join our Discord: https://discord.gg/rxDX7hr5Xs