Welcome to another episode of our podcast! Today, we’re diving deep into the intricacies of AI tool calling models and their application, specifically through the lens of Rubra AI and one of its founders, Sanjay Nadhavajhala.
The Challenge with Function Calling in Open Models
As we all know, the world of AI is rapidly evolving, and with the release of GPTScript earlier this year, Acorn Labs made a bold bet on OpenAI models to perform function calling. The reality, however, was that outside of OpenAI there were few options, with providers like Anthropic offering tool use only as a limited beta feature. By summer, even as the open-source community eagerly awaited progress, nothing significant had emerged to address function calling, prompting Rubra to take the plunge and build its own solution: https://docs.rubra.ai/
Rubra's goal was to create function calling models that could handle complex operations efficiently. Function calling in this context means giving the model the ability to invoke specific tools dynamically, expanding its reach far beyond standard chat interactions. It lets the model pull in real-time data and act on it, making it not only conversational but also action-oriented.
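To make this concrete, here is a minimal sketch of a tool definition in the OpenAI-style JSON schema that most function-calling models are trained against; the get_weather tool itself is invented for illustration:

```python
# A hypothetical tool definition in the OpenAI-style function-calling schema.
# The model never runs this code; it only learns to emit a call that names
# the function and fills in arguments matching the declared parameters.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Fetch the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
            },
            "required": ["city"],
        },
    },
}
```

The host application is what actually executes the call; the model's job is to decide when a tool is needed and to produce well-formed arguments for it.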
Diving into the Technical Challenges
But what are the actual technical hurdles here? Sanjay from Rubra AI breaks it down into three significant obstacles: inadequate datasets, catastrophic forgetting, and the poor distribution of data in base models. He explained that existing open-source datasets for multi-turn function calling were insufficient: they lacked the complexity needed for effective function calls, particularly multi-turn ones where the model must call several tools in succession based on user feedback.
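To illustrate what such data has to capture, here is an invented multi-turn trace showing the general shape a function-calling dataset needs: the model must chain tool calls across turns based on intermediate results, not just emit a single call. The tool names and messages here are hypothetical:

```python
# An invented multi-turn training example: the assistant inspects a pod,
# reads the tool's result, then decides a second tool call is needed.
multi_turn_example = [
    {"role": "user", "content": "Is the payments service healthy? Fix it if not."},
    {"role": "assistant", "tool_calls": [
        {"name": "kubectl_get_pods", "arguments": {"namespace": "payments"}}]},
    {"role": "tool", "name": "kubectl_get_pods",
     "content": "payments-7d4f9b  0/1  CrashLoopBackOff"},
    {"role": "assistant", "tool_calls": [
        {"name": "kubectl_rollout_restart", "arguments": {"deployment": "payments"}}]},
    {"role": "tool", "name": "kubectl_rollout_restart", "content": "restarted"},
    {"role": "assistant", "content": "The pod was crash-looping, so I restarted the deployment."},
]
```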
To address this, Rubra curated its own data, harvesting examples from GPT-4 function executions and creating a set of over 500,000 synthetic examples. They fed this data to the 70-billion parameter models they were training, refining them to learn these tasks more deeply. This groundwork was crucial because naive fine-tuning for function calling often triggers severe catastrophic forgetting: the model gains the new skill but loses previously learned capabilities in the process.
Block Expansion—A Key Technique
One method that stood out in Rubra's journey was block expansion, in which new transformer blocks are inserted into a model and trained while every original weight stays frozen. Because the base weights never change, previously learned capabilities are preserved, while the new blocks adapt to the specific task, in this case function calling. This form of fine-tuning enabled Rubra to build models that perform these actions reliably.
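Here is a rough sketch of what block expansion of this kind can look like in PyTorch with Hugging Face Transformers. The base model, the expansion interval, and the choice of projections to zero out are assumptions for illustration, not Rubra's actual recipe:

```python
import copy
import torch
from transformers import AutoModelForCausalLM

# Load a Llama-family base model (an assumption; not necessarily what Rubra used).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze every existing parameter so the base model cannot forget what it knows.
for p in model.parameters():
    p.requires_grad = False

expanded = torch.nn.ModuleList()
for i, layer in enumerate(model.model.layers):
    expanded.append(layer)
    if (i + 1) % 8 == 0:  # insert one new block after every 8 original blocks
        new_block = copy.deepcopy(layer)
        # Zero the output projections so the new block starts as an identity
        # mapping: its residual branches contribute nothing at initialization.
        new_block.self_attn.o_proj.weight.data.zero_()
        new_block.mlp.down_proj.weight.data.zero_()
        for p in new_block.parameters():
            p.requires_grad = True  # only the inserted blocks are trainable
        expanded.append(new_block)

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)
# (For inference with KV caching, each layer's cached index would also need
# renumbering; omitted here to keep the sketch short.)
```

Because only the inserted blocks carry gradients, the optimizer state and memory footprint stay small, which is what makes this kind of training feasible on modest hardware.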
Beyond Llama: Using Prototypes and Experiments
Rubra took inspiration from research papers like Llama Pro to refine their function calling models further. The Llama Pro paper outlines a strategy for expanding the depth of a network by interleaving new blocks, a method that has gained considerable traction recently. By adding depth this way, Rubra was able to significantly improve its models without reconfiguring the entire architecture.
Sanjay mentioned that training these models didn't require expansive, costly hardware setups. In fact, they used just a few GPUs to train the smaller versions of their models, making similar results attainable for small research teams and companies. This point was particularly striking because it dispels the myth that only large companies with vast resources can afford to fine-tune language models: Rubra was able to train the block expansion layers on a consumer-grade GPU like an RTX 4090, showing how practical the approach is.
Use Cases and Observations
So what exactly are people using Rubra models for? The most common applications are in DevOps and technical support scenarios. The models are particularly useful for debugging Kubernetes clusters, where knowledge of various terminal commands is essential. While GPT-4 is excellent at these tasks, smaller Rubra models can be tuned for DevOps tools like the AWS CLI or kubectl, making them more accessible to general developers.
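As a sketch of how that wiring can work, here is a hypothetical host-side tool that runs a kubectl command and feeds the result back to the model; the function name and dispatch format are invented for illustration:

```python
import json
import subprocess

def kubectl_get_pods(namespace: str) -> str:
    """Run 'kubectl get pods' in a namespace and return the raw output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout or result.stderr

# Suppose the model emitted this tool call; the host parses and dispatches it,
# then returns the output to the model as a 'tool' message for the next turn.
call = {"name": "kubectl_get_pods", "arguments": '{"namespace": "default"}'}
print(kubectl_get_pods(**json.loads(call["arguments"])))
```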
Sanjay also shared an interesting personal project where he explored using function calling for applying git diffs. By feeding a code diff generated by a model into a function, the model could apply the changes directly to the codebase, streamlining workflows for software developers. This kind of augmentation, where routine coding tasks are partially or entirely automated, represents a fascinating evolution in software engineering.
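A rough sketch of such an apply-diff tool, assuming the model emits a unified diff: the host validates it with `git apply --check` before applying, since model-generated patches can be malformed. The function name and safety checks are assumptions, not Sanjay's actual implementation:

```python
import subprocess
import tempfile

def apply_diff(diff_text: str, repo_path: str = ".") -> str:
    """Validate a model-generated unified diff, then apply it to the repo."""
    with tempfile.NamedTemporaryFile("w", suffix=".patch", delete=False) as f:
        f.write(diff_text)
        patch_path = f.name
    # Dry-run first: reject malformed or non-applying diffs safely.
    check = subprocess.run(["git", "apply", "--check", patch_path],
                           cwd=repo_path, capture_output=True, text=True)
    if check.returncode != 0:
        return f"Rejected diff: {check.stderr.strip()}"
    subprocess.run(["git", "apply", patch_path], cwd=repo_path, check=True)
    return "Diff applied."
```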
Another significant application of function calling is in information extraction and aggregation—for instance, scraping news articles or financial data and putting it into a database. Rubra is working on enabling smaller models to scrape structured and unstructured data effectively, which has been a major challenge for many companies seeking to leverage large language models for business purposes.
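A minimal illustration of that extraction-and-aggregation loop: assume the model has been given a function-call schema for article fields and returns structured JSON, which the host then stores. The fields and table here are invented:

```python
import json
import sqlite3

conn = sqlite3.connect("articles.db")
conn.execute("""CREATE TABLE IF NOT EXISTS articles
                (title TEXT, company TEXT, amount_usd REAL)""")

# Suppose the model returned this tool-call payload after reading an article:
model_output = '{"title": "Acme raises Series B", "company": "Acme", "amount_usd": 30000000}'

row = json.loads(model_output)
conn.execute("INSERT INTO articles VALUES (?, ?, ?)",
             (row["title"], row["company"], row["amount_usd"]))
conn.commit()
```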
Tool Calling and Compound AI—A Broader Perspective
This brings us to the broader picture of tool calling in AI: how does this concept fit into the larger ecosystem of artificial intelligence models? Tool calling refers to a model's ability to interact with other models or tools during its execution. While DevOps is ripe for this kind of implementation, teams are also experimenting with these capabilities in areas like SQL generation for querying databases. By making models capable of intelligently choosing tools, we create a more cohesive system in which complex tasks can be executed more effectively.
The concept of compound AI—where multiple models work in harmony with each other—represents a possible future for AI systems. However, there’s still debate about whether orchestrating multiple models through a tool-calling approach is the most efficient way to solve real-world problems. There is also growing interest in fusing models together rather than having a single orchestrator, thus moving beyond a centralized authority model and creating more of a collaborative multi-model system.
The Challenges and Evolution of the AI Community
Interestingly, there are also significant challenges in building an active and supportive community around these models. Sanjay mentioned how the AIFoundry.org Discord server and the r/LocalLLaMA subreddit have been instrumental in creating conversations around function calling and tool-calling AI systems. However, these spaces can sometimes feel fragmented and opaque, leading to what he described as "AI's dark web": a hidden network where information is passed around in small groups rather than openly shared.
This is an ongoing issue for many in the community, as even promising projects struggle with reaching the right audience. Events like FOSDEM (Free and Open source Software Developers' European Meeting) play an essential role in creating environments where enthusiasts and experts can gather to discuss new techniques, share experiences, and promote innovation. Conferences like these can sometimes be the only opportunity to share advances in function calling, compound AI, or even foundational AI knowledge.
Final Thoughts
So where does this leave us today? Rubra has shown that smaller, fine-tuned models can serve very specialized purposes—from debugging to automating repetitive developer tasks—without needing massive amounts of compute. But more than that, the conversation also revealed the importance of community and how function calling, although technically challenging, is becoming an integral part of how we envision machine learning models in the future. It’s no longer just about creating a model that understands language—it’s about building a model that takes meaningful actions in the world.
If you’re interested in learning more or even experimenting with these models, remember that the barriers are lower than ever. It’s possible to start small, use open-source tools, and participate in discussions with a vibrant online community, whether on Reddit, Discord, or by attending conferences like FOSDEM.
___
AIFoundry.org Team