Welcome to our "Opening AI" podcast, where we open up the world of AI and dig into the intricacies of deploying large language models (LLMs) in production. Today's guest is Avi Deitcher, who walks us through building LLM systems for production: everything from model deployment to inference, and the challenges of bringing LLMs into the real world.
Avi breaks the problem down into three layers, each with its own distinct challenges and considerations. Let's explore each layer in detail.
Layer 1: The Scientist's Perspective
At the foundational level, Avi describes the work of the scientist. This is where the core model is created and trained, involving decisions about parameters, weights, and overall model structure. The focus here is on whether the model itself can run on specific hardware, whether that is a GPU, a CPU, or a specialized chip. Scientists are typically concerned with making the model work and optimizing it for the hardware available; once the model runs correctly on the given system, their interest largely stops there. These concerns address what can be described as the intrinsic characteristics of the model.
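To make that hardware question concrete, here is a minimal PyTorch sketch of the check a scientist cares about at this layer: can this model execute at all on the device in front of me? The layer sizes are toy values chosen purely for illustration, not anything from the episode.

```python
import torch
import torch.nn as nn

# A toy network standing in for the scientist's artifact. The concern at
# this layer is only whether the weights and operations can execute on the
# hardware at hand (hypothetical sizes, for illustration only).
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
)

# Pick whatever accelerator is available, falling back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# A single forward pass is the whole test: does the model run here at all?
x = torch.randn(1, 512, device=device)
with torch.no_grad():
    y = model(x)
print(f"ran on {device}, output shape {tuple(y.shape)}")
```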
Layer 2: The Engineer's Perspective
Moving up a layer, we reach what Avi calls the software engineer's perspective. Engineers at this level are more interested in using existing models than in creating them; they need tools to load and run those models efficiently. This layer splits into two parts. The first is managing a single model: loading it into memory and making it work with the hardware resources available. This is where tools like PyTorch or Llama.cpp come into play. PyTorch is a library that lets engineers create, train, and use models, while Llama.cpp is a standalone program that takes a model file and runs it directly. The second part is managing multiple models and keeping the whole system easy to operate, without getting pulled into the minutiae of handling each individual model. Engineers need robust tooling that simplifies this management, so they can focus on using LLMs effectively rather than constantly wrestling with the mechanics of model deployment.
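As a rough illustration of that first part (loading one model and running it), here is a sketch using the llama-cpp-python bindings to Llama.cpp. The model path, prompt, and parameter values are placeholders, not anything specific from the conversation.

```python
from llama_cpp import Llama  # Python bindings for Llama.cpp

# Load a quantized model file from disk; the path is hypothetical.
# n_gpu_layers=-1 asks the library to offload as many layers as will fit
# on the GPU, running the rest on the CPU.
llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,
)

# Run a single completion: at this layer the engineer consumes the model,
# never trains it.
result = llm("Q: What does it mean to productionize a model? A:", max_tokens=64)
print(result["choices"][0]["text"])
```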
Layer 3: Productionization - Extrinsic Considerations
The third layer is where Avi introduces the concept of productionization. This term might sound a bit pretentious, but it has significant value. Here, it's about making sure that a system—whether it be a model, software, or an entire deployment environment—is ready for "prime time." This layer encompasses what Avi refers to as extrinsic characteristics: everything that surrounds the model to ensure it is production-ready. Productionization includes aspects like compliance, monitoring, security, resource allocation, deployment processes, and the provenance of data used to train or fine-tune models. For instance, an organization must have a clear idea of the data used to train a model, especially in compliance-heavy industries, and understand how to track, test, and secure that model in production environments. This layer often involves the intersection of engineering and operations teams to manage deployment pipelines, ensure security protocols are in place, and maintain system reliability.
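To make "extrinsic characteristics" concrete, here is a hypothetical sketch of the kind of metadata record an organization might keep alongside a deployed model. Every field name here is invented for illustration; the point is that none of this affects inference, yet all of it matters for compliance, auditing, and operations.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelRecord:
    """Hypothetical extrinsic metadata that travels with a model into production."""
    name: str
    version: str
    weights_sha256: str               # integrity check at deploy time
    training_data_sources: list[str]  # data provenance for compliance review
    license: str
    evaluated_on: date                # when it last passed its evaluation suite
    approved_for_prod: bool = False
    tags: dict[str, str] = field(default_factory=dict)

record = ModelRecord(
    name="support-assistant",
    version="1.3.0",
    weights_sha256="<checksum of the deployed weights>",
    training_data_sources=["internal-tickets-2023", "public-docs-corpus"],
    license="apache-2.0",
    evaluated_on=date(2024, 5, 1),
    tags={"owner": "ml-platform", "pii_reviewed": "yes"},
)
```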
These layers are not isolated; each depends on the one below and influences the one above. As Avi pointed out, whether you are a scientist developing a model or an operations professional deploying it, the layers above and below you carry concerns and responsibilities that must be accounted for if the whole system is to function. Understanding these interdependent layers is crucial for anyone building AI systems for production.
__
AIFoundry.org Team