Earlier this month, the team at Nekko.ai brought together some friends and colleagues for a unique event, our first AI Hack Lab. Collectively, we explored the question, “Can we make developing AI applications more compatible with agile programming?”
In numerous interviews with developers and machine learning engineers, we learned that there is a stark contrast between the world of application development and the world of training and fine-tuning AI models and prompts. While developers, particularly those practicing Extreme Programming (XP), expected projects to adhere to Test-Driven Development (TDD), ML engineers often struggled with basic practices such as version controlling their Jupyter notebooks. This disparity led to problems such as irreproducible results, ad-hoc prompt and model changes that degraded the accuracy of AI responses, and considerable complexity in debugging AI features in production.
“Sources of unreliability with LLMs: LLMs do not give a correct response as expected, which drives 50% of software engineers nuts. One of them quit after one project.” – Paul Zabelin, Artium.
AI Hack Lab Starting Topics
The day started by asking attendees to bring AI projects they were working on and problems they would like to solve collaboratively with colleagues. This yielded four talks and demos:
How to enable local development of LLMs using Apache DataLab
Sergey Sergyenko starts his talk by telling the story of Llama.cpp and how it made running LLMs on laptops possible. He then shows an open-source tool called Apache DataLab that could allow model developers to fine-tune models easily on both cloud infrastructure and local systems. His question: can training and fine-tuning large language models be made as easy on local systems as Llama.cpp has made running them?
How to apply test-driven development approaches to prompt engineering
Paul Zabelin shares his company’s experience creating a testing suite for measuring AI response accuracy. With this suite, they achieve accuracy rates of 99% and better. He also demos an AI-driven example app called Apex, which his company, Artium, uses with clients as part of the inception process for new applications.
How to leverage market forces and incentives to improve quality in open-source software projects and AI
Don Marti describes his work on Pinfactory, which creates a marketplace for fixing bugs in open-source projects. The bug trading system lets users trade futures contracts on software bugs, incentivizing developers to fix them and surfacing information about software quality and risk. He suggests that AI could handle the trading, and later proposes the same mechanism as a way to improve AI response quality. A demonstration web application shows how the bug trading system works. Don asks how we might use this approach to automate AI response accuracy testing.
Comparing capabilities of LLMs applied to automating developer experience
Jim Clark from Docker demonstrates an experiment to improve the usability of developer tools by adding automation to IDEs via generative AI. This starts with using gen AI to generate Markdown runbooks for GitHub projects that document how a project is used, and then binding tool commands to shortcuts in the IDE. In his demo, Jim switches between the OpenAI GPT API and local LLMs, showing their strengths and weaknesses in different environments. Jim asks, “What is the right form factor to improve productivity for tool developers and make their tools more usable with generative AI?”
After these discussion topics and demos were presented, attendees formed multiple workgroups based on their interests to explore these and related issues. In some cases, attendees moved between teams, and the group discussions segued into adjacent topics. Later in the afternoon, the groups reassembled to report on the fruits of their explorations.
AI Hack Lab Group Readouts
Here is a summary of the readouts from the various groups.
- 0:00 - 6:48 The first group started by exploring how we might use Don Marti’s work with market mechanisms to improve AI response quality. They then segued into exploring whether there were any market inhibitors preventing more developers from working on open-source models.
- 6:48 - 12:32 The second group took up Paul Zabelin's approach to testing AI response accuracy and considered what might be involved in creating a “repository of practices” for developing AI applications.
- 12:32 - 17:17 Also from the second group, Paul Zabelin answers the question, “Why are we stuck with OpenAI APIs for building AI apps?”
- 17:17 The last group discussed the state of Gen AI developer tools and noted that they seem most useful for the most common programming languages and use cases. Can we create AI tools that help all developers, or will we all just adopt JavaScript because that’s where the best AI coding assistants are available?
Join our next AI Hack Lab!
If you want to join our next community AI Hack Lab, you are in luck! We are accepting registrations for the virtual AI Hack Lab #2 on Saturday, July 13, 2024.