
How a battle-tested compiler architecture from the 1970s solves the reliability crisis in LLM-generated code.

The advent of Large Language Models (LLMs) has revolutionized software development, promising unprecedented speed and efficiency in code generation. However, this rapid evolution has also exposed a significant challenge: the inherent unreliability of LLM-generated code. Unlike the predictable, deterministic outputs of traditional compilers, LLMs often produce different code from identical inputs, leading to subtle bugs, outright errors, and even nonsensical outputs, a critical problem for enterprise-grade applications where precision is paramount. This article explores how a foundational concept from the 1970s, the two-pass compiler architecture, offers a robust solution to this emerging reliability crisis.

The allure of instantaneous code generation from natural language prompts has captured the imagination of developers and businesses alike. Yet developers who built software in the 1990s and early 2000s recall a tangible sense of control and predictability. Code was written, then meticulously processed by a compiler, which would analyze, optimize, and ultimately translate the source into precise machine instructions. The mantra of "same input, same output" was a cornerstone of engineering rigor, shaping an entire generation's approach to building complex systems. This deterministic process instilled deep trust in the output, enabling developers to debug methodically and optimize for performance and stability.

The arrival of LLMs, however, introduced a fundamentally stochastic element into the code generation process. When a developer prompts an LLM with identical instructions on multiple occasions, the resulting code can exhibit structural differences. While some outputs might be exceptionally insightful and functional, others can contain subtle, hard-to-detect errors or even "hallucinate" code that is entirely non-functional. For rapid prototyping or exploratory development, this variability can be an acceptable trade-off. But for enterprise-level software, where a single misplaced null check can trigger a costly production outage at 2 AM, this inherent unpredictability is a non-starter. The stakes are simply too high for a system that cannot guarantee consistent and correct outputs.

The problem became apparent as teams grappled with integrating LLM-generated code into production environments. The initial excitement about accelerated development cycles began to be tempered by the reality of increased debugging time and the persistent fear of introducing latent bugs. This led to a period of introspection, searching for patterns and solutions within established computer science principles. The critical insight emerged from a seemingly distant era of computing: the two-pass compiler.

A Deeper Look at the Two-Pass Compiler

To understand the relevance of the two-pass compiler to modern AI code generation, a brief refresher on its principles is essential. In the nascent days of compiler technology, many systems employed a single-pass approach. These compilers would read source code, perform minimal analysis, and directly emit machine code. While fast, this method was inherently limited. Optimizations were rudimentary, error handling was basic, and the resulting code was often fragile and difficult to manage. The industry soon recognized the limitations of this approach, leading to the development of multi-pass compilers.

The multi-pass compiler fundamentally altered how programming languages were designed and implemented. The typical architecture involves two distinct passes:

  • Pass 1: Analysis and Intermediate Representation (IR): In this initial phase, the compiler meticulously analyzes the source code. It parses the code to understand its structure, identifies syntax and semantic errors, and then generates an Intermediate Representation (IR). This IR is a machine-independent, abstract representation of the program’s logic and structure. It serves as a common ground, detached from the specifics of the source language and the target machine.
  • Pass 2: Optimization and Code Generation: The second pass takes the IR generated in the first pass. Here, sophisticated optimization techniques are applied to improve the code’s efficiency, performance, and resource utilization. Finally, the optimized IR is translated into the target machine code or bytecode.

This separation of concerns proved revolutionary. It allowed for more complex language features, robust error checking, and significantly improved code quality. Languages like C, C++, Java, and countless others owe their stability and widespread adoption to the engineering discipline embodied by the multi-pass compiler. This architectural pattern became the bedrock of modern software engineering, enabling the creation of the sophisticated and reliable systems we depend on today.

Bridging the Gap: LLMs and the Two-Pass Analogy

The parallels between the historical evolution of compilers and the current state of AI-driven code generation are striking. Today’s LLM-based code generation tools, in their most common implementation, function architecturally like single-pass compilers. A developer provides a natural language prompt, and the LLM directly generates code in a specific programming language or framework. The quality of the output is largely dictated by the capabilities of the LLM itself. There is a distinct lack of intermediate analysis, explicit optimization passes, or formal structural validation before the final code is produced. This approach, while fast for initial output, mirrors the limitations of 1970s single-pass compilers but is marketed with the sophistication of 21st-century AI.

This realization led to a critical question: What if the LLM code generation process could be architected similarly to the robust two-pass compiler model? Instead of expecting an LLM to transition directly from a natural language prompt to production-ready code in a single step, could the process be divided into two distinct, architecturally separated phases?

The two-pass compiler is back – this time, it’s fixing AI code generation

The Proposed Two-Pass Architecture for AI Code Generation

The application of the two-pass compiler model to AI code generation offers a compelling pathway to overcome the reliability crisis. This approach involves reimagining the interaction with LLMs and introducing a deterministic layer for final code production.

Pass 1: LLM-Powered Intent Understanding and Intermediate Representation Generation

The first pass would leverage the LLM for what it excels at: understanding natural language, decomposing complex design requirements, and reasoning about the intended structure of an application. In this phase, the LLM would analyze a design specification, a user story, or a set of requirements. Its task would not be to generate final code directly, but rather to produce a well-defined Intermediate Representation (IR).

This IR would be neither raw HTML nor framework-specific code such as Angular or React components. Instead, it would be a structured, meta-language markup that captures the intent and structural blueprint of what needs to be built, without committing to the implementation details of a particular framework. This meta-language would define components, their relationships, data flows, and essential design semantics.

The significance of this IR is profound. By constraining the LLM’s output to a structured meta-language, entire categories of errors are inherently eliminated. For instance, if the LLM is not tasked with generating HTML directly, it cannot introduce malformed <script> tags or inject malicious code through cross-site scripting vulnerabilities. Similarly, if the IR describes component properties and states abstractly, the LLM cannot hallucinate non-existent React hooks or properties that do not align with the framework’s API. This dramatically reduces the "stochastic surface area" – the range of unpredictable and erroneous outputs the LLM can produce. Furthermore, this IR serves as a stable, persistent context for iterative development, eliminating the need to re-prompt the LLM from scratch for minor modifications.
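As a concrete illustration, such an IR might look like the sketch below. The schema, component vocabulary, and field names are hypothetical, invented here rather than drawn from any real product; what matters is that the vocabulary is closed, so conformance becomes a mechanical check rather than a hope.

```python
# A hypothetical framework-agnostic IR for a login form. In Pass 1 the
# LLM's job ends here: it emits this structure, never HTML or JSX.
login_form_ir = {
    "component": "Form",
    "id": "login",
    "children": [
        {"component": "TextInput", "id": "email", "binds": "user.email"},
        {"component": "TextInput", "id": "password",
         "binds": "user.password", "masked": True},
        {"component": "Button", "id": "submit", "action": "auth.login"},
    ],
}

# Because the component vocabulary is closed, validation is mechanical:
ALLOWED_COMPONENTS = {"Form", "TextInput", "Button"}

def conforms(node) -> bool:
    """Reject any node whose component type falls outside the closed set."""
    return (node["component"] in ALLOWED_COMPONENTS
            and all(conforms(c) for c in node.get("children", [])))

conforms(login_form_ir)                       # True
conforms({"component": "Iframe", "id": "x"})  # False: outside the vocabulary
```

An LLM constrained to this vocabulary simply has no way to express a stray `<script>` tag or an invented framework API; anything outside the closed set fails validation before Pass 2 ever runs.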

Pass 2: Deterministic Code Generation and Validation

The second pass of this proposed architecture would be entirely deterministic and LLM-agnostic. A specialized, platform-level code generator would consume the validated IR produced in Pass 1. This generator would then translate the IR into production-grade code for a specified framework, such as Angular, React, or React Native.

This deterministic pass is where battle-tested libraries are integrated, security patterns are rigorously enforced, and framework-specific optimizations are applied. Because this pass is deterministic, it guarantees that the same IR will always produce the same output code. This ensures reproducibility, auditability, and deployability, critical requirements for enterprise software development. Hallucinated properties, tokens, or structural inconsistencies that might have slipped through Pass 1 would be caught and stripped at the IR boundary before they ever reach the generated code. This layer acts as a final arbiter of correctness and adherence to best practices.
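To make this concrete, here is a minimal sketch of such a generator; the IR node shape and templates are hypothetical stand-ins for a real framework backend. Because the templates are fixed and reviewable, the IR is the only variable input, and identical IR yields byte-identical output.

```python
# Fixed, auditable templates: the deterministic half of the system.
TEMPLATES = {
    "Button": '<button id="{id}">{label}</button>',
    "TextInput": '<input id="{id}" type="text" />',
}

def generate(node: dict) -> str:
    """Deterministically translate one IR node into target markup."""
    template = TEMPLATES.get(node["component"])
    if template is None:
        # Unknown component types never silently pass through.
        raise ValueError(f"unsupported component: {node['component']}")
    return template.format(**node)

out = generate({"component": "Button", "id": "save", "label": "Save"})
# out == '<button id="save">Save</button>'
```

Note the design choice in the error path: an unrecognized component type is a hard failure, not a best-effort guess, which is exactly the behavior that makes the pipeline auditable.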

The synergy between these two passes is clear: Pass 1 provides the speed and creative potential of LLMs for understanding intent, while Pass 2 ensures the reliability, security, and performance through deterministic generation. This separation of concerns is the core mechanism that makes the architecture effective.


The Compounding Advantages for Enterprise Development

The adoption of a two-pass architecture for AI code generation yields compounding advantages that are particularly crucial for enterprise development:

  • Enhanced Reliability and Reduced Debugging: By separating the exploratory nature of LLM output from the strict requirements of production code, the number of bugs introduced by the generation process is drastically reduced. The deterministic second pass ensures that the generated code adheres to established patterns and standards.
  • Improved Security Posture: Architectural elimination of vulnerabilities like script injection and SQL injection is a significant advantage. Instead of relying on post-generation patching, these security concerns are addressed at the design and IR level, making the generated code inherently more secure.
  • Reproducible and Auditable Outputs: The deterministic nature of the second pass means that any given IR will consistently produce the same code. This is invaluable for audits, version control, and debugging, as it removes the variable of LLM output randomness.
  • Durable Context for Iterative Development: The IR serves as a persistent representation of the application’s structure and intent. Developers can iterate on the design by refining the IR, rather than constantly re-prompting the LLM with potentially inconsistent results. This leads to a more stable and predictable development workflow.
  • Mitigation of Hallucinations: Hallucinated code, properties, or tokens generated by the LLM are caught at the IR boundary. The deterministic generator in Pass 2 is designed to only recognize and translate valid structural elements defined within the IR, effectively stripping out erroneous AI-generated artifacts.
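The last point can be sketched as a small sanitizer sitting at the IR boundary; the per-component schemas and property names below are illustrative inventions, not any real tool's API:

```python
# Per-component schema of legal properties; anything undeclared is
# stripped before the IR reaches the deterministic generator.
SCHEMA = {
    "Button": {"id", "label", "action"},
    "TextInput": {"id", "binds", "masked"},
}

def sanitize(node: dict) -> dict:
    """Return a copy of an IR node with undeclared properties removed."""
    allowed = SCHEMA[node["component"]] | {"component", "children"}
    clean = {k: v for k, v in node.items() if k in allowed}
    if "children" in clean:
        clean["children"] = [sanitize(c) for c in clean["children"]]
    return clean

noisy = {"component": "Button", "id": "ok", "label": "OK",
         "onHoverGlow": True}  # a hallucinated property
sanitize(noisy)
# -> {'component': 'Button', 'id': 'ok', 'label': 'OK'}
```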

Consider a scenario where a team is building a complex e-commerce platform. Using a single-pass LLM might yield code that looks functional at first glance but contains subtle issues with state management or API integrations. This could lead to frustrating debugging sessions and potential production failures. With a two-pass system, the LLM would first generate an IR that defines the product catalog, the shopping cart logic, and the checkout flow abstractly. Then, a deterministic generator would translate this IR into optimized, secure React code, leveraging pre-approved libraries for payment processing and ensuring that all API calls are correctly structured and validated.

A Return to Engineering Fundamentals

For developers who have spent their careers building systems where correctness is non-negotiable, the two-pass compiler architecture should resonate deeply. The industry spent decades learning that single-pass compilation was insufficient for producing reliable software at scale. The development of multi-pass compilers was not merely an optimization; it represented a fundamental shift in engineering philosophy. It championed the principle of separating understanding from generation, emphasizing validation before emission, and ensuring that no single phase bore the entire burden of correctness.

We stand at a similar inflection point with AI code generation today. The LLMs themselves are remarkably powerful, capable of understanding and generating complex patterns. However, the architectural frameworks surrounding them have, until now, been comparatively naive. The solution is not simply to await an even "smarter" LLM. Instead, it lies in applying the established engineering discipline that has served the software industry so well for decades. By building systems that pair the stochastic brilliance of AI with the deterministic reliability of traditional compilation, we can unlock the true potential of AI in enterprise software development.

The concept of deterministic software engineering, once perhaps perceived as less glamorous than cutting-edge AI, is experiencing a resurgence. It appears that the foundational principles of robust system design never truly left; they were simply waiting for the right moment to be re-applied to a new technological frontier. The two-pass compiler architecture, a testament to the enduring power of well-reasoned engineering, offers a clear and practical path forward in addressing the critical reliability challenges posed by LLM-generated code. This hybrid approach, where AI’s strengths are harnessed within a disciplined, structured framework, promises to usher in an era of faster, more reliable, and more secure software development.

Broader Impact and Implications

The implications of adopting this two-pass architecture extend beyond mere code generation. It signifies a maturation of how we integrate AI into critical business processes. By de-risking the deployment of AI-generated code, organizations can more confidently leverage these tools for a wider range of applications, from internal tools to customer-facing platforms.

This architectural shift also has the potential to democratize enterprise software development further. By providing a more reliable path from conceptualization to deployable code, it could empower smaller teams and even individual developers to build more complex applications with greater speed and confidence. The meta-language IR could also foster better collaboration between design and development teams, providing a common, unambiguous artifact that bridges the gap between user experience visions and technical implementation.

Moreover, as LLMs become increasingly sophisticated, the need for robust validation and control mechanisms will only grow. The two-pass model provides a scalable and maintainable framework for managing this evolution, ensuring that as AI capabilities advance, the underlying principles of software reliability remain paramount. This approach underscores a commitment to building systems that are not only innovative but also fundamentally trustworthy.
