The Gen AI You Didn’t Know to Ask For
AI’s RAGged Edge
Most AI features being released today follow a familiar query/response rhythm:
- You ask a question (or click for help, or fill out a form…).
- Under the hood, your query gets stirred up with additional info retrieved from a knowledge base or the context of the app.
- An LLM digests the bundle and composes a reply (or message draft, code block, etc…).
In the dev community, this pattern goes by the nickname RAG: Retrieval Augmented Generation. Since Meta researchers coined the acronym in 2020, RAG has emerged as a rallying point for “pick and shovel” products that simplify creation of AI-powered features. RAG tools include integrated solutions like OpenAI’s Assistants API or Amazon’s Bedrock Knowledge Bases, composable APIs such as Google’s Vertex AI and Microsoft’s Azure AI Search, and open source frameworks like LangChain.
Consensus around RAG is helping LLM-centric development evolve from dark art into practical engineering. But it also risks narrowing our industry’s understanding of AI around the priorities and limitations of one usage pattern. RAG systems are inherently reactive to a user’s query. They speak only when spoken to.
How can LLMs proactively draw our attention to where we need to act?
STAG: Generative, meet Proactive
STAG stands for Stream/Trigger Augmented Generation. STAG systems are proactive. Where RAG exposes curated knowledge to dynamic queries, STAG tracks fixed queries against massive amounts of streaming data. Where RAG amplifies user knowledge, STAG amplifies user vigilance.
- Streams produce events, which may combine structured data (like user identities) with unstructured data, such as message texts, profile fields, or transcriptions.
- Triggers consume sets of events under predefined conditions, for example digesting the latest messages and profile for a user whenever a threshold is reached.
- Each trigger generates a targeted analysis, potentially combining quantitative logic with LLM-generated summaries, which in turn alert users or drive further workflows.
These primitives will look familiar to anyone who has worked with streaming frameworks like Spark or Flink. What’s new is what we can accomplish by integrating LLMs into stream processing: dynamically monitoring mixed-mode, semi-structured data and surfacing insights with detail and qualitative nuance that people can act on.
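To make the three primitives concrete, here is a minimal sketch in plain Python. It is an illustration rather than a description of any production system: `Event`, `Trigger`, and `summarize_stub` are hypothetical names, the threshold is a simple per-user message count, and the "LLM" is a stub that concatenates text.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Event:
    """One stream event: a structured field (user_id) plus unstructured text."""
    user_id: str
    text: str


def summarize_stub(events: list) -> str:
    """Stand-in for an LLM call; a real system would prompt a model here."""
    joined = " | ".join(e.text for e in events)
    return f"{len(events)} messages from {events[0].user_id}: {joined}"


@dataclass
class Trigger:
    """Buffers events per user and fires once `threshold` events accumulate."""
    threshold: int
    generate: Callable = summarize_stub
    buffers: dict = field(default_factory=lambda: defaultdict(list))

    def consume(self, event: Event) -> Optional[str]:
        buf = self.buffers[event.user_id]
        buf.append(event)
        if len(buf) < self.threshold:
            return None  # condition not met yet, stay quiet
        analysis = self.generate(list(buf))
        buf.clear()  # start a fresh window after firing
        return analysis


# Illustrative stream: only u1 reaches the threshold of two messages.
stream = [
    Event("u1", "login fails"),
    Event("u2", "love the new UI"),
    Event("u1", "still locked out"),
]
trigger = Trigger(threshold=2)
alerts = [a for a in (trigger.consume(e) for e in stream) if a is not None]
print(alerts)
```

In a real deployment the buffering and windowing would more likely be delegated to a streaming framework, with the generation step swapped for an actual model call.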
Opportunity 1: Aggregate and Monitor Unstructured Data. Businesses produce far more documents, meetings, call notes, service tickets, and other unstructured content than they do structured records, and the unstructured category is growing faster: over 40% CAGR¹! Yet the vast majority (>99%) goes unanalyzed, meaning it provides no benefit outside the original interaction.
This is a particularly important oversight since unstructured data captures what our systems fail to automate: user complaints, laborious communication, and external announcements. LLMs offer a new opportunity to process data at massive scale, but with qualitative nuance that goes beyond predefined tags or reductive sentiment scores.
Opportunity 2: Create Alerts that are Actually Useful. Alerting platforms powered purely by structured data often raise questions rather than provide answers. Knowing that you have 2x the typical number of bug reports today isn’t helpful until you investigate to find out why and determine a course of action. The cost of investigating an alert must be balanced against the confidence threshold for doing so, meaning that recipients wait longer for proof that something is wrong before investigating.
The STAG pattern can help LLMs propose a “why” behind quantitative changes, and sometimes even propose remedial actions. In the same scenario of increasing bug reports, AI can enumerate key symptoms, draft a memo to engineering, and prepare appropriate “we’re working on it” language to post to customers. These explanations and proposals won’t be perfect – but they are easier to verify and engage with than opaque quantitative data, letting teams consider and act on more alerts, sooner.
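As a rough sketch of the bug report scenario, the function below pairs the quantitative check (volume at 2x baseline) with a qualitative "why" and a couple of raw examples to verify against. Everything here is assumed for illustration: `propose_why_stub` stands in for an LLM call by returning the most frequent longer words, and the counts and report texts are made up.

```python
from collections import Counter
from statistics import mean


def propose_why_stub(reports):
    """Placeholder for an LLM summary: returns the two most frequent longer words."""
    words = Counter(w for r in reports for w in r.lower().split() if len(w) > 4)
    return ", ".join(w for w, _ in words.most_common(2))


def bug_report_alert(daily_counts, todays_reports, ratio=2.0):
    """Return an alert dict when today's volume reaches `ratio` times the baseline."""
    baseline = mean(daily_counts)
    observed = len(todays_reports)
    if observed < ratio * baseline:
        return None
    return {
        "metric": f"{observed} bug reports vs. baseline {baseline:.1f}",
        "likely_theme": propose_why_stub(todays_reports),
        "examples": todays_reports[:2],  # keep raw evidence so a human can verify
    }


history = [3, 2, 4, 3]  # illustrative daily counts
today = [
    "checkout button unresponsive on mobile",
    "checkout fails after coupon applied",
    "mobile checkout spinner never ends",
    "cannot complete checkout on mobile",
    "checkout page blank",
    "payment checkout timeout on mobile",
]
alert = bug_report_alert(history, today)
print(alert)
```

The point of the shape is that the recipient gets the number, a proposed theme, and evidence in one payload, rather than a bare count that still needs investigating.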
Opportunity 3: Bridge Silos. The biggest gap in query-based systems is that users must know which questions to ask. In large organizations, it’s incredibly common for Team A to collect information that would be vital to Team B, but for both teams to discover this only once an opportunity is missed or a conflict surfaces. With STAG, each team may register proactive queries into datasets they don’t control, and be informed of updates they didn’t know to ask about. In our bug report example, a Product team monitoring a recent release could be proactively informed of the relevant bug and fast-track a solution, rather than waiting for unrelated engineering teams to sort out the relationship.
STAG systems can bridge silos within existing org charts and processes, without “breaking them down”. An offshore support team can’t and shouldn’t be trained to recognize specific priorities of each product manager, or each regional GM. But a STAG system can monitor on those stakeholders’ behalf and generate useful summaries.
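One way to picture "registering proactive queries into datasets you don't control" is a small registry of standing queries. This is a hypothetical sketch: `StandingQueries`, the dataset name, the team names, and "release 4.2" are all invented, and the predicates are plain Python functions where a real system might use a classifier or an LLM judgment.

```python
from collections import defaultdict


class StandingQueries:
    """Registry of proactive queries that teams register against datasets they do not own."""

    def __init__(self):
        self._subs = defaultdict(list)  # dataset name -> [(team, predicate)]

    def register(self, team, dataset, predicate):
        self._subs[dataset].append((team, predicate))

    def publish(self, dataset, record):
        """Return the teams whose standing query matches this new record."""
        return [team for team, pred in self._subs[dataset] if pred(record)]


queries = StandingQueries()
# The product team watches support tickets for mentions of its release.
queries.register("product", "support_tickets",
                 lambda r: "release 4.2" in r["text"].lower())
# A regional GM watches the same dataset through a different lens.
queries.register("emea_gm", "support_tickets",
                 lambda r: r["region"] == "EMEA")

ticket = {"text": "Export broke after Release 4.2", "region": "APAC"}
notified = queries.publish("support_tickets", ticket)
print(notified)
```

The support team that files the ticket never needs to know who is listening; each stakeholder's priorities live in their own registered query.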
New Architecture, New Challenges
Building STAG systems poses engineering and design challenges that differ from traditional streaming architectures or RAG. Here are a few we’ve encountered at Frame AI.
Challenge 1: Right-Sizing AI models throughout the stream.
RAG applications benefit from an economic correlation: since they require human attention, they are usually applied to tasks that justify spending on expensive LLM queries. Since STAG applications must monitor data which may or may not be useful, we don’t have the same affordance. Exposing cutting edge, 70B-parameter and above LLMs to every piece of unstructured data in an enterprise isn’t practical – particularly since many applications require multiple queries.
Fortunately, the open source LLM community has made incredible headway on methods for creating cheaper, specialized models suited to specific tasks. At Frame AI, we’ve found that embedding-based classifiers for tagging data can direct our application of specialized LLMs to get targeted summaries of individual records. Further down the stream, score-based criteria are used to trigger more expensive LLM queries that summarize across records for a specific use.
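The tiering described above can be sketched as a cascade in which each stage gates the next, more expensive one. This is a toy under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the prototype text and cutoffs are invented, and the model calls are only counted rather than made.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Assumed prototype for a "usability complaint" tag.
COMPLAINT_PROTOTYPE = embed("confusing cannot find broken hard to use")


def cheap_tag_score(text):
    """Tier 1: similarity to the prototype, cheap enough to run on every record."""
    return cosine(embed(text), COMPLAINT_PROTOTYPE)


def run_cascade(records, tag_cutoff=0.2, escalate_after=2):
    """Count how many calls each tier would make for a batch of records."""
    calls = {"small_llm": 0, "large_llm": 0}
    flagged = []
    for text in records:
        if cheap_tag_score(text) >= tag_cutoff:
            calls["small_llm"] += 1  # Tier 2: per-record summary by a specialized model
            flagged.append(text)
    if len(flagged) >= escalate_after:
        calls["large_llm"] += 1  # Tier 3: one cross-record summary by the expensive model
    return flagged, calls


records = [
    "the export menu is confusing and hard to use",
    "thanks for the quick reply",
    "cannot find the settings page",
    "invoice attached",
]
flagged, calls = run_cascade(records)
print(flagged, calls)
```

Here four records produce two small-model calls and a single large-model call, which is the economic shape the cascade is after.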
Challenge 2: Eliciting and representing user objectives.
STAG applications alert and potentially interrupt users with data they didn’t know they needed. This is only beneficial if we know what a user needs. The feedback loop of eliciting and calibrating these objectives is the central product design problem in a STAG application.
At Frame AI, we approach this by focusing on existing metrics and processes, then backsolving for how unstructured data monitoring can benefit them. For example, a Product Manager may not know what the typical frequency of usability complaints in support tickets is, but they would probably like to know if a recent release changed it, and why.
Our experience has been that once users see insights aligned with their existing metrics, it opens them up to conceiving new queries that weren’t formerly possible.
Challenge 3: Avoiding Context Collapse
Data processing systems discard context by design – their whole purpose is to take large quantities of noisy data and produce narrower, better aligned datasets that are easy to act on. But for most STAG applications, it’s essential that an end user be able to explore and verify examples of the underlying data that powered an insight.
At Frame AI, we’ve addressed this by prioritizing traceability: at each point in the stream, generated data is associated with the cone of data that preceded it. Each layer of LLM analysis has access to the qualitative summaries from the preceding layer, allowing qualitative context to bubble forward. This also allows us to power end-user applications where users explore the data underlying each insight.
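A minimal way to model this kind of traceability is to have every derived item keep links to the items it was computed from, so the cone of source records can be recovered by walking backwards. The sketch below is an assumption-laden illustration, not Frame AI's implementation: `Node`, `derive`, `source_cone`, and the record ids are hypothetical, and `join_stub` stands in for an LLM summary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of data in the stream plus links to the nodes it was derived from."""
    node_id: str
    content: str
    parents: list = field(default_factory=list)


def join_stub(texts):
    """Stand-in for an LLM summary over the preceding layer's content."""
    return " / ".join(texts)


def derive(node_id, parents, summarize=join_stub):
    """Create a derived node whose summary sees its parents' content."""
    return Node(node_id, summarize([p.content for p in parents]), list(parents))


def source_cone(node):
    """Walk parent links back to the raw records (nodes with no parents)."""
    if not node.parents:
        return [node.node_id]
    ids = []
    for parent in node.parents:
        for nid in source_cone(parent):
            if nid not in ids:  # de-duplicate while keeping discovery order
                ids.append(nid)
    return ids


t1 = Node("ticket-1", "export crashes")
t2 = Node("ticket-2", "export is slow")
c7 = Node("call-7", "customer asked about export limits")
s1 = derive("summary-a", [t1, t2])
s2 = derive("summary-b", [c7])
insight = derive("insight-1", [s1, s2])
print(source_cone(insight))
```

An end-user application could call something like `source_cone` to show the tickets and calls behind an insight, which is what lets a reader check the summary against examples.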
STAG <3 RAG: Better Together
RAG is popular because it is a simple, effective pattern for human-AI collaboration. STAG doesn’t replace that. On the contrary, STAG systems have the potential to supercharge RAG usage by finding more opportunities _worth_ human-AI attention.
In this pattern, a STAG architecture monitors unstructured data and surfaces emerging trends or opportunities. Users engage with those trends and opportunities via additional RAG-style query/response. For example:
- A wizard for creating knowledge base documentation can be triggered by a STAG system identifying usability complaints.
- A marketing campaign generator can be triggered by a STAG system identifying trending correlations between certain user traits and intent-to-buy.
- A RAG-based summarization of contracts for a given account could be triggered by STAG identifying a disagreement about terms in a recent renewal call.
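The contract example can be sketched as a handoff: the STAG side supplies the query that a user would otherwise have had to think of, and the RAG side retrieves and assembles a prompt. All names and data here are hypothetical, the retriever is a toy word-overlap ranking rather than a vector search, and the final model call is left out.

```python
import re

# Hypothetical knowledge base keyed by document id.
KNOWLEDGE_BASE = {
    "contract-acme-2023": "Acme renewal terms: net 30 payment, 12 month term",
    "contract-acme-2021": "Acme original terms: net 60 payment, 24 month term",
    "faq-billing": "How to update a billing address",
}


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = tokens(query)
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: len(q & tokens(item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def on_trigger(signal):
    """STAG supplies the query; RAG retrieves context and builds the prompt."""
    query = f"{signal['account']} {signal['topic']}"
    doc_ids = retrieve(query)
    context = "\n".join(KNOWLEDGE_BASE[d] for d in doc_ids)
    prompt = f"Summarize contract terms relevant to: {signal['topic']}\n{context}"
    return {"doc_ids": doc_ids, "prompt": prompt}


# A signal a STAG trigger might emit after a renewal call.
signal = {"account": "Acme", "topic": "payment terms disagreement"}
draft = on_trigger(signal)
print(draft["doc_ids"])
```

The resulting prompt would then go to whichever model powers the summarization, with a person reviewing the output before acting on it.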
We are in a period where many stakeholders are interested in AI, but don’t know where it can be applied. STAG and RAG make a terrific tag team by surfacing reasons to act that are already familiar to them, then enabling actions that are easier than ever before.
Balancing Reactive and Proactive
As businesses began to experiment with LLMs over 2023, one unexpected discovery was that management skills might be as relevant as engineering to AI deployment: with AI, the biggest challenge is properly defining and supervising objectives, rather than designing processes.
In that context, imagine a human organization where there were no standing responsibilities, and instead every task was explicitly delegated by a manager, from the CEO down to fulfillment. It would be an exhausting blizzard of repetitive communication. Instead, effective organizations establish standing roles with their own objectives and processes, which can operate independently, escalate the unexpected, and still respond to strategic requests when called for.
Deployment of AI should follow what we’ve already learned about working with each other. The most effective businesses will combine proactive, autonomous STAG systems that monitor fixed objectives with dynamic, reactive RAG systems that help humans do strategic work.