How Event Sourcing Makes LLM Fine-Tuning Easier (and Smarter)
When I first learned about Event Sourcing, I absolutely hated it. Why store every sneeze a user ever made when all I care about is their current account balance? I would like to preface this article by stating that this software architecture pattern is not suitable for every application. But here’s the thing: I’m a big fan of using the right tool for the job, and when it comes to fine-tuning a large language model (LLM) based on real-world user feedback, Event Sourcing really shines.
Event Sourcing Primer
Event Sourcing is a software architecture pattern where all changes to a system’s state are recorded as a sequence of immutable events. For example, imagine you’re building a banking application and need to store the user’s current balance. In Event Sourcing, instead of storing “the user’s balance is $42”, you would store a series of events (in this case transactions) like “user deposited $50”, “user bought coffee for $5”, and so on. In this architecture, the source of truth for the user’s balance would be derived by replaying the events instead of being stored and read directly.
Core Principles of Event Sourcing
- State is derived, not stored directly.
- Events are immutable. Every change is recorded forever.
- You can replay events to reconstruct the state at any point in time.
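A minimal sketch of the banking example can make these principles concrete. The `Event` type and `replay_balance` function below are hypothetical illustrations, not a prescribed API: events are immutable records, and the balance is derived by folding over the stream rather than read from a stored field.

```python
from dataclasses import dataclass

# Hypothetical event record for the banking example: an append-only ledger entry.
@dataclass(frozen=True)  # frozen mirrors the "events are immutable" rule
class Event:
    kind: str    # e.g. "deposited" or "purchased"
    amount: int  # cents, to sidestep floating-point money issues

def replay_balance(events: list[Event]) -> int:
    """Derive the current balance by replaying the event stream."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "purchased":
            balance -= event.amount
    return balance

# "user deposited $50", "user bought coffee for $5"
stream = [Event("deposited", 5000), Event("purchased", 500)]
print(replay_balance(stream))  # 4500 cents, i.e. $45
```

Because the stream is the source of truth, replaying a prefix of it reconstructs the balance as it was at any earlier point in time.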
LLM Fine-Tuning Feedback Loops
In any LLM-driven system in production, human feedback is essential in the early stages to identify edge cases, verify accuracy, and correct mistakes. But if you're only storing the final result, you're throwing away valuable data.
With Event Sourcing, every interaction is captured as part of the stream by design, is timestamped, and tied to a session. This means that you can later ask:
- What inputs resulted in bad LLM-generated responses?
- What were the corrections submitted by humans for these responses?
- What is the accuracy of this LLM-driven task?
This is the kind of data you want when evaluating and fine-tuning your LLM. The immutable stream of interaction history becomes a goldmine for building smarter feedback loops.
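As a sketch of how those questions become queries, suppose each interaction is logged as an event with a session id and type (the `llm_response` and `human_correction` event shapes here are illustrative assumptions, not the only way to model this). Pairing responses with later corrections in the same session answers the first two questions, and counting uncorrected responses gives a rough accuracy signal for the third:

```python
# Hypothetical event records: each LLM interaction and each human correction
# is appended to the stream, tagged with its session.
events = [
    {"session": "s1", "type": "llm_response", "input": "Where is Paris?", "output": "Paris is in Germany."},
    {"session": "s1", "type": "human_correction", "correction": "Paris is in France."},
    {"session": "s2", "type": "llm_response", "input": "What is 2 + 2?", "output": "4"},
]

def corrected_responses(events):
    """Pair each LLM response with a later human correction in the same session."""
    pairs = []
    for i, event in enumerate(events):
        if event["type"] == "llm_response":
            for later in events[i + 1:]:
                if later["session"] == event["session"] and later["type"] == "human_correction":
                    pairs.append((event, later))
                    break
    return pairs

def accuracy(events):
    """Share of LLM responses that never needed a human correction."""
    responses = [e for e in events if e["type"] == "llm_response"]
    return 1 - len(corrected_responses(events)) / len(responses) if responses else None

print(len(corrected_responses(events)))  # 1 bad response with a correction
print(accuracy(events))                  # 0.5
```

A real system would run these as queries over the event store, but the idea is the same: the questions are answered by replaying the stream, not by schema fields you had to anticipate up front.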
Transforming Events to LLM Fine-Tuning Training Data
Once you have enough events, you can transform them into datasets ready for fine-tuning. Because the event stream is immutable, it can be reprocessed at any time with new logic. Let’s say you decide you care about a different signal; having a complete event stream means you can re-run your transformation pipeline without data loss or guesswork.
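One possible transformation, assuming the hypothetical `llm_response` and `human_correction` event shapes from before, is to pair each corrected response with its input and emit chat-style training examples where the human correction is the target. The JSONL output format here is just one common convention for fine-tuning datasets:

```python
import json

def to_training_examples(events):
    """Turn (response, correction) event pairs into chat-style training examples."""
    examples = []
    pending = {}  # session id -> last LLM input awaiting a correction
    for event in events:
        if event["type"] == "llm_response":
            pending[event["session"]] = event["input"]
        elif event["type"] == "human_correction" and event["session"] in pending:
            examples.append({
                "messages": [
                    {"role": "user", "content": pending.pop(event["session"])},
                    {"role": "assistant", "content": event["correction"]},
                ]
            })
    return examples

events = [
    {"session": "s1", "type": "llm_response", "input": "Where is Paris?", "output": "Paris is in Germany."},
    {"session": "s1", "type": "human_correction", "correction": "Paris is in France."},
]

# Write one JSON object per line, the usual shape for fine-tuning datasets.
with open("finetune.jsonl", "w") as f:
    for example in to_training_examples(events):
        f.write(json.dumps(example) + "\n")
```

If you later decide a different signal matters, say, training on responses users explicitly approved rather than corrections, you rewrite `to_training_examples` and replay the same immutable stream; nothing was lost by your first choice.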
When (and When Not) to Use Event Sourcing
The Event Sourcing architecture pattern is both powerful and complex. Make sure you thoroughly evaluate your needs before fully committing. Sometimes, it’s not worth the complexity if you’re building a simple CRUD app.
- Use it when:
- You need reproducibility for LLM fine-tuning
- Your LLM use case requires high accuracy and internal benchmarking
- You need to extract data or signals from unstructured text using LLMs
- Avoid it when:
- Your LLM use case is better suited for Retrieval-Augmented Generation (RAG)
- You’re building a simple CRUD app
I Don’t Hate It Anymore
It turns out that this pattern can be a power tool when used for the right problems. Using LLMs in production often presents challenges that you don’t foresee in happy-path prototyping. It took a few LLM projects for us to learn this the hard way. When every interaction is preserved in a clean, replayable format, it’s kind of like having your own black box flight recorder for LLM fine-tuning.
Need Help With LLM Fine-Tuning?
We’ve done that before. Contact us to see how we can help!