Pydantic: Data Validation That Spans MLOps and LLMOps
Start writing data science code that holds up in production
👨‍💻 Welcome to the Engineering Skills for Data Scientists series, where I'll be teaching the engineering skills that quietly separate data scientists who can ship from the ones who can't. This is article 1 of 5. You can find the full series here.
You’ve probably seen this too…
Data scientists are shipping more code into production than ever before.
On one side, the classic ML lane: pipelines, model serving, feature stores.
On the other, the LLM lane: agents, structured outputs, tool calling, retrieval systems.
Both are growing fast, and more data scientists are expected to work in at least one of them.
Of all the challenges these two lanes share, one of the most underrated is the simplest: data showing up in shapes your code wasn’t ready for.
A column that changed type upstream.
An LLM that returned a field your code didn’t expect.
A request payload missing a key.
None of these are rare failures. They’re the ones that quietly cause the most pain in production, and most data scientists don’t have a systematic way to handle them.
And to be fair, unless you come from a CS background, you probably never learned it, and you didn’t need to, until now.
Pydantic is the Python library that makes it cheap to learn, and learning this engineering habit will pay off in everything you ship from here on.
In this article, I’ll show you how to get started.
Let’s get to it!
Here’s what we’ll cover
Why “boundary problems” are the silent killers in production
What Pydantic actually does
Where Pydantic fits in LLMOps (the schema-driven approach to LLM outputs)
Where Pydantic fits in MLOps (validating requests at your model endpoint)
🎥 Try Pydantic on your own (a short video + Colab notebook)
Why this skill compounds across everything you ship
Why your data validation falls short
Here’s a pattern that shows up in most production code written by data scientists:
Data comes in, gets passed through several transformations, and somewhere deep inside the pipeline (or worse, inside the model itself), something breaks.
A missing key (KeyError).
A value that came in as the wrong type (a number stored as a string, a float where you expected an integer).
A silent data type conversion that doesn't crash but produces wrong numbers downstream.
The instinct most data scientists have is to add defensive checks: an assert here, a try/except there, a quick if "field" in data somewhere. It feels like good engineering, but honestly, it falls short.

The problem isn’t that you didn’t add enough checks. The problem is that the checks live in the wrong place. They’re scattered throughout the code, which means:
You validate the same thing in multiple places (or worse, inconsistently)
When something does fail, you find out three steps deep into your pipeline instead of at the entry point
Bad data has already corrupted your state by the time you notice
The fix isn’t more checks. It’s moving validation to the boundary: the moment data enters your code, before anything else touches it. Define the shape you expect, validate against it once, and from that point on, you can trust what you’re working with.
That’s schema-driven design. And it applies whether the data is coming from an LLM, a feature store, an API request, or a row in a database.
Two half-measures that don’t actually solve the problem
If you’ve ever tried to do this, you’ve probably reached for one of two tools that look like they solve it. Neither actually does.
Plain dicts: no schema, no safety
# Whatever the LLM returns, you just hope it's right
ticket = {
    "title": "Login broken",
    "priority": "high",
    "user_id": 42
}
# Nothing stops this from happening:
ticket = {
    "titel": "Login broken",       # typo
    "priority": "super urgent",    # invalid value
    "user_id": "forty-two"         # wrong type
}
# Code downstream will break, silently or loudly.
You parse whatever comes in into a dictionary and hope it’s right. There’s no enforcement at all. A typo’d key, an invalid value, or a wrong type all flow through silently until something downstream breaks.
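To make that failure mode concrete, here is a minimal sketch in which the malformed dict passes through one step untouched and only fails in a later one. The function names enrich and route are hypothetical, made up for this illustration.

```python
def enrich(ticket: dict) -> dict:
    # First step: adds a field and never reads the others,
    # so the malformed ticket passes straight through.
    return {**ticket, "source": "llm"}


def route(ticket: dict) -> str:
    # Later step: the first place the misspelled key is actually read.
    return f"[{ticket['priority']}] {ticket['title']}"


bad_ticket = {
    "titel": "Login broken",       # typo
    "priority": "super urgent",    # invalid value
    "user_id": "forty-two",        # wrong type
}

enriched = enrich(bad_ticket)      # no error here

try:
    route(enriched)
    failed_key = None
except KeyError as exc:
    # The error names the missing key, not the step that let the bad data in.
    failed_key = exc.args[0]

print(failed_key)
```

Notice that only the typo surfaces as an error. The invalid priority and the wrong-type user_id never raise anything in this sketch; they just keep travelling.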
TypedDict: annotation only, no runtime check
from typing import TypedDict, Literal

class Ticket(TypedDict):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int

# IDE/type-checker warns you. Python at runtime does not.
ticket: Ticket = {
    "title": "Login broken",
    "priority": "super urgent",    # type checker complains, runtime doesn't
    "user_id": "forty-two"         # same, runtime accepts it
}
TypedDict lets you describe the shape with type hints. It feels safer because your IDE will warn you about mistakes you make in code you wrote yourself. But when the code actually runs, Python doesn’t enforce it. If the data coming in doesn’t match the shape, TypedDict won’t catch it.
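If you want to see this for yourself, the short sketch below assigns data that breaks every rule of the TypedDict and runs without complaint. The incoming variable is just a stand-in I made up for data arriving from outside your code.

```python
from typing import Literal, TypedDict


class Ticket(TypedDict):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int


# Simulates data arriving from outside your code (an API, an LLM, a file).
incoming = {
    "title": "Login broken",
    "priority": "super urgent",
    "user_id": "forty-two",
}

# The annotation is only a hint; at runtime this is a plain assignment.
ticket: Ticket = incoming

print(type(ticket))        # a TypedDict value is an ordinary dict at runtime
print(ticket["user_id"])   # the bad value is still there
```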
💡 What both of these miss is the part that matters: validation has to actually happen when the code runs, not just when you’re editing it. That’s the part that’s hard to get right by hand, and it’s what makes a real data validation library worth the small upfront cost.
What Pydantic actually is
Pydantic is the most popular data validation library in Python.
The idea is simple: you describe the shape of the data you're expecting, and Pydantic checks every piece of incoming data against that description. If something doesn't match, you get a clear error right away, before the bad data has a chance to break anything downstream.
You describe the shape using type hints, the same : str, : int, : float annotations you’ve probably already seen in Python code. Pydantic just makes those annotations enforced when your code actually runs, instead of being suggestions your IDE shows you.
Here's what that looks like:
from pydantic import BaseModel
from typing import Literal

class Ticket(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int

That class is now a contract. Anything that claims to be a Ticket must have a string title, a priority that's one of three values, and an integer user_id. Pass in anything that doesn't match, and Pydantic raises an error pointing at exactly what's wrong.
You describe what good data looks like, and Pydantic tells you when something doesn't fit.
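Here is a small sketch of what "tells you" looks like in practice, assuming Pydantic v2. It feeds the Ticket model the malformed dict from earlier and collects the fields Pydantic complains about. The variable names are mine, chosen for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int


good = Ticket(title="Login broken", priority="high", user_id=42)

bad_input = {
    "titel": "Login broken",       # typo, so "title" is missing
    "priority": "super urgent",    # not one of the allowed values
    "user_id": "forty-two",        # cannot be read as an integer
}

try:
    Ticket(**bad_input)
    failed_fields = []
except ValidationError as exc:
    # Each entry in exc.errors() carries the field location and a message.
    failed_fields = [err["loc"][0] for err in exc.errors()]
    for err in exc.errors():
        print(err["loc"], err["msg"])

# Default ("lax") mode converts values that map cleanly onto the target type.
coerced = Ticket(title="Login broken", priority="high", user_id="42")
print(coerced.user_id)
```

Two defaults are worth knowing. Unknown keys such as "titel" are ignored rather than reported, so what you see is an error for the missing title. And a numeric string like "42" is converted to the integer 42 instead of being rejected.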
Now let’s see how to apply it in the two lanes: LLMOps and MLOps.
How to use Pydantic in LLMOps
Whenever you ask an LLM to return data your code is going to use, you’re stepping into the same boundary problem we just talked about.
Say you’re using an LLM to tag support tickets with a priority level. You want it to return something like:
{"title": "Login broken", "priority": "high", "user_id": 42}
Most of the time it will. But not always. Sometimes the model might:
Drop a required field
Add an extra one you didn’t ask for
Return "42" (a string) where you expected 42 (a number)
Pick a priority value you didn’t define (like "super urgent" instead of "low", "medium", or "high")
Misspell a key ("titel" instead of "title")
In development, with a few test prompts, you might never see it happen. In production, with thousands of requests, you absolutely will.
This is where Pydantic earns its place. You reuse the same Ticket model from earlier and let it validate every LLM response before your code touches it:
from pydantic import ValidationError

raw = call_llm_to_tag_ticket(message)  # returns a dict
try:
    ticket = Ticket(**raw)
    # safe to use: ticket.title, ticket.priority, ticket.user_id
except ValidationError as e:
    # log it, retry the call, or fall back gracefully
    handle_bad_response(e)

The downstream code never sees malformed data.
Either it gets a valid Ticket object it can trust, or it gets a clear exception pointing at exactly what was wrong with the LLM's response.
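One way to act on that exception is a bounded retry. The sketch below is only an illustration, assuming Pydantic v2: tag_with_retry and fake_llm are hypothetical names, and fake_llm stands in for a real LLM client by returning one bad response followed by a good one.

```python
from typing import Callable, Literal, Optional

from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int


def tag_with_retry(
    call_llm: Callable[[str], dict],
    message: str,
    max_attempts: int = 3,
) -> Optional[Ticket]:
    """Call the LLM until a response validates, or give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        raw = call_llm(message)
        try:
            return Ticket.model_validate(raw)
        except ValidationError as exc:
            print(f"attempt {attempt}: {exc.error_count()} validation error(s)")
    return None


# Hypothetical stand-in for a real LLM client: fails once, then succeeds.
_responses = iter([
    {"title": "Login broken", "priority": "super urgent", "user_id": 42},
    {"title": "Login broken", "priority": "high", "user_id": 42},
])


def fake_llm(message: str) -> dict:
    return next(_responses)


ticket = tag_with_retry(fake_llm, "I can't log in")
print(ticket)
```

Returning None after the last attempt keeps the decision about a fallback (a default priority, a human review queue) with the caller, which is a design choice rather than a requirement.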
The bonus: Pydantic also tells the LLM what shape to return
Here’s the part that earns Pydantic its place in LLMOps specifically.
The same Ticket class you use to validate the response can be turned into a JSON Schema: a description of the shape, written in a format that LLM providers like OpenAI, Anthropic, and Google understand.
{
  "title": "Ticket",
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "user_id": { "type": "integer" }
  },
  "required": ["title", "priority", "user_id"]
}
If you’ve worked with OpenAI function calling, Anthropic tool use, or any provider’s structured outputs feature, you’ve already used JSON Schema, even if you didn’t call it that. It’s the format these APIs use to tell the model what shape to return.
You send that schema to the model along with your prompt, and the provider constrains the output to match it. So Pydantic ends up working on both sides:
On the way out: it tells the LLM provider exactly what shape to enforce.
On the way back: it validates the response before your code uses it.
One source of truth. Enforced from both directions.
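Both directions can be sketched with two calls, assuming Pydantic v2: model_json_schema() produces the schema, and model_validate_json() parses and validates a raw JSON string in one step. How the schema is attached to a request differs by provider, so I've left that part out; the JSON string below is a made-up response.

```python
import json
from typing import Literal

from pydantic import BaseModel


class Ticket(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]
    user_id: int


# On the way out: generate the schema you would hand to the provider.
schema = Ticket.model_json_schema()
print(json.dumps(schema, indent=2))

# On the way back: parse and validate the raw JSON text in one step.
raw_response = '{"title": "Login broken", "priority": "high", "user_id": 42}'
ticket = Ticket.model_validate_json(raw_response)
print(ticket.priority)
```

The printed schema is slightly more verbose than the trimmed version shown above; for example, Pydantic adds a "title" entry to each property.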
💡 The pattern: anywhere an LLM’s output meets your code, define the expected shape explicitly, and validate against it. Don’t trust the model to follow instructions perfectly. It won’t.
How to use Pydantic in MLOps
The same habit applies on the other side of your work: classic ML in production.
The most common place data scientists hit this is at the model serving endpoint. You’ve trained a model. You wrap it in FastAPI (or similar). A client sends a prediction request, your endpoint passes the features to the model, and you return a prediction.
The naive version looks something like this:
from fastapi import FastAPI

app = FastAPI()
# `model` is assumed to be a trained model loaded elsewhere

@app.post("/predict")
def predict(request: dict):
    features = [request["age"], request["income"], request["tenure"]]
    prediction = model.predict([features])
    return {"prediction": prediction[0]}

This works until it doesn’t.
A client sends age as a string instead of a number. A field is missing. A new field is added that your model wasn’t trained on. Each of these will either crash deep inside your model code, or worse, produce a silently wrong prediction because something downstream converted a value it shouldn’t have.
With Pydantic, the contract sits at the endpoint:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# `model` is assumed to be a trained model loaded elsewhere

class PredictionRequest(BaseModel):
    age: int
    income: float
    tenure: int

@app.post("/predict")
def predict(request: PredictionRequest):
    features = [request.age, request.income, request.tenure]
    prediction = model.predict([features])
    return {"prediction": prediction[0]}

Now, any request that doesn’t match the contract is rejected before your model ever sees it. The client gets a clear 422 Unprocessable Entity with the specific field that failed validation. Your model code stays clean because it only ever runs on validated inputs.
💡 By the way, FastAPI integrates with Pydantic natively, so this isn’t extra work. You’re already writing the function signature. Adding the type just turns the parameter into a validated contract.
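One caveat about defaults: out of the box, Pydantic ignores unknown fields and converts a numeric string like "30" to an integer, so those two cases are accepted rather than rejected. If you want them refused, a stricter contract might look like the sketch below, assuming Pydantic v2. The class name and the value ranges are my own assumptions, and the sketch exercises the model directly rather than through FastAPI.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class StrictPredictionRequest(BaseModel):
    # extra="forbid" rejects unknown fields; strict=True turns off type coercion.
    model_config = ConfigDict(extra="forbid", strict=True)

    age: int = Field(ge=0, le=120)     # assumed plausible range
    income: float = Field(ge=0)
    tenure: int = Field(ge=0)


def is_valid(payload: dict) -> bool:
    """Return True if the payload satisfies the contract."""
    try:
        StrictPredictionRequest.model_validate(payload)
        return True
    except ValidationError:
        return False


ok = is_valid({"age": 30, "income": 52000.0, "tenure": 4})
string_age = is_valid({"age": "30", "income": 52000.0, "tenure": 4})
extra_field = is_valid({"age": 30, "income": 52000.0, "tenure": 4, "zip": "94110"})
negative_age = is_valid({"age": -1, "income": 52000.0, "tenure": 4})

print(ok, string_age, extra_field, negative_age)
```

Whether you want strictness is a judgment call. Lenient coercion is convenient for clients, while strict mode surfaces upstream type drift sooner.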
The same pattern applies anywhere data enters your code:
Pipeline step receiving features from upstream
Configuration loaded from a YAML file
Data read from a feature store before being passed to a model
Request payloads at any API endpoint
Wherever there’s a boundary, there’s a contract worth enforcing.
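As one example from that list, here is a sketch of validating a training configuration, assuming Pydantic v2. The TrainingConfig fields and their limits are hypothetical, and raw_config stands in for the dict a YAML loader would hand you.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class TrainingConfig(BaseModel):
    experiment: str
    learning_rate: float = Field(gt=0, lt=1)
    n_estimators: int = Field(gt=0)
    objective: Literal["binary", "regression"]


# Stand-in for the dict a YAML loader would return.
raw_config = {
    "experiment": "churn-v2",
    "learning_rate": 0.05,
    "n_estimators": 300,
    "objective": "binary",
}

config = TrainingConfig.model_validate(raw_config)

# A single mistyped value is caught before any training starts.
broken = {**raw_config, "learning_rate": 5}
try:
    TrainingConfig.model_validate(broken)
    config_error = None
except ValidationError as exc:
    config_error = exc.errors()[0]["loc"]

print(config_error)
```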
Try Pydantic on your own
Reading about Pydantic gets you halfway there. The other half is feeling what it’s like when validation actually catches something for you.
I put together a short Google Colab notebook with a few examples to work through. You’ll see Pydantic validate good input, catch bad input, and generate a JSON Schema you could send to an LLM provider. Nothing heavy, maybe 15 minutes.
If you’d like a guided walkthrough, I also recorded a short video covering the same examples.
💡 Once you’ve done the notebook, pick one script you’ve already written that takes data from somewhere you don’t control (an API, an LLM, a CSV, a feature store) and add a BaseModel at the entry point. That’s the real test of whether this lesson stuck.
Final thoughts
The reason Pydantic is worth learning isn’t that it's a popular library (though it has over 28K stars on GitHub). It’s because the habit it teaches you applies to everything you ship.
Once you start thinking in contracts, your code looks different:
You stop writing scattered asserts and start defining shapes at function boundaries
You stop debugging mysterious downstream failures and start seeing errors at the entry point where they belong
You stop trusting “it worked in dev” and start designing for the failure modes you can’t predict
This is the engineering habit that quietly separates data scientists who can ship reliably from the ones who can’t. It’s not a glamorous skill. It doesn’t show up in interview questions. But it’s the difference between a model in production that silently fails for a week and one that fails loudly the first time something is off.
💡 If you’re earlier in your career: don’t worry about every Pydantic feature (there are many). Start with the one move: define a BaseModel for any data that crosses into your code from somewhere you don’t control. That single habit will compound over the rest of your career.
A couple of other great resources:
🎥 Want to follow along on YouTube? I just launched a channel for data scientists. Subscribe, first video drops soon.
🤝 Want to connect? Find me on LinkedIn, where I share more on data science careers and the AI shift.
Thank you for reading! And stay tuned for next week's article in the Engineering Skills for Data Scientists series: Testing & Logging.
- Andres Vourakis



