MLOps: how a model gets shipped and kept healthy
Training a model is the easy part. The hard part is getting it to run reliably for real users — and keeping it working after the world it learned from has moved on. MLOps is the set of practices for that: deploying models, serving them at the right speed and cost, watching them for trouble, and retraining when they slip. Think of it as DevOps for machine learning.
01 What MLOps actually is
Building a model is a bit like cooking a great dish once in your own kitchen. Serving it to a packed restaurant, every night, exactly the same — that's a whole different job. MLOps is that second job. It's the set of practices for reliably deploying and operating machine-learning models in production, and the simplest way to describe it is DevOps for machine learning: the same automate-test-monitor discipline that software teams use to ship code, applied to models. The surprising part is how little of the work is the model itself. In a real production system, the trained model is a small box in a much bigger diagram — most of the effort is the data plumbing, serving, and monitoring around it.
- MLOps is a set of practices and pipelines, not a single tool you install.
- A good offline score is the start, not the finish — operating the model is most of the work.
- The model is one piece; the surrounding system (data, serving, monitoring) is where MLOps lives.
02 The model lifecycle, as a loop
A model isn't shipped once and forgotten. It moves through a repeating lifecycle: you prepare data, train a model, evaluate it, package it into something deployable, deploy and serve it, then monitor it in production. When it slips, you retrain and go around again.
Data & train
Prepare the data the model learns from, then train the model on it. This is the part most people picture when they think of machine learning — but in the lifecycle it's just the first step, and everything after it is what MLOps adds.
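The loop can be sketched in a few lines of Python. This is only an illustration: the stage names follow the lesson, while `run_lifecycle`, `healthy_after`, and `max_rounds` are hypothetical names invented here, and each stage is just logged rather than actually run.

```python
from typing import Callable, List

# Stage names follow the lesson; in a real system each would be a pipeline step.
STAGES: List[str] = [
    "prepare_data", "train", "evaluate", "package", "deploy_and_serve", "monitor",
]

def run_lifecycle(healthy_after: Callable[[int], bool], max_rounds: int = 3) -> List[str]:
    """Run the stages in order, going around again while monitoring reports a problem."""
    log: List[str] = []
    for round_number in range(1, max_rounds + 1):
        for stage in STAGES:
            log.append(f"round {round_number}: {stage}")
        if healthy_after(round_number):
            break  # monitoring is satisfied, so no retrain is needed yet
        log.append(f"round {round_number}: retrain triggered")
    return log

# Hypothetical monitor: the first deployment slips, the retrained one holds up.
history = run_lifecycle(lambda round_number: round_number >= 2)
print("\n".join(history))
```

The point of the sketch is the shape: monitoring sits at the end of each pass and decides whether the whole sequence runs again.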
03 Walk one model through its life
Theory is easier to hold onto when you follow a single, concrete case. The walkthrough follows one illustrative model, an email-triage classifier that sorts incoming support email into categories, through every stage, from training to the moment drift is detected and it has to be retrained. This is a clearly labelled teaching example, not measured results from any real product.
04 Two ways to serve: batch vs. real-time
"Deploying" a model means making it answer questions, and there are two main patterns. Batch inference scores a big pile of inputs all at once, usually on a schedule — think of a nightly job that labels millions of records by morning, with no one waiting. Real-time (online) inference answers individual requests on demand, usually behind an API, where each answer must come back fast. The pattern you pick drives the concerns that follow it: an online endpoint is judged on latency (speed per request), throughput (requests handled per second), scaling (coping with load), and cost — which for larger models often means paying for GPUs.
- Batch: score many inputs at once on a schedule; no always-on endpoint, cheaper for bulk jobs.
- Real-time / online: answer single requests on demand behind an API; latency and scaling matter.
- Serving tradeoffs are latency, throughput, scaling, and cost — including GPU usage for big models.
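A minimal sketch of the two patterns, reusing the email-triage idea from this lesson. Everything here is an assumption for illustration: `predict_one` is a keyword rule standing in for a trained model, and `batch_score` and `handle_request` are hypothetical names, not any serving framework's API.

```python
import time
from typing import Dict, List, Union

def predict_one(text: str) -> str:
    """Hypothetical stand-in for a trained model: route by keyword."""
    return "billing" if "invoice" in text.lower() else "general"

def batch_score(texts: List[str]) -> List[str]:
    """Batch pattern: score a whole pile of inputs in one scheduled run."""
    return [predict_one(text) for text in texts]

def handle_request(text: str) -> Dict[str, Union[str, float]]:
    """Real-time pattern: answer one request and record how long it took."""
    start = time.perf_counter()
    label = predict_one(text)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"label": label, "latency_ms": latency_ms}

# Batch: nobody is waiting, so only the total run time matters.
nightly = batch_score(["Invoice 12 is wrong", "How do I reset my password?"])
print(nightly)

# Real-time: each request is timed, because latency is what the caller feels.
response = handle_request("Where is my invoice?")
print(response["label"], f"{response['latency_ms']:.3f} ms")
```

The same model function sits behind both paths. What changes is the wrapper: the batch job cares about finishing the pile, while the online handler measures every single answer.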
05 Keeping it healthy: drift, monitoring & CI/CD
A deployed model can quietly get worse without anyone touching it, because the world it learned from keeps changing. That slow decay is called drift. It is the reason production models need monitoring, and the reason teams build CI/CD pipelines to redeploy safely. The three parts below cover how a model is kept healthy over its life.
Drift — when a model quietly goes stale
Drift is when the live data, or the relationship the model captures, changes over time so the model's accuracy can silently degrade. The danger is that the model keeps returning confident answers while getting steadily less correct — nothing crashes, so nothing obviously signals the problem. Launch-day accuracy is no guarantee of future accuracy.
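One common way to put a number on a shift in the data is the population stability index (PSI), which compares how often each category appears in a reference sample versus a live sample. The sketch below is an assumption-laden illustration, not the lesson's own method: the category names and counts are made up, and the `eps` floor is just one way to avoid dividing by zero. It measures a change in the data's distribution only; it does not detect a change in the relationship the model captures.

```python
import math
from collections import Counter
from typing import Dict, List

def category_shares(labels: List[str], categories: List[str], eps: float = 1e-6) -> Dict[str, float]:
    """Fraction of samples in each category, floored at eps to avoid log(0)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: max(counts[c] / total, eps) for c in categories}

def population_stability_index(reference: List[str], live: List[str], categories: List[str]) -> float:
    """PSI = sum over categories of (live - reference) * ln(live / reference)."""
    ref = category_shares(reference, categories)
    cur = category_shares(live, categories)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)

categories = ["billing", "general"]
training_sample = ["billing"] * 70 + ["general"] * 30   # made-up reference mix
live_sample = ["billing"] * 40 + ["general"] * 60       # made-up shifted mix

psi_same = population_stability_index(training_sample, training_sample, categories)
psi_shift = population_stability_index(training_sample, live_sample, categories)
print(f"no shift: {psi_same:.3f}, shifted: {psi_shift:.3f}")
```

Identical distributions give a PSI of zero, and the value grows as the live mix moves away from the reference. What counts as "too much" is a judgment call for each team and model.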
Monitoring — watching a model in production
Because drift is silent, you monitor a live model the way you'd watch any production system: track its inputs and outputs and alert when something looks off, so degradation is caught before it causes harm. Monitoring doesn't fix the model on its own — it's the early-warning system that tells you when it's time to retrain.
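As a rough sketch of "track outputs and alert when something looks off", the monitor below watches what share of recent predictions carry one label and flags when that share strays too far from a baseline. The class name, the baseline, the tolerance, and the window size are all hypothetical values chosen for the example; a production monitor would track more signals than this.

```python
from collections import deque

class OutputShareMonitor:
    """Alert when one label's share of recent predictions strays from its baseline."""

    def __init__(self, label: str, baseline_share: float, tolerance: float, window: int):
        self.label = label
        self.baseline_share = baseline_share
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # keeps only the newest `window` results

    def record(self, predicted_label: str) -> bool:
        """Store one prediction; return True if the rolling share is out of bounds."""
        self.recent.append(predicted_label == self.label)
        if len(self.recent) < self.recent.maxlen:
            return False  # wait for a full window before judging
        share = sum(self.recent) / len(self.recent)
        return abs(share - self.baseline_share) > self.tolerance

monitor = OutputShareMonitor(label="billing", baseline_share=0.5, tolerance=0.25, window=4)
stream = ["billing", "general", "billing", "general", "general", "general", "general"]
alerts = [monitor.record(label) for label in stream]
print(alerts)
```

Note that the monitor only raises a flag. Deciding whether to retrain, and doing it, is a separate step.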
CI/CD & rollback — shipping models safely
CI/CD for ML is like CI/CD for software, with more to manage: it versions and tests not just code but data and models too, and can add continuous training (CT) — automated pipelines that retrain and redeploy. Versioning everything together gives you reproducibility and a rollback path: if a new model version misbehaves in production, you can revert to the last known-good model immediately while you investigate.
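The versioning-plus-rollback idea can be sketched with a tiny in-memory registry. This is an illustration under stated assumptions: `ModelVersion`, `ModelRegistry`, and the reference ids are hypothetical, and the sketch treats the previously deployed release as the last known-good one. Real registries persist this history and usually track more metadata.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class ModelVersion:
    """One release: code, data, and model are pinned together."""
    version: str
    code_ref: str   # hypothetical commit id
    data_ref: str   # hypothetical dataset snapshot id
    model_ref: str  # hypothetical model artifact id

@dataclass
class ModelRegistry:
    """Deployment history, oldest first; the last entry is what is live."""
    history: List[ModelVersion] = field(default_factory=list)

    def deploy(self, release: ModelVersion) -> None:
        self.history.append(release)

    def live(self) -> ModelVersion:
        return self.history[-1]

    def rollback(self) -> ModelVersion:
        """Drop the live release and return to the one before it."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()
        return self.history[-1]

registry = ModelRegistry()
registry.deploy(ModelVersion("v1", "commit-a1", "snapshot-01", "model-v1"))
registry.deploy(ModelVersion("v2", "commit-b2", "snapshot-02", "model-v2"))

restored = registry.rollback()  # v2 misbehaves in production, so revert
print(restored.version, restored.data_ref)
```

Because each release pins its code, data, and model references together, rolling back restores a combination that was actually deployed before, not just an older model file.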
06 Check your understanding
07 Take it with you & go deeper
MLOps — AI Glossary
The concise definition, plus related terms, in the AI Glossary.
Customizing a model: prompt, RAG, or fine-tune
The companion module on how a model is built and adapted, which is the step before it gets shipped.
Model serving deep dive (latency, batching, GPUs), coming soon
How real-time endpoints scale, how request batching and quantization cut cost, and how LLM serving differs from classic models.
Monitoring & drift detection in practice, coming soon
What to watch, how to alert on drift, and how to wire monitoring into an automated retrain-and-redeploy pipeline.
Concept map
A quick map of how this lesson fits together, with the key ideas for each part.
What MLOps actually is
- MLOps is a set of practices and pipelines for reliably running models in production — essentially DevOps for machine learning, not a single tool you install.
- A good offline score is the start, not the finish — operating the model is most of the work.
- The model is one piece; the surrounding system (data, serving, monitoring) is where MLOps lives.
The model lifecycle, as a loop
- A model moves through a repeating cycle: prepare data → train → evaluate → package → deploy and serve → monitor.
- When performance slips, you retrain and go around again — it's a loop, not a straight line.
- Training is just the first step; everything after it is what MLOps adds.
Two ways to serve: batch vs. real-time
- Batch inference — score many inputs at once on a schedule; no always-on endpoint, cheaper for bulk jobs.
- Real-time / online inference — answer single requests on demand behind an API, where each answer must come back fast.
- Serving tradeoffs are latency, throughput, scaling, and cost — including GPU usage for larger models.
Keeping it healthy: drift, monitoring & CI/CD
- Drift — live data shifts away from the training data over time, so accuracy can silently degrade with no crash to warn you.
- Monitoring — track a live model's inputs and outputs and alert when something looks off; it detects, it doesn't repair.
- CI/CD (with continuous training) — version code, data, and models together for reproducibility and a rollback path to the last known-good version.
Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.
- Hidden Technical Debt in Machine Learning Systems — Sculley et al., Google (NeurIPS 2015)
- MLOps: Continuous delivery and automation pipelines in ML — Google Cloud Architecture
- The ML Test Score: A Rubric for ML Production Readiness — Breck et al., Google (2017)
- Deploy a model to an online endpoint (Azure ML) — Microsoft Learn
- Batch transform (Amazon SageMaker) — AWS Documentation
- Monitor data drift on deployed models (Azure ML) — Microsoft Learn
MLOps & model deployment — in 5 minutes
Tech Jacks Solutions · AI Knowledge Hub · educational summary
What MLOps is
MLOps is the set of practices for reliably deploying and operating machine-learning models in production — DevOps applied to ML. The trained model is only a small part of a real system; most of the work is the data, serving, and monitoring around it.
The lifecycle (a loop)
Data → train → evaluate → package (bundle the model + dependencies into a versioned artifact) → deploy → serve → monitor → retrain. It loops because monitoring feeds back into retraining over the model's life.
Deployment & serving
Batch inference scores many inputs at once on a schedule (no always-on endpoint). Real-time / online inference answers single requests on demand behind an API. Online serving is judged on latency, throughput, scaling, and cost — including GPU usage for larger models.
Drift & monitoring
Drift is when live data or its relationships change over time so accuracy quietly degrades — the model keeps answering confidently while getting worse. Monitoring watches inputs and outputs to catch this and trigger a retrain. Launch-day accuracy is not permanent.
CI/CD, versioning & rollback
CI/CD for ML versions and tests code + data + model, and can add continuous training (automated retrain & redeploy). Versioning everything gives reproducibility and a rollback path: revert to the last known-good model if a deployment misbehaves. LLM serving adds its own optimizations, such as KV caching, request batching, and quantization.