Learning lesson

Track 03 · Applied & Agentic · Intermediate · ~8 min

MLOps: how a model gets shipped and kept healthy

Training a model is the easy part. The hard part is getting it to run reliably for real users — and keeping it working after the world it learned from has moved on. MLOps is the set of practices for that: deploying models, serving them at the right speed and cost, watching them for trouble, and retraining when they slip. Think of it as DevOps for machine learning.

01 What MLOps actually is

Building a model is a bit like cooking a great dish once in your own kitchen. Serving it to a packed restaurant, every night, exactly the same — that's a whole different job. MLOps is that second job. It's the set of practices for reliably deploying and operating machine-learning models in production, and the simplest way to describe it is DevOps for machine learning: the same automate-test-monitor discipline that software teams use to ship code, applied to models. The surprising part is how little of the work is the model itself. In a real production system, the trained model is a small box in a much bigger diagram — most of the effort is the data plumbing, serving, and monitoring around it.

MLOps is a set of practices and pipelines, not a single tool you install.
A good offline score is the start, not the finish — operating the model is most of the work.
The model is one piece; the surrounding system (data, serving, monitoring) is where MLOps lives.

02 The model lifecycle, as a loop

A model isn't shipped once and forgotten. It moves through a repeating lifecycle: you prepare data, train a model, evaluate it, package it into something deployable, deploy and serve it, then monitor it in production. When it slips, you retrain and go around again. The stages group into four steps, each with its own job.

A repeating loop, not a straight line

Data & train: build it

Evaluate & package: check & bundle

Deploy & serve: ship it

Monitor & retrain: keep it healthy

Stage 1 — build it

Data & train

Prepare the data the model learns from, then train the model on it. This is the part most people picture when they think of machine learning — but in the lifecycle it's just the first step, and everything after it is what MLOps adds.
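To make the "loop, not a straight line" idea concrete, here is a minimal sketch of the lifecycle as a tiny state machine. The `Stage` enum and the `next_stage` helper are hypothetical names invented for this illustration, and the rule that a retrain starts again from data preparation is an assumption drawn from the description above, not a prescribed design.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the lifecycle described in this lesson.
    PREPARE_DATA = "prepare data"
    TRAIN = "train"
    EVALUATE = "evaluate"
    PACKAGE = "package"
    DEPLOY_AND_SERVE = "deploy and serve"
    MONITOR = "monitor"


ORDER = list(Stage)  # Enum members iterate in definition order


def next_stage(current: Stage, model_is_healthy: bool = True) -> Stage:
    """Return the stage that follows `current` (hypothetical helper).

    Monitoring is where the line bends into a loop: a healthy model keeps
    being monitored, a slipping model goes back to data preparation.
    """
    if current is Stage.MONITOR:
        return Stage.MONITOR if model_is_healthy else Stage.PREPARE_DATA
    return ORDER[ORDER.index(current) + 1]
```

Calling `next_stage(Stage.MONITOR, model_is_healthy=False)` returns `Stage.PREPARE_DATA`, which is the "retrain and go around again" step.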

03 Walk one model through its life

Theory is easier to hold onto when you follow a single, concrete case. The lesson's interactive walkthrough follows one illustrative model, an email-triage classifier that sorts incoming support email into categories, through every stage, from training to the moment drift is detected and it has to be retrained. This is a teaching example, clearly labelled, not measured results from any real product.

04 Two ways to serve: batch vs. real-time

"Deploying" a model means making it answer questions, and there are two main patterns. Batch inference scores a big pile of inputs all at once, usually on a schedule — think of a nightly job that labels millions of records by morning, with no one waiting. Real-time (online) inference answers individual requests on demand, usually behind an API, where each answer must come back fast. The pattern you pick drives the concerns that follow it: an online endpoint is judged on latency (speed per request), throughput (requests handled per second), scaling (coping with load), and cost — which for larger models often means paying for GPUs.

Batch: score many inputs at once on a schedule; no always-on endpoint, cheaper for bulk jobs.
Real-time / online: answer single requests on demand behind an API; latency and scaling matter.
Serving tradeoffs are latency, throughput, scaling, and cost — including GPU usage for big models.
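The difference between the two patterns is easier to see side by side. The sketch below uses a hypothetical `toy_model` (a keyword rule standing in for a trained email-triage classifier) and two illustrative helpers, `batch_inference` and `online_inference`. None of these names come from a real serving library; a production endpoint would sit behind an API and a scheduler rather than plain function calls.

```python
import time
from typing import Callable, List, Tuple


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a trained classifier."""
    return "billing" if "invoice" in text.lower() else "general"


def batch_inference(model: Callable[[str], str], inputs: List[str]) -> List[str]:
    """Score a whole pile of inputs in one run, e.g. from a nightly job.

    Nobody is waiting on any single answer, so total run time matters
    more than per-item latency.
    """
    return [model(item) for item in inputs]


def online_inference(model: Callable[[str], str], request: str) -> Tuple[str, float]:
    """Answer one request on demand and measure its latency in milliseconds."""
    start = time.perf_counter()
    prediction = model(request)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return prediction, latency_ms
```

The online helper returns latency alongside the prediction because, for a real-time endpoint, speed per request is part of what gets judged. The batch helper has no such concern.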

05 Keeping it healthy: drift, monitoring & CI/CD

A deployed model can quietly get worse without anyone touching it, because the world it learned from keeps changing. That slow decay is called drift, and it's the reason production models need monitoring, and the reason teams build CI/CD pipelines to redeploy safely. The three views that follow (drift, monitoring, and CI/CD with rollback) cover how a model is kept healthy over its life.

Drift — when a model quietly goes stale

Drift is when the live data, or the relationship the model captures, changes over time so the model's accuracy can silently degrade. The danger is that the model keeps returning confident answers while getting steadily less correct — nothing crashes, so nothing obviously signals the problem. Launch-day accuracy is no guarantee of future accuracy.

What: live data shifts away from the training data over time

Symptom: confident answers, quietly worse, with no crash to warn you

Fix: detect it with monitoring, then retrain on fresh data
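One way to put a number on "live data shifts away from the training data" is the Population Stability Index (PSI), a technique this lesson does not prescribe but which is widely used for this purpose. The sketch below compares one numeric feature between a training sample and a live sample. The function name, the ten equal-width bins, and the `1e-6` floor are illustrative assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample (`expected`) and a live sample (`actual`).

    Bin edges are taken from the training sample; live values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Assumption: 0.25 is a commonly quoted rule-of-thumb threshold for a
# significant shift; tune it for your own data rather than trusting it blindly.
DRIFT_THRESHOLD = 0.25
```

A PSI of zero means the two samples fill the bins identically. A value above the threshold is a signal to investigate and possibly retrain, not proof that accuracy has dropped.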

Monitoring — watching a model in production

Because drift is silent, you monitor a live model the way you'd watch any production system: track its inputs and outputs and alert when something looks off, so degradation is caught before it causes harm. Monitoring doesn't fix the model on its own — it's the early-warning system that tells you when it's time to retrain.

Watch: inputs and outputs of the live model

Alert: when behaviour drifts from what's expected

Then: trigger a retrain; monitoring detects, it doesn't repair
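The watch-then-alert pattern can be sketched in a few lines. The hypothetical `OutputMonitor` class below tracks the share of each predicted label over a rolling window and reports labels whose share has moved too far from a baseline. The class name, window size, and tolerance are assumptions made for illustration; real monitoring would usually watch inputs as well and send alerts to an on-call system.

```python
from collections import Counter, deque
from typing import Deque, Dict, List


class OutputMonitor:
    """Rolling-window check on a live model's predicted labels (illustrative)."""

    def __init__(
        self, baseline_share: Dict[str, float], window: int = 100, tolerance: float = 0.15
    ) -> None:
        self.baseline_share = baseline_share  # expected share of each label
        self.recent: Deque[str] = deque(maxlen=window)  # oldest entries fall off
        self.tolerance = tolerance

    def record(self, predicted_label: str) -> None:
        """Log one prediction made by the live model."""
        self.recent.append(predicted_label)

    def alerts(self) -> List[str]:
        """Return one message per label whose share is outside the tolerance."""
        if len(self.recent) < (self.recent.maxlen or 0):
            return []  # not enough traffic yet to judge
        counts = Counter(self.recent)
        messages = []
        for label, expected in self.baseline_share.items():
            observed = counts[label] / len(self.recent)
            if abs(observed - expected) > self.tolerance:
                messages.append(
                    f"{label}: expected ~{expected:.2f}, observed {observed:.2f}"
                )
        return messages
```

Note that `alerts()` only reports. Deciding to retrain is a separate step, which matches the point above that monitoring detects but doesn't repair.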

CI/CD & rollback — shipping models safely

CI/CD for ML is like CI/CD for software, with more to manage: it versions and tests not just code but data and models too, and can add continuous training (CT) — automated pipelines that retrain and redeploy. Versioning everything together gives you reproducibility and a rollback path: if a new model version misbehaves in production, you can revert to the last known-good model immediately while you investigate.

Versions: code + data + model, together, for reproducibility

Continuous training: automated retrain & redeploy pipelines

Rollback: revert to the last known-good version when a deploy goes bad
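Versioning code, data, and model together is what makes rollback possible, and a small sketch shows why. The `Release` and `Registry` classes below are hypothetical and kept in memory only; they are not the API of any real model registry. The sketch also assumes the previous release is the last known-good one, which a real system would confirm with health checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    """One deployable unit: the three versions travel together."""
    code_version: str
    data_version: str
    model_version: str


@dataclass
class Registry:
    """In-memory deployment history (illustrative stand-in for a registry)."""
    history: List[Release] = field(default_factory=list)

    def deploy(self, release: Release) -> None:
        """Make `release` the live version."""
        self.history.append(release)

    @property
    def live(self) -> Optional[Release]:
        """The release currently serving traffic, if any."""
        return self.history[-1] if self.history else None

    def rollback(self) -> Release:
        """Drop the live release and return to the one before it."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()
        return self.history[-1]
```

Because each `Release` pins all three versions, rolling back restores a combination that actually ran together, not just an older model file paired with newer code or data.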

07 Take it with you & go deeper

"MLOps & model deployment" — one-page summary

The whole module distilled to a printable cheat-sheet.

▸ Already on the site — go deeper

Glossary

MLOps — AI Glossary

The concise definition, plus related terms, in the AI Glossary.

Learning lesson

Customizing a model: prompt, RAG, or fine-tune

The companion module on how a model is built and adapted — the step before it gets shipped.

▸ Coming next — deeper progression

Coming soon

Model serving deep dive (latency, batching, GPUs)

How real-time endpoints scale, how request batching and quantization cut cost, and how LLM serving differs from classic models.

Coming soon

Monitoring & drift detection in practice

What to watch, how to alert on drift, and how to wire monitoring into an automated retrain-and-redeploy pipeline.

Concept map

A quick map of how this lesson fits together, with the key ideas under each branch.

What MLOps actually is

MLOps is a set of practices and pipelines for reliably running models in production — essentially DevOps for machine learning, not a single tool you install.
A good offline score is the start, not the finish — operating the model is most of the work.
The model is one piece; the surrounding system (data, serving, monitoring) is where MLOps lives.

The model lifecycle, as a loop

A model moves through a repeating cycle: prepare data → train → evaluate → package → deploy and serve → monitor.
When performance slips, you retrain and go around again — it's a loop, not a straight line.
Training is just the first step; everything after it is what MLOps adds.

Two ways to serve: batch vs. real-time

Batch inference — score many inputs at once on a schedule; no always-on endpoint, cheaper for bulk jobs.
Real-time / online inference — answer single requests on demand behind an API, where each answer must come back fast.
Serving tradeoffs are latency, throughput, scaling, and cost — including GPU usage for larger models.

Keeping it healthy: drift, monitoring & CI/CD

Drift — live data shifts away from the training data over time, so accuracy can silently degrade with no crash to warn you.
Monitoring — track a live model's inputs and outputs and alert when something looks off; it detects, it doesn't repair.
CI/CD (with continuous training) — version code, data, and models together for reproducibility and a rollback path to the last known-good version.

Sources & review

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.

Hidden Technical Debt in Machine Learning Systems — Sculley et al., Google (NeurIPS 2015)
MLOps: Continuous delivery and automation pipelines in ML — Google Cloud Architecture
The ML Test Score: A Rubric for ML Production Readiness — Breck et al., Google (2017)
Deploy a model to an online endpoint (Azure ML) — Microsoft Learn
Batch transform (Amazon SageMaker) — AWS Documentation
Monitor data drift on deployed models (Azure ML) — Microsoft Learn

MLOps & model deployment — in 5 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

What MLOps is

MLOps is the set of practices for reliably deploying and operating machine-learning models in production — DevOps applied to ML. The trained model is only a small part of a real system; most of the work is the data, serving, and monitoring around it.

The lifecycle (a loop)

Data → train → evaluate → package (bundle the model + dependencies into a versioned artifact) → deploy → serve → monitor → retrain. It loops because monitoring feeds back into retraining over the model's life.

Deployment & serving

Batch inference scores many inputs at once on a schedule (no always-on endpoint). Real-time / online inference answers single requests on demand behind an API. Online serving is judged on latency, throughput, scaling, and cost — including GPU usage for larger models.

Drift & monitoring

Drift is when live data or its relationships change over time so accuracy quietly degrades — the model keeps answering confidently while getting worse. Monitoring watches inputs and outputs to catch this and trigger a retrain. Launch-day accuracy is not permanent.

CI/CD, versioning & rollback

CI/CD for ML versions and tests code + data + model, and can add continuous training (automated retrain & redeploy). Versioning everything gives reproducibility and a rollback path: revert to the last known-good model if a deployment misbehaves. On the serving side, LLMs add their own optimizations: KV caching, request batching, and quantization.

Before you act on AI output. This is an educational module. AI systems can produce plausible-sounding but incorrect guidance. For decisions that carry real consequences — security, legal, financial, medical, or compliance — verify with a qualified professional before acting. The deployment-lifecycle example here (an email-triage classifier) is an illustrative teaching scenario, not measured results or claims about any specific product or vendor. External links are provided for learning and may change; confirm against the official source. See sources.json for grounding and editorial cautions.
