How to Use Amazon SageMaker: Complete Beginner's Guide (2026)
Last verified: May 14, 2026 · Format: Guide · Est. time: 25-35 min
By the end of this guide, you will have a working SageMaker environment, a trained model, and a live inference endpoint. The platform has over 30 components, but you only need about five of them to go from zero to a deployed model. This guide focuses on those five.
Amazon SageMaker is AWS's fully managed machine learning platform, in production since November 2017 with over 250 features shipped since launch. It covers the full ML lifecycle: data preparation, model training, deployment, monitoring, and governance. For a deeper look at SageMaker's architecture and how it compares to AWS Bedrock, read the full breakdown. This guide is the practical follow-up: step-by-step instructions to build your first project.
What You Need Before Starting
SageMaker runs entirely within the AWS ecosystem. Before you open a notebook, verify these items are in place. Missing IAM permissions are the number-one reason first-time users get stuck at step one.
- AmazonSageMakerFullAccess policy attached to your IAM user or role. For production, scope down to least privilege.

- ✓Step 1: Set Up SageMaker Studio
- ✓Step 2: Create Your First Notebook
- ✓Step 3: Load and Prepare Data
- ✓Step 4: Train a Model
- ✓Step 5: Use JumpStart Foundation Models
- ✓Step 6: Deploy to an Endpoint
- ✓Step 7: Monitor with Model Monitor
- ✓Step 8: Cost Optimization
Step 1: Setting Up SageMaker Studio
SageMaker Studio is the web-based IDE where you will do most of your work. It provides JupyterLab notebooks, terminal access, experiment tracking, and model deployment tooling in a single browser tab. As of March 2025, Unified Studio (GA) consolidates data processing, SQL analytics, and ML development into one environment.
- Sign in to the AWS Management Console and navigate to Amazon SageMaker.
- In the left sidebar, click Studio (or Unified Studio if you see the new navigation).
- If this is your first time, SageMaker prompts you to create a SageMaker Domain. Choose Quick setup for a single-user domain. This creates an IAM execution role automatically.
- Select a VPC configuration. For learning, the default VPC is fine. For production, use a private VPC with no public internet access.
- Click Submit. Domain creation takes 3-5 minutes.
- Once the domain status shows InService, click Open Studio to launch the IDE.
Verification: You should see the Studio home screen with launcher tiles for JupyterLab, Canvas, and other tools. If you get an access denied error, check that your IAM user has the AmazonSageMakerFullAccess policy attached. The domain takes a few minutes to provision on first launch.
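The InService check can also be scripted instead of read off the console. The sketch below is a minimal illustration, assuming boto3 is installed and credentials are configured; `domain_statuses` is a hypothetical helper name, and it takes the client as an argument so it can be tried against a stub before touching a real account.

```python
def domain_statuses(sm_client):
    """Return {domain_name: status} for every SageMaker Domain in the client's region."""
    statuses = {}
    # list_domains is paginated, so walk every page rather than only the first.
    paginator = sm_client.get_paginator("list_domains")
    for page in paginator.paginate():
        for domain in page.get("Domains", []):
            statuses[domain["DomainName"]] = domain["Status"]
    return statuses


# Usage with a real account (not executed here):
#   import boto3
#   print(domain_statuses(boto3.client("sagemaker")))
```

A status other than InService a few minutes after creation usually means provisioning is still running or has failed; the console's domain details page shows the failure reason.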
Step 2: Creating Your First Notebook
Notebooks are where you write Python code to interact with SageMaker's APIs. Studio provides managed JupyterLab with pre-installed ML libraries and direct access to AWS services.
- From the Studio home screen, click JupyterLab (or Open Launcher then select a notebook).
- Choose an instance type. For experimentation, start with ml.t3.medium (covered by free tier: 250 hours for 2 months). Do not select a GPU instance unless you need one.
- Select a kernel. Choose Python 3 (Data Science 3.0) for general ML work. This includes pandas, NumPy, scikit-learn, and the SageMaker Python SDK pre-installed.
- A new notebook opens. Test connectivity with a quick cell:
import sagemaker
print(sagemaker.Session().default_bucket())
print(sagemaker.get_execution_role())
This confirms your SageMaker session is working and prints your default S3 bucket and IAM execution role ARN. These two values are used in every SageMaker operation.
Verification: Both print statements should return values without errors. The bucket name follows the pattern sagemaker-{region}-{account-id}. The role ARN starts with arn:aws:iam::. If either fails, your domain's execution role may lack S3 permissions.
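If you want the notebook to check those two patterns for you, a small validator is enough. This is a sketch only: the function names are hypothetical, and the regular expressions are an approximation of the patterns described above (they also accept partitions such as aws-cn), not an official AWS format definition.

```python
import re

# sagemaker-{region}-{account-id}, e.g. sagemaker-us-east-1-123456789012
BUCKET_PATTERN = re.compile(r"^sagemaker-[a-z]{2}(-[a-z]+)+-\d-\d{12}$")
# IAM role ARN, e.g. arn:aws:iam::123456789012:role/...
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def looks_like_default_bucket(name):
    """True if the name matches the default SageMaker bucket naming pattern."""
    return BUCKET_PATTERN.match(name) is not None


def looks_like_role_arn(arn):
    """True if the string is shaped like an IAM role ARN (not a user ARN)."""
    return ROLE_ARN_PATTERN.match(arn) is not None


# In a notebook (not executed here):
#   assert looks_like_default_bucket(sagemaker.Session().default_bucket())
#   assert looks_like_role_arn(sagemaker.get_execution_role())
```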
Step 3: Loading and Preparing Data
SageMaker reads training data from Amazon S3 and writes model artifacts back to S3. The pattern is always: prepare your data locally (or in a notebook), upload to S3, then point your training job at the S3 path.
Upload Data to S3
- In your notebook, load a dataset. For this guide, use a built-in sample:
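import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
df = iris.frame
# SageMaker's built-in XGBoost expects CSV with the label in the first column and no header row
df = df[['target'] + [c for c in df.columns if c != 'target']]
df.to_csv('iris.csv', index=False, header=False)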
- Upload the CSV to your default S3 bucket:
import sagemaker
session = sagemaker.Session()
input_data = session.upload_data('iris.csv', key_prefix='sagemaker-guide/data')
print(f'Data uploaded to: {input_data}')
Data Wrangler (Optional)
For larger datasets or complex transformations, SageMaker Data Wrangler provides a visual interface for data cleaning, feature engineering, and transformation flows. It connects to over 50 data sources and costs $0.24 per DPU-hour.
Verification: The upload_data call should return an S3 URI like s3://sagemaker-us-east-1-123456789012/sagemaker-guide/data/iris.csv. Verify the file exists by running !aws s3 ls {input_data} in a notebook cell.
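The same existence check can be done from Python rather than the AWS CLI. The sketch below assumes boto3 is available for the real call; `split_s3_uri` and `object_size` are hypothetical helper names, and the S3 client is passed in so the parsing logic can be tested on its own.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def object_size(s3_client, uri):
    """Return the object's size in bytes. head_object raises ClientError if the key is missing."""
    bucket, key = split_s3_uri(uri)
    return s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]


# Usage in the notebook (not executed here):
#   import boto3
#   print(object_size(boto3.client("s3"), input_data))
```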
Step 4: Training a Model
SageMaker manages the training infrastructure for you. You specify the algorithm, the instance type, and the S3 paths for input and output. SageMaker provisions the compute, runs the training, saves the model artifact to S3, and terminates the instances. You pay only for the seconds your training job ran.
Using a Built-In Algorithm (XGBoost)
- Specify the training configuration:
import sagemaker
from sagemaker import image_uris
session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name
# Get the XGBoost container URI
container = image_uris.retrieve('xgboost', region, version='1.7-1')
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{session.default_bucket()}/sagemaker-guide/output',
    sagemaker_session=session
)
- Set hyperparameters and start training:
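from sagemaker.inputs import TrainingInput
xgb.set_hyperparameters(
    objective='multi:softmax',
    num_class=3,
    num_round=100
)
# Built-in XGBoost needs the CSV content type; the file must have no header and the label in the first column
xgb.fit({'train': TrainingInput(input_data, content_type='text/csv')})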
Training on the Iris dataset with ml.m5.xlarge ($0.269/hr) takes under 5 minutes. The .fit() call blocks until training completes, showing real-time logs in your notebook.
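To make the per-second billing concrete, here is the arithmetic for a five-minute job at the rate quoted above. This is a rough sketch: `training_cost` is a hypothetical helper, the rate is the one stated in this guide (it varies by region), and storage and data transfer charges are ignored.

```python
def training_cost(billable_seconds, hourly_rate_usd):
    """Compute cost for a training job billed per second at an hourly instance rate."""
    return billable_seconds / 3600 * hourly_rate_usd


# 5 minutes on ml.m5.xlarge at $0.269/hr
cost = training_cost(billable_seconds=300, hourly_rate_usd=0.269)
print(f"${cost:.4f}")  # prints $0.0224
```

The billable seconds for a finished job appear in the training job details in the console, so you can plug the real figure into the same formula.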
Spot Training (Up to 90% Savings)
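For longer training jobs, add use_spot_instances=True and set max_wait, which must be at least as large as max_run. SageMaker then runs the job on spare AWS capacity at discounted rates. If you also set checkpoint_s3_uri, SageMaker syncs checkpoints to S3 so that algorithms that support checkpointing can resume after an interruption instead of starting over.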
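One way to keep the Spot settings consistent is to build them in one place and validate them before creating the Estimator. The sketch below is illustrative: `spot_training_kwargs` is a hypothetical helper, while use_spot_instances, max_run, max_wait, and checkpoint_s3_uri are Estimator parameters in the SageMaker Python SDK. The check reflects the requirement that max_wait be at least max_run.

```python
def spot_training_kwargs(max_run_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build Estimator keyword arguments for Managed Spot Training."""
    if max_wait_seconds < max_run_seconds:
        # max_wait covers waiting for capacity plus training time, so it cannot be shorter than max_run.
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


# Usage (not executed here; the bucket path is a placeholder):
#   kwargs = spot_training_kwargs(3600, 7200, "s3://my-bucket/sagemaker-guide/checkpoints")
#   xgb = sagemaker.estimator.Estimator(container, role, instance_count=1,
#                                       instance_type="ml.m5.xlarge", **kwargs)
```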
Verification: When training completes, the output shows Training job status: Completed. Check xgb.model_data to confirm the model artifact S3 path. In the SageMaker Console, navigate to Training > Training Jobs to see the job details including billable seconds and instance utilization.
Add use_spot_instances=True and max_wait to your Estimator to train on discounted Spot capacity, and set checkpoint_s3_uri so interrupted jobs can resume from a checkpoint.
Step 5: Using JumpStart Foundation Models
JumpStart is SageMaker's model marketplace. It hosts over 1,000 pre-trained models from Meta (Llama), Mistral, DeepSeek, Google (Gemma), Microsoft (Phi), Hugging Face, and others. Instead of training from scratch, you can deploy a foundation model with a single API call or fine-tune it on your domain-specific data.
Deploy a Pre-Trained Model
- In Studio, click JumpStart in the left sidebar (or navigate to Home > JumpStart).
- Browse or search for a model. For example, search for "Meta Llama" or "Mistral."
- Click a model card to see deployment options, hardware requirements, and estimated costs.
- Click Deploy. JumpStart provisions an endpoint with the appropriate GPU instance and optimized inference container.
Deploy via SDK
from sagemaker.jumpstart.model import JumpStartModel
model = JumpStartModel(model_id='huggingface-text2text-flan-t5-base')
predictor = model.deploy()
This deploys the Flan-T5 Base model to a real-time endpoint. The deploy call takes 5-10 minutes as it provisions the instance and loads the model weights. As of April 2026, JumpStart Optimized Deployments support four optimization targets: latency, throughput, cost, and accuracy.
Fine-Tuning
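JumpStart models include built-in fine-tuning workflows. Prepare your training data in the format specified in the model card (usually JSONL), upload it to S3, then create a JumpStartEstimator (from sagemaker.jumpstart.estimator) for the model and call its fit() method with your dataset. JumpStartModel itself is for deployment and has no fit() method. Fine-tuning Llama 3 70B requires ml.g5.48xlarge instances or larger.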
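Writing the JSONL file itself is simple: one JSON object per line. The sketch below shows only the serialization step. The field names "prompt" and "completion" are placeholders, not a documented schema; the model card for the model you pick defines the real field names.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Placeholder records; replace the keys with the ones your model card specifies.
examples = [
    {"prompt": "Summarize: SageMaker trains and hosts models.", "completion": "Managed ML platform."},
    {"prompt": "Summarize: S3 stores objects.", "completion": "Object storage."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(to_jsonl(examples))
```

The resulting train.jsonl can be uploaded with session.upload_data in the same way as the CSV in Step 3.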
Verification: After deployment completes, test the endpoint: predictor.predict({"inputs": "Summarize machine learning in one sentence."}). You should receive a model response within seconds. In the Console, check Inference > Endpoints to confirm the endpoint status is InService.
Step 6: Deploying to an Endpoint
Deployment connects your trained model to a URL that applications can call for predictions. SageMaker offers four inference modes. Pick based on your traffic pattern and latency requirements.
Real-Time Endpoints (Most Common)
- Deploy the model you trained in Step 4:
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    serializer=sagemaker.serializers.CSVSerializer()
)
- Send a test prediction:
result = predictor.predict('5.1,3.5,1.4,0.2')
print(result) # Returns predicted class
Real-time endpoints can auto-scale based on traffic, but only after you attach a scaling policy; by default the instance count stays fixed. Configure scaling policies through the Console or the Application Auto Scaling API.
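For the API route, scaling an endpoint takes two calls to Application Auto Scaling: register the variant as a scalable target, then attach a target-tracking policy. The sketch below only builds the request payloads. `autoscaling_requests` is a hypothetical helper, the capacity limits, target value, and cooldowns are illustrative defaults rather than recommendations, and "AllTraffic" is assumed to be the variant name (the SDK's default when you call deploy()).

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                         min_capacity=1, max_capacity=4, invocations_per_instance=100.0):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so each instance handles roughly this many invocations per minute.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }
    return target, policy


# Usage (not executed here):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   target, policy = autoscaling_requests(predictor.endpoint_name)
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```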
Serverless Endpoints (Scale to Zero)
For variable or low-traffic models, serverless inference eliminates the cost of idle endpoints. Pricing starts at $0.00004 per second at 2GB memory. The trade-off: cold starts take several seconds when the endpoint spins up from zero.
from sagemaker.serverless import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=5
)
predictor = xgb.deploy(serverless_inference_config=serverless_config)
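To see when scale-to-zero pays off, compare a month of serverless compute against an always-on instance. This is a back-of-the-envelope sketch using the two rates quoted in this guide; the helper names and the traffic figures are illustrative assumptions, and data-processing charges are ignored.

```python
SERVERLESS_RATE_PER_SECOND = 0.00004  # 2 GB memory, rate quoted in this guide
REALTIME_RATE_PER_HOUR = 0.269        # ml.m5.xlarge, rate quoted in this guide


def serverless_monthly_cost(requests_per_month, seconds_per_request):
    """Serverless bills only for the compute time spent handling requests."""
    return requests_per_month * seconds_per_request * SERVERLESS_RATE_PER_SECOND


def realtime_monthly_cost(hours=730):
    """A real-time endpoint bills for every hour it is up, busy or idle."""
    return hours * REALTIME_RATE_PER_HOUR


# Assumed traffic: 100,000 requests a month at 0.2 s each.
print(f"serverless: ${serverless_monthly_cost(100_000, 0.2):.2f}")
print(f"real-time:  ${realtime_monthly_cost():.2f}")
```

At low volumes the idle hours dominate the real-time bill, which is why serverless suits spiky or occasional traffic.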
Batch Transform (Offline Processing)
For large datasets that do not need real-time responses, batch transform processes an entire S3 dataset and writes predictions to an output S3 path. No persistent endpoint required.
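With the SDK, a batch transform job can be started from the trained estimator. The sketch below assumes the v2-style SageMaker Python SDK used elsewhere in this guide and CSV input that contains feature columns only (no label column, no header). `run_batch_transform` is a hypothetical wrapper, and the S3 paths in the usage comment are placeholders.

```python
def run_batch_transform(estimator, input_s3_uri, output_s3_uri, instance_type="ml.m5.xlarge"):
    """Run a batch transform job over CSV data in S3 and wait for it to finish."""
    transformer = estimator.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_s3_uri,
    )
    # split_type="Line" sends the file to the model one CSV row at a time.
    transformer.transform(input_s3_uri, content_type="text/csv", split_type="Line")
    transformer.wait()  # blocks until the job completes; instances are released afterwards
    return output_s3_uri


# Usage (not executed here):
#   run_batch_transform(xgb, "s3://my-bucket/sagemaker-guide/batch-in/",
#                       "s3://my-bucket/sagemaker-guide/batch-out/")
```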
Async Inference (Long-Running Predictions)
Async inference queues requests for models that take over 60 seconds per prediction (large language models, complex image processing). It writes results to S3 and can send an SNS notification when they are ready.
Verification: For real-time endpoints, the predictor.predict() call should return a result within 1-2 seconds. In the Console, confirm the endpoint status is InService under Inference > Endpoints. For serverless endpoints, the first call after idle will be slower (cold start).
Step 7: Monitoring with SageMaker Model Monitor
A deployed model is not a finished product. Data distributions change (data drift), model accuracy degrades, and edge cases emerge in production. Model Monitor detects these issues before they affect business outcomes.
What Model Monitor Tracks
- Data quality: Detects when incoming data drifts from the training distribution (new categories, shifted ranges, null values)
- Model quality: Compares predictions against ground truth labels (when available) to track accuracy, precision, recall
- Bias drift: Uses SageMaker Clarify to detect fairness metric changes across protected groups over time. For content-level safety controls on foundation models, see Bedrock Guardrails.
- Feature attribution drift: Monitors which features drive predictions and flags when feature importance shifts unexpectedly
Set Up a Monitoring Schedule
- Enable data capture on your endpoint to log incoming requests and predictions to S3.
- Create a baseline from your training data. Model Monitor uses this baseline to detect distribution shifts.
- Configure a monitoring schedule (hourly or daily). Model Monitor runs a processing job on each interval, comparing current traffic against the baseline.
- Set up CloudWatch alarms to alert your team when violations exceed thresholds.
Verification: After enabling data capture, send several test requests to your endpoint. Then check S3 for captured data files at the path you configured. In the Console, navigate to Inference > Model Monitor to confirm the schedule is active and the baseline has been created.
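The first three setup steps can be sketched with the SageMaker Python SDK's model_monitor module. Treat this as an outline under assumptions rather than a drop-in script: it assumes the v2-style SDK, a baseline CSV with a header row containing feature columns only, and hypothetical names (`monitoring_paths`, `enable_monitoring`, the S3 prefix). The SDK imports sit inside the function so the path helper can be used without the SDK installed.

```python
def monitoring_paths(bucket, prefix="sagemaker-guide/monitoring"):
    """S3 locations for captured traffic, baseline output, and monitoring reports."""
    base = f"s3://{bucket}/{prefix}"
    return {
        "capture": f"{base}/datacapture",
        "baseline": f"{base}/baseline",
        "reports": f"{base}/reports",
    }


def enable_monitoring(predictor, role, baseline_csv_uri, paths, schedule_name):
    """Enable data capture, compute a baseline, and create an hourly data-quality schedule."""
    from sagemaker.model_monitor import (
        CronExpressionGenerator, DataCaptureConfig, DefaultModelMonitor)
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    # Step 1: log requests and responses from the live endpoint to S3.
    predictor.update_data_capture_config(
        data_capture_config=DataCaptureConfig(
            enable_capture=True, sampling_percentage=100,
            destination_s3_uri=paths["capture"]))

    # Step 2: compute baseline statistics and constraints from the training features.
    monitor = DefaultModelMonitor(
        role=role, instance_count=1, instance_type="ml.m5.xlarge",
        volume_size_in_gb=20, max_runtime_in_seconds=3600)
    monitor.suggest_baseline(
        baseline_dataset=baseline_csv_uri,
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri=paths["baseline"], wait=True)

    # Step 3: compare captured traffic against the baseline every hour.
    monitor.create_monitoring_schedule(
        monitor_schedule_name=schedule_name,
        endpoint_input=predictor.endpoint_name,
        output_s3_uri=paths["reports"],
        statistics=monitor.baseline_statistics(),
        constraints=monitor.suggested_constraints(),
        schedule_cron_expression=CronExpressionGenerator.hourly(),
        enable_cloudwatch_metrics=True)
    return monitor
```

With enable_cloudwatch_metrics=True the schedule publishes metrics to CloudWatch, which is what the alarms in the last step are built on.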
Step 8: Cost Optimization
SageMaker bills per-second for compute. Without cost controls, a single forgotten GPU endpoint can generate hundreds of dollars in charges overnight. These strategies keep costs predictable.
Instance Selection
- Notebooks: Use ml.t3.medium for exploration ($0.058/hr, free tier eligible). Switch to GPU only when training.
- Training: Start with ml.m5.xlarge for tabular data ($0.269/hr). Use ml.g4dn.xlarge ($0.7364/hr, 1x T4 GPU) for small deep learning models.
- Inference: Right-size based on traffic. An ml.c5.xlarge ($0.204/hr) handles most tabular model inference. Reserve GPU instances only for large language models or computer vision.
Spot Instances (Up to 90% Off)
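Managed Spot Training uses spare AWS capacity at steep discounts. If a job is interrupted, SageMaker restarts it when capacity returns; set checkpoint_s3_uri so that algorithms that support checkpointing resume from the last checkpoint instead of starting over. Add these parameters to your Estimator:
xgb = sagemaker.estimator.Estimator(
    ...,
    use_spot_instances=True,
    max_run=3600,   # Max seconds of training time
    max_wait=7200   # Max seconds for the whole job (waiting for Spot capacity plus training); must be >= max_run
)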
Auto-Scaling
Configure auto-scaling on real-time endpoints so instances scale down during low-traffic periods. Set a minimum instance count of 1 (or use serverless endpoints for scale-to-zero).
ML Savings Plans
For predictable workloads, AWS ML Savings Plans offer up to 64% savings with 1-3 year commitments. These apply across SageMaker instance families, so you are not locked to a specific instance type.
Shutdown Checklist
- Delete or stop endpoints you are not actively using: predictor.delete_endpoint()
- Stop notebook instances when not in use (Studio auto-stops after configurable idle time)
- Set up AWS Budgets alerts at 50%, 80%, and 100% of your monthly threshold
- Review AWS Cost Explorer, filtered to SageMaker, for unused resources weekly
Verification: In the AWS Console, navigate to Billing > Budgets and confirm alerts are configured. Under SageMaker > Inference > Endpoints, verify no unexpected endpoints are running. A clean account should show zero active endpoints after you complete this guide and clean up.
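The endpoint check can be automated so cleanup does not depend on remembering which predictor objects are still in scope. The sketch below is a cautious illustration: `delete_endpoints` is a hypothetical helper that defaults to a dry run, takes a boto3 SageMaker client as an argument, and removes every endpoint in the region when dry_run is False, so only use that mode in a sandbox account.

```python
def delete_endpoints(sm_client, dry_run=True):
    """List every endpoint in the client's region; delete them unless dry_run is True."""
    names = []
    paginator = sm_client.get_paginator("list_endpoints")
    for page in paginator.paginate():
        for endpoint in page.get("Endpoints", []):
            names.append(endpoint["EndpointName"])
    if not dry_run:
        for name in names:
            # Deleting the endpoint stops the billing; endpoint configs and models are separate resources.
            sm_client.delete_endpoint(EndpointName=name)
    return names


# Usage (not executed here):
#   import boto3
#   print(delete_endpoints(boto3.client("sagemaker")))          # dry run: just list
#   delete_endpoints(boto3.client("sagemaker"), dry_run=False)  # actually delete
```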
A running ml.g5.24xlarge endpoint costs over $240 per day. Always delete endpoints after testing with predictor.delete_endpoint() and set up AWS Budgets alerts at 50%, 80%, and 100% thresholds.
New AWS accounts default to 0 GPU instances for SageMaker. Training jobs and deployments fail with "ResourceLimitExceeded" until you request a quota increase through Service Quotas, which takes 1-3 business days.
SageMaker requires permissions on your console user AND the execution role used by training jobs. Missing S3 access on the execution role is the most common cause of training failures even when console access works fine.
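A scoped S3 policy for the execution role might look like the following. This is a sketch, not a complete least-privilege policy: the function name, bucket, and prefix are placeholders, and s3:ListBucket is included on the assumption that jobs need to list objects under the prefix.

```python
import json


def s3_data_access_policy(bucket, prefix=""):
    """IAM policy document granting read/write on one bucket prefix plus bucket listing."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# Placeholder bucket and prefix; attach the JSON to the execution role, not only to your console user.
print(json.dumps(s3_data_access_policy("sagemaker-us-east-1-123456789012", "sagemaker-guide/"), indent=2))
```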
Serverless inference endpoints scale to zero, but cold starts take several seconds when traffic resumes. For latency-sensitive applications, use real-time endpoints with auto-scaling instead.
Troubleshooting and FAQ
Attach the AmazonSageMakerFullAccess managed policy as a starting point. For S3 data access, you also need s3:GetObject and s3:PutObject on your training data bucket. Check the SageMaker execution role (the role used by training jobs) separately from your console user role.
Delete endpoints you are not using with predictor.delete_endpoint(). For production, configure auto-scaling with a minimum of 1 instance rather than leaving oversized instances running 24/7.
Next Step
Build a production-grade pipeline. Take the model you trained in Step 4, register it in SageMaker Model Registry, and create a SageMaker Pipeline that automates the entire workflow: data processing, training, evaluation, and conditional deployment. This is the bridge between experimentation and repeatable ML in production. For a deeper understanding of SageMaker's full architecture, read the What Is Amazon SageMaker breakdown.
Amazon SageMaker, AWS, Amazon Web Services, S3, JumpStart, Bedrock, and related marks are trademarks of Amazon.com, Inc. or its affiliates. This article is not affiliated with or endorsed by Amazon.