How to Use Amazon SageMaker: Complete Beginner's Guide (2026)

Last verified: May 14, 2026  ·  Format: Guide  ·  Est. time: 25-35 min

  • 1,000+ pre-trained models in JumpStart (Meta, Mistral, Google, Hugging Face). Source: AWS SageMaker JumpStart docs
  • $0 free tier: 250 hours of notebooks plus 50 hours of training, for 2 months. Source: AWS Free Tier, May 2026
  • Up to 90% cost savings with Managed Spot Training on spare capacity. Source: AWS SageMaker Pricing
  • 4 inference modes: real-time, serverless, batch, and async. Source: AWS SageMaker documentation

By the end of this guide, you will have a working SageMaker environment, a trained model, and a live inference endpoint. The platform has over 30 components, but you only need about five of them to go from zero to a deployed model. This guide focuses on those five.

Amazon SageMaker is AWS's fully managed machine learning platform, in production since November 2017 with over 250 features shipped since launch. It covers the full ML lifecycle: data preparation, model training, deployment, monitoring, and governance. For a deeper look at SageMaker's architecture and how it compares to AWS Bedrock, read the full breakdown. This guide is the practical follow-up: step-by-step instructions to build your first project.

What You Need Before Starting

SageMaker runs entirely within the AWS ecosystem. Before you open a notebook, verify these items are in place. Missing IAM permissions are the number-one reason first-time users get stuck at step one.

Prerequisites Checklist

  • AWS account with billing enabled (credit card on file). Free tier covers initial experimentation.
  • IAM permissions for SageMaker: AmazonSageMakerFullAccess policy attached to your IAM user or role. For production, scope down to least-privilege.
  • S3 bucket for training data and model artifacts. SageMaker reads input data from S3 and writes outputs to S3.
  • Billing alerts configured in AWS Budgets. GPU instances bill per-second. A forgotten ml.g5.24xlarge costs over $240/day.
  • Python and AWS SDK knowledge (basic). Familiarity with pandas and boto3 helps but is not strictly required for JumpStart workflows.
  • Optional: a dataset ready to use. SageMaker includes sample datasets for learning, so you can start without your own data.

Guide Steps
  • Step 1: Set Up SageMaker Studio
  • Step 2: Create Your First Notebook
  • Step 3: Load and Prepare Data
  • Step 4: Train a Model
  • Step 5: Use JumpStart Foundation Models
  • Step 6: Deploy to an Endpoint
  • Step 7: Monitor with Model Monitor
  • Step 8: Cost Optimization

Step 1: Setting Up SageMaker Studio

SageMaker Studio is the web-based IDE where you will do most of your work. It provides JupyterLab notebooks, terminal access, experiment tracking, and model deployment tooling in a single browser tab. As of March 2025, Unified Studio (GA) consolidates data processing, SQL analytics, and ML development into one environment.

  1. Sign in to the AWS Management Console and navigate to Amazon SageMaker.
  2. In the left sidebar, click Studio (or Unified Studio if you see the new navigation).
  3. If this is your first time, SageMaker prompts you to create a SageMaker Domain. Choose Quick setup for a single-user domain. This creates an IAM execution role automatically.
  4. Select a VPC configuration. For learning, the default VPC is fine. For production, use a private VPC with no public internet access.
  5. Click Submit. Domain creation takes 3-5 minutes.
  6. Once the domain status shows InService, click Open Studio to launch the IDE.

Verification: You should see the Studio home screen with launcher tiles for JupyterLab, Canvas, and other tools. If you get an access denied error, check that your IAM user has the AmazonSageMakerFullAccess policy attached. The domain takes a few minutes to provision on first launch.
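
The InService check can also be done from code. The following is a minimal sketch, assuming you pass in a boto3 SageMaker client (for example boto3.client("sagemaker")); the helper names domains_by_status and all_in_service are hypothetical, and only the first page of results is read.

```python
def domains_by_status(sm_client):
    """Map each SageMaker domain name to its current status.

    `sm_client` is assumed to be a boto3 SageMaker client, e.g.
    boto3.client("sagemaker"). Only the first page of results is read,
    which is enough for a single-domain learning account.
    """
    response = sm_client.list_domains()
    return {d["DomainName"]: d["Status"] for d in response.get("Domains", [])}


def all_in_service(statuses):
    """True when at least one domain exists and every domain is InService."""
    return bool(statuses) and all(s == "InService" for s in statuses.values())
```

Calling all_in_service(domains_by_status(client)) returns False while the domain is still provisioning, so it can be polled until Studio is ready to open.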

250 Hours Free
ml.t3.medium notebook time included in the 2-month AWS Free Tier, plus 50 hours of ml.m5.xlarge training and 125 hours of inference
Source: AWS Free Tier, verified May 2026

Step 2: Creating Your First Notebook

Notebooks are where you write Python code to interact with SageMaker's APIs. Studio provides managed JupyterLab with pre-installed ML libraries and direct access to AWS services.

  1. From the Studio home screen, click JupyterLab (or Open Launcher then select a notebook).
  2. Choose an instance type. For experimentation, start with ml.t3.medium (covered by free tier: 250 hours for 2 months). Do not select a GPU instance unless you need one.
  3. Select a kernel. Choose Python 3 (Data Science 3.0) for general ML work. This includes pandas, NumPy, scikit-learn, and the SageMaker Python SDK pre-installed.
  4. A new notebook opens. Test connectivity with a quick cell:
import sagemaker
print(sagemaker.Session().default_bucket())
print(sagemaker.get_execution_role())

This confirms your SageMaker session is working and prints your default S3 bucket and IAM execution role ARN. These two values are used in every SageMaker operation.

Verification: Both print statements should return values without errors. The bucket name follows the pattern sagemaker-{region}-{account-id}. The role ARN starts with arn:aws:iam::. If either fails, your domain's execution role may lack S3 permissions.
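
The two patterns described above can be checked mechanically. This is a rough sketch, not an official validation rule: the regular expression is an assumption about how region names look (for example us-east-1), and the helper names are hypothetical.

```python
import re

# Default bucket pattern from the text: sagemaker-{region}-{account-id}.
# The region part (e.g. us-east-1, us-gov-west-1) is an assumed shape.
BUCKET_PATTERN = re.compile(r"^sagemaker-[a-z]{2}(-[a-z]+)+-\d-\d{12}$")
ROLE_PREFIX = "arn:aws:iam::"


def looks_like_default_bucket(name):
    """True if `name` matches the default SageMaker bucket naming pattern."""
    return BUCKET_PATTERN.match(name) is not None


def looks_like_role_arn(arn):
    """True if `arn` starts with the IAM prefix and points at a role."""
    return arn.startswith(ROLE_PREFIX) and ":role/" in arn
```

In a notebook, pass sagemaker.Session().default_bucket() and sagemaker.get_execution_role() to these helpers. A False result does not always mean a failure; a custom bucket name, for instance, will not match the default pattern.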

Step 3: Loading and Preparing Data

SageMaker reads training data from Amazon S3 and writes model artifacts back to S3. The pattern is always: prepare your data locally (or in a notebook), upload to S3, then point your training job at the S3 path.

Upload Data to S3

  1. In your notebook, load a dataset. For this guide, use a built-in sample:
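import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
df = iris.frame
# SageMaker's built-in XGBoost expects CSV with the label in the first column and no header row
df = df[['target'] + iris.feature_names]
df.to_csv('iris.csv', index=False, header=False)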
  2. Upload the CSV to your default S3 bucket:
import sagemaker
session = sagemaker.Session()
input_data = session.upload_data('iris.csv', key_prefix='sagemaker-guide/data')
print(f'Data uploaded to: {input_data}')

Data Wrangler (Optional)

For larger datasets or complex transformations, SageMaker Data Wrangler provides a visual interface for data cleaning, feature engineering, and transformation flows. It connects to over 50 data sources and costs $0.24 per DPU-hour.

Verification: The upload_data call should return an S3 URI like s3://sagemaker-us-east-1-123456789012/sagemaker-guide/data/iris.csv. Verify the file exists by running !aws s3 ls {input_data} in a notebook cell.
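
The same existence check can be scripted instead of run through the CLI. A minimal sketch, assuming a boto3 S3 client is passed in (for example boto3.client("s3")); split_s3_uri and object_size are hypothetical helper names.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/parts' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def object_size(s3_client, uri):
    """Return the object's size in bytes.

    `s3_client` is assumed to be a boto3 S3 client. head_object raises
    a ClientError if the object is missing or access is denied.
    """
    bucket, key = split_s3_uri(uri)
    return s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
```

A non-zero size for input_data confirms that the upload reached the bucket the training job will read from.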

Step 4: Training a Model

SageMaker manages the training infrastructure for you. You specify the algorithm, the instance type, and the S3 paths for input and output. SageMaker provisions the compute, runs the training, saves the model artifact to S3, and terminates the instances. You pay only for the seconds your training job ran.

Using a Built-In Algorithm (XGBoost)

  1. Specify the training configuration:
import sagemaker
from sagemaker import image_uris

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# Get the XGBoost container URI
container = image_uris.retrieve('xgboost', region, version='1.7-1')

xgb = sagemaker.estimator.Estimator(
  container,
  role,
  instance_count=1,
  instance_type='ml.m5.xlarge',
  output_path=f's3://{session.default_bucket()}/sagemaker-guide/output',
  sagemaker_session=session
)
  2. Set hyperparameters and start training:
xgb.set_hyperparameters(
  objective='multi:softmax',
  num_class=3,
  num_round=100
)

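from sagemaker.inputs import TrainingInput

# Built-in XGBoost needs the content type stated explicitly for CSV input
xgb.fit({'train': TrainingInput(input_data, content_type='text/csv')})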

Training on the Iris dataset with ml.m5.xlarge ($0.269/hr) takes under 5 minutes. The .fit() call blocks until training completes, showing real-time logs in your notebook.

Spot Training (Up to 90% Savings)

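For longer training jobs, add use_spot_instances=True and set max_wait (which must be at least max_run) on the Estimator. SageMaker runs the job on spare AWS capacity at discounted rates. To recover from interruptions, also set checkpoint_s3_uri so checkpoints are saved to S3 and the job can resume from the last one; this only helps when the algorithm or training script supports checkpointing.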

Verification: When training completes, the output shows Training job status: Completed. Check xgb.model_data to confirm the model artifact S3 path. In the SageMaker Console, navigate to Training > Training Jobs to see the job details including billable seconds and instance utilization.
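
Billable seconds translate directly into cost. The sketch below assumes `description` is the dictionary returned by the boto3 call describe_training_job(TrainingJobName=...), and that the hourly rate is the on-demand price quoted in this guide. The function name is hypothetical, and the result is an estimate, not a bill.

```python
def summarize_training_job(description, hourly_rate_usd):
    """Estimate cost and Spot savings from a describe_training_job response.

    BillableTimeInSeconds and TrainingTimeInSeconds are fields of that
    response. For on-demand jobs the two are equal, so savings are 0.
    """
    billable = description["BillableTimeInSeconds"]
    trained = description.get("TrainingTimeInSeconds", billable)
    cost = billable / 3600 * hourly_rate_usd
    savings_pct = (1 - billable / trained) * 100 if trained else 0.0
    return {
        "billable_seconds": billable,
        "estimated_cost_usd": round(cost, 4),
        "spot_savings_pct": round(savings_pct, 1),
    }
```

For a five-minute on-demand job on ml.m5.xlarge at $0.269/hr, this works out to roughly two cents.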

Up to 90%
Savings with Managed Spot Training on spare AWS capacity. Add use_spot_instances=True and max_wait to your Estimator, plus checkpoint_s3_uri for interruption recovery.
Source: AWS SageMaker Pricing, verified May 2026

Step 5: Using JumpStart Foundation Models

JumpStart is SageMaker's model marketplace. It hosts over 1,000 pre-trained models from Meta (Llama), Mistral, DeepSeek, Google (Gemma), Microsoft (Phi), Hugging Face, and others. Instead of training from scratch, you can deploy a foundation model with a single API call or fine-tune it on your domain-specific data.

Deploy a Pre-Trained Model

  1. In Studio, click JumpStart in the left sidebar (or navigate to Home > JumpStart).
  2. Browse or search for a model. For example, search for "Meta Llama" or "Mistral."
  3. Click a model card to see deployment options, hardware requirements, and estimated costs.
  4. Click Deploy. JumpStart provisions an endpoint with the appropriate GPU instance and optimized inference container.

Deploy via SDK

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id='huggingface-text2text-flan-t5-base')
predictor = model.deploy()

This deploys the Flan-T5 Base model to a real-time endpoint. The deploy call takes 5-10 minutes as it provisions the instance and loads the model weights. As of April 2026, JumpStart Optimized Deployments support four optimization targets: latency, throughput, cost, and accuracy.

Fine-Tuning

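JumpStart models include built-in fine-tuning workflows. Prepare your training data in the format specified in the model card (usually JSONL), upload it to S3, and fine-tune with the SDK's JumpStartEstimator (from sagemaker.jumpstart.estimator) by calling its fit() method with your dataset's S3 location; JumpStartModel itself only handles deployment. Fine-tuning Llama 3 70B requires ml.g5.48xlarge instances or larger.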
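
JSONL is one JSON object per line. Here is a minimal sketch of producing it from Python records. The field names "prompt" and "completion" are placeholders only; the real schema is whatever the model card specifies.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dicts as JSON Lines (one object per line)."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical example records; replace the keys with the model card's schema.
EXAMPLE_RECORDS = [
    {"prompt": "Classify the flower: 5.1,3.5,1.4,0.2", "completion": "setosa"},
    {"prompt": "Classify the flower: 6.7,3.0,5.2,2.3", "completion": "virginica"},
]
```

Write the returned text to a file such as train.jsonl, then upload it with session.upload_data as in Step 3.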

Verification: After deployment completes, test the endpoint: predictor.predict({"inputs": "Summarize machine learning in one sentence."}). You should receive a model response within seconds. In the Console, check Inference > Endpoints to confirm the endpoint status is InService.

4 Optimization Targets
JumpStart Optimized Deployments balance latency, throughput, cost, and accuracy per model endpoint as of April 2026
Source: AWS SageMaker JumpStart, April 2026

Step 6: Deploying to an Endpoint

Deployment connects your trained model to a URL that applications can call for predictions. SageMaker offers four inference modes. Pick based on your traffic pattern and latency requirements.

Real-Time Endpoints (Most Common)

  1. Deploy the model you trained in Step 4:
predictor = xgb.deploy(
  initial_instance_count=1,
  instance_type='ml.m5.xlarge',
  serializer=sagemaker.serializers.CSVSerializer()
)
  2. Send a test prediction:
result = predictor.predict('5.1,3.5,1.4,0.2')
print(result) # Returns predicted class

Real-time endpoints can auto-scale based on traffic once you attach a scaling policy. Configure scaling policies through the Console or the Application Auto Scaling API.
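
Applications outside the notebook usually call the endpoint through the SageMaker runtime API rather than a Predictor object. A minimal sketch, assuming a boto3 client created with boto3.client("sagemaker-runtime") and an endpoint that accepts CSV; predict_csv is a hypothetical helper name.

```python
def predict_csv(runtime_client, endpoint_name, features):
    """Send one row of features as CSV and return the raw text response.

    `runtime_client` is assumed to be a boto3 "sagemaker-runtime" client.
    invoke_endpoint returns the prediction as a streaming body.
    """
    body = ",".join(str(value) for value in features)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=body,
    )
    return response["Body"].read().decode("utf-8")
```

The endpoint name is available from predictor.endpoint_name after the deploy call in step 1.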

Serverless Endpoints (Scale to Zero)

For variable or low-traffic models, serverless inference eliminates the cost of idle endpoints. Pricing starts at $0.00004 per second at 2GB memory. The trade-off: cold starts take several seconds when the endpoint spins up from zero.

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
  memory_size_in_mb=2048,
  max_concurrency=5
)
predictor = xgb.deploy(
  serverless_inference_config=serverless_config,
  serializer=sagemaker.serializers.CSVSerializer()
)

Batch Transform (Offline Processing)

For large datasets that do not need real-time responses, batch transform processes an entire S3 dataset and writes predictions to an output S3 path. No persistent endpoint required.
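
A batch transform job can be started from the trained estimator. The sketch below assumes `estimator` is the fitted xgb object from Step 4 and that the input CSV holds feature columns only (no label column). The function name and the default instance type are illustrative choices.

```python
def run_batch_transform(estimator, input_s3_uri, output_s3_uri,
                        instance_type="ml.m5.xlarge"):
    """Run a batch transform job and return the S3 output location.

    `estimator` is assumed to be a fitted SageMaker Python SDK estimator.
    Each line of the input is sent to the model as one CSV record.
    """
    transformer = estimator.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_s3_uri,
    )
    transformer.transform(
        data=input_s3_uri,
        content_type="text/csv",
        split_type="Line",
    )
    transformer.wait()  # Block until the job finishes; instances then shut down
    return transformer.output_path
```

Because the instances exist only for the duration of the job, there is no endpoint to delete afterward.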

Async Inference (Long-Running Predictions)

Async inference queues requests for models that take over 60 seconds per prediction (large language models, complex image processing). Results are written to S3, and SageMaker can send an SNS notification when they are ready.

Verification: For real-time endpoints, the predictor.predict() call should return a result within 1-2 seconds. In the Console, confirm the endpoint status is InService under Inference > Endpoints. For serverless endpoints, the first call after idle will be slower (cold start).

Step 7: Monitoring with SageMaker Model Monitor

A deployed model is not a finished product. Data distributions change (data drift), model accuracy degrades, and edge cases emerge in production. Model Monitor detects these issues before they affect business outcomes.

What Model Monitor Tracks

  • Data quality: Detects when incoming data drifts from the training distribution (new categories, shifted ranges, null values)
  • Model quality: Compares predictions against ground truth labels (when available) to track accuracy, precision, recall
  • Bias drift: Uses SageMaker Clarify to detect fairness metric changes across protected groups over time. For content-level safety controls on foundation models, see Bedrock Guardrails.
  • Feature attribution drift: Monitors which features drive predictions and flags when feature importance shifts unexpectedly

Set Up a Monitoring Schedule

  1. Enable data capture on your endpoint to log incoming requests and predictions to S3.
  2. Create a baseline from your training data. Model Monitor uses this baseline to detect distribution shifts.
  3. Configure a monitoring schedule (hourly or daily). Model Monitor runs a processing job on each interval, comparing current traffic against the baseline.
  4. Set up CloudWatch alarms to alert your team when violations exceed thresholds.
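
Step 1 of the list above comes down to a small configuration block on the endpoint. As a sketch, this builds the DataCaptureConfig structure that the boto3 create_endpoint_config call accepts. The helper name and the 100% default sampling rate are assumptions for a learning setup.

```python
def data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build a DataCaptureConfig dict that logs both requests and responses.

    Intended to be passed as DataCaptureConfig=... to the boto3
    create_endpoint_config call. Lower the sampling percentage for
    high-traffic endpoints to limit S3 storage.
    """
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 1 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        "CaptureOptions": [
            {"CaptureMode": "Input"},   # Incoming request payloads
            {"CaptureMode": "Output"},  # Model predictions
        ],
    }
```

The SageMaker Python SDK offers an equivalent DataCaptureConfig class that can be passed to deploy(); the dictionary form is shown here because it maps one-to-one onto the API fields.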

Verification: After enabling data capture, send several test requests to your endpoint. Then check S3 for captured data files at the path you configured. In the Console, navigate to Inference > Model Monitor to confirm the schedule is active and the baseline has been created.

Step 8: Cost Optimization

SageMaker bills per-second for compute. Without cost controls, a single forgotten GPU endpoint can generate hundreds of dollars in charges overnight. These strategies keep costs predictable.

Instance Selection

  • Notebooks: Use ml.t3.medium for exploration ($0.058/hr, free tier eligible). Switch to GPU only when training.
  • Training: Start with ml.m5.xlarge for tabular data ($0.269/hr). Use ml.g4dn.xlarge ($0.7364/hr, 1x T4 GPU) for small deep learning models.
  • Inference: Right-size based on traffic. A ml.c5.xlarge ($0.204/hr) handles most tabular model inference. Reserve GPU instances only for large language models or computer vision.
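
To compare these options, it helps to turn hourly rates into daily or monthly figures. A small sketch using the rates quoted in the list above; actual prices vary by region and change over time, so treat the table as illustrative.

```python
# Hourly rates in USD as quoted in this guide (not fetched from AWS).
HOURLY_RATES_USD = {
    "ml.t3.medium": 0.058,
    "ml.m5.xlarge": 0.269,
    "ml.g4dn.xlarge": 0.7364,
    "ml.c5.xlarge": 0.204,
}


def running_cost(instance_type, hours, instance_count=1):
    """Cost in USD of keeping `instance_count` instances up for `hours`."""
    return HOURLY_RATES_USD[instance_type] * hours * instance_count


def daily_costs():
    """Cost of leaving one instance of each type running for 24 hours."""
    return {name: round(running_cost(name, 24), 2) for name in HOURLY_RATES_USD}
```

Running daily_costs() makes the gap visible: an idle ml.g4dn.xlarge costs more than twelve times as much per day as an idle ml.t3.medium.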

Spot Instances (Up to 90% Off)

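Managed Spot Training uses spare AWS capacity at steep discounts. SageMaker copies checkpoints to S3 when you set checkpoint_s3_uri, so a job whose algorithm supports checkpointing can resume after an interruption. Add these parameters to your Estimator:

xgb = sagemaker.estimator.Estimator(
  ...,
  use_spot_instances=True,
  max_run=3600, # Max seconds of actual training
  max_wait=7200, # Max seconds for waiting on spot capacity plus training; must be >= max_run
  checkpoint_s3_uri=f's3://{session.default_bucket()}/sagemaker-guide/checkpoints'
)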

Auto-Scaling

Configure auto-scaling on real-time endpoints so instances scale down during low-traffic periods. Set a minimum instance count of 1 (or use serverless endpoints for scale-to-zero).
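
A target-tracking policy is one common way to set this up. The sketch below assumes a boto3 client created with boto3.client("application-autoscaling") and the default variant name AllTraffic. The target of 100 invocations per instance and the cooldown values are placeholders to tune against real traffic.

```python
def configure_endpoint_autoscaling(autoscaling_client, endpoint_name,
                                   variant_name="AllTraffic",
                                   min_instances=1, max_instances=3,
                                   invocations_per_instance=100.0):
    """Register the endpoint variant and attach a target-tracking policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Declare the allowed instance range for this variant.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    # Scale to keep invocations per instance near the target value.
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # Seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # Seconds to wait before adding more
        },
    )
    return resource_id
```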

ML Savings Plans

For predictable workloads, AWS ML Savings Plans offer up to 64% savings in exchange for a 1-year or 3-year commitment. These apply across SageMaker instance families, so you are not locked to a specific instance type.

Shutdown Checklist

  • Delete or stop endpoints you are not actively using: predictor.delete_endpoint()
  • Stop notebook instances when not in use (Studio auto-stops after configurable idle time)
  • Set up AWS Budgets alerts at 50%, 80%, and 100% of your monthly threshold
  • Review AWS Cost Explorer weekly, filtered to the SageMaker service, for unused resources

Verification: In the AWS Console, navigate to Billing > Budgets and confirm alerts are configured. Under SageMaker > Inference > Endpoints, verify no unexpected endpoints are running. A clean account should show zero active endpoints after you complete this guide and clean up.
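
The endpoint check can be automated. A minimal sketch, assuming a boto3 SageMaker client is passed in; both helper names are hypothetical. Deleting an endpoint this way stops the billing, but the endpoint configuration and model objects remain.

```python
def in_service_endpoints(sm_client):
    """Return the names of all endpoints currently InService.

    `sm_client` is assumed to be a boto3 SageMaker client. Results are
    read page by page using NextToken.
    """
    names = []
    kwargs = {"StatusEquals": "InService"}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        names.extend(e["EndpointName"] for e in page.get("Endpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def delete_endpoints(sm_client, names):
    """Delete each named endpoint and return how many were requested."""
    for name in names:
        sm_client.delete_endpoint(EndpointName=name)
    return len(names)
```

An empty list from in_service_endpoints(client) matches the "zero active endpoints" state described above.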

Common Pitfalls to Watch For

Forgotten GPU Endpoints

A running ml.g5.24xlarge endpoint costs over $240 per day. Always delete endpoints after testing with predictor.delete_endpoint() and set up AWS Budgets alerts at 50%, 80%, and 100% thresholds.
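
The three alert thresholds can be created in one request. As a sketch, this builds the arguments for the boto3 Budgets create_budget call. The budget name, the limit, and the email address are placeholders, and the helper name is hypothetical.

```python
def budget_request(account_id, monthly_limit_usd, email,
                   thresholds=(50, 80, 100)):
    """Build create_budget arguments with one email alert per threshold."""
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",          # Alert on actual spend
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(pct),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for pct in thresholds
    ]
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "sagemaker-guide-monthly",   # Placeholder name
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": notifications,
    }
```

With credentials in place, the request would be sent as boto3.client("budgets").create_budget(**budget_request("123456789012", 50, "you@example.com")).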

GPU Quota Limits

New AWS accounts default to 0 GPU instances for SageMaker. Training jobs and deployments fail with "ResourceLimitExceeded" until you request a quota increase through Service Quotas, which takes 1-3 business days.
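
Checking the quota before launching a job avoids the failure entirely. A minimal sketch, assuming a boto3 client created with boto3.client("service-quotas"). Matching quotas by instance-type substring is a simplification, and the helper name is hypothetical.

```python
def sagemaker_quotas_for(quotas_client, instance_type):
    """Return {quota name: value} for SageMaker quotas mentioning the type.

    `quotas_client` is assumed to be a boto3 Service Quotas client.
    A value of 0 means jobs on that instance type will be rejected
    until a quota increase is approved.
    """
    matches = {}
    kwargs = {"ServiceCode": "sagemaker"}
    while True:
        page = quotas_client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            if instance_type in quota["QuotaName"]:
                matches[quota["QuotaName"]] = quota["Value"]
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs["NextToken"] = token
```

SageMaker tracks separate quotas for training, endpoints, and other usage, so expect several entries per instance type.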

IAM Permission Gaps

SageMaker requires permissions on your console user AND the execution role used by training jobs. Missing S3 access on the execution role is the most common cause of training failures even when console access works fine.
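
A scoped S3 policy for the execution role might look like the following sketch. The s3:ListBucket statement is an assumption added for listing and prefix-style inputs; the bucket name and prefix are placeholders, and the helper name is hypothetical.

```python
import json


def s3_access_policy(bucket_name, prefix=""):
    """Build an IAM policy document granting object read/write on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix}*",
            },
            {
                # Assumption: listing is needed for prefix-based inputs.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


def policy_json(bucket_name, prefix=""):
    """Render the policy as JSON, ready to paste into the IAM console."""
    return json.dumps(s3_access_policy(bucket_name, prefix), indent=2)
```

Attach the resulting policy to the execution role printed by sagemaker.get_execution_role(), not only to your console user.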

Serverless Cold Starts

Serverless inference endpoints scale to zero, but cold starts take several seconds when traffic resumes. For latency-sensitive applications, use real-time endpoints with auto-scaling instead.


Troubleshooting and FAQ

Common Questions

Why do I get an access denied error?
Your IAM user or role is missing required permissions. Attach the AmazonSageMakerFullAccess managed policy as a starting point. For S3 data access, you also need s3:GetObject and s3:PutObject on your training data bucket. Check the SageMaker execution role (the role used by training jobs) separately from your console user role.

Why does my training job or deployment fail with ResourceLimitExceeded?
AWS accounts have default service quotas for SageMaker instance types. New accounts often have a limit of 0 for GPU instances. Go to Service Quotas in the AWS Console, find SageMaker, and request a quota increase for the instance type you need. GPU quota increases typically take 1-3 business days to approve.

When should I use SageMaker and when should I use Bedrock?
Use SageMaker when you need to train custom models, control the training infrastructure, or run a full MLOps pipeline. Use Bedrock when you want to call pre-trained foundation models (Claude, Llama, Titan) through APIs without managing infrastructure. Many teams use both: training in SageMaker and serving through Bedrock. SageMaker charges per compute-hour; Bedrock charges per API call or per token.

How do I avoid unexpected charges?
Three defenses: (1) Set up AWS Budgets with alerts at 50%, 80%, and 100% of your monthly target. (2) Use serverless endpoints for development, which scale to zero when idle. (3) Delete endpoints immediately after testing with predictor.delete_endpoint(). For production, configure auto-scaling with a minimum of 1 instance rather than leaving oversized instances running 24/7.

Can I use SageMaker without ML engineering experience?
Yes, through Canvas and JumpStart. Canvas provides a no-code visual interface for building ML models, designed for business analysts. JumpStart offers over 1,000 pre-trained models with one-click deployment and built-in fine-tuning workflows. For custom training and production MLOps, ML engineering experience is expected.

Which frameworks does SageMaker support?
SageMaker supports PyTorch, TensorFlow, MXNet, Scikit-learn, Keras, and Horovod natively through pre-built Docker containers with GPU drivers, CUDA, and distributed training libraries pre-configured. You can also bring any framework by packaging it in a custom Docker container.

Next Step

Build a production-grade pipeline. Take the model you trained in Step 4, register it in SageMaker Model Registry, and create a SageMaker Pipeline that automates the entire workflow: data processing, training, evaluation, and conditional deployment. This is the bridge between experimentation and repeatable ML in production. For a deeper understanding of SageMaker's full architecture, read the What Is Amazon SageMaker breakdown.


Amazon SageMaker, AWS, Amazon Web Services, S3, JumpStart, Bedrock, and related marks are trademarks of Amazon.com, Inc. or its affiliates. This article is not affiliated with or endorsed by Amazon.