Perplexity API & Developer Guide: Search-Grounded AI From Setup to Production (2026)
Last verified: May 7, 2026 · Format: Guide · Est. time: 20-25 min
Perplexity's API gives you programmatic access to something no other major AI provider offers natively: search-grounded responses with inline citations. Every answer the Sonar models return comes with URLs pointing to the web sources that informed the response. If you have built applications that chain an LLM call with a separate search API, then parse and inject results into context, Perplexity collapses that entire pipeline into a single API call.
The Sonar model family spans four tiers: a lightweight model for quick factual lookups, a pro model for complex multi-source queries, a reasoning model with chain-of-thought for analytical tasks, and a deep research model that conducts exhaustive multi-step searches. This guide walks through every step from API key generation to production deployment, covering search-grounded completions, structured JSON output, domain filtering, and the cost and rate limit decisions that determine whether your integration runs efficiently at scale.
What You Need Before Starting
Perplexity's Sonar API is accessed through their developer console at console.perplexity.ai. You will need an account, an API key, and either the OpenAI SDK (which works out of the box) or direct REST calls. The Sonar API follows the OpenAI chat completions format with Bearer token authentication at the base URL https://api.perplexity.ai.
Prerequisites: a Perplexity account, an API key, and the OpenAI Python SDK (pip install openai) for SDK compatibility.
- ✓Step 1: Get API Key & Authenticate
- ✓Step 2: First Search-Grounded Request
- ✓Step 3: Choose the Right Model
- ✓Step 4: Search Filters & Domain Control
- ✓Step 5: Structured JSON Output
- ✓Step 6: Agent API
- ✓Step 7: Streaming
- ✓Step 8: Deploy to Production
Step 1: Get Your API Key and Configure Authentication
Start at console.perplexity.ai. Create an account if you do not have one, then navigate to the API Keys section. Generate a new key, give it a descriptive name, and copy it immediately. You will not see the full key again after leaving the page.
Set the environment variable
macOS / Linux:
export PERPLEXITY_API_KEY="your_api_key_here"
Windows (PowerShell):
$env:PERPLEXITY_API_KEY = "your_api_key_here"
For persistence, add the export to your shell profile (.bashrc, .zshrc) or use a .env file with a library like python-dotenv.
Authentication format
All API requests use Bearer token authentication via the Authorization header:
Authorization: Bearer YOUR_API_KEY
The Sonar API base URL is https://api.perplexity.ai. The chat completions endpoint follows the OpenAI-compatible format at POST /v1/chat/completions.
Verification: Run curl -H "Authorization: Bearer $PERPLEXITY_API_KEY" https://api.perplexity.ai/v1/models to confirm your key is valid. A 401 error means the key is invalid or not set correctly.
Step 2: Send Your First Search-Grounded Request
The Sonar API's chat completions endpoint accepts messages in the same format as OpenAI's API. The key difference: every response includes a citations array with URLs to the web sources that grounded the answer.
cURL example
curl -X POST \
  "https://api.perplexity.ai/v1/chat/completions" \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [
      {
        "role": "system",
        "content": "You are a technical research assistant. Be precise and cite sources."
      },
      {
        "role": "user",
        "content": "What are the current rate limits for the OpenAI API?"
      }
    ]
  }'
Python example (using OpenAI SDK)
Because the Sonar API is OpenAI-compatible, you can use the OpenAI Python SDK by changing the base URL:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai"
)

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest changes to the Python packaging ecosystem?"
        }
    ]
)

print(response.choices[0].message.content)
# Response includes inline citations [1], [2]
# Citations array in response metadata
Understanding citations
The response includes a citations field: an array of URLs that the model referenced when generating its answer. The response text contains numbered references (e.g., [1], [2]) that map to positions in this array. This is the core differentiator: you get verifiable, source-attributed answers without building a separate retrieval pipeline.
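Because the mapping from inline markers to URLs is mechanical, it is easy to post-process. Below is a minimal sketch (the helper is ours, not part of any SDK) that replaces each numbered marker with its source URL, assuming `citations` is the array of URLs described above:

```python
import re

def attach_sources(text: str, citations: list[str]) -> str:
    """Replace numbered markers like [1] with (source: URL), treating the
    citations array as 1-indexed to match the inline references."""
    def repl(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(citations):
            return f" (source: {citations[idx]})"
        return match.group(0)  # leave markers with no matching citation untouched
    return re.sub(r"\[(\d+)\]", repl, text)

# Example with a mock response payload
text = "Rate limits vary by tier[1] and model[2]."
citations = [
    "https://docs.example.com/limits",
    "https://docs.example.com/models",
]
print(attach_sources(text, citations))
```

This is useful when rendering answers in a UI where readers expect clickable attributions rather than bare bracket numbers.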
Verification: You should see a response with numbered citations in the text and a citations array in the response object. If citations are missing, ensure you are using a sonar model, not a third-party model via the Agent API.
Step 3: Choose the Right Sonar Model
According to Perplexity, the Sonar family includes four models, each targeting a different complexity and cost profile.
| Model | API Name | Best For | Input / 1M | Output / 1M |
|---|---|---|---|---|
| Sonar | sonar | Quick factual queries, topic summaries | $1 | $1 |
| Sonar Pro | sonar-pro | Complex multi-source queries, follow-ups | $3 | $15 |
| Sonar Reasoning Pro | sonar-reasoning-pro | Multi-step analysis, chain-of-thought | $2 | $8 |
| Sonar Deep Research | sonar-deep-research | Exhaustive research reports, synthesis | $2 | $8 |
Source: Perplexity API Pricing (May 2026)
Beyond token costs, each model incurs per-request fees based on search context size (Low, Medium, or High). Sonar starts at $5 per 1,000 requests (Low context) and scales to $12 (High context). Sonar Pro ranges from $6 to $14 per 1,000 requests. According to Perplexity, Sonar Pro also supports Pro Search modes (fast, pro, auto) with higher per-request fees ($14-$22 per 1,000 requests).
Decision heuristic: Start with sonar for prototyping and simple lookups. Upgrade to sonar-pro when you need multi-source synthesis or follow-up conversation support. Use sonar-reasoning-pro for analytical tasks requiring step-by-step logic. Reserve sonar-deep-research for comprehensive research reports where thoroughness matters more than latency.
Verification: Test the same prompt across all four models and compare response quality, citation count, and latency. The cost difference between sonar ($1/M input) and sonar-pro ($3/M input, $15/M output) is significant at scale.
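To make that cost difference concrete, here is a back-of-the-envelope estimator. It is a sketch using the prices quoted in this guide (verify current pricing before budgeting), combining per-token charges with the per-request search fee:

```python
def estimate_cost(n_requests: int, in_tokens: int, out_tokens: int,
                  in_price_per_m: float, out_price_per_m: float,
                  fee_per_k_requests: float) -> float:
    """Rough batch cost in USD: token charges plus per-request search fees.
    in_tokens/out_tokens are average tokens per request."""
    token_cost = n_requests * (
        in_tokens * in_price_per_m + out_tokens * out_price_per_m
    ) / 1_000_000
    request_fees = n_requests * fee_per_k_requests / 1_000
    return round(token_cost + request_fees, 4)

# sonar at Low context per the table above: $1/M in, $1/M out, $5 per 1,000 requests
print(estimate_cost(10_000, 500, 300, 1, 1, 5))  # -> 58.0
```

Note how the per-request search fee ($50 here) dwarfs the token cost ($8) at these sizes, which is why the search context size setting in Step 4 matters so much for budgets.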
Step 4: Control Search Behavior With Filters
The Sonar API provides parameters that control how the underlying search operates, which directly affects the quality and relevance of grounded responses.
Domain filtering
Restrict searches to specific domains using search_domain_filter:
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest Kubernetes security best practices?"
        }
    ],
    extra_body={
        "search_domain_filter": [
            "kubernetes.io",
            "cncf.io",
            "github.com"
        ]
    }
)
This ensures citations only come from the specified domains, useful for compliance, medical, legal, or technical documentation use cases.
Recency filtering
Control how recent the search results should be with search_recency_filter:
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "What AI regulations were announced this week?"
        }
    ],
    extra_body={
        "search_recency_filter": "week"
    }
)
Accepted values include hour, day, week, and month. For rapidly evolving topics like AI news, security advisories, or market data, recency filtering prevents stale information from contaminating responses.
Search context size
The search_context_size parameter controls how much search context the model ingests. Options are low, medium, and high. Higher context means more thorough grounding but increases per-request cost. Use low for simple factual lookups and high for research-grade queries.
Related questions and images
Two additional parameters expand what comes back in the response:
- return_related_questions: Returns suggested follow-up queries, useful for building conversational research interfaces
- return_images: Includes relevant images in the response, useful for visual content aggregation
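The search controls in this step can be combined in one request body. The sketch below uses the parameter names as given in this guide; with the OpenAI SDK the extra keys are passed via `extra_body`, and the exact nesting of `search_context_size` may differ in current docs, so treat this as an assumption to verify:

```python
# Hypothetical combined request body for a time-sensitive, source-restricted query.
payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "Summarize this week's nginx security advisories."}
    ],
    "search_domain_filter": ["nginx.org", "nvd.nist.gov"],  # restrict citation sources
    "search_recency_filter": "week",                        # hour | day | week | month
    "search_context_size": "high",                          # low | medium | high
    "return_related_questions": True,
    "return_images": False,
}

# With the OpenAI SDK, everything except model/messages goes in extra_body:
extra = {k: v for k, v in payload.items() if k not in ("model", "messages")}
print(sorted(extra.keys()))
```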
Verification: Run the same query with and without domain filtering. Compare the citations arrays to confirm filtering is working. Restricted queries should only return URLs from the specified domains.
Step 5: Request Structured JSON Output
For applications that need machine-parseable responses, the Sonar API supports structured output via the response_format parameter with JSON Schema.
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare the pricing of the top 3 cloud GPU providers for A100 instances."
        }
    ],
    extra_body={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "gpu_pricing",
                "schema": {
                    "type": "object",
                    "properties": {
                        "providers": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "hourly_rate": {"type": "number"},
                                    "gpu_model": {"type": "string"}
                                },
                                "required": ["name", "hourly_rate", "gpu_model"]
                            }
                        }
                    },
                    "required": ["providers"]
                }
            }
        }
    }
)
According to Perplexity, schema names must be 1-64 alphanumeric characters, and new schemas may experience 10-30 second delays on first use due to preparation. Responses match the specified format unless the output exceeds max_tokens.
One important caveat: according to Perplexity, requesting links as part of a JSON response may not work reliably. For verifiable URLs, use the citations field from the response metadata rather than asking the model to embed URLs in the JSON output.
Verification: Parse the response with json.loads(). Validate against your schema. If parsing fails, check that max_tokens is high enough for the complete JSON output.
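A defensive parse step catches the truncation failure mode described above before it propagates. A minimal sketch (the helper name is ours, not part of any SDK):

```python
import json

def parse_structured(content: str, required_keys: tuple[str, ...]) -> dict:
    """Parse a structured-output response and check required top-level keys.
    Raises ValueError with a hint when the JSON is invalid, which often means
    the output was cut off by max_tokens."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Invalid JSON (possibly truncated by max_tokens): {e}"
        ) from e
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return data

print(parse_structured('{"providers": []}', ("providers",)))
```

For stricter checks, validate the parsed dict against the same JSON Schema you sent in response_format using a library such as jsonschema.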
Step 6: Use the Agent API for Multi-Provider Workflows
The Agent API (POST https://api.perplexity.ai/v1/agent) offers a higher-level interface that supports multiple AI providers, built-in tool use, and multi-step research workflows.
import requests
import os

api_key = os.environ["PERPLEXITY_API_KEY"]

response = requests.post(
    "https://api.perplexity.ai/v1/agent",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "input": "Research the current state of WebAssembly adoption in production environments.",
        "preset": "pro-search",
        "tools": [
            {
                "type": "web_search",
                "config": {
                    "filters": {
                        "domains": [
                            "github.com",
                            "stackoverflow.com",
                            "developer.mozilla.org"
                        ]
                    }
                }
            }
        ],
        "reasoning": {"effort": "high"},
        "max_steps": 5
    }
)
The Agent API supports three presets: fast-search for quick lookups, pro-search for thorough multi-source queries, and deep-research for exhaustive investigations with up to 10 research loop iterations.
Tool costs on the Agent API are separate from model token costs: web_search costs $0.005 per invocation and fetch_url costs $0.0005 per invocation, according to Perplexity's pricing page.
Key Agent API features
- Model fallback chains: The models parameter accepts up to 5 models for automatic failover if your primary model is unavailable
- Reasoning effort: Configurable at low, medium, or high via the reasoning object
- Custom functions: Define your own tools alongside built-in web_search, finance_search, and fetch_url
- Cost tracking: The response usage object includes a cost breakdown in USD covering input, output, cache operations, and tool invocations
Verification: Check the response status field. It should be completed. If it shows failed, inspect the error object for details. Monitor the usage object's cost breakdown to track per-request spend.
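The status and cost checks can be wrapped in a small helper. This is a sketch assuming the response fields named in this guide (status, error, usage.cost); confirm the exact field names against the current Agent API docs:

```python
def summarize_agent_result(body: dict) -> str:
    """Return a one-line summary of an Agent API response body:
    either the failure message or the status plus estimated total cost."""
    status = body.get("status")
    if status == "failed":
        return f"failed: {body.get('error', {}).get('message', 'unknown')}"
    cost = body.get("usage", {}).get("cost", {})
    total = sum(v for v in cost.values() if isinstance(v, (int, float)))
    return f"{status}, est. cost ${total:.4f}"

# Example with a mock response body
print(summarize_agent_result({
    "status": "completed",
    "usage": {"cost": {"input": 0.001, "output": 0.002, "tools": 0.005}},
}))
```

Logging this summary per request gives you a running view of Agent API spend without a separate billing query.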
Step 7: Handle Streaming and Real-Time Output
For chat interfaces or long-form generation, enable streaming to receive tokens as they are generated:
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "Explain the differences between gRPC and REST APIs."
        }
    ],
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
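When you also need the full text after streaming (for logging or caching), accumulate the deltas as they arrive. A small helper sketch that works with the chunk shape shown above:

```python
def collect_stream(chunks) -> str:
    """Join streamed content deltas into the complete response text."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # some chunks carry no content (e.g. role-only deltas)
            parts.append(delta.content)
    return "".join(parts)
```

In practice you would pass the streaming response object from the SDK call directly: `full_text = collect_stream(response)`.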
The Agent API supports streaming via Server-Sent Events (SSE) with typed event discriminators:
- response.output_text.delta for incremental text output
- response.reasoning.search_queries for search activity visibility
- response.reasoning.search_results for incoming search data
- response.completed for final status
Verification: You should see tokens appear incrementally in your terminal. If the stream hangs, check your network connection and ensure your HTTP client supports SSE.
Step 8: Deploy to Production
Moving from prototype to production requires attention to rate limits, error handling, and cost control.
Rate limits by tier
According to Perplexity, accounts progress through six tiers based on cumulative spending:
| Tier | Spend Threshold | QPS | Requests/Min |
|---|---|---|---|
| 0 | $0 | 1 | 50 |
| 1 | $50+ | 3 | 150 |
| 2 | $250+ | 8 | 500 |
| 3 | $500+ | 17 | 1,000 |
| 4 | $1,000+ | 33 | 2,000 |
| 5 | $5,000+ | 33 | 2,000 |
Source: Perplexity Rate Limits (May 2026)
Sonar Deep Research has significantly lower limits: 5 RPM at Tier 0, scaling to 100 RPM at Tier 5. The rate limiting system uses a leaky bucket algorithm that permits burst capacity while enforcing sustained rate control. Plan your architecture accordingly if deep research is a core workflow.
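On the client side you can mirror that behavior with a token-bucket limiter, so short bursts are allowed but sustained throughput stays within your tier's QPS. A self-contained sketch (not Perplexity code):

```python
import time

class TokenBucket:
    """Client-side rate limiter: permits bursts up to `burst` requests,
    then enforces a sustained `rate_per_sec` by sleeping."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until a request slot is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

# e.g. Tier 1 allows 3 QPS: call bucket.acquire() before each API request
bucket = TokenBucket(rate_per_sec=3, burst=3)
```

Calling `bucket.acquire()` before each API request keeps you below the server-side limit instead of reacting to 429s after the fact.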
Error handling
import time

def query_perplexity(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="sonar",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except Exception as e:
            if "429" in str(e):
                time.sleep(2 ** attempt)  # exponential backoff on rate limits
                continue
            elif "401" in str(e):
                raise ValueError("Invalid API key")
            else:
                raise
    raise RuntimeError("Max retries exceeded")
Cost optimization checklist
- Use sonar for simple lookups ($1/M tokens input) and reserve sonar-pro ($3/M input, $15/M output) for complex queries
- Set search_context_size to low when full grounding is not critical to reduce per-request fees
- Set max_tokens to the minimum needed for your use case
- Cache responses for repeated queries to avoid redundant search costs
- Monitor usage via the Perplexity console dashboard
- Use the Agent API's cost field in response usage to track per-request spend
Verification: Deploy to a staging environment and run 50-100 test requests. Monitor response times, error rates, and per-request costs. Set up alerts for 429 (rate limit) and 500 (server error) responses.
Troubleshooting and FAQ
My API key returns 401. What should I check? Confirm the key is set: echo $PERPLEXITY_API_KEY (macOS/Linux) or $env:PERPLEXITY_API_KEY (PowerShell). Generate a new key at console.perplexity.ai if needed. Keys are shown only once at creation time.
Why are citations missing from my responses? Make sure you are using a Sonar model (sonar, sonar-pro, sonar-reasoning-pro, sonar-deep-research). Third-party models accessed through the Agent API do not return search-grounded citations. Also verify your request targets /v1/chat/completions, not the Agent endpoint.
Why does structured JSON output fail or come back truncated? Check max_tokens. According to Perplexity, responses match the specified schema unless output exceeds the token limit. Also note that new schemas may take 10-30 seconds on first use due to preparation. Add format hints to your prompt for better adherence.
How do I avoid stale search results? Use search_recency_filter with day or week to restrict results to recent sources. Without this parameter, the model may surface older but higher-authority sources. For time-sensitive queries, combine recency filtering with domain filtering for maximum relevance.
Which model should I choose? Use sonar for cost-sensitive prototyping at $1/M tokens, sonar-pro for production-quality multi-source synthesis, sonar-reasoning-pro for analytical tasks needing chain-of-thought, and sonar-deep-research for comprehensive reports.
Can I use the OpenAI SDK? Yes. Point the base URL at https://api.perplexity.ai and use your Perplexity API key. The Sonar chat completions endpoint is OpenAI-compatible. Both the Python and JavaScript OpenAI SDKs work with minimal code changes.
Next Step
Build a research automation pipeline. Pick a domain your team monitors (security advisories, regulatory changes, competitor product launches), wire up sonar-pro with domain filtering and recency controls, and output structured JSON to a database or notification system. This exercise demonstrates the full value of search-grounded AI: automated, source-attributed intelligence gathering that would otherwise require manual research.
AI API providers process your inputs on remote servers. Review Perplexity's privacy policy at perplexity.ai/privacy before sending sensitive data through the API. Enterprise customers should evaluate data processing agreements and retention policies.
Search-grounded responses mean your queries may trigger web searches. Consider the privacy implications of query content being used in search operations.
See the NIST AI Risk Management Framework for organizational AI governance guidance.
Under GDPR and CCPA, you have rights to access, correct, and delete your data. Perplexity AI is headquartered in San Francisco, California.
TechJacks Solutions maintains editorial independence. This article was not sponsored or reviewed by Perplexity AI. TechJacks Solutions may earn referral fees from links to vendor products. These fees never influence editorial recommendations.
See our EU AI Act coverage for regulatory context.