Perplexity API & Developer Guide: Search-Grounded AI From Setup to Production (2026)
Last verified: May 7, 2026 · Format: Guide · Est. time: 20-25 min
Perplexity's API gives you programmatic access to something no other major AI provider offers natively: search-grounded responses with inline citations. Every answer the Sonar models return comes with URLs pointing to the web sources that informed the response. If you have built applications that chain an LLM call with a separate search API, then parse and inject results into context, Perplexity collapses that entire pipeline into a single API call.
The Sonar model family spans four tiers: a lightweight model for quick factual lookups, a pro model for complex multi-source queries, a reasoning model with chain-of-thought for analytical tasks, and a deep research model that conducts exhaustive multi-step searches. This guide walks through every step from API key generation to production deployment, covering search-grounded completions, structured JSON output, domain filtering, and the cost and rate limit decisions that determine whether your integration runs efficiently at scale.
What You Need Before Starting
Perplexity's Sonar API is accessed through their developer console at console.perplexity.ai. You will need an account, an API key, and either the OpenAI SDK (which works out of the box) or direct REST calls. The Sonar API follows the OpenAI chat completions format with Bearer token authentication at the base URL https://api.perplexity.ai.
Prerequisites: a Perplexity account, an API key, and the OpenAI Python SDK (pip install openai) for SDK compatibility.
- ✓Step 1: Get API Key & Authenticate
- ✓Step 2: First Search-Grounded Request
- ✓Step 3: Choose the Right Model
- ✓Step 4: Search Filters & Domain Control
- ✓Step 5: Structured JSON Output
- ✓Step 6: Agent API
- ✓Step 7: Streaming
- ✓Step 8: Deploy to Production
Step 1: Get Your API Key and Configure Authentication
Start at console.perplexity.ai. Create an account if you do not have one, then navigate to the API Keys section. Generate a new key, give it a descriptive name, and copy it immediately. You will not see the full key again after leaving the page.
Set the environment variable
macOS / Linux:
export PERPLEXITY_API_KEY="your_api_key_here"
Windows (PowerShell):
$env:PERPLEXITY_API_KEY = "your_api_key_here"
For persistence, add the export to your shell profile (.bashrc, .zshrc) or use a .env file with a library like python-dotenv.
Authentication format
All API requests use Bearer token authentication via the Authorization header:
Authorization: Bearer YOUR_API_KEY
The Sonar API base URL is https://api.perplexity.ai. The chat completions endpoint follows the OpenAI-compatible format at POST /v1/chat/completions.
Verification: Run curl -H "Authorization: Bearer $PERPLEXITY_API_KEY" https://api.perplexity.ai/v1/models to confirm your key is valid. A 401 error means the key is invalid or not set correctly.
Step 2: Send Your First Search-Grounded Request
The Sonar API's chat completions endpoint accepts messages in the same format as OpenAI's API. The key difference: every response includes a citations array with URLs to the web sources that grounded the answer.
cURL example
curl -X POST \
  "https://api.perplexity.ai/v1/chat/completions" \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [
      {
        "role": "system",
        "content": "You are a technical research assistant. Be precise and cite sources."
      },
      {
        "role": "user",
        "content": "What are the current rate limits for the OpenAI API?"
      }
    ]
  }'
Python example (using OpenAI SDK)
Because the Sonar API is OpenAI-compatible, you can use the OpenAI Python SDK by changing the base URL:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai"
)

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest changes to the Python packaging ecosystem?"
        }
    ]
)

print(response.choices[0].message.content)
# Response includes inline citations [1], [2]
# Citations array in response metadata
Understanding citations
The response includes a citations field: an array of URLs that the model referenced when generating its answer. The response text contains numbered references (e.g., [1], [2]) that map to positions in this array. This is the core differentiator: you get verifiable, source-attributed answers without building a separate retrieval pipeline.
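Because the mapping from inline markers to URLs is mechanical, it is easy to post-process. Below is a minimal sketch (the helper is ours, not part of any SDK) that replaces each numbered marker with its source URL, assuming `citations` is the array of URLs described above:

```python
import re

def attach_sources(text: str, citations: list[str]) -> str:
    """Replace numbered markers like [1] with (source: URL), treating the
    citations array as 1-indexed to match the inline references."""
    def repl(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(citations):
            return f" (source: {citations[idx]})"
        return match.group(0)  # leave markers with no matching citation untouched
    return re.sub(r"\[(\d+)\]", repl, text)

# Example with a mock response payload
text = "Rate limits vary by tier[1] and model[2]."
citations = [
    "https://docs.example.com/limits",
    "https://docs.example.com/models",
]
print(attach_sources(text, citations))
```

This is useful when rendering answers in a UI where readers expect clickable attributions rather than bare bracket numbers.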
Verification: You should see a response with numbered citations in the text and a citations array in the response object. If citations are missing, ensure you are using a sonar model, not a third-party model via the Agent API.
Step 3: Choose the Right Sonar Model
According to Perplexity, the Sonar family includes four models, each targeting a different complexity and cost profile.
| Model | API Name | Best For | Input / 1M | Output / 1M |
|---|---|---|---|---|
| Sonar | sonar | Quick factual queries, topic summaries | $1 | $1 |
| Sonar Pro | sonar-pro | Complex multi-source queries, follow-ups | $3 | $15 |
| Sonar Reasoning Pro | sonar-reasoning-pro | Multi-step analysis, chain-of-thought | $2 | $8 |
| Sonar Deep Research | sonar-deep-research | Exhaustive research reports, synthesis | $2 | $8 |
Source: Perplexity API Pricing (May 2026)
Beyond token costs, each model incurs per-request fees based on search context size (Low, Medium, or High). Sonar starts at $5 per 1,000 requests (Low context) and scales to $12 (High context). Sonar Pro ranges from $6 to $14 per 1,000 requests. According to Perplexity, Sonar Pro also supports Pro Search modes (fast, pro, auto) with higher per-request fees ($14-$22 per 1,000 requests).
Decision heuristic: Start with sonar for prototyping and simple lookups. Upgrade to sonar-pro when you need multi-source synthesis or follow-up conversation support. Use sonar-reasoning-pro for analytical tasks requiring step-by-step logic. Reserve sonar-deep-research for comprehensive research reports where thoroughness matters more than latency.
Verification: Test the same prompt across all four models and compare response quality, citation count, and latency. The cost difference between sonar ($1/M input) and sonar-pro ($3/M input, $15/M output) is significant at scale.
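To make that cost difference concrete, here is a back-of-the-envelope estimator. It is a sketch using the prices quoted in this guide (verify current pricing before budgeting), combining per-token charges with the per-request search fee:

```python
def estimate_cost(n_requests: int, in_tokens: int, out_tokens: int,
                  in_price_per_m: float, out_price_per_m: float,
                  fee_per_k_requests: float) -> float:
    """Rough batch cost in USD: token charges plus per-request search fees.
    in_tokens/out_tokens are average tokens per request."""
    token_cost = n_requests * (
        in_tokens * in_price_per_m + out_tokens * out_price_per_m
    ) / 1_000_000
    request_fees = n_requests * fee_per_k_requests / 1_000
    return round(token_cost + request_fees, 4)

# sonar at Low context per the table above: $1/M in, $1/M out, $5 per 1,000 requests
print(estimate_cost(10_000, 500, 300, 1, 1, 5))  # -> 58.0
```

Note how the per-request search fee ($50 here) dwarfs the token cost ($8) at these sizes, which is why the search context size setting in Step 4 matters so much for budgets.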
Step 4: Control Search Behavior With Filters
The Sonar API provides parameters that control how the underlying search operates, which directly affects the quality and relevance of grounded responses.
Domain filtering
Restrict searches to specific domains using search_domain_filter:
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest Kubernetes security best practices?"
        }
    ],
    extra_body={
        "search_domain_filter": [
            "kubernetes.io",
            "cncf.io",
            "github.com"
        ]
    }
)
This ensures citations only come from the specified domains, useful for compliance, medical, legal, or technical documentation use cases.
Recency filtering
Control how recent the search results should be with search_recency_filter:
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "What AI regulations were announced this week?"
        }
    ],
    extra_body={
        "search_recency_filter": "week"
    }
)
Accepted values include hour, day, week, and month. For rapidly evolving topics like AI news, security advisories, or market data, recency filtering prevents stale information from contaminating responses.
Search context size
The search_context_size parameter controls how much search context the model ingests. Options are low, medium, and high. Higher context means more thorough grounding but increases per-request cost. Use low for simple factual lookups and high for research-grade queries.
Related questions and images
Two additional parameters expand what comes back in the response:
- return_related_questions: Returns suggested follow-up queries, useful for building conversational research interfaces
- return_images: Includes relevant images in the response, useful for visual content aggregation
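The search controls in this step can be combined in one request body. The sketch below uses the parameter names as given in this guide; with the OpenAI SDK the extra keys are passed via `extra_body`, and the exact nesting of `search_context_size` may differ in current docs, so treat this as an assumption to verify:

```python
# Hypothetical combined request body for a time-sensitive, source-restricted query.
payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "Summarize this week's nginx security advisories."}
    ],
    "search_domain_filter": ["nginx.org", "nvd.nist.gov"],  # restrict citation sources
    "search_recency_filter": "week",                        # hour | day | week | month
    "search_context_size": "high",                          # low | medium | high
    "return_related_questions": True,
    "return_images": False,
}

# With the OpenAI SDK, everything except model/messages goes in extra_body:
extra = {k: v for k, v in payload.items() if k not in ("model", "messages")}
print(sorted(extra.keys()))
```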
Verification: Run the same query with and without domain filtering. Compare the citations arrays to confirm filtering is working. Restricted queries should only return URLs from the specified domains.
Step 5: Request Structured JSON Output
For applications that need machine-parseable responses, the Sonar API supports structured output via the response_format parameter with JSON Schema.
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare the pricing of the top 3 cloud GPU providers for A100 instances."
        }
    ],
    extra_body={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "gpu_pricing",
                "schema": {
                    "type": "object",
                    "properties": {
                        "providers": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "hourly_rate": {"type": "number"},
                                    "gpu_model": {"type": "string"}
                                },
                                "required": ["name", "hourly_rate", "gpu_model"]
                            }
                        }
                    },
                    "required": ["providers"]
                }
            }
        }
    }
)
According to Perplexity, schema names must be 1-64 alphanumeric characters, and new schemas may experience 10-30 second delays on first use due to preparation. Responses match the specified format unless the output exceeds max_tokens.
One important caveat: according to Perplexity, requesting links as part of a JSON response may not work reliably. For verifiable URLs, use the citations field from the response metadata rather than asking the model to embed URLs in the JSON output.
Verification: Parse the response with json.loads(). Validate against your schema. If parsing fails, check that max_tokens is high enough for the complete JSON output.
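A defensive parse step catches the truncation failure mode described above before it propagates. A minimal sketch (the helper name is ours, not part of any SDK):

```python
import json

def parse_structured(content: str, required_keys: tuple[str, ...]) -> dict:
    """Parse a structured-output response and check required top-level keys.
    Raises ValueError with a hint when the JSON is invalid, which often means
    the output was cut off by max_tokens."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Invalid JSON (possibly truncated by max_tokens): {e}"
        ) from e
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return data

print(parse_structured('{"providers": []}', ("providers",)))
```

For stricter checks, validate the parsed dict against the same JSON Schema you sent in response_format using a library such as jsonschema.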
Step 6: Use the Agent API for Multi-Provider Workflows
The Agent API (POST https://api.perplexity.ai/v1/agent) offers a higher-level interface that supports multiple AI providers, built-in tool use, and multi-step research workflows.
import requests
import os

api_key = os.environ["PERPLEXITY_API_KEY"]

response = requests.post(
    "https://api.perplexity.ai/v1/agent",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "input": "Research the current state of WebAssembly adoption in production environments.",
        "preset": "pro-search",
        "tools": [
            {
                "type": "web_search",
                "config": {
                    "filters": {
                        "domains": [
                            "github.com",
                            "stackoverflow.com",
                            "developer.mozilla.org"
                        ]
                    }
                }
            }
        ],
        "reasoning": {"effort": "high"},
        "max_steps": 5
    }
)
The Agent API supports three presets: fast-search for quick lookups, pro-search for thorough multi-source queries, and deep-research for exhaustive investigations with up to 10 research loop iterations.
Tool costs on the Agent API are separate from model token costs: web_search costs $0.005 per invocation and fetch_url costs $0.0005 per invocation, according to Perplexity's pricing page.
Key Agent API features
- Model fallback chains: The models parameter accepts up to 5 models for automatic failover if your primary model is unavailable
- Reasoning effort: Configurable at low, medium, or high via the reasoning object
- Custom functions: Define your own tools alongside built-in web_search, finance_search, and fetch_url
- Cost tracking: The response usage object includes a cost breakdown in USD covering input, output, cache operations, and tool invocations
Verification: Check the response status field. It should be completed. If it shows failed, inspect the error object for details. Monitor the usage object's cost breakdown to track per-request spend.
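The status and cost checks can be wrapped in a small helper. This is a sketch assuming the response fields named in this guide (status, error, usage.cost); confirm the exact field names against the current Agent API docs:

```python
def summarize_agent_result(body: dict) -> str:
    """Return a one-line summary of an Agent API response body:
    either the failure message or the status plus estimated total cost."""
    status = body.get("status")
    if status == "failed":
        return f"failed: {body.get('error', {}).get('message', 'unknown')}"
    cost = body.get("usage", {}).get("cost", {})
    total = sum(v for v in cost.values() if isinstance(v, (int, float)))
    return f"{status}, est. cost ${total:.4f}"

# Example with a mock response body
print(summarize_agent_result({
    "status": "completed",
    "usage": {"cost": {"input": 0.001, "output": 0.002, "tools": 0.005}},
}))
```

Logging this summary per request gives you a running view of Agent API spend without a separate billing query.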
Step 7: Handle Streaming and Real-Time Output
For chat interfaces or long-form generation, enable streaming to receive tokens as they are generated:
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "Explain the differences between gRPC and REST APIs."
        }
    ],
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
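When you also need the full text after streaming (for logging or caching), accumulate the deltas as they arrive. A small helper sketch that works with the chunk shape shown above:

```python
def collect_stream(chunks) -> str:
    """Join streamed content deltas into the complete response text."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # some chunks carry no content (e.g. role-only deltas)
            parts.append(delta.content)
    return "".join(parts)
```

In practice you would pass the streaming response object from the SDK call directly: `full_text = collect_stream(response)`.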
The Agent API supports streaming via Server-Sent Events (SSE) with typed event discriminators:
- response.output_text.delta for incremental text output
- response.reasoning.search_queries for search activity visibility
- response.reasoning.search_results for incoming search data
- response.completed for final status
Verification: You should see tokens appear incrementally in your terminal. If the stream hangs, check your network connection and ensure your HTTP client supports SSE.
Step 8: Deploy to Production
Moving from prototype to production requires attention to rate limits, error handling, and cost control.
Rate limits by tier
According to Perplexity, accounts progress through six tiers based on cumulative spending:
| Tier | Spend Threshold | QPS | Requests/Min |
|---|---|---|---|
| 0 | $0 | 1 | 50 |
| 1 | $50+ | 3 | 150 |
| 2 | $250+ | 8 | 500 |
| 3 | $500+ | 17 | 1,000 |
| 4 | $1,000+ | 33 | 2,000 |
| 5 | $5,000+ | 33 | 2,000 |
Source: Perplexity Rate Limits (May 2026)
Sonar Deep Research has significantly lower limits: 5 RPM at Tier 0, scaling to 100 RPM at Tier 5. The rate limiting system uses a leaky bucket algorithm that permits burst capacity while enforcing sustained rate control. Plan your architecture accordingly if deep research is a core workflow.
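On the client side you can mirror that behavior with a token-bucket limiter, so short bursts are allowed but sustained throughput stays within your tier's QPS. A self-contained sketch (not Perplexity code):

```python
import time

class TokenBucket:
    """Client-side rate limiter: permits bursts up to `burst` requests,
    then enforces a sustained `rate_per_sec` by sleeping."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until a request slot is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

# e.g. Tier 1 allows 3 QPS: call bucket.acquire() before each API request
bucket = TokenBucket(rate_per_sec=3, burst=3)
```

Calling `bucket.acquire()` before each API request keeps you below the server-side limit instead of reacting to 429s after the fact.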
Error handling
import time

def query_perplexity(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="sonar",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except Exception as e:
            if "429" in str(e):
                time.sleep(2 ** attempt)  # exponential backoff on rate limits
                continue
            elif "401" in str(e):
                raise ValueError("Invalid API key")
            else:
                raise
    raise RuntimeError("Max retries exceeded")
Cost optimization checklist
- Use sonar for simple lookups ($1/M tokens input) and reserve sonar-pro ($3/M input, $15/M output) for complex queries
- Set search_context_size to low when full grounding is not critical to reduce per-request fees
- Set max_tokens to the minimum needed for your use case
- Cache responses for repeated queries to avoid redundant search costs
- Monitor usage via the Perplexity console dashboard
- Use the Agent API's cost field in response usage to track per-request spend
Verification: Deploy to a staging environment and run 50-100 test requests. Monitor response times, error rates, and per-request costs. Set up alerts for 429 (rate limit) and 500 (server error) responses.
Troubleshooting and FAQ
My API key returns 401. What should I check? Confirm the key is set: echo $PERPLEXITY_API_KEY (macOS/Linux) or $env:PERPLEXITY_API_KEY (PowerShell). Generate a new key at console.perplexity.ai if needed. Keys are shown only once at creation time.
Why are citations missing from my responses? Make sure you are using a Sonar model (sonar, sonar-pro, sonar-reasoning-pro, sonar-deep-research). Third-party models accessed through the Agent API do not return search-grounded citations. Also verify your request targets /v1/chat/completions, not the Agent endpoint.
Why does structured JSON output fail or come back truncated? Check max_tokens. According to Perplexity, responses match the specified schema unless output exceeds the token limit. Also note that new schemas may take 10-30 seconds on first use due to preparation. Add format hints to your prompt for better adherence.
How do I avoid stale search results? Use search_recency_filter with day or week to restrict results to recent sources. Without this parameter, the model may surface older but higher-authority sources. For time-sensitive queries, combine recency filtering with domain filtering for maximum relevance.
Which model should I choose? Use sonar for cost-sensitive prototyping at $1/M tokens, sonar-pro for production-quality multi-source synthesis, sonar-reasoning-pro for analytical tasks needing chain-of-thought, and sonar-deep-research for comprehensive reports.
Can I use the OpenAI SDK? Yes. Point the base URL at https://api.perplexity.ai and use your Perplexity API key. The Sonar chat completions endpoint is OpenAI-compatible. Both the Python and JavaScript OpenAI SDKs work with minimal code changes.
Next Step
Build a research automation pipeline. Pick a domain your team monitors (security advisories, regulatory changes, competitor product launches), wire up sonar-pro with domain filtering and recency controls, and output structured JSON to a database or notification system. This exercise demonstrates the full value of search-grounded AI: automated, source-attributed intelligence gathering that would otherwise require manual research.
AI API providers process your inputs on remote servers. Review Perplexity's privacy policy at perplexity.ai/privacy before sending sensitive data through the API. Enterprise customers should evaluate data processing agreements and retention policies.
Search-grounded responses mean your queries may trigger web searches. Consider the privacy implications of query content being used in search operations.
See the NIST AI Risk Management Framework for organizational AI governance guidance.
Under GDPR and CCPA, you have rights to access, correct, and delete your data. Perplexity AI is headquartered in San Francisco, California.
TechJacks Solutions maintains editorial independence. This article was not sponsored or reviewed by Perplexity AI. TechJacks Solutions may earn referral fees from links to vendor products. These fees never influence editorial recommendations.
See our EU AI Act coverage for regulatory context.