
Perplexity API & Developer Guide: Search-Grounded AI From Setup to Production (2026)

Last verified: May 7, 2026  ·  Format: Guide  ·  Est. time: 20-25 min

Perplexity's API gives you programmatic access to something no other major AI provider offers natively: search-grounded responses with inline citations. Every answer the Sonar models return comes with URLs pointing to the web sources that informed the response. If you have built applications that chain an LLM call with a separate search API, then parse and inject results into context, Perplexity collapses that entire pipeline into a single API call.

The Sonar model family spans four tiers: a lightweight model for quick factual lookups, a pro model for complex multi-source queries, a reasoning model with chain-of-thought for analytical tasks, and a deep research model that conducts exhaustive multi-step searches. This guide walks through every step from API key generation to production deployment, covering search-grounded completions, structured JSON output, domain filtering, and the cost and rate limit decisions that determine whether your integration runs efficiently at scale.

4 Models
Sonar family spanning search, reasoning, and deep research tiers
Source: Perplexity Model Cards (May 2026)
$1/M
Input token pricing on base Sonar, lowest entry for search-grounded AI
Source: Perplexity API Pricing (May 2026)
6 Tiers
Usage tiers from free (1 QPS) to enterprise (33 QPS, 2,000 RPM)
Source: Perplexity Rate Limits (May 2026)
OpenAI SDK
Compatible: change the base URL and API key, keep your existing code
Source: Perplexity API Docs (May 2026)

What You Need Before Starting

Perplexity's Sonar API is accessed through their developer console at console.perplexity.ai. You will need an account, an API key, and either the OpenAI SDK (which works out of the box) or direct REST calls. The Sonar API follows the OpenAI chat completions format with Bearer token authentication at the base URL https://api.perplexity.ai.

Prerequisites Checklist
A Perplexity account at console.perplexity.ai with an API key generated
Python 3.9+ or Node.js 18+ installed on your machine
OpenAI Python SDK installed (pip install openai) for SDK compatibility
A terminal with environment variable support
Basic familiarity with REST APIs and JSON
Optional: billing method attached for higher rate limit tiers
Guide Progress
  • Step 1: Get API Key & Authenticate
  • Step 2: First Search-Grounded Request
  • Step 3: Choose the Right Model
  • Step 4: Search Filters & Domain Control
  • Step 5: Structured JSON Output
  • Step 6: Agent API
  • Step 7: Streaming
  • Step 8: Deploy to Production

Step 1: Get Your API Key and Configure Authentication

Start at console.perplexity.ai. Create an account if you do not have one, then navigate to the API Keys section. Generate a new key, give it a descriptive name, and copy it immediately. You will not see the full key again after leaving the page.

Set the environment variable

macOS / Linux:

export PERPLEXITY_API_KEY="your_api_key_here"

Windows (PowerShell):

$env:PERPLEXITY_API_KEY = "your_api_key_here"

For persistence, add the export to your shell profile (.bashrc, .zshrc) or use a .env file with a library like python-dotenv.

Authentication format

All API requests use Bearer token authentication via the Authorization header:

Authorization: Bearer YOUR_API_KEY

The Sonar API base URL is https://api.perplexity.ai. The chat completions endpoint follows the OpenAI-compatible format at POST /v1/chat/completions.

Verification: Run curl -H "Authorization: Bearer $PERPLEXITY_API_KEY" https://api.perplexity.ai/v1/models to confirm your key is valid. A 401 error means the key is invalid or not set correctly.

Step 2: Send Your First Search-Grounded Request

The Sonar API's chat completions endpoint accepts messages in the same format as OpenAI's API. The key difference: every response includes a citations array with URLs to the web sources that grounded the answer.

cURL example

curl -X POST \
  "https://api.perplexity.ai/v1/chat/completions" \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [
      {
        "role": "system",
        "content": "You are a technical research assistant. Be precise and cite sources."
      },
      {
        "role": "user",
        "content": "What are the current rate limits for the OpenAI API?"
      }
    ]
  }'

Python example (using OpenAI SDK)

Because the Sonar API is OpenAI-compatible, you can use the OpenAI Python SDK by changing the base URL:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai"
)

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest changes "
                       "to the Python packaging "
                       "ecosystem?"
        }
    ]
)

print(response.choices[0].message.content)
# Response includes inline citations [1], [2]
# Citations array in response metadata

Understanding citations

The response includes a citations field: an array of URLs that the model referenced when generating its answer. The response text contains numbered references (e.g., [1], [2]) that map to positions in this array. This is the core differentiator: you get verifiable, source-attributed answers without building a separate retrieval pipeline.
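As a sketch of how you might render those numbered markers as clickable sources, the helper below replaces each [n] with a markdown link to the n-th citation URL. The `attach_citations` function is our own illustration, not part of any SDK; it assumes `citations` is a plain list of URL strings, as described above.

```python
import re

def attach_citations(text, citations):
    """Replace [n] markers with markdown links to the n-th citation URL.

    Markers are 1-indexed into the citations list; indices with no
    matching citation are left untouched.
    """
    def link(match):
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(citations):
            return f"[{match.group(1)}]({citations[idx]})"
        return match.group(0)
    return re.sub(r"\[(\d+)\]", link, text)

answer = "Rate limits vary by tier [1] and model [2]."
sources = ["https://example.com/limits", "https://example.com/models"]
print(attach_citations(answer, sources))
```

This keeps the model's inline numbering intact while making each reference verifiable with one click.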

Verification: You should see a response with numbered citations in the text and a citations array in the response object. If citations are missing, ensure you are using a sonar model, not a third-party model via the Agent API.

Step 3: Choose the Right Sonar Model

According to Perplexity, the Sonar family includes four models, each targeting a different complexity and cost profile.

Model               | API Name            | Best For                                 | Input / 1M | Output / 1M
Sonar               | sonar               | Quick factual queries, topic summaries   | $1         | $1
Sonar Pro           | sonar-pro           | Complex multi-source queries, follow-ups | $3         | $15
Sonar Reasoning Pro | sonar-reasoning-pro | Multi-step analysis, chain-of-thought    | $2         | $8
Sonar Deep Research | sonar-deep-research | Exhaustive research reports, synthesis   | $2         | $8

Source: Perplexity API Pricing (May 2026)

Beyond token costs, each model incurs per-request fees based on search context size (Low, Medium, or High). Sonar starts at $5 per 1,000 requests (Low context) and scales to $12 (High context). Sonar Pro ranges from $6 to $14 per 1,000 requests. According to Perplexity, Sonar Pro also supports Pro Search modes (fast, pro, auto) with higher per-request fees ($14-$22 per 1,000 requests).

Decision heuristic: Start with sonar for prototyping and simple lookups. Upgrade to sonar-pro when you need multi-source synthesis or follow-up conversation support. Use sonar-reasoning-pro for analytical tasks requiring step-by-step logic. Reserve sonar-deep-research for comprehensive research reports where thoroughness matters more than latency.
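One way to encode that heuristic in code is a simple lookup table. The task-category labels below are our own invention for illustration; only the model names come from the table above.

```python
# Map task categories to Sonar model names, following the decision
# heuristic above. Category labels are illustrative, not an API concept.
MODEL_FOR_TASK = {
    "lookup": "sonar",                     # quick factual queries
    "synthesis": "sonar-pro",              # multi-source, follow-ups
    "analysis": "sonar-reasoning-pro",     # step-by-step reasoning
    "report": "sonar-deep-research",       # exhaustive research
}

def pick_model(task: str) -> str:
    """Fall back to the cheapest model for unrecognized tasks."""
    return MODEL_FOR_TASK.get(task, "sonar")

print(pick_model("analysis"))  # sonar-reasoning-pro
```

Centralizing the choice in one function makes it easy to audit model spend later: every call site routes through `pick_model`.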

Verification: Test the same prompt across all four models and compare response quality, citation count, and latency. The cost difference between sonar ($1/M input) and sonar-pro ($3/M input, $15/M output) is significant at scale.

Step 4: Control Search Behavior With Filters

The Sonar API provides parameters that control how the underlying search operates, which directly affects the quality and relevance of grounded responses.

Domain filtering

Restrict searches to specific domains using search_domain_filter:

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "What are the latest "
                       "Kubernetes security "
                       "best practices?"
        }
    ],
    extra_body={
        "search_domain_filter": [
            "kubernetes.io",
            "cncf.io",
            "github.com"
        ]
    }
)

This ensures citations only come from the specified domains, useful for compliance, medical, legal, or technical documentation use cases.

Recency filtering

Control how recent the search results should be with search_recency_filter:

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "What AI regulations were "
                       "announced this week?"
        }
    ],
    extra_body={
        "search_recency_filter": "week"
    }
)

Accepted values include hour, day, week, and month. For rapidly evolving topics like AI news, security advisories, or market data, recency filtering prevents stale information from contaminating responses.

Search context size

The search_context_size parameter controls how much search context the model ingests. Options are low, medium, and high. Higher context means more thorough grounding but increases per-request cost. Use low for simple factual lookups and high for research-grade queries.
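A minimal sketch of assembling a request body with these search controls, assuming (as with the filters shown earlier) that they sit alongside the standard fields in the request. The `build_payload` helper is our own convenience wrapper, not an SDK function; consult the API reference for exactly where each parameter nests.

```python
def build_payload(prompt, model="sonar", context_size="low",
                  recency=None, domains=None):
    """Assemble a chat-completions request body with optional search
    controls. Only filters that are actually set are included."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "search_context_size": context_size,
    }
    if recency:
        payload["search_recency_filter"] = recency
    if domains:
        payload["search_domain_filter"] = domains
    return payload

body = build_payload(
    "Summarize this week's Kubernetes CVE advisories",
    model="sonar-pro", context_size="high", recency="week",
    domains=["kubernetes.io", "nvd.nist.gov"],
)
```

With the OpenAI SDK, the non-standard keys in `body` would be passed via `extra_body`, as in the earlier examples.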

Related questions and images

Two additional parameters expand what comes back in the response:

  • return_related_questions: Returns suggested follow-up queries, useful for building conversational research interfaces
  • return_images: Includes relevant images in the response, useful for visual content aggregation

Verification: Run the same query with and without domain filtering. Compare the citations arrays to confirm filtering is working. Restricted queries should only return URLs from the specified domains.
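That check can be automated with a small helper. This is our own verification sketch, not an SDK feature; it assumes the citations array contains plain URL strings.

```python
from urllib.parse import urlparse

def citations_within(citations, allowed_domains):
    """Return True if every citation URL's host is one of the allowed
    domains or a subdomain of one (e.g. blog.cncf.io under cncf.io)."""
    for url in citations:
        host = urlparse(url).netloc.lower()
        if not any(host == d or host.endswith("." + d)
                   for d in allowed_domains):
            return False
    return True

allowed = ["kubernetes.io", "cncf.io", "github.com"]
print(citations_within(
    ["https://kubernetes.io/docs/", "https://blog.cncf.io/post"],
    allowed,
))  # True
```

Running this against filtered and unfiltered responses makes the effect of `search_domain_filter` immediately visible in tests.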

Step 5: Request Structured JSON Output

For applications that need machine-parseable responses, the Sonar API supports structured output via the response_format parameter with JSON Schema.

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare the pricing of "
                       "the top 3 cloud GPU "
                       "providers for A100 "
                       "instances."
        }
    ],
    extra_body={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "gpu_pricing",
                "schema": {
                    "type": "object",
                    "properties": {
                        "providers": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {
                                        "type": "string"
                                    },
                                    "hourly_rate": {
                                        "type": "number"
                                    },
                                    "gpu_model": {
                                        "type": "string"
                                    }
                                },
                                "required": [
                                    "name",
                                    "hourly_rate",
                                    "gpu_model"
                                ]
                            }
                        }
                    },
                    "required": ["providers"]
                }
            }
        }
    }
)

According to Perplexity, schema names must be 1-64 alphanumeric characters, and new schemas may experience 10-30 second delays on first use due to preparation. Responses match the specified format unless the output exceeds max_tokens.

One important caveat: according to Perplexity, requesting links as part of a JSON response may not work reliably. For verifiable URLs, use the citations field from the response metadata rather than asking the model to embed URLs in the JSON output.

Verification: Parse the response with json.loads(). Validate against your schema. If parsing fails, check that max_tokens is high enough for the complete JSON output.
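The parsing step can be sketched as follows, using a canned response string in place of a live API call. The `parse_gpu_pricing` helper and the sample provider name are ours, invented for illustration; the required keys mirror the schema declared in the request above.

```python
import json

def parse_gpu_pricing(raw):
    """Parse the structured response and verify the required keys,
    mirroring the JSON Schema declared in the request."""
    data = json.loads(raw)
    assert "providers" in data, "missing top-level 'providers'"
    for provider in data["providers"]:
        for key in ("name", "hourly_rate", "gpu_model"):
            assert key in provider, f"provider missing '{key}'"
    return data

sample = ('{"providers": [{"name": "ExampleCloud", '
          '"hourly_rate": 1.99, "gpu_model": "A100"}]}')
parsed = parse_gpu_pricing(sample)
print(parsed["providers"][0]["name"])  # ExampleCloud
```

If `json.loads` raises here on a real response, the usual culprit is truncation: raise `max_tokens` and retry.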

Step 6: Use the Agent API for Multi-Provider Workflows

The Agent API (POST https://api.perplexity.ai/v1/agent) offers a higher-level interface that supports multiple AI providers, built-in tool use, and multi-step research workflows.

import requests
import os

api_key = os.environ["PERPLEXITY_API_KEY"]

response = requests.post(
    "https://api.perplexity.ai/v1/agent",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "input": "Research the current state "
                 "of WebAssembly adoption in "
                 "production environments.",
        "preset": "pro-search",
        "tools": [
            {
                "type": "web_search",
                "config": {
                    "filters": {
                        "domains": [
                            "github.com",
                            "stackoverflow.com",
                            "developer.mozilla.org"
                        ]
                    }
                }
            }
        ],
        "reasoning": {
            "effort": "high"
        },
        "max_steps": 5
    }
)

The Agent API supports three presets: fast-search for quick lookups, pro-search for thorough multi-source queries, and deep-research for exhaustive investigations with up to 10 research loop iterations.

Tool costs on the Agent API are separate from model token costs: web_search costs $0.005 per invocation and fetch_url costs $0.0005 per invocation, according to Perplexity's pricing page.

Key Agent API features

  • Model fallback chains: The models parameter accepts up to 5 models for automatic failover if your primary model is unavailable
  • Reasoning effort: Configurable at low, medium, or high via the reasoning object
  • Custom functions: Define your own tools alongside built-in web_search, finance_search, and fetch_url
  • Cost tracking: The response usage object includes a cost breakdown in USD covering input, output, cache operations, and tool invocations

Verification: Check the response status field. It should be completed. If it shows failed, inspect the error object for details. Monitor the usage object's cost breakdown to track per-request spend.

Step 7: Handle Streaming and Real-Time Output

For chat interfaces or long-form generation, enable streaming to receive tokens as they are generated:

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "user",
            "content": "Explain the differences "
                       "between gRPC and "
                       "REST APIs."
        }
    ],
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content,
              end="", flush=True)

The Agent API supports streaming via Server-Sent Events (SSE) with typed event discriminators:

  • response.output_text.delta for incremental text output
  • response.reasoning.search_queries for search activity visibility
  • response.reasoning.search_results for incoming search data
  • response.completed for final status
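A minimal sketch of consuming such a stream line by line. The exact wire format of each event payload is an assumption here (standard SSE `data:` lines carrying JSON with a type discriminator), so treat this as a starting point rather than a definitive parser.

```python
import json

def parse_sse_data_lines(lines):
    """Yield decoded JSON payloads from SSE 'data:' lines, skipping
    blanks and stopping at a [DONE] sentinel if one appears."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Simulated stream; in practice these lines come from an SSE response.
stream = [
    'data: {"type": "response.output_text.delta", "delta": "Web"}',
    'data: {"type": "response.completed", "status": "completed"}',
]
for event in parse_sse_data_lines(stream):
    print(event["type"])
```

Dispatching on the event `type` lets a UI show search activity (`response.reasoning.search_queries`) separately from the answer text.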

Verification: You should see tokens appear incrementally in your terminal. If the stream hangs, check your network connection and ensure your HTTP client supports SSE.

Step 8: Deploy to Production

Moving from prototype to production requires attention to rate limits, error handling, and cost control.

Rate limits by tier

According to Perplexity, accounts progress through six tiers based on cumulative spending:

Tier | Spend Threshold | QPS | Requests/Min
0    | $0              | 1   | 50
1    | $50+            | 3   | 150
2    | $250+           | 8   | 500
3    | $500+           | 17  | 1,000
4    | $1,000+         | 33  | 2,000
5    | $5,000+         | 33  | 2,000

Source: Perplexity Rate Limits (May 2026)

Sonar Deep Research has significantly lower limits: 5 RPM at Tier 0, scaling to 100 RPM at Tier 5. The rate limiting system uses a leaky bucket algorithm that permits burst capacity while enforcing sustained rate control. Plan your architecture accordingly if deep research is a core workflow.
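To stay under your tier's QPS on the client side, a simple token-bucket pacer can throttle outgoing requests before they ever hit the API. This is our own sketch, not an official SDK feature; it mirrors the burst-then-sustain behavior described above.

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second with burst capacity
    `burst`. Call acquire() before each API request."""
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate          # tokens regenerated per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def acquire(self):
        while True:
            now = self.clock()
            # Regenerate tokens for the time elapsed since last call.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Not enough budget: sleep until one token is available.
            time.sleep((1 - self.tokens) / self.rate)

# Tier 1 allows 3 QPS; pace requests accordingly.
limiter = TokenBucket(rate=3, burst=3)
```

Calling `limiter.acquire()` before each request keeps sustained throughput at or below the tier limit while still allowing short bursts.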

Error handling

import time

# Reuses the OpenAI-SDK `client` configured in Step 2.
def query_perplexity(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="sonar",
                messages=[
                    {"role": "user",
                     "content": prompt}
                ]
            )
            return response.choices[0].message.content
        except Exception as e:
            if "429" in str(e):
                wait = 2 ** attempt
                time.sleep(wait)
                continue
            elif "401" in str(e):
                raise ValueError(
                    "Invalid API key"
                )
            else:
                raise
    raise RuntimeError(
        "Max retries exceeded"
    )

Cost optimization checklist

  • Use sonar for simple lookups ($1/M tokens input) and reserve sonar-pro ($3/M input, $15/M output) for complex queries
  • Set search_context_size to low when full grounding is not critical to reduce per-request fees
  • Set max_tokens to the minimum needed for your use case
  • Cache responses for repeated queries to avoid redundant search costs
  • Monitor usage via the Perplexity console dashboard
  • Use the Agent API's cost field in response usage to track per-request spend
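The caching bullet above can be sketched as a small in-memory cache with a time-to-live. This is illustrative only: a production system would likely use Redis or similar, and should key on the model and filter parameters as well as the prompt.

```python
import time

class TTLCache:
    """Tiny in-memory cache: entries expire after ttl seconds."""
    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self.clock() > expires:
            del self.store[key]   # evict stale entry
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

cache = TTLCache(ttl=600)

def cached_query(prompt):
    # Hit the cache first; fall back to the query_perplexity helper
    # defined in the error-handling section above.
    hit = cache.get(prompt)
    if hit is not None:
        return hit
    result = query_perplexity(prompt)
    cache.put(prompt, result)
    return result
```

Because each cache hit avoids both token charges and the per-request search fee, even a short TTL pays for itself on repeated queries.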

Verification: Deploy to a staging environment and run 50-100 test requests. Monitor response times, error rates, and per-request costs. Set up alerts for 429 (rate limit) and 500 (server error) responses.


Troubleshooting and FAQ

Common Questions
I get a 401 Unauthorized error. What is wrong?
Your API key is invalid or not set. Verify with echo $PERPLEXITY_API_KEY (macOS/Linux) or $env:PERPLEXITY_API_KEY (PowerShell). Generate a new key at console.perplexity.ai if needed. Keys are shown only once at creation time.
Citations are missing from the response. What happened?
Ensure you are using a Sonar model (sonar, sonar-pro, sonar-reasoning-pro, sonar-deep-research). Third-party models accessed through the Agent API do not return search-grounded citations. Also verify your request targets /v1/chat/completions, not the Agent endpoint.
Structured JSON output is incomplete or malformed. How do I fix it?
Increase max_tokens. According to Perplexity, responses match the specified schema unless output exceeds the token limit. Also note that new schemas may take 10-30 seconds on first use due to preparation. Add format hints to your prompt for better adherence.
The model returns outdated information. How do I get recent results?
Use search_recency_filter with day or week to restrict results to recent sources. Without this parameter, the model may surface older but higher-authority sources. For time-sensitive queries, combine recency filtering with domain filtering for maximum relevance.
Which model should I start with for prototyping?
sonar for cost-sensitive prototyping at $1/M tokens. sonar-pro for production-quality multi-source synthesis. sonar-reasoning-pro for analytical tasks needing chain-of-thought. sonar-deep-research for comprehensive reports.
Can I use the Perplexity API with the OpenAI SDK?
Yes. Set the base URL to https://api.perplexity.ai and use your Perplexity API key. The Sonar chat completions endpoint is OpenAI-compatible. Both the Python and JavaScript OpenAI SDKs work with minimal code changes.

Next Step

Build a research automation pipeline. Pick a domain your team monitors (security advisories, regulatory changes, competitor product launches), wire up sonar-pro with domain filtering and recency controls, and output structured JSON to a database or notification system. This exercise demonstrates the full value of search-grounded AI: automated, source-attributed intelligence gathering that would otherwise require manual research.


Before You Use AI
Your Privacy

AI API providers process your inputs on remote servers. Review Perplexity's privacy policy at perplexity.ai/privacy before sending sensitive data through the API. Enterprise customers should evaluate data processing agreements and retention policies.

Search-grounded responses mean your queries may trigger web searches. Consider the privacy implications of query content being used in search operations.

Mental Health & AI Dependency

AI tools are productivity aids, not replacements for human judgment. If you or someone you know is in crisis:

988 Suicide & Crisis Lifeline: Call or text 988
SAMHSA Helpline: 1-800-662-4357
Crisis Text Line: Text HOME to 741741

See the NIST AI Risk Management Framework for organizational AI governance guidance.

Your Rights & Our Transparency

Under GDPR and CCPA, you have rights to access, correct, and delete your data. Perplexity AI is headquartered in San Francisco, California.

TechJacks Solutions maintains editorial independence. This article was not sponsored or reviewed by Perplexity AI. TechJacks Solutions may earn referral fees from links to vendor products. These fees never influence editorial recommendations.

See our EU AI Act coverage for regulatory context.