Quota Management

Understanding how quota works is essential for managing your API usage and costs effectively. This guide explains the quota system in detail.

Quota Overview

Quota represents your available API usage credits. Every API request consumes quota based on token usage and model pricing.

Request → Token Count → Model Multiplier → Quota Consumed

Types of Quota

1. API Key Quota

Each API key has its own quota allocation:

Setting	Description
Remaining Quota	Available credits for this key
Used Quota	Total consumed by this key
Unlimited	No quota limit (uses account balance)

2. Subscription Quota

Subscription plans include monthly quota that resets each billing cycle:

Component	Description
Cycle Limit	Total quota for current cycle
Cycle Used	Quota consumed this cycle
Reset Date	When quota resets to full

3. Account Balance

Your account balance for Pay-As-You-Go (PAYG) usage:

Component	Description
Balance	Current available funds
Used	Total spent on PAYG

Quota Calculation

Token-Based Billing

Quota consumption is calculated per request:

Quota = (Prompt Tokens + Completion Tokens × Multiplier) × Model Rate

Completion Multipliers

Output (completion) tokens may be weighted more heavily than input (prompt) tokens, depending on the model:

Model Category	Completion Multiplier
Most models	1.0x
GPT-3.5 series	1.33x
GPT-4 series	2.0x
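As a worked example of the quota formula, the sketch below computes the quota for a single request. The function name is illustrative, and the `model_rate` of 0.5 is a hypothetical placeholder; real rates come from the Models page.

```python
def calculate_quota(prompt_tokens: int, completion_tokens: int,
                    completion_multiplier: float, model_rate: float) -> float:
    """Quota = (Prompt Tokens + Completion Tokens x Multiplier) x Model Rate."""
    weighted_tokens = prompt_tokens + completion_tokens * completion_multiplier
    return weighted_tokens * model_rate

# 50 prompt tokens, 100 completion tokens, GPT-4 series multiplier (2.0),
# and a hypothetical model rate of 0.5
quota = calculate_quota(50, 100, 2.0, 0.5)
print(quota)  # 125.0
```

With a 1.0x multiplier, the same request would weigh 150 tokens instead of 250, which is why the completion multiplier matters for output-heavy workloads.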

Model Pricing

Different models have different costs per token. For current model pricing, please visit the Models page.

Monitoring Quota Usage

Dashboard Overview

View your quota status in the dashboard:

Account Overview - Total balance and usage
API Keys - Per-key quota status
Subscription - Cycle quota remaining
Usage History - Detailed consumption logs

API Endpoints

Check quota programmatically:

import requests

# Get token information, including quota
response = requests.get(
    "https://api.apertis.ai/api/user/token",
    headers={"Authorization": "Bearer sk-your-api-key"},
)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses

token_info = response.json()
print(f"Remaining quota: {token_info['data']['remain_quota']}")
print(f"Used quota: {token_info['data']['used_quota']}")

Usage in Responses

Each API response body includes a `usage` object with token counts:

{
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
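To track consumption across many calls, you can accumulate the `usage` object from each parsed response body. This is a minimal sketch; the `UsageTracker` class is hypothetical, and it assumes bodies have already been decoded to dicts (for example with `response.json()`). Responses without a `usage` object are counted with zero tokens.

```python
import json

class UsageTracker:
    """Accumulates token counts from the `usage` object of response bodies."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.requests = 0

    def record(self, response_body: dict) -> None:
        usage = response_body.get("usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

tracker = UsageTracker()
body = json.loads(
    '{"usage": {"prompt_tokens": 50, "completion_tokens": 100, "total_tokens": 150}}'
)
tracker.record(body)
tracker.record(body)
print(tracker.requests, tracker.total_tokens)  # 2 300
```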

Managing Quota Effectively

1. Choose the Right Model

Match model capabilities to your needs:

Task	Recommended Model	Cost Level
Simple Q&A	GPT-3.5 Turbo	$
Code generation	GPT-4o	$$
Complex reasoning	Claude Opus 4.5	$$$
Long documents	Claude Sonnet 4.5	$$
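One way to apply this table in code is a simple lookup from task type to model, falling back to the cheapest tier. The task keys and model IDs below are hypothetical; confirm the exact model identifiers on the Models page before using them.

```python
# Hypothetical model IDs: confirm the exact identifiers on the Models page.
MODEL_BY_TASK = {
    "simple_qa": "gpt-3.5-turbo",
    "code_generation": "gpt-4o",
    "complex_reasoning": "claude-opus-4-5",
    "long_documents": "claude-sonnet-4-5",
}

def pick_model(task: str, default: str = "gpt-3.5-turbo") -> str:
    """Return the model for a task type, falling back to the cheapest tier."""
    return MODEL_BY_TASK.get(task, default)

print(pick_model("code_generation"))  # gpt-4o
print(pick_model("translation"))      # gpt-3.5-turbo
```

Defaulting to the cheapest model means an unrecognized task never silently lands on a premium tier.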

2. Optimize Prompts

Reduce token usage with efficient prompts:

long_text = "..."  # The text to summarize

# Inefficient (high token usage)
prompt = f"""
I would like you to please help me with the following task.
I need you to summarize the following text for me.
Please make sure the summary is comprehensive and detailed.
Here is the text:
{long_text}
"""

# Efficient (lower token usage)
prompt = f"Summarize:\n{long_text}"

3. Use System Messages Wisely

The system message is sent, and billed as prompt tokens, with every request in a conversation. Keep it concise:

# Good - concise system message
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "..."}
]

# Avoid - overly detailed system message
messages = [
    {"role": "system", "content": "You are an extremely helpful assistant..."},  # 500+ tokens
    {"role": "user", "content": "..."}
]
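For the same reason, the whole conversation history is billed again on each turn. A common mitigation, not specific to this API, is to keep the system message and only the most recent messages. The sketch below is a hypothetical helper; the `max_messages` cutoff is an arbitrary illustration.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep every system message plus the last `max_messages` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + others[-max_messages:]

history = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"},
]
trimmed = trim_history(history, 3)
print([m["content"] for m in trimmed])
# ['You are a helpful coding assistant.', 'Q2', 'A2', 'Q3']
```

Trimming by message count can split a user/assistant pair, so choose a cutoff that suits your conversation structure.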

4. Implement Caching

Cache responses for identical queries:

from functools import lru_cache

# Assumes `client` is an already-configured OpenAI-compatible client.
# lru_cache keys on the (prompt, model) arguments, so identical queries
# are served from memory instead of triggering a new API call.
@lru_cache(maxsize=1000)
def get_cached_response(prompt, model):
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

response = get_cached_response(prompt, "gpt-4.1")

5. Set Quota Limits

Configure API key quota limits to prevent overspending:

Go to API Keys in dashboard
Edit the key
Set Quota Limit
Save changes
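The dashboard limit is enforced server-side. If you also want your application to stop before reaching it, a client-side guard can track estimated spend. This `QuotaBudget` class is a hypothetical helper, not part of the API; amounts are in whatever quota units you estimate per request.

```python
class QuotaBudget:
    """Client-side guard that refuses charges once a spending cap is reached."""

    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0

    def can_spend(self, amount: float) -> bool:
        return self.spent + amount <= self.limit

    def charge(self, amount: float) -> None:
        if not self.can_spend(amount):
            raise RuntimeError(
                f"Budget exceeded: {self.spent + amount} > {self.limit}"
            )
        self.spent += amount

budget = QuotaBudget(limit=1000)
budget.charge(600)
print(budget.can_spend(500))  # False
```

Call `charge()` before each request with your cost estimate; a `RuntimeError` then stops the request locally instead of waiting for a 402 from the server.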

Quota Alerts

Setting Up Alerts

Configure alerts to be notified when quota is low:

Navigate to Settings → Notifications
Enable Low Quota Warning
Set threshold percentage (e.g., 20%)
Choose notification method (email)

Alert Thresholds

Level	Threshold	Action
Warning	20% remaining	Consider top-up
Critical	10% remaining	Top-up soon
Exhausted	0% remaining	Requests will fail
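If you poll quota programmatically, you can map the remaining amount to the same levels. This is a sketch; the function name is hypothetical, treating each threshold as inclusive is an assumption, and `limit` must be positive (it does not apply to unlimited keys).

```python
def alert_level(remaining: float, limit: float) -> str:
    """Map remaining quota to the Warning/Critical/Exhausted levels."""
    if remaining <= 0:
        return "Exhausted"
    percent = remaining / limit * 100
    if percent <= 10:
        return "Critical"
    if percent <= 20:
        return "Warning"
    return "OK"

print(alert_level(15, 100))  # Warning
print(alert_level(5, 100))   # Critical
```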

Quota Exhaustion

What Happens

When quota is exhausted:

API Key Quota: Requests return 402 Payment Required
Subscription Quota: Falls back to PAYG (if enabled)
Account Balance: All requests fail with 402

Error Response

{
  "error": {
    "message": "You have exceeded your quota",
    "type": "billing_error",
    "code": "quota_exceeded"
  }
}
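To branch on exhaustion in client code, a check can look for either of the signals described above: the 402 status or the `quota_exceeded` error code. The function name is hypothetical, and treating either signal alone as sufficient is an assumption.

```python
import json

def is_quota_exceeded(status_code: int, body: str) -> bool:
    """True if a response signals exhausted quota (HTTP 402 or code 'quota_exceeded')."""
    if status_code == 402:
        return True
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "quota_exceeded"
```

With `requests`, you might call `is_quota_exceeded(resp.status_code, resp.text)` and trigger a top-up or alert rather than retrying, since retries keep failing until quota is restored.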

Recovery Options

Situation	Solution
API key exhausted	Increase key quota or create new key
Subscription exhausted	Enable PAYG fallback
Balance exhausted	Add funds to account

Quota Reset

Subscription Quota Reset

Subscription quota resets automatically at the start of each billing cycle:

Cycle Start (e.g., Jan 1) → Full Quota Available
                          ↓
                    Use Throughout Month
                          ↓
Cycle End (e.g., Jan 31) → Unused Quota Expires
                          ↓
Next Cycle (Feb 1)       → Full Quota Restored

Note: Unused quota does not roll over to the next cycle.
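To schedule work around resets, you can compute the next reset date. This sketch assumes cycles align to calendar months, as in the Jan 1 / Feb 1 example above; if your cycle is anchored to your subscription start date instead, check the Reset Date shown in the dashboard.

```python
from datetime import date

def next_reset(today: date) -> date:
    """First day of the next calendar month (assumes calendar-month cycles)."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)

print(next_reset(date(2025, 1, 15)))   # 2025-02-01
print(next_reset(date(2025, 12, 31)))  # 2026-01-01
```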

Manual Quota Reset (Admin)

Account administrators can reset quota for:

Individual API keys
Subscription cycles
User accounts

Best Practices

For Development

Use separate keys for development and production
Set low limits on development keys
Monitor usage during testing
Use cheaper models for development
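A simple way to keep development and production separate is to load the key and model from per-environment settings. The environment variable names and model IDs below are hypothetical placeholders.

```python
import os

# Hypothetical environment variable names and model IDs.
SETTINGS = {
    "development": {"key_var": "APERTIS_DEV_KEY", "model": "gpt-3.5-turbo"},
    "production": {"key_var": "APERTIS_PROD_KEY", "model": "gpt-4o"},
}

def load_config(env: str) -> tuple[str, str]:
    """Return (api_key, model) for the given environment."""
    settings = SETTINGS[env]
    api_key = os.environ.get(settings["key_var"])
    if not api_key:
        raise RuntimeError(f"Set {settings['key_var']} for the {env} environment")
    return api_key, settings["model"]
```

Failing loudly when a key is missing avoids accidentally falling back to a production key (and its higher limit) during testing.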

For Production

Set appropriate limits based on expected usage
Enable auto top-up to prevent service interruption
Monitor alerts and respond quickly
Review usage regularly for optimization opportunities

Cost Control Strategies

Strategy	Implementation
Model tiering	Use cheaper models for simple tasks
Prompt optimization	Reduce token count in prompts
Response limits	Set `max_tokens` to limit output
Caching	Cache frequent identical queries
Rate limiting	Implement client-side rate limits
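For the rate-limiting row, a sliding-window limiter is one common client-side approach. This is a minimal sketch; the class is hypothetical, and the clock is injectable so it can be tested without sleeping.

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls: int, period: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # Timestamps of calls inside the current window

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have fallen out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(max_calls=5, period=1.0)
if limiter.try_acquire():
    pass  # Safe to send the request here
```

When `try_acquire()` returns False, the caller can sleep briefly and retry, or drop low-priority work.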

Usage Reports

Accessing Reports

View detailed usage reports in the dashboard:

Go to Usage → Reports
Select date range
Filter by model, key, or endpoint
Export as CSV if needed

Report Metrics

Metric	Description
Total Requests	Number of API calls
Total Tokens	Tokens consumed
Total Cost	Quota/money spent
Average Latency	Response time
Error Rate	Failed requests percentage
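If you export a report as CSV, you can recompute these metrics offline. The column names below (`status`, `total_tokens`, `cost`, `latency_ms`) are hypothetical; match them to the headers in your actual export.

```python
import csv
import io

def summarize(csv_text: str) -> dict:
    """Compute report metrics from exported rows (hypothetical column names)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = len(rows)
    errors = sum(1 for r in rows if r["status"] != "success")
    return {
        "total_requests": total,
        "total_tokens": sum(int(r["total_tokens"]) for r in rows),
        "total_cost": sum(float(r["cost"]) for r in rows),
        "average_latency_ms": (
            sum(float(r["latency_ms"]) for r in rows) / total if total else 0.0
        ),
        "error_rate": errors / total if total else 0.0,
    }

sample = """status,total_tokens,cost,latency_ms
success,150,0.3,820
success,90,0.2,640
error,0,0,120
"""
report = summarize(sample)
```

For a file on disk, pass its contents, e.g. `summarize(open("usage_report.csv").read())`.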

Related Topics

Subscription Plans - Plan quotas and features
Rate Limits - Request rate limits
PAYG - Pay-As-You-Go billing
API Keys - Key management
