Quota Management
Understanding how quota works is essential for managing your API usage and costs effectively. This guide explains the quota system in detail.
Quota Overview
Quota represents your available API usage credits. Every API request consumes quota based on token usage and model pricing.
Request → Token Count → Model Multiplier → Quota Consumed
Types of Quota
1. API Key Quota
Each API key has its own quota allocation:
| Setting | Description |
|---|---|
| Remaining Quota | Available credits for this key |
| Used Quota | Total consumed by this key |
| Unlimited | No quota limit (uses account balance) |
2. Subscription Quota
Subscription plans include a monthly quota that resets at the start of each billing cycle:
| Component | Description |
|---|---|
| Cycle Limit | Total quota for current cycle |
| Cycle Used | Quota consumed this cycle |
| Reset Date | When quota resets to full |
3. Account Balance
Your account balance for Pay-As-You-Go (PAYG) usage:
| Component | Description |
|---|---|
| Balance | Current available funds |
| Used | Total spent on PAYG |
Quota Calculation
Token-Based Billing
Quota consumption is calculated per request:
Quota = (Prompt Tokens + Completion Tokens × Multiplier) × Model Rate
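As a worked example, the sketch below applies this formula to a single request. The function name `request_quota` and the model rate of 1.0 are illustrative assumptions. Real per-model rates are listed on the Models page.

```python
def request_quota(prompt_tokens: int, completion_tokens: int,
                  completion_multiplier: float, model_rate: float) -> float:
    """Quota = (Prompt Tokens + Completion Tokens x Multiplier) x Model Rate."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Completion tokens are weighted by the multiplier before the model rate applies.
    weighted_tokens = prompt_tokens + completion_tokens * completion_multiplier
    return weighted_tokens * model_rate


# 50 prompt tokens, 100 completion tokens, a 2.0x completion multiplier,
# and a hypothetical model rate of 1.0: (50 + 100 * 2.0) * 1.0
print(request_quota(50, 100, 2.0, 1.0))  # 250.0
```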
Completion Multipliers
Some model families bill output (completion) tokens at a higher rate than input (prompt) tokens:
| Model Category | Completion Multiplier |
|---|---|
| Most models | 1.0x |
| GPT-3.5 series | 1.33x |
| GPT-4 series | 2.0x |
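If you estimate costs client-side, you can encode the table as a lookup. This is a sketch under a stated assumption: matching model names by prefix (`gpt-3.5`, `gpt-4`) is a hypothetical rule, not the platform's documented billing logic. Under this rule, names such as `gpt-4o` or `gpt-4.1` would be treated as GPT-4 series, which may not match actual pricing.

```python
# Completion multipliers from the table above.
# Prefix matching is an illustrative assumption, not the official billing rule.
COMPLETION_MULTIPLIERS = {
    "gpt-3.5": 1.33,  # GPT-3.5 series
    "gpt-4": 2.0,     # GPT-4 series
}
DEFAULT_MULTIPLIER = 1.0  # "Most models"


def completion_multiplier(model: str) -> float:
    """Return the completion multiplier for a model name (hypothetical prefix match)."""
    name = model.lower()
    for prefix, multiplier in COMPLETION_MULTIPLIERS.items():
        if name.startswith(prefix):
            return multiplier
    return DEFAULT_MULTIPLIER


print(completion_multiplier("gpt-3.5-turbo"))  # 1.33
print(completion_multiplier("example-model"))  # 1.0
```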
Model Pricing
Different models have different costs per token. For current model pricing, please visit the Models page.
Monitoring Quota Usage
Dashboard Overview
View your quota status in the dashboard:
- Account Overview - Total balance and usage
- API Keys - Per-key quota status
- Subscription - Cycle quota remaining
- Usage History - Detailed consumption logs
API Endpoints
Check quota programmatically:
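```python
# Get token information, including remaining and used quota
import requests

response = requests.get(
    "https://api.apertis.ai/api/user/token",
    headers={"Authorization": "Bearer sk-your-api-key"},
    timeout=30,  # avoid hanging indefinitely on network problems
)
response.raise_for_status()  # raise on HTTP errors such as 401 or 402
token_info = response.json()

print(f"Remaining quota: {token_info['data']['remain_quota']}")
print(f"Used quota: {token_info['data']['used_quota']}")
```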
Response Usage Data
Each completion response includes a usage object in its body:
```json
{
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
```
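To track consumption on the client side, you can accumulate the usage object from each response. The sketch below works on plain dictionaries shaped like the example above; `UsageTracker` is a hypothetical helper, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts from the usage object of each response."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, response: dict) -> None:
        # Tolerate responses that omit usage (counted as a request with 0 tokens).
        usage = response.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tracker = UsageTracker()
tracker.record({"usage": {"prompt_tokens": 50, "completion_tokens": 100, "total_tokens": 150}})
tracker.record({"usage": {"prompt_tokens": 20, "completion_tokens": 30, "total_tokens": 50}})
print(tracker.requests, tracker.total_tokens)  # 2 200
```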
Managing Quota Effectively
1. Choose the Right Model
Match model capabilities to your needs:
| Task | Recommended Model | Cost Level |
|---|---|---|
| Simple Q&A | GPT-3.5 Turbo | $ |
| Code generation | GPT-4o | $$ |
| Complex reasoning | Claude Opus 4.5 | $$$ |
| Long documents | Claude Sonnet 4.5 | $$ |
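One way to apply this table in code is a simple task-to-model lookup. The model ID strings below are placeholders (assumptions) for the models named in the table; check the Models page for the exact identifiers.

```python
# Placeholder model IDs (assumptions) for the tiers in the table above.
MODEL_FOR_TASK = {
    "simple_qa": "gpt-3.5-turbo",            # $
    "code_generation": "gpt-4o",             # $$
    "complex_reasoning": "claude-opus-4.5",  # $$$
    "long_documents": "claude-sonnet-4.5",   # $$
}
CHEAPEST_MODEL = "gpt-3.5-turbo"


def choose_model(task: str) -> str:
    """Return the model for a task type, falling back to the cheapest tier."""
    return MODEL_FOR_TASK.get(task, CHEAPEST_MODEL)


print(choose_model("code_generation"))  # gpt-4o
print(choose_model("translation"))      # gpt-3.5-turbo
```

Falling back to the cheapest tier means an unrecognized task type never silently runs on the most expensive model.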
2. Optimize Prompts
Reduce token usage with efficient prompts:
```python
long_text = "..."  # the text to summarize

# Inefficient (high token usage)
prompt = f"""
I would like you to please help me with the following task.
I need you to summarize the following text for me.
Please make sure the summary is comprehensive and detailed.
Here is the text:
{long_text}
"""

# Efficient (lower token usage)
prompt = f"Summarize:\n{long_text}"
```
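To compare prompt variants quickly, a character-based estimate is often enough. This sketch uses the common rule of thumb of roughly 4 characters per token for English text; it is an approximation, and the model's own tokenizer gives the real count. The function and variable names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token for English text."""
    return (len(text) + 3) // 4  # ceiling division


verbose_instructions = (
    "I would like you to please help me with the following task.\n"
    "I need you to summarize the following text for me.\n"
    "Please make sure the summary is comprehensive and detailed.\n"
    "Here is the text:\n"
)
concise_instructions = "Summarize:\n"

# The instruction overhead is paid on every request, so savings scale with volume.
saved_per_request = estimate_tokens(verbose_instructions) - estimate_tokens(concise_instructions)
print(f"~{saved_per_request} tokens saved per request, "
      f"~{saved_per_request * 10_000} per 10,000 requests")
```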
3. Use System Messages Wisely
The system message is sent, and billed as prompt tokens, with every request in a conversation. Keep it concise:
```python
# Good - concise system message
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "..."}
]

# Avoid - overly detailed system message
messages = [
    {"role": "system", "content": "You are an extremely helpful assistant..."},  # 500+ tokens
    {"role": "user", "content": "..."}
]
```
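The cost of a long system message grows with conversation length. This sketch assumes a stateless chat-completions API where every request resends the system message plus the full history, and that each user message and assistant reply has a fixed size; `conversation_prompt_tokens` is a hypothetical helper.

```python
def conversation_prompt_tokens(system_tokens: int, user_tokens: int,
                               assistant_tokens: int, turns: int) -> int:
    """Total prompt tokens billed over a conversation of `turns` requests."""
    total = 0
    for turn in range(1, turns + 1):
        # Request `turn` carries the system message, `turn` user messages,
        # and the `turn - 1` assistant replies that came before it.
        total += system_tokens + turn * user_tokens + (turn - 1) * assistant_tokens
    return total


concise = conversation_prompt_tokens(10, 50, 100, turns=20)
verbose = conversation_prompt_tokens(500, 50, 100, turns=20)
print(verbose - concise)  # 9800: (500 - 10) extra tokens on each of 20 requests
```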
4. Implement Caching
Cache responses for identical queries:
```python
from functools import lru_cache

# Assumes `client` is an already-configured OpenAI-compatible client.
# lru_cache keys on the function arguments, so identical (prompt, model)
# pairs return the stored response without a new API request.
@lru_cache(maxsize=1000)
def get_cached_response(prompt, model):
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

response = get_cached_response(prompt, "gpt-4.1")
```
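lru_cache lives only in process memory and is cleared when the process restarts. If you want a cache you can inspect, or later back with other storage, an explicit dictionary keyed by a hash of the model and prompt is a simple alternative. `ResponseCache` and the `call_model` callable are hypothetical; pass in whatever function actually sends the request.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ResponseCache:
    """Caches responses keyed by a SHA-256 hash of (model, prompt)."""

    def __init__(self, call_model: Callable[[str, str], Any]):
        self._call_model = call_model  # function that actually sends the request
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # json.dumps keeps the boundary between model and prompt unambiguous.
        payload = json.dumps([model, prompt]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, model: str, prompt: str) -> Any:
        k = self.key(model, prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[k] = response
        return response


# Usage with a stand-in for the real API call:
cache = ResponseCache(lambda model, prompt: f"{model}: {prompt.upper()}")
cache.get("gpt-4.1", "hello")
cache.get("gpt-4.1", "hello")  # served from the cache, no second request
print(cache.hits, cache.misses)  # 1 1
```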
5. Set Quota Limits
Configure API key quota limits to prevent overspending:
- Go to API Keys in dashboard
- Edit the key
- Set Quota Limit
- Save changes
Quota Alerts
Setting Up Alerts
Configure alerts to be notified when quota is low:
- Navigate to Settings → Notifications
- Enable Low Quota Warning
- Set threshold percentage (e.g., 20%)
- Choose notification method (email)
Alert Thresholds
| Level | Threshold | Action |
|---|---|---|
| Warning | 20% remaining | Consider top-up |
| Critical | 10% remaining | Top-up soon |
| Exhausted | 0% remaining | Requests will fail |
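If you poll quota yourself, the thresholds above map directly to a small classifier. This sketch assumes a key's total allocation equals remain_quota + used_quota from the token endpoint; that relationship, the function name, and the extra "OK" level for quota above 20% are assumptions for illustration.

```python
def quota_alert_level(remaining: float, total: float) -> str:
    """Map remaining quota to the alert levels in the table above."""
    if total <= 0:
        raise ValueError("total quota must be positive")
    fraction = remaining / total
    if fraction <= 0:
        return "Exhausted"
    if fraction <= 0.10:
        return "Critical"
    if fraction <= 0.20:
        return "Warning"
    return "OK"  # above all thresholds (hypothetical label)


token_data = {"remain_quota": 150, "used_quota": 850}  # example values
level = quota_alert_level(token_data["remain_quota"],
                          token_data["remain_quota"] + token_data["used_quota"])
print(level)  # Warning
```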
Quota Exhaustion
What Happens
When quota is exhausted:
- API Key Quota: Requests return 402 Payment Required
- Subscription Quota: Falls back to PAYG (if enabled)
- Account Balance: All requests fail with 402
Error Response
```json
{
  "error": {
    "message": "You have exceeded your quota",
    "type": "billing_error",
    "code": "quota_exceeded"
  }
}
```
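Client code should treat this error differently from transient failures: retrying will not help until quota is restored. The sketch below checks for a 402 status or the quota_exceeded code shown above; `QuotaExceededError` and `check_response` are hypothetical names. With requests, you would call `check_response(resp.status_code, resp.json())`.

```python
class QuotaExceededError(Exception):
    """Raised when a request fails because quota or balance is exhausted."""


def is_quota_exceeded(status_code: int, body: dict) -> bool:
    """True for 402 responses or an error body with code 'quota_exceeded'."""
    error = body.get("error")
    if not isinstance(error, dict):
        error = {}
    return status_code == 402 or error.get("code") == "quota_exceeded"


def check_response(status_code: int, body: dict) -> dict:
    """Raise QuotaExceededError on exhausted quota; otherwise return the body."""
    if is_quota_exceeded(status_code, body):
        message = body.get("error", {}).get("message", "quota exceeded")
        # Do not retry: the request cannot succeed until quota is restored.
        raise QuotaExceededError(message)
    return body
```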
Recovery Options
| Situation | Solution |
|---|---|
| API key exhausted | Increase key quota or create new key |
| Subscription exhausted | Enable PAYG fallback |
| Balance exhausted | Add funds to account |
Quota Reset
Subscription Quota Reset
Subscription quota resets automatically at the start of each billing cycle:
Cycle Start (e.g., Jan 1) → Full Quota Available
↓
Use Throughout Month
↓
Cycle End (e.g., Jan 31) → Unused Quota Expires
↓
Next Cycle (Feb 1) → Full Quota Restored
Unused quota does not roll over to the next cycle.
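Because unused quota expires, it can help to pace usage across the remaining days of the cycle. This sketch assumes cycles start on the 1st of each month, as in the example above; if your cycle starts on another day, adjust `next_cycle_start`. Both function names are hypothetical.

```python
from datetime import date


def next_cycle_start(today: date) -> date:
    """First day of the next month (assumes cycles start on the 1st)."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def daily_budget(remaining_quota: float, today: date) -> float:
    """Spread remaining cycle quota evenly over the days left, including today."""
    days_left = (next_cycle_start(today) - today).days  # always at least 1
    return remaining_quota / days_left


print(daily_budget(1700, date(2025, 1, 15)))  # 100.0 (17 days left in January)
```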
Manual Quota Reset (Admin)
Account administrators can reset quota for:
- Individual API keys
- Subscription cycles
- User accounts
Best Practices
For Development
- Use separate keys for development and production
- Set low limits on development keys
- Monitor usage during testing
- Use cheaper models for development
For Production
- Set appropriate limits based on expected usage
- Enable auto top-up to prevent service interruption
- Monitor alerts and respond quickly
- Review usage regularly for optimization opportunities
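A common way to keep development and production keys separate is to select the key and model from environment variables at startup. The variable names (APP_ENV, APERTIS_API_KEY_DEV, APERTIS_API_KEY_PROD) and the model IDs are hypothetical placeholders; use whatever your deployment defines.

```python
import os
from typing import Mapping, Tuple

# Hypothetical environment variable names for each environment's key.
KEY_VARS = {"development": "APERTIS_API_KEY_DEV", "production": "APERTIS_API_KEY_PROD"}
# Cheaper model for development (placeholder model IDs).
MODELS = {"development": "gpt-3.5-turbo", "production": "gpt-4o"}


def client_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, model) for the current APP_ENV (defaults to development)."""
    app_env = env.get("APP_ENV", "development")
    if app_env not in KEY_VARS:
        raise ValueError(f"unknown APP_ENV: {app_env!r}")
    api_key = env.get(KEY_VARS[app_env])
    if not api_key:
        raise RuntimeError(f"{KEY_VARS[app_env]} is not set")
    return api_key, MODELS[app_env]
```

Defaulting to development means a misconfigured deployment uses the low-limit key rather than the production one.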
Cost Control Strategies
| Strategy | Implementation |
|---|---|
| Model tiering | Use cheaper models for simple tasks |
| Prompt optimization | Reduce token count in prompts |
| Response limits | Set max_tokens to limit output |
| Caching | Cache frequent identical queries |
| Rate limiting | Implement client-side rate limits |
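For client-side rate limiting, a token bucket is a simple, widely used technique. Note that the bucket's "tokens" are request permits, unrelated to model tokens. This is a minimal single-threaded sketch; `TokenBucket` is a hypothetical class, and the injectable clock exists only to make it testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one permit if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```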
Usage Reports
Accessing Reports
View detailed usage reports in the dashboard:
- Go to Usage → Reports
- Select date range
- Filter by model, key, or endpoint
- Export as CSV if needed
Report Metrics
| Metric | Description |
|---|---|
| Total Requests | Number of API calls |
| Total Tokens | Tokens consumed |
| Total Cost | Quota/money spent |
| Average Latency | Response time |
| Error Rate | Failed requests percentage |
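If you export reports as CSV, a short script can recompute these metrics per model. The column layout is an assumption: this sketch expects one row per request with hypothetical columns model, total_tokens, cost, and success (true/false). Adjust the field names to match the actual export.

```python
import csv
import io
from collections import defaultdict


def summarize_usage(csv_text: str) -> dict:
    """Per-model request count, tokens, cost, and error rate from a CSV export."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost": 0.0, "errors": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = totals[row["model"]]
        t["requests"] += 1
        t["tokens"] += int(row["total_tokens"])
        t["cost"] += float(row["cost"])
        if row["success"].strip().lower() != "true":
            t["errors"] += 1
    # Error rate = failed requests / total requests for each model.
    return {model: {**t, "error_rate": t["errors"] / t["requests"]}
            for model, t in totals.items()}
```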
Related Topics
- Subscription Plans - Plan quotas and features
- Rate Limits - Request rate limits
- PAYG - Pay-As-You-Go billing
- API Keys - Key management