AI Rate Limits, Tokens, TPM & RPM

What Are AI Rate Limits?

When Standard Time® sends a request to an AI model — to analyze a project, answer a question, or create tasks — it is making an API call to an external service such as Groq, OpenAI, Anthropic, or Mistral. Each of those services enforces rate limits: caps on how much work your account can request in a given time window.

Rate limits exist because AI model inference is computationally expensive. Providers share their infrastructure across many customers, so limits prevent any one account from overwhelming the service. On free and lower-tier plans these limits are tight; paid tiers raise them substantially.

When Standard Time® AI Chat hits a rate limit, the service returns an HTTP 429 Too Many Requests response. The AI Chat window displays this as an error message with the limit that was exceeded, how much of it was used, and how long to wait before retrying.

Important: A rate limit error does not damage your project data. The request was blocked before any changes were made. Simply wait the indicated time and retry.
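
The wait-and-retry advice above can be sketched as a small loop. This is only an illustration, not a description of how Standard Time® handles retries internally. `send_request` is a hypothetical callable that returns an HTTP status code and a suggested wait in seconds, and `sleep` is injectable so the loop can be tried without real delays.

```python
import time
from typing import Callable, Tuple


def retry_on_429(
    send_request: Callable[[], Tuple[int, float]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_request until it stops returning 429 or attempts run out.

    send_request is a hypothetical stand-in that returns
    (http_status, wait_seconds); wait_seconds only matters on a 429.
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, wait_seconds = send_request()
        if status != 429:
            return status  # success, or an error that waiting will not fix
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait the time the provider indicated
    return status  # still 429 after the final attempt
```

Capping the number of attempts matters: a daily limit will not clear after a short wait, so an unbounded loop would only add more blocked requests.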

What Is a Token?

AI language models do not read text the way humans do. They break every sentence into small chunks called tokens. A token is roughly four characters of English text — usually a short word, a common word fragment, or a punctuation mark. The model processes tokens, not words or sentences.

Diagram showing a sentence broken into individual tokens, with each word highlighted as a separate token chunk

Every prompt and every reply is measured in tokens. Both the text you send and the text the model writes back count toward your token totals.

Token counts matter for two reasons:

Rate limits: AI providers cap how many tokens your account can use per minute, per hour, or per day. Large payloads exhaust these limits quickly.
Cost: Paid API tiers charge per token. Sending less data means smaller bills.

The data Standard Time® sends to the AI includes: task names, descriptions, employee assignments, start and due dates, status values, time logs, notes, budget fields, and any custom field values. For a project with 50 tasks, dozens of time log entries, and many notes, a single AI Chat request can easily reach 3,000 to 5,000 tokens or more.

Tip: Use a token estimator like platform.openai.com/tokenizer to measure how many tokens a text sample uses before sending it to the AI. Paste a representative block of project data to get a realistic count.
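
When a tokenizer tool is not at hand, the four-characters-per-token rule of thumb gives a quick approximation. The sketch below is a rough heuristic only; real tokenizers differ by model, and the function name and sample text are made up for illustration.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: about four characters of English per token."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


# Hypothetical block of project data, used only to show the call.
sample = "Task: Weld frame assembly. Status: In Progress. Due: 2025-03-14."
print(estimate_tokens(sample))
```

Treat the result as an order-of-magnitude figure. Dates, numbers, and codes often split into more tokens than ordinary words do.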

Tokens Per Minute (TPM)

Tokens Per Minute (TPM) is the rate limit most often hit in Standard Time® AI Chat sessions. It measures the combined token count of all prompts you send plus all responses the model writes back, summed across a rolling 60-second window.

When TPM is exceeded, the API returns a 429 with a message like:

Rate limit reached for model llama-3.3-70b-versatile on tokens per minute (TPM):
Limit 6,000 · Used 5,842 · Requested 1,240
Please try again in 38.7s.
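
If you need to pull the numbers out of a message like this one (for logging, for example), a regular expression is enough. The sketch below assumes the exact layout shown in this article; providers may word their errors differently, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Matches the message layout shown in this article. Other providers or
# API versions may format their rate limit errors differently.
_RATE_LIMIT_RE = re.compile(
    r"\((?P<kind>TPM|RPM|TPD)\)"
    r".*?Limit\s+(?P<limit>[\d,]+)"
    r".*?Used\s+(?P<used>[\d,]+)"
    r".*?Requested\s+(?P<requested>[\d,]+)"
    r"(?:.*?try again in\s+(?P<wait>\d+(?:\.\d+)?)s)?",
    re.DOTALL,
)


def parse_rate_limit_message(
    message: str,
) -> Optional[Dict[str, Union[str, int, float, None]]]:
    """Return limit details from a rate limit message, or None if no match."""
    match = _RATE_LIMIT_RE.search(message)
    if match is None:
        return None

    def to_int(text: str) -> int:
        return int(text.replace(",", ""))  # "6,000" -> 6000

    wait = match.group("wait")
    return {
        "kind": match.group("kind"),
        "limit": to_int(match.group("limit")),
        "used": to_int(match.group("used")),
        "requested": to_int(match.group("requested")),
        "wait_seconds": float(wait) if wait else None,  # absent if not stated
    }
```

In the example message, Used plus Requested (5,842 + 1,240 = 7,082) is greater than the limit of 6,000, which is why the request was rejected.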

Side-by-side gauge diagrams for Tokens Per Minute and Requests Per Minute, showing usage levels and the danger zone near the limit

TPM and RPM are independent limits — you can hit either one separately. Both reset on a rolling 60-second window.

Key points about TPM:

The window is rolling, not a fixed clock minute. Tokens from 59 seconds ago still count until the full 60 seconds have elapsed.
The AI's reply tokens count against your TPM just as the prompt tokens do. A verbose AI response can itself push you over the limit.
Different models on the same provider have separate TPM limits. Usage on one model does not count against another, and unused quota cannot be pooled across models.
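
The rolling-window behavior described in the key points can be modeled with a queue of timestamped token counts. This is a simplified sketch for building intuition; providers do not publish their exact accounting, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class RollingTokenWindow:
    """Tracks tokens used in the last `window_seconds` against a limit."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _expire(self, now: float) -> None:
        # Drop entries once a full window has elapsed since they were recorded.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()

    def used(self, now: float) -> int:
        self._expire(now)
        return sum(tokens for _, tokens in self._events)

    def try_record(self, now: float, tokens: int) -> bool:
        """Record prompt plus reply tokens if they fit; return False if not."""
        if self.used(now) + tokens > self.limit:
            return False  # this is the point where a provider would send 429
        self._events.append((now, tokens))
        return True
```

With a 6,000-token limit, a 4,000-token request at second 0 and a 1,800-token request at second 10 leave no room for 1,240 more tokens at second 59. At second 60 the first request ages out and the same request fits.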

Provider / Plan	TPM Limit (approx.)	Notes
Groq — on-demand (free)	6,000 – 30,000	Varies by model; check console.groq.com/docs/rate-limits
Groq — Dev Tier	60,000 – 120,000+	Higher limits; billing required
OpenAI — Tier 1	60,000 – 200,000	Varies by model (GPT-4o, GPT-4o-mini differ)
Anthropic — Build plan	40,000 – 400,000	Varies by model (Claude Haiku vs Sonnet vs Opus)
Mistral — Free tier	500,000	Generous free tier for experimentation
Ollama (local)	No limit	Runs on your own hardware; no API rate limits

Requests Per Minute (RPM)

Requests Per Minute (RPM) is the cap on how many separate API calls your account can make in a 60-second window, regardless of how many tokens each call uses. A short "summarize this task" prompt and a large "analyze the full project" prompt each count as one request against your RPM.

Standard Time® AI Chat typically makes one request per user message. Follow-up questions in the same conversation each add one request. Automatic background analyses (if configured) can also contribute.

An RPM error looks like:

Rate limit reached for model openai/gpt-oss-20b on requests per minute (RPM):
Limit 30 · Used 30 · Requested 1
Please try again in 12.4s.

RPM limits are usually much easier to stay within than TPM limits during normal use. However, if multiple team members share a single API key in Standard Time®, their requests all count toward the same RPM quota. For large teams, consider issuing separate API keys per user.

Tip: In Standard Time®, each AI model configuration in stdata.AiModelSet stores its own API key. You can create two separate model configurations, one per department or team, each using a different API key. This adds RPM headroom only when the keys belong to separate provider accounts or organizations. Many providers enforce limits per organization rather than per key, so two keys from the same account share one quota.

Tokens Per Day (TPD)

Tokens Per Day (TPD) is a daily ceiling on your total token consumption. Unlike TPM and RPM which reset every minute, TPD resets once per day — typically at midnight UTC. Groq's free on-demand tier enforces a 200,000-token-per-day cap; once consumed, all further requests return 429 until the next reset.

TPD errors tend to appear toward the end of a heavy workday when a team has run many AI Chat sessions. The error message will say "tokens per day (TPD)" and will not include a short wait time, because the quota is not restored until the daily reset (typically midnight UTC).

Limit Type	Window	Resets	Typical action when hit
TPM	Rolling 60 seconds	Continuously, as old tokens age out	Wait the stated seconds, then retry
RPM	Rolling 60 seconds	Continuously, as old requests age out	Wait the stated seconds, then retry
TPD	Calendar day	Midnight UTC	Wait until next day or upgrade plan
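
The table above translates into a simple rule for how long to wait. The sketch below assumes the midnight UTC reset stated in this article, which may not hold for every provider; both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_midnight_utc(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def suggested_wait_seconds(
    limit_type: str, stated_seconds: Optional[float], now: datetime
) -> float:
    """Pick a wait based on which limit was hit (TPM, RPM, or TPD)."""
    if limit_type in ("TPM", "RPM"):
        if stated_seconds is None:
            raise ValueError("TPM/RPM errors should state a wait time")
        return stated_seconds  # rolling window: wait the stated seconds
    if limit_type == "TPD":
        # Assumes a midnight UTC reset, as described in this article.
        return seconds_until_midnight_utc(now)
    raise ValueError(f"unknown limit type: {limit_type}")
```

For a TPD error at 23:00 UTC this returns 3,600 seconds. Earlier in the day the wait is long enough that reducing payloads or upgrading the plan is usually the more practical response.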

How to Reduce Token Usage

The most effective way to avoid rate limit errors is to send less data to the AI in the first place. Standard Time® gives you several practical levers:

Four tip cards for reducing AI token usage: focus on one project, use exclusion flags, keep prompts short, upgrade your plan

The strategies below help you stay within rate limits. Each one can be applied independently or combined with the others for maximum savings.

1. Analyze one project at a time

When you click a single project row on the Projects page before opening AI Chat, Standard Time® sends only that project's data. Asking for a portfolio-wide summary forces all projects to be included in the payload, multiplying the token count. Narrow your focus to the project that needs attention right now.

2. Enable Exclusion Flags in stdata.AiModelSet

Exclusion Flags are the most powerful token-reduction tool available. See the Exclusion Flags section below for the complete reference.

3. Write shorter, focused prompts

Each word you type in the AI Chat window adds prompt tokens. Instead of writing "Please take a look at the project I have open and give me a detailed analysis of what is going wrong and what we might do to fix it," write: "Which tasks are overdue?" The AI already has the project context — you do not need to describe what you want in paragraph form.

4. Avoid re-asking questions already answered

Each message in the AI Chat session carries the full conversation history as context. In a 10-turn conversation, all ten earlier prompts and replies are resent as part of the prompt for message 11. Keeping sessions short, with a few focused questions rather than an open-ended conversation, keeps the compounding context from inflating token counts.
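
A little arithmetic shows how quickly conversation history compounds. The sketch below assumes every turn has the same prompt and reply size and that the project context is resent with each request; both are simplifying assumptions, and all figures are illustrative.

```python
def request_tokens(turn: int, context: int, prompt: int, reply: int) -> int:
    """Tokens consumed by request number `turn` (1-based), history included."""
    history = (turn - 1) * (prompt + reply)  # every earlier prompt and reply
    return context + history + prompt + reply


def session_tokens(turns: int, context: int, prompt: int, reply: int) -> int:
    """Total tokens consumed across a whole session of `turns` requests."""
    return sum(
        request_tokens(turn, context, prompt, reply)
        for turn in range(1, turns + 1)
    )


# Illustrative figures: 900-token project context, 20-token prompts,
# 200-token replies.
print(request_tokens(1, 900, 20, 200))    # 1120
print(request_tokens(11, 900, 20, 200))   # 3320
print(session_tokens(11, 900, 20, 200))   # 24420
```

Under these assumptions the eleventh message costs about three times as much as the first, and an 11-turn session uses roughly 24,000 tokens in total. Starting a fresh session resets the history term to zero.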

5. Upgrade your AI provider plan

If your team regularly hits limits even with the optimizations above, a paid tier is often the most practical fix. Groq Dev Tier, OpenAI Tier 2+, and Anthropic's higher plans all raise TPM and TPD substantially. Upgrade links:

Groq: console.groq.com/settings/billing
OpenAI: platform.openai.com/settings/organization/billing
Anthropic: console.anthropic.com/settings/plans

Exclusion Flags in stdata.AiModelSet

Standard Time® stores AI model configurations in the stdata.AiModelSet database table. Each row defines one AI model: the provider endpoint, the model name, the API key, and a set of Exclusion Flags that control exactly which data categories are stripped from the payload before the request is sent.

Exclusion Flags work by removing entire data categories from the project snapshot that Standard Time® assembles for the AI. Excluded fields are never serialized into the prompt — the AI simply does not receive them. This reduces payload size, token count, and the likelihood of hitting a rate limit, without requiring you to change how you use AI Chat.

Before and after diagram showing how Exclusion Flags reduce the AI payload from roughly 4200 tokens to 900 tokens by removing time logs, notes, completed tasks, and budget fields

Exclusion Flags remove entire data categories before the payload leaves Standard Time®. The AI never sees — and cannot accidentally expose — the excluded fields.
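
Conceptually, the stripping step works like the sketch below. The flag names are the ones listed under Available Exclusion Flags, but the snapshot layout, the key names, and the function are hypothetical, since the actual payload format Standard Time® uses is not documented here.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from each flag to the snapshot keys it removes.
FLAG_TO_KEYS: Dict[str, Set[str]] = {
    "ExcludeTimeLogs": {"time_logs"},
    "ExcludeNotes": {"notes"},
    "ExcludeAssignments": {"assignments"},
    "ExcludeBudget": {"budget", "cost_to_date", "billing_rate"},
    "ExcludeCustomFields": {"custom_fields"},
}
COMPLETED_STATUSES = {"Completed", "Closed"}


def apply_exclusion_flags(project: Dict[str, Any], flags: Set[str]) -> Dict[str, Any]:
    """Return a copy of the project snapshot with excluded categories removed."""
    drop: Set[str] = set()
    for flag in flags:
        drop |= FLAG_TO_KEYS.get(flag, set())

    tasks = project.get("tasks", [])
    if "ExcludeCompletedTasks" in flags:
        tasks = [t for t in tasks if t.get("status") not in COMPLETED_STATUSES]

    # Rebuild the snapshot without the dropped keys; the input is not modified.
    slim = {k: v for k, v in project.items() if k not in drop and k != "tasks"}
    slim["tasks"] = [{k: v for k, v in t.items() if k not in drop} for t in tasks]
    return slim
```

Because filtering happens before anything is serialized, the excluded categories never appear in the prompt text at all.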

Available Exclusion Flags

The flags below are set in the stdata.AiModelSet table for each configured AI model. Enable flags that correspond to data your AI Chat sessions do not need to see.

Flag	Data Excluded from Payload	Typical Token Savings	When to Use
`ExcludeTimeLogs`	All time log entries for every task — employee, start time, stop time, duration, notes	200 – 800 / project	When asking about project status, not labor hours
`ExcludeNotes`	Task notes and project-level notes fields	100 – 500 / project	When notes are informal and not needed for AI analysis
`ExcludeCompletedTasks`	Tasks whose status is Completed or Closed	300 – 2,000 / project	When focused on what's still in progress or overdue
`ExcludeAssignments`	Employee-to-task assignment records	50 – 200 / project	When asking about schedule only, not staffing
`ExcludeBudget`	Budget amount, cost-to-date, and billing rate fields	30 – 100 / project	When financial data is sensitive or not relevant to the question
`ExcludeCustomFields`	All custom field name/value pairs on tasks and projects	50 – 400 / project	When custom fields are internal codes the AI would not interpret usefully

Recommended starting combination: Enable ExcludeTimeLogs + ExcludeCompletedTasks together. These two flags alone typically cut payload size by 60–80% on active mid-project workloads without removing any data the AI needs for schedule or status analysis.
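
To compare flag combinations, you can add up the "Typical Token Savings" ranges from the table above. The sketch below does only that; the ranges are copied from the table, the function name is hypothetical, and actual savings depend on your project data.

```python
from typing import Dict, Iterable, Tuple

# (low, high) typical token savings per project, copied from the table above.
SAVINGS_RANGES: Dict[str, Tuple[int, int]] = {
    "ExcludeTimeLogs": (200, 800),
    "ExcludeNotes": (100, 500),
    "ExcludeCompletedTasks": (300, 2000),
    "ExcludeAssignments": (50, 200),
    "ExcludeBudget": (30, 100),
    "ExcludeCustomFields": (50, 400),
}


def estimated_savings(flags: Iterable[str]) -> Tuple[int, int]:
    """Sum the typical savings ranges for the given flags.

    Raises KeyError for a flag name that is not in the table.
    """
    low = high = 0
    for flag in set(flags):  # ignore duplicates
        flag_low, flag_high = SAVINGS_RANGES[flag]
        low += flag_low
        high += flag_high
    return low, high


print(estimated_savings(["ExcludeTimeLogs", "ExcludeCompletedTasks"]))  # (500, 2800)
```

For the recommended pair, the table's ranges add up to roughly 500 to 2,800 tokens saved per project.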

How to Configure Exclusion Flags

Exclusion Flags are set on the AI model configuration row in Standard Time®. To change them:

Open Standard Time® and go to Tools → Options → AI Models (or access stdata.AiModelSet directly if you are a database administrator).
Select the AI model configuration you want to modify.
In the Exclusion Flags section, check the categories you want to exclude from AI payloads.
Save the configuration. The new flags take effect on the next AI Chat request — no restart required.

Note: Exclusion Flags apply per model configuration, not per user. If your team uses two different AI models for different purposes, each can have its own flag set. For example, a scheduling model can exclude budget fields while a financial review model excludes time logs.

Privacy Benefit

Beyond token savings, Exclusion Flags are a data-minimization tool. Fields flagged as excluded are never serialized and never leave Standard Time®. This is useful when:

Budget or billing data is commercially sensitive and should not be transmitted to a third-party AI provider.
Employee time logs contain personally identifiable information subject to data governance policies.
Custom fields contain internal codes or serial numbers that have no analytical value for the AI but add token bulk.

Troubleshooting Common Errors

Error	Cause	Fix
`HTTP 429 — TPM exceeded`	Too many tokens sent/received in the last 60 seconds	Wait the stated seconds. Enable `ExcludeTimeLogs` and `ExcludeCompletedTasks` to reduce future payloads.
`HTTP 429 — RPM exceeded`	Too many API calls in the last 60 seconds	Wait the stated seconds. If the team shares one key, provision separate keys per department in `stdata.AiModelSet`.
`HTTP 429 — TPD exceeded`	Daily token cap fully consumed	Wait until midnight UTC, or upgrade your AI provider plan. Use Exclusion Flags to reduce daily consumption going forward.
AI gives incomplete answers	A flag is excluding data the AI needs	Disable one flag at a time until the AI has sufficient context. Start by disabling `ExcludeNotes` or `ExcludeAssignments`.
AI response cuts off mid-sentence	The reply hit the model's max token output limit	Ask a narrower question. Break large analyses into two or three focused prompts.
Errors persist after waiting	API key invalid, quota frozen, or billing issue	Log into your AI provider console and verify account standing and billing status.
