A hard cap on resource consumption per tenant: the multi-tenancy primitive that prevents one tenant from consuming the whole platform.
A quota is an enforced limit on how much of a shared resource a tenant can consume, such as API calls per minute, CPU hours per day, GB of storage, concurrent connections, or pods per namespace. Cloud platforms enforce quotas at multiple layers, including AWS service quotas, Kubernetes ResourceQuotas, per-user limits in relational databases, and application-level rate limits. The quota's job is to bound a noisy neighbor's blast radius.
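As a rough illustration of the definition above, here is a minimal sketch of an application-level quota on API calls per minute. The class name `FixedWindowQuota`, its parameters, and the tenant IDs are hypothetical, and the fixed-window counter is just one simple enforcement strategy chosen for brevity; it is not how any particular platform implements quotas.

```python
import time
from collections import defaultdict


class FixedWindowQuota:
    """Per-tenant request quota over fixed time windows (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # tenant_id -> [window_index, requests_counted_in_that_window]
        self._usage = defaultdict(lambda: [-1, 0])

    def allow(self, tenant_id):
        """Return True and count the request if the tenant is under its limit."""
        window = int(self.clock() // self.window_seconds)
        entry = self._usage[tenant_id]
        if entry[0] != window:
            # A new window has started: reset this tenant's counter only.
            entry[0], entry[1] = window, 0
        if entry[1] >= self.limit:
            return False
        entry[1] += 1
        return True


# A frozen clock keeps this demo deterministic (assumption for illustration).
quota = FixedWindowQuota(limit=3, clock=lambda: 0.0)
print([quota.allow("tenant-a") for _ in range(5)])  # [True, True, True, False, False]
print(quota.allow("tenant-b"))                      # True
```

Because each tenant has its own counter, `tenant-a` running into its limit has no effect on `tenant-b`. A production limiter would also need shared state across processes and a way to expire idle tenants, both of which this sketch leaves out.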
Without quotas, a multi-tenant system is one runaway tenant away from a Sev-1: a buggy script in one tenant's account exhausts a shared limit, and every other tenant is blocked along with it. Per-tenant quotas isolate the impact: the misbehaving tenant hits its own limit while the others keep running. Quota sizing is as much policy work as engineering work. Set quotas too tight and tenants complain; set them too loose and they protect no one.
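The sizing trade-off can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses made-up numbers (a shared capacity of 10,000 requests per minute and 50 tenants) and two hypothetical helper functions; it is only meant to show why a quota that is too loose stops protecting anyone.

```python
import math


def oversubscription_ratio(per_tenant_quota, tenant_count, shared_capacity):
    """Sum of all tenants' quotas divided by the capacity actually available."""
    return per_tenant_quota * tenant_count / shared_capacity


def tenants_to_exhaust(per_tenant_quota, shared_capacity):
    """Smallest number of tenants at full quota needed to use up all capacity."""
    return math.ceil(shared_capacity / per_tenant_quota)


CAPACITY = 10_000  # requests per minute, illustrative value
TENANTS = 50       # illustrative value

for quota in (200, 2_000, 10_000):
    ratio = oversubscription_ratio(quota, TENANTS, CAPACITY)
    needed = tenants_to_exhaust(quota, CAPACITY)
    print(f"quota={quota}: oversubscribed {ratio:.0f}x, "
          f"{needed} tenant(s) at full quota exhaust capacity")
```

With these numbers, a quota of 200 means all 50 tenants must max out together before capacity runs out, but each tenant gets little headroom. A quota of 10,000 lets a single tenant exhaust the platform, which is the unprotected case described above. Real sizing would also account for how tenants' usage peaks overlap, which this sketch ignores.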
See the part of the platform that handles quotas (resource quotas) in production.