Slowing or rejecting requests once a client exceeds a rate or quota, the enforcement mechanism behind every rate limit.
Throttling is the act of slowing down or rejecting requests when a client has exceeded a rate limit or quota. Common responses are HTTP 429 Too Many Requests (with a Retry-After header), gRPC RESOURCE_EXHAUSTED, AWS-style ProvisionedThroughputExceededException. Throttling differs from rate limiting in framing: rate limiting is the policy ('100 per minute'); throttling is the enforcement action ('this 101st request returns 429'). They go together in every API gateway and database.
Throttling is what makes rate limits real. A misimplemented throttle either fails open (limits don't actually enforce, abuse goes through) or fails brutal (legitimate clients get 429 with no Retry-After, retry tightly, and DDoS the limiter). Returning structured throttle responses with Retry-After headers, and pairing them with token-bucket smoothing, is the difference between a working rate limit and a footgun.
See the part of the platform that handles throttling in production.