To handle rate limiting for a multi-tenant Spring Boot Gen AI app with different usage quotas, you can use a Redis-based rate limiter with tenant-specific keys. Here is a code snippet you can refer to:


In the above code, we are using three things: tenant quotas, which customize the rate limit for each tenant through a Bucket4j Bandwidth; Bucket4j itself, an efficient in-memory rate-limiting library with Redis support for distributed setups; and a tenant-specific header (X-Tenant-ID), which identifies the caller so that the matching quota can be applied.
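Because each tenant gets its own bucket under a tenant-specific key, and the bucket state is kept in Redis rather than in one instance's memory, this approach can scale effectively across tenants with varying quotas.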
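
As a rough illustration of the per-tenant quota idea, here is a minimal sketch in plain Java. It does not use Bucket4j or Redis: a hand-written token bucket stands in for Bucket4j's bucket and bandwidth, and a `ConcurrentHashMap` stands in for Redis, so it only limits requests within a single JVM. The names `TenantRateLimiter` and `Quota`, the injected clock, and the choice to reject requests without a tenant ID are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Plain-Java stand-in for a Bucket4j + Redis setup: one token bucket per tenant,
// held in a local map (single JVM only). All names here are hypothetical.
class TenantRateLimiter {

    // Quota for one tenant: bucket capacity, plus tokens added per refill period.
    static final class Quota {
        final long capacity;
        final long refillTokens;
        final long refillPeriodNanos;

        Quota(long capacity, long refillTokens, long refillPeriodNanos) {
            this.capacity = capacity;
            this.refillTokens = refillTokens;
            this.refillPeriodNanos = refillPeriodNanos;
        }
    }

    private static final class Bucket {
        private final Quota quota;
        private long tokens;
        private long lastRefillNanos;

        Bucket(Quota quota, long nowNanos) {
            this.quota = quota;
            this.tokens = quota.capacity; // a new bucket starts full
            this.lastRefillNanos = nowNanos;
        }

        synchronized boolean tryConsume(long nowNanos) {
            // Add tokens for every whole refill period that has elapsed.
            long periods = (nowNanos - lastRefillNanos) / quota.refillPeriodNanos;
            if (periods > 0) {
                tokens = Math.min(quota.capacity, tokens + periods * quota.refillTokens);
                lastRefillNanos += periods * quota.refillPeriodNanos;
            }
            if (tokens <= 0) {
                return false; // quota exhausted for now
            }
            tokens--;
            return true;
        }
    }

    private final Map<String, Quota> quotas;   // per-tenant overrides
    private final Quota defaultQuota;          // used for tenants without an override
    private final LongSupplier clock;          // e.g. System::nanoTime
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    TenantRateLimiter(Map<String, Quota> quotas, Quota defaultQuota, LongSupplier clock) {
        this.quotas = quotas;
        this.defaultQuota = defaultQuota;
        this.clock = clock;
    }

    // tenantId would come from the X-Tenant-ID header; a missing ID is rejected.
    boolean tryConsume(String tenantId) {
        if (tenantId == null) {
            return false;
        }
        Bucket bucket = buckets.computeIfAbsent(tenantId,
                id -> new Bucket(quotas.getOrDefault(id, defaultQuota), clock.getAsLong()));
        return bucket.tryConsume(clock.getAsLong());
    }
}
```

In a Spring Boot app, something like this would typically be called from a servlet filter or `HandlerInterceptor` that reads `X-Tenant-ID` and answers with HTTP 429 when `tryConsume` returns false. For a multi-instance deployment, the local map would need to be replaced by shared state, which is the role Redis plays in the approach described above.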