Blind signature based rate limiting tokens, or their keyed verification analogues (e.g. Privacy Pass), can be used to rate limit requests, but present challenges with regard to stockpiling and interaction requirements (credential requests can be batched and done ahead of time subject to anti-stockpiling mitigations, but issuance is still fundamentally O(N) in the number of requests).
The somewhat obvious idea presented here (probably not novel, but I couldn't find a description) uses the unlinkable multi-show property of anonymous credentials to construct token bucket filters with a one-time setup, permitting non-interactive self-issuance of usage tokens whose honest usage is anonymous (tokens derived from a single credential are indistinguishable from tokens of different credentials).
A client wishes to make repeated anonymous requests to a rate limited server.
The client is in possession of a scarce resource (e.g. a means of payment, proof of work, possession of a Bitcoin UTXO with non-negligible coindays, ...) and uses it to obtain a long-lived Signal keyed verification anonymous credential[^1] (KVAC) from the server with a single hidden attribute containing randomness $k$ (a Pedersen commitment $C$ to 0, so just $kH$).
The server responds to a credential request with an algebraic MAC that covers $C$.
In the Signal KVAC scheme's credential presentation the client re-randomizes the commitment $C$ using another blinding term, submitting $C' = C + r' H'$ to the server (this is the basis for unlinkable multi-show), and proves knowledge of the MAC covering $C$ such that the server can verify the integrity of $C'$ without learning which $C$ it corresponds to.
For rate limited multi-show, the client computes a hash-to-curve point $H_t = HashToCurve(t)$ where $t$ is a clock/counter proceeding at the maximum per-client request rate, $r$. This is analogous to the rate of a token bucket filter. Every request is accompanied by the explicit value $t$, a nullifier $S = k H_t$, and a DLEQ proof that the $k$ in the revealed nullifier $S$ is the same as the value committed in the re-randomized commitment $C'$. The server rejects any request whose nullifier has been seen before. Each credential presentation can additionally be covered by a proof of work whose computational cost dominates the verification cost of the presentation, to ensure that the cryptographic cost of this rate limiting token isn't a DoS concern on its own.
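To make the presentation step concrete, here is a minimal sketch of the nullifier and the equality-of-discrete-log proof over a deliberately toy group (a 10-bit safe prime, written multiplicatively). Everything below, including the parameter choices and names, is an illustrative assumption rather than an existing implementation, and the algebraic MAC check on $C'$ is omitted; a real deployment would use a proper prime-order group such as ristretto255.

```python
# Toy, insecure parameters for illustration only.
import hashlib
import secrets

P = 1019   # safe prime, P = 2*Q + 1 (10 bits, NOT secure)
Q = 509    # order of the quadratic-residue subgroup we work in

def hash_to_group(label: bytes) -> int:
    """Hash to a non-trivial element of the order-Q subgroup by squaring mod P."""
    ctr = 0
    while True:
        h = int.from_bytes(hashlib.sha256(label + bytes([ctr])).digest(), "big")
        g = pow(h % P, 2, P)
        if g not in (0, 1):
            return g
        ctr += 1

def hash_to_scalar(*parts: bytes) -> int:
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return int.from_bytes(h.digest(), "big") % Q

def enc(x: int) -> bytes:
    return x.to_bytes(8, "big")

# Independent generators (H and H' in the text).
H  = hash_to_group(b"generator H")
Hp = hash_to_group(b"generator H'")

# Issuance-time secret: hidden attribute k, commitment C = kH (here H^k).
k = secrets.randbelow(Q)
C = pow(H, k, P)

# Presentation: re-randomize C with blinding r', derive the nullifier for slot t.
r_prime = secrets.randbelow(Q)
C_re = (C * pow(Hp, r_prime, P)) % P       # C' = C + r'H' in additive notation
t = 123456                                 # overt time slot accompanying the request
H_t = hash_to_group(b"slot|" + enc(t))
S = pow(H_t, k, P)                         # nullifier S = k * H_t

# Sigma proof that the same k is used in S (base H_t) and in C' (bases H, H').
a, b = secrets.randbelow(Q), secrets.randbelow(Q)
T1 = pow(H_t, a, P)
T2 = (pow(H, a, P) * pow(Hp, b, P)) % P
c  = hash_to_scalar(enc(H), enc(Hp), enc(H_t), enc(S), enc(C_re), enc(T1), enc(T2))
z1 = (a + c * k) % Q
z2 = (b + c * r_prime) % Q

# Verifier side: recompute c from the transcript (reused here for brevity) and check
# both relations; the server would additionally verify the MAC on C', not shown.
lhs1 = pow(H_t, z1, P) == (T1 * pow(S, c, P)) % P
lhs2 = (pow(H, z1, P) * pow(Hp, z2, P)) % P == (T2 * pow(C_re, c, P)) % P
print("proof verifies:", lhs1 and lhs2)
```

The server records $S$ and rejects repeats; under DDH, nullifiers for distinct $t$ values are unlinkable across requests even though they share the same $k$.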
The server also publishes a time window $\delta$ constraining the $t$ value of requests. Given a value of $t$, and the reference $t_{\mathrm{server}}$ which the server computes as a function of time, a request is valid if $t + \delta \geq t_{\mathrm{server}}$ (and, implicitly, $t \leq t_{\mathrm{server}}$, so that a client cannot draw down arbitrarily many future values of $t$ in a burst). The window is analogous to the bucket size (burst capacity) of a token bucket filter.
Furthermore, expired values of $t$ can be accepted so long as the actual aggregate request rate is sufficiently low (effectively a demand-adjusted, unbounded grace period).
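A sketch of the server-side acceptance rule implied by the window and grace period, assuming the presentation proof and MAC checks have already passed; the class name, slot granularity, and grace handling are assumptions for illustration.

```python
import time

class RateLimitVerifier:
    """Token-bucket style acceptance of (t, nullifier) pairs; illustrative sketch."""

    def __init__(self, rate_per_second: float, delta_slots: int, grace_slots: int = 0):
        self.rate = rate_per_second   # r: slots advance at the max per-client rate
        self.delta = delta_slots      # delta: window size, in slots
        self.grace = grace_slots      # optional (demand-adjusted) grace period
        self.seen: dict[int, set[bytes]] = {}  # t -> nullifiers observed for that slot

    def t_server(self) -> int:
        return int(time.time() * self.rate)

    def accept(self, t: int, nullifier: bytes) -> bool:
        now = self.t_server()
        if t > now:
            return False              # no future slots
        if t + self.delta + self.grace < now:
            return False              # too old: outside the window plus grace
        slot = self.seen.setdefault(t, set())
        if nullifier in slot:
            return False              # duplicate nullifier: rate limit exceeded
        slot.add(nullifier)
        return True
```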
Clients can then sample values of $t$ from the window randomly (avoiding repetition, since a reused $t$ reproduces the same nullifier $S$, which both links the requests and gets rejected as a duplicate).
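A corresponding client-side sampler might look like the following hypothetical helper; the window arithmetic mirrors the server-side rule above.

```python
import secrets

def sample_slot(t_server: int, delta: int, used: set[int]) -> int | None:
    """Pick an unused t uniformly from the current window; None if exhausted."""
    candidates = [t for t in range(t_server - delta, t_server + 1) if t not in used]
    if not candidates:
        return None   # the ~r*delta per-credential budget for this window is spent
    t = secrets.choice(candidates)
    used.add(t)       # never reuse a slot: it would reproduce the same nullifier S
    return t
```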
Within a time window corresponding to $\delta$ a client can only produce as many distinct nullifiers associated with their long-lived credential as there are distinct values of $t$, namely $r \delta$. This means the number of credentials issued, $n$, directly controls the maximum aggregate request capacity, which is given by the product $n r \delta$[^2] requests per window (a sustained aggregate rate of $n r$).
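As a concrete (purely illustrative) parameterization: with $r = 1$ request per minute and $\delta = 60$ minutes, each credential admits at most $r\delta = 60$ distinct values of $t$ (hence nullifiers) per window, so issuing $n = 100$ credentials caps aggregate usage at $n r \delta = 6000$ requests per window, i.e. a sustained aggregate rate of $n r = 100$ requests per minute.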
This follows from unlinkable multi-show and the DDH assumption ($S$ looks like a random point even though $k$ is fixed, and $k$ is part of a re-randomized commitment covered by the algebraic MAC).
Randomly sampling $t$ (without replacement when windows overlap) adds noise to the client's clock, which otherwise leaks identifying information (e.g. clock skew). For low values of $\delta$, privacy suffers as the actual request rate approaches $r$, since the overt time values of $t$ must be disjoint to avoid duplicate values of $S$. This is subject to the birthday bound: in other words, the actual sustained rate should be $\lessapprox \sqrt{r\delta}$ requests in a given window of size $\delta$ without prior knowledge of the base rate. This is because if there is only a single client, the probability of a collision in $t$ values increases rapidly beyond $\sqrt{r\delta}$ draws, so an observably higher effective rate without collisions strongly suggests that only a single client is operating. If the client knows (or assumes) that the server is already observing collisions in the values of $t$, then its actual request rate can approach $r$ while maintaining (the assumption of) plausible deniability.
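The birthday heuristic is easy to check numerically: the snippet below (illustrative numbers only) computes the probability that $m$ independent, uniform draws from the $r\delta$ slots of a window contain at least one repeat, which is roughly what the server would expect from a population of independent clients; collision-free traffic well beyond $\sqrt{r\delta}$ draws is therefore suspicious.

```python
def collision_probability(slots: int, m: int) -> float:
    """P[at least one repeated slot] for m independent uniform draws from `slots`."""
    p_unique = 1.0
    for i in range(m):
        p_unique *= (slots - i) / slots
    return 1.0 - p_unique

r_delta = 3600   # e.g. r = 1 slot/second, delta = 1 hour; sqrt(r*delta) = 60
for m in (30, 60, 120, 240):
    print(f"m={m:4d}  P[collision] ~ {collision_probability(r_delta, m):.2f}")
```

With these numbers the collision probability is already about 0.39 at $m = 60$ draws and about 0.86 at $m = 120$, which is the sense in which a single client staying collision-free much beyond $\sqrt{r\delta}$ stands out.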
For efficient expiry of credentials, these reusable credentials may be issued in short-lived epochs, in exchange for presentation of long-lived credentials that encode an expiry time using range proofs, allowing credentials to expire without requiring range proofs in the rate limiting credential presentation. This is appropriate if the underlying scarcity used to rate limit issuance of long-lived credentials is too coarse for the desired time to live, but range proofs are too computationally costly for individual requests.
In either case, nullifier storage remains bounded (effectively O(1) with respect to time) because the server can prune nullifiers associated with values of $t$ far enough in the past that they can no longer be accepted (even with a very generous grace period).
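A pruning sketch, using the same {t: nullifier-set} layout as the verifier sketch above: once a slot falls outside the acceptance window (plus any grace period) its nullifiers can never be accepted again and can be dropped, bounding storage to roughly $n$ nullifiers per retained slot.

```python
def prune(seen: dict[int, set[bytes]], t_server: int, delta: int, grace: int = 0) -> None:
    """Drop nullifier sets for slots too old to ever be accepted again."""
    cutoff = t_server - delta - grace
    for t in [t for t in seen if t < cutoff]:
        del seen[t]
```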
Main differences from Privacy Pass or any other token in the form of a blind signature (or signature analogue) covering a random message used as a nullifier: