In Provision, the CDN broker starts creating a CloudFront distribution. The broker then relies on Cloud Foundry calling LastOperation regularly and performs further work during those calls. Once the CloudFront distribution is available, that work is extensive.
LastOperation is supposed to respond very quickly rather than doing substantial work. Jobs on the cf.cc.job_queue job queue call LastOperation and block while waiting for the response, so slow responses affect the entire queue. This queue matters because it also processes asynchronous operations for the cf CLI.
Each LastOperation call made while provisioning works through these steps:
- Find out whether the CloudFront distribution is provisioned.
- If it is not provisioned, exit and respond that the service is not yet provisioned.
- Iterate over every domain's DNS and HTTP challenge:
  - Request that challenge and check whether it is in place.
  - If the challenge is not in place, exit and respond that the service is not yet provisioned.
  - Notify LetsEncrypt to start requesting the challenge that we found in place.
  - If the status is `valid`, move on to the next challenge.
  - If the status is `invalid`, exit and respond that the service is not yet provisioned.
  - If the status is `pending`:
    - Sleep for the returned `Retry-After` duration. This may back off exponentially: 15s, 30s, 60s, ...
    - Loop indefinitely, asking for the `status` and sleeping for the `Retry-After` duration.
- Have LetsEncrypt generate the certificate.
- Upload the certificate to IAM on AWS.
- Set the certificate to be used by the relevant CloudFront distribution on AWS.
- Exit and respond from LastOperation: provisioned successfully.
Cloud Foundry performs LastOperation on the same cf.cc.job_queue job queue as asynchronous tasks from the cf CLI. The job runs synchronously, so a long-running LastOperation reduces the throughput of this queue. This delays user actions, including automated ones such as smoke tests.
The CDN broker's LastOperation endpoint is blocking. It does significant work that may take tens of seconds, minutes or even longer:
- Iterating over each challenge is time-consuming. A single CDN service can have as many as 100 domains, which means a lot of work on every `LastOperation` call while `Provisioning`:
  - We will perform up to 100 HTTP requests and 100 DNS lookups until all domains have appropriate `TXT` records.
  - We will wait for up to 100 LetsEncrypt verifications, and the `Retry-After` values suggest that each verification may take at least 15 seconds.
- We do not cache which domains have been verified. Caching would help when creating a few services with many domains at once.