Instead of giving it a bigger machine to make it handle more load (vertically scaling), if we could give it N more machines to make it handle more load, that would scale further.
When the system needs scaled, it should dynamically scale, without developer intervention, within bounds, to be performant within the SLAs of the system.
- communications with the service are over ssl, with a not-hand-rolled authentication layer
when computers/workers turn on and off, no work should be lost, and new computers/workers should be able to continue where the old instance left off.
OpenTracing (or something similar) so we can isolate the logs and metrics to single calls/checks that are flowing through the system between services.
json structured logging
use opentracing jaegertracing?
- https://www.amazon.ca/Clean-Architecture-Craftsmans-Software-Structure/dp/0134494164/ref=pd_bxgy_img_1/130-3523801-3796032?pd_rd_w=bk9jA&pf_rd_p=8c482a45-7c0f-409b-937c-741a67b11a67&pf_rd_r=FXN8W7HDC60FJZQRY7Y9&pd_rd_r=c7660edc-2cb6-49a6-808e-a0a044c0c17b&pd_rd_wg=L8WsK&pd_rd_i=0134494164&psc=1
- https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
- https://www.cncf.io/
- https://www.youtube.com/watch?v=5ZjhNTM8XU8