Wanted to chime in today with some case studies I’ve run on my end and share the findings. Hopefully someone will find some value in them. @everyone
Previously I’ve brought up a client of mine where I was using Elasticsearch along with direct RETS real estate data. Elasticsearch as the search engine has indeed reduced the resources each DB call would have taken up on the front end. The next spikes to tackle were those on the back end. Since real estate data is big data, and I do mean BIG DATA, it was no easy feat. Luckily there are tools to help.
To help alleviate my backend DB writes, since I cannot omit requests, I used Redis (find more on Redis here: http://redis.io/) to queue jobs that sit in Redis and persist until they are able to run, since communication between programs can be independent of time. This also decreased wait times for processes. (For example, if I run a pull-properties command in the CLI, I would normally need to wait until the process is done before closing the terminal, else the entire process would terminate. Since queued jobs are saved to memory along with their data and the command to be run against that data, the process carries on and finishes regardless.) On the front end, for UX purposes, imagine a user fills out a large form and submits it, and you have a very intricate algorithm to handle that data with an average wait of approximately 5 seconds. If you instead turn around and send the submitted data into a job queued to Redis, the request completes immediately (perceived wait time) and notifies the user their data has been successfully received, while in the background your job is locked and loaded, processing the data as it should.
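The submit-then-queue flow above can be sketched in a few lines. Here's a minimal Python version of the idea (I use Laravel in practice; the `make_job`/`enqueue` names and the job shape here are my own illustration, assuming a redis-py style client):

```python
import json
import time

def make_job(handler, payload):
    # Package the job as JSON so it can sit in a Redis list
    # until a worker is free to pick it up.
    return json.dumps({
        "handler": handler,        # name of the routine the worker should run
        "payload": payload,        # e.g. the submitted form data
        "queued_at": time.time(),
    })

def enqueue(redis_conn, queue, handler, payload):
    # Push and return immediately -- the user never sees the 5-second wait.
    redis_conn.lpush(queue, make_job(handler, payload))

# A worker process elsewhere would block on the same list, e.g.:
#   _, raw = redis_conn.brpop("jobs")
#   job = json.loads(raw)
```

Because the job and its data live in Redis, the CLI session that queued it can close and the worker still finishes the work.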
The largest benefit I found from Redis, with scheduling via cron jobs, was the CPU reduction. It is very unlikely that CPU becomes an issue with Redis, as Redis is usually either memory or network bound. For instance, Redis on an average Linux system can deliver around 500k requests per second, so your application's commands will hardly max out the CPU. However, to make use of more CPU you can start multiple instances of Redis on the same box and treat them as different servers. While this is great on a single machine, as you grow you may want to use multiple CPUs and hosts. (http://redis.io/topics/partitioning)
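Client-side partitioning across those instances can be as simple as hashing each key to pick an instance. A rough Python sketch (the hash choice and `shard_for` name are mine, not from the Redis docs):

```python
import zlib

def shard_for(key, instances):
    # Hash the key and map it onto one of the running Redis instances;
    # the same key always lands on the same instance.
    return instances[zlib.crc32(key.encode()) % len(instances)]

# Three Redis processes on one box, each on its own port,
# together using more than one core.
instances = ["127.0.0.1:6379", "127.0.0.1:6380", "127.0.0.1:6381"]
```

Note that plain modulo hashing reshuffles most keys whenever you add or remove an instance; consistent hashing (covered in the partitioning link above) avoids that.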
Say your job fails for any reason: the job will persist and retry until it completes successfully, or else ultimately fail and, if configured properly, be saved into a failed-jobs DB table. Unfortunately the failed job usually does not include the failure reason, so I tend to check the failed timestamp against my logs.
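That retry-then-fail behavior looks roughly like this (all names here are illustrative — a real queue library handles this loop for you):

```python
import time

def run_with_retries(job, handler, tries, failed_jobs):
    # Keep retrying the handler; after `tries` failures, record the job
    # with a timestamp so it can be matched against the application logs,
    # since the failure reason itself usually isn't stored.
    for attempt in range(1, tries + 1):
        try:
            return handler(job)
        except Exception:
            if attempt == tries:
                failed_jobs.append({"job": job, "failed_at": time.time()})
    return None
```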
Another great aspect is that the work can be carried out by small, self-contained programs, giving you cleaner code and a much easier way to test your apps. A side benefit of having smaller self-contained programs/functions to work with is the ability to reduce redundant code throughout. Good practice … just saying.
The next great thing I found useful is that Redis can also be used as a session cache. Which means global variable fun!! Often, usually with user data, I find myself constantly calling \App\User::find($userId); to pull in the user's data, which means ONE MORE DB request slowing me down. Now, instead of hitting the DB every time, once the user logs in I can immediately save \Auth::user()->toArray() directly into Redis under a per-user key. Since I use Laravel, whenever I need my global user I can just run Redis::get('user:'.$userId); and all the data I saved at login is accessible. The fewer DB queries the better. You can almost think of Redis in this manner like MongoDB … almost.
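The pattern here is plain cache-aside. A small Python sketch with a fake in-memory Redis stand-in so it runs anywhere (`FakeRedis`, `cached_user`, and `db_lookup` are all my own names):

```python
import json

class FakeRedis:
    # In-memory stand-in exposing the same get/set calls a Redis client would.
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

def cached_user(redis_conn, db_lookup, user_id):
    # Check Redis first; only fall back to the database on a miss,
    # then cache the result under a per-user key.
    key = "user:{}".format(user_id)
    cached = redis_conn.get(key)
    if cached is not None:
        return json.loads(cached)
    user = db_lookup(user_id)      # the expensive DB query
    redis_conn.set(key, json.dumps(user))
    return user
```

Keying by user id matters: a single shared key would hand one user's cached data to everyone.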
Since Redis queues are persistent, if your job keeps failing your log WILL grow ridiculously large in a small amount of time. To get around this, in my case I had the option to add the --tries x parameter to the queue worker; after x attempts it gives up and saves the job to my failed table. Beyond sound code, I have not been able to find anything else to suppress the large log files. If any DevOps folks out there have pointers, that would be very helpful.
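With Laravel's queue worker that looks like this (the value 3 is just an example):

```shell
# After 3 failed attempts the worker gives up and writes the job to the
# failed-jobs table; inspect it later with `php artisan queue:failed`.
php artisan queue:work --tries=3
```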
Each data store has its own pros and cons. Here are some alternatives to take a look at.
- http://redis.io/
- https://aws.amazon.com/sqs/
- http://kr.github.io/beanstalkd/
- https://www.iron.io/
- https://www.rabbitmq.com/
Priority queuing support is another feature worth comparing between them.
What do you think? Any advice to reduce server load even further?