# Scaling a Rails App
## Learning Objectives
- N+1 queries
### N+1 queries

ActiveRecord has no way to know that, for example, a child record is being requested for each of a set of parent records, so it will issue one child-record query per parent record. Because of per-query overhead, this behavior can cause significant performance problems. For example, how many database queries will this code run?
```erb
<% @posts = Post.all %>
<% @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <p><%= p.body %></p>
<% end %>
```
One query to fetch all the posts, and then one query per post to fetch its category.
We can fix this using something called eager loading, which means that Rails will automatically perform the necessary queries up front. Rails will either use a JOIN SQL statement or run a small number of batched queries. Either way, assuming that you specify all the children you are going to use, it will never result in an N+1 situation, where each iteration of a loop produces an additional query.
The same example can be rewritten like this:

```erb
<% @posts = Post.includes(:category) %>
<% @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <p><%= p.body %></p>
<% end %>
```
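The query-count difference can be sketched without a database at all. Here is a minimal pure-Ruby simulation (all class and method names here are hypothetical, invented for illustration) that counts one "query" per data access:

```ruby
# Minimal simulation of N+1 vs. eager loading (no database; names hypothetical).
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  def select_all_posts
    @query_count += 1
    (1..5).map { |i| { id: i, category_id: i } }
  end

  def select_category(id)
    @query_count += 1          # one query per post -- this is the N in N+1
    { id: id, name: "cat-#{id}" }
  end

  def select_categories(ids)
    @query_count += 1          # one batched query, as includes(:category) would do
    ids.map { |id| { id: id, name: "cat-#{id}" } }
  end
end

# N+1 style: 1 query for the posts plus 1 per post.
db = FakeDB.new
posts = db.select_all_posts
posts.each { |p| db.select_category(p[:category_id]) }
puts db.query_count            # => 6 (1 + 5)

# Eager style: 2 queries total, regardless of how many posts there are.
db = FakeDB.new
posts = db.select_all_posts
db.select_categories(posts.map { |p| p[:category_id] })
puts db.query_count            # => 2
```

The point is that the eager version's query count is constant, while the N+1 version's grows with the number of rows.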
### This can make a huge difference in performance
Open up `dbtest.rb` (included in this repo). We need to do two things:

- Create the database "test":

```shell
$ createdb test
```

- Adjust the connection information to match yours. To figure out your username, run `psql` and look at the far left of the prompt. Mine looks like `robertwilkinson=#`, meaning my username is `robertwilkinson`.
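For reference, the connection setup inside `dbtest.rb` will look something like the sketch below (the host, database, and username values are assumptions; substitute whatever your `psql` prompt showed):

```ruby
require 'active_record'

# Assumed connection settings -- replace username with your own.
ActiveRecord::Base.establish_connection(
  adapter:  'postgresql',
  host:     'localhost',
  database: 'test',
  username: 'robertwilkinson'  # whatever appears at the far left of your psql prompt
)
```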
If we go through this code to the bottom, we will see the two approaches shown side by side.
```ruby
@items = Item.limit(size)
@items.each do |i|
  i.category
end
```
And with `includes`:

```ruby
@items = Item.includes(:category).limit(size)
@items.each do |i|
  i.category
end
```
Run this and see what the difference is! It should be substantial.
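If you want to put numbers on the difference yourself, Ruby's standard `Benchmark` module is a simple way to time each loop. A minimal sketch (the two lambdas here are stand-ins for the real query loops in `dbtest.rb`):

```ruby
require 'benchmark'

# Stand-in workloads; in dbtest.rb these would be the N+1 loop and the
# includes(:category) loop, respectively.
slow = -> { 1000.times { |i| i.to_s } }
fast = -> { 10.times { |i| i.to_s } }

slow_time = Benchmark.realtime { slow.call }
fast_time = Benchmark.realtime { fast.call }
puts format('slow: %.6fs, fast: %.6fs', slow_time, fast_time)
```

`Benchmark.realtime` returns the elapsed wall-clock seconds as a Float, which makes it easy to compare the two approaches.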
### Applying this on your own

Add the bullet gem (https://github.com/flyerhzm/bullet) to your project, then go change your relation queries to run faster using the `includes` syntax!
### Denormalization

Denormalization is the process of making your reporting logic faster and easier at the expense of making your persistence and OLTP logic slower and harder.
Eventually, if your database grows very large, your reports will get slow because of all the joins you have to do. This is fine. At that point, and no sooner, start considering a separate, denormalized reporting database that you update hourly, nightly, or weekly from the normalized database. Then move your reporting logic to query the reporting database without having to do joins. There is no need to start off this way, however; you would just be incurring extra complexity and expense without being certain of the payoff. Maybe your reporting SQL with joins will work indefinitely without denormalization, given good indexes. Don't prematurely optimize.
What does denormalization look like?
The idea is to pull your 1-1 relationships out of separate tables and stick them into a single denormalized table.
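As a concrete sketch, a denormalized reporting table might be created with a migration like the one below (the table and column names are hypothetical, chosen to match the post/category example above). Instead of joining posts to a separate categories table at report time, the category name is copied onto each row whenever the reporting table is refreshed:

```ruby
# Hypothetical migration for a denormalized reporting table (schema sketch).
class CreateReportPosts < ActiveRecord::Migration[7.0]
  def change
    create_table :report_posts do |t|
      t.string :title
      t.text   :body
      t.string :category_name  # denormalized: copied from categories.name at refresh time
      t.timestamps
    end
  end
end
```

Reports can then read `category_name` directly from `report_posts` with no join, at the cost of keeping that copied column in sync with the source data.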