metaskills · November 12, 2008 03:19
diff --git a/gistfile1.rbx b/gistfile1.rbx
 # Debugging limiting association joins for SQL Server Adapter.

 Giving the following limiting SQL that would be normaly be generatd for the 
 test_eager_load_has_many_through_has_many(AssociationsJoinModelTest) case in 
 ActiveRecord for SQL Server. In this case the ORDER BY is totally moot.

  SELECT DISTINCT TOP 1 [authors].id 
  FROM [authors]  
    LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])  
    LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id]) 
  WHERE (name = 'David')  ORDER BY comments.id

 Note that this SQL is generated by #construct_finder_sql_for_association_limiting 
 method. We can take two routes here. First we can follow the PostgreSQL adapter 
 again and resort to adding a connection #distinct and #add_order_by_for_association_limiting!
 method that would take the the entire SQL into a sub select and use aliasing. In IRB
 it would looks something like this for #distinct
  
  # Arg Values
  columns = '[authors].id'
  order_by = 'comments.id'
  # Execution And Return
  >> order_columns = order_by.split(',').collect { |s| s.split.first }
  => ["comments.id"]
  >> order_columns.delete_if &:blank?
  => ["comments.id"]
  >> order_columns = order_columns.zip((0...order_columns.size).to_a).map { |s,i| "#{s} AS alias_#{i}" }
  => ["comments.id AS alias_0"]
  >> sql = "DISTINCT ON (#{columns}) #{columns}, "
  => "DISTINCT ON ([authors].id) [authors].id, "
  >> sql << order_columns * ', '
  => "DISTINCT ON ([authors].id) [authors].id, comments.id AS alias_0"

 Now for PostgreSQLs #add_order_by_for_association_limiting! method.

  # Arg Values
  sql = %|SELECT DISTINCT TOP 1 [authors].id 
  FROM [authors]  
    LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])  
    LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id]) 
  WHERE (name = 'David')|
  options = {:order => 'comments.id'}
  # Execution And Return
  >> order = options[:order].split(',').collect { |s| s.strip }.reject(&:blank?)
  => ["comments.id"]
  >> order.map! { |s| 'DESC' if s =~ /\bdesc$/i }
  => [nil]
  >> order.zip((0...order.size).to_a)
  => [[nil, 0]]
  >> order = order.zip((0...order.size).to_a).map { |s,i| "id_list.alias_#{i} #{s}" }.join(', ')
  => "id_list.alias_0 "

 Now minus the fact that there is no DISTINCT ON in SQL Server, the translated SQL 
 would look like this and is quite valid. Despite that two columns are being returned 
 for the select_all, rails will only look for the column that matches the primary_key.
 So the alias_n will doubtfully be an issue.

  # Before
  SELECT DISTINCT TOP 1 [authors].id 
  FROM [authors]  
    LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])  
    LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id]) 
  WHERE (name = 'David')  ORDER BY comments.id
  
  # Adapter Implementation
  def distinct(columns, order_by)
    return "DISTINCT #{columns}" if order_by.blank?
    order_columns = order_by.split(',').collect { |s| s.split.first }
    order_columns.delete_if(&:blank?)
    order_columns = order_columns.zip((0...order_columns.size).to_a).map{ |s,i| "#{s} AS alias_#{i}" }.join(', ')
    "DISTINCT #{columns}, #{order_columns}"
  end
  def add_order_by_for_association_limiting!(sql, options)
    return sql if options[:order].blank?
    order = options[:order].split(',').collect { |s| s.strip }.reject(&:blank?)
    order.map! { |s| 'DESC' if s =~ /\bdesc$/i }
    order = order.zip((0...order.size).to_a).map { |s,i| "id_list.alias_#{i} #{s}" }.join(', ')
    sql.replace "SELECT * FROM (#{sql}) AS id_list ORDER BY #{order}"
  end
  
  # After
  SELECT * FROM (
    SELECT DISTINCT TOP 1 [authors].id, comments.id AS alias_0
    FROM [authors]  
      LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])  
      LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id]) 
    WHERE (name = 'David')
  ) AS id_list ORDER BY id_list.alias_0

 However there is one fatal flaw in this implementation. If the ordered by column 
 is not unique for the DISTINCT column in the catesian product, un-unique values 
 will be returned. For instance #test_select_limited_ids_list in HasAndBelongsToManyAssociationsTest
 will fail with <["1", "2"]> expected but was <["1", "1", "1", "2"]>. There are two 
 possible recourses.

 First the fact of the matter is that the order by is mostly moot in the limiting 
 SQL, as far as I can tell from the tests. So this implementation would work just 
 fine.
  
  def add_order_by_for_association_limiting!(sql, options)
    sql
  end

 However instead of this, we should make an attempt like the PostgreSQL adapter to
 maintain ordering for distinct values. Based on that goal the following URL is an 
 excellent article on our implementation:
 http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx

  # Adapter Implementation
  def add_order_by_for_association_limiting!(sql, options)
    return sql if options[:order].blank?
    columns = sql.match(/SELECT\s+DISTINCT(.*)FROM/)[1].strip
    sql.sub!(/SELECT\s+DISTINCT/,'SELECT')
    sql << "GROUP BY #{columns} ORDER BY #{order_to_min_set(options[:order])}"
  end
  def order_to_min_set(order)
    dir = ''
    orders = order.split(',').map(&:strip).reject(&:blank?)
    mins = orders.map do |o|
      if o =~ /\b(asc|desc)$/i
        dir = $1
        o = o.sub($1,'').strip
      end
      "MIN(#{o})"
    end
    "#{mins.join(', ')} #{dir}".strip
  end
  
  # Working SQL
  SELECT TOP 1 [authors].id
  FROM [authors]  
    LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])  
    LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id]) 
  WHERE (name = 'David')
  GROUP BY [authors].id
  ORDER BY MIN(comments.id), MIN(comments.post_id)
	# Debugging limiting association joins for SQL Server Adapter.

	Giving the following limiting SQL that would be normaly be generatd for the
	test_eager_load_has_many_through_has_many(AssociationsJoinModelTest) case in
	ActiveRecord for SQL Server. In this case the ORDER BY is totally moot.

	SELECT DISTINCT TOP 1 [authors].id
	FROM [authors]
	LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])
	LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id])
	WHERE (name = 'David') ORDER BY comments.id

	Note that this SQL is generated by #construct_finder_sql_for_association_limiting
	method. We can take two routes here. First we can follow the PostgreSQL adapter
	again and resort to adding a connection #distinct and #add_order_by_for_association_limiting!
	method that would take the the entire SQL into a sub select and use aliasing. In IRB
	it would looks something like this for #distinct

	# Arg Values
	columns = '[authors].id'
	order_by = 'comments.id'
	# Execution And Return
	>> order_columns = order_by.split(',').collect { \|s\| s.split.first }
	=> ["comments.id"]
	>> order_columns.delete_if &:blank?
	=> ["comments.id"]
	>> order_columns = order_columns.zip((0...order_columns.size).to_a).map { \|s,i\| "#{s} AS alias_#{i}" }
	=> ["comments.id AS alias_0"]
	>> sql = "DISTINCT ON (#{columns}) #{columns}, "
	=> "DISTINCT ON ([authors].id) [authors].id, "
	>> sql << order_columns * ', '
	=> "DISTINCT ON ([authors].id) [authors].id, comments.id AS alias_0"

	Now for PostgreSQLs #add_order_by_for_association_limiting! method.

	# Arg Values
	sql = %\|SELECT DISTINCT TOP 1 [authors].id
	FROM [authors]
	LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])
	LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id])
	WHERE (name = 'David')\|
	options = {:order => 'comments.id'}
	# Execution And Return
	>> order = options[:order].split(',').collect { \|s\| s.strip }.reject(&:blank?)
	=> ["comments.id"]
	>> order.map! { \|s\| 'DESC' if s =~ /\bdesc$/i }
	=> [nil]
	>> order.zip((0...order.size).to_a)
	=> [[nil, 0]]
	>> order = order.zip((0...order.size).to_a).map { \|s,i\| "id_list.alias_#{i} #{s}" }.join(', ')
	=> "id_list.alias_0 "

	Now minus the fact that there is no DISTINCT ON in SQL Server, the translated SQL
	would look like this and is quite valid. Despite that two columns are being returned
	for the select_all, rails will only look for the column that matches the primary_key.
	So the alias_n will doubtfully be an issue.

	# Before
	SELECT DISTINCT TOP 1 [authors].id
	FROM [authors]
	LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])
	LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id])
	WHERE (name = 'David') ORDER BY comments.id

	# Adapter Implementation
	def distinct(columns, order_by)
	return "DISTINCT #{columns}" if order_by.blank?
	order_columns = order_by.split(',').collect { \|s\| s.split.first }
	order_columns.delete_if(&:blank?)
	order_columns = order_columns.zip((0...order_columns.size).to_a).map{ \|s,i\| "#{s} AS alias_#{i}" }.join(', ')
	"DISTINCT #{columns}, #{order_columns}"
	end
	def add_order_by_for_association_limiting!(sql, options)
	return sql if options[:order].blank?
	order = options[:order].split(',').collect { \|s\| s.strip }.reject(&:blank?)
	order.map! { \|s\| 'DESC' if s =~ /\bdesc$/i }
	order = order.zip((0...order.size).to_a).map { \|s,i\| "id_list.alias_#{i} #{s}" }.join(', ')
	sql.replace "SELECT * FROM (#{sql}) AS id_list ORDER BY #{order}"
	end

	# After
	SELECT * FROM (
	SELECT DISTINCT TOP 1 [authors].id, comments.id AS alias_0
	FROM [authors]
	LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])
	LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id])
	WHERE (name = 'David')
	) AS id_list ORDER BY id_list.alias_0

	However there is one fatal flaw in this implementation. If the ordered by column
	is not unique for the DISTINCT column in the catesian product, un-unique values
	will be returned. For instance #test_select_limited_ids_list in HasAndBelongsToManyAssociationsTest
	will fail with <["1", "2"]> expected but was <["1", "1", "1", "2"]>. There are two
	possible recourses.

	First the fact of the matter is that the order by is mostly moot in the limiting
	SQL, as far as I can tell from the tests. So this implementation would work just
	fine.

	def add_order_by_for_association_limiting!(sql, options)
	sql
	end

	However instead of this, we should make an attempt like the PostgreSQL adapter to
	maintain ordering for distinct values. Based on that goal the following URL is an
	excellent article on our implementation:
	http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx

	# Adapter Implementation
	def add_order_by_for_association_limiting!(sql, options)
	return sql if options[:order].blank?
	columns = sql.match(/SELECT\s+DISTINCT(.*)FROM/)[1].strip
	sql.sub!(/SELECT\s+DISTINCT/,'SELECT')
	sql << "GROUP BY #{columns} ORDER BY #{order_to_min_set(options[:order])}"
	end
	def order_to_min_set(order)
	dir = ''
	orders = order.split(',').map(&:strip).reject(&:blank?)
	mins = orders.map do \|o\|
	if o =~ /\b(asc\|desc)$/i
	dir = $1
	o = o.sub($1,'').strip
	end
	"MIN(#{o})"
	end
	"#{mins.join(', ')} #{dir}".strip
	end

	# Working SQL
	SELECT TOP 1 [authors].id
	FROM [authors]
	LEFT OUTER JOIN [posts] ON ([authors].[id] = [posts].[author_id])
	LEFT OUTER JOIN [comments] ON ([comments].[post_id] = [posts].[id])
	WHERE (name = 'David')
	GROUP BY [authors].id
	ORDER BY MIN(comments.id), MIN(comments.post_id)