nkallen · January 28, 2010 07:06
diff --git a/QueryTimeouts.scala b/QueryTimeouts.scala
 /* This is when a refactoring really pays off.
 *
 * In order to make your code more modular, avoid hard-coding assumptions (or refactor them away).
 * The most fundamental, anti-modular assumption in Object-Oriented software is the concrete type of objects.
 * Any time you write "new MyClass" in your code (or in Ruby MyClass.new) you've hardcoded
 * an assumption about the concrete class of the object you're allocating. This makes it impossible, for example,
 * for someone to later add logging around method invocations of that object, or timeouts, or whatever.
 * 
 * In a very dynamic language like Ruby, open classes and method aliasing mitigate this problem, but
 * they don't solve it. If you manipulate a class to add logging, all instances of that class will have
 * logging; you can't take a surgical approach and say "just objects instantiated in this context".
 *
 * There are standard design patterns to mitigate this, namely Dependency Injection and Factories.
 * By taking a Factory (a function that manufactures objects) as a parameter to a function (or
 * a constructor) you allow a programmer to later change his mind about what Factory to provide; and
 * this means the programmer can change the concrete types of objects as his heart desires.
 * 
 * There is another pattern, the Decorator pattern, that makes this even more powerful. Rather than
 * instantiating class A in place of class B, you instantiate an A with a B, where A delegates all
 * of its methods to B except where it needs to add "decorations" such as logging.
 */

 // In this example, I have a Query object, with methods like #execute(). I want to add timeouts around all queries. I start
 // by creating a QueryProxy that routes all method invocations through an overridable method, #delegate:

 abstract class QueryProxy(query: Query) extends Query {
  def select[A](f: ResultSet => A) = delegate(query.select(f))
  def execute() = delegate(query.execute())
  def cancel() = query.cancel()

  protected def delegate[A](f: => A) = f
 }

 // Then, to implement timeouts, I create a Query Decorator:

 class TimingOutQuery(timeout: Duration, query: Query) extends QueryProxy(query) {
  override def delegate[A](f: => A) = {
    try {
      Timeout(timeout) {
        f
      } {
        cancel()
      }
    } catch {
      case e: TimeoutException => 
        throw new SqlTimeoutException
    }
  }
 }

 // As an aside, it is worthwhile to note that the Decorator pattern is just the Object-Oriented equivalent of function composition
 // in a functional language. Scala makes this especially explicit since everything is both an Object and a Function (if it
 // responds to #apply). A Decorator around an object that only implements #apply is pure Function-composition as you would see in
 // Haskell, ML, etc. (e.g., `f . g`). I might phrase this as: function composition is a degenerate case of the Decorator pattern.
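
 // A minimal sketch of that aside, using hypothetical names (ComposeSketch, logging, parse, double) that are not part of
 // the code above. A Decorator around something that only implements #apply is itself just a function wrapping a function:

 object ComposeSketch {
   // Decorator: wraps f in a function of the same type that reports each argument to `log` before delegating.
   def logging[A, B](log: String => Unit)(f: A => B): A => B = { a =>
     log("calling with " + a)
     f(a)
   }

   val parse: String => Int = s => s.trim.toInt
   val double: Int => Int = n => n * 2

   // Plain composition: the Scala spelling of Haskell's `double . parse`.
   val parseThenDouble: String => Int = double compose parse
 }

 // For instance, ComposeSketch.logging(println)(ComposeSketch.parse) behaves like parse but prints its argument first.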

 // The implementation of the Timeout function is provided here for the curious. It uses threads and is weird but cool.

 import java.util.{Timer, TimerTask}

 object Timeout {
  val timer = new Timer("Timer thread", true)

  def apply[T](timeout: Duration)(f: => T)(onTimeout: => Unit): T = {
    @volatile var cancelled = false
    val task = if (timeout.inMillis > 0) Some(schedule(timeout, { cancelled = true; onTimeout })) else None
    try {
      f
    } finally {
      task foreach { t =>
        t.cancel()
        timer.purge()
      }
      if (cancelled) throw new TimeoutException
    }
  }

  private def schedule(timeout: Duration, f: => Unit) = {
    val task = new TimerTask() {
      override def run(): Unit = f
    }
    timer.schedule(task, timeout.inMillis)
    task
  }
 }

 // The thing that ties this all together is to make sure that everybody that needs to instantiate a Query object
 // never ever calls "new Query" directly. Provide instead a Factory:

 class TimingOutQueryFactory(queryFactory: QueryFactory, timeout: Duration) extends QueryFactory {
  def apply(connection: Connection, query: String, params: Any*) = {
    new TimingOutQuery(timeout, queryFactory(connection, query, params: _*))
  }
 }

 // Of course, this Factory takes another Factory, allowing Factories to be composed (so we have Factory Decorators that
 // take Decorated Factories that make Decorated Queries). So meta!

 // Actually, "meta" in Greek means nothing like "meta" in English. "Meta" plus
 // the accusative means "after" so Aristotle's Metaphysics is actually just a book "after [the book on] physics". Anyway.

 // The "root" QueryFactory in this onion is the simplest QueryFactory and it just calls "new Query" etc.
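
 // A self-contained sketch of the onion, assuming simplified stand-in types (Q, QFactory, RootFactory, TaggingFactory are
 // hypothetical and much smaller than the real Query and QueryFactory). Only the root factory names a concrete class;
 // every other layer wraps whatever the inner factory produces:

 object FactoryOnionSketch {
   trait Q { def run(): String }
   trait QFactory { def apply(sql: String): Q }

   // The "root" factory: the one place that instantiates a concrete Q.
   object RootFactory extends QFactory {
     def apply(sql: String): Q = new Q { def run() = "ran " + sql }
   }

   // A Factory Decorator: asks the inner factory for a Q, then returns a Q that decorates it.
   class TaggingFactory(tag: String, inner: QFactory) extends QFactory {
     def apply(sql: String): Q = {
       val q = inner(sql)
       new Q { def run() = tag + "(" + q.run() + ")" }
     }
   }

   // Layers compose from the inside out.
   val factory: QFactory = new TaggingFactory("timed", new TaggingFactory("logged", RootFactory))
 }

 // FactoryOnionSketch.factory("SELECT 1").run() returns "timed(logged(ran SELECT 1))".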

 // The thing that made me excited tonight, though, is that I had to add a new feature: per-query timeouts. We had a global
 // 3-second timeout and this was proving to be stupid given that our most common query has a latency of 0.5ms and a standard
 // deviation of 2ms. If you have a global timeout you must set your timeout around your most expensive query, not your most
 // common query (otherwise your most expensive query will always time out!). But for a production system, cheap frequent queries,
 // if they start exceeding 2 standard deviations, can take down your site. So a sensible timeout is like 5ms. But we had it
 // set to 3,000 ms!! Yikes. Fortunately, this has not yet caused a problem but it's a time bomb.
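
 // One way to arrive at the ~5ms figure, as a sketch: mean latency plus two standard deviations. The helper name
 // budgetMillis is hypothetical, and "mean + k * stddev" is an assumed reading of the rule of thumb above.

 object TimeoutBudgetSketch {
   // Timeout budget in milliseconds: the mean latency plus k standard deviations.
   def budgetMillis(meanMs: Double, stdDevMs: Double, k: Double): Double = meanMs + k * stdDevMs
 }

 // TimeoutBudgetSketch.budgetMillis(0.5, 2.0, 2.0) is 4.5, which rounds up to 5ms: 600 times tighter than 3,000 ms.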

 // So anyway, how many lines of code is it to make per-query timeouts??

 class PerQueryTimeoutTimingOutQueryFactory(queryFactory: QueryFactory, timeouts: Map[String, Duration]) extends QueryFactory {
  def apply(connection: Connection, query: String, params: Any*) = {
    new TimingOutQuery(timeouts(query), queryFactory(connection, query, params: _*)) // YAY!
  }
 }

 // BOOM. 1 LOC. That's what modularity means in practice.
 // We just look some shit up in a hashtable and we're done. By injecting the timeout value itself into the constructor of the
 // TimingOutQuery, we can actually programmatically modify it. Contrast this to the previous implementation, which effectively
 // reached up into the air and snatched out a global variable. NOT modular.
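
 // One caveat with the lookup: timeouts(query) throws NoSuchElementException for a query string that has no entry
 // (unless the Map was built with a default). A hedged sketch of a fallback follows; timeoutFor and defaultMs are
 // hypothetical names, and plain Long milliseconds stand in for Duration to keep the sketch self-contained:

 object TimeoutLookupSketch {
   // Per-query timeout in milliseconds, falling back to defaultMs for queries without an entry.
   def timeoutFor(timeouts: Map[String, Long], defaultMs: Long)(query: String): Long =
     timeouts.getOrElse(query, defaultMs)
 }

 // With Map("SELECT 1" -> 5L) and a default of 3000L, "SELECT 1" gets 5 and any other query gets 3000.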

 // What's more, the amazing thing is not how short this code is but that it could be
 // added by any programmer anywhere, regardless of whether they have access to the source code that actually instantiates and
 // executes queries. And they can add it "surgically" -- just in this context or that -- not globally as the
 // open classes / alias_method_chain pattern in Ruby entails.