-
-
Save richmolj/5794539 to your computer and use it in GitHub Desktop.
# Everyone should like some amount of Dependency Injection in Ruby. | |
# How much you like usually depends on how much importance you place on | |
# two competing values: | |
# | |
# 1. Build extensibly, loosely coupled components so you can properly | |
# adapt as your application scales. And things that start simple usually | |
# end up complex, so keep a solid foundation (read: solid OOP foundation | |
# in these concepts early on). | |
# | |
# and | |
# | |
# 2. Build the simplest thing possible, because YAGNI. | |
# | |
# I tend to agree with Jeff Atwood's perspective: | |
# http://www.codinghorror.com/blog/2004/10/kiss-and-yagni.html | |
# And with probably the seminal post on DI in ruby by Jamis Buck: | |
# http://weblog.jamisbuck.org/2008/11/9/legos-play-doh-and-programming | |
# And follow-up http://weblog.jamisbuck.org/2007/7/29/net-ssh-revisited | |
# | |
# DI isn't without benefits, but I feel proponents tend to turn a blind | |
# eye to the tradeoffs. | |
# | |
# DI is an obtrusive level of indirection | |
# Example from Objects on Rails: | |
class Post | |
def publish(clock = DateTime) | |
self.pubdate = clock.now | |
blog.add_entry(self) | |
end | |
end | |
# Set the published date on the post when adding to a blog. | |
# This is great because if we want to add an alternate strategy | |
# we just sub-out the injected class with a different one. For instance, | |
# publish after a review period: | |
class DelayClock | |
def now | |
DateTime.now + 24.hours | |
end | |
end | |
@post.publish(DelayClock.new) | |
# So that's the upside. The downside is our mental model goes from | |
# "A post becomes published in 24 hours after it is added" to | |
# "A blog entry could be a post, or something else. In the event it's | |
# a post, it will be published some period after saving. Look up the | |
# injected strategy to figure out when". | |
# | |
# This is only going to get more complex as we keep adding classes and | |
# treat everything as an abstraction. Abstraction == level of indirection | |
# == added complexity. I recommend looking at Net::SSH 1.1.4 (pointed | |
# out in the blog post above) for a larger, real-world example | |
# | |
# When this is a good thing | |
class Post | |
def initialize(backend) | |
@backend = backend | |
end | |
def all | |
@backend.all | |
end | |
end | |
Post.new(DB::Post).new | |
Post.new(Service::Post).new | |
# This is great because we have a subbable backend we can inject | |
# *and we actually need to use one or the other*. It's the difference | |
# between "we need to ways of finding Posts" and "we MIGHT need two ways | |
# of finding posts". Not only that, but the obtrusiveness of DI is actually a | |
# benefit: the code is screaming, "HEY! This is specially configured to support | |
# multiple strategies!" | |
# | |
# Service Locator Pattern | |
# I was disapponted Objects on Rails didn't touch on this, because a) it's the | |
# alternative to DI in acheiving the D in SOLID and b) it's more common in the | |
# ruby community. | |
# | |
# It's also usually less heavy-handed than DI. This post has a great | |
# example that progresses from DI to Service Locator: | |
# http://kresimirbojcic.com/2011/11/19/dependency-injection-in-ruby.html | |
# | |
# Let's say we want all Net::HTTP requests to have a configurable timeout. We could | |
# inject this: | |
class LongTimeout | |
def self.setup | |
http = Net::HTTP.new | |
http.read_timeout = 99999 | |
http | |
end | |
end | |
SomeClass.new(LongTimeout.method(:setup)) | |
# But we have to change the constructor to support this injection, and | |
# it's a little clunky. We can get the same modularily from | |
class Configuration | |
@settings = {} | |
def self.[](key) | |
@settings[key] | |
end | |
def self.[]=(key, value) | |
@settings[key] = value | |
end | |
end | |
Configuration.settings[:http] = proc { | |
http = Net::HTTP.new | |
http.read_timeout = 99999 | |
http | |
} | |
class SomeClass | |
def query_google | |
Configuration[:http].get 'http://google.com' | |
end | |
end | |
# Which is basically the same thing as DI but less obtrusive and more commonly seen. | |
# This will also help when we want to use 3rd-party libraries that aren't expecting | |
# you to dependency inject everything. Carrierwave suggests overriding store_dir | |
# within an Uploader: | |
class MyUploader < CarrierWave::Uploader::Base | |
def store_dir | |
Configuration[:store_dir] | |
end | |
end | |
# Instead of subclassing MyUploader, modifying the constructor, and passing it | |
# in (or using a factory) we just reference Configuration. In addition, | |
# CarrierWave associates an Uploader with an ActiveRecord class by: | |
mount_uploader :avatar, AvatarUploader | |
# We don't have to look at mount_uploader and make sure our custom instance will | |
# work correctly. We just use Carrierwave like normal. | |
# This pattern is so popular you'll often see this abstracted in many gems: | |
# http://robots.thoughtbot.com/post/344833329/mygem-configure-block | |
# | |
# Service Locator vs DI | |
# | |
# Once again, I use DI when I WANT its obtrusiveness. If I really do have an | |
# abstracted domain where things are being subbed in and out, I want to call | |
# attention to this. If I just want a damn timeout or per_page setting, I'll | |
# use Service Locator. | |
# | |
# The Problem With Service Locator | |
# | |
# Configuration is a mutable global state, so if we want this value dependent | |
# on the request, we'll have clashes. | |
# | |
# Sidekiq (think multithreaded Resque) has this problem. It starts a bunch of | |
# workers in the same process. Each worker should be able to write to a common log, | |
# but it would be useful to include some metadata about the worker | |
# and about the job it's working on, so we can get something like this: | |
"worker 1 on job foo : Hello World!" | |
# It'd be pretty heavy-handed to pass around a custom logger to every class that would | |
# like to call Sidekiq.logger.info. This is how it's done (altered for brevity): | |
module Sidekiq | |
module Logging | |
class Pretty < Logger::Formatter | |
def call(message) | |
"#{Thread.current[:sidekiq_context]} : #{message}" | |
end | |
end | |
def self.with_context(msg) | |
begin | |
Thread.current[:sidekiq_context] = msg | |
yield | |
ensure | |
Thread.current[:sidekiq_context] = nil | |
end | |
end | |
end | |
end | |
Sidekiq::Logging.with_context("#{worker.class.to_s} JobID #{item['jid']}") do | |
Sidekiq.logger "Hello world!" | |
end | |
# with_context sets a thread local, a value only accessible to the current thread. | |
# we no longer have global mutable state, but we do have configuration we can reference | |
# anywhere. Take the same concept and apply to our Carrierwave example: | |
class ConfigurationMiddleware | |
def call(env) | |
begin | |
Thread.current[:__configuration__] = {:store_dir = something_via_request} | |
@app.call(env) | |
ensure | |
Thread.current[:__configuration__] = nil | |
end | |
end | |
end | |
class Configuration | |
def self.[](key) | |
Thread.current[:__configuration__][key] | |
end | |
end | |
class MyUploader < CarrierWave::Uploader::Base | |
def store_dir | |
Configuration[:store_dir] | |
end | |
end | |
# So now we have a Service Locator that *does not have mutable global state*. | |
# We have created a request local. | |
# | |
# Time Zones | |
# | |
# Rails takes the Thread.current idea and uses it without either pattern. This | |
# is a great example of how run-time configuration can be so simple you don't | |
# even have to think about it. Let's say we want time zones to be specific to | |
# the user. Straight from the Rails docs: | |
around_filter :set_time_zone | |
def set_time_zone | |
begin | |
old_time_zone = Time.zone | |
Time.zone = current_user.time_zone if logged_in? | |
yield | |
ensure | |
Time.zone = old_time_zone | |
end | |
end | |
# Underneath the hood you'll find (altered for simplicity): | |
def self.zone=(val) | |
Thread.current[:time_zone] = val | |
end | |
def self.zone | |
Thread.current[:time_zone] | |
end | |
# DI is often useful. But I don't usually find that the problem with a piece | |
# of legacy code is that it's well-written and understandable but tightly coupled. | |
# I do often find hard-to-navigate levels of indirection and abstraction (often | |
# without even bringing DI into the mix). That's why my experience has me value | |
# the KISS principle over dogmatic OOP. While both have their problems, I personally | |
# have less with the format than the latter, which is why I value it more. |
Sure, I think that's right. I guess it's all what you optimize for, and my preference is to aim for composability wherever reasonably possible. But I think that post offers up some nice points for enriching the discussion. Namely, what do we think of the descriptions the author offers of his own API examples? Personally I think he botched it, and here's why:
- This one is definitely arguable, but I think that he's digging too much into implementation details. Here:
Publishes the current post. The created_at timestamp is set to the current time. Returns the created_at timestamp.
should setting the created_at
timestamp be documented here? I'm skeptical. If our goal is to document the api, we shouldn't be describing the guts of the method itself. What happens when we change the implementation? This mindset leads to the abstraction hole-poking I was describing above.
2. Failure to identify and document types (abstractions):
Publishes the current post. By default, created_at is set to the current time. Optionally, callers may pass in a Time object, or any object that returns a Time object when sent the now message. The created_at column is set according to that Time value. Returns the value of the created_at timestamp.
This is the wrong place to be describing the interface for the parameter. That's why you have types. We should just say it takes a Timeish
and be done with it. Elsewhere in the documentation, you say what a Timeish
does. This might seem like splitting hairs, but I think it's actually a really important distinction.
It speaks to I think the problem with Ruby's dynamic nature---yes it's convenient and can help get things done, but it also tends to lure programmers into the trap that there are no types at all. Even though there aren't any annotations, the types are still there, and not thinking about them leads to ignoring the contracts, and thus abstraction Swiss cheese.
This is normally the point where somebody says, hey duck-types to the rescue. This never really prevents things from running off the tracks though, b/c rather than documenting unambiguously somewhere in the system that a duck #quacks
, and it #swims
, we instead just throw the duck in the water and see if it drowns. See:
There’s some further potential trouble lurking in this API. What if a caller passes in the wrong kind of Time object? What if sending the now message raises an exception?
When I'm looking at any level of abstraction, I want to know what the inputs are to the thing, what the outputs are, and most importantly the invariants. I suggest that those are the most important items for being able to efficiently reason about code. Having DI makes us think about our dependencies, where they're used, and what we expect from them, explicitly.
Gregory Brown gives a really good look into some of these issues in Practicing Ruby article. It's a really good read, so I'll finish by offering that up and then step back and let some others offer thoughts.
@AndrewO - I had a response I accidentally deleted, because the iPhone has taught me to immediately click OK whenever I get any sort of confirmation pop-up. In any case:
The Kitten example is true, but I can also have conflicting middleware that both set the same key on env, or conflicting gem modules that both add the same method name. In any case, gems like Rails, Sunspot, Authlogic, Paper Trail, Sidekiq, etc are all using this same pattern. It's not stopping them from being widely used.
For the Carrierwave example, I feel like we're talking about different things. Yes, I can add middleware/initialization, use the env hash. And also add some fake env setup/handling for when we're not running through Rack (console, cron job, etc). My point was: I just want to use Carrierwave, not reconstruct my app and learn the innards of Carrierwave.
I believe you note this when referring to how 'model' comes out of nowhere. Whether we like Carrierwave's implementation seems irrelevant to me. It's an example of a popular gem we may want to use, and Joe Developer now needs to open up the source code and figure out how to rejigger the system, instead of just referring to Configuration and moving on.
I had a side-discussion with Lou, which I think is the best, most concise summary of each viewpoint. Please see https://gist.github.com/richmolj/5815196, I think it really distills the debate to its core concepts.
@ljsc - I'll try to address this more in-depth, but I think
http://therealadam.com/2013/01/03/design-for-test-vs-design-for-api
sums it my thoughts. I don't care too much about his test-vs-api point, but I do strongly agree with his examples that DI adds complexity (side point: we have different definitions of 'indirection', this illustrates what mine is) and is therefore often not worth it.
I expect you disagree with the same example, which is fine as this is pretty subjective. How strongly you agree/disagree with his point likely has a high correlation with each point above.