-
-
Save richmolj/5794539 to your computer and use it in GitHub Desktop.
# Everyone should like some amount of Dependency Injection in Ruby. | |
# How much you like usually depends on how much importance you place on | |
# two competing values: | |
# | |
# 1. Build extensibly, loosely coupled components so you can properly | |
# adapt as your application scales. And things that start simple usually | |
# end up complex, so keep a solid foundation (read: solid OOP foundation | |
# in these concepts early on). | |
# | |
# and | |
# | |
# 2. Build the simplest thing possible, because YAGNI. | |
# | |
# I tend to agree with Jeff Atwood's perspective: | |
# http://www.codinghorror.com/blog/2004/10/kiss-and-yagni.html | |
# And with probably the seminal post on DI in ruby by Jamis Buck: | |
# http://weblog.jamisbuck.org/2008/11/9/legos-play-doh-and-programming | |
# And follow-up http://weblog.jamisbuck.org/2007/7/29/net-ssh-revisited | |
# | |
# DI isn't without benefits, but I feel proponents tend to turn a blind | |
# eye to the tradeoffs. | |
# | |
# DI is an obtrusive level of indirection | |
# Example from Objects on Rails: | |
class Post | |
def publish(clock = DateTime) | |
self.pubdate = clock.now | |
blog.add_entry(self) | |
end | |
end | |
# Set the published date on the post when adding to a blog. | |
# This is great because if we want to add an alternate strategy | |
# we just sub-out the injected class with a different one. For instance, | |
# publish after a review period: | |
class DelayClock | |
def now | |
DateTime.now + 24.hours | |
end | |
end | |
@post.publish(DelayClock.new) | |
# So that's the upside. The downside is our mental model goes from | |
# "A post becomes published in 24 hours after it is added" to | |
# "A blog entry could be a post, or something else. In the event it's | |
# a post, it will be published some period after saving. Look up the | |
# injected strategy to figure out when". | |
# | |
# This is only going to get more complex as we keep adding classes and | |
# treat everything as an abstraction. Abstraction == level of indirection | |
# == added complexity. I recommend looking at Net::SSH 1.1.4 (pointed | |
# out in the blog post above) for a larger, real-world example | |
# | |
# When this is a good thing | |
class Post | |
def initialize(backend) | |
@backend = backend | |
end | |
def all | |
@backend.all | |
end | |
end | |
Post.new(DB::Post).new | |
Post.new(Service::Post).new | |
# This is great because we have a subbable backend we can inject | |
# *and we actually need to use one or the other*. It's the difference | |
# between "we need to ways of finding Posts" and "we MIGHT need two ways | |
# of finding posts". Not only that, but the obtrusiveness of DI is actually a | |
# benefit: the code is screaming, "HEY! This is specially configured to support | |
# multiple strategies!" | |
# | |
# Service Locator Pattern | |
# I was disapponted Objects on Rails didn't touch on this, because a) it's the | |
# alternative to DI in acheiving the D in SOLID and b) it's more common in the | |
# ruby community. | |
# | |
# It's also usually less heavy-handed than DI. This post has a great | |
# example that progresses from DI to Service Locator: | |
# http://kresimirbojcic.com/2011/11/19/dependency-injection-in-ruby.html | |
# | |
# Let's say we want all Net::HTTP requests to have a configurable timeout. We could | |
# inject this: | |
class LongTimeout | |
def self.setup | |
http = Net::HTTP.new | |
http.read_timeout = 99999 | |
http | |
end | |
end | |
SomeClass.new(LongTimeout.method(:setup)) | |
# But we have to change the constructor to support this injection, and | |
# it's a little clunky. We can get the same modularily from | |
class Configuration | |
@settings = {} | |
def self.[](key) | |
@settings[key] | |
end | |
def self.[]=(key, value) | |
@settings[key] = value | |
end | |
end | |
Configuration.settings[:http] = proc { | |
http = Net::HTTP.new | |
http.read_timeout = 99999 | |
http | |
} | |
class SomeClass | |
def query_google | |
Configuration[:http].get 'http://google.com' | |
end | |
end | |
# Which is basically the same thing as DI but less obtrusive and more commonly seen. | |
# This will also help when we want to use 3rd-party libraries that aren't expecting | |
# you to dependency inject everything. Carrierwave suggests overriding store_dir | |
# within an Uploader: | |
class MyUploader < CarrierWave::Uploader::Base | |
def store_dir | |
Configuration[:store_dir] | |
end | |
end | |
# Instead of subclassing MyUploader, modifying the constructor, and passing it | |
# in (or using a factory) we just reference Configuration. In addition, | |
# CarrierWave associates an Uploader with an ActiveRecord class by: | |
mount_uploader :avatar, AvatarUploader | |
# We don't have to look at mount_uploader and make sure our custom instance will | |
# work correctly. We just use Carrierwave like normal. | |
# This pattern is so popular you'll often see this abstracted in many gems: | |
# http://robots.thoughtbot.com/post/344833329/mygem-configure-block | |
# | |
# Service Locator vs DI | |
# | |
# Once again, I use DI when I WANT its obtrusiveness. If I really do have an | |
# abstracted domain where things are being subbed in and out, I want to call | |
# attention to this. If I just want a damn timeout or per_page setting, I'll | |
# use Service Locator. | |
# | |
# The Problem With Service Locator | |
# | |
# Configuration is a mutable global state, so if we want this value dependent | |
# on the request, we'll have clashes. | |
# | |
# Sidekiq (think multithreaded Resque) has this problem. It starts a bunch of | |
# workers in the same process. Each worker should be able to write to a common log, | |
# but it would be useful to include some metadata about the worker | |
# and about the job it's working on, so we can get something like this: | |
"worker 1 on job foo : Hello World!" | |
# It'd be pretty heavy-handed to pass around a custom logger to every class that would | |
# like to call Sidekiq.logger.info. This is how it's done (altered for brevity): | |
module Sidekiq | |
module Logging | |
class Pretty < Logger::Formatter | |
def call(message) | |
"#{Thread.current[:sidekiq_context]} : #{message}" | |
end | |
end | |
def self.with_context(msg) | |
begin | |
Thread.current[:sidekiq_context] = msg | |
yield | |
ensure | |
Thread.current[:sidekiq_context] = nil | |
end | |
end | |
end | |
end | |
Sidekiq::Logging.with_context("#{worker.class.to_s} JobID #{item['jid']}") do | |
Sidekiq.logger "Hello world!" | |
end | |
# with_context sets a thread local, a value only accessible to the current thread. | |
# we no longer have global mutable state, but we do have configuration we can reference | |
# anywhere. Take the same concept and apply to our Carrierwave example: | |
class ConfigurationMiddleware | |
def call(env) | |
begin | |
Thread.current[:__configuration__] = {:store_dir = something_via_request} | |
@app.call(env) | |
ensure | |
Thread.current[:__configuration__] = nil | |
end | |
end | |
end | |
class Configuration | |
def self.[](key) | |
Thread.current[:__configuration__][key] | |
end | |
end | |
class MyUploader < CarrierWave::Uploader::Base | |
def store_dir | |
Configuration[:store_dir] | |
end | |
end | |
# So now we have a Service Locator that *does not have mutable global state*. | |
# We have created a request local. | |
# | |
# Time Zones | |
# | |
# Rails takes the Thread.current idea and uses it without either pattern. This | |
# is a great example of how run-time configuration can be so simple you don't | |
# even have to think about it. Let's say we want time zones to be specific to | |
# the user. Straight from the Rails docs: | |
around_filter :set_time_zone | |
def set_time_zone | |
begin | |
old_time_zone = Time.zone | |
Time.zone = current_user.time_zone if logged_in? | |
yield | |
ensure | |
Time.zone = old_time_zone | |
end | |
end | |
# Underneath the hood you'll find (altered for simplicity): | |
def self.zone=(val) | |
Thread.current[:time_zone] = val | |
end | |
def self.zone | |
Thread.current[:time_zone] | |
end | |
# DI is often useful. But I don't usually find that the problem with a piece | |
# of legacy code is that it's well-written and understandable but tightly coupled. | |
# I do often find hard-to-navigate levels of indirection and abstraction (often | |
# without even bringing DI into the mix). That's why my experience has me value | |
# the KISS principle over dogmatic OOP. While both have their problems, I personally | |
# have less with the format than the latter, which is why I value it more. |
AndrewO
commented
Jun 18, 2013
Re the CarrierWave example: we can do all this without a thread local global variable in a way
that leads to smaller classes with fewer responsibilities:
Step 1: Put the configuration on the Rack Env rather than a thread local
class ConfigurationMiddleware
def call(env)
env[:__configuration__] = {:store_dir = something_via_request}
@app.call(env)
end
end
class MyUploader < CarrierWave::Uploader::Base
def store_dir
model.store_dir
# If this seems messy because you're wondering where "model" came
# from, you're right: CarrierWave::Uploader::Base is a super
# object pulling in 15 modules, one of which has an initialize
# method and "model" accessor in it.
end
end
class SomeClass < Struct.new(:store_dir, :avatar)
# For the sake of demonstration, let's assume
# this works just like ActiveRecord's mount command
mount_uploader :avatar, :my_uploader
end
class MyApp
def call(env)
SomeClass.new(env[:__configuration__][:store_dir], some_input)
end
end
Step 2: Split data access from data representation
If we abandon the idea that uploading needs to be done by what's otherwise a dumb data structure, we start to
centralize our stateful code.
Note that we also now have a domain object that can be reused in any context (console, tests, rake tasks, messaging, persistance) with no setup.
(ConfigurationMiddleware is the same as above and assumed to be in the rack stack. Forget all others.)
SomeClass = Struct.new(:avatar)
class MyUploader < CarrierWave::Uploader::Base
# Turns out, CarrierWave defines store_dir as an accessor that
# defaults to a class configuration.
end
class MyApp
def call(env)
data = SomeClass.new(some_input)
get_uploader.call(
env[:__configuration__][:store_dir]
).store!(date.avatar) # Or we could pass the uploader into a method on SomeClass
end
private
def get_uploader(store_dir)
MyUploader.new.tap {|u| u.store_dir = store_dir }
end
end
Step 3: Set up the uploader upstream from the app
Why don't we just create the uploader in the place where it can vary?
(SomeClass and MyUploader are the same as the last example)
class SetupUploader
def call(env)
env[:avatar_uploader] = MyUploader.new.tap {|u| u.store_dir = something_via_request }
@app.call(env)
end
end
class MyApp
def call(env)
data = SomeClass.new(some_input)
env[:avatar_uploader].store!(data.avatar)
some_response
end
end
The end result is that we have a few small classes that can be tested and reused in complete isolation, using only their parameters and method calls.
I think I'm starting out pretty firmly in the +1 DI column. Response/thoughts inline below:
# lr> DI isn't without benefits, but I feel proponents tend to turn a
# lr> blind eye to the tradeoffs. DI is an obtrusive level of
# lr> indirection
#
# But DI has nothing to do with indirection, per se. You already have
# indirection b/c you are delegating to a collaborator. You can make the
# argument whether that is an appropriate use of indirection or not, but
# it has nothing to do with instanciating the dependency, which is
# what DI is all about.
# Example from Objects on Rails:
class Post
def publish(clock = DateTime)
self.pubdate = clock.now
blog.add_entry(self)
end
end
class DelayClock
def now
DateTime.now + 24.hours
end
end
@post.publish(DelayClock.new)
# Granted that you are correct about the overhead of the delegation
# here; however, it's use seems entirely appropriate to me in this
# instance. Consider you mentioned SOLID a few paragraphs down. The
# "extra" abstraction in this example can be argued for citing the SRP,
# and the OCP, (and possibly LSP if you consider ducktypes subtypes).
# lr> The downside is our mental model goes from "A post becomes published
# lr> in 24 hours after it is added" to "A blog entry could be a post, or
# lr> something else. In the event it's a post, it will be published some
# lr> period after saving. Look up the injected strategy to figure out
# lr> when".
# Again, DI doesn't do anything to change the mental model. If you had
# hard coded the first line of the publish method to use DateTime
# directly, you'd still have the indirection.
# lr> This is only going to get more complex as we keep adding classes and
# lr> treat everything as an abstraction. Abstraction == level of indirection
# lr> == added complexity.
#
# I disagree. Abstraction does not mean added complexity, because
# abstraction is not the same thing as indirection. The whole purpose of
# creating an abstraction is to remove complexity, not add it. If the
# particular abstraction is complicating things, I would suggest that
# perhaps we have the wrong abstraction.
#
# In this case, I think it is appropriate. When you're looking at this
# code you shouldn't be thinking about pubdates at all. If you are, then
# you are trying to poke a hole in the abstraction---The post shouldn't
# care, that's the responsibity of the clock.
#
# Is there a cost to to this indirection? Sure. Obviously, there's a
# cost/benefit analysis that has to be taken up here, but I think that
# generally these types of abstractions are worth it because they enable
# composibility. Work out the interface for the collaboration, identify
# the invariants, then forget about the extra complexity b/c you've
# sealed that away in a black box.
#
# This will help us in our quest to have more reusable code.
class Post
def initialize(backend)
@backend = backend
end
def all
@backend.all
end
end
Post.new(DB::Post).new
Post.new(Service::Post).new
# lr> This is great because we have a subbable backend we can inject
# lr> *and we actually need to use one or the other*. It's the difference
# lr> between "we need to ways of finding Posts" and "we MIGHT need two ways
# lr> of finding posts". Not only that, but the obtrusiveness of DI is actually a
# lr> benefit: the code is screaming, "HEY! This is specially configured to support
# lr> multiple strategies!"
#
# In one hundred percent agreement here. I only contend that this way of
# being explicit about your dependencies is almost always worth the
# cost, IMNSHO.
## Service Locator Pattern
class Configuration
@settings = {}
def self.[](key)
@settings[key]
end
def self.[]=(key, value)
@settings[key] = value
end
end
Configuration.settings[:http] = proc {
http = Net::HTTP.new
http.read_timeout = 99999
http
}
class SomeClass
def query_google
Configuration[:http].get 'http://google.com'
end
end
# lr> Which is basically the same thing as DI but less obtrusive and more commonly seen.
# Sure, but this forces you to only have one value for the
# configuration. There are some ways around it, as pointed out below,
# but it means having to deal with spooky action at a distance.
# lr> This will also help when we want to use 3rd-party libraries that aren't expecting
# lr> you to dependency inject everything.
# How so? Either they use DI or a service locator, it shouldn't have
# anything to do with how your code handles dependencies, no?
# lr> This pattern is so popular you'll often see this abstracted in many gems:
# lr> http://robots.thoughtbot.com/post/344833329/mygem-configure-block
# Agreed, it works for things that are truely one time settings. If it's
# really a configuration option for the system as a whole, it's a good
# pattern. But it certainly hampers the reusability of the code if you
# need more than one setting. See comments below.
## Service Locator vs DI
#
# The Problem With Service Locator
#
# lr> Configuration is a mutable global state, so if we want this value dependent
# lr> on the request, we'll have clashes.
#
# Yes. This not a minor drawback.
#
# lr> Sidekiq (think multithreaded Resque) has this problem. It starts a bunch of
# lr> workers in the same process. Each worker should be able to write to a common log,
# lr> but it would be useful to include some metadata about the worker
# lr> and about the job it's working on, so we can get something like this:
"worker 1 on job foo : Hello World!"
# lr> It'd be pretty heavy-handed to pass around a custom logger to every class that would
# lr> like to call Sidekiq.logger.info. This is how it's done (altered for brevity):
module Sidekiq
module Logging
class Pretty < Logger::Formatter
def call(message)
"#{Thread.current[:sidekiq_context]} : #{message}"
end
end
def self.with_context(msg)
begin
Thread.current[:sidekiq_context] = msg
yield
ensure
Thread.current[:sidekiq_context] = nil
end
end
end
end
Sidekiq::Logging.with_context("#{worker.class.to_s} JobID #{item['jid']}") do
Sidekiq.logger "Hello world!"
end
# This is not something I think we should be doing. The problem is
# that the `with_context` call is not guarenteed to be located lexically
# close to the point where you call the logger. First of all, you have
# no clue that the context call is actually modifying #logger without
# a priori knowledge of it's implementation. Secondly, whenever you see a
# call to logger, to know what it's doing you actually have to trace the
# call stack all the way back to the entry point of the program to find
# out where the configuration was last set. Moreover, this is impossible
# to do this statically.
# lr> with_context sets a thread local, a value only accessible to the current thread.
# lr> we no longer have global mutable state, but we do have configuration we can reference
# lr> anywhere... So now we have a Service Locator that *does not have
# lr> mutable global state*. # We have created a request local.
# The variable most certainly is still global, we've just created a way
# to give it dynamic extent. Most modern languages implement
# lexical scoping over dynamic scoping b/c it's very easy to shoot
# yourself in the foot with the latter.
# lr> DI is often useful. But I don't usually find that the problem with a piece
# lr> of legacy code is that it's well-written and understandable but tightly coupled.
# lr> I do often find hard-to-navigate levels of indirection and abstraction (often
# lr> without even bringing DI into the mix). That's why my experience has me value
# lr> the KISS principle over dogmatic OOP. While both have their problems, I personally
# lr> have less with the format than the latter, which is why I value it more.
# I agree that you can go overboard with countless levels of
# indirection, but in my experiance the more common problem is using the
# wrong/non-sensible abstractions. DI encourages thinking about the
# contracts between collaborating objects, which I think more often than
# not encourages better factored code.
@ljsc - I'll try to address this more in-depth, but I think
http://therealadam.com/2013/01/03/design-for-test-vs-design-for-api
sums it my thoughts. I don't care too much about his test-vs-api point, but I do strongly agree with his examples that DI adds complexity (side point: we have different definitions of 'indirection', this illustrates what mine is) and is therefore often not worth it.
I expect you disagree with the same example, which is fine as this is pretty subjective. How strongly you agree/disagree with his point likely has a high correlation with each point above.
Sure, I think that's right. I guess it's all what you optimize for, and my preference is to aim for composability wherever reasonably possible. But I think that post offers up some nice points for enriching the discussion. Namely, what do we think of the descriptions the author offers of his own API examples? Personally I think he botched it, and here's why:
- This one is definitely arguable, but I think that he's digging too much into implementation details. Here:
Publishes the current post. The created_at timestamp is set to the current time. Returns the created_at timestamp.
should setting the created_at
timestamp be documented here? I'm skeptical. If our goal is to document the api, we shouldn't be describing the guts of the method itself. What happens when we change the implementation? This mindset leads to the abstraction hole-poking I was describing above.
2. Failure to identify and document types (abstractions):
Publishes the current post. By default, created_at is set to the current time. Optionally, callers may pass in a Time object, or any object that returns a Time object when sent the now message. The created_at column is set according to that Time value. Returns the value of the created_at timestamp.
This is the wrong place to be describing the interface for the parameter. That's why you have types. We should just say it takes a Timeish
and be done with it. Elsewhere in the documentation, you say what a Timeish
does. This might seem like splitting hairs, but I think it's actually a really important distinction.
It speaks to I think the problem with Ruby's dynamic nature---yes it's convenient and can help get things done, but it also tends to lure programmers into the trap that there are no types at all. Even though there aren't any annotations, the types are still there, and not thinking about them leads to ignoring the contracts, and thus abstraction Swiss cheese.
This is normally the point where somebody says, hey duck-types to the rescue. This never really prevents things from running off the tracks though, b/c rather than documenting unambiguously somewhere in the system that a duck #quacks
, and it #swims
, we instead just throw the duck in the water and see if it drowns. See:
There’s some further potential trouble lurking in this API. What if a caller passes in the wrong kind of Time object? What if sending the now message raises an exception?
When I'm looking at any level of abstraction, I want to know what the inputs are to the thing, what the outputs are, and most importantly the invariants. I suggest that those are the most important items for being able to efficiently reason about code. Having DI makes us think about our dependencies, where they're used, and what we expect from them, explicitly.
Gregory Brown gives a really good look into some of these issues in Practicing Ruby article. It's a really good read, so I'll finish by offering that up and then step back and let some others offer thoughts.
@AndrewO - I had a response I accidentally deleted, because the iPhone has taught me to immediately click OK whenever I get any sort of confirmation pop-up. In any case:
The Kitten example is true, but I can also have conflicting middleware that both set the same key on env, or conflicting gem modules that both add the same method name. In any case, gems like Rails, Sunspot, Authlogic, Paper Trail, Sidekiq, etc are all using this same pattern. It's not stopping them from being widely used.
For the Carrierwave example, I feel like we're talking about different things. Yes, I can add middleware/initialization, use the env hash. And also add some fake env setup/handling for when we're not running through Rack (console, cron job, etc). My point was: I just want to use Carrierwave, not reconstruct my app and learn the innards of Carrierwave.
I believe you note this when referring to how 'model' comes out of nowhere. Whether we like Carrierwave's implementation seems irrelevant to me. It's an example of a popular gem we may want to use, and Joe Developer now needs to open up the source code and figure out how to rejigger the system, instead of just referring to Configuration and moving on.
I had a side-discussion with Lou, which I think is the best, most concise summary of each viewpoint. Please see https://gist.github.com/richmolj/5815196, I think it really distills the debate to its core concepts.