@francisco-rojas
Last active August 29, 2015 14:11
Ruby Best Practices: Book Notes

Chapter 1: Driving Code Through Tests

retry and redo

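redo and retry both re-execute part of a loop, but they differ in how much they repeat: redo repeats only the current iteration, while retry restarts the whole loop from the beginning. Note that using retry directly inside a loop body only worked in Ruby 1.8; since Ruby 1.9, retry is valid only inside a rescue clause, and using it anywhere else is a syntax error.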

redo example:

(0..5).each do |i|
  puts "Value: #{i}"
  redo if i > 2
end

Value: 0
Value: 1
Value: 2
Value: 3
Value: 3
Value: 3
# ... this is an infinite loop

retry example (Ruby 1.8 only):

(0..5).each do |i|
  puts "Value: #{i}"
  retry if i > 2
end

Value: 0
Value: 1
Value: 2
Value: 3
Value: 0
Value: 1
Value: 2
# ... this is an infinite loop, too

If retry appears in the rescue clause of a begin expression, execution restarts from the beginning of the begin body.

def publish_to_api(data={})
  tries ||= 3
  DataLibrary.publish(data)
rescue DataLibraryFailureException => e
  retry unless (tries -= 1).zero?
else
  logger.info "success!"
end
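
The same pattern can be tried without an external library. The following is a minimal sketch, assuming a hypothetical FlakyError class and a hypothetical with_retries helper (neither comes from the book). Unlike the method above, it re-raises the error once the attempts are used up instead of swallowing it.

```ruby
# Hypothetical error class, used only for illustration.
class FlakyError < StandardError; end

# Calls the given block, retrying on FlakyError until max_tries attempts
# have been made. Returns the block's value, or re-raises the last error.
def with_retries(max_tries = 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue FlakyError
    # retry jumps back to the start of the begin body; tries keeps its value
    # because it was initialized outside of that body.
    retry if tries < max_tries
    raise
  end
end

attempts = []
result = with_retries(3) do |n|
  attempts << n
  raise FlakyError, "attempt #{n} failed" if n < 3
  "published on attempt #{n}"
end

puts result            # published on attempt 3
puts attempts.inspect  # [1, 2, 3]
```

Keeping the counter outside the begin body is what stops retry from resetting it on every attempt.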

### Different ways of calling procs:

sum_ints = lambda do |a,b|
  s = 0 ; a.upto(b){|n| s += n } ; s
end

>> sum_ints.call(1,5)
=> 15
>> sum_ints.(1,5)
=> 15
>> sum_ints[1,5]
=> 15

### Ruby's underscore variable

See: http://po-ru.com/diary/rubys-magic-underscore/

Given a hash like this:

people = {
  "Alice" => ["green", "[email protected]"],
  "Bob"   => ["brown", "[email protected]"]
}

we want to ignore eye colour and convert each element to a simple array containing name and email address.

Solution 1 (not very clear what fields.last is):
people.map { |name, fields| [name, fields.last] }

Solution 2 (an extra, unused, variable is distracting):
people.map { |name, (eye_color, email)| [name, email] }

Solution 3 (it’s conventional to use an underscore to represent an unused variable.):
people.map { |name, (_, email)| [name, email] }

if you had:

people = {
  "Alice" => ["green", 34, "[email protected]"],
  "Bob"   => ["brown", 27, "[email protected]"]
}

you could reuse the underscore variable (but not other variable names, which would raise a 'duplicated argument name' error):

people.map { |name, (_, _, email)| [name, email] }

### Ruby's double bang !!

If you negate something, that forces a boolean context. Of course, it also negates it. If you double-negate it, it forces the boolean context, but returns the proper boolean value.

"hello"   #-> this is a string; it is not in a boolean context
!"hello"  #-> this is a string that is forced into a boolean
          #   context (true), and then negated (false)
!!"hello" #-> this is a string that is forced into a boolean
          #   context (true), and then negated (false), and then
          #   negated again (true)
!!nil     #-> this is a false-y value that is forced into a boolean
          #   context (false), and then negated (true), and then
          #   negated again (false)

### Tests

It is important to remember that testing is meant to make your code better and more maintainable, not to lead you into confusion or make you feel like you’re stuck doing busywork instead of doing real coding.

Also remember that if your solution seems difficult to test, it may be a sign that your design is not flexible enough to easily be refactored or interacted with. Writing your tests first and seeing them fail not only ensures that you are testing what you actually intend to test, but also allows you to write better code and clean up before the code grows too large.

Try to remember that partial coverage is usually much better than no coverage at all.

If you are working on a very small program or library, and you want to be able to run your tests while in development, but then require the code as part of another program later, there is a simple idiom that is useful for embedding your tests:

class Foo
  ...
end

if __FILE__ == $PROGRAM_NAME
  require "test/unit"
  class TestFoo < Test::Unit::TestCase
    #...
  end
end

Simply wrapping your tests in this if statement will allow running ruby foo.rb to execute your tests, while require "foo" will still work as expected without running the tests.

This can be useful for sharing small programs with others, or for writing some tests while developing a small prototype of a larger application. However, once you start to produce more than a few test cases, be sure to break things back out into their normal directory structure.

##### Test Helpers

Require statements and basic helper functions can be repetitive across your test files.

A good solution to keep things clean is to create a test/test_helpers.rb file and then do all of your global configuration there. In your individual tests, you can require this file by expanding the direct path to it, using the following idiom:

require File.dirname(__FILE__) + '/test_helpers'

This allows your test files to be run individually from any directory, not just the top-level directory. An example of a test helper file is the one in Prawn:

require "rubygems"
require "test/unit"
$LOAD_PATH << File.join(File.dirname(__FILE__), '..', 'lib')
require "prawn"
gem 'pdf-reader', ">=0.7.3"
require "pdf/reader"

def create_pdf
  @pdf = Prawn::Document.new(left_margin: 0, right_margin: 0,
                             top_margin: 0, bottom_margin: 0)
end

def observer(klass)
  @output = @pdf.render
  obs = klass.new
  PDF::Reader.string(@output,obs)
  obs
end

def parse_pdf_object(obj)
  PDF::Reader::Parser.new(
      PDF::Reader::Buffer.new(
          StringIO.new(obj)), nil).parse_token
end

puts "Prawn tests: Running on Ruby Version: #{RUBY_VERSION}"

If you want a little more of a clean approach, you can wrap your helpers in a module, but depending on what you’re doing, just defining them at the top level might be fine as well.

##### Custom Assertions

We want to transform a basic statement that looks like this:

assert bob.current_zone.eql?(Zone.new("4"))

into:

assert_in_zone("4", bob)

Here’s how you would define assert_in_zone and its complement, assert_not_in_zone :

def assert_in_zone(expected, person)
  assert_block("Expected #{person.inspect} to be in Zone #{expected}") do
    person.current_zone.eql?(Zone.new(expected))
  end
end

def assert_not_in_zone(expected, person)
  assert_block("Expected #{person.inspect} not to be in Zone #{expected}") do
    !person.current_zone.eql?(Zone.new(expected))
  end
end

Chapter 2: Designing Beautiful APIs

It is good practice to remove options from the options hash before delegating to another method, so that the underlying method is not handed options it may not handle properly. For example, the following code politely removes the :file or :string option from the options hash before delegating to the relevant constructor:

def Table(*args,&block)
  table = case args[0]
  ...
    if file = args[0].delete(:file)
      Ruport::Data::Table.load(file,args[0],&block)
    elsif string = args[0].delete(:string)
      Ruport::Data::Table.parse(string,args[0],&block)
    else
      Ruport::Data::Table.new(args[0],&block)
    end
  ...
  return table
end
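
As a smaller, self-contained illustration of the same idea, here is a sketch with two hypothetical methods, table and load_rows (neither is part of Ruport). The inner method rejects options it does not know, so the wrapper has to consume its own option before forwarding the rest.

```ruby
# Hypothetical low-level loader that rejects options it does not know.
def load_rows(rows, options = {})
  unknown = options.keys - [:header]
  raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?
  # When :header is set, the first row is treated as a header and dropped.
  options[:header] ? rows.drop(1) : rows
end

# Wrapper that consumes :source itself and forwards only what remains.
def table(options = {})
  options = options.dup                  # avoid mutating the caller's hash
  source = options.delete(:source) || []
  load_rows(source, options)
end

puts table(:source => [["name"], ["Alice"]], :header => true).inspect  # [["Alice"]]
```

Duplicating the hash first is a design choice of this sketch: delete would otherwise also remove the key from the hash the caller passed in.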

##### Flexible arguments

When a method has more than one optional argument, trouble arises when we want to use the default value for the first argument and override the second one: with positional defaults it's simply not possible. When you do find yourself needing this sort of thing, there are most likely better options available, such as an options hash.

def load_file2(name="foo.jpg",mode="rb")
  File.open(name,mode)
end

load_file2 "foo.jpg", "r"

With an options hash, by contrast, the benefits of this style of API become clear. Perhaps the most significant is that the order in which you specify the arguments doesn’t matter at all.

If we combine that feature with a basic idiom for setting default values passed in the hash, we come up with:

def story2(options={})
  options = { person: "Yankee Doodle", animal: "Tiger" }.merge(options)
  "#{options[:person]} went to town, riding on a #{options[:animal]}"
end

story2
=> "Yankee Doodle went to town, riding on a Tiger"
story2(person: "Joe Frasier")
=> "Joe Frasier went to town, riding on a Tiger"
story2(animal: "Kitteh")
=> "Yankee Doodle went to town, riding on a Kitteh"
story2(animal: "Kitteh", person: "Joe Frasier")
=> "Joe Frasier went to town, riding on a Kitteh"

However, if one or more of your arguments are really mandatory, it’s worth it to break them out, like so:

def write_story_to_file(file,options={})
  File.open(file,"w") { |f| f << story2(options) }
end

write_story_to_file "output.txt"
write_story_to_file "output.txt", animal: "Kitteh"
write_story_to_file "output.txt", person: "Joe Frasier"
write_story_to_file "output.txt", animal: "Slug", person: "Joe Frasier"

Though you could write code to ensure that certain options are present in a hash, generally it is most natural to just let Ruby do the hard work for you by placing your mandatory arguments before your options hash in your method definition.

##### Treating Arguments as an Array

When receiving arguments through an array (via the splat operator), you might need to validate them. You might use something like this before passing the arguments to another method:

def distance4(*points)
  x1,y1,x2,y2 = points.flatten
  raise ArgumentError unless [x1,y1,x2,y2].all? { |e| Numeric === e }
  Math.hypot(x2 - x1, y2 - y1)
end

The following short list of guidelines will help you in designing your methods:

  • Try to keep the number of ordinal arguments in your methods to a minimum.
  • If your method has multiple parameters with default values, consider using pseudo-keyword arguments via an options hash.
  • Use the array splat operator ( * ) when you want to slurp up your arguments and pass them to another method.
  • The *args idiom is also useful for supporting multiple simultaneous argument processing styles, as in Table() , but can lead to complicated code.
  • Don’t use *args when a normal combination of mandatory ordinal arguments and an options hash will do.
  • If some parameters are mandatory, avoid putting them in an options hash, and instead write a signature like foo(mandatory1, mandatory2, options={}) , unless there is a good reason not to.
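
These notes predate real keyword arguments, which Ruby has supported since 2.0 (required keywords since 2.1). The following is a minimal sketch of the story example rewritten in that style; the names story and describe_story are chosen here for illustration and are not from the book.

```ruby
# Keyword-argument version of the story example.
# Defaults live in the signature instead of a merged options hash.
def story(person: "Yankee Doodle", animal: "Tiger")
  "#{person} went to town, riding on a #{animal}"
end

# file: has no default, so Ruby raises ArgumentError when it is omitted.
# Any remaining keywords are collected and forwarded to story.
def describe_story(file:, **story_options)
  "#{file}: #{story(**story_options)}"
end

puts story
puts story(animal: "Kitteh")
puts describe_story(file: "output.txt", person: "Joe Frasier")

begin
  story(anmal: "Slug")   # misspelled keyword
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Compared with the options hash, a misspelled key is rejected instead of being silently ignored, and mandatory keywords are enforced by Ruby itself.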

##### Blocks for Interface Simplification

Does it feel like the word “server” is written too many times in this code?

server = Server.new
server.handle(/hello/i) { "Hello from server at #{Time.now}" }
server.handle(/goodbye/i) { "Goodbye from server at #{Time.now}" }
server.handle(/name is (\w+)/) { |m| "Nice to meet you #{m[1]}!" }
server.run

It would be nice to be able to write this instead:

Server.run do
  handle(/hello/i) { "Hello from server at #{Time.now}" }
  handle(/goodbye/i) { "Goodbye from server at #{Time.now}" }
  handle(/name is (\w+)/) { |m| "Nice to meet you #{m[1]}!" }
end

Keep the following things in mind when using blocks as part of your interface:

  • If you create a collection class that you need to traverse, build on top of Enumerable rather than reinventing the wheel.
  • If you have shared code that differs only in the middle, create a helper method that yields a block in between the pre/postprocessing code to avoid duplication of effort.
  • If you use the &block syntax, you can capture the code block provided to a method inside a variable. You can then store this and use it later, which is very useful for creating dynamic callbacks.
  • Using a combination of &block and instance_eval , you can execute blocks within the context of arbitrary objects, which opens up a lot of doors for highly customized interfaces.
  • The return value of yield (and block.call ) is the same as the return value of the provided block.
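
The &block and instance_eval points above can be sketched with an in-memory version of the Server example. This is only an assumption about how such a class could look: there is no networking, Server.run returns the configured server instead of starting it, and dispatch is a hypothetical helper added for the demonstration.

```ruby
class Server
  # Builds a server and evaluates the block with that server as self,
  # so bare handle calls inside the block reach the new instance.
  def self.run(&block)
    server = new
    server.instance_eval(&block) if block
    server
  end

  def initialize
    @handlers = {}
  end

  # Captures the block in a variable and stores it as a callback.
  def handle(pattern, &block)
    @handlers[pattern] = block
  end

  # Hypothetical helper: returns the first matching handler's response, or nil.
  def dispatch(message)
    @handlers.each do |pattern, block|
      match = pattern.match(message)
      return block.call(match) if match
    end
    nil
  end
end

server = Server.run do
  handle(/hello/i) { "Hello from server" }
  handle(/name is (\w+)/) { |m| "Nice to meet you #{m[1]}!" }
end

puts server.dispatch("My name is Greg")  # Nice to meet you Greg!
```

The value returned by block.call is whatever the stored block returns, which is why dispatch can hand it straight back to the caller.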

##### Understand What method? and method! Mean

'?' lets you query an object and use the response in conditionals. The return value should always be some sort of logical boolean (the !! hack can be useful here to coerce values to their boolean representation).

'!' A common misconception is that we use the exclamation point when we want to let people know we are modifying the receiving object. Truthfully, the purpose of this convention is to mark a method as special. It doesn’t necessarily mean that it will be destructive or dangerous, but it means that it will require more attention than its alternative. This is why it doesn’t make much sense to have some method foo!() without a corresponding foo() method that does something similar. So essentially, if you have only one way of doing something destructive, write this:

class Message
  def destroy
    #...
  end
end

Instead of this:

class Message
  def destroy!
    #...
  end
end

##### Make Use of Custom Operators

In Ruby, most operators are actually just syntactic sugar for ordinary methods.

>> 1.+(3)
=> 4
>> [1,2,3].<<(4)
=> [1, 2, 3, 4]

Here is an example:

class Inbox
  attr_reader :unread_count
  def initialize
    @messages = []
    @unread_count = 0
  end
  
  def <<(msg)
    @unread_count += 1
    @messages << msg
    return self
  end
end

i = Inbox.new
=> #<Inbox:0x603290 @messages=[], @unread_count=0>
i << "foo" << "bar" << "baz"
=> #<Inbox:0x603290 @messages=["foo", "bar", "baz"], @unread_count=3>
i.unread_count
=> 3

A good habit to get into is to have your << method return the object itself, so the calls can be chained, as just shown.

Another good operator to know about is the spaceship operator ( <=> ), mainly because it allows you to make use of Comparable , which gives you a host of comparison methods: < , <= , == , != , >= , > , and between?() . The spaceship operator should return -1 if the current object is less than the object it is being compared to, 0 if it is equal, and 1 if it is greater. Most of Ruby’s core objects that can be meaningfully compared already have <=> implemented, so it’s often simply a matter of delegating to them, as shown here:

class Tree
  include Comparable

  attr_reader :age
  def initialize(age)
    @age = age
  end

  def <=>(other_tree)
    age <=> other_tree.age
  end
end

a = Tree.new(2)
=> #<Tree:0x5c9ba8 @age=2>
b = Tree.new(3)
=> #<Tree:0x5c7fb0 @age=3>
c = Tree.new(3)
=> #<Tree:0x5c63b8 @age=3>

a < b
=> true
b == c
=> true
c > a
=> true
c != a
=> true

You can, of course, override some of the individual operators that Comparable provides, but its defaults are often exactly what you need. Most operators you use in Ruby can be customized within your objects. Whenever you find yourself writing append() when you really want << , or add() when you really want + , consider using your own custom operators.

  • Use attr_reader , attr_writer , and attr_accessor whenever possible, and avoid writing your own accessors unless it is necessary.
  • Consider ending methods that are designed to be used in conditional statements with a question mark.
  • If you have a method foo() , and a similar method that does nearly the same thing but requires the user to pay more attention to what’s going on, consider calling it foo!() .
  • Don’t bother creating a method foo!() if there is not already a method called foo() that does the same thing with less severe consequences.
  • If it makes sense to do so, define custom operators for your objects.

Chapter 3: Mastering the Dynamic Tool Kit

### Blank Slate

A Blank Slate is an object without much of anything: a skinny class with a minimal number of methods. As it turns out, Ruby has a ready-made Blank Slate for you to use, called BasicObject. Inheriting from BasicObject is the quickest way to define a Blank Slate in Ruby.

BasicObject.instance_methods
=> [:==, :equal?, :!, :!=, :instance_eval, :instance_exec, :__send__]

These methods form the lowest common denominator for Ruby, so BasicObject is pretty reasonable in its offerings. The key thing to remember is that a BasicObject is fully defined by this limited set of features, so you shouldn’t expect anything more than that.
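
One common use for a Blank Slate is a proxy: because almost nothing is defined on it, nearly every call falls through to method_missing. Here is a minimal sketch, assuming a hypothetical RecordingProxy class with a made-up __calls__ accessor.

```ruby
# Records every message sent to it and forwards the message to a target.
# Inheriting from BasicObject means calls such as size or to_s are not
# answered by the proxy itself and end up in method_missing.
class RecordingProxy < BasicObject
  def initialize(target)
    @target = target
    @calls = []
  end

  # Hypothetical accessor for the recorded message names.
  def __calls__
    @calls
  end

  def method_missing(name, *args, &block)
    @calls << name
    @target.__send__(name, *args, &block)
  end
end

proxy = RecordingProxy.new([3, 1, 2])
puts proxy.sort.inspect       # [1, 2, 3]
puts proxy.size               # 3
puts proxy.__calls__.inspect  # [:sort, :size]
```

With an ordinary Object subclass, methods like inspect or to_s would be answered by the proxy and never reach the target.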

### Building Flexible Interfaces (DSL)

A flexible domain-specific interface strips away as much boilerplate code as possible so that every line expresses something meaningful in the context of our domain.

##### Making instance_eval() Optional

Code like this:

pdf = Prawn::Document.new
pdf.text "Hello World"
pdf.render_file "hello.pdf"

can be turned into code like this:

Prawn::Document.generate("hello.pdf") do
  text "Hello World"
end

by doing this:

class Prawn::Document
  def self.generate(file, *args, &block)
    pdf = Prawn::Document.new(*args)
    pdf.instance_eval(&block)
    pdf.render_file(file)
  end
end

However, there is a limitation that comes with this sort of interface. Because the block is evaluated in the context of a Document instance, self inside the block is the document. The block can still see the local variables of the scope where it was written, but not that scope's instance variables or methods. This means the following code won’t work:

class MyBestFriend
  def initialize
    @first_name = "Paul"
    @last_name = "Mouzas"
  end
  
  def full_name
    "#{@first_name} #{@last_name}"
  end
  
  def generate_pdf
    Prawn::Document.generate("friend.pdf") do
      text "My best friend is #{full_name}"
    end
  end
end

The problem is that blocks are generally closures, and you expect them to be full closures. It is not obvious at the point where you write the block that it might not be one. That is what happens when you use instance_eval: you reset the self of that block to something else. The block is still a closure over all local variables outside the block, but NOT over method calls or instance variables.

The solution to this problem is the following:

class Prawn::Document
  def self.generate(file, *args, &block)
    pdf = Prawn::Document.new(*args)
    block.arity < 1 ? pdf.instance_eval(&block) : block.call(pdf)
    pdf.render_file(file)
  end
end

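When the block accepts an argument, it is called as an ordinary closure and, as such, can access the instance methods and variables of the enclosing scope; the document is passed in as the block argument instead of becoming self. The call would now look like this: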

class MyOtherBestFriend
  def initialize
    @first_name = "Pete"
    @last_name = "Johansen"
  end
  
  def full_name
    "#{@first_name} #{@last_name}"
  end
  
  def generate_pdf
    Prawn::Document.generate("friend.pdf") do |doc|
      doc.text "My best friend is #{full_name}"
    end
  end
end
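
To try both calling styles without Prawn installed, here is a self-contained sketch. SimpleDoc and Friend are hypothetical stand-ins; only the arity check mirrors the code above.

```ruby
# Hypothetical stand-in for a document class; it only collects lines of text.
class SimpleDoc
  attr_reader :lines

  def self.generate(&block)
    doc = new
    # No block parameter: evaluate the block with the document as self.
    # One or more parameters: call it normally and pass the document in.
    block.arity < 1 ? doc.instance_eval(&block) : block.call(doc)
    doc
  end

  def initialize
    @lines = []
  end

  def text(string)
    @lines << string
  end
end

class Friend
  def full_name
    "Pete Johansen"
  end

  # self inside the block is the document, so full_name is not reachable here.
  def implicit_style
    SimpleDoc.generate { text "Plain text only" }
  end

  # self stays the Friend instance, so full_name works.
  def explicit_style
    SimpleDoc.generate { |doc| doc.text "My best friend is #{full_name}" }
  end
end

puts Friend.new.implicit_style.lines.first  # Plain text only
puts Friend.new.explicit_style.lines.first  # My best friend is Pete Johansen
```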

Another, arguably less clean, solution is to forward any method calls that the new context (the object on which instance_eval is executed) doesn't respond to back to the original context, like this:

# From: http://www.dan-manges.com/blog/ruby-dsls-instance-eval-with-delegation
class TableDefinition
  def evaluate(&block)
    @self_before_instance_eval = eval "self", block.binding
    instance_eval &block
  end
  
  def method_missing(method, *args, &block)
    @self_before_instance_eval.send method, *args, &block
  end
end

##### Handling Messages with method_missing() and send()

When you have a set of methods with a common name pattern like the following:

def stroke_some_method(*args)
  some_method(*args)
  stroke
end

which simply end up delegating to other methods (here some_method and stroke), you can intercept method calls dynamically with method_missing() and send() instead of defining so many similar methods. Here is how you can do it:

# Provides the following shortcuts:
#
#    stroke_some_method(*args) #=> some_method(*args); stroke
#    fill_some_method(*args) #=> some_method(*args); fill
#    fill_and_stroke_some_method(*args) #=> some_method(*args); fill_and_stroke
#
# See: https://gist.github.com/francisco-rojas/453961a0384869e6cee8
# for some basics on regex in ruby
#
def method_missing(id,*args,&block)
  case(id.to_s)
  when /^fill_and_stroke_(.*)/
    send($1,*args,&block); fill_and_stroke
  when /^stroke_(.*)/
    send($1,*args,&block); stroke
  when /^fill_(.*)/
    send($1,*args,&block); fill
  else
    super
  end
end

It’s important to note that when the patterns do not match, super is called. This allows objects up the chain to do their own method_missing handling, including the default, which raises a NoMethodError . This prevents something like pdf.the_shiny_kitty from failing silently, as well as the more subtle pdf.fill_circle .

Also, this code will happily accept pdf.fill_and_stroke_start_new_page or even pdf.stroke_stroke_stroke_line without complaining. Any time you use the method_missing hook, these are the trade-offs you must be willing to accept since making your hooks too robust would otherwise defeat the purpose of this.

This approach can also be used together with define_method() to make things cleaner and more efficient. See the section on method missing from https://gist.github.com/francisco-rojas/5bcbea8d1ad52a4ff451#file-chapter3-md for more details.
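
A minimal sketch of that combination follows, using a hypothetical Canvas class that only records what was called. Two things differ from the hook above and are assumptions of this sketch: it refuses prefixes whose base method does not exist, and it adds respond_to_missing? so that respond_to? agrees with method_missing.

```ruby
# Hypothetical canvas that records drawing commands as strings.
class Canvas
  PREFIXES = /\A(stroke|fill)_(.+)\z/

  attr_reader :log

  def initialize
    @log = []
  end

  def circle(radius)
    @log << "circle(#{radius})"
  end

  def stroke
    @log << "stroke"
  end

  def fill
    @log << "fill"
  end

  def method_missing(name, *args, &block)
    match = PREFIXES.match(name.to_s)
    # Only handle names whose base method really exists; otherwise defer to super.
    return super unless match && respond_to?(match[2])

    finisher, base = match[1], match[2]
    # Define a real method so later calls skip method_missing entirely.
    self.class.send(:define_method, name) do |*a, &b|
      send(base, *a, &b)
      send(finisher)
    end
    send(name, *args, &block)
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    match = PREFIXES.match(name.to_s)
    (match && respond_to?(match[2], include_private)) || super
  end
end

canvas = Canvas.new
canvas.stroke_circle(5)
puts canvas.log.inspect                      # ["circle(5)", "stroke"]
puts canvas.respond_to?(:fill_circle)        # true
puts Canvas.method_defined?(:stroke_circle)  # true
```

The stricter check trades some of the flexibility discussed above for earlier errors; whether that is worth it depends on how the hook is used.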

##### Dual-Purpose Accessors

When working with code that has an instance_eval-based interface, you need to disambiguate between local variables and method calls, which can ruin your style:

Prawn::Document.generate("accessors.txt") do
  self.font_size = 10
  text "The font size is now #{font_size}"
end

It’s possible to make this look much nicer, as you can see:

Prawn::Document.generate("accessors.txt") do
  font_size 10
  text "The font size is now #{font_size}"
end

We can use Ruby’s default argument syntax to determine whether we’re supposed to be getting or setting the attribute:

class Prawn::Document
  def font_size(size = nil)
    return @font_size unless size
    @font_size = size
  end
  alias_method :font_size=, :font_size
end

We use alias_method here instead of attr_writer to ensure there won’t be any difference between the following two lines of code:

pdf.font_size = 16
pdf.font_size(16)
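
The difference between the three spellings is easiest to see side by side. The sketch below uses a hypothetical Settings class in place of Prawn::Document; the accessor itself follows the definition above.

```ruby
class Settings
  # Evaluates the block with a new Settings instance as self.
  def self.configure(&block)
    settings = new
    settings.instance_eval(&block)
    settings
  end

  # Reader when called without an argument, writer when given one.
  def font_size(size = nil)
    return @font_size unless size
    @font_size = size
  end
  alias_method :font_size=, :font_size
end

local_only = Settings.configure { font_size = 10 }       # assigns a block-local variable
via_method = Settings.configure { font_size 10 }         # calls the dual-purpose method
via_self   = Settings.configure { self.font_size = 12 }  # calls the aliased writer

puts local_only.font_size.inspect  # nil
puts via_method.font_size          # 10
puts via_self.font_size            # 12
```

One limitation of this idiom is that nil and false cannot be assigned through the method form, because they are indistinguishable from "no argument given".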

Summary of tips given so far to build Domain-Specific Interfaces:

  • As mentioned in the previous chapter, using instance_eval is a good base for writing a domain-specific interface, but has some limitations.
  • You can use a Proc#arity check to provide the user with a choice between instance_eval and yielding an object.
  • If you want to provide shortcuts for certain sequences of method calls, or dynamic generation of methods, you can use method_missing along with send() .
  • When using method_missing , be sure to use super() to pass unhandled calls up the chain so they can be handled properly by other code, or eventually raise a NoMethodError .
  • Normal attribute writers don’t work well in instance_eval -based interfaces. Offer a dual-purpose reader/writer method, and then alias a writer to it, and both external and internal calls will be clear.

### Implementing Per-Object Behavior (building a simple stubbing system for use in testing)

Class methods are actually just per-object behavior on an instance of the class Class. The goal is to create a system that will generate canned responses to certain method calls, without modifying their original classes. This is an important feature, because we don’t want our stubbed method calls to have a global effect during testing.

The target interface will be something like this:

user = User.new
Stubber.stubs(:logged_in?, :for => user, :returns => true)
user.logged_in? #=> true

Each object hides its individual space for method definitions (called a singleton class) from plain view. However, we can reveal it by using a special syntax:

>> singleton = class << user; self; end
=> #<Class:#<User:0x40ed90>>

It's the same syntax used to define class methods:

class A
  class << self
    def foo
      "hi"
    end
    
    def bar
      "bar"
    end
  end
end

So when you write "class << user; self; end" , you’re just asking the object to give back its singleton class. With that in hand, we can define methods on it.

>> singleton = class << user; self; end
=> #<Class:#<User:0x40ed90>>
>> singleton.send(:define_method, :logged_in?) { true }
=> #<Proc:0x3fc1f4@(irb):20 (lambda)>
>> user.logged_in?
=> true
>> User.new.logged_in?
NoMethodError: undefined method 'logged_in?' for #<User:0x3f62f4>
from (irb):22
from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'

So the implementation of the Stub system would be:

module Stubber
  extend self
  def stubs(method, options={})
    singleton(options[:for]).send(:define_method, method) do |*a|
      options[:returns]
    end
  end
  
  def singleton(obj)
    class << obj; self; end
  end
end

>> user = User.new
=> #<User:0x445bec>
>> Stubber.stubs(:logged_in?, :for => user, :returns => true)
=> #<Proc:0x43faa8@(irb):28 (lambda)>
>> user.logged_in?
=> true
>> User.new.logged_in?
NoMethodError: undefined method 'logged_in?' for #<User:0x439fe0>
from (irb):40
from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'

Remember that the block passed to define_method() is a closure, which allows it to access the local variables of the enclosing scope. This is why we can pass the return value as a parameter to Stubber.stubs() and have it returned from our dynamically defined method.

  • Using per-object behavior usually makes the most sense when you don’t want to define something at the per-class level.
  • Objects in Ruby may have individually customized behaviors that can replace, supplement, or amend the functionality provided by their class definitions.
  • Per-object behavior (known as singleton methods), can be implemented by gaining access to the singleton class of an object using the class << obj notation.
  • define_method is private on singleton classes in Ruby versions before 2.5, so send() is needed to call it there (it has been public since Ruby 2.5).
  • When implementing nondynamic per-object behavior, the familiar def obj.some_method syntax may be used.
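
On Ruby 1.9 and later the same stubbing idea can be written without reaching for the singleton class by hand, since Object#define_singleton_method does that step for you. A minimal sketch, assuming an empty hypothetical User class:

```ruby
# Hypothetical class used only to have something to stub.
class User; end

module Stubber
  extend self

  # Defines the method on the singleton class of options[:for] only.
  # The block is a closure, so it can return options[:returns] later.
  def stubs(method, options = {})
    options[:for].define_singleton_method(method) { |*_args| options[:returns] }
  end
end

user = User.new
Stubber.stubs(:logged_in?, :for => user, :returns => true)

puts user.logged_in?                    # true
puts user.singleton_methods.inspect     # [:logged_in?]
puts User.new.respond_to?(:logged_in?)  # false
```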

##### Extending and Modifying Preexisting Code

### Adding new functionality

Although adding functionality to a class definition is usually considered safer than overriding functionality, it is not without dangers. If you can extend predefined objects for your own needs, so can everyone else, including any of the libraries you may depend on. One common problem when adding functionality is name clashes, where whatever code is loaded last takes precedence. Whenever you reopen a class to extend it, it is good practice to first check for existing methods with the same names as the ones you want to define, and to raise an error if any are found. For example:

class Numeric
  [:in, :ft].each do |e|
    if instance_methods.include?(e)
      raise "Method '#{e}' exists, PDF Conversions will not override!"
    end
  end
  
  def in
    self * 72
  end
  
  def ft
    self.in * 12
  end
end

This code defines the methods as expected if no methods with the same names exist already; otherwise it raises an explicit error so we know what is happening, instead of silently overriding the methods, which could cause unintended behaviour.

The ideal situation is for both libraries to use this technique, because then, regardless of the order in which they are required, the incompatibility between dependencies will be quickly spotted.

### Modification via Aliasing

You can use alias_method to make a new name point at an old method. This, of course, is where the feature gets its name: it lets you create aliases for your methods. But another interesting aspect of alias_method is that it doesn’t simply create a new name for a method; it makes a copy of it. The best way to show what this means is through a trivial code example:

# define a method
class Foo
  def bar
    "baz"
  end
end

f = Foo.new
f.bar #=> "baz"

# Set up an alias
class Foo
  alias_method :kittens, :bar
end

f.kittens #=> "baz"

# redefine the original method
class Foo
  def bar
    "Dog"
  end
end

f.bar     #=> "Dog"
f.kittens #=> "baz"

As you can see here, even when we override the original method bar() , the alias kittens() still points at the original definition. This turns out to be a tremendously useful feature.

This is how RubyGems patches the Kernel#require method using aliasing:

# custom_require.rb:
module Kernel
  alias_method :gem_original_require, :require
  def require(path) # :doc:
    gem_original_require path
  rescue LoadError => load_error
    if load_error.message =~ /#{Regexp.escape path}\z/ and
       spec = Gem.searcher.find(path) then
      Gem.activate(spec.name, "= #{spec.version}")
      gem_original_require path
    else
      raise load_error
    end
  end
end

This is a great example of responsible modification to a preexisting method. This code does not change the signature of the original method, nor does it change the possible return values or failure states. All it does is add some new intermediate functionality that will be transparent to the user if it is not needed. The extra code only runs when the original require fails, and if the fallback cannot help either, it raises the same error the original method would have raised.

It also has a bit of a limitation, in that you need to keep coming up with new aliases, as aliases are subject to collision just the same as ordinary methods are.

For example, although this code works fine:

class A
  def count
    "one"
  end
  alias_method :one, :count
  
  def count
    "#{one} two"
  end
  alias_method :one_and_two, :count
  
  def count
    "#{one_and_two} three"
  end
end
A.new.count #=> "one two three"  

if we rewrote it this way, we’d blow the stack:

class A
  def count
    "one"
  end
  alias_method :old_count, :count
  
  def count
    "#{old_count} two"
  end
  alias_method :old_count, :count
  
  def count
    "#{old_count} three"
  end
end

You can introduce infinite recursion by aliasing an old method twice to the same name. However, there is a workaround for this issue: per-object modification. If we move our modifications from the per-class level to the per-object level, we end up with a pretty nice solution that gets rid of aliasing entirely, and simply leverages Ruby’s ordinary method resolution path. Here is how:

class A
  def count
    "one"
  end
end

module AppendTwo
  def count
    "#{super} two"
  end
end

module AppendThree
  def count
    "#{super} three"
  end
end

a = A.new
a.extend(AppendTwo)
a.extend(AppendThree)
a.count #=> "one two three"

Provided that all the code used by your application employs this approach instead of aliased method chaining, you end up with two main benefits: a pristine original class and no possibility for collisions. Because the amended functionality is included at the instance level, rather than in the class definition, you don’t risk breaking other people’s code as easily, either.

Note that not every single object can be meaningfully extended this way. Any objects that do not allow you to access their singleton space cannot take advantage of this technique. This mostly applies to things that are immediate values, such as numbers and symbols. But more generally, if you cannot use a call to new() to construct your object, chances are that you won’t be able to use these tricks. In those cases, you’d need to revert to aliasing.
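
When a class-wide change is what you actually want, Ruby 2.0 and later offer Module#prepend, which also relies on super instead of aliases. This is not from the book, and unlike extend it affects every instance of the class; the sketch below uses a hypothetical Counter class.

```ruby
class Counter
  def count
    "one"
  end
end

module AppendTwo
  def count
    "#{super} two"
  end
end

module AppendThree
  def count
    "#{super} three"
  end
end

# Prepended modules sit in front of the class in the lookup chain.
# The most recently prepended module is consulted first.
Counter.prepend(AppendTwo)
Counter.prepend(AppendThree)

puts Counter.new.count                   # one two three
puts Counter.ancestors.first(3).inspect  # [AppendThree, AppendTwo, Counter]
```

Because each module calls super, no alias names are needed and the collision problem shown earlier cannot occur.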

Another example:

module PDFWriterMemoryPatch #:nodoc:
  unless self.class.instance_methods.include?("_post_transaction_rewind")
    def _post_transaction_rewind
      @objects.each { |e| e.instance_variable_set(:@parent,self) }
    end
  end
end

class Ruport::Formatter::PDF
  # other implementation details omitted.
  def pdf_writer
    @pdf_writer ||= PDF::Writer.new(:paper => paper_size || "LETTER", :orientation => paper_orientation || :portrait)
    @pdf_writer.extend(PDFWriterMemoryPatch)
  end
end

  • All classes in Ruby are open, which means that object definitions are never finalized, and new behaviors can be added at runtime.
  • To avoid clashes, conditional statements utilizing reflective features such as instance_methods and friends can be used to check whether a method is already defined before overwriting it.
  • When intentionally modifying code, alias_method can be used to make a copy of the original method to fall back on.
  • Whenever possible, per-object behavior is preferred. The extend() method comes in handy for this purpose.

###Building Classes and Modules Programmatically

#####Parameterized subclassing and conditional inheritance

def Mystery(secret)
  if secret == "chunky bacon"
    Class.new do
      def message
        "You rule!"
      end
    end
  else
    Class.new do
      def message
        "Don't make me cry"
      end
    end
  end
end

Notice here that we call Class.new() with a block that serves as its class definition. New anonymous classes are generated on every call.

class Win < Mystery "chunky bacon"
  def who_am_i
    "I am win!"
  end
end

class EpicFail < Mystery "smooth ham"
  def who_am_i
    "I am teh fail"
  end
end

a = Win.new
a.message #=> "You rule!"
a.who_am_i #=> "I am win!"

b = EpicFail.new
b.message #=> "Don't make me cry"
b.who_am_i #=> "I am teh fail"

We can see that Mystery() conditionally chooses which class to inherit from. Furthermore, the classes generated by Mystery() are anonymous, meaning they don’t have some constant identifier out there somewhere, and that the method is actually generating class objects, not just returning references to preexisting definitions. Finally, we can see that the subclasses behave ordinarily, in the sense that you can add custom functionality to them as needed.
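
To make the "anonymous" and "generated on every call" points concrete, here is a minimal sketch with a hypothetical `Greeter()` method built the same way as `Mystery()`. Each call returns a distinct class object that has no name until it is assigned to a constant.

```ruby
# Hypothetical parameterized-subclassing method, for illustration only.
def Greeter(language)
  Class.new do
    define_method(:greet) { language == :es ? "Hola" : "Hello" }
  end
end

first  = Greeter(:es)
second = Greeter(:es)

p first.name            # => nil, the class is anonymous
p first.equal?(second)  # => false, a new class object on every call

class Spanish < Greeter(:es); end

p Spanish.superclass.name  # => nil, the parent is still anonymous
p Spanish.new.greet        # => "Hola"
```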

Here is another example from the Ruport gem:

class MyReport < Fatty::Formatter
  required_params :first_name, :last_name

  helpers do
    def full_name
      "#{params[:first_name]} #{params[:last_name]}"
    end
  end

  format :txt do
    def render
      "Hello #{full_name} from plain text"
    end
  end

  format :pdf, :base => Prawn::FattyFormat do
    def render
      doc.text "Hello #{full_name} from PDF"
      doc.render
    end
  end
end

This code works because of a few methods that handle dynamic subclassing and the creation of anonymous classes and modules. Here they are:

# This class method is called with a format name and a block. It generates an
# anonymous subclass of Fatty::Format and stores it, keyed by extension name, in the
# formats hash. The optional options[:base] lets the format inherit from a class
# other than Fatty::Format.
def format(name, options={}, &block)
  formats[name] = Class.new(options[:base] || Fatty::Format, &block)
end

# modules can also be built up anonymously using a block
# here you can either pass a block to define the anonymous module or
# pass the name of the module you'd like to mixin
def helpers(helper_module=nil, &block)
  @helpers = helper_module || Module.new(&block)
end

def render(format, params={})
  validate(format, params)
  # This line uses the formats hash to look up our anonymous class by extension name.
  format_obj = formats[format].new
  # This line mixes in our helper module
  format_obj.extend(@helpers) if @helpers
  format_obj.params = params
  format_obj.validate
  format_obj.render
end

In summary:

  • Classes and modules can be instantiated like any other object. Both constructors accept a block that can be used to define methods as needed.
  • To construct an anonymous subclass, call Class.new(MySuperClass).
  • Parameterized subclassing can be used to add logic to the subclassing process, and essentially involves a method returning a class object, either anonymous or explicitly defined.

###Detecting Newly Added Functionality

You can detect when new methods are added to a class with the method_added() hook.

class Object
  class << self
    alias_method :blank_slate_method_added, :method_added
    # Detect method additions to Object and remove them in the
    # BlankSlate class.
    def method_added(name)
      # save the result of the original method_added method
      result = blank_slate_method_added(name)
      return result if self != Object
      # hide the method if the method is being added to Object
      BlankSlate.hide(name)
      # return the result of the original call
      result
    end
  end
end

You’d think that would do the trick, but as it turns out, Object includes the module Kernel. This means we need to track changes over there too, using nearly the same approach:

module Kernel
  class << self
    alias_method :blank_slate_method_added, :method_added
    
    # Detect method additions to Kernel and remove them in the
    # BlankSlate class.
    def method_added(name)
      result = blank_slate_method_added(name)
      return result if self != Kernel
      BlankSlate.hide(name)
      result
    end
  end
end

However, there is another problem: inclusion of modules into Object at runtime. Every module included in an object is like a back door for future expansion. So we end up jumping up one level higher to take care of module inclusion dynamically:

class Module
  alias blankslate_original_append_features append_features
  
  def append_features(mod)
    result = blankslate_original_append_features(mod)
    return result if mod != Object
    instance_methods.each do |name|
      BlankSlate.hide(name)
    end
    result
  end
end

In the case where a module is mixed into Object , BlankSlate needs to wipe out the instance methods added to its own class definition. After this, it returns the result of the original append_features() call.
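
Stripped of the BlankSlate details, the `method_added` hook can be seen in isolation with a hypothetical `Audited` class that simply records what gets defined on it. This is a sketch of the hook itself, not of BlankSlate's implementation.

```ruby
# Hypothetical class that records every instance method defined on it.
class Audited
  def self.added_methods
    @added_methods ||= []
  end

  # Ruby calls this hook after each instance method definition.
  def self.method_added(name)
    added_methods << name
    super
  end

  def first_feature; end
  def second_feature; end
end

# Singleton methods such as added_methods do not trigger method_added.
p Audited.added_methods  # => [:first_feature, :second_feature]
```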

###Tracking Inheritance

When you write unit tests via Test::Unit, you typically just subclass Test::Unit::TestCase, which figures out how to find your tests for you.

class SimpleTest < SimpleTestHarness
  def setup
    puts "Setting up @string"
    @string = "Foo"
  end

  def test_string_must_be_foo
    answer = (@string == "Foo" ? "yes" : "no")
    puts "@string == 'Foo': " << answer
  end
  
  def test_string_must_be_bar
    answer = (@string == "bar" ? "yes" : "no")
    puts "@string == 'bar': " << answer
  end
end
SimpleTestHarness.run

We must first identify each subclass as a test case, and store it in an array until SimpleTestHarness.run is called. Like Test::Unit and other common Ruby testing frameworks, we’ll wipe the slate clean by reinstantiating our tests for each test method, running a setup method if it exists. We will follow the Test::Unit convention and run only the methods whose names begin with test_.

class SimpleTestHarness
  class << self
    def inherited(base)
      tests << base
    end

    def tests
      @tests ||= []
    end

    def run
      tests.each do |t|
        t.instance_methods.grep(/^test_/).each do |m|
          test_case = t.new
          test_case.setup if test_case.respond_to?(:setup)
          test_case.send(m)
        end
      end
    end
  end
end

###Tracking Mixins

This common Ruby idiom automatically defines both instance methods and class methods in the class that includes the module:

module MyFeatures
  module ClassMethods
    def say_hello
      "Hello"
    end

    def say_goodbye
      "Goodbye"
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  def say_hello
    "Hello from #{self}!"
  end

  def say_goodbye
    "Goodbye from #{self}"
  end
end # MyFeatures

class A
  include MyFeatures
end
  • If you are making changes to any hooks at the top level, be sure to safely modify them via aliasing, so as not to globally break their behavior.
  • Hooks can be implemented on a particular class or module, and will catch everything below them.
  • Most hooks either capture a class, a module, or a name of a method and are executed after an event takes place. This means that it’s not really possible to intercept an event before it happens, but it is usually possible to undo one once it is.
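
As a usage sketch of the `included` hook, here is a hypothetical `Countable` module that follows the same ClassMethods idiom. The names are made up for illustration; the point is that one `include` gives the class both instance-level and class-level behavior.

```ruby
# Hypothetical module following the included/ClassMethods idiom.
module Countable
  module ClassMethods
    def instances_registered
      @instances_registered ||= 0
    end

    def track_instance
      @instances_registered = instances_registered + 1
    end
  end

  # Runs after `include Countable`; `base` is the including class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  def register
    self.class.track_instance
    self
  end
end

class Widget
  include Countable
end

Widget.new.register
Widget.new.register
p Widget.instances_registered  # => 2
```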

Chapter 4: Text Processing and File Management

#####Ruby Conditional Assignment

||=

x = nil               #=>nil
x ||= "default"       #=>"default" : x is replaced with "default" only because x was nil (or false)
x ||= "other"         #=>"default" : x is left unchanged because it is already neither nil nor false

&&=

x = nil               #=>nil   
x &&= "default"       #=>nil : value of x will be replaced with "default", but only if x IS NOT nil or false
x = "default"         #=>"default"
x &&= "Lorem Ipsum"   #=>"Lorem Ipsum"

#####Line-Based File Processing with State Tracking

The following two lines of code are equivalent:

# See: https://gist.github.com/francisco-rojas/453961a0384869e6cee8
name = line[/\bN\s+(\.?\w+)\s*;/, 1]
name = line =~ /\bN\s+(\.?\w+)\s*;/ && $1
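
A quick check of that equivalence on two sample lines. The lines are hypothetical, written in the style of a font metrics file; the real input format is whatever the linked gist uses.

```ruby
# Hypothetical sample lines; only the pattern comes from the notes.
pattern = /\bN\s+(\.?\w+)\s*;/
matching     = "C 32 ; WX 278 ; N space ; B 0 0 0 0 ;"
non_matching = "StartCharMetrics 315"

results = [matching, non_matching].map do |line|
  by_index = line[pattern, 1]        # String#[] with a capture group index
  by_match = line =~ pattern && $1   # match, then read the first capture
  [by_index, by_match]
end

p results  # => [["space", "space"], [nil, nil]]
```

Both forms return nil when the line does not match, which is what makes them interchangeable here.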

When dealing with a structured document that can be processed by disc

Chapter 5: Functional Programming Techniques

#####Laziness Can Be a Virtue (A Look at lazy.rb)

In essence, code is said to be evaluated lazily if it is executed only when its result is actually needed, not when it is defined. Proc objects in Ruby behave this way: the body runs only when the proc is called.

A powerful library for lazy evaluation is lazy.rb (http://moonbase.rydia.net/software/lazy.rb/). It can be used to avoid having to build special accessors for our instance variables using the ||= technique when what we’re ultimately doing is setting default values for them, which is normally something we do in our constructor.

Here is an example:

require "lazy"
class Cell
  FONT_HEIGHT = 10
  FONT_WIDTH = 8
  
  def initialize(text)
    @text = text
    @width = promise { calculate_width }
    @height = promise { calculate_height }
  end

  attr_accessor :text, :width, :height

  def to_s
    "Cell(#{width}x#{height})"
  end

  private

  def calculate_height
    @text.lines.count * FONT_HEIGHT
  end

  def calculate_width
    @text.lines.map { |e| e.length }.max * FONT_WIDTH
  end
end

All that promise() does is return a proxy object that wraps a block of code that is designed to be executed later. Once you call any method on this object, it passes the call along to whatever your block evaluates to.

Here is an example of a basic lazy promise object implementation:

module NaiveLazy
  class Promise < BasicObject
  
    def initialize(&computation)
      @computation = computation
    end

    def __result__
      if @computation
        @result = @computation.call
        @computation = nil
      end

      @result
    end 

    def inspect
      if @computation
        "#<NaiveLazy::Promise computation=#{ @computation.inspect }>"
      else
        @result.inspect
      end
    end

    def respond_to?( message )
      message = message.to_sym
      [:__result__, :inspect].include?(message) || __result__.respond_to?(message)
    end

    def method_missing(*a, &b)
      __result__.__send__(*a, &b)
    end
  end
end
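
To see the deferral and caching in action, here is a trimmed-down sketch. `SketchPromise` is a hypothetical stand-in written for this example, not lazy.rb's actual code, and the `runs` counter exists only to show when the block executes.

```ruby
# Hypothetical stand-in for a lazy promise; not the library's real code.
class SketchPromise < BasicObject
  def initialize(&computation)
    @computation = computation
  end

  # Run the block on first use, then keep only the cached result.
  def __result__
    if @computation
      @result = @computation.call
      @computation = nil
    end
    @result
  end

  # Forward every other message to the computed value.
  def method_missing(name, *args, &block)
    __result__.__send__(name, *args, &block)
  end
end

runs = 0
width = SketchPromise.new { runs += 1; "hello".length * 8 }

before = runs        # 0: nothing has been computed yet
total  = width + 2   # the first method call forces the block: 40 + 2
again  = width * 2   # served from the cached result: 40 * 2
p [before, total, again, runs]  # => [0, 42, 80, 1]
```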

#####Minimizing Mutable State and Reducing Side Effects

To write stateless side-effect-free code in Ruby we need to create a new object every single time an element gets added to an array. However, objects are large in Ruby, and constructing them is a slow process. What’s more, if we don’t store any of these intermediate values, we risk getting the garbage collector churning frequently to kill off our discarded objects.

Ruby does not enable tail call optimization by default, which makes deeply recursive functions inefficient and prone to exhausting the stack. However, an important thing to remember is that any recursive solution can be rewritten iteratively.

Avoiding side effects is different than avoiding mutable state entirely. In Ruby, as long as it makes sense to do so, avoiding side effects is a good thing. It reduces the possibility for unexpected bugs much in the same way that avoiding the use of global variables does. However, avoiding the use of mutable state definitely depends more on your individual situation.

If stateless (possibly recursive) code looks better than the alternatives, and performance is not a major concern, don’t be afraid to write your code in the more elegant way.

  • The simple way to avoid side effects in Ruby when transforming one object to another is to create a new object, and then populate it by iterating over your original object performing the necessary state transformations.
  • You can write stateless code in Ruby by creating new objects every time you perform an operation, such as Array#+.
  • Recursive solutions may aid in writing simple stateless solutions, but incur a major performance penalty in Ruby.
  • Creating too many objects can create performance problems as well, so it is important to find the right balance, and to remember that side effects can be avoided without making things fully stateless.
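
A short sketch of the distinction drawn above. The helper names are hypothetical; the contrast is between mutating the caller's array, building a new one with Array#+, and the middle ground of mutating only a local object.

```ruby
# Hypothetical helpers contrasting mutating and non-mutating approaches.
def append_mutating(list, item)
  list << item          # changes the caller's array in place
end

def append_pure(list, item)
  list + [item]         # builds and returns a new array
end

# Side-effect free without being fully stateless: only the local
# result array is mutated, never the argument.
def doubled(list)
  result = []
  list.each { |n| result << n * 2 }
  result
end

original = [1, 2, 3]
copy = append_pure(original, 4)
p original       # => [1, 2, 3]
p copy           # => [1, 2, 3, 4]
p doubled(copy)  # => [2, 4, 6, 8]

append_mutating(original, 4)
p original       # => [1, 2, 3, 4]
```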

#####Modular Code Organization

Functions can be unified into a single namespace by using modules, as in the Math module:

>> Math.sin(Math::PI / 2)
=> 1.0
>> Math.sqrt(4)
=> 2.0

You can implement code like that by using module_function:

module A
  module_function

  def foo
    "This is foo"
  end

  def bar
    "This is bar"
  end
end

which allows you to call functions directly on the module, like this:

>> A.foo
=> "This is foo"
>> A.bar
=> "This is bar"

However, this approach does come with some limitations, because it does not allow you to use private functions:

module A
  module_function

  def foo
    "This is foo calling baz: #{baz}"
  end

  def bar
    "This is bar"
  end

  private

  def baz
    "hi there"
  end
end

>> A.foo
NameError: undefined local variable or method 'baz' for A:Module
from (irb):33:in 'foo'
from (irb):46
from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'

However, modules in Ruby, although they cannot be instantiated, are in essence ordinary objects. Because of this, there is nothing stopping us from mixing a module into itself:

module A
  extend self

  def foo
    "This is foo calling baz: #{baz}"
  end

  def bar
    "This is bar"
  end

  private

  def baz
    "hi there"
  end
end

>> A.foo
=> "This is foo calling baz: hi there"

>> A.baz
NoMethodError: private method 'baz' called for A:Module
from (irb):65
from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'

Using this trick of extending a module with itself provides us with a structure that isn’t too different (at least on the surface) from the sort of modules you might find in functional programming languages. But aside from odd cases such as the Math module, you might wonder when this technique would be useful.

For the most part, classes work fine for encapsulating code in Ruby. Traditional inheritance combined with the powerful mixin functionality of modules covers most of the bases just fine. However, there are definitely cases in which a concept isn’t big enough for a class, but isn’t small enough to fit in a single function. Modules also fit when all you need is a set of callable methods, with none of the per-instance state that a class would store.

However, as soon as you see the same argument being passed to a bunch of functions, you might be running into a situation where some persistence of state wouldn’t hurt. The good news is, if need arises for expansion down the line, converting code that has been organized into a module into a class is somewhat trivial (just change the module keyword to class, remove the extend self statement, and add your own initializer if necessary).

Check the book for an example of when to use modules instead of classes

Modules introduce a clear separation of concerns that helps make testing much easier. They also leave room for future expansion and modification without tight coupling.

Here are a few things to watch for that indicate this technique may be the right way to go:

  • You are solving a single, atomic task that involves lots of steps that would be better broken out into helper functions.
  • You are wrapping some functions that don’t rely on much common state between them, but are related to a common topic.
  • The code is very general and can be used standalone or the code is very specific but doesn’t relate directly to the object that it is meant to be used by.
  • The problem you are solving is small enough where object orientation does more to get in the way than it does to help you.

Because modular code organization reduces the number of objects you are creating, it can potentially give you a decent performance boost. This offers an incentive to use this approach when it is appropriate.

#####Memoization

In Ruby, the trivial implementation of the Fibonacci sequence might look like this:

def fib(n)
  return n if (0..1).include? n
  fib(n-1) + fib(n-2)
end

However, you’ll feel the pain of relying on deep recursion in Ruby if you compute even modest values of n. Fortunately, there is a special characteristic of functions like this that makes it possible to speed them up drastically.

In mathematics, a function is said to be well defined if it consistently maps its input to exactly one output. This is obviously true for fib(n) , as fib(6) will always return 8 , no matter how many times you compute it. This sort of function is distinct from one that is not well defined, such as the following:

def mystery(n)
  n + rand(1000)
end

>> mystery(6)
=> 928
>> mystery(6)
=> 671
>> mystery(6)
=> 843

If we run this code a few times with the same n , we see there isn’t a unique relationship between its input and output.

When we have a function like this, there isn’t much we can assume about it. However, well-defined functions such as fib(n) can get a massive performance boost almost for free.

If your mind wandered to tail-call optimization or rewriting the function iteratively, you’re thinking too hard. However, the idea of reducing the number of recursive calls is on track. As it stands, this code is a bad dream, as fib(n) is called five times when n = 3 and nine times when n = 4, with this trend continuing upward as n gets larger.
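
Those call counts are easy to confirm with an instrumented copy of the naive function. `counted_fib` and the `$calls` global are hypothetical names added only for counting.

```ruby
# Instrumented copy of the naive fib; $calls is a hypothetical counter.
$calls = 0

def counted_fib(n)
  $calls += 1
  return n if (0..1).include? n
  counted_fib(n - 1) + counted_fib(n - 2)
end

counts = {}
[3, 4, 10].each do |n|
  $calls = 0
  counted_fib(n)
  counts[n] = $calls
end

p counts  # => {3=>5, 4=>9, 10=>177} (hash formatting varies by Ruby version)
```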

The key realization is that: fib(6) is always going to be 8 , and fib(10) is always going to be 55 . Because of this, we can store these values rather than calculate them repeatedly.

From Introduction to algorithms / Thomas H. Cormen . . . [et al.].—3rd ed. 
Chapter 15, Dynamic Programming

Dynamic programming applies when the subproblems overlap—that is, when subproblems 
share subsubproblems. In this context, a divide-and-conquer algorithm does more work than 
necessary, repeatedly solving the common subsubproblems. A dynamic-programming algorithm 
solves each subsubproblem just once and then saves its answer in a table, thereby avoiding 
the work of recomputing the answer every time it solves each subsubproblem.
We typically apply dynamic programming to optimization problems. Such problems can have many 
possible solutions. Each solution has a value, and we wish to find a solution with the 
optimal (minimum or maximum) value. We call such a solution an optimal solution to the 
problem, as opposed to the optimal solution, since there may be several solutions that 
achieve the optimal value. When developing a dynamic-programming algorithm, we follow a 
sequence of four steps:

1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution, typically in a bottom-up fashion.
4. Construct an optimal solution from computed information.

Steps 1–3 form the basis of a dynamic-programming solution to a problem. If we
need only the value of an optimal solution, and not the solution itself, then we
can omit step 4. When we do perform step 4, we sometimes maintain additional
information during step 3 so that we can easily construct an optimal solution.

Let’s give that a shot and see what happens:

@series = []
def fib(n)
  return n if (0..1).include? n
  @series[n] ||= fib(n-1) + fib(n-2)
end

What we have done is used a technique called memoization to cache the return values of our function based on its input. Because we were caching a sequence, it’s reasonable to use an array here, but in other cases in which the data is more sparse, a hash may be more appropriate.
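
For comparison, a hash-backed variant might look like the sketch below. It uses the block form of Hash.new so the cache fills itself on a miss; the local name `fib` is reused here as a Hash rather than a method, which is a choice made for this example only.

```ruby
# Sketch: a self-populating Hash used as the memoization table.
fib = Hash.new do |cache, n|
  cache[n] = n < 2 ? n : cache[n - 1] + cache[n - 2]
end

p fib[10]   # => 55
p fib[40]   # => 102334155
p fib.size  # => 41, one cached entry for each n from 0 to 40
```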

Another candidate for memoization is the following pair of functions, which convert RGB values to their equivalent hexadecimal values and vice versa:

def rgb2hex(rgb)
  # see Kernel#format and String#% for more details
  rgb.map { |e| "%02x" % e }.join
end

def hex2rgb(hex)
  r,g,b = hex[0..1], hex[2..3], hex[4..5]
  [r,g,b].map { |e| e.to_i(16) }
end

>> rgb2hex([100,25,254])
=> "6419fe"
>> hex2rgb("6419fe")
=> [100, 25, 254]

Although these methods aren’t especially complicated, they represent a decent use case for caching via memoization. Colors are likely to be reused frequently and, after they have been translated once, will never change. Therefore, rgb2hex() and hex2rgb() are well-defined functions.

As it turns out, Ruby’s Hash is a truly excellent cache object. Here is the memoized version:

def rgb2hex_manual_cache(rgb)
  @rgb2hex ||= Hash.new do |colors, value|
    colors[value] = value.map { |e| "%02x" % e }.join
  end

  @rgb2hex[rgb]
end

def hex2rgb_manual_cache(hex)
  @hex2rgb ||= Hash.new do |colors, value|
    r,g,b = value[0..1], value[2..3], value[4..5]
    colors[value] = [r,g,b].map { |e| e.to_i(16) }
  end

  @hex2rgb[hex]
end

When running in a tight loop, memoization can make a big difference in these functions, and may be worth the minimal noise introduced by adding a Hash into the mix.

The Memoizable module is designed to abstract the task of creating a cache to the point at which you simply mark each function that should be memoized, similar to the way you mark something public or private.

include Memoizable

def rgb2hex(rgb)
  rgb.map { |e| "%02x" % e }.join
end

memoize :rgb2hex

def hex2rgb(hex)
  r,g,b = hex[0..1], hex[2..3], hex[4..5]
  [r,g,b].map { |e| e.to_i(16) }
end

memoize :hex2rgb

Memoizable works by making a copy of your function, renaming it to __unmemoized_method_name__, and then injecting its automatic caching in place of the original function. That means that when we call rgb2hex() or hex2rgb(), we’ll now be hitting the cached versions of the functions.

This is pretty exciting, as it means that for well-defined functions, you can use Memoizable to get a performance boost without even modifying your underlying implementation.

Although Memoizable is predictably slower than our raw implementation, it is still cooking with gas when compared to the uncached versions of our functions. What we are seeing here is the overhead of an additional method call per request, so as the operation becomes more expensive, the cost of Memoizable actually gets lower. Also, if we look at things in terms of work versus payout, Memoizable is the clear winner, due to its ability to transparently hook itself into your functions.

Here is the implementation of Memoizable:

module Memoizable
  def memoize( name, cache = Hash.new )
    original = "__unmemoized_#{name}__"

    ([Class, Module].include?(self.class) ? self : self.class).class_eval do
      alias_method original, name

      private original
      define_method(name) { |*args| cache[args] ||= send(original, *args) }
    end
  end
end
  • Functions that are well defined, where a single input consistently produces the same output, can be cached through memoization.
  • Memoization often trades CPU time for memory, storing results rather than recalculating them. As a result, memoization is best used when memory is cheap and CPU time is costly, and not the other way around. In some cases, even when the memory consumption is negligible, the gains can be substantial. We can see this in the fib(n) example, which is transformed from an exponential algorithm to a linear one simply by storing the intermediate calculations.
  • When coding your own solution, Hash.new’s block form can be a very handy way of putting together a simple caching object.
  • James Gray’s Memoizable module makes it trivial to introduce memoization to well-defined functions without directly modifying their implementations, but incurs a small cost of indirection over an explicit caching strategy.
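
Here is a usage sketch that shows the cache being hit. The module is repeated (with `private original` on a single line) so the example runs on its own; the `ColorTable` class and its `conversions` counter are hypothetical and exist only to count how often the real work happens.

```ruby
# Copy of the Memoizable module from these notes, repeated so the
# sketch is self-contained.
module Memoizable
  def memoize(name, cache = Hash.new)
    original = "__unmemoized_#{name}__"

    ([Class, Module].include?(self.class) ? self : self.class).class_eval do
      alias_method original, name
      private original
      define_method(name) { |*args| cache[args] ||= send(original, *args) }
    end
  end
end

# Hypothetical class; the counter only demonstrates the caching.
class ColorTable
  extend Memoizable

  attr_reader :conversions

  def initialize
    @conversions = 0
  end

  def rgb2hex(rgb)
    @conversions += 1
    rgb.map { |e| "%02x" % e }.join
  end

  memoize :rgb2hex
end

table = ColorTable.new
3.times { table.rgb2hex([100, 25, 254]) }
p table.rgb2hex([100, 25, 254])  # => "6419fe"
p table.conversions              # => 1
```

Note that in this sketch the cache belongs to the memoized method, so it is shared by every instance of the class.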

#####Infinite Lists

Infinite lists (also known as lazy streams) provide a way to represent arbitrary sequences that can be traversed by applying a function that gets you the next element for any given element in the list. For example, if we start with any even number, we can get to the next one in the sequence by simply adding 2 to our original element.

module EvenSeries
  class Node
    def initialize(number=0)
      @value = number
      @next = lambda { Node.new(number + 2) }
    end

    attr_reader :value

    def next
      @next.call
    end
  end
end

e = EvenSeries::Node.new(30)
10.times do
  p e.value
  e = e.next
end

=>30
32
34
36
38
40
42
44
46
48

The key innovation is that we’ve turned an external iteration and state transformation into an internal one.

  • Infinite lists essentially consist of nodes that contain a value along with a procedure that will transform that value into the next element in the sequence.
  • Infinite lists are lazily evaluated, and thus are sometimes called lazy streams.
  • An infinite list might be an appropriate structure to use when you need to iterate over a sequential list in groups at various points in time, or if you have a general function that can be tweaked by some parameters to fit your needs.
  • For data that is sparse, memoization might be a better technique than using an infinite list.
  • When you need to do filtering or state transformation on a long sequence of elements that have a clear relationship from one to the next, a lazy stream might be the best way to go.
  • JEG2’s lazy_stream.rb provides a generalized implementation of infinite lists that is worth taking a look at if you have a need for this sort of thing. See: http://graysoftinc.com/higher-order-ruby/infinite-streams
  • Also check Section 7: Doing Something Cool with Closures, at https://innig.net/software/ruby/closures-in-ruby.html, to see how to build a lazily evaluated data structure containing all of the Fibonacci numbers.
  • See http://nithinbekal.com/posts/ruby-tco/ for more details on tail call optimization and how to enable TCO in Ruby. One major problem with TCO is that it messes up stack traces, and therefore makes debugging harder. Ruby does not enable it by default, but lets you turn it on optionally.
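
As an aside that is not from the book: Ruby 2.0 and later ship with Enumerator::Lazy, which covers many of the same filtering and transformation cases without a hand-written node class. A minimal sketch, assuming a Ruby version that has it:

```ruby
# Built-in lazy enumeration, shown as an alternative to the Node class.
evens = (30..Float::INFINITY).lazy.select(&:even?)

first_five = evens.first(5)
p first_five  # => [30, 32, 34, 36, 38]

# Stages are chained lazily; only as many elements as needed are produced.
squares_divisible_by_three =
  evens.map { |n| n * n }.select { |n| (n % 3).zero? }.first(3)
p squares_divisible_by_three  # => [900, 1296, 1764]
```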

#####Higher-Order Procedures

Currying

In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument. In other words, currying converts a single function of n arguments into n functions with a single argument each.

Said another way, currying means breaking a function with many arguments into a series of functions that each take one argument and ultimately produce the same result as the original function.

Given the following function:

function f(x,y,z) { z(x(y));}

When curried, becomes:

function f(x) { lambda(y) { lambda(z) { z(x(y)); } } }

In order to get the full application of f(x,y,z), you need to do this:

f(x)(y)(z);

Many functional languages let you write f x y z. If you only call f x y or f(x)(y), you get a partially applied function: the return value is the closure lambda(z){z(x(y))}, with the values of x and y that were passed in already bound.

Check this function f which takes 3 params x,y,z

f(x,y,z) = 4*x+3*y+2*z

Currying means that we can rewrite the function as a composition of 3 functions(a function for each param):

f(x)(y)(z) = 2*z+(3*y+(4*x))

The direct use of this is what is called partial application: if you have a function that accepts n parameters, you can generate from it one or more functions with some parameter values already filled in.

Currying and partial application are often confused to be the same when in fact they are not. Where partial application takes a function and from it builds a function which takes fewer arguments, currying builds functions which take multiple arguments by composition of functions which each take a single argument.
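
The difference can be sketched in Ruby with the same 4*x+3*y+2*z function. The lambda names are made up for illustration, and the curried form is written by hand so no library call hides what is going on.

```ruby
f = ->(x, y, z) { 4 * x + 3 * y + 2 * z }

# Currying: three nested one-argument lambdas.
curried = ->(x) { ->(y) { ->(z) { f.(x, y, z) } } }

# Partial application: fix x and y, leaving one function of z.
partial = ->(z) { f.(1, 2, z) }

p f.(1, 2, 3)          # => 16
p curried.(1).(2).(3)  # => 16
p partial.(3)          # => 16
```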

In Ruby, Proc#curry returns a curried proc. If the optional arity argument is given, it determines the number of arguments. A curried proc receives some arguments. If a sufficient number of arguments are supplied, it passes the supplied arguments to the original proc and returns the result. Otherwise, it returns another curried proc that takes the rest of the arguments.

b = proc {|x, y, z| (x||0) + (y||0) + (z||0) }
b.curry.(1,2,3)              #=> 6
p b.curry[1][2][3]           #=> 6
p b.curry[1].(2).call(3)     #=> 6
p b.curry[1, 2][3, 4]        #=> 6
p b.curry(5)[1][2][3][4][5]  #=> 6
p b.curry(5)[1, 2][3, 4][5]  #=> 6
p b.curry(1)[1]              #=> 1
p b.curry(2)[1][2]           #=> 3

b = lambda {|x, y, z| (x||0) + (y||0) + (z||0) }
p b.curry[1][2][3]           #=> 6
p b.curry[1, 2][3, 4]        #=> wrong number of arguments (4 for 3)
p b.curry(5)                 #=> wrong number of arguments (5 for 3)
p b.curry(1)                 #=> wrong number of arguments (1 for 3)


For:
b = proc {|x, y, z| (x||0) + (y||0) + (z||0) }

Curry generates 3 functions (partial applications), each receiving one parameter, as follows:
curried = b.curry

# this calls the first partial application function
partial_application1 = curried.(1)              #=> #<Proc:0x00000001113388>
# this calls the second partial application function
partial_application2 = partial_application1.(2) #=> #<Proc:0x00000001335358>
# this calls the third (and last) partial application function, thus returning the final result
result = partial_application2.(3)               #=> 6

Here is another example:

sum = lambda do |f,a,b|
  s = 0 ; a.upto(b){|n| s += f.(n) } ; s
end

# generate the currying
currying = sum.curry
 
# Generate the partial functions
sum_ints = currying.(lambda{|x| x})
sum_of_squares = currying.(lambda{|x| x**2})
sum_of_powers_of_2 = currying.(lambda{|x| 2**x})
 
puts sum_ints.(1,5) #=> 15
puts sum_ints.(1).(5) #=> 15
puts sum_of_squares.(1,5) #=> 55
puts sum_of_powers_of_2.(1,5) #=> 62

Higher-Order Procedures

In mathematics and computer science, a higher-order function is a function that does at least one of the following:

  • takes one or more functions as an input
  • outputs a function

A function is said to be a higher-order function if it accepts another function as input or returns a function as its output.

In Ruby, to_proc is a generic hook: the & operator calls to_proc on whatever object it is given. This means Symbol#to_proc isn’t special, and we can build our own custom objects that do even cooler tricks than it does.

The place I use this functionality all the time is in Rails applications where I need to build up filter mechanisms that do some of the work in SQL, and the rest in Ruby.

Here is the general pattern I usually start with:

class Filter
  def initialize
    @constraints = []
  end

  def constraint(&block)
    @constraints << block
  end

  def to_proc
    lambda { |e| @constraints.all? { |fn| fn.call(e) } }
  end
end

We can then construct a Filter object and assign constraints to it on the fly:

filter = Filter.new
filter.constraint { |x| x > 10 }
filter.constraint { |x| x.even? }
filter.constraint { |x| x % 3 == 0 }

Now, when dealing with an Enumerable object, it is easy to filter the data based on our constraints:

p (8..24).select(&filter) #=> [12,18,24]

As we add more constraints, new blocks are generated for us, so things work as expected:

filter.constraint { |x| x % 4 == 0 }
p (8..24).select(&filter) #=> [12,24]

As you can see, Symbol#to_proc isn’t the only game in town. Any object that can meaningfully be reduced to a function can implement a useful to_proc method.

Chapter 6: When things go wrong

Note:

[].all? #=> true
[].all? {|e| !e.nil? } #=> true
[].all? { false } #=> true

From http://stackoverflow.com/questions/16662727/why-does-all-return-true-on-an-empty-array: all? returns true for an empty collection (array, hash, etc.) because there are no elements to iterate over, so the block is never executed. Since no element ever makes the block return a falsy value, all? returns true.

Here is the Rubinius implementation of all?

def all?
  if block_given?
    each { |*e| return false unless yield(*e) }
  else
    each { return false unless Rubinius.single_block_arg }
  end
  true
end
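
A short sketch of how this plays out next to any? and none?, plus one way to guard against it. The `all_positive?` helper is hypothetical, for cases where "all" should also imply "at least one".

```ruby
empty = []

p empty.all?  { |e| e > 100 }  # => true, no element fails the test
p empty.any?  { |e| e > 100 }  # => false, no element passes the test
p empty.none? { |e| e > 100 }  # => true

# Hypothetical guard that rejects empty input explicitly.
def all_positive?(list)
  !list.empty? && list.all? { |n| n > 0 }
end

p all_positive?([])      # => false
p all_positive?([1, 2])  # => true
```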

A process for debugging Ruby code

  1. First, identify the different scenarios that apply to a given feature.
  2. Enumerate over these scenarios to identify which ones are affected by defects and which ones work as expected. This can be done in many ways, ranging from printing debugging messages on the command line to logfile analysis and live application testing. The important thing is to identify and isolate the cases affected by the bug.
  3. Hop into irb if possible and take a look at what your objects actually look like under the hood. Experiment with the failing scenarios in a step-by-step fashion to try to dig down and uncover the root cause of problems.
  4. Write tests to reproduce the problems you are having, along with what you expect to happen when the issue is resolved.
  5. Implement a fix that passes the tests, and then repeat the process until all issues are resolved.

Capturing the Essence of a Defect

The main idea is that if you remove all the extraneous code that is unrelated to the issue, it will be easier to see what is really going on. As you continue to investigate an issue, you may discover that you can reduce the example more and more based on what you learn.

Most bugs aren’t going to show up in the first place you look. Instead, they’ll often be hidden farther down the chain, stashed away in some low-level helper method or in some other code that your feature depends on.

Whenever you are hunting for bugs, the practice of reducing your area of interest first will help you avoid dead ends and limit the number of possible places in which you’ll need to look for problems. Before doing any formal investigation, it’s a good idea to check for obvious problems so that you can get a sense of where the real source of your defect is. Some bugs are harder to catch on sight than others, but there is no need to overthink the easy ones.

The main benefit of an automated test is that it will explode when your code fails to act as expected. It is important to keep in mind that even if you have an existing test suite, when you encounter a bug that does not cause any failures, you need to update your tests. This helps prevent regressions, allowing you to fix a bug once and forget about it.

Once we write a test that reproduces our problem, the way we fix it is to get our tests passing again. If other tests end up breaking in order to get our new test to pass, we know that something is still wrong. If for some reason our problem isn’t solved when we get all the tests passing again, it means that our reduced example probably didn’t cover the entirety of the problem, so we need to go back to the drawing board in those cases. Even still, not all is lost. Each test serves as a significant reduction of your problem space. Every passing assertion eliminates the possibility of that particular issue from being the root of your problem. Sooner or later, there won’t be any place left for your bugs to hide.

Scrutinizing Your Code

Utilizing Reflection

We can infer a lot about an object by using Ruby’s reflective capabilities:

obj.class
obj.instance_variables
obj.private_methods
obj.private_methods(false) # false: leave out inherited methods
obj.instance_variable_get(:@some_var)
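To see these calls against something concrete, here is a small sketch. The `Account` class is hypothetical and only exists to give the reflection methods something to report on.

```ruby
# Hypothetical class, used only to have something to inspect.
class Account
  def initialize(owner)
    @owner = owner
    @balance = 0
  end

  private

  def audit!
    :audited
  end
end

acct = Account.new("Alice")

p acct.class                          # the object's class
p acct.instance_variables             # names of its instance variables
p acct.private_methods(false)         # private methods, inherited ones left out
p acct.instance_variable_get(:@owner) # read state without needing an accessor
```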

Improving inspect Output

When you inspect a Ruby object, it might look like this:

#<Prawn::Document:0x12cf17c @page_content=#<Prawn::Reference:0x12cecf4
@data={:Length=>0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 r
g\n0.000 0.000 0.000 RG\nq\n", @compressed=false>, @info=
#<Prawn::Reference:0x12cf0c8 @data={:Creator=>"Prawn", :Producer=>"Prawn"},
@gen=0, @identifier=1, @compressed=false>
, @root=#<Prawn::Reference:0x12cf064 @data={:Type=>:Catalog, :Pages=>
#<Prawn::Reference:0x12cf08c @data={:Count=>1, :Kids=>[#<Prawn::Reference:0x12ceca4
@data={:Contents=>#<Prawn::Reference:0x12cecf4
@data={:Length=>0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 rg\n0.000
0.000 0.000 RG\nq\n",
<< ABOUT 50 MORE LINES LIKE THIS >>
#<Prawn::Reference:0x12cf08c @data={:Count=>1, :Kids=>[#<Prawn::Reference:0x12ceca4
...>], :Type=>:Pages}, @gen=0, @identifier=2, @compressed=false>,
:MediaBox=>[0, 0, 612.0, 792.0]}, @gen=0, @identifier=5, @compressed=false>],
@margin_box=#<Prawn::Document::BoundingBox:0x12ced30 @width=540.0,
@y=756.0, @x=36, @parent=#<Prawn::Document:0x12cf17c ...>, @height=720.0>,
@fill_color="000000", @current_page=#<Prawn::Reference:0x12ceca4 @data={:Contents=>
#<Prawn::Reference:0x12cecf4 @data={:Length=>0}, @gen=0, @identifier=4,
@stream="0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\n", @compressed=false>,
:Type=>:Page, :Parent=>#<Prawn::Reference:0x12cf08c @data={:Count=>1,
:Kids=>[#<Prawn::Reference:0x12ceca4 ...>], :Type=>:Pages},
@gen=0, @identifier=2, @compressed=false>, :MediaBox=>[0, 0, 612.0, 792.0]},
@gen=0, @identifier=5, @compressed=false>, @skip_encoding=nil,
@bounding_box=#<Prawn::Document::BoundingBox:0x12ced30 @width=540.0, @y=756.0, @x=36,
@parent=#<Prawn::Document:0x12cf17c ...>, @height=720.0>, @page_size="LETTER",
@stroke_color="000000" , @text_options={}, @compress=false, @margins={:top=>36,
:left=>36, :bottom=>36, :right=>36}>

The whole situation here would be better if we had easier-to-read inspect output. There is actually a standard library called pp that improves the formatting of inspect while operating in a very similar fashion.
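As a quick sketch of the difference, the following compares plain inspect output with pp output for a nested hash. The `config` data is made up, and the width of 40 columns is chosen here only to force wrapping.

```ruby
require "pp"

config = {
  margins: { top: 36, left: 36, bottom: 36, right: 36 },
  page_size: "LETTER",
  colors: { fill: "000000", stroke: "000000" }
}

compact = config.inspect          # everything on one line
wrapped = PP.pp(config, +"", 40)  # PP.pp appends to the given object and returns it

puts compact
puts wrapped
```

In everyday use you would simply call `pp config`; collecting into a string is just a convenient way to compare the two forms.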

The output of Kernel#p can be improved on an object-by-object basis. This may be obvious if you have used Object#inspect before, but it is also a severely underused feature of Ruby.

You can turn the previous output into:

>> pdf = Prawn::Document.new
=> < Prawn::Document:0x27df8a:
  @background: nil
  @compress: false
  @fill_color: "000000"
  @font_size: 12
  @margins: {:left=>36, :right=>36, :top=>36, :bottom=>36}
  @page_layout: :portrait
  @page_size: "LETTER"
  @skip_encoding: nil
  @stroke_color: "000000"
  @text_options: {}
  @y: 756.0
  @bounding_box -> Prawn::Document::BoundingBox:0x27dd64
  @current_page -> Prawn::Reference:0x27dd1e
  @info -> Prawn::Reference:0x27df44
  @margin_box -> Prawn::Document::BoundingBox:0x27dd64
  @objects -> Array:0x27df6c
  @page_content -> Prawn::Reference:0x27dd46
  @pages -> Prawn::Reference:0x27df26
  @root -> Prawn::Reference:0x27df12 >  

which is way easier to read. To accomplish this, here is a template that allows you to pass in a couple of arrays of symbols that point at instance variables:

module InspectTemplate
  def __inspect_template(objs, refs)
    obj_output = objs.sort.each_with_object("") do |v,out|
      out << "\n #{v}: #{instance_variable_get(v).inspect}"
    end

    ref_output = refs.sort.each_with_object("") do |v,out|
      ref = instance_variable_get(v)
      out << "\n #{v} -> #{__inspect_object_tag(ref)}"
    end

    "< #{__inspect_object_tag(self)}: #{obj_output}\n#{ref_output} >"
  end

  def __inspect_object_tag(obj)
    "#{obj.class}:0x#{obj.object_id.to_s(16)}"
  end
end

After mixing this into Prawn::Document, I need only to specify which variables I want to display the entire contents of, and which I want to just show as references. Then, it is as easy as calling __inspect_template with these values:

class Prawn::Document
  include InspectTemplate
  def inspect
    objs = [:@page_size, :@page_layout, :@margins, :@font_size, :@background,
            :@stroke_color, :@fill_color, :@text_options, :@y, :@compress,
            :@skip_encoding]
    refs = [:@objects, :@info, :@pages, :@bounding_box, :@margin_box,
            :@page_content, :@current_page, :@root]
    __inspect_template(objs,refs)
  end
end

Once we provide a customized inspect method that returns a string, both Kernel#p and irb will pick up on it, yielding the nice results shown earlier.

The YAML data serialization standard library has the nice side effect of producing highly readable representations of Ruby objects. Because of this, it actually provides a Kernel#y method that can be used as a stand-in replacement for p. Although this may be a bit strange, if you look at it in action, you’ll see that it has some benefits:

>> require "yaml"
=> true
>> y Prawn::Document.new
--- &id007 !ruby/object:Prawn::Document
background:
bounding_box: &id002 !ruby/object:Prawn::Document::BoundingBox
  height: 720.0
  parent: *id007
  width: 540.0
  x: 36
  y: 756.0
compress: false
info: &id003 !ruby/object:Prawn::Reference
  compressed: false
  data:
    :Creator: Prawn
    :Producer: Prawn
  gen: 0
  identifier: 1
  on_encode:
margin_box: *id002
margins:
  :left: 36
  :right: 36
  :top: 36
  :bottom: 36
page_content: *id005
page_layout: :portrait
page_size: LETTER
pages: *id004
root: *id006
skip_encoding:
stroke_color: "000000"
text_options: {}

y: 756.0
=> nil

YAML automatically truncates repeated object references by referring to them by ID only. This turns out to be especially good for tracking down a certain kind of Ruby bug:

>> a = Array.new(6)
=> [nil, nil, nil, nil, nil, nil]
>> a = Array.new(6,[])
=> [[], [], [], [], [], []]
>> a[0] << "foo"
=> ["foo"]
>> a
=> [["foo"], ["foo"], ["foo"], ["foo"], ["foo"], ["foo"]]
>> y a
---
- &id001
  - foo
- *id001
- *id001
- *id001
- *id001
- *id001

Here, it’s easy to see that the six subarrays that make up our main array are actually just six references to the same object. And in case that wasn’t the goal, we can see the difference when we have six distinct objects very clearly in YAML:

>> a = Array.new(6) { [] }
=> [[], [], [], [], [], []]
>> a[0] << "foo"
=> ["foo"]
>> a
=> [["foo"], [], [], [], [], []]
>> y a
---
- - foo
- []
- []
- []
- []
- []
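If YAML output is not at hand, the same aliasing can be checked directly by counting distinct object IDs. A minimal sketch (the variable names are mine):

```ruby
shared   = Array.new(3, [])     # one array object, referenced three times
distinct = Array.new(3) { [] }  # the block runs once per slot

shared_ids   = shared.map(&:object_id).uniq.size
distinct_ids = distinct.map(&:object_id).uniq.size

shared[0] << "foo"
distinct[0] << "foo"

p shared_ids, distinct_ids
p shared, distinct
```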

Finding Needles in a Haystack

If you have a big collection of objects, some of which may be corrupted, you can use the following code to identify the corrupted records and decide what to do based on that:

data.select.with_index do |e,i|
  begin
    Integer(e[:payment]) > 1000
  rescue ArgumentError
    p [e,i]
    raise # optionally comment this line to identify all corrupted records
  end
end
[{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942",
:payment=>"1991.25"}, 91]
ArgumentError: invalid value for Integer: "1991.25"
from (irb):67:in 'Integer'
from (irb):67:in 'block in irb_binding'
from (irb):65:in 'select'
from (irb):65:in 'with_index'
from (irb):65
from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'
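A variation on the same idea, sketched here with made-up sample data, collects every corrupted record instead of stopping at the first one:

```ruby
data = [
  { name: "A", payment: "1500" },
  { name: "B", payment: "1991.25" }, # not a valid Integer string
  { name: "C", payment: "200" }
]

corrupted = []

large_payments = data.select.with_index do |e, i|
  begin
    Integer(e[:payment]) > 1000
  rescue ArgumentError
    corrupted << [i, e] # remember the index and record, then skip it
    false
  end
end

p large_payments
p corrupted
```

Once the corrupted list is known, you can decide whether to repair, drop, or report those records.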

#####Working with Logger

I’ll show you how to replicate a bit of functionality that is especially common in Ruby’s web frameworks: comprehensive error logging.

To demonstrate this, we’ll be walking through a TCPServer that does simple arithmetic operations in prefix notation. We’ll start by taking a look at it without any logging or error-handling support:

require "socket"

class Server
  def initialize
    @server = TCPServer.new('localhost',port=3333)
  end

  def *(x, y)
    "#{Float(x) * Float(y)}"
  end

  def /(x, y)
    "#{Float(x) / Float(y)}"
  end

  def handle_request(session)
    action, *args = session.gets.split(/\s/)
    if ["*", "/"].include?(action)
      session.puts(send(action, *args))
    else
      session.puts("Invalid command")
    end
  end

  def run
    while session = @server.accept
      handle_request(session)
    end
  end
end

We can use the following fairly generic client to interact with the server, which is similar to the one we used in Chapter 2, Designing Beautiful APIs:

require "socket"

class Client
  def initialize(ip="localhost",port=3333)
    @ip, @port = ip, port
  end

  def send_message(msg)
    socket = TCPSocket.new(@ip,@port)
    socket.puts(msg)
    response = socket.gets
    socket.close
    return response
  end

  def receive_message
    socket = TCPSocket.new(@ip,@port)
    response = socket.read
    socket.close
    return response
  end
end

Without any error handling, we end up with something like this on the client side:

client = Client.new

response = client.send_message("* 5 10")
puts response

response = client.send_message("/ 4 3")
puts response

response = client.send_message("/ 3 foo")
puts response

response = client.send_message("* 5 7.2")
puts response

## OUTPUTS ##
50.0
1.33333333333333
nil
client.rb:8:in 'initialize': Connection refused - connect(2) (Errno::ECONNREFUSED)
from client.rb:8:in 'new'
from client.rb:8:in 'send_message'
from client.rb:35

When we send the erroneous third message, the server never responds, resulting in a nil response. But when we try to send a fourth message, which would ordinarily be valid, we see that our connection was refused. If we take a look server-side, we see that a single uncaught exception caused it to crash immediately:

server_logging_initial.rb:15:in 'Float':
invalid value for Float(): "foo" (ArgumentError)
  from server_logging_initial.rb:15:in '/'
  from server_logging_initial.rb:20:in 'send'
  from server_logging_initial.rb:20:in 'handle_request'
  from server_logging_initial.rb:25:in 'run'
  from server_logging_initial.rb:31

Though this does give us a sense of what happened, it doesn’t give us much insight into when and why. Here is the same implementation using logger:

require "socket"
require "logger"

class StandardError
	def report
		%{#{self.class}: #{message}\n#{backtrace.join("\n")}}
	end
end

class Server
	def initialize(logger)
		@logger = logger
		@server = TCPServer.new('localhost',port=3333)
	end

	def *(x, y)
		"#{Float(x) * Float(y)}"
	end

	def /(x, y)
		"#{Float(x) / Float(y)}"
	end

	def handle_request(session)
		action, *args = session.gets.split(/\s/)
		if ["*", "/"].include?(action)
			@logger.info "executing: '#{action}' with #{args.inspect}"
			session.puts(send(action, *args))
		else
			session.puts("Invalid command")
		end
	rescue StandardError => e
		@logger.error(e.report)
		session.puts "Sorry, something went wrong."
	end

	def run
		while session = @server.accept
			handle_request(session)
		end
	end
end

begin
	logger = Logger.new("development.log")
	host = Server.new(logger)
	host.run
rescue StandardError => e
	logger.fatal(e.report)
	puts "Something seriously bad just happened, exiting"
end
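To try the logging pattern without opening a socket, here is a stripped-down sketch. It assumes a StringIO in place of development.log and a hypothetical `divide` method in place of the server's `/` operation.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new # stands in for "development.log"
logger = Logger.new(log_output)

# Hypothetical stand-in for the server's division operation.
def divide(x, y)
  Float(x) / Float(y)
end

begin
  logger.info "executing: '/' with #{["3", "foo"].inspect}"
  divide("3", "foo")
rescue StandardError => e
  # Same information as the report helper: class, message, backtrace.
  logger.error("#{e.class}: #{e.message}\n#{e.backtrace.join("\n")}")
end

puts log_output.string
```

Each entry carries a severity and a timestamp, which is what answers the "when" that the raw crash output could not.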

Chapter 7: Reducing Cultural Barriers

Ten years ago, a book on best practices for any given programming language would seem perfectly complete without a chapter on multilingualization (m17n) and localization (L10n). In 2009, the story is just a little bit different.

From: https://blog.mozilla.org/l10n/2011/12/14/i18n-vs-l10n-whats-the-diff/
* Internationalization (i18n).
* Localization (l10n).
* Globalization (g11n).
* Localizability (l12y).

 W3C said it best when they wrote the following:

    __“Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.

    Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).”__

In other words, i18n allows applications to support and satisfy the needs of multiple locales, thus “enabling” l10n.

From: http://www.w3.org/International/questions/qa-i18n.en

What do the terms 'internationalization' and 'localization' mean, and how are they related?

**Localization(l10n)**

Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).

Localization is sometimes written as l10n, where 10 is the number of letters between l and n.

Often thought of only as a synonym for translation of the user interface and documentation, localization is often a substantially more complex issue. It can entail customization related to:

* Numeric, date and time formats
* Use of currency
* Keyboard usage
* Collation and sorting
* Symbols, icons and colors
* Text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation or viewed as insensitive.
* Varying legal requirements
* and many more things.

Localization may even necessitate a comprehensive rethinking of logic, visual design, or presentation if the way of doing business (e.g., accounting) or the accepted paradigm for learning (e.g., focus on individual vs. group) in a given locale differs substantially from the originating culture.

**Internationalization(i18n)**

Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.

Internationalization is often written i18n, where 18 is the number of letters between i and n in the English word.

Internationalization typically entails:

* Designing and developing in a way that removes barriers to localization or international deployment. This includes such things as enabling the use of Unicode, or ensuring the proper handling of legacy character encodings where appropriate, taking care over the concatenation of strings, avoiding dependence in code on user-interface string values, etc.
* Providing support for features that may not be used until localization occurs. For example, adding markup in your DTD to support bidirectional text, or for identifying language. Or adding to CSS support for vertical text or other non-Latin typographic features.
* Enabling code to support local, regional, language, or culturally related preferences. Typically this involves incorporating predefined localization data and features derived from existing libraries or user preferences. Examples include date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, handling of personal names and forms of address, etc.
* Separating localizable elements from source code or content, such that localized alternatives can be loaded or selected based on the user's international preferences as needed.

Notice that these items do not necessarily include the localization of the content, application, or product into another language; they are design and development practices which allow such a migration to take place easily in the future but which may have significant utility even if no localization ever takes place.

**Multilingualization(m17n)**

The act of adapting or localizing something to, into, or for multiple languages.

Although some may argue that it took too long to materialize, Ruby 1.9 provides a robust and elegant solution to the m17n problem. Rather than binding its users to a particular internal encoding and requiring complex manual manipulation of text into that format, Ruby 1.9 provides facilities that make it easy to transcode text from one encoding to another. This system is well integrated so that things like pattern matching and I/O operations can be carried out in all of the encodings Ruby supports, which provides a great deal of flexibility for those who need to do encoding-specific operations. Of course, because painless transcoding is possible, you can also write code that accepts and produces text in a wide variety of encodings, but uses a single encoding throughout its internals, improving the consistency and simplicity of the underlying implementation.
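A short sketch of what transcoding looks like in practice, using String#encode. The sample word is mine, and it is written with \u escapes so the snippet does not depend on the file's source encoding.

```ruby
utf8   = "r\u00E9sum\u00E9"         # "résumé"; \u escapes make this a UTF-8 string
latin1 = utf8.encode("ISO-8859-1")  # same characters, different bytes

p utf8.bytesize                     # the accented letters take two bytes each in UTF-8
p latin1.bytesize                   # and one byte each in ISO-8859-1
p latin1.encode("UTF-8") == utf8    # transcoding back restores the original

# Pattern matching works directly on the Latin-1 string.
p latin1 =~ /sum/
```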

Once you are comfortable with how to store, manipulate, and produce internationalized text in various character encodings, you may want to know about how to customize your software so that its interface is adapted to whatever the native language and dialogue of its users might be. Although multilingualization and localization requirements don’t necessarily come in pairs, they often do.

See http://www.joelonsoftware.com/articles/Unicode.html for details on character encoding.

m17n by Example: A Look at Ruby’s CSV Standard Library

When it comes to m17n, the place to look is the CSV library.

CSV manages to parse data that is in any of the character encodings that Ruby supports without transcoding the source text.

In addition to encoding regular expressions, because CSV accepts user-entered values that modify its core parser, it needs to escape them. Although the built-in Regexp.escape() method works with most of the encodings Ruby supports, at the time of the Ruby 1.9.1 release, it had some issues with a handful of them. To work around this, CSV rolls its own escape method:

# This method is an encoding safe version of Regexp.escape(). It will escape
# any characters that would change the meaning of a regular expression in the
# encoding of +str+. Regular expression characters that cannot be transcoded
# to the target encoding will be skipped and no escaping will be performed if
# a backslash cannot be transcoded.
#
def escape_re(str)
  str.chars.map { |c| @re_chars.include?(c) ? @re_esc + c : c }.join
end

This means that once things like the column separator, row separator, and quote character have been specified by the user and converted into the specified encoding, this code can check to see whether the transcoded characters need to be escaped.

@re_chars is set in the CSV constructor as simply a list of regular expression reserved characters transcoded to the specified @encoding:

@re_chars = %w[ \\ . [ ] - ^ $ ?
                * + { } ( ) | #
                \ \r \n \t \f \v ].map { |s| s.encode(@encoding) rescue nil }.compact

If, for some reason, you had data columns separated by the Japanese character for 2, you could split things up that way:

# coding: UTF-8
require "csv"
CSV.read("data.csv", encoding: "Shift_JIS", col_sep: "二")

By working with strings and regular expressions indirectly through encoding helpers, we can be sure that any pattern matching or text manipulation gets done in a compatible way. By translating the parser rather than the source data, we incur a fixed cost rather than one that varies in relation to the size of the data source. For a need like CSV processing, this is very important, as the format is often used for large data dumps.

Portable m17n Through UTF-8 Transcoding

Although it’s nice to be able to support each character encoding natively, it can be quite difficult to maintain a complex system that works that way. The easy way out is to standardize on a single, fairly universal character encoding to write your code against. Then, all that remains to be done is to transcode any string that comes in, and possibly transcode again on the way out. The character set of choice for use in code that needs to be portable from one system to another is UTF-8. UTF-8 is capable of representing the myriad character sets that make up Unicode, which means it can represent nearly any glyph you might imagine in any other character encoding.

As a variable-length character encoding, it does this fairly efficiently, so that users who do not need extra bytes to represent large character sets do not incur a significant memory penalty.

Source Encodings

A key aspect of any m17n-capable Ruby project is to properly set the source encoding of its files. This is done via the # coding: UTF-8 comment at the top of the file. In order for Ruby to pick it up, this comment must be the first line in the file, unless a shebang is present (in this case, the magic comment can appear on the second line), such as in the following example:

#!/usr/bin/env ruby
# coding: UTF-8

However, in all other situations, nothing else should come before it. Although Ruby is very strict about where you place the comment, it’s fairly loose about the way you write it. Case does not matter as long as it’s in the form of coding: some_encoding, and extra text may appear before or after it. This is used primarily for editor support, allowing things such as Emacs-style strings:

# -*- coding: utf-8 -*-

Their purpose is to tell Ruby what encoding your regex and string literals are in. In Ruby 1.9, forgetting to explicitly set the source encoding in this manner can cause all sorts of nasty problems, as it will force Ruby to fall back to US-ASCII, breaking virtually all internationalized text. (Ruby 2.0 and later default to UTF-8 as the source encoding.)

Once you set the source encoding to UTF-8 in all your files, if your editor is producing UTF-8 output, you can be sure of the encoding of your literals. That’s the first step.

Working with Files

By default, Ruby uses your locale settings to determine the default external character encoding for files. You can check what yours is set to by running this code:

ruby -e "p Encoding.default_external"

If your locale information isn’t set, Ruby assumes that there is no suitable default encoding, reverting to ASCII-8BIT to interpret external files as sequences of untranslated bytes.

The actual value your default_external is set to doesn’t really matter when you’re developing code that needs to run on systems that you do not control. Because most libraries fall under this category, it means that you simply cannot rely on File.open() or File.read() to work without explicitly specifying an encoding.

This means that if you want to open a file that is in Latin-1 (ISO-8859-1), but process it within your UTF-8-based library, you need to write code something like this:

data = File.read("foo.txt", encoding:"ISO-8859-1:UTF-8")

Here, we’ve indicated that the file we are reading is encoded in ISO-8859-1, but that we want to transcode it to UTF-8 immediately so that the string we end up with in our program is already converted for us.

Writing back to file works in a similar fashion. Here’s what it looks like to automatically transcode text back to Latin-1 from a UTF-8 source string:

File.open("foo.txt", "w:ISO-8859-1:UTF-8") { |f| f << data + "Some extra text" }

In a UTF-8-based library, you will need to supply an encoding string of the form external_format:UTF-8 whenever you’re working with text files. Of course, if the external format happens to be UTF-8, you would just write something like this:

data = File.read("foo.txt", encoding: "UTF-8")
File.open("foo.txt", "w:UTF-8") { |f| f << data + "Some extra text" }

The underlying point here is that if you want to work with files in a portable way, you need to be explicit about their character encodings. Without doing this, you cannot be sure that your code will work consistently from machine to machine. Also, if you want to make it so all of the internals of your system operate in a single encoding, you need to explicitly make sure the loaded files get translated to UTF-8 before you process the text in them. If you take care of these two things, you can mostly forget about the details, as all of the actual work will end up getting done on UTF-8 strings.
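The round trip can be checked end to end with a throwaway file. This is only a sketch: the temporary directory and the sample text are assumptions made for the example.

```ruby
require "tmpdir"

text = "caf\u00E9\n" # UTF-8 string held in memory
raw = data = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")

  # UTF-8 data is transcoded to Latin-1 on the way out.
  File.open(path, "w:ISO-8859-1:UTF-8") { |f| f << text }

  raw  = File.binread(path)                            # bytes as stored on disk
  data = File.read(path, encoding: "ISO-8859-1:UTF-8") # transcoded on the way in
end

p raw.bytes     # the accented letter is stored as a single Latin-1 byte
p data.encoding
p data == text
```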

If you are handling binary files you should then use File.binread() like this:

img_data = File.binread("foo.png")
img_data.encoding #=> #<Encoding:ASCII-8BIT>

For more complex needs, or for when you need to write a binary file, Ruby 1.9 has also changed the meaning of "rb" and "wb" in File.open(). Rather than simply disabling line-ending conversion, using these file modes will now set the external encoding to ASCII-8BIT by default.

Unless you’re working with binaries, be sure to explicitly specify the external encoding of your files, and transcode them to UTF-8 upon read or write. If you are working with binaries, be sure to use File.binread() or File.open() with the proper flags to make sure that your text is not accidentally encoded into the character set specified by your locale. This one can produce subtle bugs that you might not encounter until you run your code on another machine, so it’s important to try to avoid in the first place.
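For the binary case, a minimal sketch that writes a few raw bytes with the "wb" mode and reads them back with File.binread. The temporary path is an assumption; the bytes are the standard 8-byte PNG signature.

```ruby
require "tmpdir"

png_signature = "\x89PNG\r\n\x1A\n".b # raw bytes, tagged as binary
img_data = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.png")
  File.open(path, "wb") { |f| f.write(png_signature) } # "wb": bytes are written untouched
  img_data = File.binread(path)
end

p img_data.encoding
p img_data.bytes.first(4)
```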

Transcoding User Input in an Organized Fashion

It turns out that in practice, you don’t really need to worry about transcoding whenever you are comparing user input to a finite set of possible ASCII values.

If you can be sure that you never manipulate or compare a string, transcoding can be safely ignored in most cases. In cases in which you do manipulation or comparison, if the input strings will consist of nothing more than ASCII characters in all cases, you do not need to transcode them. All other strings need to be transcoded to UTF-8 within your library unless you expect users to do the conversions themselves.

You can clean up your code significantly by identifying the points where encodings matter in your code. Oftentimes, there will be a handful of low-level functions that are at the core of your system, and they are the places where transcoding needs to be done.

Roughly, the process of building a UTF-8 based system goes like this:

  • Be sure to set the source encoding of every file in your project to UTF-8 using magic comments.
  • Use the external:internal encoding string when opening any I/O stream, specifying the internal encoding as UTF-8. This will automatically transcode files to UTF-8 upon read, and automatically transcode from UTF-8 to the external encoding on write.
  • Make sure to either use File.binread() or include the "b" flag when dealing with binary files. Otherwise, your files may be incorrectly interpreted based on your locale, rather than being treated as a stream of unencoded bytes.
  • When dealing with user-entered strings, only transcode those that need to be manipulated or compared to non-ASCII strings. All others can be left in their native encoding as long as they consist of ASCII characters only or they are not manipulated by your code.
  • Do not rely on default_external or default_internal, and be sure to set your source encodings properly. This ensures that your code will not depend on environmental conditions to run.
  • If you need to do a ton of text processing on user-entered strings that may use many different character mappings, it might not be a great idea to use this approach.
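The user-input rule from the list above can be concentrated in one boundary helper. This is a rough sketch; `to_utf8` is a hypothetical name, and it assumes the incoming strings are correctly tagged with their real encoding.

```ruby
# Hypothetical boundary helper: code past this point can assume UTF-8
# (or plain ASCII, which compares safely with UTF-8).
def to_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  return str if str.ascii_only? # pure ASCII needs no transcoding
  str.encode("UTF-8")
end

latin1_input = "caf\u00E9".encode("ISO-8859-1") # simulated user input
ascii_input  = "quit".encode("US-ASCII")

p latin1_input == "caf\u00E9"          # same text, different bytes
p to_utf8(latin1_input) == "caf\u00E9" # comparable after transcoding
p to_utf8(ascii_input).encoding        # left in its native encoding
```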

Inferring Encodings from Locale

  • The LANG environment variable that specifies your system locale is used by Ruby to determine the default external encoding of files. A properly set locale can allow Ruby to automatically load files in their native encodings without explicitly stating what character mapping they use.
  • Although magic comments are typically required in files to set the source encoding, an exception is made for command-line scripts run with ruby -e. The source encoding for these one-liners is determined by locale. In most cases, this is what you will want.
  • You can specify a default internal encoding that Ruby will automatically transcode loaded files into when no explicit internal encoding is specified. It is often reasonable to set this to match the source encoding in your scripts.
  • You can set default external/internal encodings via the command-line switch -Eexternal:internal if you do not want to explicitly set them in your scripts.
  • The -Ku flag still works for putting Ruby into “UTF-8” mode, which is useful for backward compatibility with Ruby 1.8.
  • All of the techniques described in this section are suitable mostly for scripts or private use code. It is a bad idea to rely on locale data or manually set external and internal encodings in complex systems or code that needs to run on machines you do not have control over.
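To see what Ruby has inferred on a given machine, the relevant settings can be printed directly. The values depend entirely on the environment, so no output is shown here.

```ruby
p ENV["LANG"]                # the locale setting Ruby starts from
p Encoding.find("locale")    # the encoding implied by the current locale
p Encoding.default_external  # derived from the locale unless overridden
p Encoding.default_internal  # nil unless set, for example with -E
```

Running the same lines with something like `ruby -E ISO-8859-1:UTF-8` shows how the command-line switch changes the last two values.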

m17n-Safe Low-Level Text Processing

The underlying theme of working with low-level text operations in an m17n-safe way is that characters are not necessarily equivalent to bytes.

File.open("hello.txt") { |f|
  loop do
    break if f.eof?
    chunk = "CHUNK: #{f.read(5)}"
    puts chunk unless chunk.empty?
  end
}

The purpose of the previous example is to print out the contents of the file in chunks of five bytes, which, when it comes to ASCII, means five characters. However, multibyte character encodings, especially variable-length ones such as UTF-8, cannot be processed using this approach. The reason is fairly simple. Imagine this code running against a two-character, six-byte string in UTF-8 such as “吴佳”. If we read five bytes of this string, we end up breaking the second character’s byte sequence, resulting in the mangled string “吴\xE4\xBD”.
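The mismatch between characters and bytes is easy to confirm in a few lines. This sketch uses the same two-character string, written with \u escapes so that it is UTF-8 regardless of the source encoding.

```ruby
name = "\u5434\u4F73" # the two-character string from the text

p name.length    # number of characters
p name.bytesize  # number of bytes

broken = name.byteslice(0, 5)           # cut in the middle of the second character
p broken.valid_encoding?
p name.byteslice(0, 3).valid_encoding?  # cut on a character boundary
```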

Many times, the reason why we read data in chunks is not to process it at the byte level, but instead, to break it up into small parts as we work on it.

A good solution to this problem can be found in the CSV standard library:

def read_to_char(bytes)
  return "" if @io.eof?
  data = @io.read(bytes)

  begin
    encoded = encode_str(data)
    raise unless encoded.valid_encoding?
    return encoded
  rescue # encoding error or my invalid data raise
    if @io.eof? or data.size >= bytes + 10
      return data
    else
      data += @io.read(1)
      retry
    end
  end
end

If we walk through this step by step, we see that an empty string is returned if the stream is already at the end of the file (eof?). Assuming that it is not, a specified number of bytes is read.

Then, the string is encoded, and it is checked to see whether the character mapping is valid.

When the encoding is valid, read_to_char returns the chunk, assuming that the string was broken up properly. Otherwise, it raises an error, causing the rescue block to be executed. Here, we see that the core fix relies on buffering the data slightly to try to read a complete character. What actually happens here is that the method gets retried repeatedly, adding one extra byte to the data until it either reaches a total of 10 bytes over the specified chunk size, or hits the end of the file.

The reason why this works is that every encoding Ruby supports has a character size of less than 10 bytes.
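
The same buffering idea can be tried outside of CSV. The sketch below is a standalone variant written for illustration, not CSV's actual implementation: the name `read_chars`, its keyword arguments, and the use of a loop instead of `raise`/`retry` are all assumptions made for this example.

```ruby
require "stringio"

# Hypothetical helper: read roughly `bytes` bytes from `io`, then keep adding
# one byte at a time until the data is valid in `encoding`, the stream ends,
# or `max_extra` additional bytes have been read.
def read_chars(io, bytes, encoding: Encoding::UTF_8, max_extra: 10)
  return "" if io.eof?

  data  = io.read(bytes).b   # accumulate as raw bytes
  extra = 0

  until data.dup.force_encoding(encoding).valid_encoding? ||
        io.eof? || extra >= max_extra
    data << io.read(1).b
    extra += 1
  end

  data.force_encoding(encoding)
end

io = StringIO.new("\u5434\u4F73")   # two characters, six bytes
first = read_chars(io, 5)           # asks for five bytes, gets six
puts first.bytesize
puts first.valid_encoding?
```

As in the CSV version, the chunk size is only approximate: a chunk can grow by a few bytes so that it ends on a character boundary.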

One other thing to remember about low-level string operations when it comes to m17n: to keep your code character-mapping-agnostic, you’ll want to use String#ord instead of String#unpack.
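
A short comparison shows the difference. In this sketch the same character is stored in two encodings: `unpack("C*")` exposes the raw bytes, which change with the encoding, while `ord` works on the whole first character in the string's own encoding.

```ruby
e_acute_utf8   = "\u00E9"                           # one character, two bytes
e_acute_latin1 = e_acute_utf8.encode("ISO-8859-1")  # same character, one byte

p e_acute_utf8.unpack("C*")     # raw bytes of the UTF-8 form
p e_acute_latin1.unpack("C*")   # raw bytes of the Latin-1 form
p e_acute_utf8.ord              # code of the first character
p e_acute_latin1.ord
```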

Localizing Your Code

Localization (l10n) is a way to mark the relevant sections of text with meaningful tags that can then be altered by external translation files.

What we can do is come up with unique identifiers for each text segment in our application, and then create translation files that fill in the appropriate values depending on what language is selected.

An important aspect of localizing your code is that you might want to do it as late as possible so that your business logic is not affected by translations.

  • The first step in localizing an application is identifying the unique text segments that need to be translated.
  • A generalized L10n system provides a way to keep all locale-specific content in translation files rather than tied up in the display code of your application.
  • Every string that gets displayed to the user must be passed through a translation filter so that it can be customized based on the specified language. In Gibberish::Simple, we use T() for this; other systems may vary.
  • Translation should be done at as late a stage as possible, so that L10n-related modifications to text do not interfere with the core business logic of your program.
  • In many cases, you cannot simply interpolate strings in a predetermined order. Gibberish::Simple offers a simple templating mechanism that allows each translation file to specify how substrings should be interpolated into a text segment. If you roll your own system, be sure to keep this in consideration.
  • Creating helper functions to simplify your translation code can come in handy when generating dynamic text output. For an example of this, go back and look at how weapon_name() was used in the simple Sinatra example discussed here.
  • Because adding individual localization tags can be a bit tedious, it’s often a good idea to wait until you have a fully fleshed-out application before integrating a general L10n system, if it is possible to do so.
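
To make the idea of a translation filter concrete, here is a minimal hand-rolled sketch. It is not the Gibberish::Simple API: the `TRANSLATIONS` table, the `translate` method, and the `{0}`-style placeholders are all assumptions made for illustration. It shows unique segment identifiers, a default text as fallback, and templates that let each language reorder the interpolated values.

```ruby
# Hypothetical translation table: locale => { segment id => template }.
TRANSLATIONS = {
  de: {
    attack:   "{1} wird von {0} angegriffen",
    greeting: "Hallo, {0}!"
  }
}.freeze

# Look up the template for `key` in `locale`, fall back to the default text,
# then fill in the numbered placeholders from `args`.
def translate(default, key, locale, *args)
  template = TRANSLATIONS.fetch(locale, {}).fetch(key, default)
  template.gsub(/\{(\d+)\}/) { args[Regexp.last_match(1).to_i].to_s }
end

puts translate("{0} attacks {1}", :attack, :en, "Ann", "Bob")
puts translate("{0} attacks {1}", :attack, :de, "Ann", "Bob")
```

Because the calling code only passes values and an identifier, the German template is free to put the second value first without any change to the business logic.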

Chapter 8: Skillful Project Maintenance

###Exploring a Well-Organized Ruby Project (Haml)

We can pretend we have no idea what it actually does, and seek to discover a bit about these details by exploring the source code itself.

After grabbing the source, we can start by looking for a file called README or something similar. We find one called README.rdoc, which gives us a nice description of why we might want to use Haml right at the top of the file. The rest of the file fills in other useful details, including how to install the library, some usage examples, a list of the executables it ships with, and some information on the authors. It also has a specific line that says: “To use Haml and Sass programmatically, check out the RDocs for the Haml and Sass modules.” This indicates that the project has autogenerated API documentation.

Noticing the project also has a Rakefile, we can check to see whether there is a task for generating the documentation.

We could read directly through the source now to see which functions are most important, but tests often provide a better road map to where the interesting parts are, and they describe how the code is meant to be used.

The Rakefile also has a rake install task, which installs Haml as a gem from the current sources. It just executes a shell command that reads the current version from a file called VERSION. Using this approach, the Rakefile is kept independent of a particular version number, allowing a single place for updating version numbers. All of these tricks are done in the name of simplifying maintainability, making it easy to generate and install the library from source for testing.
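
The VERSION-file trick can be sketched in a few lines. This is not Haml's actual Rakefile; the project name `mylib`, the helper names, and the exact gem commands are assumptions for illustration. The helpers are plain Ruby, and the task itself is only defined when Rake's DSL is loaded, as it is inside a Rakefile.

```ruby
# Read the version from a single file so no version number is hard-coded.
def read_version(path = "VERSION")
  File.read(path).strip
end

# Build the shell command that installs the freshly built gem file.
def install_command(name, version)
  "gem install ./#{name}-#{version}.gem"
end

# `task` and `sh` come from Rake; outside of Rake this block is skipped.
if respond_to?(:task, true)
  desc "Build and install the gem from the current sources"
  task :install do
    sh "gem build mylib.gemspec"
    sh install_command("mylib", read_version)
  end
end
```

Releasing a new version then only requires editing VERSION; the task picks up the new number on its next run.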

###Conventions to Know About

#####What goes in a Readme

A good README should include everything that is necessary to begin working with a project, and nothing more. You’ll need a brief one or two-paragraph description of what the project is for, and what problems it is meant to solve.

Next, it is generally a good idea to point out a couple of the core classes that make up the public API of your project.

Because sometimes raw API documentation isn’t enough to get people started, it’s often a good idea to include a brief synopsis of your project’s capabilities through a few simple examples.

If your install instructions are simple, you can just embed them in your README file directly. However, if your project has several install methods, or optional dependencies that enable certain features, you can create an INSTALL file and reference it in the README.

Finally, once you’ve told users what your project is, where to look for documentation, how it looks in brief, and how to get it installed, you’ll want to let them know how to contact you in case something goes wrong. If you’re working on a bigger project, this might be the right place to link to a mailing list or bug tracker.

#####Laying Out Your Library

Library files are generally kept in a lib/ directory. Generally speaking, this directory should only have one file in it, and one subdirectory. For Haml, the structure is lib/haml.rb and lib/haml/. For HighLine, it is lib/highline.rb and lib/highline/. The Ruby file in your lib/ dir should bear the name of your project and act as a jumping-off point for loading dependencies as well as any necessary support libraries. The top of lib/highline.rb provides a good example of this:

#!/usr/local/bin/ruby -w
require "erb"
require "optparse"
require "stringio"
require "abbrev"
require "highline/compatibility"
require "highline/system_extensions"
require "highline/question"
require "highline/menu"
require "highline/color_scheme"
class HighLine
# ...
end

If you have deeply nested classes in your projects, you will typically repeat this process for each level of nesting.

# a.rb
require "a/b"

# a/b.rb
require "a/b/c"
require "a/b/d"

# a/b/c.rb
module A
	module B
		class C
		# ...
		end
	end
end

# a/b/d.rb
module A
	module B
		class D
		#...
		end
	end
end

With a file structure as indicated by the comments in the example code, and the necessary require statements in place, we end up being able to do this:

>> require "a"
=> true
>> A::B::C
=> A::B::C
>> A::B::D
=> A::B::D

Although this is much more important in large systems than small ones, it is a good habit to get into. Essentially, unless there is a good reason to deviate, files will often map to class names in Ruby. Nested classes that are large enough to deserve their own file should be loaded in the file that defines the class they are nested within. Using this approach allows the user a single entry point into your library, but also allows for running parts of the system in isolation.

Filenames do not necessarily need to be representative of a class at all, so you can deviate from this standard if needed.

In the more general case, you might have files that contain extensions to provide backward compatibility with Ruby 1.8, or ones that make minor changes to core Ruby classes. Decent names for these are lib/myproject/compatibility.rb and lib/myproject/extensions.rb, respectively. When things get complicated, you can of course nest these and work on individual classes one at a time.

However you choose to organize your files, one thing is fairly well agreed upon: if you intend to modify core Ruby in any way, you should do it in files that are well marked as extension files, to help people hunt down changes that might conflict with other packages.
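
A well-marked extension file might look like the sketch below. The path follows the naming suggested above; the `String#words` method is a made-up example, and the `method_defined?` guard is one possible way to avoid clobbering a method that Ruby or another package already provides.

```ruby
# lib/myproject/extensions.rb (hypothetical project and method)
# Every change this project makes to core Ruby classes lives in this file.
class String
  # Only add the method if nothing else has defined it already.
  unless method_defined?(:words)
    # Split the string on whitespace, ignoring leading blanks.
    def words
      split(/\s+/).reject(&:empty?)
    end
  end
end

p "  one two   three".words
```

Anyone hunting for a conflicting definition then has a single, clearly named file to inspect.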

#####Executables

Scripts and applications are usually placed in a bin/ dir in Ruby projects. These are typically ordinary Ruby scripts that have been made executable via something like a combination of a shebang line and a chmod +x call. To make these appear more like ordinary command-line utilities, it is common to omit the file extension. As an example, we can take a look at the haml executable:

#!/usr/bin/env ruby
# The command line Haml parser.

$LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'
require 'haml'
require 'haml/exec'
opts = Haml::Exec::Haml.new(ARGV)
opts.parse!

You can see that the executable starts with a shebang line that indicates how to find the Ruby interpreter. This is followed by a line that adds the library to the loadpath by relative positioning. Finally, the remaining code simply requires the necessary library files and then delegates to an object that is responsible for handling command-line requests. Ideally speaking, most of your scripts in bin/ will follow a similar approach.
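
The object that the script delegates to can be quite small. The following is a sketch of such a class for a hypothetical tool: `MyTool::CLI`, its `-v` flag, and the `options` hash are assumptions for illustration and do not mirror Haml::Exec. It uses OptionParser from the standard library.

```ruby
require "optparse"

module MyTool
  # Hypothetical command-line handler, the kind of object that a script
  # in bin/ would create and call parse! on.
  class CLI
    attr_reader :options

    def initialize(argv)
      @argv    = argv.dup
      @options = { verbose: false, files: [] }
    end

    # Parse the switches and treat whatever is left as file names.
    def parse!
      parser = OptionParser.new do |opts|
        opts.banner = "Usage: mytool [options] FILE..."
        opts.on("-v", "--verbose", "Print extra output") do
          @options[:verbose] = true
        end
      end
      @options[:files] = parser.parse!(@argv)
      self
    end
  end
end

cli = MyTool::CLI.new(["-v", "page.txt"]).parse!
p cli.options
```

With this in lib/, the script in bin/ only needs to adjust the load path, require the library, and call `MyTool::CLI.new(ARGV).parse!`, which keeps the command-line logic testable without running the executable.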
