redo and retry are both used to re-execute parts of a loop. But they differ in how much they re-execute:
redo only repeats the current iteration, while retry repeats the whole loop from the start.
redo example:
```ruby
(0..5).each do |i|
  puts "Value: #{i}"
  redo if i > 2
end

# Output:
# Value: 0
# Value: 1
# Value: 2
# Value: 3
# Value: 3
# Value: 3
# ... this is an infinite loop
```
retry example:
```ruby
(0..5).each do |i|
  puts "Value: #{i}"
  retry if i > 2
end

# Output:
# Value: 0
# Value: 1
# Value: 2
# Value: 3
# Value: 0
# Value: 1
# Value: 2
# ... this is an infinite loop, too
```
If retry appears in the rescue clause of a begin expression, it restarts execution from the beginning of the begin body.
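For example, a small sketch of retry inside a rescue clause (the flaky operation is hypothetical):

```ruby
tries = 0
begin
  tries += 1
  raise "flaky operation" if tries < 3
  puts "succeeded after #{tries} tries"
rescue RuntimeError
  retry if tries < 3
end
# prints "succeeded after 3 tries"
```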
you can reuse the underscore variable within a block's parameter list (reusing any other variable name would raise a 'duplicated argument name' error):
```ruby
people.map { |name, (_, _, email)| [name, email] }
```
###Ruby's double bang !!
If you negate something, that forces a boolean context. Of course, it also negates it. If you double-negate it, it forces the boolean context, but returns the proper boolean value.
"hello"#-> this is a string; it is not in a boolean context
!"hello"#-> this is a string that is forced into a boolean# context (true), and then negated (false)
!!"hello"#-> this is a string that is forced into a boolean# context (true), and then negated (false), and then# negated again (true)
!!nil#-> this is a false-y value that is forced into a boolean# context (false), and then negated (true), and then# negated again (false)
###Tests
It is important to remember that testing is meant to make your code
better and more maintainable, not to lead you into confusion or make you feel like you’re stuck doing busywork instead of doing real coding.
Also remember that if your solution seems difficult to test, it may be a sign that your design is not flexible enough to be easily refactored or interacted with. Writing your tests first and seeing them fail not only ensures that you are testing what you actually intend to test, but also allows you to write better code and clean up before the code grows too large.
Try to remember that partial coverage is usually much better than no coverage at all.
If you are working on a very small program or library, and you want to be able to run your tests while in development, but then require the code as part of another program later, there is a simple idiom that is useful for embedding your tests:
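The idiom being referred to is presumably the standard `__FILE__` check; the test class below is just a placeholder:

```ruby
# foo.rb
def foo
  42
end

# Run the tests only when this file is executed directly,
# not when it is required by other code.
if __FILE__ == $PROGRAM_NAME
  require "test/unit"

  class FooTest < Test::Unit::TestCase
    def test_foo
      assert_equal 42, foo
    end
  end
end
```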
Simply wrapping your tests in this if statement will allow running ruby foo.rb to execute your tests, while require "foo" will still work as expected without running the tests.
This can be useful for sharing small programs with others, or for writing some tests while developing a small prototype of a larger application. However, once you start to produce more than a few test cases, be sure to break things back out into their normal directory structure.
#####Test Helpers
Require statements and basic helper functions can be repetitive across your test files.
A good solution to keep things clean is to create a test/test_helpers.rb file and then do all of your global configuration there. In your individual tests, you can require this file
by expanding the direct path to it, using the following idiom:
```ruby
require File.dirname(__FILE__) + '/test_helpers'
```
This allows your test files to be run individually from any directory, not just the top-level directory.
An example of a test helper file is the one in Prawn:
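Prawn's actual helper isn't reproduced here; as a rough stand-in, a minimal test/test_helpers.rb might look something like this (the create_pdf helper is an assumption):

```ruby
require "test/unit"

# make lib/ requirable when tests run from any directory
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), "..", "lib"))
require "prawn"

def create_pdf
  @pdf = Prawn::Document.new
end
```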
If you want a little more of a clean approach, you can wrap your helpers in a module, but depending on what you’re doing, just defining them at the top level might be fine as well.
#####Custom Assertions
We want to transform a basic statement that looks like this:
```ruby
assert bob.current_zone.eql?(Zone.new("4"))
```
into:
assert_in_zone("4",bob)
Here’s how you would define assert_in_zone and its complement, assert_not_in_zone :
```ruby
def assert_in_zone(expected, person)
  assert_block("Expected #{person.inspect} to be in Zone #{expected}") do
    person.current_zone.eql?(Zone.new(expected))
  end
end

def assert_not_in_zone(expected, person)
  assert_block("Expected #{person.inspect} not to be in Zone #{expected}") do
    !person.current_zone.eql?(Zone.new(expected))
  end
end
```
This code attempts to be polite by removing the :file or :string option from the options hash before delegating to the relevant constructor. It is good practice to remove options from the options hash before delegating to another method, to avoid handing the underlying method options that it may not handle properly.
For example:
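A hypothetical sketch of the idea (Parser is an assumed class, not a real API):

```ruby
def load_data(options = {})
  if file = options.delete(:file)
    Parser.new(File.read(file), options)    # pass along only the remaining options
  elsif string = options.delete(:string)
    Parser.new(string, options)
  else
    raise ArgumentError, "you must provide either :file or :string"
  end
end
```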
#####Flexible arguments
When we have more than one optional argument, trouble arises when we want to use the default value for the first argument while overriding the second one. It's simply not possible with ordinal arguments.
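Something along these lines (reconstructed for illustration, not necessarily the book's exact listing) shows the problem:

```ruby
def story(person = "Yankee Doodle", animal = "Tiger")
  "#{person} went to town, riding on a #{animal}"
end

# There is no way to say "use the default person, but override the animal";
# you are forced to repeat the first default yourself:
story("Yankee Doodle", "Kitteh")
#=> "Yankee Doodle went to town, riding on a Kitteh"
```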
When you do find yourself needing this sort of thing, there are
most likely better options available, such as an options hash.
With a basic understanding of the underlying mechanics, you can begin to see the benefits of this style of API. Perhaps the most significant is that the order in which you specify the arguments doesn’t matter at all.
If we combine that feature with a basic idiom for setting default values passed in the hash, we come up with
```ruby
def story2(options = {})
  options = { person: "Yankee Doodle", animal: "Tiger" }.merge(options)
  "#{options[:person]} went to town, riding on a #{options[:animal]}"
end

story2
#=> "Yankee Doodle went to town, riding on a Tiger"

story2(person: "Joe Frasier")
#=> "Joe Frasier went to town, riding on a Tiger"

story2(animal: "Kitteh")
#=> "Yankee Doodle went to town, riding on a Kitteh"

story2(animal: "Kitteh", person: "Joe Frasier")
#=> "Joe Frasier went to town, riding on a Kitteh"
```
However, if one or more of your arguments are really mandatory, it’s
worth it to break them out, like so:
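A sketch of that signature style, with mandatory arguments first and an options hash for everything optional (the method and option names are made up):

```ruby
def write_message(recipient, body, options = {})
  options = { subject: "(no subject)", priority: :normal }.merge(options)
  "To #{recipient} [#{options[:priority]}] #{options[:subject]}: #{body}"
end

write_message("Gregory", "Hello")                    # Ruby enforces the mandatory args
write_message("Gregory", "Hello", priority: :high)
```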
Though you could write code to ensure that certain options are present in a hash,
generally it is most natural to just let Ruby do the hard work for you by placing your
mandatory arguments before your options hash in your method definition.
#####Treating Arguments as an Array
When receiving arguments through an array, you might need to perform some validation on them. You might use something like this before passing the arguments along to another method:
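A hypothetical sketch (store() is an assumed helper, not a real API):

```ruby
def add_entries(*entries)
  unless entries.all? { |e| e.respond_to?(:to_hash) }
    raise ArgumentError, "every entry must be hash-like"
  end
  entries.each { |e| store(e.to_hash) }
end
```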
The following short list of guidelines will help you in designing your methods:
* Try to keep the number of ordinal arguments in your methods to a minimum.
* If your method has multiple parameters with default values, consider using pseudo-keyword arguments via an options hash.
* Use the array splat operator ( * ) when you want to slurp up your arguments and pass them to another method.
* The *args idiom is also useful for supporting multiple simultaneous argument processing styles, as in Table() , but can lead to complicated code.
* Don’t use *args when a normal combination of mandatory ordinal arguments and an options hash will do.
* If some parameters are mandatory, avoid putting them in an options hash, and instead write a signature like foo(mandatory1, mandatory2, options={}) , unless there is a good reason not to.
#####Blocks for Interface Simplification
Does it feel like the word “server” is written too many times in this code?
```ruby
server = Server.new
server.handle(/hello/i)   { "Hello from server at #{Time.now}" }
server.handle(/goodbye/i) { "Goodbye from server at #{Time.now}" }
server.handle(/name is (\w+)/) { |m| "Nice to meet you #{m[1]}!" }
server.run
```
It would be nice to be able to write this instead:
```ruby
Server.run do
  handle(/hello/i)   { "Hello from server at #{Time.now}" }
  handle(/goodbye/i) { "Goodbye from server at #{Time.now}" }
  handle(/name is (\w+)/) { |m| "Nice to meet you #{m[1]}!" }
end
```
Keep the following things in mind when using blocks as part of your interface:
* If you create a collection class that you need to traverse, build on top of Enumerable rather than reinventing the wheel.
* If you have shared code that differs only in the middle, create a helper method that yields a block in between the pre/postprocessing code to avoid duplication of effort (see the sketch after this list).
* If you use the &block syntax, you can capture the code block provided to a method inside a variable. You can then store this and use it later, which is very useful for creating dynamic callbacks.
* Using a combination of &block and instance_eval , you can execute blocks within the context of arbitrary objects, which opens up a lot of doors for highly customized interfaces.
* The return value of yield (and block.call ) is the same as the return value of the provided block.
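A small sketch of the "yield between pre/postprocessing" idea mentioned above (the report wording is made up):

```ruby
def with_report_header(title)
  puts "=" * 40
  puts title
  yield                # the part that differs goes here
  puts "=" * 40
end

with_report_header("Daily Totals") { puts "42 items sold" }
```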
#####Understand What method? and method! Mean
'?' allows you to query an object about things and use the response in conditionals. The return value is always some sort of logical boolean (the !! hack can be useful here to coerce values to their boolean representation).
'!' A common misconception is that we use the exclamation point when we want to let
people know we are modifying the receiving object.
Truthfully, the purpose of this convention is to mark a method as special. It doesn’t
necessarily mean that it will be destructive or dangerous, but it means that it will require more attention than its alternative. This is why it doesn’t make much sense to have some method foo!() without a corresponding foo() method that does something similar. So essentially, if you have only one way of doing something destructive, write this:
```ruby
class Message
  def destroy
    # ...
  end
end
```
Instead of this:
```ruby
class Message
  def destroy!
    # ...
  end
end
```
#####Make Use of Custom Operators
In Ruby most operators are actually just syntactic sugar for ordinary methods.
A good habit to get into is to have your << method return the object itself, so the calls can be chained, as in the sketch below.
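A minimal sketch of an appendable object whose << returns self so calls can be chained (the class itself is made up):

```ruby
class Inbox
  def initialize
    @messages = []
  end

  def <<(message)
    @messages << message
    self  # returning self is what makes chaining possible
  end
end

inbox = Inbox.new
inbox << "first" << "second" << "third"
```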
Another good operator to know about is the spaceship operator ( <=> ), mainly because
it allows you to make use of Comparable , which gives you a host of comparison methods: < , <= , == , != , >= , > , and between?() .
The spaceship operator should return -1 if the current object is less than the object it is being compared to, 0 if it is equal, and 1 if it is greater. Most of Ruby’s core objects that can be meaningfully compared already have <=> implemented, so it’s often simply a matter of delegating to them, as shown here:
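A sketch of delegating <=> to a core object and picking up Comparable for free (the Version class is an illustration, not a library API):

```ruby
class Version
  include Comparable

  attr_reader :number

  def initialize(number)
    @number = number
  end

  def <=>(other)
    number <=> other.number  # delegate to Integer#<=>
  end
end

Version.new(2) > Version.new(1)                          #=> true
Version.new(1).between?(Version.new(0), Version.new(3))  #=> true
```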
You can, of course, override some of the individual operators that Comparable provides, but its defaults are often exactly what you need.
Most operators you use in Ruby can be customized within your objects. Whenever you find yourself writing append() when you really want << , or add() when you really want + , consider defining your own custom operators.
* Use attr_reader , attr_writer , and attr_accessor whenever possible, and avoid writing your own accessors unless it is necessary.
* Consider ending methods that are designed to be used in conditional statements with a question mark.
* If you have a method foo() , and a similar method that does nearly the same thing but requires the user to pay more attention to what’s going on, consider calling it foo!() .
* Don’t bother creating a method foo!() if there is not already a method called foo() that does the same thing with less severe consequences.
* If it makes sense to do so, define custom operators for your objects.
###Blank Slate
A BlankSlate is an object without much of anything. A skinny class with a minimal number of methods is called a Blank Slate. As it turns out, Ruby has a ready-made Blank Slate for you to use called BasicObject. Inheriting from BasicObject is the quicker way to define a Blank Slate in Ruby.
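For reference, assuming Ruby 1.9 or newer, BasicObject's instance methods are roughly the following:

```ruby
BasicObject.instance_methods
#=> [:==, :equal?, :!, :!=, :instance_eval, :instance_exec, :__send__, :__id__]
```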
These methods form the lowest common denominator for Ruby, so BasicObject is
pretty reasonable in its offerings. The key thing to remember is that a BasicObject is
fully defined by this limited set of features, so you shouldn’t expect anything more than
that.
###Building Flexible Interfaces (DSL)
A flexible domain-specific interface strips away as much boilerplate code as possible so that every line expresses something
meaningful in the context of our domain.
#####Making instance_eval() Optional
Code like this:
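Something along these lines, a representative Prawn call where the block is instance_eval'd against a new document:

```ruby
Prawn::Document.generate("hello.pdf") do
  text "Hello, world!"
end
```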
However, there is a limitation that comes with this sort of interface. Because we are
evaluating the block in the context of a Document instance, we do not have access to
anything but the local variables of our enclosing scope (which is the Prawn::Document::generate method). This means the following code
won’t work:
```ruby
class MyBestFriend
  def initialize
    @first_name = "Paul"
    @last_name  = "Mouzas"
  end

  def full_name
    "#{@first_name} #{@last_name}"
  end

  def generate_pdf
    Prawn::Document.generate("friend.pdf") do
      text "My best friend is #{full_name}"
    end
  end
end
```
The problem is that blocks are generally closures. And you expect them to actually be full closures. And it's not obvious from the point where you write the block that that block might not be a full closure. That's what happens when you use instance_eval: you reset the self of that block into something else - this means that the block is still a closure over all local variables outside the block, but NOT for method calls.
When the block accepts an argument instead, the code is an ordinary closure, and as such can access the instance methods and variables of the enclosing scope. The call would now look like this:
```ruby
class MyOtherBestFriend
  def initialize
    @first_name = "Pete"
    @last_name  = "Johansen"
  end

  def full_name
    "#{@first_name} #{@last_name}"
  end

  def generate_pdf
    Prawn::Document.generate("friend.pdf") do |doc|
      doc.text "My best friend is #{full_name}"
    end
  end
end
```
Another, arguably less clean, solution is to delegate back to the original context any method calls that the new context (the instance on which instance_eval is executed) doesn't respond to, like this:
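A rough sketch of that idea (Document, save_as, and @original_context are assumptions for illustration, not Prawn's actual code):

```ruby
class Document
  def self.generate(filename, &block)
    doc = new
    # Remember the object that 'self' pointed to where the block was written.
    doc.instance_variable_set(:@original_context, eval("self", block.binding))
    doc.instance_eval(&block)
    doc.save_as(filename)
  end

  def method_missing(name, *args, &block)
    context = @original_context
    if context && context.respond_to?(name, true)
      context.send(name, *args, &block)   # forward to the enclosing scope
    else
      super
    end
  end
end
```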
Rather than hand-writing many similar shortcut methods such as stroke_some_method (which simply end up delegating to some_method and then calling stroke), you can dynamically intercept method calls using method_missing() and send(), and avoid creating so many nearly identical methods.
Here is how you can do it:
```ruby
# Provides the following shortcuts:
#
#   stroke_some_method(*args)          #=> some_method(*args); stroke
#   fill_some_method(*args)            #=> some_method(*args); fill
#   fill_and_stroke_some_method(*args) #=> some_method(*args); fill_and_stroke
#
# See https://gist.github.com/francisco-rojas/453961a0384869e6cee8
# for some basics on regex in Ruby.
#
def method_missing(id, *args, &block)
  case(id.to_s)
  when /^fill_and_stroke_(.*)/
    send($1, *args, &block); fill_and_stroke
  when /^stroke_(.*)/
    send($1, *args, &block); stroke
  when /^fill_(.*)/
    send($1, *args, &block); fill
  else
    super
  end
end
```
It’s important to note that when the patterns do not match, super is called. This allows
objects up the chain to do their own method_missing handling, including the default,
which raises a NoMethodError . This prevents something like pdf.the_shiny_kitty from
failing silently, as well as the more subtle pdf.fill_circle .
Also, this code will happily accept pdf.fill_and_stroke_start_new_page or even pdf.stroke_stroke_stroke_line without complaining. Any time you use the method_missing hook, these are the trade-offs you must be willing to accept; making your hooks strict enough to catch every such edge case would largely defeat the purpose of using them.
#####Dual-Purpose Accessors
When working with code that has an instance_eval-based interface you need to disambiguate between local variables and method calls which can ruin your style.
Prawn::Document.generate("accessors.txt")doself.font_size=10text"The font size is now #{font_size}"end
It’s possible to make this look much nicer, as you can see:
Prawn::Document.generate("accessors.txt")dofont_size10text"The font size is now #{font_size}"end
We can use Ruby’s default argument syntax to determine whether we’re supposed to be getting or setting the attribute:
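A minimal sketch of such a dual-purpose reader/writer (simplified; the real Prawn method does more work):

```ruby
def font_size(size = nil)
  return @font_size unless size   # no argument: act as a reader
  @font_size = size               # argument given: act as a writer
end
alias_method :font_size=, :font_size
```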
Use alias_method here instead of attr_writer to ensure there won’t be any difference between the following two lines of code:
```ruby
pdf.font_size = 16
pdf.font_size(16)
```
Summary of tips given so far to build Domain-Specific Interfaces:
* As mentioned in the previous chapter, using instance_eval is a good base for writing a domain-specific interface, but has some limitations.
* You can use a Proc#arity check to provide the user with a choice between instance_eval and yielding an object.
* If you want to provide shortcuts for certain sequences of method calls, or dynamic generation of methods, you can use method_missing along with send() .
* When using method_missing , be sure to use super() to pass unhandled calls up the chain so they can be handled properly by other code, or eventually raise a NoMethodError .
* Normal attribute writers don’t work well in instance_eval -based interfaces. Offer a dual-purpose reader/writer method, and then alias a writer to it, and both external and internal calls will be clear.
###Implementing Per-Object Behavior (building a simple stubbing system for use in testing)
Class methods are actually just per-object behavior on an instance of the class Class.
The goal is to create a system that will generate canned responses to certain method
calls, without modifying their original classes. This is an important feature, because we
don’t want our stubbed method calls to have a global effect during testing.
Each object hides its individual space for method definitions (called a singleton class) from plain view. However, we can reveal it by using a special syntax:
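For example (the object and variable names are arbitrary):

```ruby
user = Object.new
singleton = class << user; self; end
singleton #=> #<Class:#<Object:0x00000001234567>>  (the address will vary)
```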
So when write "class << user; self; end" , you’re just asking the object to give back its singleton class. With that in hand, we can define methods on it.
Remember that the block passed to define_method() is a closure, which allows it to access the local variables of the enclosing scope. This is why we can pass the return value as a parameter to Stubber.stubs() and have it returned from our dynamically defined method.
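A minimal sketch of what such a Stubber helper might look like (the names and option keys are assumptions based on the description above, not necessarily the book's exact code):

```ruby
module Stubber
  extend self

  def stubs(method, options = {})
    # define_method is private on the singleton class, hence send()
    singleton(options[:on]).send(:define_method, method) do
      options[:returns]   # the block is a closure over options
    end
  end

  def singleton(obj)
    class << obj; self; end
  end
end

user = Object.new
Stubber.stubs(:logged_in?, on: user, returns: true)
user.logged_in? #=> true
```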
* Using per-object behavior usually makes the most sense when you don’t want to define something at the per-class level.
* Objects in Ruby may have individually customized behaviors that can replace, supplement, or amend the functionality provided by their class definitions.
* Per-object behavior (known as singleton methods) can be implemented by gaining access to the singleton class of an object using the class << obj notation.
* define_method is made private on singleton classes, so send() is needed to utilize it.
* When implementing nondynamic per-object behavior, the familiar def obj.some_method syntax may be used.
#####Extending and Modifying Preexisting Code
###Adding new functionality
Although adding functionality to a class definition is usually considered safer than overriding functionality, it is not without dangers. If you can extend predefined objects for your own needs, so can everyone else, including any of the libraries you may depend on.
One common problem when adding functionality are "name clashes" where whatever code is loaded last takes precedence.
Whenever you are reopening a class to extend it, it is a good practice to always check first for the existence of methods with the same name as the methods you want to define and throw an error if such methods are found.
For example:
```ruby
class Numeric
  [:in, :ft].each do |e|
    if instance_methods.include?(e)
      raise "Method '#{e}' exists, PDF Conversions will not override!"
    end
  end

  def in
    self * 72
  end

  def ft
    self.in * 12
  end
end
```
This code will define the methods as expected if no methods with the same names already exist; otherwise it raises an explicit error so we know what is happening, instead of silently overriding the methods, which could cause unintended behaviour.
The ideal situation is for both libraries to use this technique, because then, regardless of the order in which they are required, the incompatibility between dependencies will be quickly spotted.
###Modification via Aliasing
You can use alias_method for the purpose of making a new name point at an old method. This of course is where the feature gets its name: allowing you to create aliases for your methods. But another interesting aspect of alias_method is that it doesn’t simply create a new name for a method; it makes a copy of it. The best way to show what this means is through a trivial code example:
```ruby
# define a method
class Foo
  def bar
    "baz"
  end
end

f = Foo.new
f.bar      #=> "baz"

# set up an alias
class Foo
  alias_method :kittens, :bar
end

f.kittens  #=> "baz"

# redefine the original method
class Foo
  def bar
    "Dog"
  end
end

f.bar      #=> "Dog"
f.kittens  #=> "baz"
```
As you can see here, even when we override the original method bar() , the alias kittens() still points at the original definition. This turns out to be a tremendously useful feature.
This is how RubyGems patches the Kernel#require method using aliasing:
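A simplified sketch of the pattern (the real RubyGems implementation has more going on, but the aliasing structure is the same):

```ruby
module Kernel
  alias_method :gem_original_require, :require

  def require(path)
    gem_original_require(path)
  rescue LoadError => load_error
    # If a gem provides the missing file, activate it and retry;
    # otherwise re-raise the original error untouched.
    if Gem.try_activate(path)
      gem_original_require(path)
    else
      raise load_error
    end
  end
end
```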
This is a great example of responsible modification of a preexisting method. This code does not change the signature of the original method, nor does it change the possible return values or failure states. All it does is add some new intermediate functionality that is transparent to the user when it is not needed: the new code only kicks in when the original require fails, and if the gem lookup fails as well, it raises the original error that the overridden method would have raised.
It also has a bit of a limitation, in that you need to keep coming up with new aliases, as aliases are subject to collision just the same as ordinary methods are.
For example, although this code works fine:
classAdefcount"one"endalias_method:one,:countdefcount"#{one} two"endalias_method:one_and_two,:countdefcount"#{one_and_two} three"endendA.new.count#=> "one two three"
You can introduce infinite recursion by aliasing an old method twice to the same name. However, there is a workaround for this issue: per-object modification. If we move our modifications from the per-class level to the per-object level, we end up with a pretty nice solution that gets rid of aliasing entirely and simply leverages Ruby’s ordinary method resolution path. Here is how:
classAdefcount"one"endendmoduleAppendTwodefcount"#{super} two"endendmoduleAppendThreedefcount"#{super} three"endenda=A.newa.extend(AppendTwo)a.extend(AppendThree)a.count#=> "one two three"
Provided that all the code used by your application employs this approach instead of
aliased method chaining, you end up with two main benefits: a pristine original class
and no possibility for collisions. Because the amended functionality is included at the
instance level, rather than in the class definition, you don’t risk breaking other people’s
code as easily, either.
Note that not every single object can be meaningfully extended this way. Any objects
that do not allow you to access their singleton space cannot take advantage of this
technique. This mostly applies to things that are immediate values, such as numbers
and symbols. But more generally, if you cannot use a call to new() to construct your
object, chances are that you won’t be able to use these tricks. In those cases, you’d need
to revert to aliasing.
* All classes in Ruby are open, which means that object definitions are never finalized, and new behaviors can be added at runtime.
* To avoid clashes, conditional statements utilizing reflective features such as instance_methods and friends can be used to check whether a method is already defined before overwriting it.
* When intentionally modifying code, alias_method can be used to make a copy of the original method to fall back on.
* Whenever possible, per-object behavior is preferred. The extend() method comes in handy for this purpose.
###Building Classes and Modules Programmatically
#####Parameterized subclassing and conditional inheritance.
```ruby
def Mystery(secret)
  if secret == "chunky bacon"
    Class.new do
      def message
        "You rule!"
      end
    end
  else
    Class.new do
      def message
        "Don't make me cry"
      end
    end
  end
end
```
Notice here that we call Class.new() with a block that serves as its class definition. New anonymous classes are generated on every call.
classWin < Mystery"chunky bacon"defwho_am_i"I am win!"endendclassEpicFail < Mystery"smooth ham"defwho_am_i"I am teh fail"endenda=Win.newa.message#=> "You rule!"a.who_am_i#=> "I am win!"b=EpicFail.newb.message#=> "Don't make me cry"b.who_am_i#=> "I am teh fail"
We can see that Mystery() conditionally chooses which class to inherit from. Furthermore, the classes generated by Mystery() are anonymous, meaning they don’t have some constant identifier out there somewhere, and that the method is actually generating class objects, not just returning references to preexisting definitions. Finally, we can see that the subclasses behave ordinarily, in the sense that you can add custom functionality to them as needed.
Here is another example from the Ruport gem:
```ruby
class MyReport < Fatty::Formatter
  required_params :first_name, :last_name

  helpers do
    def full_name
      "#{params[:first_name]} #{params[:last_name]}"
    end
  end

  format :txt do
    def render
      "Hello #{full_name} from plain text"
    end
  end

  format :pdf, :base => Prawn::FattyFormat do
    def render
      doc.text "Hello #{full_name} from PDF"
      doc.render
    end
  end
end
```
This code works because of a couple of methods that take charge of dynamic subclassing and anonymous class/module creation; here are those methods:
```ruby
# This class method is called with just a block. It generates an anonymous
# subclass of Fatty::Format, and then stores this subclass keyed by extension
# name in the formats hash. Additionally, options[:base] allows inheriting
# from another class that is not Fatty::Format.
def format(name, options = {}, &block)
  formats[name] = Class.new(options[:base] || Fatty::Format, &block)
end

# Modules can also be built up anonymously using a block. Here you can either
# pass a block to define the anonymous module or pass the name of the module
# you'd like to mix in.
def helpers(helper_module = nil, &block)
  @helpers = helper_module || Module.new(&block)
end

def render(format, params = {})
  validate(format, params)
  # Look up our anonymous class by extension name in the formats hash.
  format_obj = formats[format].new
  # Mix in our helper module.
  format_obj.extend(@helpers) if @helpers
  format_obj.params = params
  format_obj.validate
  format_obj.render
end
```
In summary:
* Classes and modules can be instantiated like any other object. Both constructors accept a block that can be used to define methods as needed.
* To construct an anonymous subclass, call Class.new(MySuperClass).
* Parameterized subclassing can be used to add logic to the subclassing process, and essentially involves a method returning a class object, either anonymous or explicitly defined.
###Detecting Newly Added Functionality
You can detect when new methods are added to a class with the method_added() hook.
```ruby
class Object
  class << self
    alias_method :blank_slate_method_added, :method_added

    # Detect method additions to Object and remove them in the
    # BlankSlate class.
    def method_added(name)
      # save the result of the original method_added method
      result = blank_slate_method_added(name)
      return result if self != Object
      # hide the method if it is being added to Object
      BlankSlate.hide(name)
      # return the result of the original call
      result
    end
  end
end
```
You’d think that would do the trick, but as it turns out, Object includes the module
Kernel . This means we need to track changes over there too, using nearly the same
approach:
```ruby
module Kernel
  class << self
    alias_method :blank_slate_method_added, :method_added

    # Detect method additions to Kernel and remove them in the
    # BlankSlate class.
    def method_added(name)
      result = blank_slate_method_added(name)
      return result if self != Kernel
      BlankSlate.hide(name)
      result
    end
  end
end
```
However, there is another problem: inclusion of modules into Object at runtime. Every module included in an object is like a back door for future expansion. So we end up jumping one level higher to take care of module inclusion dynamically:
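A sketch of the idea, modeled on the BlankSlate library's approach (the alias name is an assumption):

```ruby
class Module
  alias_method :blank_slate_original_append_features, :append_features

  def append_features(mod)
    result = blank_slate_original_append_features(mod)
    return result unless mod == Object
    # A module was just mixed into Object: hide the instance methods it added.
    instance_methods.each { |name| BlankSlate.hide(name) }
    result
  end
end
```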
In the case where a module is mixed into Object , BlankSlate needs to wipe out the instance methods added to its own class definition. After this, it returns the result of the original append_features() call.
###Tracking Inheritance
When you write unit tests via Test::Unit , you typically just subclass Test::Unit::TestCase , which figures out how to find your tests for you.
We must first identify each subclass as a test case, and store it in an array until
SimpleTestHarness.run is called. Like Test::Unit and other common Ruby testing
frameworks, we’ll wipe the slate clean by reinstantiating our tests for each test method,
running a setup method if it exists. We will follow the Test::Unit convention and run
only the methods whose names begin with test_.
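A rough sketch of how an inherited() hook can power this (the structure follows the description above; the details are assumptions, not the book's exact listing):

```ruby
class SimpleTestHarness
  def self.inherited(base)
    SimpleTestHarness.tests << base   # remember every test case subclass
  end

  def self.tests
    @tests ||= []
  end

  def self.run
    tests.each do |test_case|
      # run only methods whose names begin with test_, on a fresh instance
      test_case.instance_methods.map(&:to_s).grep(/^test_/).each do |test|
        instance = test_case.new
        instance.setup if instance.respond_to?(:setup)
        instance.send(test)
      end
    end
  end
end
```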
###Tracking Mixins
This common Ruby idiom automatically defines instance and class methods in the class that includes the module:
```ruby
module MyFeatures
  module ClassMethods
    def say_hello
      "Hello"
    end

    def say_goodbye
      "Goodbye"
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  def say_hello
    "Hello from #{self}!"
  end

  def say_goodbye
    "Goodbye from #{self}"
  end
end # MyFeatures

class A
  include MyFeatures
end
```
* If you are making changes to any hooks at the top level, be sure to safely modify them via aliasing, so as not to globally break their behavior.
* Hooks can be implemented on a particular class or module, and will catch everything below them.
* Most hooks either capture a class, a module, or a name of a method, and are executed after an event takes place. This means that it’s not really possible to intercept an event before it happens, but it is usually possible to undo one once it has.
||=

```ruby
x = nil          #=> nil
x ||= "default"  #=> "default" : the value of x is replaced with "default",
                 #   but only if x IS nil or false
x ||= "other"    #=> "default" : the value of x is NOT replaced, because it
                 #   already is something other than nil or false
```
&&=
```ruby
x = nil             #=> nil
x &&= "default"     #=> nil : the value of x is replaced with "default"
                    #   only if x is NOT nil or false
x = "default"       #=> "default"
x &&= "Lorem Ipsum" #=> "Lorem Ipsum"
```
#####Line-Based File Processing with State Tracking
#####Laziness Can Be a Virtue (A Look at lazy.rb)
In essence, code is said to be evaluated lazily if it is executed only at the time it is actually needed, not at the time it is defined, much like proc objects in Ruby.
A powerful library for lazy evaluation is lazy.rb (http://moonbase.rydia.net/software/lazy.rb/). It can be used to avoid having to build special accessors for our instance variables using the ||= technique when what we’re ultimately doing is setting default values for them, which is normally something we do in our constructor.
All that promise() does is return a proxy object that wraps a block of code designed to be executed later. Once you call any methods on this object, it passes them along to whatever your block evaluates to.
Here is an example of a basic lazy promise object implementation:
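A minimal sketch of such a promise-style proxy (this is illustrative, not lazy.rb's actual implementation):

```ruby
class Promise < BasicObject
  def initialize(&computation)
    @computation = computation
    @evaluated   = false
  end

  def __force__
    unless @evaluated
      @value     = @computation.call
      @evaluated = true
    end
    @value
  end

  # Forward everything to the (lazily computed) result.
  def method_missing(name, *args, &block)
    __force__.__send__(name, *args, &block)
  end
end

def promise(&block)
  Promise.new(&block)
end

heavy = promise { puts "computing..."; 21 * 2 }
heavy + 0   # prints "computing..." (only now), then evaluates to 42
```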
#####Minimizing Mutable State and Reducing Side Effects
To write stateless, side-effect-free code in Ruby, we would need to create a new object every single time an element gets added to an array. However, objects are large in Ruby, and constructing them is a slow process. What’s more, if we don’t store any of these intermediate values, we risk the garbage collector churning frequently to kill off our discarded objects.
Ruby does not perform tail call optimization by default, which makes deeply recursive functions inefficient. However, an important thing to remember is that any recursive solution can be rewritten iteratively.
Avoiding side effects is different than avoiding mutable state entirely. In Ruby, as long as it makes sense to do so, avoiding side effects is a good thing. It reduces the possibility for unexpected bugs much in the same way that avoiding the use of global variables does. However, avoiding the use of mutable state definitely depends more on your individual situation.
* If stateless (possibly recursive) code looks better than other solutions, and performance is not a major concern, don’t be afraid to write your code in the more elegant way.
* The simple way to avoid side effects in Ruby when transforming one object to another is to create a new object, and then populate it by iterating over your original object performing the necessary state transformations.
* You can write stateless code in Ruby by creating new objects every time you perform an operation, such as Array#+ .
* Recursive solutions may aid in writing simple stateless solutions, but incur a major performance penalty in Ruby.
* Creating too many objects can create performance problems as well, so it is important to find the right balance, and to remember that side effects can be avoided without making things fully stateless.
#####Modular Code Organization
Functions can be unified into a single namespace by using modules. Like in the Math module:
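For instance, Math's functions are just methods called directly on the module itself:

```ruby
Math.sqrt(4)      #=> 2.0
Math.log(Math::E) #=> 1.0
```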
You can implement code like that by using module_function
moduleAmodule_functiondeffoo"This is foo"enddefbar"This is bar"endend
which allows you to call functions directly on the module, like this:
>> A.foo=>"This is foo"
>> A.bar=>"This is bar"
However, this approach does come with some limitations, because it does
not allow you to use private functions:
moduleAmodule_functiondeffoo"This is foo calling baz: #{baz}"enddefbar"This is bar"endprivatedefbaz"hi there"endend
```
>> A.foo
NameError: undefined local variable or method 'baz' for A:Module
	from (irb):33:in 'foo'
	from (irb):46
	from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'
```
However, modules in Ruby, although they cannot be instantiated, are in essence ordinary objects. Because of this, there is nothing stopping us from mixing a module into itself:
```ruby
module A
  extend self

  def foo
    "This is foo calling baz: #{baz}"
  end

  def bar
    "This is bar"
  end

  private

  def baz
    "hi there"
  end
end
```
>> A.foo=>"This is foo calling baz: hi there"
>> A.bazNoMethodError: privatemethod'baz'calledforA:Modulefrom(irb):65from/Users/sandal/lib/ruby19_1/bin/irb:12:in'<main>'
Using this trick of extending a module with itself provides us with a structure that isn’t
too different (at least on the surface) from the sort of modules you might find in func-
tional programming languages. But aside from odd cases such as the Math module, you
might wonder when this technique would be useful.
For the most part, classes work fine for encapsulating code in Ruby. Traditional inheritance combined with the powerful mixin functionality of modules covers most of
the bases just fine. However, there are definitely cases in which a concept isn’t big
enough for a class, but isn’t small enough to fit in a single function. Also, modules are usually used when all you need to do is call methods on an object instead of storing variables too as you would do in a class.
However, as soon as you see the same argument being passed to a bunch of functions, you might be running into a situation where some persistence of state wouldn’t hurt. The good news is, if the need for expansion arises down the line, converting code that has been organized into a module into a class is fairly trivial (just change the module keyword to class, remove the extend self statement, and add your own initializer if necessary).
Check the book for an example of when to use modules instead of classes
Modules introduce a clear separation of concerns that helps make testing much easier. They also leave room for future expansion and modification without tight coupling.
Here are a few things to watch for that indicate this technique may be
the right way to go:
* You are solving a single, atomic task that involves lots of steps that would be better broken out into helper functions.
* You are wrapping some functions that don’t rely on much common state between them, but are related to a common topic.
* The code is very general and can be used standalone, or the code is very specific but doesn’t relate directly to the object that it is meant to be used by.
* The problem you are solving is small enough that object orientation does more to get in the way than it does to help you.
Because modular code organization reduces the amount of objects you are creating, it
can potentially give you a decent performance boost. This offers an incentive to use
this approach when it is appropriate.
#####Memoization
In Ruby, the trivial implementation of the Fibonacci sequence might look like this:
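The trivial version is presumably something along these lines:

```ruby
def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

fib(6) #=> 8
```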
However, you’ll feel the pain that is relying on deep recursion in Ruby if you compute
even modest values of n. However, there is a special characteristic of functions like this that makes it possible to speed them up drastically.
In mathematics, a function is said to be well defined if it consistently maps its input to
exactly one output. This is obviously true for fib(n) , as fib(6) will always return 8 , no
matter how many times you compute it. This sort of function is distinct from one that
is not well defined, such as the following:
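A hypothetical stand-in for such a function (not the book's exact example): the same input can produce different outputs.

```ruby
def unpredictable(n)
  n + rand(100)
end

unpredictable(3) #=> 42 (for example)
unpredictable(3) #=> 87 (for example)
```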
If we run this code a few times with the same n , we see there isn’t a unique relationship
between its input and output.
When we have a function like this, there isn’t much we can assume about it. However,
well-defined functions such as fib(n) can get a massive performance boost almost for
free.
If your mind wandered to tail-call optimization or rewriting the function iteratively,
you’re thinking too hard. However, the idea of reducing the amount of recursive calls
is on track. As it stands, this code is a bad dream, as fib(n) is called five times when
n =3 and nine times when n =4, with this trend continuing upward as n gets larger.
The key realization is that fib(6) is always going to be 8 , and fib(10) is always going to be 55 . Because of this, we can store these values rather than calculate them repeatedly.
From Introduction to algorithms / Thomas H. Cormen . . . [et al.].—3rd ed.
Chapter 15, Dynamic Programming
Dynamic programming applies when the subproblems overlap—that is, when subproblems
share subsubproblems. In this context, a divide-and-conquer algorithm does more work than
necessary, repeatedly solving the common subsubproblems. A dynamic-programming algorithm
solves each subsubproblem just once and then saves its answer in a table, thereby avoiding
the work of recomputing the answer every time it solves each subsubproblem.
We typically apply dynamic programming to optimization problems. Such problems can have many
possible solutions. Each solution has a value, and we wish to find a solution with the
optimal (minimum or maximum) value. We call such a solution an optimal solution to the
problem, as opposed to the optimal solution, since there may be several solutions that
achieve the optimal value. When developing a dynamic-programming algorithm, we follow a
sequence of four steps:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution, typically in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Steps 1–3 form the basis of a dynamic-programming solution to a problem. If we
need only the value of an optimal solution, and not the solution itself, then we
can omit step 4. When we do perform step 4, we sometimes maintain additional
information during step 3 so that we can easily construct an optimal solution.
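Applied to fib(n), a memoized version along these lines (a sketch, not necessarily the book's exact listing) stores each intermediate result in an array indexed by n:

```ruby
@fib_cache = []

def fib(n)
  return n if n < 2
  @fib_cache[n] ||= fib(n - 1) + fib(n - 2)
end

fib(100) #=> 354224848179261915075, returned almost instantly
```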
What we have done is used a technique called memoization to cache the return values
of our function based on its input. Because we were caching a sequence, it’s reasonable
to use an array here, but in other cases in which the data is more sparse, a hash may be
more appropriate.
Another example of memoization involves the following functions, which convert RGB values to their equivalent hexadecimal values and vice versa:
```ruby
def rgb2hex(rgb)
  # see Kernel#format and String#% for more details
  rgb.map { |e| "%02x" % e }.join
end

def hex2rgb(hex)
  r, g, b = hex[0..1], hex[2..3], hex[4..5]
  [r, g, b].map { |e| e.to_i(16) }
end
```
>> rgb2hex([100,25,254])=>"6419fe"
>> hex2rgb("6419fe")=>[100,25,254]
Although these methods aren’t especially complicated, they represent a decent use case
for caching via memoization. Colors are likely to be reused frequently and, after they
have been translated once, will never change. Therefore, rgb2hex() and hex2rgb() are
well-defined functions.
As it turns out, Ruby’s Hash is a truly excellent cache object. Here is the memoized version:
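A sketch of what that memoized version can look like, using Hash.new's block form as the cache (the constant names are assumptions):

```ruby
RGB2HEX_CACHE = Hash.new do |cache, rgb|
  cache[rgb] = rgb.map { |e| "%02x" % e }.join
end

HEX2RGB_CACHE = Hash.new do |cache, hex|
  cache[hex] = [hex[0..1], hex[2..3], hex[4..5]].map { |e| e.to_i(16) }
end

def rgb2hex(rgb)
  RGB2HEX_CACHE[rgb]
end

def hex2rgb(hex)
  HEX2RGB_CACHE[hex]
end
```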
When running under a tight loop, the memoization can really make a big difference in these functions, and may be worth the minimal noise introduced by adding a Hash into the mix.
The Memoizable module is designed to abstract the task of creating a cache to the point at which you simply mark each function that should be memoized, similar to the way you mark something public or private.
Memoizable works by making a copy of your function, renaming it unmemoized_method_name , and then injecting its automatic caching in place of the original function. That means that when we call rgb2hex() or hex2rgb() , we’ll now be hitting the cached versions of the functions.
This is pretty exciting, as it means that for well-defined functions, you can use
Memoizable to get a performance boost without even modifying your underlying im-
plementation.
Although Memoizable is predictably slower than our raw implementation, it is still
cooking with gas when compared to the uncached versions of our functions. What we
are seeing here is the overhead of an additional method call per request, so as the
operation becomes more expensive, the cost of Memoizable actually gets lower. Also, if
we look at things in terms of work versus payout, Memoizable is the clear winner, due
to its ability to transparently hook itself into your functions.
* Functions that are well defined, where a single input consistently produces the same output, can be cached through memoization.
* Memoization often trades CPU time for memory, storing results rather than recalculating them. As a result, memoization is best used when memory is cheap and CPU time is costly, and not the other way around. In some cases, even when the memory consumption is negligible, the gains can be substantial. We can see this in the fib(n) example, which is transformed from an exponential algorithm to a linear one simply by storing the intermediate calculations.
* When coding your own solution, Hash.new ’s block form can be a very handy way of putting together a simple caching object.
* James Gray’s Memoizable module makes it trivial to introduce memoization to well-defined functions without directly modifying their implementations, but incurs a small cost of indirection over an explicit caching strategy.
#####Infinite Lists
Infinite lists (also known as lazy streams) provide a way to represent arbitrary sequences that can be traversed by applying a certain function that gets you from any given element in the list to the next one. For example, if we start with any even number, we can get to the next one in the sequence by simply adding 2 to our original element.
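A minimal sketch of the idea (this is illustrative, not the book's implementation): a node holds a value plus a procedure that produces the next value on demand.

```ruby
class LazyStream
  attr_reader :value

  def initialize(value, &next_value)
    @value      = value
    @next_value = next_value
  end

  def next
    LazyStream.new(@next_value.call(@value), &@next_value)
  end

  def take(n)
    node, results = self, []
    n.times do
      results << node.value
      node = node.next
    end
    results
  end
end

evens = LazyStream.new(0) { |n| n + 2 }
evens.take(5) #=> [0, 2, 4, 6, 8]
```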
The key innovation is that we’ve turned an external iteration and state transformation
into an internal one.
* Infinite lists essentially consist of nodes that contain a value along with a procedure that will transform that value into the next element in the sequence.
* Infinite lists are lazily evaluated, and thus are sometimes called lazy streams.
* An infinite list might be an appropriate structure to use when you need to iterate over a sequential list in groups at various points in time, or if you have a general function that can be tweaked by some parameters to fit your needs.
* For data that is sparse, memoization might be a better technique than using an infinite list.
* When you need to do filtering or state transformation on a long sequence of elements that have a clear relationship from one to the next, a lazy stream might be the best way to go.
Also check Section 7, "Doing Something Cool with Closures", at https://innig.net/software/ruby/closures-in-ruby.html to see how to build a data structure containing all of the Fibonacci numbers, evaluated lazily.
See http://nithinbekal.com/posts/ruby-tco/ for more details on tail call optimization and how to enable TCO in Ruby.
One major problem with TCO is that it messes up stack traces, and therefore makes debugging harder. However, Ruby allows you to optionally enable it, even though it’s not the default.
#####Higher-Order Procedures
Currying
In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument (partial application). Currying is converting a single function of n arguments into n functions with a single argument each.
Said another way, currying means breaking a function with many arguments into a series of functions that each take one argument and ultimately produce the same result as the original function.
In order to get the full application of f(x,y,z), you need to do this:
f(x)(y)(z);
Many functional languages let you write f x y z. If you only call f x y or f(x)(y), you get a partially applied function: the return value is a closure, roughly lambda { |z| f(x, y, z) }, with the values of x and y already captured.
Consider this function f, which takes 3 params x, y, z:

f(x,y,z) = 4*x + 3*y + 2*z

Currying means that we can rewrite the function as a composition of 3 functions (one function for each param):

f(x)(y)(z) = 2*z + (3*y + (4*x))
The direct use of this is what is called partial application: if you have a function that accepts n parameters, you can generate from it one or more functions with some parameter values already filled in.
Currying and partial application are often confused to be the same when in fact they are not. Where partial application takes a function and from it builds a function which takes fewer arguments, currying builds functions which take multiple arguments by composition of functions which each take a single argument.
In ruby Proc#curry returns a curried proc. If the optional arity argument is given, it determines the number of arguments. A curried proc receives some arguments. If a sufficient number of arguments are supplied, it passes the supplied arguments to the original proc and returns the result. Otherwise, returns another curried proc that takes the rest of arguments.
```ruby
b = proc { |x, y, z| (x || 0) + (y || 0) + (z || 0) }
b.curry.(1, 2, 3)              #=> 6
p b.curry[1][2][3]             #=> 6
p b.curry[1].(2).call(3)       #=> 6
p b.curry[1, 2][3, 4]          #=> 6
p b.curry(5)[1][2][3][4][5]    #=> 6
p b.curry(5)[1, 2][3, 4][5]    #=> 6
p b.curry(1)[1]                #=> 1
p b.curry(2)[1][2]             #=> 3

b = lambda { |x, y, z| (x || 0) + (y || 0) + (z || 0) }
p b.curry[1][2][3]             #=> 6
p b.curry[1, 2][3, 4]          #=> wrong number of arguments (4 for 3)
p b.curry(5)                   #=> wrong number of arguments (5 for 3)
p b.curry(1)                   #=> wrong number of arguments (1 for 3)
```

For:
```ruby
b = proc { |x, y, z| (x || 0) + (y || 0) + (z || 0) }
```

curry generates 3 functions (partial applications), each receiving one parameter, as follows:

```ruby
curried = b.curry

# this calls the first partial application function
partial_application1 = curried.(1)
#=> #<Proc:0x00000001113388>

# this calls the second partial application function
partial_application2 = partial_application1.(2)
#=> #<Proc:0x00000001335358>

# this calls the third (and last) partial application function,
# thus returning the final result
result = partial_application2.(3)
#=> 6
```
Here is another example:
```ruby
sum = lambda do |f, a, b|
  s = 0
  a.upto(b) { |n| s += f.(n) }
  s
end

# generate the currying
currying = sum.curry

# generate the partial functions
sum_ints           = currying.(lambda { |x| x })
sum_of_squares     = currying.(lambda { |x| x**2 })
sum_of_powers_of_2 = currying.(lambda { |x| 2**x })

puts sum_ints.(1, 5)           #=> 15
puts sum_ints.(1).(5)          #=> 15
puts sum_of_squares.(1, 5)     #=> 55
puts sum_of_powers_of_2.(1, 5) #=> 62
```
In mathematics and computer science, a higher-order function is a function that does at least one of the following:
takes one or more functions as an input
outputs a function
A function is said to be a higher-order function if it accepts another function as input
or returns a function as its output.
In Ruby, to_proc is a generic hook: when you prefix an object with & in an argument list, Ruby calls its to_proc method. This means Symbol#to_proc isn’t special, and we can build our own custom objects that do even cooler tricks than it does.
The place I use this functionality all the time is in Rails applications where I need to
build up filter mechanisms that do some of the work in SQL, and the rest in Ruby.
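A Filter object along these lines might look like this (a hypothetical sketch, not a library API):

```ruby
class Filter
  def initialize
    @constraints = []
  end

  def constraint(&block)
    @constraints << block
  end

  # Called by & ; the resulting lambda only accepts elements that
  # satisfy every registered constraint.
  def to_proc
    lambda { |x| @constraints.all? { |c| c.call(x) } }
  end
end
```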
We can then construct a Filter object and assign constraints to it on the fly:
```ruby
filter = Filter.new
filter.constraint { |x| x > 10 }
filter.constraint { |x| x.even? }
filter.constraint { |x| x % 3 == 0 }
```
Now, when dealing with an Enumerable object, it is easy to filter the data based on our
constraints:
```ruby
p (8..24).select(&filter) #=> [12, 18, 24]
```
As we add more constraints, new blocks are generated for us, so things work as
expected:
```ruby
filter.constraint { |x| x % 4 == 0 }
p (8..24).select(&filter) #=> [12, 24]
```
As you can see, Symbol#to_proc isn’t the only game in town. Any object that can mean-
ingfully be reduced to a function can implement a useful to_proc method.
1. First, identify the different scenarios that apply to a given feature.
2. Enumerate over these scenarios to identify which ones are affected by defects and which ones work as expected. This can be done in many ways, ranging from printing debugging messages on the command line to logfile analysis and live application testing. The important thing is to identify and isolate the cases affected by the bug.
3. Hop into irb if possible and take a look at what your objects actually look like under the hood. Experiment with the failing scenarios in a step-by-step fashion to try to dig down and uncover the root cause of problems.
4. Write tests to reproduce the problems you are having, along with what you expect to happen when the issue is resolved.
5. Implement a fix that passes the tests, and then repeat the process until all issues are resolved.
#####Capturing the Essence of a Defect
The main idea is that if you remove all the extraneous code that is unrelated to
the issue, it will be easier to see what is really going on. As you continue to investigate
an issue, you may discover that you can reduce the example more and more based on
what you learn.
Most bugs aren’t going to show up in the first place you look. Instead, they’ll often be
hidden farther down the chain, stashed away in some low-level helper method or in
some other code that your feature depends on.
Whenever you are hunting for bugs, the practice of reducing your area of interest first
will help you avoid dead ends and limit the number of possible places in which you’ll
need to look for problems. Before doing any formal investigation, it’s a good idea to
check for obvious problems so that you can get a sense of where the real source of your
defect is. Some bugs are harder to catch on sight than others, but there is no need to
overthink the easy ones.
The main benefit of an automated test is that it will explode when your code fails to
act as expected. It is important to keep in mind that even if you have an existing test
suite, when you encounter a bug that does not cause any failures, you need to update
your tests. This helps prevent regressions, allowing you to fix a bug once and forget
about it.
Once we write a test that reproduces our problem, the way we fix it is to get our tests
passing again. If other tests end up breaking in order to get our new test to pass, we
know that something is still wrong. If for some reason our problem isn’t solved when
we get all the tests passing again, it means that our reduced example probably didn’t
cover the entirety of the problem, so we need to go back to the drawing board in those
cases. Even still, not all is lost. Each test serves as a significant reduction of your problem
space. Every passing assertion eliminates the possibility of that particular issue from
being the root of your problem. Sooner or later, there won’t be any place left for your
bugs to hide.
#####Scrutinizing Your Code
#####Utilizing Reflection
We can infer a lot about an object by using Ruby’s reflective capabilities:
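For example (a few standard reflection calls; the exact output depends on your Ruby version):

```ruby
obj = "some string"
obj.class                #=> String
obj.instance_variables   #=> []
obj.methods.size         #=> 170 or so, depending on the Ruby version
obj.respond_to?(:upcase) #=> true
obj.is_a?(Comparable)    #=> true
```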
The whole situation here would be better if we had easier-to-read inspect
output. There is actually a standard library called pp that improves the formatting of
inspect while operating in a very similar fashion.
The output of Kernel#p can be improved on an object-by-object basis.
This may be obvious if you have used Object#inspect before, but it is also a severely
underused feature of Ruby.
The customized output is way easier to read.
To accomplish this, here is a template that allows you to pass
in a couple of arrays of symbols that point at instance variables:
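A rough sketch of such a template (the module and argument names are assumptions, not Prawn's exact code):

```ruby
module InspectTemplate
  # fulls: ivars whose entire contents should be displayed
  # refs:  ivars that should only be shown as class references
  def __inspect_template(fulls, refs)
    full_vars = fulls.map { |v| "#{v}=#{instance_variable_get(v).inspect}" }
    ref_vars  = refs.map  { |v| "#{v}=#<#{instance_variable_get(v).class}>" }
    "#<#{self.class} #{(full_vars + ref_vars).join(', ')}>"
  end
end
```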
After mixing this into Prawn::Document , I need only to specify which variables I want
to display the entire contents of, and which I want to just show as references. Then, it
is as easy as calling __inspect_template with these values
Once we provide a customized inspect method that returns a string, both Kernel#p and
irb will pick up on it, yielding the nice results shown earlier.
The yaml data serialization standard library has the nice side effect of producing highly
readable representations of Ruby objects. Because of this, it actually provides a
Kernel#y method that can be used as a stand-in replacement for p . Although this may
be a bit strange, if you look at it in action, you’ll see that it has some benefits:
YAML automatically truncates repeated object references by referring
to them by ID only. This turns out to be especially good for tracking down a certain
kind of Ruby bug:
Here, it’s easy to see that the six subarrays that make up our main array are actually
just six references to the same object. And in case that wasn’t the goal, we can see the
difference when we have six distinct objects very clearly in YAML:
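A small sketch of the difference (the exact YAML formatting may vary by version):

```ruby
require "yaml"

same     = Array.new(6, [0, 0, 0])      # six references to ONE array
distinct = Array.new(6) { [0, 0, 0] }   # six distinct arrays

puts same.to_yaml      # the repeated reference shows up as an anchor (&) plus aliases (*)
puts distinct.to_yaml  # every subarray is written out in full
```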
#####Finding Needles in a Haystack
If you have a big collection of objects some of which may be corrupted you can use
the following code to identify the corrupted records and decide what to do based on that:
```ruby
data.select.with_index do |e, i|
  begin
    Integer(e[:payment]) > 1000
  rescue ArgumentError
    p [e, i]
    raise # optionally comment this line to identify all corrupted records
  end
end
```

```
[{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942", :payment=>"1991.25"}, 91]
ArgumentError: invalid value for Integer: "1991.25"
	from (irb):67:in 'Integer'
	from (irb):67:in 'block in irb_binding'
	from (irb):65:in 'select'
	from (irb):65:in 'with_index'
	from (irb):65
	from /Users/sandal/lib/ruby19_1/bin/irb:12:in '<main>'
```
#####Working with Logger
I’ll show you how to replicate a bit of
functionality that is especially common in Ruby’s web frameworks: comprehensive
error logging.
To demonstrate this, we’ll be walking through a TCPServer that
does simple arithmetic operations in prefix notation. We’ll start by taking a look at it
without any logging or error-handling support:
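A rough sketch of such a server (not the book's exact code): it reads prefix-notation requests such as "+ 2 3" and writes back the result, with no error handling at all.

```ruby
require "socket"

server = TCPServer.new(9090)

loop do
  client = server.accept
  while line = client.gets
    op, *numbers = line.split
    numbers.map! { |n| Integer(n) }   # blows up on malformed input
    client.puts numbers.inject(op.to_sym)
  end
  client.close
end
```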
We can use the following fairly generic client to interact with the server, which is similar
to the one we used in Chapter 2, Designing Beautiful APIs:
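A stand-in for that client (again a sketch, opening a fresh connection per message; the host and port are assumptions):

```ruby
require "socket"

def send_message(msg, host = "localhost", port = 9090)
  socket   = TCPSocket.new(host, port)
  socket.puts(msg)
  response = socket.gets
  socket.close
  response
end

p send_message("+ 2 3")
p send_message("* 4 5")
p send_message("? oops")   # the malformed third message
p send_message("- 10 1")
```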
When we send the erroneous third message, the server never responds, resulting in a
nil response. But when we try to send a fourth message, which would ordinarily be
valid, we see that our connection was refused. If we take a look server-side, we see that
a single uncaught exception caused it to crash immediately:
Ten years ago, a book on best practices for any given programming language would
seem perfectly complete without a chapter on multilingualization (m17n) and locali-
zation (L10n). In 2009, the story is just a little bit different.
From: https://blog.mozilla.org/l10n/2011/12/14/i18n-vs-l10n-whats-the-diff/
* Internationalization (i18n).
* Localization (l10n).
* Globalization (g11n).
* Localizability (l12y).
W3C said it best when they wrote the following:
__“Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).”__
In other words, i18n allows applications to support and satisfy the needs of multiple locales, thus “enabling” l10n.
From: http://www.w3.org/International/questions/qa-i18n.en
What do the terms 'internationalization' and 'localization' mean, and how are they related?
Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).
**Localization(l10n)**
Localization is sometimes written as l10n, where 10 is the number of letters between l and n.
Often thought of only as a synonym for translation of the user interface and documentation, localization is often a substantially more complex issue. It can entail customization related to:
Numeric, date and time formats
Use of currency
Keyboard usage
Collation and sorting
Symbols, icons and colors
Text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation or viewed as insensitive.
Varying legal requirements
and many more things.
Localization may even necessitate a comprehensive rethinking of logic, visual design, or presentation if the way of doing business (eg., accounting) or the accepted paradigm for learning (eg., focus on individual vs. group) in a given locale differs substantially from the originating culture.
**Internationalization(i18n)**
Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
Internationalization is often written i18n, where 18 is the number of letters between i and n in the English word.
Internationalization typically entails:

* Designing and developing in a way that removes barriers to localization or international deployment. This includes such things as enabling the use of Unicode, or ensuring the proper handling of legacy character encodings where appropriate, taking care over the concatenation of strings, avoiding dependence in code on user-interface string values, etc.
* Providing support for features that may not be used until localization occurs. For example, adding markup in your DTD to support bidirectional text, or for identifying language. Or adding to CSS support for vertical text or other non-Latin typographic features.
* Enabling code to support local, regional, language, or culturally related preferences. Typically this involves incorporating predefined localization data and features derived from existing libraries or user preferences. Examples include date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, handling of personal names and forms of address, etc.
* Separating localizable elements from source code or content, such that localized alternatives can be loaded or selected based on the user's international preferences as needed.
Notice that these items do not necessarily include the localization of the content, application, or product into another language; they are design and development practices which allow such a migration to take place easily in the future but which may have significant utility even if no localization ever takes place.
**Multilingualization (m17n)**
The act of adapting or localizing something to, into, or for multiple languages.
Although some may argue that it took too long to materialize, Ruby 1.9 provides a
robust and elegant solution to the m17n problem. Rather than binding its users to a
particular internal encoding and requiring complex manual manipulation of text into
that format, Ruby 1.9 provides facilities that make it easy to transcode text from one
encoding to another. This system is well integrated so that things like pattern matching
and I/O operations can be carried out in all of the encodings Ruby supports, which
provides a great deal of flexibility for those who need to do encoding-specific operations.
Of course, because painless transcoding is possible, you can also write code that
accepts and produces text in a wide variety of encodings, but uses a single encoding
throughout its internals, improving the consistency and simplicity of the underlying
implementation.
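As a tiny illustration of how painless transcoding is (the strings here are just examples), String#encode does the conversion directly:

```ruby
# coding: UTF-8

latin1 = "Résumé".encode("ISO-8859-1")  # transcode a UTF-8 literal to Latin-1
latin1.encoding                          #=> #<Encoding:ISO-8859-1>

utf8 = latin1.encode("UTF-8")            # ...and back again
utf8 == "Résumé"                         #=> true
```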
Once you are comfortable with how to store, manipulate, and produce
internationalized text in various character encodings, you may want to know about how to customize
your software so that its interface is adapted to whatever the native language and
dialect of its users might be. Although multilingualization and localization requirements
don’t necessarily come in pairs, they often do.
#####m17n by Example: A Look at Ruby's CSV Standard Library
When it comes to m17n, the place to look is the CSV library.
CSV manages to parse data that is in any of the character encodings that Ruby supports without transcoding the source text.
In addition to encoding regular expressions, because CSV accepts user-entered values
that modify its core parser, it needs to escape them. Although the built-in
Regexp.escape() method works with most of the encodings Ruby supports, at the time
of the Ruby 1.9.1 release, it had some issues with a handful of them. To work around
this, CSV rolls its own escape method:
```ruby
# This method is an encoding safe version of Regexp.escape(). It will escape
# any characters that would change the meaning of a regular expression in the
# encoding of +str+. Regular expression characters that cannot be transcoded
# to the target encoding will be skipped and no escaping will be performed if
# a backslash cannot be transcoded.
#
def escape_re(str)
  str.chars.map { |c| @re_chars.include?(c) ? @re_esc + c : c }.join
end
```
This means that once things like the column separator, row separator, and quote
character have been specified by the user and converted into the specified encoding, this
code can check to see whether the transcoded characters need to be escaped.
@re_chars is set in the CSV constructor as simply a list of
regular expression reserved characters transcoded to the specified @encoding:
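The constructor code itself isn't reproduced in these notes, but it boils down to something like the following (an approximation for illustration, not the library's exact source):

```ruby
# Inside CSV#initialize: characters that are special in a regular expression,
# each transcoded into the encoding the parser operates in (@encoding).
@re_esc   = "\\".encode(@encoding)
@re_chars = ["\\", ".", "[", "]", "-", "^", "$", "?",
             "*", "+", "{", "}", "(", ")", "|", "#"].map { |c| c.encode(@encoding) }
```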
By working with strings and regular expressions indirectly through encoding helpers, we can be sure that any pattern matching or text manipulation gets done in a compatible way. By
translating the parser rather than the source data, we incur a fixed cost rather than one
that varies in relation to the size of the data source. For a need like CSV processing,
this is very important, as the format is often used for large data dumps.
#####Portable m17n Through UTF-8 Transcoding
Although it’s nice to be able to support each character encoding natively, it can be quite
difficult to maintain a complex system that works that way. The easy way out is to
standardize on a single, fairly universal character encoding to write your code against.
Then, all that remains to be done is to transcode any string that comes in, and possibly
transcode again on the way out. The character set of choice for use in code that needs
to be portable from one system to another is UTF-8. UTF-8 is capable of representing the myriad character sets that make up Unicode, which means it can represent nearly any glyph you might imagine in any other character encoding.
As a variable-length character encoding, it does this fairly efficiently, so that users who
do not need extra bytes to represent large character sets do not incur a significant
memory penalty.
#####Source Encodings
A key aspect of any m17n-capable Ruby project is to properly set the source encoding
of each of its files. This is done via a # coding: UTF-8 comment at the top of the file.
In order for Ruby to pick it up, this comment must be the first line in the file, unless a shebang is present (in which case, the magic comment can appear on the second line), such as in the following example:
```ruby
#!/usr/bin/env ruby
# coding: UTF-8
```
However, in all other situations, nothing else should come before it. Although Ruby is very strict about where you place the comment, it’s fairly loose about the way you write it. Case does not matter as long as it’s in the form of coding: some_encoding, and extra text may appear before or after it. This is used primarily for editor support, allowing things such as Emacs-style strings:
# -*- coding: utf-8 -*-
The purpose of these magic comments is to tell Ruby what encoding your regex and string literals are
in. Forgetting to explicitly set the source encoding in this manner can cause all sorts of
nasty problems, as it will force Ruby to fall back to US-ASCII, breaking virtually all
internationalized text.
Once you set the source encoding to UTF-8 in all your files, if your editor is producing
UTF-8 output, you can be sure of the encoding of your literals. That’s the first step.
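For instance, with the magic comment in place, non-ASCII literals pick up the source encoding (values shown assume a UTF-8 source file):

```ruby
# coding: UTF-8

"résumé".encoding  #=> #<Encoding:UTF-8>
/café/.encoding    #=> #<Encoding:UTF-8>
```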
#####Working with Files
By default, Ruby uses your locale settings to determine the default external character
encoding for files. You can check what yours is set to by running this code:
ruby -e"p Encoding.default_external"
If your locale information isn’t set, Ruby assumes that there is no suitable default encoding, reverting to ASCII-8BIT to interpret external files as sequences of untranslated bytes.
The actual value your default_external is set to doesn’t really matter when you’re
developing code that needs to run on systems that you do not control. Because most
libraries fall under this category, it means that you simply cannot rely on File.open()
or File.read() to work without explicitly specifying an encoding.
This means that if you want to open a file that is in Latin-1 (ISO-8859-1), but process
it within your UTF-8-based library, you need to write code something like this:
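The example itself is missing from these notes; based on the description that follows, it amounts to something like this (the filename is just an example):

```ruby
# Read a Latin-1 file, transcoding its contents to UTF-8 as they are read.
data = File.read("foo.txt", encoding: "ISO-8859-1:UTF-8")
data.encoding #=> #<Encoding:UTF-8>
```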
Here, we’ve indicated that the file we are reading is encoded in ISO-8859-1, but that
we want to transcode it to UTF-8 immediately so that the string we end up with in our
program is already converted for us.
Writing back to file works in a similar fashion. Here’s what it looks like to automatically
transcode text back to Latin-1 from a UTF-8 source string:
File.open("foo.txt","w:ISO-8859-1:UTF-8"){ |f| f << data + "Some extra text"}
In a UTF-8-based library, you will need to
supply an encoding string of the form external_format:UTF-8 whenever you’re working
with text files. Of course, if the external format happens to be UTF-8, you would just
write something like this:
```ruby
data = File.read("foo.txt", encoding: "UTF-8")
File.open("foo.txt", "w:UTF-8") { |f| f << data + "Some extra text" }
```
The underlying point here is that if you want to work with files in a portable way, you
need to be explicit about their character encodings. Without doing this, you cannot be
sure that your code will work consistently from machine to machine. Also, if you want
to make it so all of the internals of your system operate in a single encoding, you need
to explicitly make sure the loaded files get translated to UTF-8 before you process the
text in them. If you take care of these two things, you can mostly forget about the details,
as all of the actual work will end up getting done on UTF-8 strings.
If you are handling binary files, you should use File.binread(), like this:
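The snippet is absent here; a minimal example (filename assumed) is simply:

```ruby
bytes = File.binread("image.png")
bytes.encoding #=> #<Encoding:ASCII-8BIT>
```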
For more complex needs, or for when you need to write a binary file, Ruby 1.9 has also
changed the meaning of "rb" and "wb" in File.open(). Rather than simply disabling
line-ending conversion, using these file modes will now set the external encoding to
ASCII-8BIT by default.
Unless you’re working with binaries, be sure to explicitly specify the external encoding of your files, and transcode them to UTF-8 upon read or write. If you are working with binaries,
be sure to use File.binread() or File.open() with the proper flags to make sure that
your text is not accidentally encoded into the character set specified by your locale.
This one can produce subtle bugs that you might not encounter until you run your code
on another machine, so it's important to avoid it in the first place.
#####Transcoding User Input in an Organized Fashion
It turns out that in practice, you don’t really need to worry about transcoding whenever
you are comparing user input to a finite set of possible ASCII values.
If you can be sure that you never manipulate or compare a string, transcoding can be safely ignored in most cases. In cases in which you do manipulation or comparison, if the input strings will consist of nothing more than ASCII characters in all cases, you do not need to
transcode them. All other strings need to be transcoded to UTF-8 within your library
unless you expect users to do the conversions themselves.
You can clean up your code significantly
by identifying the points where encodings matter in your code. Oftentimes, there will
be a handful of low-level functions that are at the core of your system, and they are the
places where transcoding needs to be done.
Roughly, the process of building a UTF-8 based system goes like this:

* Be sure to set the source encoding of every file in your project to UTF-8 using magic comments.
* Use the external:internal encoding string when opening any I/O stream, specifying the internal encoding as UTF-8. This will automatically transcode files to UTF-8 upon read, and automatically transcode from UTF-8 to the external encoding on write.
* Make sure to either use File.binread() or include the "b" flag when dealing with binary files. Otherwise, your files may be incorrectly interpreted based on your locale, rather than being treated as a stream of unencoded bytes.
* When dealing with user-entered strings, only transcode those that need to be manipulated or compared to non-ASCII strings. All others can be left in their native encoding as long as they consist of ASCII characters only or they are not manipulated by your code.
* Do not rely on default_external or default_internal, and be sure to set your source encodings properly. This ensures that your code will not depend on environmental conditions to run.
If you need to do a ton of text processing on user-entered strings that may use many
different character mappings, it might not be a great idea to use this approach.
#####Inferring Encodings from Locale
* The LANG environment variable that specifies your system locale is used by Ruby to determine the default external encoding of files. A properly set locale can allow Ruby to automatically load files in their native encodings without explicitly stating what character mapping they use.
* Although magic comments are typically required in files to set the source encoding, an exception is made for ruby -e -based command-line scripts. The source encoding for these one-liners is determined by locale. In most cases, this is what you will want.
* You can specify a default internal encoding that Ruby will automatically transcode loaded files into when no explicit internal encoding is specified. It is often reasonable to set this to match the source encoding in your scripts.
* You can set default external/internal encodings via the command-line switch -Eexternal:internal if you do not want to explicitly set them in your scripts (see the example after this list).
* The -Ku flag still works for putting Ruby into "UTF-8" mode, which is useful for backward compatibility with Ruby 1.8.
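For example (the script name is just a placeholder), the two switches mentioned above look like this on the command line:

```
# Treat external files as Latin-1, transcoding them to UTF-8 internally
ruby -E ISO-8859-1:UTF-8 script.rb

# Legacy switch that puts Ruby into "UTF-8" mode, as in 1.8
ruby -Ku script.rb
```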
All of the techniques described in this section are suitable mostly for scripts or
private use code. It is a bad idea to rely on locale data or manually set external and
internal encodings in complex systems or code that needs to run on machines you
do not have control over.
#####m17n-Safe Low-Level Text Processing
The underlying theme of working with low-level text operations in an
m17n-safe way is that characters are not necessarily equivalent to bytes.
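The example referred to below did not make it into these notes; it was along these lines (filename assumed):

```ruby
# Print a file's contents in chunks of five bytes at a time.
File.open("data.txt", "r:UTF-8") do |file|
  puts file.read(5) until file.eof?
end
```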
The purpose of the previous example is to print out the contents of the file in chunks
of five bytes, which, when it comes to ASCII, means five characters. However, multibyte
character encodings, especially variable-length ones such as UTF-8, cannot be
processed using this approach. The reason is fairly simple.
Imagine this code running against a two-character, six-byte string in UTF-8 such as
“ 吴佳 ”. If we read five bytes of this string, we end up breaking the second character’s
byte sequence, resulting in the mangled string “ 吴\xE4\xBD ”.
Many times, the reason why we read data in chunks is not to process it at the byte level, but instead, to break it up into small parts as we work on it.
A good solution to this problem can be found within the CSV standard library:
```ruby
def read_to_char(bytes)
  return "" if @io.eof?
  data = @io.read(bytes)
  begin
    encoded = encode_str(data)
    raise unless encoded.valid_encoding?
    return encoded
  rescue # encoding error or my invalid data raise
    if @io.eof? or data.size >= bytes + 10
      return data
    else
      data += @io.read(1)
      retry
    end
  end
end
```
If we walk through this step by step, we see that an
empty string is returned if the stream is already at eof? . Assuming that it is not, a
specified number of bytes is read.
Then, the string is encoded, and it is checked to see whether the character mapping is
valid.
When the encoding is valid, read_to_char returns the chunk, assuming that the string
was broken up properly. Otherwise, it raises an error, causing the rescue block to be
executed. Here, we see that the core fix relies on buffering the data slightly to try to
read a complete character. What actually happens here is that the method gets retried
repeatedly, adding one extra byte to the data until it either reaches a total of 10 bytes
over the specified chunk size, or hits the end of the file.
The reason why this works is that every encoding Ruby supports has a character size
of less than 10 bytes.
One other thing to remember about low-level string operations when it comes to m17n:
in order to keep your code character-mapping-agnostic, you'll want to use String#ord instead
of String#unpack.
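For example (assuming a UTF-8 source file):

```ruby
# coding: UTF-8

"吴".ord           #=> 21556            (the character's codepoint, whatever its byte length)
"吴".unpack("C*")  #=> [229, 144, 180]  (raw bytes, which differ from encoding to encoding)
```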
#####Localizing Your Code
Localization (l10n) is a way to mark the relevant sections of text with meaningful tags that can then be altered by external translation files.
What we can do is come up with unique identifiers for each text segment in our application, and
then create translation files that fill in the appropriate values depending on what language is selected.
An important aspect of localizing your code is that you might want to do it as late as
possible so that your business logic is not affected by translations.
The first step in localizing an application is identifying the unique text segments
that need to be translated.
* A generalized L10n system provides a way to keep all locale-specific content in translation files rather than tied up in the display code of your application.
* Every string that gets displayed to the user must be passed through a translation filter so that it can be customized based on the specified language. In Gibberish::Simple, we use T() for this; other systems may vary (see the sketch after this list).
* Translation should be done at as late a stage as possible, so that L10n-related modifications to text do not interfere with the core business logic of your program.
* In many cases, you cannot simply interpolate strings in a predetermined order. Gibberish::Simple offers a simple templating mechanism that allows each translation file to specify how substrings should be interpolated into a text segment. If you roll your own system, be sure to keep this in consideration.
* Creating helper functions to simplify your translation code can come in handy when generating dynamic text output. For an example of this, go back and look at how weapon_name() was used in the simple Sinatra example discussed here.
* Because adding individual localization tags can be a bit tedious, it's often a good idea to wait until you have a fully fleshed-out application before integrating a general L10n system, if it is possible to do so.
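Gibberish::Simple's real implementation is more involved; the following is only a hypothetical, roll-your-own illustration of the lookup-plus-templating idea described above (all names here are made up):

```ruby
# A deliberately tiny translation filter: look the key up in the current
# locale's table, then interpolate the named values into the template.
TRANSLATIONS = {
  :en => { :greeting => "Hello, %{name}!" },
  :es => { :greeting => "¡Hola, %{name}!" }
}

def T(key, locale = :en, values = {})
  template = TRANSLATIONS.fetch(locale, {})[key] || key.to_s
  template % values
end

T(:greeting, :es, :name => "Jia")  #=> "¡Hola, Jia!"
```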
###Exploring a Well-Organized Ruby Project (Haml)
We can pretend we have no idea what it actually does, and seek to discover a bit about
its details by exploring the source code itself.
After grabbing the source, we can start by looking for a file called README or something
similar. We find one called README.rdoc, which gives us a nice description of
why we might want to use Haml right at the top of the file. The rest of the file fills in other useful details, including how to install the library, some
usage examples, a list of the executables it ships with, and some information on the
authors. It also has a specific line that says: "To use Haml and Sass programmatically, check out the RDocs for the Haml and Sass modules." This indicates that the project has autogenerated API documentation.
Noticing the project also has a Rakefile, we can check to see whether there is a task for generating the documentation.
We could read directly through the source now to see which functions are most
important, but tests often provide a better road map to where the interesting parts are and to how the code is meant to be used.
The rake install task installs Haml as a gem from the current sources. It simply executes a shell command, reading the current version from a file called VERSION.
Using this approach, the Rakefile is kept independent of a particular version number,
allowing a single place for updating version numbers. All of these tricks are done in the
name of simplifying maintainability, making it easy to generate and install the library
from source for testing.
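The Rakefile itself isn't quoted in these notes; the task described works out to roughly this (a sketch, not Haml's exact Rakefile):

```ruby
# Rakefile sketch: install the freshly built gem, reading the current version
# from the VERSION file so it never has to be hard-coded here.
task :install do
  version = File.read("VERSION").strip
  sh "gem install pkg/haml-#{version}.gem"
end
```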
###Conventions to Know About
#####What goes in a Readme
A good README should include everything that is necessary to begin working with a project, and nothing more.
You’ll need a brief one or two-paragraph description of what the project is for, and what problems it is meant to solve.
Next, it is generally a good idea to point out a couple of the core classes that make up
the public API of your project.
Because sometimes raw API documentation isn’t enough to get people started, it’s often
a good idea to include a brief synopsis of your project’s capabilities through a few simple
examples.
If your install instructions are simple, you can just embed them in your README file
directly. However, if your project has several install methods, and
optional dependencies that enable certain features, you can create an INSTALL file and reference it in the README.
Finally, once you’ve told users what your project is, where to look for documentation,
how it looks in brief, and how to get it installed, you’ll want to let them know how to
contact you in case something goes wrong. If you’re working on a bigger project, this might be the right place to link
to a mailing list or bug tracker.
#####Laying Out Your Library
Library files are generally kept in a lib/ directory. Generally speaking, this directory
should only have one file in it, and one subdirectory. For Haml,
the structure is lib/haml.rb and lib/haml/. For HighLine, it is lib/highline.rb and lib/highline/.
The Ruby file in your lib/ dir should bear the name of your project and act as a
jumping-off point for loading dependencies as well as any necessary support libraries. The top
of lib/highline.rb provides a good example of this:
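The listing is missing from these notes; the top of the file reads roughly like this (the exact require list may vary between HighLine versions):

```ruby
# lib/highline.rb -- the single entry point into the library.

# Standard library dependencies first...
require "erb"
require "optparse"
require "stringio"
require "abbrev"

# ...then the library's own pieces, each living under lib/highline/.
require "highline/compatibility"
require "highline/system_extensions"
require "highline/question"
require "highline/menu"
require "highline/color_scheme"

class HighLine
  # ...
end
```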
With a file structure as indicated by the comments in the example code, and the
necessary require statements in place, we end up being able to do this:
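The example is missing here, but the payoff is that a single require loads the whole public API (the ask/say calls below are just a usage illustration):

```ruby
require "highline"

# lib/highline.rb has already pulled in menus, questions, and friends for us.
cli  = HighLine.new
name = cli.ask("What is your name? ")
cli.say("Hello, #{name}!")
```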
Although this is much more important in large systems than small ones, it is a good
habit to get into. Essentially, unless there is a good reason to deviate, files will often
map to class names in Ruby. Nested classes that are large enough to deserve their own
file should be loaded in the file that defines the class they are nested within. Using this
approach allows the user a single entry point into your library, but also allows for
running parts of the system in isolation.
Filenames do not necessarily need to be representative of a class at all, so you can deviate from this standard if needed.
In the more general case, you might have files that contain extensions to provide
backward compatibility with Ruby 1.8, or ones that make minor changes to core Ruby
classes. Decent names for these are lib/myproject/compatibility.rb and lib/myproject/
extensions.rb, respectively. When things get complicated, you can of course nest these
and work on individual classes one at a time.
However you choose to organize your files, one thing is fairly well agreed upon: if you
intend to modify core Ruby in any way, you should do it in files that are well marked
as extension files, to help people hunt down changes that might conflict with other
packages.
#####Executables
Scripts and applications are usually placed in a bin/ dir in Ruby projects. These are
typically ordinary Ruby scripts that have been made executable via something like a
combination of a shebang line and a chmod +x call. To make these appear more like
ordinary command-line utilities, it is common to omit the file extension. As an example,
we can take a look at the haml executable:
```ruby
#!/usr/bin/env ruby
# The command line Haml parser.

$LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'

require 'haml'
require 'haml/exec'

opts = Haml::Exec::Haml.new(ARGV)
opts.parse!
```
You can see that the executable starts with a shebang line that indicates how to find
the Ruby interpreter. This is followed by a line that adds the library to the load path by
relative positioning. Finally, the remaining code simply requires the necessary library
files and then delegates to an object that is responsible for handling command-line
requests. Ideally speaking, most of your scripts in bin/ will follow a similar approach.