A summary of the book "The Ruby Programming Language" by David Flanagan and Yukihiro "Matz" Matsumoto
CONTENT | |
Expressions, Statements and Control Structures
Equality
Assignments
The ||= idiom | |
Other assignments | |
Flip-flops | |
Iterators | |
Blocks | |
Control-flow keywords | |
Return | |
Break, Next and Redo | |
Throw and Catch | |
Raise and rescue | |
Methods, Procs, Lambdas and Closures | |
Methods | |
Arguments | |
Return values | |
Method Lookup | |
Procs and Lambdas | |
Closures | |
Method objects | |
Classes and Modules | |
Method visibility and Inheritance | |
Arrayifying, Hash access, and Equality (Duck typing)
Structs | |
Class methods | |
Clone and Dup | |
Modules | |
Namespaces | |
Mixins | |
Load and Require | |
Loadpaths | |
Eigenclass | |
Other useful stuff | |
Threads | |
Tracing | |
Eval | |
Monkey patching | |
DSLs
Ruby I/O | |
Networking | |
Expressions and Statements | |
Many programming languages distinguish between low-level expressions and higher-level statements,
such as conditionals and loops. In these languages, statements control the flow of a program, but | |
they do not have values. They are executed, rather than evaluated. In Ruby, there is no clear | |
distinction between statements and expressions; everything in Ruby, including class and method | |
definitions, can be evaluated as an expression and will return a value. The fact that if | |
statements return a value means that, for example, the multiway conditional shown previously can | |
be elegantly rewritten as follows: | |
name = if x == 1 then "one" | |
elsif x == 2 then "two" | |
elsif x == 3 then "three" | |
elsif x == 4 then "four" | |
else "many" | |
end | |
Instead of writing: | |
if expression then code end | |
we can simply write: | |
code if expression | |
When used in this form, if is known as a statement (or expression) modifier. To use if as a | |
modifier, it must follow the modified statement or expression immediately, with no intervening | |
line break. | |
y = x.invert if x.respond_to? :invert | |
y = (x.invert if x.respond_to? :invert) | |
In the first line, the if modifier applies to the whole assignment: if x does not have a method
named invert, then nothing happens at all, and the value of y is not modified. In the second line,
the if modifier applies only to the method call. If x does not have an invert method, then the
modified expression evaluates to nil, and this is the value that is assigned to y. Note that an
expression modified with an if clause is itself an expression that can be modified. It is
therefore possible to attach multiple if modifiers to an expression:
# Output message if message exists and the output method is defined | |
puts message if message if defined? puts | |
Stacked modifiers like this are hard to read and should be avoided. A single modifier with a
compound condition is clearer:
puts message if message and defined? puts | |
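The difference between the two forms of the modified assignment can be checked directly. The following is a minimal sketch; modifier_demo is a hypothetical helper name introduced only for illustration.

```ruby
# Hypothetical helper: returns the value of y after each form of assignment.
def modifier_demo(x)
  y = :unchanged
  y = x.invert if x.respond_to? :invert      # modifier guards the whole assignment
  after_first = y
  y = (x.invert if x.respond_to? :invert)    # modifier guards only the method call
  [after_first, y]
end

# A String has no invert method, so the first form leaves y alone
# and the second form assigns nil.
p modifier_demo("no invert here")   # prints [:unchanged, nil]
```

When x does respond to invert (a Hash does), both forms assign the inverted value.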
Equality | |
The equal? method is defined by Object to test whether two values refer to exactly the same | |
object. For any two distinct objects, this method always returns false: | |
a = "Ruby" # One reference to one String object | |
b = c = "Ruby" # Two references to another String object | |
a.equal?(b) # false: a and b are different objects | |
b.equal?(c) # true: b and c refer to the same object | |
By convention, subclasses never override the equal? method. The == operator is the most common | |
way to test for equality. In the Object class, it is simply a synonym for equal?, and it tests | |
whether two object references are identical. Most classes redefine this operator to allow distinct | |
instances to be tested for equality. | |
a = "Ruby" # One String object | |
b = "Ruby" # A different String object with the same content | |
a.equal?(b) # false: a and b do not refer to the same object | |
a == b # true: but these two distinct objects have equal values | |
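As a sketch of how a class might redefine == for its own instances, consider the hypothetical Point class below. It is not from the book's text above; it only illustrates the convention that == compares values while equal? compares identity.

```ruby
# Hypothetical value class used only to illustrate == versus equal?.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # Value equality: same class and same coordinates.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

p1 = Point.new(1, 2)
p2 = Point.new(1, 2)
p1.equal?(p2)   # false: two distinct objects
p1 == p2        # true: their coordinates are equal
```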
Assignments | |
The ||= idiom | |
You might use this line: | |
results ||= [] | |
Think about this for a moment. It expands to: | |
results = results || [] | |
The righthand side of this assignment evaluates to the value of results, unless that is nil or | |
false. In that case, it evaluates to a new, empty array. This means that the abbreviated | |
assignment shown here leaves results unchanged, unless it is nil or false, in which case it | |
assigns a new array. | |
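A short sketch of the idiom in use follows. append_result is a hypothetical method name; the point is that only nil and false trigger the assignment, so values such as 0 are left alone.

```ruby
# Hypothetical helper: appends item to results, creating the array on demand.
def append_result(results, item)
  results ||= []      # keeps an existing array, replaces nil or false
  results << item
end

append_result(nil, 1)    # => [1]
append_result([1], 2)    # => [1, 2]

count = 0
count ||= 10             # 0 is not nil or false, so count is still 0
```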
Other assignments | |
x = 1, 2, 3 # x = [1,2,3] | |
x, = 1, 2, 3 # x = 1; other values are discarded | |
x, y, z = [1, 2, 3] # Same as x,y,z = 1,2,3 | |
x, y, z = 1, 2 # x=1; y=2; z=nil | |
x, y, z = 1, *[2,3] # Same as x,y,z = 1,2,3 | |
x,*y = 1, 2, 3 # x=1; y=[2,3] | |
x = y = z = 0 # Assign zero to variables x, y, and z | |
x,(y,z) = a, b | |
This is effectively two assignments executed at the same time: | |
x = a | |
y,z = b | |
To make the difference clearer:
x,y,z = 1,[2,3] # No parens: x=1;y=[2,3];z=nil | |
x,(y,z) = 1,[2,3] # Parens: x=1;y=2;z=3 | |
Case is an alternative to if/elsif/else. The last expression evaluated in the case expression
becomes the return value of the case statement. === is the case equality operator. For many
classes, such as the Fixnum class used earlier (merged into Integer in Ruby 2.4), the === operator
behaves just the same as ==. But certain classes define this operator in interesting ways. Here
is an example of using a range in a case:
# Compute 2006 U.S. income tax using case and Range objects | |
tax = case income | |
when 0..7550 | |
income * 0.1 | |
when 7550..30650 | |
755 + (income-7550)*0.15 | |
when 30650..74200 | |
4220 + (income-30650)*0.25
when 74200..154800
15107.5 + (income-74200)*0.28
when 154800..336550 | |
37675.5 + (income-154800)*0.33 | |
else | |
97653 + (income-336550)*0.35 | |
end | |
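The tax example above works because Range defines === as a membership test. As a hedged sketch of a few other classes that define === in useful ways, the hypothetical describe method below matches on a class, a regular expression, and a range.

```ruby
# Hypothetical method: each when clause calls === on its own object.
def describe(value)
  case value
  when Integer    then "integer"        # Class#=== tests is_a?
  when /\A\d+\z/  then "digit string"   # Regexp#=== tests for a match
  when nil        then "nothing"        # plain equality
  when 0.0..1.0   then "unit interval"  # Range#=== tests membership
  else                 "something else"
  end
end

describe(42)      # => "integer"
describe("123")   # => "digit string"
describe(0.5)     # => "unit interval"
```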
Flip-flops | |
When the .. and ... operators are used in a conditional, such as an if statement, or in a loop, | |
such as a while loop, they do not create Range objects. Instead, they create a special kind of | |
Boolean expression called a flip-flop. A flip-flop expression evaluates to true or false, just as | |
comparison and equality expressions do. Consider the flip-flop in the following code. Note that | |
the first .. in the code creates a Range object. The second one creates the flip-flop | |
expression: | |
(1..10).each {|x| print x if x==3..x==5 } | |
The flip-flop consists of two Boolean expressions joined with the .. operator, in the context of a
conditional or loop. A flip-flop expression is false unless and until the lefthand expression
evaluates to true. Once that expression has become true, the expression “flips” into a persistent
true state. It remains true until the righthand expression evaluates to true, at which point it
“flops” back to false. In the example above, x==3 flips the expression on and x==5 flops it off,
so the code prints "345". The following simple Ruby program demonstrates a flip-flop. It reads a
text file line-by-line and prints any line that contains the text “TODO”. It then continues
printing lines until it reads a blank line:
ARGF.each do |line| # For each line of standard in or of named files | |
print line if line=~/TODO/..line=~/^$/ # Print lines when flip-flop is true | |
end | |
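The two-dot and three-dot forms differ in one detail: with .. the righthand expression is tested on the same evaluation that flips the expression to true, while with ... it is not tested until the next evaluation. A minimal sketch, using hypothetical method names:

```ruby
# Two dots: x >= 3 is checked immediately when x == 3 flips the state,
# so the flip-flop flops back at once.
def two_dot_flip_flop
  picked = []
  (1..10).each { |x| picked << x if (x == 3)..(x >= 3) }
  picked
end

# Three dots: the righthand test waits for the next evaluation.
def three_dot_flip_flop
  picked = []
  (1..10).each { |x| picked << x if (x == 3)...(x >= 3) }
  picked
end

two_dot_flip_flop     # => [3]
three_dot_flip_flop   # => [3, 4]
```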
Iterators | |
The defining feature of an iterator method is that it invokes a block of code associated with the | |
method invocation. You do this with the yield statement. The following method is a trivial | |
iterator that just invokes its block twice:
def twice | |
yield | |
yield | |
end | |
Other examples: | |
squares = [1,2,3].collect {|x| x*x} # => [1,4,9] | |
evens = (1..10).select {|x| x%2 == 0} # => [2,4,6,8,10] | |
odds = (1..10).reject {|x| x%2 == 0} # => [1,3,5,7,9] | |
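The inject method is a little more complicated than the others. It invokes the associated block
with two arguments. The first argument is an accumulated value of some sort from previous
iterations; the second is the next element of the enumerable object. The return value of the
block becomes the accumulated value for the next iteration.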
data = [2, 5, 3, 4] | |
sum = data.inject {|sum, x| sum + x } # => 14 (2+5+3+4) | |
The initial value of the accumulator variable is either the argument to inject, if there is one,
or the first element of the enumerable object, as the two examples below show:
floatprod = data.inject(1.0) {|p,x| p*x } # => 120.0 (1.0*2*5*3*4) | |
max = data.inject {|m,x| m>x ? m : x } # => 5 (largest element) | |
If a method is invoked without a block, it is an error for that method to yield, because there is | |
nothing to yield to. Sometimes you want to write a method that yields to a block if one is | |
provided but takes some default action (other than raising an error) if invoked with no block. To | |
do this, use block_given? to determine whether there is a block associated with the invocation. | |
Example: | |
def sequence(n, m, c) | |
i, s = 0, [] # Initialize variables | |
while(i < n) # Loop n times | |
y = m*i + c # Compute value | |
yield y if block_given? # Yield, if block | |
s << y # Store the value | |
i += 1 | |
end | |
s # Return the array of values | |
end | |
Normally, enumerators with next methods are created from Enumerable objects that have an each | |
method. If, for some reason, you define a class that provides a next method for external | |
iteration instead of an each method for internal iteration, you can easily implement each in | |
terms of next. In fact, turning an externally iterable class that implements next into an | |
Enumerable class is as simple as mixing in a module. | |
module Iterable | |
include Enumerable # Define iterators on top of each | |
def each # And define each on top of next | |
loop { yield self.next } | |
end | |
end | |
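To see the module at work, here is a sketch with a hypothetical Countdown class that provides only next. The module is repeated so the example is self-contained. It assumes that next signals exhaustion by raising StopIteration, which is what lets loop terminate.

```ruby
module Iterable
  include Enumerable           # Define iterators on top of each
  def each                     # And define each on top of next
    loop { yield self.next }   # loop exits quietly on StopIteration
  end
end

# Hypothetical externally iterable class: it only knows how to produce
# its next value.
class Countdown
  include Iterable

  def initialize(from)
    @n = from
  end

  def next
    raise StopIteration if @n.zero?
    @n -= 1
    @n + 1
  end
end

Countdown.new(3).to_a               # => [3, 2, 1]
Countdown.new(4).select(&:even?)    # => [4, 2]
```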
The “gang of four” define and contrast internal and external iterators quite clearly in their | |
design patterns book: | |
"A fundamental issue is deciding which party controls the iteration, the iterator or the client | |
that uses the iterator. When the client controls the iteration, the iterator is called an external | |
iterator, and when the iterator controls it, the iterator is an internal iterator. Clients that use | |
an external iterator must advance the traversal and request the next element explicitly from the | |
iterator. In contrast, the client hands an internal iterator an operation to perform, and the | |
iterator applies that operation to every element...." | |
In Ruby, iterator methods like each are internal iterators; they control the iteration and “push” | |
values to the block of code associated with the method invocation. Enumerators have an each method | |
for internal iteration, but in Ruby 1.9 and later, they also work as external iterators—client code | |
can sequentially “pull” values from an enumerator with next. | |
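A small sketch of external iteration follows. pull_all is a hypothetical helper; it relies on the fact that next raises StopIteration when the enumerator is exhausted and that Kernel#loop rescues that exception.

```ruby
# Hypothetical helper: pulls every value out of an enumerable with next.
def pull_all(enumerable)
  e = enumerable.to_enum   # obtain an external iterator
  pulled = []
  loop do                  # exits when e.next raises StopIteration
    pulled << e.next
  end
  pulled
end

pull_all(1..3)   # => [1, 2, 3]

e = [10, 20].each   # each without a block returns an Enumerator
e.next              # => 10
e.next              # => 20
# A third call to e.next would raise StopIteration.
```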
Suppose you have two Enumerable collections and need to iterate their elements in pairs: the first | |
elements of each collection, then the second elements, and so on. Without an external iterator, you | |
must convert one of the collections to an array (with the to_a method defined by Enumerable ) so | |
that you can access its elements while iterating the other collection with each. The code below
shows three different ways to iterate over several such collections:
# Call the each method of each collection in turn. | |
# This is not a parallel iteration and does not require enumerators. | |
def sequence(*enumerables, &block) | |
enumerables.each do |enumerable| | |
enumerable.each(&block) | |
end | |
end | |
# Iterate the specified collections, interleaving their elements. | |
# This can't be done efficiently without external iterators. | |
# Note the use of the uncommon else clause in begin/rescue. | |
def interleave(*enumerables)
  # Convert to an array of enumerators
  enumerators = enumerables.map {|e| e.to_enum }
  # Loop until we don't have any enumerators
  until enumerators.empty?
    begin
      e = enumerators.shift   # Take the first enumerator
      yield e.next            # Get its next and pass to the block
    rescue StopIteration      # If no more elements, do nothing
    else                      # If no exception occurred
      enumerators << e        # Put the enumerator back
    end
  end
end
# Iterate the specified collections, yielding | |
# tuples of values, one value from each of the | |
# collections. See also Enumerable.zip. | |
def bundle(*enumerables) | |
enumerators = enumerables.map {|e| e.to_enum } | |
loop { yield enumerators.map {|e| e.next} } | |
end | |
# Examples of how these iterator methods work | |
a,b,c = [1,2,3], 4..6, 'a'..'e' | |
sequence(a,b,c) {|x| print x} # prints "123456abcde" | |
interleave(a,b,c) {|x| print x} # prints "14a25b36cde" | |
bundle(a,b,c) {|x| print x} # '[1, 4, "a"][2, 5, "b"][3, 6, "c"]' | |
In general, Ruby’s core collection of classes iterate over live objects rather than private copies | |
or “snapshots” of those objects, and they make no attempt to detect or prevent concurrent | |
modification to the collection while it is being iterated. | |
a = [1,2,3,4,5]
a.each {|x| puts "#{x},#{a.shift}" } # prints "1,1\n3,2\n5,3"
Blocks | |
Blocks may not stand alone; they are only legal following a method invocation. You can, however, | |
place a block after any method invocation; if the method is not an iterator and never invokes the | |
block with yield, the block will be silently ignored. Blocks are delimited with curly braces or | |
with do and end keywords. | |
Consider the Array#sort method. If you associate a block with an invocation of this method, it will
yield pairs of elements to the block, and it is the block’s job to compare them. The block’s return
value (–1, 0, or 1) indicates the ordering of the two arguments. The “return value” of the block is | |
available to the iterator method as the value of the yield statement. | |
# The block takes two words and "returns" their relative order. | |
words.sort! {|x,y| y <=> x } | |
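As a further sketch, the block can compare any property of the two elements. The hypothetical longest_first method below orders words by descending length; the sample words are chosen with distinct lengths so the result does not depend on how ties are ordered.

```ruby
# Hypothetical helper: the block's -1/0/1 result tells sort how to order x and y.
def longest_first(words)
  words.sort { |x, y| y.length <=> x.length }
end

longest_first(%w[pear fig banana])   # => ["banana", "pear", "fig"]
```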
Blocks define a new variable scope: variables created within a block exist only within that block
and are undefined outside of the block. Be cautious, however; the local variables in a method are
available to any blocks within that method. In Ruby 1.8, a block parameter that shares its name
with an existing local variable assigns to that variable each time the block is invoked. Ruby 1.9
is different: block parameters are always local to their block, and invocations of the block never
assign values to existing variables. Ruby 1.9 is different in another important way, too. Block
syntax has been extended to allow you to declare block-local variables that are guaranteed to be
local, even if a variable by the same name already exists in the enclosing scope. To do this,
follow the list of block parameters with a semicolon and a comma-separated list of block-local
variables. Here is an example:
x = y = 0              # local variables
1.upto(4) do |x;y|     # x and y are local to block
                       # x and y "shadow" the outer variables
  y = x + 1            # Use y as a scratch var
  puts y*y             # Prints 4, 9, 16, 25
end
[x,y]                  # => [0,0]: block does not alter these
In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a | |
block-local variable. It does not receive any value from a yield invocation, but it has the value | |
nil until the block actually assigns some other value to it. Blocks can have more than one parameter
and more than one local variable, of course. Here is a block with two parameters and three local | |
variables: | |
hash.each {|key,value; i,j,k| ... } | |
In Ruby 1.8, only the last block parameter may have an * prefix. Ruby 1.9 lifts this restriction and | |
allows any one block parameter, regardless of its position in the list, to have an * prefix: | |
def five; yield 1,2,3,4,5; end # Yield 5 values | |
# Extra values go into body array | |
five do |head, *body, tail| | |
print head, body, tail # Prints "1[2, 3, 4]5"
end | |
Control-flow keywords | |
Return | |
return may optionally be followed by an expression, or a comma-separated list of expressions. If | |
there is no expression, then the return value of the method is nil. If there is one expression, | |
then the value of that expression becomes the return value of the method. If there is more than one | |
expression after the return keyword, then the return value of the method is an array containing the | |
values of those expressions. | |
Most Ruby programmers omit return when it is not necessary. Instead of writing return x as the last | |
line of a method, they would simply write x. The return value in this case is the value of the last | |
expression in the method. return is useful if you want to return from a method prematurely, or if | |
you want to return more than one value. | |
def double(x) | |
return x, x.dup | |
end | |
When the return statement is used in a block, it does not just cause the block to return. And it | |
does not just cause the iterator that invokes the block to return. return always causes the | |
enclosing method to return, just like it is supposed to, since a block is not a method. | |
def find(array, target)
  array.each_with_index do |element,index|
    return index if (element == target) # return index from find, not from block
  end
  nil # If we didn't find the element
end
Break, Next and Redo | |
Like the return keyword, break and next (continue in Java) can be used alone or followed by an
expression or a comma-separated list of expressions. We have already seen what return does in a
block. When next is used with a value in a block, that value becomes the value of the yield that
invoked the block. When break is used with a value, that value becomes the return value of the
iterator method that the block was passed to.
squareroots = data.collect do |x| | |
next 0 if x < 0 # 0 for negative values | |
Math.sqrt(x) | |
end | |
As with the return statement, it is not often necessary to explicitly use next to specify a value. | |
squareroots = data.collect do |x| | |
if (x < 0) | |
then 0 | |
else | |
Math.sqrt(x) | |
end | |
end | |
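The following sketch contrasts where the values of next and break end up. Both method names are hypothetical. Note that each returns its receiver when the block never breaks, which is why first_negative has to check for that case.

```ruby
# next hands a value back to the iterator (here, collect) for this element.
def safe_roots(data)
  data.collect do |x|
    next 0 if x < 0
    Math.sqrt(x)
  end
end

# break ends the iteration and becomes the value of the each call itself.
def first_negative(data)
  found = data.each { |x| break x if x < 0 }
  found.equal?(data) ? nil : found   # each returned data: nothing was found
end

safe_roots([4, -9, 16])       # => [2.0, 0, 4.0]
first_negative([4, -9, 16])   # => -9
first_negative([1, 2])        # => nil
```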
The redo statement restarts the current iteration of a loop or iterator. This is not the same | |
thing as next. next transfers control to the end of a loop or block so that the next iteration | |
can begin, whereas redo transfers control back to the top of the loop or block so that the | |
iteration can start over. | |
i = 0 | |
while(i < 3) # Prints "0123" instead of "012" | |
print i # Control returns here when redo is executed | |
i += 1 | |
redo if i == 3 | |
end | |
The redo statement is not commonly used. One use, however, is to recover from input errors when
prompting a user for input:
puts "Please enter the first word you think of" | |
words = %w(apple banana cherry) | |
responses = words.collect do |word| # Control returns here when redo is executed
print word + "> " # Prompt the user | |
response = gets.chop # Get a response | |
if response.size == 0 | |
word.upcase! # Emphasize the prompt | |
redo # And skip to the top of the block | |
end | |
response # Return the response | |
end | |
The retry statement is normally used in a rescue clause to re-execute a block of code that raised | |
an exception. | |
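A minimal sketch of retry follows, assuming a hypothetical with_retries helper that re-runs its block a limited number of times. Limiting the attempts matters, because an unconditional retry would loop forever if the block keeps failing.

```ruby
# Hypothetical helper: runs the block, retrying on RuntimeError
# until max_attempts is reached.
def with_retries(max_attempts)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue RuntimeError
    retry if attempts < max_attempts   # re-execute the begin body
    raise                              # give up: re-raise the last error
  end
end

# The block fails twice and succeeds on the third attempt.
with_retries(3) { |n| raise "flaky" if n < 3; "ok on attempt #{n}" }
# => "ok on attempt 3"
```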
Throw and catch | |
throw and catch are Kernel methods that define a control structure that can be thought of as a | |
multilevel break. throw doesn’t just break out of the current loop or block but can actually | |
transfer out any number of levels, causing the block defined with a catch to exit. If you are | |
familiar with languages like Java and JavaScript, then you probably recognize throw and catch as | |
the keywords those languages use for raising and handling exceptions. | |
Ruby does exceptions differently, using raise and rescue, which we’ll learn about later. But the | |
parallel to exceptions is intentional. Calling throw is very much like raising an exception. And | |
the way a throw propagates out through the lexical scope and then up the call stack is very much | |
the same as the way an exception propagates out and up. Despite the similarity to exceptions, it | |
is best to consider throw and catch as a general-purpose (if perhaps infrequently used) control | |
structure rather than an exception mechanism. Here is an example: | |
for matrix in data do # Process a deeply nested data structure. | |
catch :missing_data do # Label this statement so we can break out. | |
for row in matrix do | |
for value in row do | |
throw :missing_data unless value # Break out of two loops at once. | |
# Otherwise, do some actual data processing here. | |
end | |
end | |
end | |
# We end up here after the nested loops finish processing each matrix. | |
# We also get here if :missing_data is thrown. | |
end | |
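If no catch call matches the symbol passed to throw, then an exception is raised: NameError in
Ruby 1.8, ArgumentError in Ruby 1.9, and UncaughtThrowError (a subclass of ArgumentError) in
Ruby 2.2 and later.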
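throw also accepts an optional second argument, which becomes the return value of the matching catch. The sketch below uses a hypothetical find_coordinates method to break out of two nested iterators and return a result in one step.

```ruby
# Hypothetical helper: returns [row, column] of target, or nil if absent.
def find_coordinates(matrix, target)
  catch(:found) do
    matrix.each_with_index do |row, r|
      row.each_with_index do |value, c|
        throw :found, [r, c] if value == target   # exits both loops
      end
    end
    nil   # value of the catch block when nothing is thrown
  end
end

find_coordinates([[1, 2], [3, 4]], 3)   # => [1, 0]
find_coordinates([[1, 2], [3, 4]], 9)   # => nil
```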
Raise and rescue | |
An exception is an object that represents some kind of exceptional condition; it indicates that | |
something has gone wrong. Raising an exception transfers the flow of control to exception
handling code. The Exception class defines two methods that return details about the exception. The
message method returns a string that may provide human-readable details about what went wrong. | |
The other important method of exception objects is backtrace. This method returns an array of | |
strings that represents the call stack at the point that the exception was raised. Each element of | |
the array is a string of the form: | |
filename : linenumber in methodname | |
If you are defining a module of Ruby code, it is often appropriate to define your own subclass of | |
StandardError for exceptions that are specific to your module. This may be a trivial, one-line | |
subclass: | |
class MyError < StandardError; end | |
def factorial(n) # Define a factorial method with argument n | |
raise MyError, "bad argument" if n < 1 # Raise an exception for bad n | |
return 1 if n == 1 # factorial(1) is 1 | |
n * factorial(n-1) # Compute other factorials recursively | |
end | |
Without an explicit exception class, raise creates a RuntimeError by default:
raise "A default runtime error"
Most commonly, a rescue clause is attached to a begin statement. The begin statement exists simply | |
to delimit the block of code within which exceptions are to be handled. A begin statement with a | |
rescue clause looks like this: | |
begin | |
# Any number of Ruby statements go here. | |
# Usually, they are executed without exceptions and | |
# execution continues after the end statement. | |
rescue | |
# This is the rescue clause; exception-handling code goes here. | |
# If an exception is raised by the code above, or propagates up | |
# from one of the methods called above, then execution jumps here. | |
end | |
An example using this: | |
begin # Handle exceptions in this block | |
x = factorial(-1) # Note illegal argument | |
rescue => ex # Store exception in variable ex | |
puts "#{ex.class}: #{ex.message}" # Handle exception by printing message | |
end # End the begin/rescue block | |
If you want to handle only specific types of exceptions, you must include one or more exception
classes in the rescue clause. | |
rescue ArgumentError => ex | |
# or to handle more errors | |
rescue ArgumentError, TypeError => error | |
If you want to handle each error individually you could: | |
begin | |
x = factorial(1) | |
rescue ArgumentError => ex | |
puts "Try again with a value >= 1" | |
rescue TypeError => ex | |
puts "Try again with an integer" | |
rescue Exception => ex | |
puts "No idea what happened" # Use rescue Exception as the last rescue clause. | |
end | |
A begin statement may include an else clause after its rescue clauses. You might guess that the | |
else clause is a catch-all rescue: that it handles any exception that does not match a previous | |
rescue clause. This is not what else is for. The code in an else clause is executed if the code | |
in the body of the begin statement runs to completion without exceptions. Putting code in an else | |
clause is a lot like simply tacking it on to the end of the begin clause. The only difference is | |
that when you use an else clause, any exceptions raised by that clause are not handled by the | |
rescue statements. | |
A begin statement may have one final clause. The optional ensure clause, if it appears, must come | |
after all rescue and else clauses. It may also be used by itself without any rescue or else | |
clauses. The ensure clause contains code that always runs, no matter what happens with the code | |
following begin: | |
def method_name(x) | |
# The body of the method goes here. | |
# Usually, the method body runs to completion without exceptions | |
# and returns to its caller normally. | |
rescue | |
# Exception-handling code goes here. | |
# If an exception is raised within the body of the method, or if | |
# one of the methods it calls raises an exception, then control | |
# jumps to this block. | |
else | |
# If no exceptions occur in the body of the method | |
# then the code in this clause is executed. | |
ensure | |
# The code in this clause is executed no matter what happens in the | |
# body of the method. It is run if the method runs to completion, if | |
# it throws an exception, or if it executes a return statement. | |
end | |
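The order in which the clauses run can be observed with a small sketch. run_clauses is a hypothetical method that records which clauses execute with and without an exception.

```ruby
# Hypothetical helper: logs each clause that runs.
def run_clauses(should_fail)
  log = []
  begin
    log << :body
    raise ArgumentError, "boom" if should_fail
  rescue ArgumentError
    log << :rescue    # runs only when the body raised
  else
    log << :else      # runs only when the body did not raise
  ensure
    log << :ensure    # always runs
  end
  log
end

run_clauses(false)   # => [:body, :else, :ensure]
run_clauses(true)    # => [:body, :rescue, :ensure]
```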
Fibers | |
Fibers are reminiscent of threads but do not execute in parallel. They are more like subroutines
(coroutines) that can hand execution back to their caller and later be resumed where they left
off. Fibers are mostly used for implementing generators. Here is an example:
f = Fiber.new { # Line 1: Create a new fiber | |
puts "Fiber says Hello" # Line 2: | |
Fiber.yield # Line 3: goto line 9 | |
puts "Fiber says Goodbye" # Line 4: | |
} # Line 5: goto line 11 | |
# Line 6: | |
puts "Caller says Hello" # Line 7: | |
f.resume # Line 8: goto line 2 | |
puts "Caller says Goodbye" # Line 9: | |
f.resume # Line 10: goto line 4 | |
The code produces the following output: | |
Caller says Hello | |
Fiber says Hello | |
Caller says Goodbye | |
Fiber says Goodbye | |
Fibers and their callers can exchange data through the arguments and return values of resume and | |
yield. | |
f = Fiber.new do |message| | |
puts "Caller said: #{message}" | |
message2 = Fiber.yield("Hello") # "Hello" returned by first resume | |
puts "Caller said: #{message2}" | |
"Fine" # "Fine" returned by second resume | |
end | |
response = f.resume("Hello") # "Hello" passed to block | |
puts "Fiber said: #{response}" | |
response2 = f.resume("How are you?") # "How are you?" returned by Fiber.yield | |
puts "Fiber said: #{response2}" | |
The caller passes two messages to the fiber, and the fiber returns two responses to the caller. | |
It prints: | |
Caller said: Hello | |
Fiber said: Hello | |
Caller said: How are you? | |
Fiber said: Fine | |
But fibers are more likely to be used as generators. Here is an example of a generator for a
Fibonacci sequence:
# Return a Fiber to compute Fibonacci numbers | |
def fibonacci_generator(x0,y0) # Base the sequence on x0,y0 | |
Fiber.new do | |
x,y = x0, y0 # Initialize x and y | |
loop do # This fiber runs forever | |
Fiber.yield y # Yield the next number in the sequence | |
x,y = y,x+y # Update x and y | |
end | |
end | |
end | |
g = fibonacci_generator(0,1) # Create a generator | |
10.times { print g.resume, " " } # And use it | |
The code above prints the first 10 Fibonacci numbers: | |
1 1 2 3 5 8 13 21 34 55 | |
The fiber library (require 'fiber') enables additional, more powerful fiber features. However,
you should avoid using these additional features wherever possible, because:
• They are not supported by all implementations. JRuby, for example, cannot support them on | |
current Java VMs. | |
• They are so powerful that misusing them can crash the Ruby VM. | |
Methods, Procs, Lambdas and Closures | |
Many languages distinguish between functions, which have no associated object, and methods, | |
which are invoked on a receiver object. Because Ruby is a purely object-oriented language, all
methods are true methods and are associated with at least one object. Methods defined outside of
any class look like global functions with no associated object. In fact, Ruby implicitly defines
and invokes
them as private methods of the Object class. | |
Ruby’s methods are not objects in the way that strings, numbers, and arrays are. It is possible, | |
however, to obtain a Method object that represents a given method, and we can invoke methods | |
indirectly through Method objects. Blocks, like methods, are not objects that Ruby can manipulate. | |
But it’s possible to create an object that represents a block, and this is actually done with some | |
frequency in Ruby programs. A Proc object represents a block. Like a Method object, we can execute | |
the code of a block through the Proc that represents it. There are two varieties of Proc objects, | |
called procs and lambdas, which have slightly different behavior. Both procs and lambdas are | |
functions rather than methods invoked on an object. An important feature of procs and lambdas is | |
that they are closures: they retain access to the local variables that were in scope when they were | |
defined, even when the proc or lambda is invoked from a different scope. | |
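The closure behavior described above can be sketched with two lambdas that share a local variable. make_counter is a hypothetical method; the variable count stays alive after the method returns because the lambdas still refer to it.

```ruby
# Hypothetical helper: returns two lambdas that close over the same variable.
def make_counter
  count = 0
  increment = lambda { count += 1 }
  current   = lambda { count }
  [increment, current]
end

increment, current = make_counter
increment.call
increment.call
current.call   # => 2: both lambdas see the same count
```

Each call to make_counter creates a fresh count, so separate counters do not interfere with each other.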
Methods | |
A def statement that defines a method may include exception-handling code in the form of rescue, | |
else, and ensure clauses, just as a begin statement can. It is also possible, however, to use the def
statement to define a method on a single specified object. Math.sin and File.delete are actually | |
singleton methods. | |
o = "message" # A string is an object | |
def o.printme # Define a singleton method for this object | |
puts self | |
end | |
o.printme # Invoke the singleton | |
Method names may (but are not required to) end with an equals sign, a question mark, or an exclamation | |
point. An equals sign suffix signifies that the method is a setter that can be invoked using assignment | |
syntax. Any method whose name ends with a question mark returns a value that answers the question posed | |
by the method invocation. A method whose name ends with an exclamation mark should be used with
caution; this suffix is often used for mutator methods. The language has a keyword alias that serves
to define a new name for an existing method. Use it like this: | |
alias aka also_known_as # alias new_name existing_name | |
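The naming conventions and alias can be seen together in a short sketch. The Thermostat class and its method names are hypothetical and exist only to show one method of each kind.

```ruby
# Hypothetical class illustrating =, ? and ! method suffixes plus alias.
class Thermostat
  attr_reader :target

  def initialize(target)
    @target = target
  end

  def target=(value)      # setter: invoked as thermostat.target = 21
    @target = value
  end

  def heating?(room)      # predicate: answers a yes/no question
    room < @target
  end

  def reset!              # mutator: changes the receiver in place
    @target = 18
    self
  end

  alias setpoint target   # alias new_name existing_name
end

t = Thermostat.new(20)
t.target = 22             # calls target=(22)
t.setpoint                # => 22, same method as target
t.heating?(19)            # => true
```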
Arguments
When you define a method, you can specify default values for some or all of the parameters. If you
do this, then your method may be invoked with fewer argument values than the declared number of
parameters.
def prefix(s, len=1) | |
s[0,len] | |
end | |
prefix("Ruby", 3) # => "Rub" | |
prefix("Ruby") # => "R" | |
Sometimes we want to write methods that can accept an arbitrary number of arguments. To do this, we put | |
an * before one of the method’s parameters. | |
def max(first, *rest) | |
max = first | |
rest.each {|x| max = x if x > max } | |
max | |
end | |
max(1) # first=1, rest=[] | |
max(1,2) # first=1, rest=[2] | |
max(1,2,3) # first=1, rest=[2,3] | |
data = [3, 2, 1] | |
m = max(*data) # first = 3, rest=[2,1] => 3 | |
m = max(data) # first = [3,2,1], rest=[] => [3,2,1] | |
Recall that a block is a chunk of Ruby code associated with a method invocation, and that an
iterator is a method that expects a block. Any method invocation may be followed by a block, and any | |
method that has a block associated with it may invoke the code in that block with the yield statement. | |
If you prefer more explicit control over a block (so that you can pass it on to some other method, for | |
example), add a final argument to your method, and prefix the argument name with an ampersand. If you
do this, then that argument will refer to the block—if any—that is passed to the method. | |
def sequence3(n, m, c, &b) # Explicit argument to get block as a Proc | |
i = 0 | |
while(i < n) | |
b.call(i*m + c) # Invoke the Proc with its call method | |
i += 1 | |
end | |
end | |
# Note that the block is still passed outside of the parentheses | |
sequence3(5, 2, 2) {|x| puts x } | |
You could also explicitly pass a Proc object like this: | |
def sequence4(n, m, c, b) # No ampersand used for argument b | |
i = 0 | |
while(i < n) | |
b.call(i*m + c) # Proc is called explicitly | |
i += 1 | |
end | |
end | |
p = Proc.new {|x| puts x } # Explicitly create a Proc object | |
sequence4(5, 2, 2, p) # And pass it as an ordinary argument | |
When & is used before a Proc object in a method invocation, the Proc is treated as if it were an
ordinary block following the invocation.
a, b = [1,2,3], [4,5] # Start with some data.
summation = Proc.new {|total,x| total+x } # A Proc object for summations.
sum = a.inject(0, &summation) # => 6. Sum elements of a.
sum = b.inject(sum, &summation) # => 15. Add the elements of b in.
More about Procs follows in the Procs and Lambdas section below.
Return values | |
Ruby methods may return more than one value. To do this, use an explicit return statement, and | |
separate the values to be returned with commas: | |
# Convert the Cartesian point (x,y) to polar (magnitude, angle) coordinates | |
def polar(x,y) | |
return Math.hypot(y,x), Math.atan2(y,x) | |
end | |
Instead of using the return statement with multiple values, we can simply create an array of | |
values ourselves: | |
# Convert polar coordinates to Cartesian coordinates | |
def cartesian(magnitude, angle) | |
[magnitude*Math.cos(angle), magnitude*Math.sin(angle)] | |
end | |
Methods of this form are typically intended for use with parallel assignment: | |
distance, theta = polar(x,y) | |
x,y = cartesian(distance,theta) | |
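As a small check of what "more than one value" means in practice, the sketch below repeats the polar method from above and inspects its result. The sample inputs 3.0 and 4.0 are chosen only for illustration.

```ruby
# Same definition as in the text: return with a comma-separated list.
def polar(x, y)
  return Math.hypot(y, x), Math.atan2(y, x)
end

result = polar(3.0, 4.0)
result.class              # => Array: the values are collected into one array
distance, theta = result  # Parallel assignment unpacks that array
distance                  # => 5.0
```

Seen this way, returning several values and returning an array literal are the same thing from the caller's point of view.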
Method Lookup | |
When Ruby evaluates a method invocation expression, it must first figure out which method is to be invoked. | |
For the method invocation expression o.m: | |
1. First, it checks the eigenclass of o for singleton methods named m. | |
2. If no method m is found in the eigenclass, Ruby searches the class of o for an instance | |
method named m. | |
3. If no method m is found in the class, Ruby searches the instance methods of any | |
modules included by the class of o. If that class includes more than one module, | |
then they are searched in the reverse of the order in which they were included. That | |
is, the most recently included module is searched first. | |
4. If no instance method m is found in the class of o or in its modules, then the search | |
moves up the inheritance hierarchy to the superclass. Steps 2 and 3 are repeated | |
for each class in the inheritance hierarchy until each ancestor class and its included | |
modules have been searched. | |
5. If no method named m is found after completing the search, then a method named | |
method_missing is invoked instead. In order to find an appropriate definition of this | |
method, the name resolution algorithm starts over at step 1. | |
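The lookup order in the steps above can be observed with Module#ancestors. The class and module names below (Animal, Duck, Walker, Swimmer) are hypothetical and exist only for this sketch.

```ruby
module Walker;  def move; "walk"; end; end
module Swimmer; def move; "swim"; end; end

class Animal
  def move; "crawl"; end
  def name; "animal"; end
end

class Duck < Animal
  include Walker
  include Swimmer      # Included last, so searched first among the modules
end

duck = Duck.new
Duck.ancestors.first(4)   # => [Duck, Swimmer, Walker, Animal]
before = duck.move        # => "swim": found in the most recently included module
inherited = duck.name     # => "animal": found in the superclass

def duck.move; "fly"; end # A singleton method lives in the eigenclass
after = duck.move         # => "fly": the eigenclass is searched before Duck
```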
When method_missing is invoked, the first argument is a symbol that names the method that could not be found. | |
This symbol is followed by all the arguments that were to be passed to the original method. If there is a block | |
associated with the method invocation, that block is passed to method_missing as well. Defining your own | |
method_missing method for a class allows you an opportunity to handle any kind of invocation on instances of the | |
class. The method_missing hook is one of the most powerful of Ruby’s dynamic capabilities, and one of the most | |
commonly used metaprogramming techniques. | |
class Hash | |
# Allow hash values to be queried and set as if they were attributes. | |
# We simulate attribute getters and setters for any key. | |
def method_missing(key, *args) | |
text = key.to_s | |
if text[-1,1] == "=" # If key ends with = set a value | |
self[text.chop.to_sym] = args[0] # Strip = from key | |
else # Otherwise... | |
self[key] # ...just return the key value | |
end | |
end | |
end | |
h = {} # Create an empty hash object | |
h.one = 1 # Same as h[:one] = 1 | |
puts h.one # Prints 1. Same as puts h[:one] | |
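Reopening Hash as above affects every hash in the program. A more contained variant, sketched here under the assumption of a hypothetical Settings class, keeps the same idea in its own class, falls back to super for unknown names, and also defines respond_to_missing? so that respond_to? stays consistent with method_missing.

```ruby
# Hypothetical class for illustration; not part of the original summary.
class Settings
  def initialize
    @data = {}
  end

  def method_missing(name, *args)
    text = name.to_s
    if text.end_with?("=")                      # Setter: strip "=" and store
      @data[text.chomp("=").to_sym] = args[0]
    elsif @data.key?(name)                      # Getter for a known key
      @data[name]
    else
      super                                     # Unknown name: normal NoMethodError
    end
  end

  # Keep respond_to? in step with what method_missing handles.
  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @data.key?(name) || super
  end
end

s = Settings.new
s.color = "red"
s.color                 # => "red"
s.respond_to?(:color)   # => true
s.respond_to?(:size)    # => false
```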
Procs and Lambdas | |
Blocks are syntactic structures in Ruby; they are not objects, and cannot be manipulated as objects. | |
It is possible, however, to create an object that represents a block. Depending on how the object is | |
created, it is called a proc or a lambda. | |
We’ve already seen one way to create a Proc object: by associating a block with a method that is
defined with an ampersand-prefixed block argument. There is nothing preventing such a method from | |
returning the Proc object for use outside the method: | |
def makeproc(&p) | |
p # Return the Proc object | |
end | |
adder = makeproc {|x,y| x+y } | |
All Proc objects have a call method that, when invoked, runs the code contained by the block from | |
which the proc was created. | |
sum = adder.call(2,2) # => 4 | |
This example is of course just for explanation; the method makeproc is not necessary in practice,
because Ruby already provides methods with this functionality. Proc.new expects no arguments and returns a
Proc object that is a proc (not a lambda). In Ruby 1.9 and later, the Kernel method proc is a synonym for
Proc.new (in Ruby 1.8 it was a synonym for lambda).
p = Proc.new {|x,y| x+y } | |
p = proc {|x,y| x+y } | |
Another technique for creating Proc objects is with the lambda method. lambda is a method of the | |
Kernel module. | |
The difference between a proc and a lambda is small, but significant in some scenarios. A proc is the
object form of a block, and it behaves like a block. A lambda has slightly modified behavior and | |
behaves more like a method than a block. Calling a proc is like yielding to a block, whereas calling | |
a lambda is like invoking a method. Recall that the return statement in a block does not just return | |
from the block to the invoking iterator, it returns from the method that invoked the iterator. A | |
return statement in a lambda returns from the lambda itself, not from the method that surrounds the | |
creation site of the lambda: | |
def test | |
puts "entering method" | |
p = lambda { puts "entering lambda"; return } | |
p.call # Invoking the lambda does not make the method return | |
puts "exiting method" # This line *is* executed now | |
end | |
The fact that return in a lambda only returns from the lambda itself means that we never have to worry | |
about LocalJumpError. | |
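The contrast can be seen side by side. The method names in this sketch are hypothetical; it shows a return inside a proc ending the enclosing method, a return inside a lambda ending only the lambda, and the LocalJumpError that results when a proc tries to return from a method that has already finished.

```ruby
def proc_return
  pr = Proc.new { return :from_proc }   # return exits proc_return itself
  pr.call
  :after_proc                           # Never reached
end

def lambda_return
  la = lambda { return :from_lambda }   # return exits only the lambda
  la.call
  :after_lambda                         # Reached normally
end

def make_returning_proc
  Proc.new { return :too_late }         # The method returns before the proc runs
end

proc_return     # => :from_proc
lambda_return   # => :after_lambda

outcome = begin
  make_returning_proc.call              # Nothing left to return from
rescue LocalJumpError
  :local_jump_error
end
outcome         # => :local_jump_error
```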
Invoking a block with yield is similar to, but not the same as, invoking a method. There are | |
differences in the way argument values in the invocation are assigned to the argument variables declared | |
in the block or method. | |
p = Proc.new {|x,y| print x,y } | |
p.call(1) # x,y=1: nil used for missing rvalue: Prints 1nil | |
p.call(1,2) # x,y=1,2: 2 lvalues, 2 rvalues: Prints 12 | |
p.call(1,2,3) # x,y=1,2,3: extra rvalue discarded: Prints 12 | |
p.call([1,2]) # x,y=[1,2]: array automatically unpacked: Prints 12 | |
l = lambda {|x,y| print x,y } | |
l.call(1,2) # This works | |
l.call(1) # Wrong number of arguments | |
l.call(1,2,3) # Wrong number of arguments | |
l.call([1,2]) # Wrong number of arguments | |
l.call(*[1,2]) # Works: explicit splat to unpack the array | |
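When it is unclear which kind of object you are holding, Proc#lambda? reports it. The sketch below uses the hypothetical variable names pr and la and returns the bound arguments instead of printing them, so the difference in argument handling is easy to inspect.

```ruby
pr = Proc.new {|x, y| [x, y] }
la = lambda   {|x, y| [x, y] }

pr.lambda?         # => false
la.lambda?         # => true

pr.call(1)         # => [1, nil]  (missing argument becomes nil)
pr.call(1, 2, 3)   # => [1, 2]    (extra argument discarded)
pr.call([1, 2])    # => [1, 2]    (single array unpacked)
la.call(1, 2)      # => [1, 2]

error = begin
  la.call([1, 2])  # One argument where two are required
rescue ArgumentError => e
  e.class
end
error              # => ArgumentError
```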
Closures | |
In Ruby, procs and lambdas are closures. When you create a proc or a lambda, the resulting Proc object | |
holds not just the executable block but also bindings for all the variables used by the block. You | |
already know that blocks can use local variables and method arguments that are defined outside the block. | |
def multiply(data, n) | |
data.collect {|x| x*n } | |
end | |
What is more interesting, and possibly even surprising, is that if the block were turned into a proc or | |
lambda, it could access n even after the method to which it is an argument had returned. | |
# Return a lambda that retains or "closes over" the argument n | |
def multiplier(n) | |
lambda {|data| data.collect{|x| x*n } } | |
end | |
doubler = multiplier(2) # Get a lambda that knows how to double | |
puts doubler.call([1,2,3]) # Prints 2,4,6 | |
It is important to understand that a closure does not just retain the value of the variables it refers | |
to—it retains the actual variables and extends their lifetime. Another way to say this is that the | |
variables used in a lambda or proc are not statically bound when the lambda or proc is created. Instead, | |
the bindings are dynamic, and the values of the variables are looked up when the lambda or proc is | |
executed. | |
# Return a pair of lambdas that share access to a local variable. | |
def accessor_pair(initialValue=nil) | |
value = initialValue # A local variable shared by the returned lambdas. | |
getter = lambda { value } # Return value of local variable. | |
setter = lambda {|x| value = x } # Change value of local variable. | |
return getter,setter # Return pair of lambdas to caller. | |
end | |
getX, setX = accessor_pair(0) # Create accessor lambdas for initial value 0. | |
puts getX[] # Prints 0. Note square brackets instead of call. | |
setX[10] # Change the value through one closure. | |
puts getX[] # Prints 10. The change is visible through the other. | |
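Any time you have a method that returns more than one closure, you should pay particular attention to
the variables they use. In the example below, Ruby 1.8 lets the block parameter x reuse the method's
local variable x, so every lambda shares one variable that ends up holding the last value. In Ruby 1.9,
block parameters are always local to the block, so each lambda gets its own x.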
def multipliers(*args) | |
x = nil | |
args.map {|x| lambda {|y| x*y }} | |
end | |
double,triple = multipliers(2,3) | |
puts double.call(2) # Prints 6 in Ruby 1.8 but 4 in Ruby 1.9 | |
puts triple.call(5) # Prints 15 in Ruby 1.9 | |
Method objects | |
The Method class is not a subclass of Proc, but it behaves much like it. Method objects are invoked with | |
the call method (or the [] operator), just as Proc objects are. The Object class defines a method named | |
method. Pass it a method name, as a string or a symbol, and it returns a Method object representing the | |
named method of the receiver. | |
m = 0.method(:succ) # A Method representing the succ method of Fixnum 0 | |
puts m.call # => 1. Same as puts 0.succ. Or use puts m[]. | |
m.name # => :succ | |
m.owner # => Fixnum (Integer in Ruby 2.4 and later)
m.receiver # => 0 | |
Method objects use method-invocation semantics, not yield semantics. Method objects, therefore, behave
more like lambdas than like procs. When a true Proc is required, you can use Method#to_proc to convert a
Method to a Proc. This is why Method objects can be prefixed with an ampersand and passed to a method in | |
place of a block. | |
def square(x); x*x; end | |
puts (1..10).map(&method(:square)) | |
One important difference between Method objects and Proc objects is that Method objects are not closures. | |
Ruby’s methods are intended to be completely self-contained, and they never have access to local variables | |
outside of their own scope. The only binding retained by a Method object, therefore, is the value of self— | |
the object on which the method is to be invoked. | |
In addition to the Method class, Ruby also defines an UnboundMethod class. As its name suggests, an | |
UnboundMethod object represents a method, without a binding to the object on which it is to be invoked. | |
In order to invoke an unbound method, you must first bind it to an object using the bind method: | |
unbound_plus = Fixnum.instance_method("+") # Creates an UnboundMethod (use Integer in Ruby 2.4 and later)
plus_2 = unbound_plus.bind(2) # Bind the method to the object 2 | |
sum = plus_2.call(2) # => 4 | |
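The same steps work with user-defined classes. This sketch assumes a hypothetical Greeter class and shows both ways of obtaining an UnboundMethod: directly from the class with instance_method, or from an existing Method object with unbind.

```ruby
# Hypothetical class for illustration.
class Greeter
  def initialize(name)
    @name = name
  end

  def hello
    "Hello, #{@name}"
  end
end

unbound = Greeter.instance_method(:hello)    # No receiver yet
unbound.class                                # => UnboundMethod
bound = unbound.bind(Greeter.new("Matz"))    # Attach a receiver
bound.call                                   # => "Hello, Matz"

m = Greeter.new("Ruby").method(:hello)       # A bound Method object
rebound = m.unbind.bind(Greeter.new("World"))
rebound.call                                 # => "Hello, World"
```

Note that bind only accepts an object that is an instance of the class the method came from (or of a subclass).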
Classes and Modules | |
Classes may extend or subclass other classes, and inherit or override the methods of their superclass. Classes | |
can also include—or inherit methods from—modules. The methods defined by a class may have “public,” “protected,” | |
or “private” visibility, which affects how and where they may be invoked. Ruby’s objects are strictly | |
encapsulated: their state can be accessed only through the methods they define. In contrast to the strict | |
encapsulation of object state, Ruby’s classes are very open. Any Ruby program can add methods to existing | |
classes, and it is even possible to add “singleton methods” to individual objects. | |
class Point | |
end | |
p = Point.new | |
p.class # => Point | |
p.is_a? Point # => true | |
In addition to defining a new class, the class keyword creates a new constant to refer to the class. The class | |
name and the constant name are the same, so all class names must begin with a capital letter. Within the body of | |
a class, but outside of any instance methods defined by the class, the self keyword refers to the class being | |
defined. | |
In Ruby, the “constructor” is implemented with an initialize method:
class Point | |
@@n = 0 # Classvariable: How many points have been created | |
def initialize(x,y) | |
@x, @y = x, y # Instancevariables (Inside of methods) | |
@@n += 1 | |
end | |
ORIGIN = Point.new(0,0) # Constant | |
def x # The accessor (or getter) method for @x | |
@x | |
end | |
def y # The accessor method for @y | |
@y | |
end | |
def x=(value) # The setter method for @x | |
@x = value | |
end | |
def y=(value) # The setter method for @y | |
@y = value | |
end | |
end | |
p = Point.new(0,0) | |
origin = Point::ORIGIN | |
Point::ORIGIN.instance_variables # => ["@y", "@x"] | |
Point.class_variables # => ["@@n"] | |
Point.constants # => ["ORIGIN"] | |
If the new method were itself written in Ruby, it would look something like this:
def new(*args) | |
o = self.allocate # Create a new object of this class | |
o.initialize(*args) # Call the object's initialize method with our args | |
o # Return new object; ignore return value of initialize | |
end | |
The combination of an instance variable with trivial getter and setter methods is so common that Ruby provides a
way to automate it. The attr_reader and attr_accessor methods are defined by the Module class, which is the
superclass of the Class class.
class Point | |
attr_accessor :x, :y # Define accessor methods for a mutable object | |
end | |
class Point | |
attr_reader :x, :y # Define reader methods for an immutable object
end | |
If you want to define a new instance method of a class or module, use define_method. This instance method of Module | |
takes the name of the new method (as a Symbol) as its first argument. | |
# Add an instance method named m to class c with body b | |
def add_method(c, m, &b) | |
c.class_eval { | |
define_method(m, &b) | |
} | |
end | |
add_method(String, :greet) { "Hello, " + self } | |
"world".greet # => "Hello, world" | |
define_method is used mostly in metaprogramming contexts, and is often preferable to method_missing, because
method_missing can make a program behave in strange ways if you don't also define related methods such as
respond_to?. Here is another example of define_method, to give a better feel for this powerful method.
class Multiplier | |
def self.create_multiplier(n) # Creates a classmethod, more about these methods later | |
define_method("times_#{n}") do |val| | |
val * n | |
end | |
end | |
create_multiplier(2) | |
create_multiplier(3) | |
end | |
m = Multiplier.new | |
puts m.times_2(3) # => 6 | |
puts m.times_3(4) # => 12 | |
The attr_reader and attr_accessor methods also define new methods for a class. Like define_method, these are private
methods of Module, and they could easily be implemented in Ruby itself. This is a metaprogramming aspect of Ruby that
lets you write code that writes code. They accept attribute names as their arguments, and dynamically create methods
with those names. If you don't want a class to be changed dynamically, you can call the freeze method on the class.
Once frozen, a class cannot be altered.
Here is another example of metaprogramming to illustrate how it works: | |
class Module | |
private # The methods that follow are both private | |
# This method works like attr_reader, but has a shorter name | |
def readonly(*syms) | |
return if syms.size == 0 # If no arguments, do nothing | |
code = "" # Start with an empty string of code | |
syms.each do |s| | |
code << "def #{s}; @#{s}; end\n" # The method definition | |
end | |
# Finally, class_eval the generated code to create instance methods. | |
class_eval code | |
end | |
# This method works like attr_accessor, but has a shorter name. | |
def readwrite(*syms) | |
return if syms.size == 0 | |
code = "" | |
syms.each do |s| | |
code << "def #{s}; @#{s} end\n" | |
code << "def #{s}=(value); @#{s} = value; end\n" | |
end | |
class_eval code | |
end | |
end | |
You might wonder how the getter knows which @x it refers to, since Ruby is strictly object-oriented and the method
body names no target object. The answer is that instance variables are always resolved against self, and a method
called without an explicit receiver is implicitly invoked on self.
In addition to being automatically invoked by Point.new, the initialize method is automatically made private. An | |
object can call initialize on itself, but you cannot explicitly call initialize on p to reinitialize its state. | |
Instance variables always begin with @, and they always “belong to” whatever object self refers to. In statically
typed languages, you must declare your variables, including instance variables. In Ruby, variables don’t need to
be declared. In fact, if you initialize an instance variable in the class body, outside of any instance method, self
refers to the class itself and not to an instance. The variable you set there is therefore a different variable from
the one with the same name used inside instance methods. Class variables, by contrast, are always evaluated in
reference to the class object created by the enclosing class definition statement. Class variables are shared by a
class and all of its subclasses. If a class A defines a variable @@a, then subclass B can use that variable. Unlike
instance variables, which each object holds separately, a class variable is truly shared: if the subclass changes
its value, the change is visible in the superclass as well.
An instance variable used inside a class definition but outside an instance method definition is a class instance
variable. Like class variables, class instance variables are associated with the class rather than with any
particular instance of the class. Because they are prefixed with @, just like ordinary instance variables, it is
very easy to confuse the two: without a distinctive punctuation prefix, it can be difficult to remember whether a
variable is associated with instances or with the class object. One of the most important advantages of class
instance variables over class variables has to do with the confusing behavior of class variables when subclassing
an existing class. If we use class instance variables instead of class variables, the only difficulty is that,
because class instance variables cannot be used from instance methods, we must move the statistics gathering code
out of the initialize method (which is an instance method):
class Point | |
@n = 0 | |
def initialize(x,y) # Initialize method | |
@x,@y = x, y # Sets initial values for instance variables | |
end | |
def self.new(x,y) # Class method to create new Point objects | |
@n += 1 | |
super # Invoke the real definition of new to create a Point | |
end | |
# other methods | |
end | |
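The difference in sharing between the two kinds of variable can be sketched with a hypothetical pair of classes, Base and Derived. The accessor names (shared, count) are also assumptions made for this illustration.

```ruby
class Base
  @@shared = 0            # Class variable: one copy for Base and all subclasses
  @count = 0              # Class instance variable: belongs to Base only

  class << self
    attr_accessor :count  # Reader/writer for the class instance variable
  end

  def self.shared;      @@shared;     end
  def self.shared=(v);  @@shared = v; end
end

class Derived < Base
  @count = 10             # A separate variable that belongs to Derived
end

Derived.shared = 5
Base.shared       # => 5: the change made through Derived is visible in Base
Derived.count     # => 10
Base.count        # => 0: Base's own @count is untouched
```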
Method visibility and Inheritance | |
class Point | |
# public methods... | |
protected | |
# protected methods... | |
private | |
# private methods... | |
end | |
or | |
class Point | |
def example_method | |
nil | |
end | |
private :example_method # now it's private
end
To extend a class:
class Point3D < Point | |
end | |
It is also perfectly reasonable to define an abstract class that invokes certain undefined “abstract” methods, | |
which are left for subclasses to define. | |
class AbstractGreeter | |
def greet | |
puts "#{greeting} #{who}" | |
end | |
end | |
# A concrete subclass | |
class WorldGreeter < AbstractGreeter | |
def greeting; "Hello"; end | |
def who; "World"; end | |
end | |
WorldGreeter.new.greet # Displays "Hello World" | |
Private methods cannot be invoked from outside the class that defines them. But they are inherited by subclasses. | |
This means that subclasses can invoke them and can override them. Sometimes when we override a method, we don’t | |
want to replace it altogether, we just want to augment its behavior by adding some new code. In order to do this, | |
we need a way to invoke the overridden method from the overriding method. This is known as chaining, and it is | |
accomplished with the keyword super. Super works like a special method invocation: it invokes a method with the | |
same name as the current one, in the superclass of the current class. | |
class Point3D < Point | |
def initialize(x,y,z) | |
super(x,y) # Pass only x and y on to Point's initialize method
@z = z
end | |
end | |
If you use super as a bare keyword, with no arguments and no parentheses, then all of the arguments that were passed
to the current method are passed to the superclass method. If the method has modified its parameter variables, the
superclass method receives the modified values. If you want to pass zero arguments to the superclass method, you must
say so explicitly with an empty pair of parentheses: super().
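The three cases can be compared directly. In this sketch, the hypothetical Parent class simply returns whatever arguments it receives, so the result of each call in Child shows what super passed along.

```ruby
# Hypothetical classes for illustration.
class Parent
  def bare(*args);     args; end
  def empty(*args);    args; end
  def modified(*args); args; end
end

class Child < Parent
  def bare(x, y)
    super              # Bare super: passes x and y
  end

  def empty(x, y)
    super()            # Empty parentheses: passes nothing
  end

  def modified(x, y)
    x = x * 10         # Changing the parameter changes what bare super passes
    super
  end
end

c = Child.new
c.bare(1, 2)       # => [1, 2]
c.empty(1, 2)      # => []
c.modified(1, 2)   # => [10, 2]
```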
Module, Class, and Object implement several callback methods, or hooks. These methods are not defined by default, | |
but if you define them for a module, class, or object, then they will be invoked when certain events occur. When a | |
new class is defined, Ruby invokes the class method inherited on the superclass of the new class, passing the new | |
class object as the argument. This allows classes to add behavior to or enforce constraints on their descendants. | |
Class methods are inherited, so the inherited method will be invoked if it is defined by any of the ancestors
of the new class. Define Object.inherited to receive notification of all new classes that are defined:
def Object.inherited(c) | |
puts "class #{c} < #{self}" | |
end | |
def String.method_added(name) | |
puts "New instance method #{name} added to String" | |
end | |
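Defining the hook on Object affects every class in the program. A narrower use, sketched here with a hypothetical Plugin base class, records only the subclasses of that one class. The registry accessor is an assumption made for this example.

```ruby
# Hypothetical base class that keeps track of its subclasses.
class Plugin
  @registry = []                  # Class instance variable on Plugin

  class << self
    attr_reader :registry
  end

  def self.inherited(subclass)
    super                         # Keep any hooks defined further up the chain
    Plugin.registry << subclass   # Record the new class object
  end
end

class CsvPlugin  < Plugin; end
class JsonPlugin < Plugin; end

Plugin.registry   # => [CsvPlugin, JsonPlugin]
```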
If you want to inspect an object's methods, you can use Ruby's reflective capabilities:
o.methods # => [ names of all public methods ] | |
o.public_methods # => the same thing | |
o.public_methods(false) # Exclude inherited methods | |
o.protected_methods # => []: there aren't any | |
o.private_methods # => array of all private methods | |
o.private_methods(false) # Exclude inherited private methods | |
String.instance_methods == "s".public_methods # => true | |
String.instance_methods(false) == "s".public_methods(false) # => true | |
String.public_instance_methods == String.instance_methods # => true | |
String.protected_instance_methods # => [] | |
String.private_instance_methods(false) # => ["initialize_copy", | |
# "initialize"] | |
String.public_method_defined? :reverse # => true | |
String.protected_method_defined? :reverse # => false | |
String.private_method_defined? :initialize # => true | |
String.method_defined? :upcase! # => true | |
Arrayifying, Hash access, and Equality (Duck typing)
If you want the Point class to behave like an array or a hash, or even to give it its own iterator, you
can add methods that make this possible, for example:
def [](index) | |
case index | |
when 0, -2 then @x # Index 0 (or -2) is the X coordinate
when 1, -1 then @y # Index 1 (or -1) is the Y coordinate
when :x, "x" then @x # Hash keys as symbol or string for X
when :y, "y" then @y # Hash keys as symbol or string for Y
else nil # Arrays and hashes just return nil on bad indexes | |
end | |
end | |
def each | |
yield @x | |
yield @y | |
end | |
p = Point.new(1,2) | |
p.each {|x| print x } # Prints "12" | |
This approach is sometimes called “duck typing,” after the adage “if it walks like a duck and quacks like a | |
duck, it must be a duck.” More importantly, defining the each iterator allows us to mix in the methods of the | |
Enumerable module, all of which are defined in terms of each. Our class gains over 20 iterators by adding a | |
single line: | |
include Enumerable | |
If we do this, then we can write interesting code like this: | |
# Is the point P at the origin? | |
p.all? {|x| x == 0 } # True if the block is true for all elements | |
Here is an == method for Point: | |
def ==(o) # Is self == o? | |
if o.is_a? Point # If o is a Point object | |
@x==o.x && @y==o.y # then compare the fields. | |
else # If o is not a Point
false # then, by definition, self != o. | |
end | |
end | |
A more liberal definition of equality would support duck typing. Some caution is required, however. Our == | |
method should not raise a NoMethodError if the argument object does not have x and y methods. Instead, it | |
should simply return false: | |
def ==(o) # Is self == o? | |
@x == o.x && @y == o.y # Assume o has proper x and y methods | |
rescue # If that assumption fails | |
false # Then self != o | |
end | |
Another way of implementing equality is to define the <=> method and include the Comparable module:
include Comparable # Mix in methods from the Comparable module. | |
# Define an ordering for points based on their distance from the origin. | |
# This method is required by the Comparable module. | |
def <=>(other) | |
return nil unless other.instance_of? Point | |
@x**2 + @y**2 <=> other.x**2 + other.y**2 | |
end | |
Our distance-based comparison operator results in an == method that considers the points (1,0) and (0,1) to | |
be equal. | |
Because eql? is used for hashes, you must never implement this method by itself. If you define an eql? | |
method, you must also define a hash method to compute a hashcode for your object. If two objects are equal | |
according to eql?, then their hash methods must return the same value. | |
def hash | |
code = 17 | |
code = 37*code + @x.hash | |
code = 37*code + @y.hash | |
# Add lines like this for each significant instance variable | |
code # Return the resulting code | |
end | |
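To see why eql? and hash must agree, the sketch below defines both on a hypothetical GridPoint class and uses its instances as hash keys. Delegating to Array#hash is a shortcut chosen for this example, not the hashcode formula shown above.

```ruby
# Hypothetical class for illustration.
class GridPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(o)
    o.is_a?(GridPoint) && x == o.x && y == o.y
  end
  alias_method :eql?, :==   # Hash uses eql? to compare keys

  def hash
    [x, y].hash             # Equal points must produce equal hashcodes
  end
end

visits = { GridPoint.new(1, 2) => "seen" }
visits[GridPoint.new(1, 2)]   # => "seen": a distinct but equal object finds the entry
visits[GridPoint.new(2, 1)]   # => nil
```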
Structs | |
If you want a mutable Point class, one way to create it is with Struct. Struct is a core Ruby class | |
that generates other classes. | |
Struct.new("Point", :x, :y) # Creates new class Struct::Point | |
Point = Struct.new(:x, :y) # Creates new class, assigns to Point | |
p = Point.new(1,2) # => #<struct Point x=1, y=2> | |
The second line in the code relies on a curious fact about Ruby classes: if you assign an unnamed class
object to a constant, the name of that constant becomes the name of the class. Structs also define the
[] and []= operators for array and hash-style indexing, a working == operator, a helpful to_s, and even
provide each and each_pair iterators.
We can make a Struct-based class immutable: | |
Point = Struct.new(:x, :y) # Define mutable class | |
class Point # Open the class | |
undef x=,y=,[]= # Undefine mutator methods | |
end | |
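The features that Struct generates can be tried out in a few lines. The constant name Coord is hypothetical and is used here only to avoid clashing with the Point examples.

```ruby
Coord = Struct.new(:x, :y)    # Mutable struct with two members

c = Coord.new(1, 2)
c.x                      # => 1   (generated getter)
c[1]                     # => 2   (array-style index)
c[:y]                    # => 2   (hash-style index)

c.x = 10                 # Generated setter
c.to_a                   # => [10, 2]
c == Coord.new(10, 2)    # => true: == compares member values

pairs = []
c.each_pair {|name, value| pairs << [name, value] }
pairs                    # => [[:x, 10], [:y, 2]]
```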
Class methods | |
When we define a class method for the Point class, what we are really doing is defining a singleton method
of the Point object. Class methods are invoked implicitly on self, and the value of self in a class method | |
is the class on which it was invoked. | |
class Point | |
attr_reader :x, :y | |
def self.sum(*points) # Return the sum of an arbitrary number of points | |
x = y = 0 | |
points.each {|p| x += p.x; y += p.y } | |
Point.new(x,y) | |
end | |
end | |
total = Point.sum(p1, p2, p3) | |
Within the body of a class method, you may invoke the other class methods of the class without an explicit | |
receiver. | |
class Point3D < Point | |
def self.sum(*points2D) | |
superclass.sum(*points2D) | |
end | |
end | |
There is yet another technique for defining class methods. Though it is less clear than the previously shown | |
technique, it can be handy when defining multiple class methods. | |
class << Point # Syntax for adding methods to a single object | |
def sum(*points) # This is the class method Point.sum | |
x = y = 0 | |
points.each {|p| x += p.x; y += p.y } | |
Point.new(x,y) | |
end | |
# Other class methods can be defined here | |
end | |
Another way of doing the same thing: | |
class Point | |
# Instance methods go here | |
class << self | |
# Class methods go here
end | |
end | |
Clone and Dup | |
These methods allocate a new instance of the class of the object on which they are invoked. They then copy all | |
the instance variables and the taintedness of the receiver object to the newly allocated object. clone takes this | |
copying a step further than dup—it also copies singleton methods of the receiver object and freezes the copy | |
object if the original is frozen. | |
animal = Object.new | |
def animal.nr_of_feet=(feet) | |
@feet = feet | |
end | |
def animal.nr_of_feet | |
@feet | |
end | |
animal.nr_of_feet = 4 | |
felix = animal.clone | |
felix.nr_of_feet # => 4 | |
What we get here is a different, and in some ways more powerful, kind of inheritance, similar to the
prototypal inheritance used in JavaScript.
If a class defines a method named initialize_copy, then clone and dup will invoke that method on the copied object | |
after copying the instance variables from the original. clone calls initialize_copy before freezing the copy object, | |
so that initialize_copy is still allowed to modify it. Like initialize, Ruby ensures that initialize_copy is always | |
private. | |
def initialize_copy(orig) # If someone copies this Point object | |
@feet = @feet.dup # Make a copy of the nr of feet too | |
end | |
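The differences between clone and dup described above can be checked on a single object. The variable names in this sketch are hypothetical.

```ruby
original = Object.new
def original.speak      # A singleton method on this one object
  "meow"
end
original.freeze

cloned = original.clone
duped  = original.dup

cloned.speak                # => "meow": clone copies singleton methods
cloned.frozen?              # => true:   clone preserves the frozen state
duped.frozen?               # => false:  dup does not
duped.respond_to?(:speak)   # => false:  dup does not copy singleton methods
```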
Modules | |
The difference between modules and classes is that a module cannot be instantiated and cannot be subclassed.
Modules are used as namespaces and mixins. Class is a subclass of Module. | |
Namespaces | |
module Base64 | |
class Encoder | |
def encode | |
end | |
end | |
class Decoder | |
def decode | |
end | |
end | |
def Base64.helper | |
end | |
end | |
By structuring our code this way, we’ve defined two new classes, Base64::Encoder and Base64::Decoder. Because | |
classes are modules, they too can be nested. Nesting one class within another only affects the namespace of the | |
inner class; it does not give that class any special access to the methods or variables of the outer class. | |
Mixins | |
If a module defines instance methods instead of class methods, those instance methods can be mixed in to
other classes. Enumerable and Comparable are well-known examples of mixin modules.
class Point | |
include Comparable | |
end | |
When a module is included into a class or into another module, the included class method of the included module | |
is invoked with the class or module object into which it was included as an argument. This gives the included | |
module an opportunity to augment or alter the class in whatever way it wants—it effectively allows a module to | |
define its own meaning for include. | |
module Final # A class that includes Final can't be subclassed | |
def self.included(c) # When included in class c | |
c.instance_eval do # Define a class method of c | |
def inherited(sub) # To detect subclasses | |
raise Exception, # And abort with an exception | |
"Attempt to create subclass #{sub} of Final class #{self}" | |
end | |
end | |
end | |
end | |
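A common use of the included hook is to give the including class some class methods as well as instance methods. The sketch below assumes a hypothetical Describable module with a nested ClassMethods module; both names are illustrative only.

```ruby
# Hypothetical mixin for illustration.
module Describable
  def self.included(base)       # Called with the including class
    base.extend(ClassMethods)   # Add class-level methods to it
  end

  module ClassMethods
    def description(text = nil)
      @description = text if text   # Stored as a class instance variable
      @description
    end
  end

  def describe                  # Ordinary mixed-in instance method
    "I am #{self.class.description}"
  end
end

class Report
  include Describable
  description "a printable summary"
end

Report.description     # => "a printable summary"
Report.new.describe    # => "I am a printable summary"
```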
Load and Require | |
Ruby programs may be broken up into multiple files, and the most natural way to partition a program is to place | |
each nontrivial class or module into a separate file. These separate files can then be reassembled into a single | |
program using load and require. These are global functions defined in Kernel, but they are used like language
keywords.
There are some differences between load and require. require can also load binary extensions to Ruby. | |
load expects a complete filename including an extension. require is usually passed a library name, with no | |
extension, rather than a filename. In that case, it searches for a file that has the library name as its base | |
name and an appropriate source or native library extension. load can load the same file multiple times. require | |
tries to prevent multiple loads of the same file. require keeps track of the files that have been loaded by | |
appending them to the global array $" (also known as $LOADED_FEATURES). load does not do this. | |
Files loaded with load or require are executed in a new top-level scope that is different from the one in which | |
load or require was invoked. The loaded file can see all global variables and constants that have been defined at | |
the time it is loaded, but it does not have access to the local scope from which the load was initiated. | |
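The repeat-loading difference between load and require can be observed with a throwaway file. This sketch writes a one-line script to a temporary directory; the file name load_demo.rb and the global $demo_load_count are assumptions made for the example.

```ruby
require "tmpdir"

$demo_load_count = 0    # Global, so the loaded file can see it
results = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "load_demo.rb")
  File.write(path, "$demo_load_count += 1\n")

  load path                   # Runs the file
  load path                   # Runs it again
  results << require(path)    # Runs it once more and records it in $"
  results << require(path)    # Already recorded, so it is skipped
end

$demo_load_count   # => 3
results            # => [true, false]
```

A global variable is used because, as noted above, the loaded file cannot see the local variables of the code that loads it.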
The autoload methods of Kernel and Module allow lazy loading of files on an as-needed basis. autoload registers
the name of an undefined constant together with the name of the library that defines it. When that constant is
first referenced, the library is loaded with require.
# Require 'socket' if and when the TCPSocket is first used | |
autoload :TCPSocket, "socket" | |
Use autoload? or Module.autoload? to test whether a reference to a constant will cause a file to be loaded. This | |
method expects a symbol argument. If a file will be loaded when the constant named by the symbol is referenced, then | |
autoload? returns the name of the file; otherwise it returns nil.
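A small sketch of this behavior follows. The constant LazyGreeter, the file lazy_greeter.rb and the method autoload_demo are made up for the example, and the explicit Object receiver is used only to make clear where the constant is registered.

```ruby
require "tmpdir"

# Hypothetical demo: register a constant for lazy loading, then trigger it.
def autoload_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "lazy_greeter.rb")
    File.write(path, "module LazyGreeter; def self.hello; 'hi'; end; end\n")

    Object.autoload(:LazyGreeter, path)        # register, but do not load yet
    pending  = Object.autoload?(:LazyGreeter)  # the registered file name
    greeting = LazyGreeter.hello               # first reference loads the file
    after    = Object.autoload?(:LazyGreeter)  # nil once the constant exists
    [pending == path, greeting, after]
  end
end
```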
Loadpaths | |
Ruby’s load path is an array that you can access using either of the global variables $LOAD_PATH or $:. Each | |
element of the array is the name of a directory that Ruby will search for files to load. On a typical Linux
installation of Ruby 1.8, the /usr/lib/ruby/1.8/ directory is where the Ruby standard library is installed, and the
/usr/lib/ruby/1.8/i386-linux/ directory holds Linux binary extensions for the standard library. The site_ruby
directories in the path are for site-specific libraries that you have installed. A significant load path change in
Ruby 1.9 is the inclusion of RubyGems
installation directories. RubyGems is built into Ruby 1.9: the gem command is distributed with Ruby and can be used | |
to install new packages whose installation directories are automatically added to the default load path. | |
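Because the load path is an ordinary array, a program can extend it before calling require. The following sketch assumes a made-up library file load_path_demo.rb that defines a constant LOAD_PATH_DEMO. Neither name comes from the book.

```ruby
require "tmpdir"

# Hypothetical demo: add a directory to the load path and require from it.
def load_path_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "load_path_demo.rb"), "LOAD_PATH_DEMO = :loaded\n")
    $LOAD_PATH.unshift(dir)          # search this directory first
    begin
      require "load_path_demo"       # found by base name, no extension needed
    ensure
      $LOAD_PATH.delete(dir)         # leave the load path as we found it
    end
    LOAD_PATH_DEMO
  end
end
```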
Eigenclass | |
We learned that you can define a singleton method on a single object. The singleton methods of an object are not
defined by the class of that object. But they are methods and they must be associated with a class of some sort. | |
The singleton methods of an object are instance methods of the anonymous eigenclass associated with that object. | |
The eigenclass is also called the singleton class or (less commonly) the metaclass. Ruby defines a syntax for | |
opening the eigenclass of an object and adding methods to it. | |
To open the eigenclass of the object o, use class << o. For example, we can define class methods of Point like this: | |
class << Point | |
def class_method # This is an instance method of the eigenclass. | |
end # It is also a class method of Point. | |
end | |
We can formalize this into a method of Object, so that we can ask for the eigenclass of any object: | |
class Object | |
def eigenclass | |
class << self; self; end | |
end | |
end | |
Unless you are doing sophisticated metaprogramming with Ruby, you are unlikely to really need an eigenclass. | |
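To make the relationship concrete, here is a minimal sketch. The method names eigenclass_demo and shout are invented. Since Ruby 1.9.2 the built-in method singleton_class returns the same object as the class << idiom.

```ruby
# Hypothetical demo: give one object a singleton method and inspect its eigenclass.
def eigenclass_demo
  o     = Object.new
  other = Object.new
  def o.shout; "hey!"; end               # singleton method, defined on o only

  eigen = class << o; self; end          # the idiom shown above
  [
    eigen.instance_methods(false),       # the singleton methods of o
    eigen.equal?(o.singleton_class),     # same object as the built-in accessor
    other.respond_to?(:shout)            # other objects are unaffected
  ]
end
```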
Other useful stuff | |
Threads | |
Ruby makes it easy to write multi-threaded programs with the Thread class. To start a new thread, just associate a block | |
with a call to Thread.new. | |
# Thread 1 is running here | |
Thread.new { | |
# Thread #2 runs this code | |
} | |
# Thread 1 runs this code | |
A thread runs the code in the block associated with the call to Thread.new and then it stops running. The value of the | |
last expression in that block is the value of the thread, and can be obtained by calling the value method of the Thread | |
object. If the thread has run to completion, then value returns the thread’s value right away. Otherwise, the
value method blocks and does not return until the thread has completed. One of the key features of threads is that they | |
can share access to variables. Because threads are defined by blocks, they have access to whatever variables (local | |
variables, instance variables, global variables, and so on) are in the scope of the block. | |
x = 0
t1 = Thread.new do
  x += 1    # Ruby has no ++ operator
end
t2 = Thread.new do
  x -= 1    # Nor a -- operator
end
But if you run the following code | |
n = 1 | |
while n <= 3 | |
Thread.new { puts n } | |
n += 1 | |
end | |
This code is not guaranteed to behave as expected. In some circumstances it may print 4, 4, 4 instead of 1, 2, 3,
because threads do not run as predictably as sequential code: all three blocks share the same variable n, and a thread
may not read it until the loop has already moved on. One way of fixing this is to give each thread a private copy of
the value:
n = 1 | |
while n <= 3 | |
# Get a private copy of the current value of n in x | |
Thread.new(n) {|x| puts x } | |
n += 1 | |
end | |
The class method Thread.current returns the Thread object that represents the current thread. This allows threads to | |
manipulate themselves. The class method Thread.main returns the Thread object that represents the main thread—this is the | |
initial thread of execution that began when the Ruby program was started. | |
The main thread is special: the Ruby interpreter stops running when the main thread is done. You must ensure, therefore, | |
that your main thread does not end while other threads are still running. We’ve already mentioned that you can call the | |
value method of a thread to wait for it to finish. If you don’t care about the value of your threads, you can wait with | |
the join method instead. | |
def join_all | |
main = Thread.main # The main thread | |
current = Thread.current # The current thread | |
all = Thread.list # All threads still running | |
all.each {|t| t.join unless t == current or t == main } | |
end | |
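The value method can be sketched in a few lines. The helper name squares_in_threads is invented for this illustration.

```ruby
# Hypothetical helper: square each number in its own thread.
def squares_in_threads(numbers)
  threads = numbers.map do |n|
    Thread.new(n) { |x| x * x }   # x is a thread-private copy of n
  end
  threads.map(&:value)            # value waits for each thread, as join does
end
```

Because value blocks until its thread has finished, the results come back in the order the threads were created, not the order in which they completed.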
If an exception is raised in the main thread, and is not handled anywhere, the Ruby interpreter prints a message and | |
exits. In threads other than the main thread, unhandled exceptions cause the thread to stop running. If a thread t exits | |
because of an unhandled exception, and another thread s calls t.join or t.value, then the exception that occurred in t is | |
raised in the thread s. If you want an unhandled exception in any thread to cause the interpreter to stop: | |
Thread.abort_on_exception = true | |
If you want an unhandled exception in one particular thread t to stop the interpreter, use:
t.abort_on_exception = true | |
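The way join re-raises an exception can be sketched as follows. The helper name join_failed_thread is invented. The report_on_exception setting exists only in Ruby 2.4 and later and is used here just to keep the thread's own error report off standard error.

```ruby
# Hypothetical helper: show how an exception in a thread reaches the joiner.
def join_failed_thread
  t = Thread.new do
    Thread.current.report_on_exception = false  # keep stderr quiet
    raise ArgumentError, "boom"
  end
  t.join                      # the ArgumentError is re-raised here
  :no_error
rescue ArgumentError => e
  "caught #{e.message}"
end
```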
When true parallel processing is not possible, it is simulated by sharing a CPU among threads. The process for sharing a | |
CPU among threads is called thread scheduling. The first factor that affects thread scheduling is thread priority: high- | |
priority threads are scheduled before low-priority threads. Set and query the priority of a Ruby Thread object with | |
priority= and priority. Note that there is no way to set the priority of a thread before it starts running. A newly | |
created thread starts at the same priority as the thread that created it. The main thread starts off at priority 0. Under | |
Linux, for example, nonprivileged threads cannot have their priorities raised or lowered. So in Ruby 1.9 (which uses | |
native threads) on Linux, the thread priority setting is ignored. | |
A Ruby thread may be in one of five possible states. The two most interesting states are for live threads: a thread that | |
is alive is runnable or sleeping. A runnable thread is one that is currently running, or that is ready and eligible to run | |
the next time there are CPU resources for it. A sleeping thread is one that is sleeping, that is waiting for I/O, or that | |
has stopped itself. There are two thread states for threads that are no longer alive. A terminated thread has either | |
terminated normally or has terminated abnormally with an exception. Finally, there is one transitional state. A thread | |
that has been killed but that has not yet terminated is said to be aborting. | |
Calling Thread.stop is effectively the same thing as calling Kernel.sleep with no argument: the thread pauses forever
(or until it is woken up). Threads also temporarily enter the sleeping state if they call Kernel.sleep with an argument.
In this case, they automatically wake up and reenter the runnable state after (approximately) the specified number of
seconds pass. One way for a thread to terminate normally is by calling Thread.exit. Note that any ensure clauses are
processed before a thread exits in this way. A thread can forcibly terminate another thread by invoking the instance
method kill on the thread to be terminated. terminate and exit are synonyms for kill. The Thread.list method returns an
array of Thread objects representing all live (running or sleeping) threads.
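Some of these states can be observed through the status method of a thread, which returns "sleep" for a sleeping thread, false for one that terminated normally (including one that was killed) and nil for one that terminated with an exception. The helper name below is invented for the sketch.

```ruby
# Hypothetical helper: observe Thread#status in a few of the states above.
def thread_status_demo
  sleeper = Thread.new { sleep }               # sleeps until woken or killed
  sleep 0.01 until sleeper.status == "sleep"   # wait until it is really asleep
  while_sleeping = sleeper.status
  sleeper.kill
  sleeper.join
  after_kill = sleeper.status                  # false once it has terminated

  failed = Thread.new do
    Thread.current.report_on_exception = false # Ruby 2.4+: keep stderr quiet
    raise "boom"
  end
  begin
    failed.join
  rescue RuntimeError
    # join re-raises the thread's exception; ignore it here
  end
  [while_sleeping, after_kill, failed.status, sleeper.alive?]
end
```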
If you want to impose some order onto a subset of threads, you can create a ThreadGroup object and add threads to it: | |
group = ThreadGroup.new | |
3.times {|n| group.add(Thread.new { do_task(n) }) }
New threads are initially placed in the group to which their parent belongs. | |
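The inheritance of the group can be checked with a short sketch. The helper name child_inherits_group? is made up for the example.

```ruby
# Hypothetical helper: does a child thread start in its parent's group?
def child_inherits_group?
  group = ThreadGroup.new
  parent = Thread.new do
    group.add(Thread.current)                   # move this thread into group
    Thread.new { Thread.current.group }.value   # the child reports its group
  end
  parent.value.equal?(group)
end
```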
Threads are typically used for I/O-bound programs. Here are some examples of their use.
def conread(filenames)
  h = {}                                 # Empty hash of results
  # Create one thread for each file
  filenames.each do |filename|           # For each named file
    h[filename] = Thread.new do          # Create a thread, map to filename
      File.open(filename) {|f| f.read }  # Open and read the file
    end                                  # Thread value is file contents
  end
  # Wait for each thread and replace it in the hash with its value
  h.each_pair {|filename, thread| h[filename] = thread.value }
  h                                      # Return the hash of file contents
end
# File afterevery.rb: add the methods after and every to Kernel
module Kernel
# Execute block after sleeping the specified number of seconds. | |
def after(seconds, &block) | |
Thread.new do # In a new thread... | |
sleep(seconds) # First sleep | |
block.call # Then call the block | |
end # Return the thread | |
end | |
# Repeatedly sleep and then execute the block. | |
# Pass value to the block on the first invocation. | |
# On subsequent invocations, pass the value of the previous invocation. | |
def every(seconds, value=nil, &block) | |
Thread.new do # In a new thread... | |
loop do # Loop forever (or until break in block) | |
sleep(seconds) # Sleep | |
value = block.call(value) # And invoke block | |
end # Then repeat.. | |
end # every returns the Thread | |
end | |
end | |
require 'afterevery' | |
1.upto(5) {|i| after(i) { puts i } } # Slowly print the numbers 1 to 5
sleep(5) # Wait five seconds | |
every 1, 6 do |count| # Now slowly print 6 to 10 | |
puts count | |
break if count == 10 | |
count + 1 # The next value of count | |
end | |
sleep(6) # Give the above time to run | |
When writing programs that use multiple threads, it is important that two threads do not attempt to modify the same | |
object at the same time. One way to do this is to place the code that must be made thread-safe in a block associated | |
with a call to the synchronize method of a Mutex object. | |
class BankAccount | |
def initialize(name, checking, savings)
@name,@checking,@savings = name,checking,savings | |
@lock = Mutex.new # For thread safety | |
end | |
# Lock account and transfer money from savings to checking | |
def transfer_from_savings(x) | |
@lock.synchronize { | |
@savings -= x | |
@checking += x | |
} | |
end | |
# Lock account and report current balances | |
def report | |
@lock.synchronize { | |
"#@name\nChecking: #@checking\nSavings: #@savings" | |
} | |
end | |
end | |
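The same locking pattern can be tried out on a shared counter. This is a sketch only, and the helper name locked_count is invented.

```ruby
# Hypothetical helper: increment a shared counter from several threads.
def locked_count(thread_count, increments)
  counter = 0
  lock = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      # Only one thread at a time may run the read-modify-write step.
      increments.times { lock.synchronize { counter += 1 } }
    end
  end
  threads.each(&:join)
  counter
end
```

With the lock held around every update, the final count is always thread_count times increments.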
Another example reopens the Object class to emulate Java’s synchronized keyword with a global method named
synchronized.
class Object | |
# Return the Mutex for this object, creating it if necessary. | |
def mutex | |
# If this object already has a mutex, just return it | |
return @__mutex if @__mutex | |
# Otherwise, we've got to create a mutex for the object. | |
# To do this safely we've got to synchronize on our class object. | |
synchronized(self.class) { | |
@__mutex = @__mutex || Mutex.new | |
} | |
# The return value is @__mutex | |
end | |
end | |
require 'thread' # Ruby 1.8 keeps Mutex in this library | |
# This works like the synchronized keyword of Java. | |
def synchronized(o) | |
o.mutex.synchronize { yield } | |
end | |
# The Object.mutex method defined above needs to lock the class | |
# if the object doesn't have a Mutex yet. If the class doesn't have | |
# its own Mutex yet, then the class of the class (the Class object) | |
# will be locked. In order to prevent infinite recursion, we must | |
# ensure that the Class object has a mutex. | |
Class.instance_eval { @__mutex = Mutex.new } | |
A more Ruby-like variant builds on method_missing. Here the synchronized method is modified so that, when invoked
without a block, it returns a SynchronizedObject wrapper around the object. SynchronizedObject is a delegating wrapper
class based on method_missing: every method called on the wrapper is forwarded to the wrapped object while its mutex
is held.
class SynchronizedObject < BasicObject | |
def initialize(o); @delegate = o; end | |
def __delegate; @delegate; end | |
def method_missing(*args, &block) | |
@delegate.mutex.synchronize { | |
@delegate.send *args, &block | |
} | |
end | |
end | |
def synchronized(o) | |
if block_given? | |
o.mutex.synchronize { yield } | |
else | |
SynchronizedObject.new(o) | |
end | |
end | |
The example above relies on the send method. send invokes on its receiver the method named by its first
argument, passing any remaining arguments to that method.
"hello".send :upcase # => "HELLO": invoke an instance method | |
Math.send(:sin, Math::PI/2) # => 1.0: invoke a class method | |
Tracing | |
The trace method defined below returns an instance of TracedObject that uses method_missing to catch invocations, trace
them, and delegate them to the object being traced. The implementation comes first, followed by an example of its use
for debugging:
class TracedObject | |
# Undefine all of our noncritical public instance methods. | |
# Note the use of Module.instance_methods and Module.undef_method. | |
instance_methods.each do |m| | |
m = m.to_sym # Ruby 1.8 returns strings, instead of symbols | |
next if m == :object_id || m == :__id__ || m == :__send__ | |
undef_method m | |
end | |
# Initialize this TracedObject instance. | |
def initialize(o, name, stream) | |
@o = o # The object we delegate to | |
@n = name # The object name to appear in tracing messages | |
@trace = stream # Where those tracing messages are sent | |
end | |
# This is the key method of TracedObject. It is invoked for just | |
# about any method invocation on a TracedObject. | |
def method_missing(*args, &block) | |
m = args.shift # First arg is the name of the method | |
begin | |
# Trace the invocation of the method. | |
arglist = args.map {|a| a.inspect}.join(', ') | |
@trace << "Invoking: #{@n}.#{m}(#{arglist}) at #{caller[0]}\n" | |
# Invoke the method on our delegate object and get the return value. | |
r = @o.send m, *args, &block | |
# Trace a normal return of the method. | |
@trace << "Returning: #{r.inspect} from #{@n}.#{m} to #{caller[0]}\n" | |
# Return whatever value the delegate object returned. | |
r | |
rescue Exception => e | |
# Trace an abnormal return from the method. | |
@trace << "Raising: #{e.class}:#{e} from #{@n}.#{m}\n" | |
# And re-raise whatever exception the delegate object raised. | |
raise | |
end | |
end | |
# Return the object we delegate to. | |
def __delegate | |
@o | |
end | |
end | |
class Object | |
def trace(name="", stream=STDERR) | |
# Return a TracedObject that traces and delegates everything else to us. | |
TracedObject.new(self, name, stream) | |
end | |
end | |
a = [1,2,3].trace("a") | |
a.reverse | |
puts a[2] | |
puts a.fetch(3) | |
This produces the following tracing output: | |
Invoking: a.reverse() at trace1.rb:66 | |
Returning: [3, 2, 1] from a.reverse to trace1.rb:66 | |
Invoking: a.fetch(3) at trace1.rb:67 | |
Raising: IndexError:index 3 out of array from a.fetch | |
Eval | |
eval is a very powerful function, but unless you are actually writing a shell program (like irb) that executes lines | |
of Ruby code entered by a user you are unlikely to really need it. | |
x = 1 | |
eval "x + 1" # => 2 | |
A Binding object represents the state of Ruby’s variable bindings at some moment. The Kernel.binding method returns
the bindings in effect at the location of the call. You may pass a Binding object as the second argument to eval, and
the string you specify will be evaluated in the context of those bindings. For example, to peek inside an object:
class Object # Open Object to add a new method | |
def bindings # Note plural on this method | |
binding # This is the predefined Kernel method | |
end | |
end | |
class Test # A simple class with an instance variable | |
def initialize(x); @x = x; end | |
end | |
t = Test.new(10) # Create a test object | |
eval("@x", t.bindings) # => 10: We've peeked inside t | |
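A binding captures local variables as well as instance variables. The sketch below uses two invented helper methods. Binding#local_variable_get requires Ruby 2.1 or later.

```ruby
# Hypothetical helper: return the bindings in effect inside this method.
def greeting_binding
  greeting = "hello"
  binding
end

# Evaluate code against the captured binding in two different ways.
def binding_demo
  b = greeting_binding
  [eval("greeting.upcase", b), b.local_variable_get(:greeting)]
end
```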
The Object class defines a method named instance_eval, and the Module class defines a method named class_eval. Both
of these methods evaluate Ruby code, like eval does, but there are two important differences. The first difference is
that they evaluate the code in the context of the specified object or in the context of the specified module: the object
or module is the value of self while the code is being evaluated. The second difference is that both methods can also
be passed a block of code to evaluate instead of a string.
o.instance_eval("@x") # Return the value of o's instance variable @x | |
# Define an instance method len of String to return string length | |
String.class_eval("def len; size; end") | |
# Another way to define the same method
String.class_eval("alias len size")
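Both methods also accept a block in place of a string, which avoids assembling code as text. A minimal sketch, using a made-up class Gadget:

```ruby
# Hypothetical class used only for this illustration.
class Gadget
  def initialize; @secret = 42; end
end

# Block form of class_eval: define an instance method of Gadget.
Gadget.class_eval do
  def double_secret; @secret * 2; end
end

# Block form of instance_eval: self is the Gadget instance inside the block.
def eval_block_demo
  g = Gadget.new
  [g.instance_eval { @secret }, g.double_secret]
end
```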
Monkey Patching | |
As we’ve seen, metaprogramming in Ruby often involves the dynamic definition of methods. Just as common is the dynamic | |
modification of methods. Methods are modified with a technique we’ll call alias chaining. It works like this:
• First, create an alias for the method to be modified. This alias provides a name for | |
the unmodified version of the method. | |
• Next, define a new version of the method. This new version should call the | |
unmodified version through the alias, but it can add whatever functionality is | |
needed before and after it does that. | |
class Foo | |
def bar | |
'Hello' | |
end | |
end | |
class Foo | |
alias_method :old_bar, :bar | |
def bar | |
old_bar + ' World' | |
end | |
end | |
Foo.new.bar # => 'Hello World' | |
Foo.new.old_bar # => 'Hello' | |
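Ruby 2.0 and later offer Module#prepend, a different technique from alias chaining that gives a similar result without leaving an extra alias behind. It is not part of the book's material and is shown only for comparison. The names Greeter and Excited are invented.

```ruby
# Hypothetical class whose method we want to extend.
class Greeter
  def greet; "Hello"; end
end

# A prepended module sits in front of the class in the method lookup chain,
# so super reaches the original, unmodified method.
module Excited
  def greet
    super + " World"
  end
end

Greeter.prepend(Excited)
```

After the call to prepend, Greeter.new.greet runs the module's version first and reaches the original through super.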
One use of this technique is to trace which files a program loads and which classes it defines:
module ClassTrace | |
# This array holds our list of files loaded and classes defined. | |
T = [] # Array to hold the files loaded | |
# Now define the constant OUT to specify where tracing output goes. | |
# This defaults to STDERR, but can also come from command-line arguments | |
if x = ARGV.index("--traceout") # If argument exists | |
OUT = File.open(ARGV[x+1], "w") # Open the specified file | |
ARGV.slice!(x, 2) # And remove both arguments (assigning nil only deletes them in Ruby 1.8)
else | |
OUT = STDERR # Otherwise default to STDERR | |
end | |
end | |
# Alias chaining step 1: define aliases for the original methods | |
alias original_require require | |
alias original_load load | |
# Alias chaining step 2: define new versions of the methods | |
def require(file) | |
ClassTrace::T << [file,caller[0]] # Remember what was loaded where | |
original_require(file) # Invoke the original method | |
end | |
def load(*args) | |
ClassTrace::T << [args[0],caller[0]] # Remember what was loaded where | |
original_load(*args) # Invoke the original method | |
end | |
# This hook method is invoked each time a new class is defined | |
def Object.inherited(c) | |
ClassTrace::T << [c,caller[0]] # Remember what was defined where | |
end | |
# Kernel.at_exit registers a block to be run when the program exits | |
# We use it to report the file and class data we collected | |
at_exit { | |
o = ClassTrace::OUT | |
o.puts "="*60 | |
o.puts "Files Loaded and Classes Defined:" | |
o.puts "="*60 | |
ClassTrace::T.each do |what,where| | |
if what.is_a? Class # Report class (with hierarchy) defined | |
o.puts "Defined: #{what.ancestors.join('<-')} at #{where}" | |
else # Report file loaded | |
o.puts "Loaded: #{what} at #{where}" | |
end | |
end | |
} | |
DSLs
The goal of metaprogramming in Ruby is often the creation of domain-specific languages, or DSLs. A DSL is just an | |
extension of Ruby’s syntax (with methods that look like keywords) or API that allows you to solve a problem or | |
represent data more naturally than you could otherwise. For our examples, we’ll take the problem domain to be the | |
output of XML formatted data, and we’ll define two DSLs—one very simple and one more clever—to tackle this problem. | |
method_missing variant: | |
pagetitle = "Test Page for XML.generate" | |
XML.generate(STDOUT) do | |
html do | |
head do | |
title { pagetitle } | |
comment "This is a test" | |
end | |
body do | |
h1(:style => "font-family:sans-serif") { pagetitle } | |
ul :type=>"square" do | |
li { Time.now } | |
li { RUBY_VERSION } | |
end | |
end | |
end | |
end | |
Output: | |
<html><head> | |
<title>Test Page for XML.generate</title> | |
<!-- This is a test --> | |
</head><body> | |
<h1 style='font-family:sans-serif'>Test Page for XML.generate</h1> | |
<ul type='square'> | |
<li>2007-08-19 16:19:58 -0700</li> | |
<li>1.9.0</li> | |
</ul></body></html> | |
The implementation: | |
class XML | |
# Create an instance of this class, specifying a stream or object to | |
# hold the output. This can be any object that responds to <<(String). | |
def initialize(out) | |
@out = out # Remember where to send our output | |
end | |
# Output the specified object as CDATA, return nil. | |
def content(text) | |
@out << text.to_s | |
nil | |
end | |
def comment(text) | |
@out << "<!-- #{text} -->" | |
nil | |
end | |
# Output a tag with the specified name and attributes. | |
# If there is a block invoke it to output or return content. | |
# Return nil. | |
def tag(tagname, attributes={}) | |
@out << "<#{tagname}" | |
attributes.each {|attr,value| @out << " #{attr}='#{value}'" } | |
if block_given? | |
@out << '>' | |
content = yield | |
if content | |
@out << content.to_s | |
end | |
@out << "</#{tagname}>" | |
else | |
@out << '/>' | |
end | |
nil # Tags output themselves, so they don't return any content | |
end | |
# The code below is what changes this from an ordinary class into a DSL. | |
# First: any unknown method is treated as the name of a tag. | |
alias method_missing tag | |
# Second: run a block in a new instance of the class. | |
def self.generate(out, &block) | |
XML.new(out).instance_eval(&block) | |
end | |
end | |
The XML class is helpful for generating well-formed XML, but it does no error checking to ensure that the output is
valid according to any particular XML grammar. A better way would be to define what elements are appropriate. | |
class HTMLForm < XMLGrammar | |
element :form, :action => REQ, | |
:method => "GET", | |
:enctype => "application/x-www-form-urlencoded", | |
:name => OPT | |
element :input, :type => "text", :name => OPT, :value => OPT, | |
:maxlength => OPT, :size => OPT, :src => OPT, | |
:checked => BOOL, :disabled => BOOL, :readonly => BOOL | |
element :textarea, :rows => REQ, :cols => REQ, :name => OPT, | |
:disabled => BOOL, :readonly => BOOL | |
element :button, :name => OPT, :value => OPT, | |
:type => "submit", :disabled => OPT | |
end | |
How to use it: | |
HTMLForm.generate(STDOUT) do | |
comment "This is a simple HTML form" | |
form :name => "registration", | |
:action => "http://www.example.com/register.cgi" do | |
content "Name:" | |
input :name => "name" | |
content "Address:" | |
textarea :name => "address", :rows=>6, :cols=>40 do | |
"Please enter your mailing address here" | |
end | |
button { "Submit" } | |
end | |
end | |
The implementation: | |
class XMLGrammar | |
# Create an instance of this class, specifying a stream or object to | |
# hold the output. This can be any object that responds to <<(String). | |
def initialize(out) | |
@out = out # Remember where to send our output | |
end | |
# Invoke the block in an instance that outputs to the specified stream. | |
def self.generate(out, &block) | |
new(out).instance_eval(&block) | |
end | |
# Define an allowed element (or tag) in the grammar. | |
# This class method is the grammar-specification DSL | |
# and defines the methods that constitute the XML-output DSL. | |
def self.element(tagname, attributes={}) | |
@allowed_attributes ||= {} | |
@allowed_attributes[tagname] = attributes | |
class_eval %Q{ | |
def #{tagname}(attributes={}, &block) | |
tag(:#{tagname},attributes,&block) | |
end | |
} | |
end | |
# These are constants used when defining attribute values. | |
OPT = :opt # for optional attributes | |
REQ = :req # for required attributes | |
BOOL = :bool # for attributes whose value is their own name | |
def self.allowed_attributes | |
@allowed_attributes | |
end | |
# Output the specified object as CDATA, return nil. | |
def content(text) | |
@out << text.to_s | |
nil | |
end | |
# Output the specified object as a comment, return nil. | |
def comment(text) | |
@out << "<!-- #{text} -->" | |
nil | |
end | |
# Output a tag with the specified name and attribute. | |
# If there is a block, invoke it to output or return content. | |
# Return nil. | |
def tag(tagname, attributes={}) | |
# Output the tag name | |
@out << "<#{tagname}" | |
# Get the allowed attributes for this tag. | |
allowed = self.class.allowed_attributes[tagname] | |
# First, make sure that each of the attributes is allowed. | |
# Assuming they are allowed, output all of the specified ones. | |
attributes.each_pair do |key,value| | |
raise "unknown attribute: #{key}" unless allowed.include?(key) | |
@out << " #{key}='#{value}'" | |
end | |
# Now look through the allowed attributes, checking for | |
# required attributes that were omitted and for attributes with | |
# default values that we can output. | |
allowed.each_pair do |key,value| | |
# If this attribute was already output, do nothing. | |
next if attributes.has_key? key | |
if (value == REQ) | |
raise "required attribute '#{key}' missing in <#{tagname}>" | |
elsif value.is_a? String | |
@out << " #{key}='#{value}'" | |
end | |
end | |
if block_given? | |
# This block has content | |
@out << '>' # End the opening tag | |
content = yield # Invoke the block to output or return content | |
if content # If any content returned | |
@out << content.to_s # Output it as a string | |
end | |
@out << "</#{tagname}>" # Close the tag | |
else | |
# Otherwise, this is an empty tag, so just close it. | |
@out << '/>' | |
end | |
nil # Tags output themselves, so they don't return any content. | |
end | |
end | |
Ruby I/O | |
To obtain a list of files that match a given pattern, use the Dir[] operator. The pattern is not a regular
expression, but is like the file-matching patterns used in shells. “?” matches a single character. “*” matches any
number of characters. And “**” matches any number of directory levels. Characters in square brackets are alternatives,
as in regular expressions.
Dir['*.data'] # Files with the "data" extension
Dir['?'] # Any single-character filename | |
Dir['*.[ch]'] # Any file that ends with .c or .h | |
Dir['*.{java,rb}'] # Any file that ends with .java or .rb | |
Dir['*/*.rb'] # Any Ruby program in any direct sub-directory | |
Dir['**/*.rb'] # Any Ruby program in any descendant directory | |
puts Dir.getwd # Print current working directory | |
Dir.chdir("..") # Change CWD to the parent directory | |
Dir.chdir("../sibling") # Change again to a sibling directory | |
Dir.chdir("/home") # Change to an absolute directory | |
# Get the names of all files in the config/ directory | |
filenames = Dir.entries("config") # Get names as an array | |
Dir.foreach("config") {|filename| ... } # Iterate names | |
File.open("log.txt", "a") do |log| # Open for appending | |
log.puts("INFO: Logging a message") # Output to the file | |
end | |
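The patterns can be tried out safely in a temporary directory. The file names and the helper glob_demo below are invented for the sketch.

```ruby
require "tmpdir"

# Hypothetical helper: build a small directory tree and glob it.
def glob_demo
  Dir.mktmpdir do |dir|
    Dir.mkdir(File.join(dir, "lib"))
    %w[a.rb b.c lib/c.rb notes.txt].each do |name|
      File.write(File.join(dir, name), "")
    end
    Dir.chdir(dir) do                     # patterns are relative to the CWD
      {
        top_ruby: Dir["*.rb"],            # current directory only
        all_ruby: Dir["**/*.rb"].sort,    # any depth
        c_or_h:   Dir["*.[ch]"],          # bracket alternatives
        either:   Dir["*.{rb,txt}"].sort  # brace alternatives
      }
    end
  end
end
```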
The Kernel method open works like File.open but is more flexible. If the filename begins with |, it is treated as an | |
operating system command, and the returned stream is used for reading from and writing to that command process. This | |
is platform-dependent, of course: | |
# How long has the server been up? | |
uptime = open("|uptime") {|f| f.gets } | |
If the open-uri library has been loaded, then open can also be used to read from http and ftp URLs as if they were | |
files: | |
require "open-uri" # Required library | |
f = open("http://malinstehn.se/") # Webpage as a file (in Ruby 3.0 and later, call URI.open instead)
webpage = f.read # Read it as one big string | |
f.close | |
Another way to obtain an IO object is to use the stringio library to read from or write to a string: | |
require "stringio" | |
input = StringIO.open("now is the time") # Read from this string | |
buffer = "" | |
output = StringIO.open(buffer, "w") # Write into buffer | |
The StringIO class is not a subclass of IO, but it defines many of the same methods as IO does, and duck typing | |
usually allows us to use a StringIO object in place of an IO object. | |
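A short sketch of both directions follows. The helper name stringio_demo is invented.

```ruby
require "stringio"

# Hypothetical helper: read from one string and write into another.
def stringio_demo
  input = StringIO.new("first line\nsecond line\n")
  first = input.gets                  # reads a line, just as a File would

  buffer = +""                        # a mutable, empty string
  output = StringIO.new(buffer, "w")
  output.puts "hello"                 # appends "hello\n" to buffer
  output << "world"                   # appends "world" to buffer

  [first, buffer, StringIO.ancestors.include?(IO)]
end
```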
Ruby predefines a number of streams that can be used without being created or opened. The global constants STDIN, | |
STDOUT, and STDERR are the standard input stream, the standard output stream, and the standard error stream, | |
respectively. By default, these streams are connected to the user’s console or a terminal window of some sort. | |
The global variables $stdin, $stdout, and $stderr are initially set to the same values as the stream constants. Global | |
functions like print and puts write to $stdout by default. If a script alters the value of this global variable, it | |
will change the behavior of those methods. The true “standard output” will still be available through STDOUT, however. | |
The following script shows how this works:
#!/usr/bin/ruby | |
# file: readline.rb | |
print "Enter your name: " | |
name = gets # In fact $stdin.gets | |
puts "Hello #{name}" # In fact $stdout.puts | |
$ ./readline.rb # running the script | |
Enter your name: Patrik | |
Hello Patrik | |
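Reassigning $stdout is a common way to capture what puts and print write. The helper capture_stdout below is an invented name, and the sketch restores the original stream in an ensure clause.

```ruby
require "stringio"

# Hypothetical helper: run a block and return everything it wrote to $stdout.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new      # puts and print now write into this object
  yield
  $stdout.string
ensure
  $stdout = original          # always restore the previous stream
end
```

For example, capture_stdout { puts "hidden" } returns the text instead of showing it on the terminal.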
Another predefined stream is ARGF, or $<. This stream has special behavior intended to make it simple to write scripts
that read the files specified on the command line or from standard input. It should not be confused with ARGV, or $*,
which is the array of command-line arguments itself:
#!/usr/bin/ruby
# outputargs.rb
puts ARGV
a = Array.new($*)
puts a.join
$ ./outputargs.rb hej hopp
hej
hopp
hejhopp
In Ruby 1.9, every stream can have two encodings associated with it. These are known as the external and internal | |
encodings, and are returned by the external_encoding and internal_encoding methods of an IO object. The external | |
encoding is the encoding of the text as stored in the file. The internal encoding is the encoding used to represent | |
the text within Ruby. Specify the encoding of any IO object (including pipes and network sockets) with the | |
set_encoding method. With two arguments, it specifies an external encoding and an internal encoding. If the external | |
encoding is also the desired internal encoding, there is no need to specify an internal encoding. If, on the other | |
hand, you’d like the internal representation of the text to be different than the external representation, you can | |
specify an internal encoding and Ruby will transcode from the external to the internal when reading and to the | |
external when writing. | |
f.set_encoding("iso-8859-1", "utf-8") # Latin-1 (external), transcoded to UTF-8 (internal) | |
input = File.open("data.txt", "r:utf-8")  # Read UTF-8 text (in is a reserved word, so it cannot name a variable)
output = File.open("log", "a:utf-8")      # Write UTF-8 text
If you specify no encoding at all, then Ruby defaults to the default external encoding when reading from files, and | |
defaults to no encoding (i.e., the ASCII-8BIT/ BINARY encoding) when writing to files or when reading or writing from | |
pipes and sockets. | |
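Transcoding on read can be sketched with a temporary file that holds Latin-1 bytes. The helper name transcode_demo is invented. The mode string names the external encoding first and the internal encoding second.

```ruby
require "tempfile"

# Hypothetical helper: write Latin-1 bytes, then read them back as UTF-8.
def transcode_demo
  Tempfile.create("latin1") do |tmp|
    tmp.binmode
    tmp.write("caf\xE9".b)            # "café" encoded as ISO-8859-1
    tmp.close
    File.open(tmp.path, "r:iso-8859-1:utf-8") do |f|
      text = f.read                   # transcoded to UTF-8 while reading
      [text, text.encoding, f.external_encoding, f.internal_encoding]
    end
  end
end
```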
IO defines a number of ways to read lines from a stream: | |
lines = ARGF.readlines # Read all input, return an array of lines | |
line = DATA.readline # Read one line from stream | |
while l = DATA.gets; print l; end # Read until gets returns nil, at EOF
DATA.each {|line| print line } # Iterate lines from stream until EOF | |
The readline and the gets method differ only in their handling of EOF (end-of-file: the condition that occurs when | |
there is no more to read from a stream). gets returns nil if it is invoked on a stream at EOF. readline instead raises | |
an EOFError. You can check whether a stream is already at EOF with the eof? method. The lines returned by these | |
methods include the line terminator (although the last line in a file may not have one). Use String.chomp! to strip it | |
off. The special global variable $/ holds the line terminator. You can set $/ to alter the default behavior of all the | |
line-reading methods, or you can simply pass an alternate separator to any of the methods (including the each | |
iterator). You might do this when reading comma-separated fields from a file, for example, or when reading a binary | |
file that has some kind of “record separator” character. There are two special cases for the line terminator. If you | |
specify nil, then the line-reading methods keep reading until EOF and return the entire contents of the stream as a | |
single line. If you specify the empty string "" as the line terminator, then the line-reading methods read a paragraph | |
at a time, looking for a blank line as the separator. | |
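The EOF behavior and the alternate separators described above can be tried without any files. This is only a sketch: StringIO stands in for a real stream (it supports the same line-reading calls), and the sample strings are made up for illustration.

```ruby
require 'stringio'

# gets returns nil at EOF; readline raises EOFError instead.
io = StringIO.new("one\n")
first  = io.gets          # the line, terminator included
at_eof = io.gets          # nil, because the stream is exhausted
raised = begin
  io.readline
  false
rescue EOFError
  true
end

# An alternate separator passed straight to the iterator (hypothetical comma-separated data).
fields = StringIO.new("a,b,c").each_line(",").map { |field| field.chomp(",") }

text = "first paragraph\n\nsecond paragraph\nstill second\n"
whole      = StringIO.new(text).gets(nil)          # nil separator: the whole stream as one "line"
paragraphs = StringIO.new(text).each_line("").to_a # "" separator: paragraph mode

p first, at_eof, raised, fields
p paragraphs.size
```

Passing the separator per call, as here, avoids changing the global $/ for the rest of the program.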
The STDOUT and STDERR streams are writable, as are files opened in any mode other than "r" or "rb". | |
o = STDOUT | |
o.putc("B") # Write single byte 66 (capital B) | |
o.putc("CD") # Write just the first byte of the string (C) | |
o << x # Output x.to_s | |
o << x << y # May be chained: output x.to_s + y.to_s | |
o.print s # Output s.to_s + $\ | |
o.puts x # Output x.to_s.chomp plus newline | |
o.puts x,y # Output x.to_s.chomp, newline, y.to_s.chomp, newline | |
If the output record separator $\ has been changed from its default value of nil, then print outputs that value after | |
all of its arguments. | |
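To see how these output methods differ, the sketch below writes to an in-memory StringIO instead of STDOUT so the result can be inspected. It assumes the separators $, and $\ still have their default value of nil.

```ruby
require 'stringio'

o = StringIO.new        # in-memory stand-in for STDOUT or an open file
o << "a" << 1           # chained <<: appends "a", then 1.to_s
o.print "b", "c"        # print adds no newline of its own
o.puts "d\n"            # puts does not add a second newline
o.puts "e", "f"         # one line per argument

captured = o.string
p captured
```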
When you are done reading from or writing to a stream, you must close it with the close method. This flushes any | |
buffered input or output, and also frees up operating system resources. A number of stream-opening methods allow you | |
to associate a block with them. They pass the open stream to the block, and automatically close the stream when the | |
block exits. Managing streams in this way ensures that they are properly closed even when exceptions are raised: | |
File.open("test.txt") do |f| | |
# Use stream f here | |
end | |
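The claim that the block form closes the stream even when an exception is raised can be checked with a small experiment. This is a sketch only; the temporary file and the simulated error are assumptions for illustration.

```ruby
require 'tempfile'

stream = nil   # keep a reference so the stream can be inspected after the block
error  = nil

Tempfile.create("close-demo") do |tmp|
  begin
    File.open(tmp.path, "w") do |f|
      stream = f
      f.puts "partial output"
      raise IOError, "simulated failure"   # hypothetical error inside the block
    end
  rescue IOError => e
    error = e
  end
end

puts stream.closed?   # the stream was closed on the way out of the block
puts error.message
```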
Ruby’s output methods (except syswrite) buffer output for efficiency. The output buffer is flushed at reasonable | |
times, such as when a newline is output or when data is read from a corresponding input stream. There are times, | |
however, when you may need to explicitly flush the output buffer to force output to be sent right away: | |
#!/usr/bin/ruby | |
out = STDOUT | |
out.print 'wait>' # Display a prompt | |
out.flush # Manually flush output buffer to OS | |
sleep(1) # Prompt appears before we go to sleep | |
You can choose whether Ruby flushes the buffer automatically after every write or leaves flushing under your control | |
by setting the stream's sync mode: | |
out.sync = true # Automatically flush buffer after every write | |
out.sync = false # Don't automatically flush | |
out.sync # Return the current sync mode (true or false) | |
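The effect of sync mode can be observed with a pipe, where one end shows what has actually been handed to the operating system. This is a sketch assuming the standard (MRI) implementation; the pipe and the sample data are illustrative and not from the book.

```ruby
r, w = IO.pipe
w.sync = false                                      # make sure writes are buffered for this demo

w.write("hi")                                       # small write, held in Ruby's output buffer
before = r.read_nonblock(16, exception: false)      # nothing has reached the pipe yet
w.flush                                             # push the buffer to the OS
after = r.read_nonblock(16, exception: false)       # now the data is readable

w.sync = true                                       # flush automatically after every write
w.write("!")
unbuffered = r.read_nonblock(16, exception: false)  # readable without an explicit flush

w.close
r.close
p before, after, unbuffered
```

With exception: false, read_nonblock returns the symbol :wait_readable instead of raising when no data is available.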
IO defines several predicates for testing the state of a stream: | |
f.eof? # true if stream is at EOF | |
f.closed? # true if stream has been closed | |
f.tty? # true if stream is interactive | |
The only one of these methods that needs explanation is tty?. This method, and its alias isatty (with no question | |
mark), returns true if the stream is connected to an interactive device such as a terminal window or a keyboard with | |
(presumably) a human at it. | |
Networking | |
At the lowest level, networking is accomplished with sockets, which are a kind of IO object. Once you have a socket | |
opened, you can read data from, or write data to, another computer just as if you were reading from or writing to a | |
file. Internet clients use the TCPSocket class, and Internet servers use the TCPServer class (also a socket). All | |
socket classes are part of the standard library, so to use them in your Ruby program, you must first write: | |
require 'socket' | |
To write Internet client applications, use the TCPSocket class. Obtain a TCPSocket instance with the TCPSocket.open | |
class method, or with its synonym TCPSocket.new. | |
#!/usr/bin/ruby | |
# simpleclient.rb | |
require 'socket' # Sockets are in standard library | |
host, port = ARGV # Host and port from command line | |
s = TCPSocket.open(host, port) # Open a socket to host and port | |
while line = s.gets # Read lines from the socket | |
puts line.chop # And print with platform line terminator | |
end | |
s.close # Close the socket when done | |
Like File.open, the TCPSocket.open method can be invoked with a block. In that form, it passes the open socket to the | |
block and automatically closes the socket when the block returns. | |
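A minimal sketch of the block form follows. So that it runs on its own, it starts a throwaway server on the loopback address in a background thread; the address 127.0.0.1, port 0 (which asks the operating system for any free port) and the greeting text are assumptions for illustration.

```ruby
require 'socket'

server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a free port
port   = server.addr[1]                  # find out which port was chosen

server_thread = Thread.new do
  client = server.accept
  client.puts "hello from server"
  client.close
end

received = nil
socket   = nil
TCPSocket.open("127.0.0.1", port) do |s|
  socket   = s          # keep a reference to check the socket afterwards
  received = s.gets     # read one line from the server
end

server_thread.join
server.close

puts received
puts socket.closed?     # the block form closed the socket for us
```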
To write Internet servers, we use the TCPServer class. In essence, a TCPServer object is a factory for TCPSocket | |
objects. Call TCPServer.open to specify a port for your service and create a TCPServer object. | |
#!/usr/bin/ruby | |
# simpleserver.rb | |
require 'socket' # Get sockets from stdlib | |
server = TCPServer.open(2000) # Socket to listen on port 2000 | |
loop { # Infinite loop: servers run forever | |
client = server.accept # Wait for a client to connect | |
client.puts(Time.now.ctime) # Send the time to the client | |
client.close # Disconnect from the client | |
} | |
Now you can test your server and client by opening two terminals: | |
$ ./simpleserver.rb # in first terminal | |
$ ./simpleclient.rb localhost 2000 # in second terminal | |
Output: | |
Sun Oct 13 16:46:40 2013 | |
A lower-overhead alternative is to use UDP datagrams, with the UDPSocket class. UDP allows computers to send | |
individual packets of data to other computers, without the overhead of establishing a persistent connection. | |
require 'socket' | |
host, port, request = ARGV # Get args from command line | |
ds = UDPSocket.new # Create datagram socket | |
ds.connect(host, port) # Connect to the port on the host | |
ds.send(request, 0) # Send the request text | |
response,address = ds.recvfrom(1024) # Wait for a response (1kb max) | |
puts response # Print the response | |
The second argument to the send method specifies flags. It is required, even though we are not setting any flags. | |
The argument to recvfrom specifies the maximum amount of data we are interested in receiving. In this case, we limit | |
our client and server to transferring 1 kilobyte. | |
The server code uses the UDPSocket class just as the client code does: | |
require 'socket' # Standard library | |
port = ARGV[0] # The port to listen on | |
ds = UDPSocket.new # Create new socket | |
ds.bind(nil, port) # Make it listen on the port | |
loop do # Loop forever | |
request,address=ds.recvfrom(1024) # Wait to receive something | |
response = request.upcase # Convert request text to uppercase | |
clientaddr = address[3] # What ip address sent the request? | |
clientname = address[2] # What is the host name? | |
clientport = address[1] # What port was it sent from | |
ds.send(response, 0, # Send the response back... | |
clientaddr, clientport) # ...where it came from | |
# Log the client connection | |
puts "Connection from: #{clientname} #{clientaddr} #{clientport}" | |
end | |
Instead of calling connect to connect the socket, our server calls bind to tell the socket what port to listen on. | |
The server then uses send and recvfrom, just as the client does, but in the opposite order. It calls recvfrom to | |
wait until it receives a datagram on the specified port. | |
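The request/response exchange can be tried in a single script by running the responder in a thread. This is a sketch under assumptions: both ends use the loopback address, port 0 lets the operating system choose a free port, and a two-second timeout guards against a lost datagram.

```ruby
require 'socket'

server = UDPSocket.new
server.bind("127.0.0.1", 0)              # bind instead of connect: listen for datagrams
port = server.addr[1]

responder = Thread.new do
  request, address = server.recvfrom(1024)
  # address is [family, port, hostname, ip]; reply to where the request came from
  server.send(request.upcase, 0, address[3], address[1])
end

client = UDPSocket.new
client.connect("127.0.0.1", port)
client.send("hello", 0)                  # the flags argument is required, even when 0

response = nil
if IO.select([client], nil, nil, 2)      # give up after 2 seconds; datagrams can be lost
  response, _sender = client.recvfrom(1024)
end

responder.join(2)
client.close
server.close
puts response
```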
The following code is a more fully developed Internet client in the style of telnet. It connects to the specified | |
host and port and then loops, reading a line of input from the console, sending it to the server, and then reading | |
and printing the server’s response. | |
require 'socket' | |
host, port = ARGV # Network host and port on command line | |
begin # Begin for exception handling | |
# Give the user some feedback while connecting. | |
STDOUT.print "Connecting..." # Say what we're doing | |
STDOUT.flush # Make it visible right away | |
s = TCPSocket.open(host, port) # Connect | |
STDOUT.puts "done" | |
# Now display information about the connection. | |
local, peer = s.addr, s.peeraddr | |
STDOUT.print "Connected to #{peer[2]}:#{peer[1]}" | |
STDOUT.puts " using local port #{local[1]}" | |
# Wait just a bit, to see if the server sends any initial message. | |
begin | |
sleep(0.5) | |
msg = s.read_nonblock(4096) # Non blocking way of reading up to 4096 bytes | |
STDOUT.puts msg.chop | |
rescue SystemCallError | |
# If nothing was ready to read, just ignore the exception. | |
end | |
# Now begin a loop of client/server interaction. | |
loop do | |
STDOUT.print '> ' # Display prompt for local input | |
STDOUT.flush | |
local = STDIN.gets # Read line from the console | |
break if !local # Quit if no input from console | |
s.puts(local) # Send the line to the server | |
s.flush | |
# Read the server's response and print out. | |
# The server may send more than one line, so use readpartial | |
# to read whatever it sends (as long as it all arrives in one chunk). | |
response = s.readpartial(4096) | |
puts(response.chop) | |
end | |
rescue | |
puts $! | |
ensure | |
s.close if s # Don't forget to close the socket | |
end | |
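The client above relies on two reading calls that behave differently from gets. The sketch below shows that behavior with a pipe standing in for the socket; the pipe and the sample text are assumptions for illustration.

```ruby
r, w = IO.pipe

# Nothing has been written yet: read_nonblock raises instead of blocking.
nothing_ready = begin
  r.read_nonblock(4096)
  false
rescue IO::WaitReadable
  true
end

w.write("line one\nline two\n")
w.flush
chunk = r.readpartial(4096)   # returns what is available now, without waiting for 4096 bytes

w.close
at_eof = begin
  r.readpartial(4096)         # at EOF, readpartial raises rather than returning nil
  false
rescue EOFError
  true
end
r.close

p nothing_ready, chunk, at_eof
```

Because readpartial returns only what has arrived so far, a long server response split across several chunks would need a loop around it.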
The simple time server shown earlier in this section never maintained a connection with any client; it simply told | |
the client the time and disconnected. Many more sophisticated servers maintain a connection, and in order to be | |
useful, they must allow multiple clients to connect and interact at the same time. One way to do this is with | |
threads, where each client runs in its own thread. The alternative is to write a multiplexing server using the | |
Kernel.select method. The return value of select is an array of arrays of IO objects. The first element of the array | |
is the array of streams (sockets, in this case) that have data to be read (or a connection to be accepted). The | |
example server is trivial: it simply reverses each line of client input and sends it back. If the client sends quit, | |
the server closes that client's connection. | |
require 'socket' | |
server = TCPServer.open(2000) | |
sockets = [server] # An array of sockets we'll monitor | |
log = STDOUT # Send log messages to standard out | |
while true # Servers loop forever | |
ready = select(sockets) # Wait for a socket to be ready | |
readable = ready[0] # These sockets are readable | |
readable.each do |socket| | |
if socket == server # If the server socket is ready | |
client = server.accept # Accept a new client | |
sockets << client # Add it to the set of sockets | |
# Tell the client what and where it has connected. | |
client.puts "Reversal service v0.01 running on #{Socket.gethostname}" | |
# And log the fact that the client connected | |
log.puts "Accepted connection from #{client.peeraddr[2]}" | |
else # Otherwise, a client is ready | |
input = socket.gets # Read input from the client | |
# If no input, the client has disconnected | |
if !input | |
log.puts "Client on #{socket.peeraddr[2]} disconnected." | |
sockets.delete(socket) # Stop monitoring this socket | |
socket.close # Close it | |
next # And go on to the next | |
end | |
input.chop! | |
if (input == "quit") # If the client asks to quit | |
socket.puts("Bye!"); | |
log.puts "Closing connection to #{socket.peeraddr[2]}" | |
sockets.delete(socket) # Stop monitoring the socket | |
socket.close # Terminate the session | |
else # Otherwise, client is not quitting | |
socket.puts(input.reverse) # So reverse input and send it back | |
end | |
end | |
end | |
end | |
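The shape of the value returned by select, which the server above indexes with ready[0], can be inspected in isolation. In this sketch pipes stand in for sockets and a timeout is passed as a fourth argument; both are assumptions for illustration, since the server itself waits without a timeout.

```ruby
quiet_r, quiet_w = IO.pipe   # nothing is ever written to this pipe
busy_r,  busy_w  = IO.pipe

busy_w.write("ping")
busy_w.flush

# select returns three arrays: readable, writable, and streams with errors.
readable, writable, errored = IO.select([quiet_r, busy_r], nil, nil, 1)

# With a timeout and nothing ready, select returns nil instead of an array.
idle = IO.select([quiet_r], nil, nil, 0.1)

p readable == [busy_r], writable, errored, idle
[quiet_r, quiet_w, busy_r, busy_w].each(&:close)
```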
Here is an example of the same reversal server written with threads instead of select: | |
require 'socket' | |
# This method expects a socket connected to a client. | |
# It reads lines from the client, reverses them and sends them back. | |
# Multiple threads may run this method at the same time. | |
def handle_client(c) | |
while true | |
input = c.gets # Read a line of input from the client | |
break if !input # Exit if no more input (gets returns nil at EOF) | |
input = input.chop # Strip the line terminator | |
break if input=="quit" # or if the client asks to. | |
c.puts(input.reverse) # Otherwise, respond to client. | |
c.flush # Force our output out | |
end | |
c.close # Close the client socket | |
end | |
server = TCPServer.open(2000) # Listen on port 2000 | |
while true # Servers loop forever | |
client = server.accept # Wait for a client to connect | |
Thread.start(client) do |c| # Start a new thread | |
handle_client(c) # And handle the client on that thread | |
end | |
end | |
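To try the threaded approach without opening several terminals, the sketch below runs a variant of handle_client against two local clients in one script. The loopback address, port 0 (any free port), the fixed count of two clients and the sample words are all assumptions for illustration.

```ruby
require 'socket'

# Reverse each line from the client until it disconnects or sends "quit".
def handle_client(c)
  while (input = c.gets)      # nil at EOF ends the loop
    input = input.chomp
    break if input == "quit"
    c.puts(input.reverse)
  end
ensure
  c.close                     # always release the client socket
end

server = TCPServer.new("127.0.0.1", 0)
port   = server.addr[1]

# Accept exactly two clients for this demo, one thread per client.
acceptor = Thread.new do
  workers = 2.times.map do
    Thread.start(server.accept) { |c| handle_client(c) }
  end
  workers.each(&:join)
end

replies = %w[hello world].map do |word|
  TCPSocket.open("127.0.0.1", port) do |s|
    s.puts word
    reply = s.gets.chomp
    s.puts "quit"
    reply                     # value of the block becomes the value of open
  end
end

acceptor.join
server.close
p replies
```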
We can use the socket library to implement any Internet protocol. Here, for example, is code to fetch the content | |
of a web page: | |
require 'socket' | |
host = 'www.example.com' # The web server | |
port = 80 # Default HTTP port | |
path = "/index.html" # The file we want | |
# This is the HTTP request we send to fetch a file | |
request = "GET #{path} HTTP/1.0\r\n\r\n" | |
socket = TCPSocket.open(host,port) # Connect to server | |
socket.print(request) # Send request | |
response = socket.read # Read complete response | |
# Split response at first blank line into headers and body | |
headers,body = response.split("\r\n\r\n", 2) | |
print body | |
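The split at the first blank line can be taken a step further without touching the network. The sketch below parses a canned response; the response text and the variable names are hypothetical and chosen only to show the structure of an HTTP reply.

```ruby
# A made-up HTTP/1.0 response, as it might arrive from socket.read.
raw = "HTTP/1.0 200 OK\r\n" \
      "Content-Type: text/plain\r\n" \
      "Content-Length: 5\r\n" \
      "\r\n" \
      "hello"

head, body = raw.split("\r\n\r\n", 2)            # headers end at the first blank line
status_line, *header_lines = head.split("\r\n")  # the first line is the status line
version, code, message = status_line.split(" ", 3)

# Turn "Name: value" lines into a hash keyed by lowercase header name.
headers = header_lines.map do |line|
  name, value = line.split(":", 2)
  [name.downcase, value.strip]
end.to_h

p version, code, message, headers, body
```

As with Net::HTTP, the status code obtained this way is a string, so compare it with "200" rather than 200.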
You might prefer to use a prebuilt library like Net::HTTP for working with HTTP. | |
require 'net/http' # The library we need | |
host = 'www.example.com' | |
path = '/index.html' | |
http = Net::HTTP.new(host) | |
response = http.get(path) # Request the file; returns a Net::HTTPResponse | |
if response.code == "200" # Check the status code | |
# NOTE: code is a string, not a number! | |
print response.body # Print body if we got it | |
else | |
puts "#{response.code} #{response.message}" # Display error message | |
end | |
Finally, recall that the open-uri library described earlier in the chapter makes fetching a web page even easier: | |
require 'open-uri' | |
URI.open("http://www.example.com/index.html") {|f| # Plain open stopped accepting URLs in Ruby 3.0 | |
puts f.read | |
} |