Created
May 26, 2011 06:05
-
-
Save dbrady/992630 to your computer and use it in GitHub Desktop.
Unit Tests for CSV demonstrating that Enumerable is supported only at the whim of the underlying StringIO seek position
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env ruby | |
unless RUBY_VERSION.start_with? "1.9" | |
puts "This test file is for Ruby 1.9.x" | |
exit -1 | |
end | |
require 'csv' | |
require 'test/unit' | |
# Hi James! | |
# This is written with love and respect. I'm a little cranky with this | |
# code at the moment, but I think once you see what I'm seeing you'll | |
# probably agree. | |
# 0. This is maddening. I just went to pull up the docco that I was | |
# working off of today, and I can't find it. I just found THIS page, | |
# which DOES have the examples I was hoping to find: | |
# | |
# http://www.ruby-doc.org/ruby-1.9/index.html | |
# | |
# Hang on, lemme go crack open the work laptop. Oh, wild. I was | |
# working entirely off of the CSV::Table documentation, because that's | |
# where Google led me. | |
# 1. I didn't get that CSV was a file I/O based metaphor from its | |
# documentation at all. That's because it doesn't HAVE much | |
# documentation. This thing NEEDS some examples desperately. Stack | |
# Overflow has a few examples of people saying "is there a simple | |
# example of how to use CSV?" It would be REALLY helpful if those | |
# simple examples were in the rdoc. | |
# 2. Note that my use case is probably simple and edge-casey. I'm not | |
# streaming data. I'm grabbing a CSV file wholesale, reading it in one | |
# go, and then pushing it out to an HTML table--that isn't allowed to | |
# get friendly with the report's native data format, so NO, it can't | |
# just use CSV's iteration helpers. :-) | |
# 3. I TOTALLY expected that CSV would uphold the Enumerable contract, | |
# because it says "Modules included: Enumerable" right on the tin. | |
# dbrady's not so very humble opinion: this cost me an hour of | |
# debugging time late in the afternoon of a deploy day. If you don't | |
# want to change it because I'm just using it wrong, PLEASE document | |
# this. I read your existing documentation in good faith and with a | |
# careful eye and was led down the garden path. (See point 0 above, | |
# though--I didn't know there was a CSV page apart from CSV::Table. I | |
# blame the crappy 4-iframe bullshit indexing of RDoc, that forces me | |
# to use Google to search the damn documentation.) | |
# Then again, if you're not shocked and appalled by the tests that | |
# follow, what I *really* want to know is who are you and what have | |
# you done with James Edward Gray II? | |
# The tests that follow are in two groups. test_workaround_* tests DO | |
# pass, but demonstrate the need to completely reach up CSV's butt and | |
# violate its encapsulation rather vigorously. All other tests fail, | |
# demonstrating the problem that CSV presents an Enumerable and/or | |
# random-access interface that is completely dependent on the StringIO | |
# seek position. | |
class TestCsvSeekDependencies < Test::Unit::TestCase | |
def setup | |
@csv = CSV.new(<<PETS, :headers => true) | |
"name","type","age" | |
Noni,"Border Collie/Golden Retriever Mix",8 | |
BamBam,Cat,2 | |
Pebbles,Cat,2 | |
PETS | |
end | |
def test_sanity | |
assert true, "This really should have been true." | |
end | |
def test_headers_do_not_depend_on_first_fetch | |
assert_equal ["name", "type", "age"], @csv.headers | |
end | |
def test_workaround_headers_do_not_depend_on_first_fetch | |
@csv.count | |
assert_equal ["name", "type", "age"], @csv.headers | |
end | |
def test_headers_does_not_get_erased_by_rewind | |
@csv.count | |
assert_equal ["name", "type", "age"], @csv.headers | |
@csv.rewind | |
assert_equal ["name", "type", "age"], @csv.headers | |
end | |
def test_count_does_not_depend_on_seek_position | |
assert_equal 3, @csv.count | |
assert_equal 3, @csv.count | |
end | |
def test_workaround_count_does_not_depend_on_seek_position | |
assert_equal 3, @csv.count | |
@csv.rewind | |
assert_equal 3, @csv.count | |
@csv.rewind | |
assert_equal 3, @csv.count | |
end | |
def test_first_does_not_depend_on_seek_position | |
assert_equal "Noni", @csv.first["name"] | |
assert_equal "Noni", @csv.first["name"] | |
end | |
def test_workaround_first_does_not_depend_on_seek_position | |
assert_equal "Noni", @csv.first["name"] | |
@csv.rewind | |
assert_equal "Noni", @csv.first["name"] | |
end | |
def test_first_does_not_depend_on_count_not_having_been_called_previously | |
assert_equal "Noni", @csv.first["name"] | |
assert_equal 3, @csv.count | |
end | |
def test_workaround_first_does_not_depend_on_count_not_having_been_called_previously | |
assert_equal "Noni", @csv.first["name"] | |
@csv.rewind | |
assert_equal 3, @csv.count | |
end | |
def test_each_should_not_depend_on_each_not_having_been_called | |
names = [] | |
@csv.each do |row| | |
names << row["name"] | |
end | |
assert_equal ["Noni", "BamBam", "Pebbles"], names | |
names = [] | |
@csv.each do |row| | |
names << row["name"] | |
end | |
assert_equal ["Noni", "BamBam", "Pebbles"], names | |
end | |
def test_each_should_not_depend_on_first_not_having_been_called | |
names = [] | |
@csv.first | |
@csv.each do |row| | |
names << row["name"] | |
end | |
assert_equal ["Noni", "BamBam", "Pebbles"], names | |
end | |
def test_map_should_not_depend_on_geez_we_get_it_already | |
names = @csv.map {|row| row["name"] } | |
assert_equal ["Noni", "BamBam", "Pebbles"], names | |
names = @csv.map {|row| row["name"] } | |
assert_equal ["Noni", "BamBam", "Pebbles"], names | |
end | |
def test_anyp_should_not_depend_on_well_you_see_the_problem_now | |
assert_equal true, @csv.any? {|row| row["name"] == "BamBam"} | |
# next line passes, because seek pos is left at line 2 | |
assert_equal true, @csv.any? {|row| row["name"] == "Pebbles"} | |
# fails because seek pos is now at EOF | |
assert_equal true, @csv.any? {|row| row["name"] == "Noni"} | |
end | |
def test_etc_and_so_forth_for_all_of_enumerable | |
@csv.count | |
assert_equal [["name", "BamBam"], ["type", "Cat"], ["age", "2"]], @csv.find {|row| row["name"] == "BamBam"}.to_a | |
assert_equal 12, @csv.inject(0) {|a,b| a + b["age"].to_i} | |
assert_equal [["name", "Pebbles"], ["type", "Cat"], ["age", "2"]], @csv.drop(2)[0].to_a | |
# ...etc... | |
end | |
def test_if_youre_gonna_play_in_texas_you_gotta_have_grep_in_the_band | |
# this would ALSO be dependent on seek position if it wasn't broken | |
assert_not_equal [], @csv.grep(/BamBam/) | |
end | |
end | |
# So... there you go. I hope you find this useful, and I *REALLY* hope | |
# you'll continue to support Enumerable in CSV. By which I mean start | |
# supporting. *cough* ;-) | |
# It may not be possible to leave the seek position alone while | |
# guaranteeing that Enumerable functions like map() and first() work | |
# correctly. You may well need to split Enumerable out of CSV | |
# altogether. If you do, like I said above, PLEASE provide a way to | |
# permit it, even if it means casting the whole file to_a or similar. | |
# Cf. ruby's IO#read_lines() method that just gives you back an array. | |
# Thanks for all your hard work! I'm grumpy because of time lost and | |
# the mega schedule stress it induced. I am NOT grumpy that you | |
# created this library and put it out there. Which is why I'm giving | |
# you unit tests rather than a complaint form. :-) |
I've pushed support for Enumerator. Thanks!
Well, I'm glad you fixed something unrelated, I guess. Kinda discouraged that everything else boiled down to "nope, the documentation that you couldn't find was accurate."
I hate it when it turns out that I'm a moron. And because I'm STILL a moron, where the hell did you pull that documentation out of? Or is it just on the rdoc page I found when writing my email last night?
Ah well. Learned, I guess. Ruby 1.9, File includes Enumerable as well, and implements it in the same kinda shoddy way, e.g. map-rewind-map will work but map-map won't, etc. REALLY feels like a bug, but maybe I'm thinking constant collections here, when I need to be thinking lightweight enumerators like Python's xrange generators or STL iterators.
Thanks!
## David
David Brady
[email protected]
Ask not what the world needs. Ask what makes you come alive... and then go do it. Because what the world needs is people who have come alive. -- Howard Thurman
…On May 26, 2011, at 9:36 AM, JEG2 wrote:
I've pushed support for Enumerator. Thanks!
##
Reply to this email directly or view it on GitHub:
https://gist.github.com/992630
On Fri, May 27, 2011 at 12:07 AM, dbrady < ***@***.***>wrote:
Well, I'm glad you fixed something unrelated, I guess. Kinda discouraged
that everything else boiled down to "nope, the documentation that you
couldn't find was accurate."
I hate it when it turns out that I'm a moron. And because I'm STILL a
moron, where the hell did you pull that documentation out of? Or is it just
on the rdoc page I found when writing my email last night?
I pulled it out of the source when I was writing the email, but it's just
the standard documentation that is available through ri or
http://ruby-doc.org/:
```
http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
```
Ruby 1.9, File includes Enumerable as well, and implements it in the same
kinda shoddy way, e.g. map-rewind-map will work but map-map won't, etc.
REALLY feels like a bug, but maybe I'm thinking constant collections here,
when I need to be thinking lightweight enumerators like Python's xrange
generators or STL iterators.
I think you need to be thinking about streams. That's what IO objects are.
What if they can't be rewound? Like a socket? CSV could be used to read
over a socket, as long as we don't call rewind(). Make sense?
James Edward Gray II
Yeah, it makes sense. I still don't have to like it, though. :-)
So, um I _did_ this on the day, and this still appears to be the case:
11:33:34 dbrady@shinything:~ ruby-1.9.2-p180
∫ ri CSV
Nothing known about CSV
Is there something I need to do to make that exist?
Thanks,
## David
David Brady
[email protected]
Ask not what the world needs. Ask what makes you come alive... and then go do it. Because what the world needs is people who have come alive. -- Howard Thurman
…On May 27, 2011, at 7:03 AM, JEG2 wrote:
On Fri, May 27, 2011 at 12:07 AM, dbrady <
***@***.***>wrote:
> Well, I'm glad you fixed something unrelated, I guess. Kinda discouraged
> that everything else boiled down to "nope, the documentation that you
> couldn't find was accurate."
>
> I hate it when it turns out that I'm a moron. And because I'm STILL a
> moron, where the hell did you pull that documentation out of? Or is it just
> on the rdoc page I found when writing my email last night?
I pulled it out of the source when I was writing the email, but it's just
the standard documentation that is available through ri or
http://ruby-doc.org/:
http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
> Ruby 1.9, File includes Enumerable as well, and implements it in the same
> kinda shoddy way, e.g. map-rewind-map will work but map-map won't, etc.
> REALLY feels like a bug, but maybe I'm thinking constant collections here,
> when I need to be thinking lightweight enumerators like Python's xrange
> generators or STL iterators.
I think you need to be thinking about streams. That's what IO objects are.
What if they can't be rewound? Like a socket? CSV could be used to read
over a socket, as long as we don't call rewind(). Make sense?
James Edward Gray II
##
Reply to this email directly or view it on GitHub:
https://gist.github.com/992630
On Fri, May 27, 2011 at 12:35 PM, dbrady < ***@***.***>wrote:
So, um I _did_ this on the day, and this still appears to be the case:
11:33:34 ***@***.***:~ ruby-1.9.2-p180
∫ ri CSV
Nothing known about CSV
Is there something I need to do to make that exist?
CSV is fully documented. Are you using RVM? Did you issue a:
```
rvm docs generate
```
?
James Edward Gray II
Okay, seriously, WHY do you know that?
I'm serious and not kidding. I've never HEARD of "rvm docs generate", and I've never heard anybody talk about it. Curious.
Nice, though--that fixes the problem!
Thanks!
David
##
David Brady
[email protected]
Ask not what the world needs. Ask what makes you come alive... and then go do it. Because what the world needs is people who have come alive. -- Howard Thurman
…On May 27, 2011, at 11:59 AM, JEG2 wrote:
On Fri, May 27, 2011 at 12:35 PM, dbrady <
***@***.***>wrote:
> So, um I _did_ this on the day, and this still appears to be the case:
>
> 11:33:34 ***@***.***:~ ruby-1.9.2-p180
> ∫ ri CSV
> Nothing known about CSV
>
> Is there something I need to do to make that exist?
CSV is fully documented. Are you using RVM? Did you issue a:
rvm docs generate
?
James Edward Gray II
##
Reply to this email directly or view it on GitHub:
https://gist.github.com/992630
On Fri, May 27, 2011 at 1:42 PM, dbrady < ***@***.***>wrote:
Okay, seriously, WHY do you know that?
I'm serious and not kidding. I've never HEARD of "rvm docs generate", and
I've never heard anybody talk about it. Curious.
Because I chewed Wayne E. Seguin out that RI didn't work for me just a few
hours before you asked this question and he politely lead me to the
documentation:
```
https://rvm.beginrescueend.com/rubies/docs/
```
:D
James Edward Gray II
Okay, I feel better about that. Heh.
## David
David Brady
[email protected]
Ask not what the world needs. Ask what makes you come alive... and then go do it. Because what the world needs is people who have come alive. -- Howard Thurman
…On May 27, 2011, at 12:49 PM, JEG2 wrote:
On Fri, May 27, 2011 at 1:42 PM, dbrady <
***@***.***>wrote:
> Okay, seriously, WHY do you know that?
>
> I'm serious and not kidding. I've never HEARD of "rvm docs generate", and
> I've never heard anybody talk about it. Curious.
Because I chewed Wayne E. Seguin out that RI didn't work for me just a few
hours before you asked this question and he politely lead me to the
documentation:
https://rvm.beginrescueend.com/rubies/docs/
:D
James Edward Gray II
##
Reply to this email directly or view it on GitHub:
https://gist.github.com/992630
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thank you for the super awesome bug report. :)
I'm not sure I agree it's a secret that CSV has an IO like interface. Here are multiple examples from the documentation:
The documentation does also include examples of usage, including a non-iterator example:
I'm totally willing to add more examples that are needed. Please send patches.
For the final stop on our documentation tour, the surprising behavior of headers() is documented:
Also, you asked for support of CSV#to_a() and/or the IO interface CSV::read()/CSV::readlines(). All three of those methods are available. There's also a very nice CSV::table() shortcut wrapper for these slurp interfaces.
Finally, I don't agree that grep() is broken. Remember that you are grep()ping over rows, not fields.
Summary of the above: I'm trying very hard to meet your needs, Dude! :)
All of that said, you did locate a problem that I think needs fixing. each() needs updates. At the minimum, CSV#each() needs to support returning 1.9's Enumerators.
The other question is, what is the correct stream-like behavior for the IO iterators? I see the two options as continuing where the stream left off or calling rewind() to reset after iteration. Let's see what File does:
It looks to me like it holds the seek position, which feels like the correct short-curcuiting behavior to me. Given that, I think CSV handles this correctly, but I'm open to a debate if people disagree. Please let me know.
As an aside, the switch to officially supporting the IO interface was one of the primary changes from the old CSV code to FasterCSV and later the new CSV. It really bothered me that the old CSV code used the IO methods, but didn't really honor their contract. In my opinion, it's a great interface for this domain. We need to remember that CSV data, like any stream, can be very large. We need to default to handling the data as it comes in.