Skip to content

Instantly share code, notes, and snippets.

@noam87
Last active December 21, 2015 05:39
Show Gist options
  • Save noam87/6258984 to your computer and use it in GitHub Desktop.
Save noam87/6258984 to your computer and use it in GitHub Desktop.
Strange gsub bug

Problem solved by geniuses on IRC: check final comment for complete log. The cause of this bug is unexpected to say the least!


This is the code that isn't working:

Assume the input is "Hello world @123"

def memo_content(input, remote: false)
  content = sanitize input
  content = content.gsub(/@([a-z0-9A-Z]+)/) do |match|
    binding.pry
    link_to "@#{ $1 }", memo_path($1), class: "marker", remote: remote
  end 
  content = simple_format content, {  }, sanitize: false
end

When it gets to the pry binding, I check what the "match" and "$1" are holding:

[1] pry> match
=> "@W"
[2] pry> $1
=> nil
[3] pry> content
=> "Hello world @123"

However, when I remove the second line:

def memo_content(input, remote: false)
  content = input.gsub(/@([a-z0-9A-Z]+)/) do |match|
    binding.pry
    link_to "@#{ $1 }", memo_path($1), class: "marker", remote: remote
  ...
end

It works jusst fine, and "$1" holds "123" as it should.

What exactly is happening here? Why does it say "$1" is nill when clearly there was a match?

@noam87
Copy link
Author

noam87 commented Aug 17, 2013

First run, working:

    12: def memo_content(input, remote: false)
    13:   content = input.gsub(/@([a-z0-9A-Z]+)/) do |match|
 => 14:     binding.pry
    15:     link_to "@#{ $1 }", memo_path($1), class: "marker", remote: remote
    16:   end 
    17:   content = simple_format content, {  }, sanitize: false
    18:   content = sanitize content, tags: ['p', 'a'], attributes: ['data-remote']
    19: end

[1] pry(#<#<Class:0x007f9ae8417eb0>>)> match
=> "@W"
[2] pry(#<#<Class:0x007f9ae8417eb0>>)> $1
=> "W"
[3] pry(#<#<Class:0x007f9ae8417eb0>>)> input
=> "@W @W hi <strong>hfsdhkh</strong>"

Second run: Failing. Pry inside block.

    12: def memo_content(input, remote: false)
    13:   content = sanitize input
    14:   content = content.gsub(/@([a-z0-9A-Z]+)/) do |match|
 => 15:     binding.pry
    16:     link_to "@#{ $1 }", memo_path($1), class: "marker", remote: remote
    17:   end 
    18:   content = simple_format content, {  }, sanitize: false
    19:   content = sanitize content, tags: ['p', 'a'], attributes: ['data-remote']
    20: end

[1] pry(#<#<Class:0x007f9ae8cb2f98>>)> match
=> "@W"
[2] pry(#<#<Class:0x007f9ae8cb2f98>>)> $1                     # <== here comes wtf
=> nil
[3] pry(#<#<Class:0x007f9ae8cb2f98>>)> input
=> "@W @W hi <strong>hfsdhkh</strong>"
[4] pry(#<#<Class:0x007f9ae8cb2f98>>)> content
=> "@W @W hi <strong>hfsdhkh</strong>"
[5] pry(#<#<Class:0x007f9ae8cb2f98>>)> content == input      # <== here comes more wtf
=> true

@noam87
Copy link
Author

noam87 commented Aug 17, 2013

God bless the ROR Freenode:

<wtf_gsub> Can someone help with this really strange gsub behavior? https://gist.github.com/CLUSTERfoo/6258984

you removed the line that said 'content = sanitize input' ?

wtf_gsub: what does sanitize(input) do?

what does sanitize do?

<wtf_gsub> to that string, nothing. http://api.rubyonrails.org/classes/ActionView/Helpers/SanitizeHelper.html#method-i-sanitize

<wtf_gsub> if I check input == content, I get "true"

<wtf_gsub> it also correctly matched "@123"

<wtf_gsub> it's just giving me nil when I ask for $1, which makes no sense because it's part of the regex definition

wtf_gsub: put your binding.pry at the top of the method and step through it all

is $1 available in that block?

$1 is available everywhere

are you sure?

<wtf_gsub> if I do the second example (in which that block does not change), $1 returns "123" as it should

omarqureshi: from the docs: "In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&amp;, and $' will be set appropriately. "

omarqureshi: yes -- it's a global

but will they be set IN the block or after ?

omarqureshi: $1 is always in relation to the last matched regexp, so it's there in. plus wtf_gsub says it works without the sanitize

https://gist.github.com/omarqureshi/15640e9b71205fd65f54

is weird

no puts

huh.

<wtf_gsub> omarqureshi: I mean, there is literally 0 difference between the first example and the second, other than the string going through "sanitize" first, as far as I can tell. So if it works in the second it should work in the first. I'm a t a loss :( -- is this a ruby bug?

omarqureshi: you didn't set a matching group in that

ah, im stupid

wtf_gsub: did you do as i suggested?

"foo".gsub(/(o)/) {|match| puts "LOL"; puts match} works as expected

urgh

so

omarqureshi: so does puts $1, so wtf_gsub's problem is very weird

indeed

<wtf_gsub> pontiki: if I put binding.pry on top and then try sanitize it just returns the same string. sanitize isn't doing anything to that particular string

wtf_gsub: there needs to be more to this, simple calling sanitize couldn't cause that behavior :/

wtf_gsub: which ruby version?

wtf_gsub: step all the way through, testing it at each point, see if there's something else going on

2.0

<wtf_gsub> pontiki: I know :( I've copied and pasted straight from my vim and from terminal

i can't see why it wouldn't work

but as it doesn't, see where it it going wrong

pontiki: try rubinius and see if it still happens, you can step through the gsub source :)

wtf_gsub: ^

<wtf_gsub> pontiki: ruby 2.0.0 p247

eh?

pontiki: I meant wtf_gsub but addressed you, wtf_gsub meant omarqureshi but addressed you

why W matches is weird anyway

wtf_gsub: play around some more in pry and gist the bash log

<wtf_gsub> DouweM: ok, here is the log (first comment) https://gist.github.com/CLUSTERfoo/6258984

wtf_gsub: :/

you can have as many routes as you'd like -- they don't tie directly to any model

The Cause is Discovered:

wtf_gsub: rails/rails#1555

here's my question: why do they have to be separate routes?

no, really

pontiki: I'm not aware of any other reason

wtf_gsub: I started thinking about what sanitize actually does to a string, and then I though of html_safe which turns it into a SafeBuffer

wtf_gsub: https://www.ruby-forum.com/topic/198458

pontiki: about the gsub thing, looks like the problem is in $1 not being as global as we'd like. apparently it only lives for two frames, see https://www.ruby-forum.com/topic/198458

pontiki: they're also thread-local, which should give away their not-actually-being-globals

<wtf_gsub> DoueM !! oh my

wtf_gsub: the rails thread is an interesting read. apparently there is no fix for this, and it breaks a lot of stuff with html_safe strings

<wtf_gsub> Douwem: :( uh oh

DouweM: where are they wrapped in that construct? i'd suppose they'd have gone out of scope after the sanitize assuming they're used there (likely)

wtf_gsub: solution in your case: .to_str before the gsub, html_safe after

but i'd think they'd still be fully scoped correctly inside the gsub! block

pontiki: it's because of sanitize calling .html_safe on the string, turning it into a SafeBuffer, thus introducing the wrap around super: rails/rails#1555

fascinating

AH!

it's a differnt type!

that's what i was hoping to find

well, not me, personally

pontiki: yeah, that was my lightbulb. I started thinking about what sanitize actually did, and then I thought of html_safe

:)

nodnodnod

well done

bows, accepts applause

pontiki: but the worst thing is that there is apparently no fix for this. it's literally impossible to subclass String, override gsub and have it work properly

<wtf_gsub> Douwem: where does the to_str go? I'm affraid I'm not savvy enough to fully understand what's happening :/

A Solution

wtf_gsub: well, if you cal gsub after html_safe (which happens inside sanitize), all hell will break loose. if you run to_str after html_safe however, you're safe again. so. run the gsub on sanitize(input).to_str

wtf_gsub: but because you still want an html_safe version (so Rails will allow you to print the HTML verbatim), you have to html_safe after the gsub to restore the html safety

wtf_gsub: so content.to_str.gsub(......).html_safe

<wtf_gsub> DouweM: omg! it works, I don't have much of an idea why, but I'll read more about it. Thank you so much!

wtf_gsub: read the ruby-forum thread :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment