I have a simple Rails app that collects all the tweets I favorite on Twitter so I can sort and search through them at my leisure. Many of those favorites contain links I'd like to refer to, so I wrote a helper method to convert them into clickable anchor tags. It looked like this:
```ruby
# app/helpers/favorites_helper.rb
module FavoritesHelper
  # snip
  def text_to_true_link(tweet_text)
    urls = tweet_text.scan(/https*:\/\/t.co\/\w+/)
    urls.each do |url|
      tweet_text.gsub!(url, "<a href=#{url} target='_blank'>#{url}</a>")
    end
    tweet_text.html_safe
  end
end
```
The `text_to_true_link` method:

- takes the raw `tweet_text` as a string,
- `scan`s through it looking for Twitter shortlinks with a regex (which should have used `?` in place of `*` there, and escaped the `.` in `t.co`),
- stores the link text in an array called `urls`,
- substitutes each link with an anchor tag for that link, and
- returns the newly formatted `tweet_text`, marked `html_safe` so Rails renders the links as clickable anchors instead of escaping them.
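The shortlink pattern above has two small problems: `s*` accepts "http", "https", and also "httpss" and beyond, and the unescaped `.` in `t.co` matches any character. A minimal sketch of a corrected pattern, compared against the original (the `TCO_PATTERN` and `OLD_PATTERN` constant names and the sample tweet are made up for illustration):

```ruby
# Corrected shortlink pattern: `s?` makes the "s" optional (instead of `s*`,
# which repeats it), and `\.` matches a literal dot rather than any character.
TCO_PATTERN = %r{https?://t\.co/\w+}

# The original pattern, kept for comparison.
OLD_PATTERN = /https*:\/\/t.co\/\w+/

tweet_text = "Two links: https://t.co/abc123 and http://t.co/XyZ789"
urls = tweet_text.scan(TCO_PATTERN)
p urls # => ["https://t.co/abc123", "http://t.co/XyZ789"]

p "httpss://t.co/abc".match?(OLD_PATTERN) # => true
p "httpss://t.co/abc".match?(TCO_PATTERN) # => false
p "https://tXco/abc".match?(OLD_PATTERN)  # => true
p "https://tXco/abc".match?(TCO_PATTERN)  # => false
```

Either version still only finds t.co links, which is the limitation the rest of this post deals with.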
I thought this was a pretty clever hack, but while looking at my oldest tweets, I realized that they contained links that predated the standard t.co shortlink and consequently were not being converted into clickable links. So I did what you'd expect an inexperienced developer to do: I started looking for a Goldilocks regex, neither too complex nor too liberal, that would be adequate for my URI-matching purposes.
While doing this, I stumbled upon a Stack Overflow answer that mentioned `URI::regexp`, which had a comment mentioning `URI::extract`. What does `URI::extract` do? Why, exactly what I want: it extracts URIs from text.
At first, I tried `urls = URI.extract(tweet_text)`, which seemed to work. On closer inspection, however, it was also capturing any word that ended in a colon, e.g.:
```ruby
tweet_text = "Kleisli: common monads in Ruby https://github.com/txus/kleisli"
urls = URI.extract(tweet_text) # => ["Kleisli:", "https://github.com/txus/kleisli"]
```
Looking more closely at the documentation, I found that `URI::extract` takes an optional second argument that limits matches to a specific set of URI schemes:
```ruby
tweet_text = "Kleisli: common monads in Ruby https://github.com/txus/kleisli"
urls = URI.extract(tweet_text, %w(http https)) # => ["https://github.com/txus/kleisli"]
```
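The scheme filter is what makes `URI.extract` a fit for older tweets: it picks up any http or https link, not just t.co shortlinks. A small sketch using a made-up tweet that mixes a pre-t.co link (the bit.ly URL is invented for illustration) with a t.co one:

```ruby
require "uri"

# Illustrative tweet text: one older-style bit.ly link and one t.co shortlink.
# "read:" ends in a colon, but it is ignored because only http/https schemes
# are allowed by the second argument.
tweet_text = "Worth a read: http://bit.ly/1abcDEF and https://t.co/xyz789"

urls = URI.extract(tweet_text, %w(http https))
p urls # => ["http://bit.ly/1abcDEF", "https://t.co/xyz789"]
```

Note that `URI.extract` follows the older RFC 2396 grammar, so characters such as `)` or a trailing `.` right after a link may end up inside the extracted URL.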
This led me to my current, adequate implementation:
```ruby
# app/helpers/favorites_helper.rb
module FavoritesHelper
  # snip
  def text_to_true_link(tweet_text)
    urls = URI.extract(tweet_text, %w(http https))
    urls.each do |url|
      tweet_text.gsub!(url, "<a href=#{url} target='_blank'>#{url}</a>")
    end
    tweet_text.html_safe
  end
end
```
Normally, I think I do a good job of checking (or knowing) whether Ruby has a method that does what I want before I implement my own solution. Thinking more deeply about why I missed `URI::extract`, I realized that while I have a pretty good command of Ruby's core library, I haven't spent nearly as much time exploring its standard library. I'd like to dig into more of the latter from here on out.
Questions I still have:
- Is there a better way to replace embedded links in text with their clickable counterparts?
- How does a large site like Twitter or Facebook implement this?
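As one possible answer to the first question (a sketch, not something from the original app), the text can be split on the extracted URLs in a single pass instead of calling `gsub!` once per URL. This avoids re-replacing text inside anchors that were already inserted, handles one URL being a prefix of another, and HTML-escapes the rest of the tweet with `ERB::Util.html_escape` from Ruby's standard library, so marking the result `html_safe` in a Rails helper would no longer trust raw tweet text. The `link_urls` name is hypothetical, and this version returns a plain String with no Rails dependency:

```ruby
require "uri"
require "erb"

# Hypothetical helper; a plain-Ruby sketch, not the post's Rails helper.
# Wraps http/https URLs in anchor tags and HTML-escapes everything else.
# Does not mutate its argument.
def link_urls(text)
  # Deduplicate, then sort longest-first so that when one URL is a prefix of
  # another (e.g. .../a and .../ab), the alternation tries the longer one first.
  urls = URI.extract(text, %w(http https)).uniq.sort_by { |url| -url.length }
  return ERB::Util.html_escape(text) if urls.empty?

  # Regexp.union escapes each URL; the capture group makes split keep the
  # matched URLs, which land at the odd indices of the resulting array.
  parts = text.split(/(#{Regexp.union(urls)})/)

  parts.map.with_index do |part, i|
    escaped = ERB::Util.html_escape(part)
    i.odd? ? %(<a href="#{escaped}" target="_blank">#{escaped}</a>) : escaped
  end.join
end

html = link_urls("see http://example.com/a and http://example.com/ab <b>")
puts html
```

In a Rails helper, the returned string could then be marked `html_safe`, since every non-link fragment and every URL has already been escaped.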
Thanks for this! It should solve my problem exactly. Heads up: I think there's a bug that appears if the same URL appears twice in the text you're escaping. This can be fixed by adding a `.uniq` after the call to `URI.extract`.
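The commenter's example of the bug did not survive, so here is a hypothetical reproduction. When a URL appears twice, the first `gsub!` already replaces both occurrences. The second iteration then replaces the copies of the URL inside the new `href` and link text, producing nested anchors. The `linkify` name and sample URL are made up, and this plain-Ruby stand-in returns a String instead of calling Rails' `html_safe`:

```ruby
require "uri"

# Plain-Ruby stand-in for text_to_true_link; `dedupe` toggles the .uniq fix.
def linkify(tweet_text, dedupe:)
  text = tweet_text.dup # avoid mutating the caller's string
  urls = URI.extract(text, %w(http https))
  urls = urls.uniq if dedupe
  urls.each do |url|
    text.gsub!(url, "<a href='#{url}' target='_blank'>#{url}</a>")
  end
  text
end

tweet = "read http://example.com then http://example.com again"

buggy = linkify(tweet, dedupe: false)
fixed = linkify(tweet, dedupe: true)

p buggy.scan("<a ").size # => 6 (2 anchors, then 4 more nested inside them)
p fixed.scan("<a ").size # => 2
```

`.uniq` fixes the duplicate case, but the loop can still misfire when one URL is a prefix of another (e.g. `http://example.com/a` and `http://example.com/ab`), because replacing the shorter one also rewrites part of the longer one.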