Rails has a handy truncate helper (which is actually mostly a method added to String), but it warns you that it's not safe to use on HTML source: it'll cut off end tags and such.
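To make the problem concrete, here is a minimal sketch in plain Ruby. `naive_truncate` is a hypothetical stand-in for character-based truncation, not the Rails implementation, but it shows the same failure mode when pointed at HTML:

```ruby
# Hypothetical plain-Ruby stand-in for a character-counting truncate.
# It knows nothing about markup, so it counts tag characters like any others.
def naive_truncate(text, length: 30, omission: "...")
  return text if text.length <= length
  text[0, length - omission.length] + omission
end

html = "<p>Hello <strong>wonderful</strong> world</p>"
puts naive_truncate(html, length: 20)
# prints: <p>Hello <strong>...
```

The result leaves both `<strong>` and `<p>` unclosed, and most of the character budget was spent on markup rather than visible text.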
What if you want an HTML-safe one? There are a variety of suggested solutions you can google, none of which were quite robust or powerful enough for me.
So I started with my favorite, by Andrea Singh, which uses Nokogiri.
But:
- I modified it to be a static method rather than a monkey-patch on Nokogiri (sadly making already confusing code yet more confusing, but I didn't want to monkey-patch Nokogiri).
- I made it smarter about putting the mark-of-omission inside the tag whose text ended up truncated, instead of at the end of the source -- this is also not perfect, but works 'good enough' for most common use cases.
- I made it handle the Rails :separator option -- again, far from perfectly: it will often break at a tag boundary instead of the actual best separator, but in ways that should be good enough for most common use cases (tag boundaries are usually good breaking points too).
- I made the top-level invocation method a Rails helper method, using Rails functionality so as to handle both html-safe truncation and ordinary truncation: if the string is html-safe, it uses html-safe truncation and returns a string that's still html-safe.
- I added some tests (my tests run at the Rails helper method level, because that was convenient for me).
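As a rough illustration of the ideas in the list above, here is a simplified sketch. It is not the actual code: the real thing walks a Nokogiri parse tree, while this stand-in uses a regex tokenizer so it runs on plain Ruby. The name `truncate_html_sketch`, its keyword arguments, and the `VOID_TAGS` list are all assumptions made for the example. It counts only visible text, puts the omission inside the tag whose text got cut, closes whatever is still open, and optionally backs up to a separator within the text run being cut.

```ruby
# Tags that never get a closing tag (partial list, assumed for this sketch).
VOID_TAGS = %w[br hr img input meta link].freeze

# Sketch only: assumes well-formed HTML, and counts entities such as "&amp;"
# as several characters. A real parser (e.g. Nokogiri) handles far more.
def truncate_html_sketch(html, length:, omission: "...", separator: nil)
  tokens = html.scan(/<[^>]+>|[^<]+/) # tags and text runs, in document order
  text_length = tokens.reject { |t| t.start_with?("<") }.sum(&:length)
  return html if text_length <= length # visible text already fits

  budget = [length - omission.length, 0].max
  open_tags = []
  out = +""

  tokens.each do |token|
    if token.start_with?("</")
      open_tags.pop
      out << token
    elsif token.start_with?("<")
      name = token[/\A<\s*([a-zA-Z0-9]+)/, 1].to_s.downcase
      open_tags.push(name) unless VOID_TAGS.include?(name) || token.end_with?("/>")
      out << token
    elsif token.length <= budget
      budget -= token.length
      out << token
    else
      # This text run overflows: cut it, preferring the last separator
      # at or before the budget if one was requested and found.
      cut = separator ? token.rindex(separator, budget) : nil
      out << token[0, cut || budget] << omission
      open_tags.reverse_each { |tag| out << "</#{tag}>" }
      return out
    end
  end
  out
end

puts truncate_html_sketch("<p>Hello <strong>wonderful</strong> world</p>", length: 15)
# prints: <p>Hello <strong>wonder...</strong></p>
puts truncate_html_sketch("<p>The quick brown fox jumps</p>", length: 20, separator: " ")
# prints: <p>The quick brown...</p>
```

Like the real code, this only looks for a separator inside the text run being cut, so when none is found there it simply breaks where the budget runs out, which is often right at a tag boundary.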
See the tests to see what it does and doesn't do. It's not perfect, and there are a variety of different implementation or API choices that could be made -- but it's good enough for me, and, if others have use cases like mine, possibly better than anything else easily findable on the net.
If there's a lot of interest, I could turn this into an actual gem.
Ultimately, though, for use in Rails, what I think should really happen is for this functionality to be added to the Rails HTML sanitize helper. Times when you want to sanitize overlap extensively with times when you want to truncate (since both are normally going to be applied to HTML that is 'input' to your program), and both require an HTML parse. Better to do the HTML parse just once for both functions simultaneously than to do it once for sanitizing and again for truncating. (Rails sanitize doesn't use Nokogiri, but its own weird HTML parser.)