Solution by mauro-oto to https://twitter.com/keystonelemur/status/1003701123092393984
# The goal of this problem is to extract headers from a block of text,
# and arrange them hierarchically.
#
# See the specs for more detail on the output

# Captures the tag name (e.g. "h1") and the text of each header element.
HEADER_HTML = /\<(h[\d])\>([^<>]+)\<\/h[\d]\>/
# Splits a joined node such as "h1Foo" into its level and its content.
HEADER_LEVEL_AND_CONTENT = /h(?<level>[\d])(?<content>.*)/

def header_hierarchy(html)
  html.scan(HEADER_HTML).map(&:join).map do |node|
    node.gsub(HEADER_LEVEL_AND_CONTENT) do
      # Indent two spaces per level below h1.
      " " * (($~[:level].to_i - 1) * 2) + "[h" + $~[:level] + "] " + $~[:content]
    end
  end
end

describe '#header_hierarchy' do
  context 'EASY' do
    it 'can extract a single header' do
      expect(header_hierarchy("<h1>Foo</h1>")).to eq(['[h1] Foo'])
    end

    it 'can extract one nested level of header' do
      expect(
        header_hierarchy("<h1>Foo</h1><h2>Bar</h2>")
      ).to eq([
        '[h1] Foo',
        '  [h2] Bar'
      ])
    end
  end

  context 'MEDIUM' do
    it 'can extract multiple levels of nested headers' do
      expect(
        header_hierarchy("<h1>Foo</h1><h2>Bar</h2><h3>Baz</h3><h4>Bam</h4>")
      ).to eq([
        '[h1] Foo',
        '  [h2] Bar',
        '    [h3] Baz',
        '      [h4] Bam'
      ])
    end
  end

  context 'HARD' do
    it 'can extract multiple nested headers in multiple branches' do
      expect(
        header_hierarchy("<h1>Foo</h1><h2>Bar</h2><h3>Baz</h3><h2>Bam</h2><h3>Ba</h3>")
      ).to eq([
        '[h1] Foo',
        '  [h2] Bar',
        '    [h3] Baz',
        '  [h2] Bam',
        '    [h3] Ba'
      ])
    end
  end
end
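
The solution chains `scan`, `join`, and a second regex, which can be hard to follow at a glance. The sketch below traces the intermediate values for one input. It is only an illustration: the local variable names are made up here, and the patterns are the same as in the file but written without the redundant `\<` and `\>` escapes.

```ruby
# Illustrative trace of the pipeline; variable names are hypothetical.
header_html = /<(h\d)>([^<>]+)<\/h\d>/
level_and_content = /h(?<level>\d)(?<content>.*)/

html = "<h1>Foo</h1><h2>Bar</h2>"

# scan returns one [tag, text] pair per matched header.
pairs = html.scan(header_html)

# join glues each pair into a single string such as "h1Foo".
nodes = pairs.map(&:join)

# The second pattern splits that string back into level and content,
# and the level decides the indentation (two spaces per level below h1).
lines = nodes.map do |node|
  m = level_and_content.match(node)
  " " * ((m[:level].to_i - 1) * 2) + "[h#{m[:level]}] #{m[:content]}"
end

p pairs
p nodes
p lines
```

Joining the pair and then re-parsing it is a round trip; destructuring the pair directly in the block (`|tag, text|`) would avoid the second regex, at the cost of departing from the structure of the original solution.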
Hi, I modified it slightly so that it also works on the jQuery site. I removed the "[H0]" requirement because it seemed like an implementation detail of the original solution. There are still limitations, and a regex approach ultimately can't handle them all, but I like that it gets really far with very little overhead (e.g. Nokogiri is a big dependency, and the regex approach will work in many of the contexts where this function might be used).
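
To make the kind of limitation mentioned above concrete: real pages often put attributes on header tags or inline markup inside them, and the patterns in the gist skip such headers. The following is a minimal sketch of a more tolerant variant. It is not the commenter's modified code, which is not shown here, and the names `TOLERANT_HEADER` and `tolerant_header_hierarchy` are made up for illustration.

```ruby
# Hypothetical variant, not the commenter's actual code.
# Accepts attributes on the opening tag, requires the closing tag to match
# the opening level (\1), and lets the content contain inline tags.
TOLERANT_HEADER = %r{<h([1-6])\b[^>]*>(.*?)</h\1>}mi

def tolerant_header_hierarchy(html)
  html.scan(TOLERANT_HEADER).map do |level, content|
    # Drop inline tags such as <em> and trim surrounding whitespace.
    text = content.gsub(/<[^>]+>/, "").strip
    "  " * (level.to_i - 1) + "[h#{level}] #{text}"
  end
end

puts tolerant_header_hierarchy('<h1 class="title">Foo</h1><h2>Bar <em>baz</em></h2>')
```

This still treats HTML as flat text, so headers inside comments or scripts, and malformed markup, will trip it up. Those cases are where a real parser such as Nokogiri earns its weight.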