@Bigbublik
Created September 21, 2015 12:38
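#!/usr/bin/env ruby
# Prints the title, subtitle and body text of an article page given its URL.
require 'open-uri'
require 'nokogiri'

abort "Usage: #{$PROGRAM_NAME} URL" if ARGV.empty?

# Since Ruby 3.0, Kernel#open no longer opens URLs; use URI.open from open-uri
html = URI.open(ARGV[0])
doc = Nokogiri::HTML(html)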

# Page/article title (nil-safe if the page has no <title>)
title = doc.at_css('title')&.text
puts title

# Subtitle
about_article = doc.at_css('div.current-article > p.about-article')&.text
puts about_article

# Get the article paragraphs
article = doc.css('div.current-article > p')
# Remove the first paragraph, which is the subtitle
article.shift
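
# Assemble the whole article: keep only plain text nodes and link text
# (other inline elements such as <em> are skipped), one paragraph per line.
# Text nodes are not stripped one by one, so words next to links stay separated.
article_result = article.map do |paragraph|
  paragraph.children
           .select { |ch| ch.name == 'a' || ch.name == 'text' }
           .map(&:text)
           .join
           .strip
end.join("\n")
puts article_result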