Last active
December 28, 2015 10:54
An example with Nokogiri and UTF-8 (Cyrillic)
require 'nokogiri'
require 'net/http'

# Fetch the raw HTML of the page as a string.
html_string = Net::HTTP.get(URI.parse('http://www.dir.bg/'))

# Net::HTTP does not apply the response's charset to the body, so the
# string comes back tagged as binary (ASCII-8BIT); see
# http://stackoverflow.com/a/13779685/75715. Alternatively,
# you can use another HTTP library, such as
# https://github.com/jnunemaker/httparty.
# Relabel the bytes as UTF-8 before parsing (this assumes the page is
# actually served as UTF-8).
html_string.force_encoding('utf-8')

html_document = Nokogiri::HTML(html_string)

# Print the text of every <h2> inside <body>, one per line.
puts html_document.css('body h2').map(&:text)

__END__
The above code results in something like the following:

Днес
Темите 2015
Финанси
Културен афиш София
Е-Референдум
Вкусотии
Aвто
Каталог
Маркет
Кино
Телевизия
Справочник
Лайф
СпортLiveScore
Почивки
Технологии
Галерия
Времето в София
Новини от българския WEB
Зодиак
Събитиен календар
Игри
Виц на деня
Изпрати снимка и ти!
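
To see what the force_encoding call does without hitting the network, here is a minimal sketch in plain Ruby. It simulates the binary-tagged body with String#b, which is an assumption standing in for the real Net::HTTP response. The variable names raw_body and utf8_body are made up for this illustration.

```ruby
# Simulate a response body: UTF-8 bytes tagged as binary (ASCII-8BIT).
# String#b returns a copy of the string with the binary encoding label.
raw_body = 'Днес'.b

puts raw_body.encoding == Encoding::BINARY # the label is binary
puts raw_body.length                       # counts bytes, not characters

# force_encoding only relabels the bytes; it does not convert them.
# dup first so the binary-tagged original is left untouched.
utf8_body = raw_body.dup.force_encoding('UTF-8')

puts utf8_body.valid_encoding? # the bytes form valid UTF-8
puts utf8_body.length          # now counts characters
puts utf8_body
```

Since force_encoding changes only the label, it is only correct when the bytes really are UTF-8. For a page served in another charset (for example windows-1251), String#encode with the source encoding would be the tool to reach for instead.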