This example uses these projects: mochiweb (which provides mochiweb_html) and mochiweb_xpath.
The HtmlPageReader fetches an HTML page and parses its content with mochiweb_html.parse.
defmodule Usecase.HtmlPageReader do
  # url must be a charlist, as :httpc.request/1 expects.
  def read(url) do
    :httpc.request(url)
    |> read_result()
    |> read_body()
    |> parse_body()
  end

  # Only the success case is handled; an {:error, reason} result raises.
  defp read_result({:ok, result}) do
    result
  end

  defp read_body({_status, _header, body}) do
    body
  end

  defp parse_body(body) do
    :mochiweb_html.parse(body)
  end
end
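The reader above only matches a successful request, so any other result raises a FunctionClauseError. As a minimal sketch of a more defensive variant, the module below classifies the tuple that :httpc.request/1 returns into {:ok, body} or {:error, reason}. The module name Usecase.HttpResultSketch and the {:http_status, status} error shape are assumptions made for illustration, not part of the original example.

```elixir
defmodule Usecase.HttpResultSketch do
  # Hypothetical helper, not part of the original example.
  # Takes the return value of :httpc.request/1 and extracts the body
  # only when the request succeeded with status 200.
  def body({:ok, {{_version, 200, _reason}, _headers, body}}) do
    {:ok, body}
  end

  # Any other HTTP status is reported as an error (assumed error shape).
  def body({:ok, {{_version, status, _reason}, _headers, _body}}) do
    {:error, {:http_status, status}}
  end

  # Connection failures and similar errors are passed through unchanged.
  def body({:error, reason}) do
    {:error, reason}
  end
end

ok_result = {:ok, {{~c"HTTP/1.1", 200, ~c"OK"}, [], ~c"<html></html>"}}
missing = {:ok, {{~c"HTTP/1.1", 404, ~c"Not Found"}, [], ~c""}}

IO.inspect(Usecase.HttpResultSketch.body(ok_result))
IO.inspect(Usecase.HttpResultSketch.body(missing))
IO.inspect(Usecase.HttpResultSketch.body({:error, :timeout}))
```

With something like this in place, read/1 could match on the tagged tuple before calling the parser instead of crashing on the first failed request.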
The ErlangNewsReader traverses the parsed tree to find the news items and transforms each one into a map with title and text keys.
defmodule Usecase.ErlangNewsReader do
  def find_news_from(tree) do
    # Select the parent div of the h3 heading that contains "NEWS".
    execute_xpath('//div/h3[contains(text(), "NEWS")]/..', tree)
    |> first_element()
    |> list_of_news()
    |> Enum.map(&parse_new/1)
  end

  defp list_of_news(news_container) do
    execute_xpath('/div/div/div', news_container)
  end

  defp parse_new(node) do
    title = execute_xpath('div/p/a/text()', node) |> first_element()
    text = execute_xpath('div/div/text()', node) |> first_element()
    %{title: title, text: text}
  end

  defp execute_xpath(xpath, node) do
    :mochiweb_xpath.execute(xpath, node)
  end

  defp first_element([head | _tail]) do
    head
  end

  # An empty match falls back to an empty string.
  defp first_element([]) do
    ""
  end
end
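To make the tree traversal a little more concrete, here is a small sketch that walks a hand-written tree without any XPath. It assumes the {tag, attributes, children} tuple shape with binary tags and binary text nodes that mochiweb_html.parse produces. The module name Usecase.TreeSketch and the sample tree are made up for illustration.

```elixir
defmodule Usecase.TreeSketch do
  # Hypothetical helper, not part of the original example.
  # Collects every text node below a node, depth first.
  def text_of(text) when is_binary(text) do
    [text]
  end

  # Element node: {tag, attributes, children}.
  def text_of({tag, _attrs, children}) when is_binary(tag) and is_list(children) do
    Enum.flat_map(children, &text_of/1)
  end

  # Comments and other node kinds carry no text we care about here.
  def text_of(_other) do
    []
  end
end

# Hand-written sample in the assumed parsed-tree shape.
tree =
  {"div", [{"class", "news"}],
   [
     {"h3", [], ["NEWS"]},
     {"p", [], [{"a", [{"href", "#"}], ["A title"]}]}
   ]}

IO.inspect(Usecase.TreeSketch.text_of(tree))
```

The XPath expressions in ErlangNewsReader do the same kind of walk, but select specific nodes instead of collecting everything.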
The ErlangPageReader is a facade over the other modules and specifies which web page to parse. In this case we'll parse the news content of http://www.erlang.org. 😜
defmodule Usecase.ErlangPageReader do
  def read_news do
    Usecase.HtmlPageReader.read('http://www.erlang.org')
    |> Usecase.ErlangNewsReader.find_news_from()
  end
end
❗ To run this code, you need to start the inets application first.
:inets.start
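Calling :inets.start a second time returns an error tuple because the application is already running. As an alternative sketch, Application.ensure_all_started/1 starts inets only when needed and can be called repeatedly. The commented-out call assumes the Usecase modules above are compiled and that mochiweb and mochiweb_xpath are available.

```elixir
# Start inets if it is not running yet; safe to call more than once.
{:ok, _started} = Application.ensure_all_started(:inets)

# A second call succeeds as well and reports that nothing new was started.
{:ok, []} = Application.ensure_all_started(:inets)

# With the modules above loaded, the news could then be read like this:
# news = Usecase.ErlangPageReader.read_news()
```

In a Mix project, listing :inets under extra_applications in mix.exs should have the same effect without an explicit call.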
At the time I wrote this example, the result was:
[
  %{
    text: "\n Erlang/OTP 19.0 is a new major release with new features, quite a few (characteristics) improvements, as well as a few incompatibilities.\n ",
    title: "Erlang/OTP 19.0 has been released"
  },
  %{
    text: "",
    title: "Notes from OTP Technical Board"
  },
  %{
    text: "\n This is the release candidate before the final OTP 19.0 product release in June 2016.\n ",
    title: "Erlang/OTP 19.0-rc1 is available for testing"
  }
]
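The text values in this result keep the whitespace that surrounds them in the page markup. A small sketch of cleaning that up is shown below, using two of the entries above as sample data. The module name Usecase.NewsCleanupSketch and the function trim_texts/1 are hypothetical and not part of the original example.

```elixir
defmodule Usecase.NewsCleanupSketch do
  # Hypothetical helper: trims surrounding whitespace from each :text value
  # and leaves the other keys untouched.
  def trim_texts(news) do
    Enum.map(news, fn item -> %{item | text: String.trim(item.text)} end)
  end
end

# Sample data copied from the result above.
news = [
  %{
    text: "",
    title: "Notes from OTP Technical Board"
  },
  %{
    text: "\n This is the release candidate before the final OTP 19.0 product release in June 2016.\n ",
    title: "Erlang/OTP 19.0-rc1 is available for testing"
  }
]

IO.inspect(Usecase.NewsCleanupSketch.trim_texts(news))
```

The same step could be applied inside parse_new/1 instead, so that callers of read_news always receive trimmed text.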