Create a script that takes the name of an HTML file as a command-line argument and prints its visible content as plain text to standard output (the terminal).
- Only the body contents should be printed.
- All HTML tags and indentation should be removed.
- Contents of block elements (<p>, <h#>, <div>) should be separated by empty lines.
- Line breaks (<br />) should be respected.
- Only <br /> elements should be treated as line breaks; newline characters in the source should not.
Example input (index.html):
<!DOCTYPE html>
<html>
<head>
<title>Lorem ipsum</title>
</head>
<body>
<h1>Lorem Ipsum</h1>
<h2>Dolor Sit Amet</h2>
<p>
<b>C</b>onsectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua.<br />
Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat.
</p>
<p>
<b>D</b>uis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur.
</p>
</body>
</html>
Example output:
Lorem Ipsum

Dolor Sit Amet

Consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
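
One possible way to approach the task is sketched below in Python, using only the standard library's html.parser module. It is a minimal sketch under stated assumptions, not a reference solution: the names BodyTextExtractor and html_to_text are made up for illustration, <h#> is read as <h1> through <h6>, the file is assumed to be UTF-8, runs of whitespace are collapsed to a single space, and the contents of <script> and <style> are skipped as non-visible.

```python
import sys
from html.parser import HTMLParser

# Assumption: "<h#>" in the task means h1..h6.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: script and style contents are not "visible content".
SKIPPED_TAGS = {"script", "style"}


class BodyTextExtractor(HTMLParser):
    """Collects body text as blocks -> lines -> text fragments."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.blocks = [[[]]]

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif not self.in_body:
            return
        elif tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.blocks.append([[]])   # a block element starts a new block
        elif tag == "br":
            self.blocks[-1].append([])  # <br> starts a new line in the block

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif not self.in_body:
            return
        elif tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.blocks.append([[]])   # text after a block belongs to a new block

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0:
            self.blocks[-1][-1].append(data)

    def text(self):
        rendered = []
        for block in self.blocks:
            # Join fragments first, then collapse whitespace (this also removes
            # source newlines and indentation).
            lines = [" ".join("".join(parts).split()) for parts in block]
            while lines and not lines[0]:
                lines.pop(0)
            while lines and not lines[-1]:
                lines.pop()
            if lines:
                rendered.append("\n".join(lines))
        return "\n\n".join(rendered)   # empty line between blocks


def html_to_text(markup):
    parser = BodyTextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print(html_to_text(handle.read()))
    else:
        print(f"usage: {sys.argv[0]} FILE.html", file=sys.stderr)
```

Saved under a name such as html2text.py (hypothetical), the script would be run as `python html2text.py index.html`. Text fragments are concatenated before whitespace is collapsed, so inline markup like <b>C</b>onsectetur comes out as a single word. A document with no <body> tag produces no output in this sketch.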