@leovolving
Created April 18, 2023 19:30
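Converts an HTML string to plain text, inserting a single space between top-level elements
// sample HTML string: the tags sit back to back with no whitespace, so body.textContent alone would run words together (e.g. 'testlinebreak...')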
const htmlString = "<h1 style=\"\">test</h1><p style=\"\">line</p><p style=\"\">break</p><p style=\"\">testing <strong>foo</strong> testing</p><ul><li><p style=\"\">this</p></li><li><p style=\"\">and this</p></li></ul>"
// parses the string into a new, detached HTML Document (DOMParser is a browser API, not built into Node)
const htmlDom = new DOMParser().parseFromString(htmlString, 'text/html')
console.log(htmlDom.body.children) // HTMLCollection(5) [h1, p, p, p, ul]
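// innerText of an element in a never-rendered document falls back to textContent,
// which is why the text of the nested <li> elements runs together below
const plainText = Array
  .from(htmlDom.body.children, e => e.innerText) // map fn is 2nd argument in Array.from
  .filter(Boolean) // removes elements that had no text, to avoid double spaces
  .join(' ')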
console.log(plainText) // 'test line break testing foo testing thisand this'
// TODO: add recursion to deal with nested tags, such as the <li> inside the <ul>
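To show why `.filter(Boolean)` is in the chain, here is the same `Array.from(arrayLike, mapFn)`, `filter(Boolean)`, `join(' ')` pipeline run on a plain array-like object instead of an `HTMLCollection`, so it can be tried outside a browser. `joinTexts` and `fakeChildren` are made-up names for illustration only.

```javascript
// Hypothetical helper: the gist's pipeline, generalized to any array-like input.
function joinTexts(arrayLike, getText) {
  return Array
    .from(arrayLike, getText) // map fn runs while copying, like e => e.innerText
    .filter(Boolean)          // drops '' and undefined so join() never emits double spaces
    .join(' ')
}

// Stand-in for an HTMLCollection: indexed entries plus a length property
const fakeChildren = { length: 3, 0: { text: 'test' }, 1: { text: '' }, 2: { text: 'line' } }

console.log(joinTexts(fakeChildren, e => e.text))              // test line
console.log(Array.from(fakeChildren, e => e.text).join(' '))   // test  line (two spaces, no filter)
```

An empty element maps to `''`, and `join(' ')` would still put a separator on each side of it; filtering first keeps exactly one space between the remaining pieces.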
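One possible direction for the TODO, as a minimal sketch assuming the goal is to put a single space between every piece of text rather than only between top-level elements: walk the tree recursively and collect each non-empty text node. `collectTextChunks`, `htmlNodeToPlainText`, `txt` and `el` are hypothetical names, not from the gist or any library. The walker only reads `nodeType`, `nodeValue` and `childNodes`, so it accepts real DOM nodes in a browser as well as plain objects of the same shape.

```javascript
// Hypothetical helpers (not from the gist or any library): walk the tree
// recursively and keep each non-empty text node as its own chunk, so nested
// tags such as the <p> elements inside each <li> are separated by a space.
const TEXT_NODE = 3 // same value as Node.TEXT_NODE in browsers

function collectTextChunks(node, chunks = []) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim() // drop whitespace around the text
    if (text) chunks.push(text)        // skip whitespace-only text nodes
    return chunks
  }
  // elements, documents and comments: recurse into children (comments have none)
  for (const child of node.childNodes) collectTextChunks(child, chunks)
  return chunks
}

function htmlNodeToPlainText(root) {
  return collectTextChunks(root).join(' ')
}

// Stand-in tree for <p>testing <strong>foo</strong> testing</p>, so this runs outside a browser
const txt = value => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] })
const el = (...children) => ({ nodeType: 1, childNodes: children })
console.log(htmlNodeToPlainText(el(txt('testing '), el(txt('foo')), txt(' testing')))) // testing foo testing
```

In a browser, `htmlNodeToPlainText(htmlDom.body)` on the sample string should give `'test line break testing foo testing this and this'`. The trade-off: trimming and joining every text node adds a space at every tag boundary, so `foo<strong>bar</strong>` would become `'foo bar'`; distinguishing inline from block tags would need an extra tag list that this sketch leaves out. Text inside `<script>` or `<style>` would also be collected.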