@rr-codes
Created July 25, 2020 22:12
A String extension to filter substrings based on parts of speech
import NaturalLanguage

extension String {
    /// Extracts the substrings of this string whose tag matches any of the specified `partsOfSpeech`.
    ///
    /// For example,
    ///
    ///     let string = "John's anniversary in Greece"
    ///     let filtered = string.filter(by: [.noun, .placeName]) // ["anniversary", "Greece"]
    ///
    /// - Parameter partsOfSpeech: An array of `NLTag` values specifying which lexical classes or name types should be extracted.
    ///
    /// - Returns: An array of `Substring`s that match any of the specified tags.
    func filter(by partsOfSpeech: [NLTag]) -> [Substring] {
        let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
        tagger.string = self

        // Tag every word, skipping punctuation, whitespace, and other non-word tokens.
        let tags = tagger.tags(
            in: self.startIndex ..< self.endIndex,
            unit: .word,
            scheme: .nameTypeOrLexicalClass,
            options: [.omitPunctuation, .omitWhitespace, .omitOther]
        )

        // Keep only the words whose tag is one of the requested ones; untagged words are dropped.
        let filtered = tags.filter { (tag, _) in
            if let tag = tag {
                return partsOfSpeech.contains(tag)
            }
            return false
        }

        // Map each matching range back to a slice of the original string.
        return filtered.map { (_, range) in self[range] }
    }
}
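
When debugging why a word was or was not picked up, it can help to see which tag the tagger assigned to each match. The sketch below is a hypothetical variant of the extension above, not part of the original gist: the name `taggedWords(matching:)` and the labeled tuple it returns are made up for illustration. It uses `enumerateTags` rather than `tags`, and adds the `.joinNames` option so that multi-word names are treated as a single token. The tags you get back depend on the language model shipped with the OS, so no expected output is shown.

```swift
import NaturalLanguage

extension String {
    /// Hypothetical variant of `filter(by:)` that also returns the tag of each match.
    func taggedWords(matching wanted: [NLTag]) -> [(word: Substring, tag: NLTag)] {
        let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
        tagger.string = self

        var matches: [(word: Substring, tag: NLTag)] = []
        tagger.enumerateTags(
            in: startIndex ..< endIndex,
            unit: .word,
            scheme: .nameTypeOrLexicalClass,
            // `.joinNames` is an addition here; the original gist does not use it.
            options: [.omitPunctuation, .omitWhitespace, .omitOther, .joinNames]
        ) { tag, range in
            if let tag = tag, wanted.contains(tag) {
                matches.append((word: self[range], tag: tag))
            }
            return true // keep enumerating until the end of the string
        }
        return matches
    }
}

// Usage: print each matching word next to the raw value of its tag.
let text = "John's anniversary in Greece"
for match in text.taggedWords(matching: [.noun, .placeName, .personalName]) {
    print(match.word, match.tag.rawValue)
}
```

Returning the tag alongside the substring costs little and makes it easier to tune the list of tags passed in.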