Last active
February 12, 2020 21:49
Building an Azure Cognitive Search enrichment pipeline that supports rapid iterations
For this scenario we are going to extend the [km-aml solution accelerator](https://github.com/microsoft/solution-accelerator-km-aml) to use a custom corpus. The goal is to start with a corpus of data and then:

1. Skim through the documents to identify a set of entities that should be recognized
2. Create a list of entities
3. Create an enrichment pipeline with a skill that takes in the list of entities and labels the text with IOB tags
4. Train a custom entity classifier on this labeled dataset
5. Update the enrichment pipeline to use the newly minted entity classifier
6. Reprocess the documents to now identify the labeled entities and other similar entities
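
Step 3 is the piece that turns a plain list of entities into training data, so a rough sketch of it may help. The code below is not taken from the accelerator; it is a minimal illustration of dictionary-based IOB labeling, assuming simple regex tokenization and case-insensitive exact matching. The function names `iob_tag` and `run_skill`, the entity labels, and the `text`, `tokens`, and `labels` field names are all hypothetical. The `values` / `recordId` / `data` envelope follows the custom Web API skill contract as I understand it, so check it against the current documentation before relying on it.

```python
import re
from typing import Dict, List, Tuple

TOKEN_PATTERN = r"\w+|[^\w\s]"  # assumption: words and single punctuation marks


def iob_tag(text: str, entities: Dict[str, str]) -> List[Tuple[str, str]]:
    """Label each token with B-<label>, I-<label>, or O by matching an entity list."""
    tokens = re.findall(TOKEN_PATTERN, text)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)

    # Tokenize every entity phrase once; try longer phrases first so that
    # "Azure Cognitive Search" wins over "Azure".
    phrases = sorted(
        ((re.findall(TOKEN_PATTERN, phrase.lower()), label)
         for phrase, label in entities.items()),
        key=lambda item: len(item[0]),
        reverse=True,
    )

    i = 0
    while i < len(tokens):
        for phrase_tokens, label in phrases:
            n = len(phrase_tokens)
            if n and lowered[i:i + n] == phrase_tokens:
                tags[i] = f"B-{label}"          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # remaining tokens of the entity
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return list(zip(tokens, tags))


def run_skill(body: dict, entities: Dict[str, str]) -> dict:
    """Apply iob_tag to each record of a custom-skill style request body.

    The input field "text" and output fields "tokens"/"labels" are
    hypothetical names chosen for this sketch.
    """
    results = []
    for record in body.get("values", []):
        tagged = iob_tag(record.get("data", {}).get("text", ""), entities)
        results.append({
            "recordId": record["recordId"],
            "data": {
                "tokens": [token for token, _ in tagged],
                "labels": [tag for _, tag in tagged],
            },
            "errors": [],
            "warnings": [],
        })
    return {"values": results}
```

Calling `iob_tag("Azure Cognitive Search indexes documents from Contoso.", {"Azure Cognitive Search": "PRODUCT", "Contoso": "ORG"})` would tag the first three tokens as `B-PRODUCT`, `I-PRODUCT`, `I-PRODUCT` and `Contoso` as `B-ORG`, with everything else `O`. Exact matching like this only finds the entities already in the list; the classifier trained in step 4 is what would generalize to similar, unlisted entities.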