This Seed Streams guide shows how to use Lucidworks Fusion to crawl only those documents on a website whose URIs match a regular expression. A JavaScript parsing stage then extracts the `img src` fields and inserts them into the index for use in other indexing stages. A vision network may be used to extract additional fields from the images.
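The two ideas above, matching URIs against a regular expression and pulling `img src` values out of a page, can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not Fusion's stage API: the names `COMIC_URI`, `filterUris`, and `extractImgSrc` are hypothetical, and the pattern assumes comic pages use numeric paths such as `https://xkcd.com/614/`. The pattern the datasource actually uses may differ.

```javascript
// Hypothetical inclusion pattern: comic pages with a numeric path,
// e.g. https://xkcd.com/614/ (an assumption, not taken from the guide).
const COMIC_URI = /^https:\/\/xkcd\.com\/\d+\/?$/;

// Keep only the URIs that match the inclusion pattern.
function filterUris(uris, pattern) {
  return uris.filter((uri) => pattern.test(uri));
}

// Collect the src attribute of every <img> tag in an HTML string.
// A regular expression is enough for a sketch; a real HTML parser
// handles malformed markup more reliably.
function extractImgSrc(html) {
  const imgTag = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  const sources = [];
  let match;
  while ((match = imgTag.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}
```

Inside Fusion, the same logic would have to be adapted to whatever document object the JavaScript stage provides, which is not shown here.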
- Start a Fusion instance on Google. Click the link the script outputs to navigate to the Fusion instance page. Set a password. Log in with `admin` and the new password.
- Create a new application. Call it `XKCD`.
- Click on the new application.
- Create a new datasource under Indexing > Datasources. Add a web source. Add https://xkcd.com a