[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Heritrix 3
[INFO] Heritrix 3: 'commons' subproject (utility classes)
[INFO] Heritrix 3: 'modules' subproject (reusable components)
[INFO] Heritrix 3: 'engine' subproject
[INFO] Heritrix 3 (distribution bundles)
[INFO]
package org.archive.modules.extractor;

import org.archive.modules.CrawlURI;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
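The preview above stops after the imports, so the extractor's actual logic is not shown. Judging only from the imported types (HtmlCleaner, DomSerializer, org.w3c.dom.Document, NamedNodeMap), it plausibly turns cleaned HTML into a W3C DOM and then reads element attributes. The following is a minimal sketch of that last step only. It assumes well-formed XHTML and uses the JDK's own DocumentBuilder in place of HtmlCleaner; the class and method names (DomAttributeSketch, parse, linkLikeAttributes) are hypothetical and are not part of Heritrix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomAttributeSketch {

    // Parses well-formed markup into a W3C Document with the JDK parser.
    // Assumption: in the extractor, HtmlCleaner + DomSerializer would produce
    // this Document from real-world (possibly malformed) HTML instead.
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Walks every element in document order and reads its attributes through
    // NamedNodeMap, collecting the values of href and src attributes.
    static List<String> linkLikeAttributes(Document doc) {
        List<String> found = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element el = (Element) elements.item(i);
            NamedNodeMap attrs = el.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                String name = attr.getNodeName();
                if (name.equalsIgnoreCase("href") || name.equalsIgnoreCase("src")) {
                    found.add(attr.getNodeValue());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><a href=\"http://example.com/a\">a</a>"
                + "<img src=\"/img.png\"/></body></html>");
        System.out.println(linkLikeAttributes(doc)); // prints [http://example.com/a, /img.png]
    }
}
```

Matching on attribute names alone is a simplification; a real link extractor would also need to resolve relative values such as /img.png against the page's base URI.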
<?xml version="1.0" encoding="UTF-8"?>
<!--
HERITRIX 3 CRAWL JOB CONFIGURATION FILE
This is a relatively minimal configuration suitable for many crawls.
Commented-out beans and properties are provided as an example; values
shown in comments reflect the actual defaults which are in effect
if not otherwise specified. (To change from the default
behavior, uncomment AND alter the shown values.)
clj-heritrix.core=> (print-table (:members (r/reflect h)))
| :name | :type | :declaring-class | :flags |
|------------------------------+--------------------------------------+------------------------------+----------------------------|
| PROPERTIES | java.lang.String | org.archive.crawler.Heritrix | #{:private :static :final} |
| useAdhocKeystore | | org.archive.crawler.Heritrix | #{:protected} |
| getComponent | | org.archive.crawler.Heritrix | #{:public} |
| instanceMain | | org.archive.crawler.Heritrix | #{:public} |
| options | | org.archive.crawler.Heritrix | #{:private :static} |
org.archive.crawler.Heritrix
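The table above lists members of org.archive.crawler.Heritrix as reported by clojure.reflect. A roughly equivalent listing can be sketched in plain Java with java.lang.reflect. Because Heritrix is not assumed to be on the classpath here, the sketch reflects over a hypothetical nested class, Sample, whose members only mimic a few rows of the table; MemberTable and rows are likewise illustrative names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class MemberTable {

    // Hypothetical stand-in for org.archive.crawler.Heritrix (not assumed to be
    // on the classpath); its members only mimic a few rows of the REPL table.
    static class Sample {
        private static final String PROPERTIES = "example.properties";

        protected boolean useAdhocKeystore() {
            return true;
        }

        public void instanceMain(String[] args) {
        }
    }

    // One "name | type | declaring class | modifiers" row per declared field and
    // method. As in the clojure.reflect table, the type column is blank for methods.
    static List<String> rows(Class<?> c) {
        List<Member> members = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            members.add(f);
        }
        for (Method m : c.getDeclaredMethods()) {
            members.add(m);
        }
        List<String> out = new ArrayList<>();
        for (Member m : members) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated members
            }
            boolean isField = m instanceof Field;
            String type = isField ? ((Field) m).getType().getName() : "";
            // Keep only bits that are legal source-level modifiers for this kind
            // of member, so flags like ACC_VARARGS are not printed as "transient".
            int mask = isField ? Modifier.fieldModifiers() : Modifier.methodModifiers();
            out.add(m.getName() + " | " + type + " | "
                    + m.getDeclaringClass().getName() + " | "
                    + Modifier.toString(m.getModifiers() & mask));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : rows(Sample.class)) {
            System.out.println(row);
        }
    }
}
```

Unlike the Clojure output, getDeclaredMethods does not include constructors, and the order in which fields and methods are returned is unspecified, so the rows may not come out in source order.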
;; gorilla-repl.fileformat = 1
;; **
;;; # Gorilla REPL
;;;
;;; Welcome to gorilla :-) Shift + enter evaluates code. Poke the question mark (top right) to learn more ...
;; **
;; @@
(+ 1 2)
<html><head><link href="http://fonts.googleapis.com/css?family=Arvo:400,700,400italic,700italic|Lora:400,700,400italic,700italic" rel="stylesheet" type="text/css" /><link href="http://yandex.st/highlightjs/8.0/styles/default.min.css" rel="stylesheet" type="text/css" /><script src="http://yandex.st/highlightjs/8.0/highlight.min.js"></script><style>
body {
  /*padding-top: 40px;*/
}
div#contents {
  margin-left: 10%;
  margin-right: 10%;
}