Last active
September 26, 2021 18:50
HTML 5 validation in Java (based on the Nu HTML Checker)
<!-- Add this in your POM -->
<dependency>
    <groupId>nu.validator</groupId>
    <artifactId>validator</artifactId>
    <version>15.3.14</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
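// Imports used by the method below (package names as in the Nu HTML Checker sources; check them against your version)
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import nu.validator.messages.MessageEmitter;
import nu.validator.messages.MessageEmitterAdapter;
import nu.validator.messages.TextMessageEmitter;
import nu.validator.servlet.imagereview.ImageCollector;
import nu.validator.source.SourceCode;
import nu.validator.validation.SimpleDocumentValidator;
import nu.validator.xml.SystemErrErrorHandler;
import org.xml.sax.InputSource;

/**
 * Verifies that HTML content is valid.
 * @param htmlContent the HTML content
 * @return true if it is valid, false otherwise
 * @throws Exception if the schema cannot be loaded or the content cannot be read
 */
public boolean validateHtml( String htmlContent ) throws Exception {

    // The validator reads from a stream and writes its messages to "out"
    InputStream in = new ByteArrayInputStream( htmlContent.getBytes( StandardCharsets.UTF_8 ));
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Only errors are counted (warnings are ignored)
    SourceCode sourceCode = new SourceCode();
    ImageCollector imageCollector = new ImageCollector( sourceCode );
    boolean showSource = false;
    MessageEmitter emitter = new TextMessageEmitter( out, false );
    MessageEmitterAdapter errorHandler = new MessageEmitterAdapter( sourceCode, showSource, imageCollector, 0, false, emitter );
    errorHandler.setErrorsOnly( true );

    SimpleDocumentValidator validator = new SimpleDocumentValidator();
    validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());
    validator.setUpValidatorAndParsers( errorHandler, true, false );
    validator.checkHtmlInputSource( new InputSource( in ));

    return 0 == errorHandler.getErrors();
}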
The errors flow from SimpleDocumentValidator into the MessageEmitterAdapter, then into the TextMessageEmitter, and finally into the ByteArrayOutputStream.
To actually see them, you'll have to call errorHandler.end(...) before reading out.
I agree a nicer way to 'programmatically' collect the errors would be great, but I haven't seen anything particularly nice yet.
How can I collect all the errors (maybe even in a format of my choice, such as JSON)?
And what is this line for?
validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());
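One possible way to collect the errors programmatically is a custom org.xml.sax.ErrorHandler that keeps every reported error in a list and can render that list as JSON. The sketch below uses JDK classes only. CollectingErrorHandler, getErrors(), toJson() and the demo class are hypothetical names, not part of the Nu HTML Checker. It assumes that validator.setUpValidatorAndParsers(...) accepts any org.xml.sax.ErrorHandler as its first argument, which has not been verified against version 15.3.14.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: keeps SAX errors in memory instead of printing them. */
final class CollectingErrorHandler implements ErrorHandler {

    private final List<SAXParseException> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // Warnings are dropped, similar in spirit to setErrorsOnly( true ).
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        errors.add(e);
    }

    /** Returns a read-only view of the errors reported so far. */
    List<SAXParseException> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    /** Renders the collected errors as a JSON array (hand-rolled to avoid a JSON dependency). */
    String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < errors.size(); i++) {
            SAXParseException e = errors.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"line\":").append(e.getLineNumber())
              .append(",\"column\":").append(e.getColumnNumber())
              .append(",\"message\":\"").append(escape(e.getMessage())).append("\"}");
        }
        return sb.append(']').toString();
    }

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    private static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}

/** Small demo that feeds the handler by hand, so it runs without the validator on the classpath. */
final class CollectingErrorHandlerDemo {
    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        handler.error(new SAXParseException("Stray end tag.", null, null, 3, 7));
        System.out.println(handler.toJson());
    }
}
```

If the assumption holds, an instance of this handler would be passed in place of errorHandler in validator.setUpValidatorAndParsers( errorHandler, true, false ), and handler.getErrors().isEmpty() would replace the 0 == errorHandler.getErrors() check. The Nu HTML Checker sources also contain a JsonMessageEmitter next to TextMessageEmitter, which may be worth a look if the validator's own JSON format is acceptable.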