
@jatin-lab49
Created June 4, 2021 15:10
TIL-Lab49/Using Apache Tika for file type validation

We recently had a security team ask us to validate that the file being uploaded was indeed a plain text file. To do this, we decided to use Apache Tika.

The code itself is fairly straightforward:


import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// assuming we get the file as an InputStream from a Jersey endpoint

// new TikaConfig() can throw TikaException and IOException
TikaConfig tika = new TikaConfig();
// detection inspects the stream's content; detect() can throw IOException
MediaType mimeType = tika.getDetector().detect(TikaInputStream.get(inputStream), new Metadata());

if (!MediaType.TEXT_PLAIN.equals(mimeType)) {
  // invalid file type
} else {
  // valid file type
}
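
To give a feel for what a content-based check looks at, as opposed to trusting the file extension, here is a rough standard-library sketch. It is not Tika's algorithm and not a replacement for it: `PlainTextSniffer`, `looksLikePlainText`, and the 512-byte limit are hypothetical names and values chosen for illustration, and the heuristic only rejects control bytes rather than recognising known file signatures. It assumes Java 11+ for `readNBytes(int)`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: a simplified stand-in for content sniffing.
public class PlainTextSniffer {

    // Assumption: the first 512 bytes are enough for a quick check.
    private static final int SNIFF_LIMIT = 512;

    /**
     * Returns true if the leading bytes contain no control characters other
     * than tab, line feed, carriage return, or form feed. An empty stream
     * counts as text. The stream is reset afterwards, so the caller can
     * still read it from the beginning.
     */
    public static boolean looksLikePlainText(BufferedInputStream in) throws IOException {
        in.mark(SNIFF_LIMIT);
        byte[] head = in.readNBytes(SNIFF_LIMIT);
        in.reset();
        for (byte b : head) {
            int c = b & 0xFF;
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if ((c < 0x20 && !allowedWhitespace) || c == 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello\nworld\n".getBytes(StandardCharsets.UTF_8);
        // The PNG signature contains the control byte 0x1A.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        System.out.println(looksLikePlainText(new BufferedInputStream(new ByteArrayInputStream(text))));
        System.out.println(looksLikePlainText(new BufferedInputStream(new ByteArrayInputStream(png))));
    }
}
```

A format whose leading bytes happen to be printable would slip past this heuristic, which is one reason to prefer a library such as Tika for real validation.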