A security team recently asked us to validate that an uploaded file was indeed a plain text file. We decided to use Apache Tika for the check.
The code itself is fairly straightforward:
import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.mime.MediaType;
import org.apache.tika.metadata.Metadata;
// assuming we get the file as an inputStream using a Jersey endpoint
// note: new TikaConfig() can throw TikaException and IOException
TikaConfig tika = new TikaConfig();
MediaType mimeType = tika.getDetector().detect(TikaInputStream.get(inputStream), new Metadata());
if (!MediaType.TEXT_PLAIN.equals(mimeType)) {
    // invalid file type
} else {
    // valid file type
}
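The snippet above leaves the decision to Tika's detector, which looks at the content of the stream rather than trusting the file name or the Content-Type header sent by the client. To make "looking at the content" concrete, here is a minimal JDK-only sketch of that kind of check. It is not Tika's implementation and not a replacement for it. The class name `PlainTextSniffer`, the method `looksLikePlainText`, and the 512-byte prefix length are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; this is NOT how Tika is implemented.
public class PlainTextSniffer {

    // Assumed limit: only the leading bytes of the stream are inspected.
    private static final int PREFIX_LENGTH = 512;

    /**
     * Returns true when the leading bytes contain no control bytes other than
     * tab, line feed, carriage return, and form feed.
     * An empty stream is treated as plain text.
     */
    public static boolean looksLikePlainText(InputStream in) throws IOException {
        byte[] buf = new byte[PREFIX_LENGTH];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        for (int i = 0; i < n; i++) {
            int c = buf[i] & 0xFF; // treat the byte as unsigned
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (c < 0x20 && !allowedWhitespace) {
                return false; // control byte found: probably binary content
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello, world\n".getBytes(StandardCharsets.UTF_8);
        // The PNG signature contains 0x1A, a control byte
        byte[] pngHeader = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(looksLikePlainText(new ByteArrayInputStream(text)));
        System.out.println(looksLikePlainText(new ByteArrayInputStream(pngHeader)));
    }
}
```

A heuristic this simple will misclassify some inputs (for example, text in an encoding that uses control bytes), which is one reason to rely on a maintained library such as Tika for the real check.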