I have a Java application in which I need to detect whether an uploaded file of type .docx, .doc, .ppt, .pptx, .xls, or .xlsx is password protected.
I already have a working check for PDF files using the Apache PDFBox library:
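import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;

private boolean isPdfPasswordProtected(InputStream inputStream) {
    try (PDDocument document = PDDocument.load(inputStream)) {
        // Loaded without a password; the document may still carry encryption
        return document.isEncrypted();
    } catch (InvalidPasswordException e) {
        // A password is required to open the document
        return true;
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }
}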
For the Office file types (.docx, .doc, .ppt, .pptx, .xls, .xlsx), it was suggested that I use Apache POI (poi-ooxml), but none of the implementations I found work for me. Apache Tika metadata was also suggested as a way to tell whether a file is encrypted, and that does not work for me either. How can I check whether these files are encrypted?
I tried Apache POI with the code below, but it does not detect encrypted Word documents:
import java.io.InputStream;
import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

private boolean isWordPasswordProtected(InputStream inputStream, String contentType) {
    try {
        if (contentType.equalsIgnoreCase("application/msword")) { // Check for DOC files first
            try (POIFSFileSystem poifs = new POIFSFileSystem(inputStream)) {
                HWPFDocument doc = new HWPFDocument(poifs);
                // Accessing properties will trigger password check
                doc.getSummaryInformation();
                return false; // Not password protected
            } catch (EncryptedDocumentException e) {
                return true; // Password protected
            }
        } else if (contentType.equalsIgnoreCase("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) { // Then handle DOCX files
            try (XWPFDocument docx = new XWPFDocument(inputStream)) {
                // Accessing properties will trigger password check
                docx.getProperties().getCoreProperties().getTitle();
                return false; // Not password protected
            } catch (EncryptedDocumentException e) {
                return true; // Password protected
            }
        } else {
            // Handle unsupported file formats
            throw new IllegalArgumentException("Unsupported file format: " + contentType);
        }
    } catch (Exception e) {
        // Handle exceptions
        e.printStackTrace();
        throw new RuntimeException("Error checking password protection", e);
    }
}
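A likely reason the .docx branch never reaches its EncryptedDocumentException handler: a password-protected OOXML file (.docx, .xlsx, .pptx) is not a plain ZIP archive but is wrapped in an OLE2 compound file, so the container type can be read from the first bytes. Below is a minimal sketch of that check using only the JDK (Java 9+). The class name OfficeContainerSniffer and its methods are hypothetical, and it does not cover legacy .doc/.xls/.ppt files, which are always OLE2 whether encrypted or not.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI or Tika.
class OfficeContainerSniffer {

    enum Container { ZIP, OLE2, UNKNOWN }

    // Local file header signature of a ZIP archive ("PK\u0003\u0004")
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Signature of an OLE2 compound file
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Reads up to 8 bytes from the stream and classifies the container.
    // The stream is consumed, so pass a fresh or buffered copy.
    static Container sniff(InputStream in) throws IOException {
        byte[] buffer = new byte[8];
        int read = in.readNBytes(buffer, 0, buffer.length);
        byte[] header = Arrays.copyOf(buffer, read);
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    // Assumption: the caller already knows the file is meant to be
    // .docx, .xlsx, or .pptx (for example from its content type).
    static boolean looksLikeEncryptedOoxml(InputStream in) throws IOException {
        return sniff(in) == Container.OLE2;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

An OLE2 result for a file declared as .docx, .xlsx, or .pptx is a strong hint of encryption rather than proof, since a mislabelled legacy file would produce the same header.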
I also tried Apache Tika, reading the metadata to check for encryption, but that did not work either. Here is the code:
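import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

private boolean isWordPasswordProtected(InputStream inputStream, String contentType) {
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    // Only the metadata is needed, so cap the extracted body text at 1000 characters
    // (passing -1 instead would disable the limit)
    BodyContentHandler handler = new BodyContentHandler(1000);
    ParseContext context = new ParseContext();
    context.set(Parser.class, parser);
    try {
        parser.parse(inputStream, handler, metadata, context);
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (SAXException e) {
        throw new RuntimeException(e);
    } catch (TikaException e) {
        throw new RuntimeException(e);
    }
    // Check metadata for encryption-related information
    String encryption = metadata.get("encryption");
    return encryption != null && !encryption.isEmpty();
}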
Any help with a working implementation would be appreciated.