/* Contains code samples and an explanation of Apache POI for reading, creating, and editing PPT files */
OLE (Object Linking and Embedding) is a proprietary Microsoft technology that allows embedding and linking to other documents and objects.
OLE allows embedding one document within another
POIFS is a pure Java implementation of the OLE 2 Compound Document format.
From the Apache POI FS intro page: A common confusion is on just what POIFS buys you or what OLE 2 Compound Document format is exactly. POIFS does not buy you DOC, or XLS, but is necessary to generate or read DOC or XLS files. You see, all file formats based on the OLE 2 Compound Document Format have a common structure. The OLE 2 Compound Document Format is essentially a convoluted archive format. Think of POIFS as a "zip" library. Once you can get the data in a zip file you still need to interpret the data. As a general rule, while all of our formats use POIFS, most of them attempt to abstract you from it. There are some circumstances where this is not possible, but as a general rule this is true.
If you're an end user type just looking to generate XLS files, then you'd be looking for HSSF not POIFS; however, if you have legacy code that uses MFC property sets, POIFS is for you! Regardless, you may or may not need to know how to use POIFS but ultimately if you use technologies that come from the POI project, you're using POIFS underneath. Perhaps we should have a branding campaign "POIFS Inside!". ;-)
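MS Office documents do not all share one container: the legacy binary formats (DOC, XLS, PPT) are OLE 2 Compound Documents, while the newer OOXML formats (DOCX, XLSX, PPTX) are ZIP archives made up mostly of XML parts.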
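The two container families can be told apart by their leading bytes: an OLE 2 Compound Document starts with the signature D0 CF 11 E0 A1 B1 1A E1, and a ZIP archive normally starts with "PK" followed by 03 04. The following is a minimal sketch using only the Java standard library (Java 11 or later, for readNBytes); the class name ContainerSniffer and the returned labels are illustrative and not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContainerSniffer {
    // First 8 bytes of every OLE 2 Compound Document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // First 4 bytes of a ZIP archive that begins with a local file header ("PK\3\4").
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns "OLE2", "ZIP" or "UNKNOWN" for the given leading bytes. */
    public static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    /** Reads the first 8 bytes of the file and classifies them. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A file reported as "OLE2" is a candidate for POIFS and HSLF; one reported as "ZIP" would need the OOXML side of POI instead. The check only identifies the container, not the document type stored inside it.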
It is possible for one OLE 2 based document to have other OLE 2 documents embedded in it. For example, an Excel file may have a Word document and a PowerPoint slideshow embedded as part of it.
Normally, these other documents are stored in subdirectories of the OLE 2 (POIFS) filesystem. The exact location of the embedded documents will vary depending on the type of the master document, and the exact directory names will differ each time. To figure out exactly which directory to look in, you will either need to process the appropriate OLE 2 linking entry in the master document, or simply iterate over all the directories in the filesystem.
As a general rule, you will find the same OLE 2 entries in the subdirectories, as you would've found at the root of the filesystem were a document to not be embedded.
The child directories may themselves contain other embedded documents.
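Iterating over all the directories can be sketched as a recursive walk from the root of the POIFS filesystem. This is a minimal sketch, assuming the poi jar is on the classpath; the class name PoifsWalker and the path format it produces are illustrative, and the walk only lists entries without trying to decide which directories hold embedded documents.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class PoifsWalker {
    /**
     * Appends the path of every entry below dir to out.
     * Directories are marked with a trailing slash and then descended into.
     */
    public static List<String> list(DirectoryEntry dir, String prefix, List<String> out) {
        for (Entry entry : dir) {
            String path = prefix + "/" + entry.getName();
            if (entry.isDirectoryEntry()) {
                out.add(path + "/");
                list((DirectoryEntry) entry, path, out);
            } else {
                out.add(path);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Open read-only; args[0] is the path of an OLE 2 based file.
        try (POIFSFileSystem fs = new POIFSFileSystem(new File(args[0]), true)) {
            for (String path : list(fs.getRoot(), "", new ArrayList<>())) {
                System.out.println(path);
            }
        }
    }
}
```

Any directory in the output that contains the entries one would expect at the root of a standalone document is a likely embedded document.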
PowerPoint does not normally store embedded files in the OLE2 layer. Instead, they are held within records of the main PowerPoint file. See the HSLF Tutorial for how to retrieve embedded OLE objects from a presentation.
To conclude, an OLE 2 based MS Office document may hold embedded documents in subdirectories of its POIFS filesystem, whereas PowerPoint normally keeps them in records of the main document.
POIFS provides a simple tool for listing the contents of OLE2 files. This lets you see what your POIFS file contains, and hence whether it has any embedded documents in it, and where.
The tool to use is org.apache.poi.poifs.dev.POIFSLister. This tool may be run from the command line, and takes a filename as its parameter. It will print out all the directories and files contained within the POIFS file.
- HSLFSlideShow
- HSLFShape
- HSLFObjectData
- HSLFSlide
- HSLFTextShape (may prove to be useful)
- HSLFTextBox (subclass of the above class)
- HSLFGroupShape
- HSLFTextParagraph
- HSLFTextRun - represents a run of text, all having the same style -- could be useful to identify the section headers
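The classes above nest as slide, shape, text paragraph, text run. The following is a minimal sketch of walking that hierarchy, assuming POI 4.x or later with poi-scratchpad on the classpath. It assumes, purely for illustration, that section headers are set in bold; the class name BoldRunFinder and that heuristic are hypothetical, and shapes nested inside an HSLFGroupShape are not descended into.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFShape;
import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.hslf.usermodel.HSLFTextRun;
import org.apache.poi.hslf.usermodel.HSLFTextShape;

public class BoldRunFinder {
    /** Returns the raw text of every bold run in the slide's top-level text shapes. */
    public static List<String> boldRuns(HSLFSlide slide) {
        List<String> out = new ArrayList<>();
        for (HSLFShape shape : slide.getShapes()) {
            if (!(shape instanceof HSLFTextShape)) {
                continue; // pictures, lines, groups and so on carry no text runs here
            }
            HSLFTextShape textShape = (HSLFTextShape) shape;
            for (HSLFTextParagraph paragraph : textShape.getTextParagraphs()) {
                for (HSLFTextRun run : paragraph.getTextRuns()) {
                    if (run.isBold()) {
                        out.add(run.getRawText());
                    }
                }
            }
        }
        return out;
    }
}
```

Other run properties, such as font size or font family, could be substituted for the bold check if they match the presentation's header styling better.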
https://poi.apache.org/slideshow/how-to-shapes.html#OLE
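import java.io.FileInputStream;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;

// The constructor reads the whole stream, so it can be closed right afterwards.
FileInputStream is = new FileInputStream("slideshow.ppt");
HSLFSlideShow ppt = new HSLFSlideShow(is);
is.close();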
This is how you obtain an HSLFSlideShow handle for a PPT document from an input stream.
HSLFSlide.getTitle()
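A minimal sketch of using getTitle() across a whole presentation, assuming POI 4.x or later with poi-scratchpad on the classpath. The class name SlideTitles and the "(untitled)" marker are illustrative; the sketch relies on getTitle() returning null for a slide that has no title.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;

public class SlideTitles {
    /** Returns one entry per slide, in slide order. */
    public static List<String> titles(HSLFSlideShow ppt) {
        List<String> out = new ArrayList<>();
        for (HSLFSlide slide : ppt.getSlides()) {
            String title = slide.getTitle(); // null when the slide has no title
            out.add(title == null ? "(untitled)" : title);
        }
        return out;
    }
}
```

Given the ppt handle obtained earlier, SlideTitles.titles(ppt) yields a simple outline of the presentation.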