(Description of the different solutions / alternatives)
This is the current, official, Microsoft-endorsed/supported solution ("cloud-based")
(2017 - present)
- The user uploads their MS Office document (source.doc in our example snippet below) to their Microsoft OneDrive
- The user then uses the Microsoft Graph REST API to send an HTTP GET request (authenticated with an OAuth 2.0 access token) to the Convert content endpoint:
  GET /drive/root:/{path to file}:/content?format={format}
  setting the following parameters:
  {path to file}: the path (in the user's OneDrive) of the file you want to convert
  {format}: the desired output file format (PDF in our case)
  example:
  https://graph.microsoft.com/v1.0/me/drive/root:/source.doc:/content?format=pdf
- The Microsoft Graph REST API sends back an HTTP 302 Found response whose Location header field contains the pre-authenticated URL of the converted PDF document, ready for download
- The user then downloads the converted PDF document from the URL returned in the previous step
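// Requires `using System.Net;`. accessToken is a (hypothetical) OAuth 2.0 access token for Microsoft Graph obtained beforehand
HttpWebRequest convToPdfRequest =
(HttpWebRequest)WebRequest.Create("https://graph.microsoft.com/v1.0/me/drive/root:/source.doc:/content?format=pdf");
convToPdfRequest.Headers.Add("Authorization", "Bearer " + accessToken);
convToPdfRequest.AllowAutoRedirect = false; // keep the 302 response so its Location header can be read
HttpWebResponse convToPdfResponse =
(HttpWebResponse)convToPdfRequest.GetResponse();
string pdfDownloadUrl = convToPdfResponse.GetResponseHeader("Location");
convToPdfResponse.Close();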
This is the previous, official, Microsoft-endorsed/supported (server-side) solution
(2010 - 2016)
- The user has to purchase SharePoint Server 2010 (Standard or Enterprise edition)
- Word Automation Services is a service that is installed and runs (by default) with a stand-alone SharePoint Server 2010 installation
- Microsoft recommends setting the number of worker processes to no more than one less than the number of processors on your server
- Microsoft recommends that you configure the system for a maximum of 90 document conversions per worker process per minute
- By default, it starts conversion processes at 15-minute intervals. In scenarios where you want Word Automation Services to use as many resources as possible, setting the interval to one minute may also help
- Once installed and configured you can start using it:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;
class Program
{
    static void Main(string[] args)
    {
        string siteUrl = "http://localhost";
        // If you manually installed Word Automation Services, then replace the name
        // in the following line with the name that you assigned to the service when
        // you installed it.
        string wordAutomationServiceName = "Word Automation Services";
        using (SPSite spSite = new SPSite(siteUrl))
        {
            ConversionJob job = new ConversionJob(wordAutomationServiceName);
            // Run the conversion with the permissions of the user who opened the site.
            job.UserToken = spSite.UserToken;
            job.Settings.UpdateFields = true;
            job.Settings.OutputFormat = SaveFormat.PDF;
            job.AddFile(siteUrl + "/Shared%20Documents/source.doc",
                siteUrl + "/Shared%20Documents/source.pdf");
            // Start() only queues the job; the conversion itself runs asynchronously,
            // the next time the Word Automation Services timer job fires.
            job.Start();
        }
    }
}
NOTE: The official Microsoft description of this solution, including installation, configuration and software development advice with C# examples, is here.
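The two sizing recommendations in the list above can be turned into a quick back-of-the-envelope capacity estimate. The following minimal Python sketch encodes only those two figures; the helper names are hypothetical, and clamping the worker count to at least one on a single-processor machine is an assumption of this sketch, not a Microsoft recommendation.

```python
import math

# Microsoft's recommended ceiling (see the list above).
MAX_CONVERSIONS_PER_WORKER_PER_MINUTE = 90


def recommended_worker_processes(processor_count: int) -> int:
    """No more than one fewer worker process than processors (assumed: at least one)."""
    return max(1, processor_count - 1)


def max_conversions_per_minute(processor_count: int) -> int:
    """Upper bound on throughput when every worker runs at the recommended maximum."""
    return recommended_worker_processes(processor_count) * MAX_CONVERSIONS_PER_WORKER_PER_MINUTE


def estimated_minutes_for_batch(document_count: int, processor_count: int) -> int:
    """Lower bound on pure conversion time; ignores the wait for the next timer interval."""
    return math.ceil(document_count / max_conversions_per_minute(processor_count))
```

For example, an 8-processor server gives 7 worker processes and at most 7 × 90 = 630 conversions per minute, so a batch of 1,000 documents needs at least 2 minutes of conversion time. With the default 15-minute interval, a newly queued job may additionally wait up to one interval before it starts.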
This is an older, constrained solution (perhaps "workaround" is more fitting)
(approx. 2005 - 2010)
"All current versions of Microsoft Office were designed, tested, and configured to run as end-user products on a client workstation. They assume an interactive desktop and user profile. They do not provide the level of reentrancy or security that is necessary to meet the needs of server-side components that are designed to run unattended.
Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment."
Microsoft's official explanation of the constraints/limitations of this approach is here.
SUMMARY: These constraints/limitations are the reason why the "MsOfficeToPdfConverter service" will most likely be implemented and run as an interactive, desktop-like Microsoft Windows application rather than a "real" Microsoft Windows service.
- Microsoft Windows environment with Microsoft Office 2007 (or higher) pre-installed (Office 2007 needs SP2 or the Microsoft "Save as PDF or XPS" add-in to save as PDF)
- The solution is a script or application written in a language of your choice (PowerShell, .NET/C#, Python/pywin32) that uses the Microsoft COM layer to invoke Microsoft Office (for example MS Word) functionality. For each of the MS Office documents you want to convert, the script or application:
  - opens the MS Office document
  - saves the document in the desired output format (PDF)
  - closes the document
Example (this is the original script that was tested to convert the example patient record MS Office files to PDF in a Microsoft Windows 10 environment):
# This script converts all the .doc and .docx files in the `$documents_path` dir to .pdf
#
# It needs proper/robust error handling:
# - https://stackoverflow.com/questions/16534292/basic-powershell-batch-convert-word-docx-to-pdf
# NOTE: the 2nd post about crashes, and their ("typical") m$ workaround
#
$documents_path = '.\test_files'
$word_app = New-Object -ComObject Word.Application
try {
    # This filter will find .doc as well as .docx documents
    Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
        $document = $word_app.Documents.Open($_.FullName)
        $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
        # 17 = wdFormatPDF (WdSaveFormat enumeration)
        $document.SaveAs([ref] $pdf_filename, [ref] 17)
        $document.Close()
    }
}
finally {
    # Quit Word even if the script is interrupted or hits a terminating error,
    # so no orphaned WINWORD.EXE process is left running
    $word_app.Quit()
}
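Since Python/pywin32 is also listed as an option, here is a rough Python equivalent of the PowerShell script, as a sketch assuming a Windows machine with Word and the pywin32 package installed. The helper names are hypothetical; only `convert_folder` touches COM, using the same `Documents.Open` / `SaveAs` / `Close` / `Quit` calls as the script and the same format value 17 (`wdFormatPDF`).

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # WdSaveFormat.wdFormatPDF, the value the PowerShell script passes to SaveAs


def find_word_documents(folder: Path) -> list[Path]:
    """Return the .doc and .docx files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in (".doc", ".docx")
    )


def pdf_path_for(document: Path) -> Path:
    """source.docx -> source.pdf in the same directory."""
    return document.with_suffix(".pdf")


def convert_folder(folder: str) -> None:
    # Imported here so the pure helpers above also work on machines without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for doc_path in find_word_documents(Path(folder)):
            # Pass absolute paths: Word does not know about Python's current directory.
            document = word.Documents.Open(str(doc_path.resolve()))
            try:
                document.SaveAs(str(pdf_path_for(doc_path).resolve()), FileFormat=WD_FORMAT_PDF)
            finally:
                document.Close(False)  # False: do not save changes to the source document
    finally:
        word.Quit()  # make sure Word is shut down even if a conversion fails
```

Calling `convert_folder(r".\test_files")` would mirror the PowerShell run above. The same caveats about unattended Office automation apply to this sketch unchanged.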