MSDN Blog Postings


Developing with SharePoint 2010 Word Automation Services

Posted on March 17th, 2010

Some tasks are difficult to accomplish using the Open XML SDK, such as repagination, conversion to other document formats such as PDF, and updating the table of contents, fields, and other dynamic content in documents.  Word Automation Services, a new feature of SharePoint 2010, can help in these scenarios.  It is a shared service that provides unattended, server-side conversion of documents into other formats, along with some other essential functionality.  It was designed from the outset to work on servers, and it can process high volumes of documents reliably and predictably.


Note: This is an MSDN article that I’ve co-authored with Tristan Davis and Zeyad Rajabi.  It will be published on MSDN sometime in the near future.


Table of Contents

  • Overview
  • Typical Scenario
  • How Word Automation Services Works
  • Configuration of Word Automation Services
  • Building your first Word Automation Services application
  • Monitoring Status of Conversions
  • Determining which Documents Failed to Convert
  • Delete Source Files after Conversion
  • Integrating with the Open XML SDK
  • See Also


Overview


Using Word Automation Services, you can convert from Open XML WordprocessingML to other document formats.  For example, you may want to convert a number of documents to PDF and spool them to a printer or send them by email to your customers.


You can convert from other document formats (such as HTML or Word 97-2003 binary documents) to Open XML word-processing documents.


In addition to the document conversion facilities, there are other important areas of functionality that Word Automation Services provides, such as updating field codes in documents, and converting altChunk content to normally styled paragraphs.  These tasks are difficult to accomplish using the Open XML SDK, but it is easy to use Word Automation Services to do them.


In the past, to accomplish tasks such as these, developers have sometimes resorted to automating the Word client.  However, this approach is problematic.  The Word client is an application that is best suited for authoring documents interactively; it wasn’t designed for high-volume processing on a server.  While performing these tasks, Word may display a dialog box to report an error.  If the Word client is being automated on a server, there is no user to respond to the dialog box, and the entire process can come to an untimely halt.  The issues associated with automating Word are documented in a Microsoft Knowledge Base article.


Typical Scenario


The following is a typical use of Word Automation Services.




  • An expert creates some Word ‘template’ documents that follow specific conventions.  She might use content controls to give structure to the template documents.  This provides a good user experience and a reliable programmatic approach for determining the locations in the template document to replace with data in the document generation process.  These ‘template’ documents are typically stored in a document library in a SharePoint site.


  • A program runs on the server to merge the template documents with data, generating a set of Open XML WordprocessingML (DOCX) documents.  This program is best written using the Open XML SDK, which is specifically designed for generating documents on a server.  These documents are placed in a SharePoint document library.


  • After the documents are generated, they might be printed automatically.  Alternatively, they might be emailed to a set of users, either as WordprocessingML documents or, after conversion from WordprocessingML, as PDF, XPS, or MHTML documents.


  • As part of the conversion, you can instruct Word Automation Services to update fields, such as the table of contents.

Using the Open XML SDK together with Word Automation Services enables the creation of rich, end-to-end solutions that perform well and do not require automation of the Word client application.


One of the key advantages of Word Automation Services is that it can scale to your needs.  Unlike the Word client application, it can be configured to use multiple processors.  Further, it can be configured to load-balance across multiple servers if your needs require.


Another key advantage is that Word Automation Services has perfect fidelity with the Word client applications.  Document layout, including pagination, is identical regardless of whether the document is processed on the server or client.


Source Document Formats


The supported source document formats are as follows.




  • Open XML File Format documents (.docx, .docm, .dotx, .dotm)


  • Word 97-2003 documents (.doc, .dot)


  • Rich Text Format files (.rtf)


  • Single File Web Pages (.mht, .mhtml)


  • Word 2003 XML Documents (.xml)


  • Word XML Document (.xml)

Destination Document Formats


The supported destination document formats include all of the supported source document formats, as well as the following.




  • Portable Document Format (.pdf)


  • Open XML Paper Specification (.xps)

Other Capabilities


In addition to the ability to load and save documents in a variety of formats, Word Automation Services includes other capabilities.


You can instruct Word Automation Services to update the table of contents, the table of authorities, and index fields.  This is important when generating documents: if a generated document has a table of contents, updating it correctly requires knowing the pagination of the document, which is especially difficult to determine programmatically.  Word Automation Services takes care of this for you.


Open XML word-processing documents can contain a wide variety of field types, which allows you to put ‘dynamic’ content into a document.  You can use Word Automation Services to cause all fields to be recalculated.  For example, you can have a field type that inserts the current date into a document.  Updating fields will update the associated content so that the document displays the current date at the location of the field.
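To make the idea of a cached field result concrete, the following sketch builds the WordprocessingML markup for a paragraph that contains a simple DATE field.  It uses LINQ to XML rather than the Open XML SDK so that it needs only the .NET Framework.  The field code lives in the w:instr attribute of w:fldSimple, and the run inside it holds the cached result, which is the text that stays stale until the field is recalculated.  The class and method names are hypothetical, and the date format switch is only an example.

```csharp
using System;
using System.Xml.Linq;

public static class DateFieldMarkup
{
    // WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a w:p that contains a simple DATE field. The w:instr attribute
    // holds the field code; the run inside w:fldSimple holds the cached
    // result, which is what a consumer that does not recalculate fields shows.
    public static XElement DateFieldParagraph(string cachedResult)
    {
        return new XElement(W + "p",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XElement(W + "fldSimple",
                new XAttribute(W + "instr", " DATE \\@ \"MMMM d, yyyy\" "),
                new XElement(W + "r",
                    new XElement(W + "t", cachedResult))));
    }
}
```

A conversion job with UpdateFields set to true is what replaces the cached text with the current date.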


One of the powerful ways that you can use content controls is to bind them to XML elements in a custom XML part.  See Building Document Generation Systems from Templates using Word 2007 and Word 2010 for an explanation of bound content controls, as well as links to a number of resources to help you get started.  You can replace the contents of bound content controls by replacing the XML in the custom XML part.  You don’t need to alter the main document part.  The main document part contains ‘cached’ values for all bound content controls, and if you replace the XML in the custom XML part, the cached values in the main document part are not updated.  This is not a problem if you expect that your users will view these generated documents only using the Word client.  However, if you want to further process the WordprocessingML markup, you will want to update the cached values in the main document part.  Word Automation Services can do this.
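The following sketch shows how you might detect cached values that have drifted from the custom XML part.  It compares the text cached inside each bound content control (a w:sdt that has a w:dataBinding element) with the value that the binding's XPath selects in the custom XML.  This is a simplified illustration, not how Word or Word Automation Services performs the update: the class name is hypothetical, and it assumes custom XML without namespaces, a single custom XML part, and plain-text controls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class BoundControlCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the XPath of each bound content control whose cached text in
    // the main document part differs from the value now in the custom XML.
    // Assumes the custom XML uses no namespaces, so the XPath is evaluated
    // without the prefix mappings stored in w:dataBinding.
    public static List<string> FindStaleBindings(XElement mainDocument, XDocument customXml)
    {
        List<string> stale = new List<string>();
        foreach (XElement sdt in mainDocument.Descendants(W + "sdt"))
        {
            XElement sdtPr = sdt.Element(W + "sdtPr");
            XElement binding = sdtPr == null ? null : sdtPr.Element(W + "dataBinding");
            XElement content = sdt.Element(W + "sdtContent");
            if (binding == null || content == null)
                continue;

            string xpath = (string)binding.Attribute(W + "xpath");
            if (xpath == null)
                continue;

            // Value the control should display, according to the custom XML.
            XElement target = customXml.XPathSelectElement(xpath);
            string boundValue = target == null ? null : target.Value;

            // Value actually cached in the main document part.
            string cachedValue = string.Concat(
                content.Descendants(W + "t").Select(t => t.Value).ToArray());

            if (boundValue != cachedValue)
                stale.Add(xpath);
        }
        return stale;
    }
}
```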


Alternate format chunks (as represented by the altChunk element) are a great way to import HTML content into a WordprocessingML document.  Building Document Generation Systems from Templates using Word 2007 and Word 2010 discusses alternate format chunks, their uses, and some links to help you get started.  Before Word Automation Services, a document that contained altChunk elements continued to contain the imported HTML, rather than normal WordprocessingML paragraphs, until it was opened and saved in the Word client.  You can use Word Automation Services to import the HTML (or other forms of alternative content) and convert it to familiar WordprocessingML paragraphs that have styles.
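Before you send documents to Word Automation Services, you might want to find out which ones still contain alternate format chunks.  The following sketch counts w:altChunk elements in a .docx package by reading it as a ZIP archive with System.IO.Compression.  It assumes that the main document part is at word/document.xml, where Word stores it (a strict reader would follow the package relationships instead).  It also requires .NET Framework 4.5 or later, so treat it as a separate utility rather than code that runs inside a .NET Framework 3.5 SharePoint 2010 application.  The class name is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class AltChunkSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts w:altChunk elements in the main document part. A non-zero
    // result means the document still carries imported content that has
    // not been converted into ordinary WordprocessingML paragraphs.
    // Assumption: the main part is at word/document.xml.
    public static int CountAltChunks(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry main = zip.GetEntry("word/document.xml");
            if (main == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (Stream partStream = main.Open())
            {
                XDocument doc = XDocument.Load(partStream);
                return doc.Descendants(W + "altChunk").Count();
            }
        }
    }
}
```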


You can also convert to and from formats used by previous versions of Word.  If you are building an enterprise-class application that will be used by thousands of users, you may have a number of users who are using Word 2007 or Word 2003 to edit Open XML documents.  You can convert Open XML documents so that they contain only the markup and features used by either Word 2007 or Word 2003.


Limitations


Word Automation Services does not include capabilities for printing documents.  However, it is straightforward to convert WordprocessingML documents to PDF or XPS and spool them to a printer.


A question that sometimes arises is whether you can use Word Automation Services without purchasing and installing SharePoint Server 2010.  Word Automation Services is a feature of SharePoint Server 2010 and depends on its infrastructure, so you must purchase and install SharePoint Server 2010 to use it.  Word Automation Services is available with both the Standard and Enterprise client access licenses (CALs).


How Word Automation Services Works


As mentioned, Word Automation Services is a service that installs and runs by default with a standalone SharePoint Server 2010 installation.  If you are using SharePoint 2010 in a farm, you must explicitly enable it.


To use Word Automation Services, you call its programming interface to start a ‘conversion job’.  For each conversion job, you specify which files, folders, or document libraries you want the job to process.  When you start a conversion job, the service starts the configured number of conversion processes on each server.  You can specify how frequently the service starts conversions, and the number of conversions to start for each conversion process.  In addition, you can specify the maximum percentage of memory that Word Automation Services is allowed to use.


The configuration settings allow you to configure Word Automation Services so that it does not consume too many resources on SharePoint servers that are part of your vital infrastructure.  The right settings depend on how you use your SharePoint server.  If the server is used solely for document conversions, configure the settings so that the conversion service can consume most of the processor time.  If you are using the conversion service only for low-priority background conversions, configure it conservatively, with fewer worker processes and fewer conversions per interval, so that it leaves resources for other SharePoint workloads.


Important: We recommend that the number of worker processes be set to at most one less than the number of processors on your server.  If you have a four-processor server, set the number of worker processes to three at most.


If you are running a farm installation, the number of worker processes should be set to one less than the number of processors on the server in the farm that has the fewest processors.
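The worker-process guideline above reduces to a one-line calculation.  The following sketch expresses it as a helper; the class and method names are hypothetical, and the floor of one worker process for a single-processor server is an assumption of the sketch, not part of the recommendation.

```csharp
using System;
using System.Linq;

public static class WorkerProcessGuideline
{
    // Upper bound on worker processes: one fewer than the processor count of
    // the server with the fewest processors. A standalone server is passed as
    // a single value. The minimum of one is an assumption of this sketch.
    public static int MaxWorkerProcesses(params int[] processorsPerServer)
    {
        if (processorsPerServer == null || processorsPerServer.Length == 0)
            throw new ArgumentException("Specify the processor count of at least one server.");
        if (processorsPerServer.Any(p => p < 1))
            throw new ArgumentOutOfRangeException("processorsPerServer");

        return Math.Max(1, processorsPerServer.Min() - 1);
    }
}
```

For a farm whose servers have 8, 4, and 16 processors, the helper returns 3, because the four-processor server sets the limit.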


Important: We recommend that you configure the system for a maximum of 90 document conversions per worker process per minute.
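Because the conversion settings are expressed per interval (a frequency in minutes and a number of conversions per worker process), checking the 90-per-minute recommendation means scaling it by the interval length.  The following sketch assumes that reading of the settings; the class and method names are hypothetical.

```csharp
using System;

public static class ThroughputGuideline
{
    // Recommended ceiling: 90 conversions per worker process per minute.
    public const int MaxConversionsPerWorkerPerMinute = 90;

    // Largest "conversions per worker process" value that stays within the
    // recommendation when conversions start every intervalMinutes minutes.
    public static int MaxConversionsPerInterval(int intervalMinutes)
    {
        if (intervalMinutes < 1)
            throw new ArgumentOutOfRangeException("intervalMinutes");
        return MaxConversionsPerWorkerPerMinute * intervalMinutes;
    }

    public static bool IsWithinGuideline(int conversionsPerWorker, int intervalMinutes)
    {
        return conversionsPerWorker <= MaxConversionsPerInterval(intervalMinutes);
    }
}
```

With a one-minute interval the ceiling is 90 conversions per worker process; with a 15-minute interval it is 1,350.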


In addition to writing code that starts conversion processes, you can also write code to monitor the progress of conversions.  This allows you to inform users or post notifications when very large conversion jobs are completed.


Word Automation Services allows you to configure three additional aspects of conversions.




  • You can restrict which file formats it supports.


  • Because it is possible that invalid documents can cause Word Automation Services to consume too much memory, you can specify the number of documents converted by a conversion process before it is restarted.  All memory is reclaimed when the process is restarted.  In addition, you can specify the number of times that Word Automation Services attempts to convert a document.  By default, this is set to two, so if Word Automation Services fails on its first attempt to convert a document, it tries only one more time (within that conversion job).


  • Word Automation Services will monitor conversions, making sure that conversions have not stalled.  You can specify the length of elapsed time before conversion processes are monitored.

Configuration of Word Automation Services


Unless you have installed a farm of servers, Word Automation Services is installed and started by default in SharePoint Server 2010.  However, as a developer, you will want to alter its configuration so that you have a better development experience.  By default, it starts conversion processes at 15-minute intervals, so if you are testing code that uses it, you will benefit from setting the interval to one minute.  In addition, there are scenarios where you may want Word Automation Services to use as many resources as possible; those scenarios may also benefit from a one-minute interval.


Procedure: Adjusting the conversion process interval to one minute




  1. Start SharePoint 2010 Central Administration.


  2. Click the Manage Service Applications link on the home page of SharePoint 2010 Central Administration.


  3. In the Service Applications administration page, service applications are sorted alphabetically.  Scroll to the bottom of the page, and click Word Automation Services.  If you installed a server farm and installed Word Automation Services manually, click the name that you entered for the service.


  4. In the Word Automation Services administration page, in the conversion throughput field, set the frequency for starting conversions to your desired value (one minute, for this procedure).


  5. As a developer, you may also wish to set the number of conversion processes, and to adjust the number of conversions per worker process.  If you adjust the frequency with which conversion processes start without adjusting the other two values, and you attempt to convert a large number of documents, you will make the conversion process much less efficient.  The optimum values for these settings depend on the processing power of the computer that runs SharePoint Server 2010.


  6. Scroll to the bottom of the page and click OK.
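As a rough way to reason about the three throughput settings together, the following sketch estimates how many intervals a batch of documents needs, assuming that in each interval every worker process on every server starts up to the configured number of conversions.  It is a simplified model under that assumption: it ignores how long each conversion takes and the wait before the first interval fires, and the names are hypothetical.

```csharp
using System;

public static class ConversionTimeEstimate
{
    // Number of intervals needed if each interval starts up to
    // servers * workersPerServer * conversionsPerWorker conversions.
    public static int Intervals(int documents, int servers, int workersPerServer,
        int conversionsPerWorker)
    {
        if (documents < 0)
            throw new ArgumentOutOfRangeException("documents");
        if (servers < 1 || workersPerServer < 1 || conversionsPerWorker < 1)
            throw new ArgumentException("Servers, workers, and conversions must be positive.");

        long perInterval = (long)servers * workersPerServer * conversionsPerWorker;
        // Ceiling division: a partly filled interval still counts.
        return (int)((documents + perInterval - 1) / perInterval);
    }

    // Rough elapsed time: intervals multiplied by the interval length.
    public static TimeSpan EstimatedElapsed(int documents, int servers,
        int workersPerServer, int conversionsPerWorker, int intervalMinutes)
    {
        int intervals = Intervals(documents, servers, workersPerServer, conversionsPerWorker);
        return TimeSpan.FromMinutes((double)intervals * intervalMinutes);
    }
}
```

Under this model, 1,000 documents on one server with three worker processes and 90 conversions per worker process need four intervals: about 4 minutes with a one-minute interval, or about an hour with the 15-minute default.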

Building your first Word Automation Services application


Because Word Automation Services is a service of SharePoint Server 2010, you can only use it in an application that runs directly on a SharePoint server.  You must build the application as a farm solution.  You cannot use Word Automation Services from a sandboxed solution.


A convenient way to use Word Automation Services is to write a web service that you can use from client applications.


However, the easiest way to demonstrate how to write code that uses Word Automation Services is to build a console application.  You must build and run the console application on your SharePoint server, not on a client computer.  The code to start conversion jobs and the code to monitor conversion jobs are identical to the code that you would write for a web part, a workflow, or an event handler.  Showing the use of Word Automation Services from a console application allows us to discuss the API without adding the complexities of a web part, an event handler, or a workflow.


Note that the following sample applications call Thread.Sleep so that the examples query for status every five seconds.  This would not be the best approach when writing code that you will deploy on production servers.  Instead, you probably would want to write a workflow with a Delay activity.


To build the application




  1. Start Microsoft Visual Studio 2010.


  2. On the File menu, point to New, and then click Project.


  3. In the New Project dialog box, in the Installed Templates pane, expand Visual C#, and then click Windows.


  4. To the right of the Installed Templates pane, click Console Application.


  5. By default, Visual Studio 2010 creates a project that targets .NET Framework 4, but you must target .NET Framework 3.5.  From the list at the top of the New Project dialog box, select .NET Framework 3.5.


  6. In the Name box, type the name that you want to use for your project, such as FirstWordAutomationServicesApplication.


  7. In the Location box, type the location where you want to place the project.

Figure 1. Creating a solution in the New Project dialog box




  8. Click OK to create the solution.


  9. By default, Visual Studio 2010 creates a project that targets x86 CPUs, but to build SharePoint Server applications, you must target Any CPU.


  10. If you are building a C# application, in Solution Explorer, right-click the project, and then click Properties.  In the project properties window, click the Build tab.  In the Platform Target list, select Any CPU.


  11. If you are building a Visual Basic .NET application, open the project properties window in the same way, and then click the Compile tab.  Click the Advanced Compile Options button.  In the Platform Target list, select Any CPU.

Figure 2. Target Any CPU when building a C# console application


Figure 3. Compile options for a Visual Basic application

Figure 4. Advanced Compiler Settings dialog box


To add references to the Microsoft.SharePoint assembly and Microsoft.Office.Word.Server assembly




  1. On the Project menu, click Add Reference to open the Add Reference dialog box.


  2. Select the .NET tab, and add the component named Microsoft Office 2010 component.


  3. On the .NET tab, also add the component named Microsoft SharePoint.

Figure 5. Adding a reference to Microsoft Office 2010 component


Figure 6. Adding a reference to Microsoft SharePoint


Following are the complete C# and VB listings for the simplest Word Automation Services application.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class Program
{
    static void Main(string[] args)
    {
        string siteUrl = "http://localhost";
        using (SPSite spSite = new SPSite(siteUrl))
        {
            // If you manually installed Word Automation Services, then replace the name
            // in the following line with the name that you assigned to the service when
            // you installed it.
            ConversionJob job = new ConversionJob("Word Automation Services");
            job.UserToken = spSite.UserToken;
            job.Settings.UpdateFields = true;
            job.Settings.OutputFormat = SaveFormat.PDF;
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
                siteUrl + "/Shared%20Documents/Test.pdf");
            job.Start();
        }
    }
}

Imports Microsoft.SharePoint
Imports Microsoft.Office.Word.Server.Conversions
Module Module1
    Sub Main()
        Dim siteUrl As String = "http://localhost"
        Using spSite As SPSite = New SPSite(siteUrl)
            ' If you manually installed Word Automation Services, then replace the name
            ' in the following line with the name that you assigned to the service when
            ' you installed it.
            Dim job As ConversionJob = New ConversionJob("Word Automation Services")
            job.UserToken = spSite.UserToken
            job.Settings.UpdateFields = True
            job.Settings.OutputFormat = SaveFormat.PDF
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx", _
                siteUrl + "/Shared%20Documents/Test.pdf")
            job.Start()
        End Using
    End Sub
End Module


Replace the URL assigned to siteUrl with the URL to your SharePoint site.
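When you convert individual files with AddFile, you supply an output URL for each input URL.  If you convert more than a handful of files, a small helper that swaps the file extension can build those URLs.  The following sketch is hypothetical (it is not part of the Word Automation Services API) and assumes plain document URLs without query strings.

```csharp
using System;

public static class OutputUrlSketch
{
    // Replaces the extension of the last path segment, for example
    // .../Test.docx -> .../Test.pdf. Dots in host or folder names are
    // ignored because only text after the last '/' is inspected.
    public static string WithExtension(string inputUrl, string newExtension)
    {
        if (inputUrl == null)
            throw new ArgumentNullException("inputUrl");
        if (string.IsNullOrEmpty(newExtension) || newExtension[0] != '.')
            throw new ArgumentException("The extension must start with '.'.", "newExtension");

        int lastSlash = inputUrl.LastIndexOf('/');
        int lastDot = inputUrl.LastIndexOf('.');
        string stem = lastDot > lastSlash ? inputUrl.Substring(0, lastDot) : inputUrl;
        return stem + newExtension;
    }
}
```

In the listing above, the call would become job.AddFile(input, OutputUrlSketch.WithExtension(input, ".pdf")).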


To build and run the example




  1. Add a Word document named Test.docx to the Shared Documents folder in your SharePoint site.


  2. Build and run the example.


  3. After waiting one minute for the conversion process to run (assuming that you set the conversion interval to one minute), navigate to the Shared Documents library in the SharePoint site, and refresh the page.  The document library now contains a new PDF document, Test.pdf.

Monitoring Status of Conversions


In many scenarios, you will want to monitor the status of conversions, either to inform the user of completion, or to further process the converted documents.  You can use the ConversionJobStatus class to query Word Automation Services about the status of a conversion job.  You pass the name of the WordServiceApplicationProxy as a string (by default, “Word Automation Services”), and the conversion job identifier, which you can get from the ConversionJob object.  You can also pass a Guid that specifies a tenant partition, but if the SharePoint farm is not configured for multiple tenants, then you can pass null (Nothing in Visual Basic) as the argument for this parameter.


After you instantiate a ConversionJobStatus object, you can access a number of properties that tell you the status of the conversion job.  The following are the three most interesting properties.


Property                         Return Value
ConversionJobStatus.Count        The number of documents in the conversion job.
ConversionJobStatus.Succeeded    The number of documents successfully converted.
ConversionJobStatus.Failed       The number of documents that failed conversion.


Whereas the first example specified a single document to convert, the following example converts all documents in a specified document library.  You have the option of creating all converted documents in a different document library than the source library, but for simplicity, the following example specifies the same document library for both the input and output document libraries.  In addition, the following example specifies that the conversion job should overwrite the output document if it already exists.


// Fragment. Assumes using directives for System, System.Threading,
// Microsoft.SharePoint, and Microsoft.Office.Word.Server.Conversions;
// an open SPSite named spSite; and
// string wordAutomationServiceName = "Word Automation Services";
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
// Use the same library for input and output.
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId,
        null);
    // The job is finished when every document has either succeeded or failed.
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
}

' Fragment. Assumes Imports for System.Threading, Microsoft.SharePoint,
' and Microsoft.Office.Word.Server.Conversions; an open SPSite named spSite;
' and Dim wordAutomationServiceName As String = "Word Automation Services"
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim listToConvert As SPList = spSite.RootWeb.Lists("Shared Documents")
' Use the same library for input and output.
job.AddLibrary(listToConvert, listToConvert)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    ' The job is finished when every document has either succeeded or failed.
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While


To run this example, place a number of WordprocessingML documents in the Shared Documents library.  When you run this example, you will see output similar to the following:


Starting conversion job
Conversion job started
Number of documents in conversion job: 4
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
Completed, Successful: 4, Failed: 0
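The loop in the example above polls until the job finishes, with no upper bound on how long it waits.  The following sketch adds a timeout, and it receives the counters through a delegate so that the loop can be exercised without a SharePoint server.  The JobCounts and ConversionPolling types are hypothetical; the comment shows how the delegate might wrap ConversionJobStatus.  The sketch sticks to C# 3.0 and .NET Framework 3.5 features so that it can live in a SharePoint 2010 project.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Snapshot of the three ConversionJobStatus counters used in the sample.
public struct JobCounts
{
    public int Count;
    public int Succeeded;
    public int Failed;

    public JobCounts(int count, int succeeded, int failed)
    {
        Count = count;
        Succeeded = succeeded;
        Failed = failed;
    }

    // Same completion test as the sample: every document succeeded or failed.
    public bool IsComplete
    {
        get { return Count == Succeeded + Failed; }
    }
}

public static class ConversionPolling
{
    // Calls getCounts until the job is complete (returns true) or the
    // timeout elapses (returns false). In a SharePoint application the
    // delegate might be:
    //   () => {
    //       ConversionJobStatus s = new ConversionJobStatus(
    //           wordAutomationServiceName, job.JobId, null);
    //       return new JobCounts(s.Count, s.Succeeded, s.Failed);
    //   }
    public static bool WaitForCompletion(Func<JobCounts> getCounts,
        TimeSpan pollInterval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (getCounts().IsComplete)
                return true;
            if (elapsed.Elapsed >= timeout)
                return false;
            Thread.Sleep(pollInterval);
        }
    }
}
```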


Determining which Documents Failed to Convert


You may want to determine which documents failed conversion, perhaps to inform the user, or take remedial action such as removing the invalid document from the input document library.  You can call the ConversionJobStatus.GetItems method, which returns a collection of ConversionItemInfo objects.  When you call ConversionJobStatus.GetItems, you pass a parameter that specifies whether you want to retrieve a collection of failed conversions or successful conversions.  The following example shows how to do this.


// Fragment. Assumes using directives for System, System.Threading,
// System.Collections.ObjectModel, Microsoft.SharePoint, and
// Microsoft.Office.Word.Server.Conversions; an open SPSite named spSite; and
// string wordAutomationServiceName = "Word Automation Services";
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        // Report each document that could not be converted.
        ReadOnlyCollection<ConversionItemInfo> failedItems =
            status.GetItems(ItemTypes.Failed);
        foreach (var failedItem in failedItems)
            Console.WriteLine("Failed item: Name:{0}", failedItem.InputFile);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", status.Succeeded,
        status.Failed);
}

' Fragment. Assumes Imports for System.Threading,
' System.Collections.ObjectModel, Microsoft.SharePoint, and
' Microsoft.Office.Word.Server.Conversions; an open SPSite named spSite; and
' Dim wordAutomationServiceName As String = "Word Automation Services"
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim listToConvert As SPList = spSite.RootWeb.Lists("Shared Documents")
job.AddLibrary(listToConvert, listToConvert)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        ' Report each document that could not be converted.
        Dim failedItems As ReadOnlyCollection(Of ConversionItemInfo) = _
            status.GetItems(ItemTypes.Failed)
        For Each failedItem In failedItems
            Console.WriteLine("Failed item: Name:{0}", failedItem.InputFile)
        Next
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While


To run this example, create an invalid document and upload it to the document library.  An easy way to create an invalid document is to make a copy of a WordprocessingML document and rename it, appending .zip to the file name.  Open the package and delete the main document part (named document.xml), which is in the word folder.  Then rename the document, removing the .zip extension, so that it has the normal .docx extension.
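If you prefer to create the invalid test document programmatically, the following sketch performs the same steps on a copy of the package in memory.  It uses System.IO.Compression, which requires .NET Framework 4.5 or later, so run it as a separate utility rather than inside the SharePoint 2010 project.  The class name is hypothetical, and the sketch assumes that the main document part is at word/document.xml, as in documents saved by Word.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class InvalidDocumentSketch
{
    // Returns a copy of a .docx package with the main document part removed.
    // Everything else in the package (content types, relationships) is left
    // as-is, so the result still looks like a .docx but cannot be opened.
    public static byte[] RemoveMainDocumentPart(byte[] docxBytes)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(docxBytes, 0, docxBytes.Length);
            stream.Position = 0;
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Update, true))
            {
                ZipArchiveEntry main = zip.GetEntry("word/document.xml");
                if (main != null)
                    main.Delete();
            } // Disposing the archive writes the updated package to the stream.
            return stream.ToArray();
        }
    }
}
```

For example, you might read Test.docx with File.ReadAllBytes, pass the bytes to RemoveMainDocumentPart, and write the result to IntentionallyInvalidDocument.docx before uploading it.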


When you run this example, it will produce output similar to the following:


Starting conversion job
Conversion job started
Number of documents in conversion job: 5
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
Completed, Successful: 4, Failed: 1
Failed item: Name:http://intranet.contoso.com/Shared%20Documents/IntentionallyInvalidDocument.docx
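The InputFile values in the output above are full, URL-encoded addresses.  If you report failures to users, you might show only the file name; the following sketch does that with System.Uri.  The helper name is hypothetical, and it assumes that InputFile contains an absolute URL, as it does in this output.

```csharp
using System;

public static class FailedItemNames
{
    // Turns an absolute, URL-encoded InputFile value into a display name,
    // for example ".../Shared%20Documents/My%20Report.docx" -> "My Report.docx".
    public static string DisplayName(string inputFileUrl)
    {
        Uri uri = new Uri(inputFileUrl);
        string lastSegment = uri.Segments[uri.Segments.Length - 1];
        return Uri.UnescapeDataString(lastSegment);
    }
}
```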


Another approach to monitoring a conversion process is to use event handlers on a SharePoint list to detect when converted documents are added to the output document library.


Delete Source Files after Conversion


In some situations, you may want to delete the source documents after conversion.  The following example shows how to do this.


// Fragment. Assumes using directives for System, System.Threading,
// System.Collections.ObjectModel, Microsoft.SharePoint, and
// Microsoft.Office.Word.Server.Conversions; an open SPSite named spSite; and
// string wordAutomationServiceName = "Word Automation Services";
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPFolder folderToConvert = spSite.RootWeb.GetFolder("Shared Documents");
// Same folder for input and output; false = do not include subfolders.
job.AddFolder(folderToConvert, folderToConvert, false);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        Console.WriteLine("Deleting only items that successfully converted");
        ReadOnlyCollection<ConversionItemInfo> convertedItems =
            status.GetItems(ItemTypes.Succeeded);
        foreach (var convertedItem in convertedItems)
        {
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile);
            folderToConvert.Files.Delete(convertedItem.InputFile);
        }
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
}

' Fragment. Assumes Imports for System.Threading,
' System.Collections.ObjectModel, Microsoft.SharePoint, and
' Microsoft.Office.Word.Server.Conversions; an open SPSite named spSite; and
' Dim wordAutomationServiceName As String = "Word Automation Services"
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim folderToConvert As SPFolder = spSite.RootWeb.GetFolder("Shared Documents")
' Same folder for input and output; False = do not include subfolders.
job.AddFolder(folderToConvert, folderToConvert, False)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Console.WriteLine("Deleting only items that successfully converted")
        Dim convertedItems As ReadOnlyCollection(Of ConversionItemInfo) = _
            status.GetItems(ItemTypes.Succeeded)
        For Each convertedItem In convertedItems
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile)
            folderToConvert.Files.Delete(convertedItem.InputFile)
        Next
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While


Integrating with the Open XML SDK


The power of using Word Automation Services becomes apparent when you use it in conjunction with the Open XML SDK.  You can programmatically modify a document in a document library using the Open XML SDK, and then use Word Automation Services to perform one of the tasks that are difficult using the Open XML SDK.


Updating the Table of Contents


A common need is to programmatically generate a document, and then generate or update the table of contents of the document.  Consider the following document, which contains a table of contents.


We want to modify this document by adding content that should appear in the table of contents.  The next example takes the following steps:




  1. It opens the site and retrieves the Test.docx document using a CAML query.


  2. It opens the document using the Open XML SDK, inserts a new paragraph styled as ‘Heading1’ (followed by a body paragraph) at the beginning of the document, and saves the modified document back to the library.


  3. It then starts a conversion job that converts Test.docx to TestWithNewToc.docx with field updating enabled, waits for the job to finish, and reports whether the conversion succeeded.

Console.WriteLine("Querying for Test.docx");
SPList list = spSite.RootWeb.Lists["Shared Documents"];
SPQuery query = new SPQuery();
query.ViewFields = @"<FieldRef Name='FileLeafRef' />";
query.Query =
  @"<Where>
      <Eq>
        <FieldRef Name='FileLeafRef' />
        <Value Type='Text'>Test.docx</Value>
      </Eq>
    </Where>";
SPListItemCollection collection = list.GetItems(query);
if (collection.Count != 1)
{
    Console.WriteLine("Test.docx not found");
    Environment.Exit(0);
}
Console.WriteLine("Opening");
SPFile file = collection[0].File;
byte[] byteArray = file.OpenBinary();
// Copy into an expandable MemoryStream so the Open XML SDK can grow the package
using (MemoryStream memStr = new MemoryStream())
{
    memStr.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(memStr, true))
    {
        Document document = wordDoc.MainDocumentPart.Document;
        Paragraph firstParagraph = document.Body.Elements<Paragraph>()
            .FirstOrDefault();
        if (firstParagraph != null)
        {
            // Insert a Heading1 paragraph and a body paragraph at the top
            Paragraph newParagraph = new Paragraph(
                new ParagraphProperties(
                    new ParagraphStyleId() { Val = "Heading1" }),
                new Run(
                    new Text("About the Author")));
            Paragraph aboutAuthorParagraph = new Paragraph(
                new Run(
                    new Text("Eric White")));
            firstParagraph.Parent.InsertBefore(newParagraph, firstParagraph);
            firstParagraph.Parent.InsertBefore(aboutAuthorParagraph,
                firstParagraph);
        }
    }
    // Disposing wordDoc saved the changes into memStr; write it back to the library
    Console.WriteLine("Saving");
    string linkFileName = file.Item["LinkFilename"] as string;
    file.ParentFolder.Files.Add(linkFileName, memStr, true);
}
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
// UpdateFields updates the table of contents and other fields;
// SaveFormat.Document keeps the output as a .docx file
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.Document;
job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
    siteUrl + "/Shared%20Documents/TestWithNewToc.docx");
job.Start();
Console.WriteLine("After starting conversion job");
while (true)
{
    Thread.Sleep(5000);
    Console.WriteLine("Polling…");
    ConversionJobStatus status = new ConversionJobStatus(
        wordAutomationServiceName, job.JobId, null);
    // The job is finished when every item has either succeeded or failed
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
}

Console.WriteLine("Querying for Test.docx")
Dim list As SPList = spSite.RootWeb.Lists("Shared Documents")
Dim query As SPQuery = New SPQuery()
query.ViewFields = "<FieldRef Name='FileLeafRef' />"
' Build the CAML query with a VB XML literal
query.Query = ( _
   <Where>
       <Eq>
           <FieldRef Name="FileLeafRef"/>
           <Value Type="Text">Test.docx</Value>
       </Eq>
   </Where>).ToString()
Dim collection As SPListItemCollection = list.GetItems(query)
If collection.Count <> 1 Then
    Console.WriteLine("Test.docx not found")
    Environment.Exit(0)
End If
Console.WriteLine("Opening")
Dim file As SPFile = collection(0).File
Dim byteArray As Byte() = file.OpenBinary()
' Copy into an expandable MemoryStream so the Open XML SDK can grow the package
Using memStr As MemoryStream = New MemoryStream()
    memStr.Write(byteArray, 0, byteArray.Length)
    Using wordDoc As WordprocessingDocument = _
        WordprocessingDocument.Open(memStr, True)
        Dim document As Document = wordDoc.MainDocumentPart.Document
        Dim firstParagraph As Paragraph = _
            document.Body.Elements(Of Paragraph)().FirstOrDefault()
        If firstParagraph IsNot Nothing Then
            ' Insert a Heading1 paragraph and a body paragraph at the top
            Dim newParagraph As Paragraph = New Paragraph( _
                New ParagraphProperties( _
                    New ParagraphStyleId() With {.Val = "Heading1"}), _
                New Run( _
                    New Text("About the Author")))
            Dim aboutAuthorParagraph As Paragraph = New Paragraph( _
                New Run( _
                    New Text("Eric White")))
            firstParagraph.Parent.InsertBefore(newParagraph, firstParagraph)
            firstParagraph.Parent.InsertBefore(aboutAuthorParagraph, _
                                               firstParagraph)
        End If
    End Using
    ' Closing wordDoc saved the changes into memStr; write it back to the library
    Console.WriteLine("Saving")
    Dim linkFileName As String = CStr(file.Item("LinkFilename"))
    file.ParentFolder.Files.Add(linkFileName, memStr, True)
End Using
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
' UpdateFields updates the table of contents and other fields;
' SaveFormat.Document keeps the output as a .docx file
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.Document
job.AddFile(siteUrl + "/Shared%20Documents/Test.docx", _
    siteUrl + "/Shared%20Documents/TestWithNewToc.docx")
job.Start()
Console.WriteLine("After starting conversion job")
While True
    Thread.Sleep(5000)
    Console.WriteLine("Polling…")
    Dim status As ConversionJobStatus = New ConversionJobStatus( _
        wordAutomationServiceName, job.JobId, Nothing)
    ' The job is finished when every item has either succeeded or failed
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Exit While
    End If
End While
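The C# listing writes the CAML query as a verbatim string, while the VB listing builds it with an XML literal. If you prefer the VB approach in C#, LINQ to XML (`System.Xml.Linq`) can build the same query, and it escapes characters such as `&` in a file name automatically. This is a minimal sketch; `CamlQueries.ByFileName` is a hypothetical helper, and its result would be assigned to `SPQuery.Query`.

```csharp
using System.Xml.Linq;

public static class CamlQueries
{
    // Builds a CAML <Where> clause that matches a document by its file name
    // (the FileLeafRef field), like the query in the listings above.
    public static string ByFileName(string fileName)
    {
        XElement where =
            new XElement("Where",
                new XElement("Eq",
                    new XElement("FieldRef", new XAttribute("Name", "FileLeafRef")),
                    new XElement("Value", new XAttribute("Type", "Text"), fileName)));

        // Serialize without indentation; XElement escapes special characters
        // in fileName, so a name like "R&D.docx" becomes "R&amp;D.docx".
        return where.ToString(SaveOptions.DisableFormatting);
    }
}
```

Usage: `query.Query = CamlQueries.ByFileName("Test.docx");`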


 


Running this example against a document similar to the one above produces the following new document.


See Also


The Open XML Developer Center contains a number of resources to help you get started developing with Open XML.


Welcome to the Open XML Format SDK 2.0


Building Document Generation Systems from Templates using Word 2007 and Word 2010


Building Publishing Systems that Use Word 2010 or Word 2007


This post originated from and is provided by the MSDN Blogs RSS feed. The original post of the article can be found here.