
How to: Get All the Text in a Slide in a Presentation

by leo21c 2011. 4. 6.

http://msdn.microsoft.com/en-us/library/cc850836.aspx

Office 2010

This topic shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to get all the text in a slide in a presentation programmatically.

The following using directives are required to compile the code in this topic.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Presentation;
using DocumentFormat.OpenXml.Packaging;

In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from the document, call the PresentationDocument.Open(String, Boolean) method, which takes a file path and a Boolean value that specifies whether the document is editable. To open a document for read/write access, pass true for this parameter; for read-only access, pass false, as shown in the following using statement. In this code, the presentationFile parameter is a string that represents the path of the file from which you want to open the document.

// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
    // Insert other code here.
}

The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (internal method used by the Open XML SDK to clean up resources) is automatically called when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.
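
To make the role of Dispose concrete, the following is a rough sketch of what the using statement amounts to when written as an explicit try/finally block. The class name UsingEquivalentSketch and the helper HasPresentationPart are hypothetical names chosen for illustration; they are not part of the SDK or of the original sample.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

public static class UsingEquivalentSketch
{
    // Hypothetical helper: reports whether the package at the given path
    // contains a main presentation part.
    public static bool HasPresentationPart(string presentationFile)
    {
        // Open the presentation as read-only (the second argument is false).
        PresentationDocument presentationDocument =
            PresentationDocument.Open(presentationFile, false);
        try
        {
            return presentationDocument.PresentationPart != null;
        }
        finally
        {
            // This is the cleanup that a using block performs automatically,
            // even when the body returns early or throws.
            presentationDocument.Dispose();
        }
    }
}
```

In ordinary code the using form is shorter and harder to get wrong; the expanded form is shown only to illustrate when Dispose runs.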

The basic document structure of a PresentationML document consists of the main part that contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.

A PresentationML package’s main part starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to the entire slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.
A handout is a printed set of slides that can be provided to an audience for future reference.
As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. (A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.)
Other features that a PresentationML document can include the following: animation, audio, video, and transitions between slides.
A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all comments in a document are stored in one comment part while each slide has its own part.
© ISO/IEC 29500:2008.
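
The statement that each slide has its own part can be observed through the SDK. The following is a minimal sketch, assuming an already opened PresentationDocument, that collects the package URI of every slide part. The class and method names (PackagePartsSketch, GetSlidePartUris) are hypothetical. The order in which SlideParts enumerates the parts is not guaranteed to match the order of the slides in the presentation.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

public static class PackagePartsSketch
{
    // Hypothetical helper: returns the URI of each slide part in the package.
    public static List<string> GetSlidePartUris(PresentationDocument presentationDocument)
    {
        if (presentationDocument == null)
        {
            throw new ArgumentNullException("presentationDocument");
        }

        List<string> uris = new List<string>();
        PresentationPart presentationPart = presentationDocument.PresentationPart;
        if (presentationPart != null)
        {
            // Each SlidePart is a separate part with its own URI in the package.
            foreach (SlidePart slidePart in presentationPart.SlideParts)
            {
                uris.Add(slidePart.Uri.ToString());
            }
        }
        return uris;
    }
}
```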

The following XML code segment represents a presentation that contains two slides, denoted by the IDs 267 and 256.

<p:presentation xmlns:p="…" … >
   <p:sldMasterIdLst>
      <p:sldMasterId
         xmlns:rel="http://…/relationships" rel:id="rId1"/>
   </p:sldMasterIdLst>
   <p:notesMasterIdLst>
      <p:notesMasterId
         xmlns:rel="http://…/relationships" rel:id="rId4"/>
   </p:notesMasterIdLst>
   <p:handoutMasterIdLst>
      <p:handoutMasterId
         xmlns:rel="http://…/relationships" rel:id="rId5"/>
   </p:handoutMasterIdLst>
   <p:sldIdLst>
      <p:sldId id="267"
         xmlns:rel="http://…/relationships" rel:id="rId2"/>
      <p:sldId id="256"
         xmlns:rel="http://…/relationships" rel:id="rId3"/>
   </p:sldIdLst>
   <p:sldSz cx="9144000" cy="6858000"/>
   <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
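
The sldIdLst element in this markup can be read through the SDK's typed classes. The following is a sketch, under the assumption that the presentation is already open, that lists each slide's ID together with its relationship ID. The names SlideIdListSketch and DescribeSlideIds are hypothetical, and the output format is only an illustration.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideIdListSketch
{
    // Hypothetical helper: returns one line per p:sldId element,
    // in the order the slides appear in the slide ID list.
    public static List<string> DescribeSlideIds(PresentationDocument presentationDocument)
    {
        if (presentationDocument == null)
        {
            throw new ArgumentNullException("presentationDocument");
        }

        List<string> result = new List<string>();
        PresentationPart presentationPart = presentationDocument.PresentationPart;
        if (presentationPart == null || presentationPart.Presentation == null ||
            presentationPart.Presentation.SlideIdList == null)
        {
            return result;
        }

        foreach (SlideId slideId in presentationPart.Presentation.SlideIdList.Elements<SlideId>())
        {
            // Either attribute may be missing in a malformed file, so guard both.
            string id = slideId.Id != null ? slideId.Id.Value.ToString() : "(none)";
            string relId = slideId.RelationshipId != null ? slideId.RelationshipId.Value : "(none)";
            result.Add("id=" + id + ", rel:id=" + relId);
        }
        return result;
    }
}
```

For the markup shown above, this sketch would be expected to yield one entry for the slide with ID 267 and one for the slide with ID 256, in that order.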

Using the Open XML SDK 2.0, you can create document structure and content using strongly-typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the class names of the classes that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.

PresentationML Element    Open XML SDK 2.0 Class    Description
sld                       Slide                     Presentation Slide. It is the root element of SlidePart.
sldLayout                 SlideLayout               Slide Layout. It is the root element of SlideLayoutPart.
sldMaster                 SlideMaster               Slide Master. It is the root element of SlideMasterPart.
notesMaster               NotesMaster               Notes Master (or handoutMaster). It is the root element of NotesMasterPart.
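
The mapping in the table can be checked at run time, because every SDK element exposes its XML local name. The following sketch, which assumes a presentation part obtained from an open document, walks from the first slide part it finds to its layout and master parts and records each root element's class name and local name. The names RootElementsSketch, DescribeRootElements, and AddRoot are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

public static class RootElementsSketch
{
    // Hypothetical helper: lists "ClassName -> localName" for the root
    // elements reachable from the first slide part and the notes master part.
    public static List<string> DescribeRootElements(PresentationPart presentationPart)
    {
        if (presentationPart == null)
        {
            throw new ArgumentNullException("presentationPart");
        }

        List<string> lines = new List<string>();
        foreach (SlidePart slidePart in presentationPart.SlideParts)
        {
            AddRoot(lines, slidePart.Slide);
            SlideLayoutPart layoutPart = slidePart.SlideLayoutPart;
            if (layoutPart != null)
            {
                AddRoot(lines, layoutPart.SlideLayout);
                if (layoutPart.SlideMasterPart != null)
                {
                    AddRoot(lines, layoutPart.SlideMasterPart.SlideMaster);
                }
            }
            break; // Only the first slide part is inspected in this sketch.
        }

        if (presentationPart.NotesMasterPart != null)
        {
            AddRoot(lines, presentationPart.NotesMasterPart.NotesMaster);
        }
        return lines;
    }

    private static void AddRoot(List<string> lines, OpenXmlElement root)
    {
        if (root != null)
        {
            lines.Add(root.GetType().Name + " -> " + root.LocalName);
        }
    }
}
```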

The sample code consists of three overloads of the GetAllTextInSlide method. In the following segment, the first overload opens the source presentation that contains the slide whose text you want, and passes the presentation to the second overload, which gets the slide part. The first overload then returns the array of strings that the second overload returns to it; each string represents a paragraph of text in the specified slide.

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns.
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}

The second overloaded method takes the presentation document passed in and gets a slide part to pass to the third overloaded method. It returns to the first overloaded method the array of strings that the third overloaded method returns to it, each of which represents a paragraph of text in the specified slide.

public static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the presentation document exists.
    if (presentationDocument == null)
    {
        throw new ArgumentNullException("presentationDocument");
    }

    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException("slideIndex");
    }

    // Get the presentation part of the presentation document.
    PresentationPart presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart != null && presentationPart.Presentation != null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList != null)
        {
            // Get the collection of slide IDs from the slide ID list.
            var slideIds = presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string slidePartRelationshipId = (slideIds[slideIndex] as SlideId).RelationshipId;

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }

    // Else, return null.
    return null;
}

The following code segment shows the third overloaded method, which takes the slide part passed in and returns to the second overloaded method a string array of text paragraphs. It starts by verifying that the slide part passed in exists, and then it creates a linked list of strings. It iterates through the paragraphs in the slide and, using a StringBuilder object to concatenate all the text runs in a paragraph, adds each non-empty paragraph to the linked list as a string. It then returns to the second overloaded method an array of strings that represents all the text in the specified slide in the presentation, or null if the slide contains no text.

public static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart == null)
    {
        throw new ArgumentNullException("slidePart");
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide != null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (var paragraph in slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the text runs of the paragraph.
            foreach (var text in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each text run to the previous ones.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each non-empty paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    if (texts.Count > 0)
    {
        // Return an array of strings.
        return texts.ToArray();
    }
    else
    {
        return null;
    }
}
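
As a possible variation on the third overload, the same extraction can be written with LINQ and made to return an empty array rather than null, so that callers can iterate over the result without a null check. This is a sketch of an alternative design, not the method used in this topic; the names SlideTextLinqSketch and GetParagraphTexts and the alias A are hypothetical.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

public static class SlideTextLinqSketch
{
    // Hypothetical variant: returns one string per non-empty paragraph,
    // or an empty array when the slide has no text.
    public static string[] GetParagraphTexts(SlidePart slidePart)
    {
        if (slidePart == null)
        {
            throw new ArgumentNullException("slidePart");
        }

        if (slidePart.Slide == null)
        {
            return new string[0];
        }

        return slidePart.Slide
            .Descendants<A.Paragraph>()
            // Concatenate the text elements of each paragraph.
            .Select(p => string.Join("", p.Descendants<A.Text>().Select(t => t.Text).ToArray()))
            // Skip paragraphs that contain no text.
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

Returning an empty array changes the contract compared with the original overload, so existing callers that test for null would need to test the array length instead.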

You can use the sample code to get all the text in a specific slide in a presentation file. For example, the following foreach loop writes out the array of strings returned by the GetAllTextInSlide method, which represents the text in the second slide of the presentation file “Myppt8.pptx.”

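// GetAllTextInSlide returns null when the slide has no text or the index
// is out of range, so fall back to an empty array before iterating.
foreach (string s in GetAllTextInSlide(@"C:\Users\Public\Documents\Myppt8.pptx", 1) ?? new string[0])
    Console.WriteLine(s);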

Following is the complete sample code in C#.

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns.
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}

public static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the presentation document exists.
    if (presentationDocument == null)
    {
        throw new ArgumentNullException("presentationDocument");
    }

    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException("slideIndex");
    }

    // Get the presentation part of the presentation document.
    PresentationPart presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart != null && presentationPart.Presentation != null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList != null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds =
                presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string slidePartRelationshipId = (slideIds[slideIndex] as SlideId).RelationshipId;

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart =
                    (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }

    // Else, return null.
    return null;
}

public static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart == null)
    {
        throw new ArgumentNullException("slidePart");
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide != null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the text runs of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each text run to the previous ones.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each non-empty paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    if (texts.Count > 0)
    {
        // Return an array of strings.
        return texts.ToArray();
    }
    else
    {
        return null;
    }
}
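
The sample reads one slide per call. As a possible extension, the following self-contained sketch applies the same technique to every slide in presentation order and returns one list of paragraph strings per slide. It is an illustration under stated assumptions rather than part of the original sample; the names AllSlidesTextSketch and GetAllTextInAllSlides and the alias A are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class AllSlidesTextSketch
{
    // Hypothetical helper: result[i] holds the non-empty paragraphs of the
    // i-th slide in the slide ID list (an empty list if the slide has no text).
    public static List<List<string>> GetAllTextInAllSlides(PresentationDocument presentationDocument)
    {
        if (presentationDocument == null)
        {
            throw new ArgumentNullException("presentationDocument");
        }

        List<List<string>> result = new List<List<string>>();
        PresentationPart presentationPart = presentationDocument.PresentationPart;
        if (presentationPart == null || presentationPart.Presentation == null ||
            presentationPart.Presentation.SlideIdList == null)
        {
            return result;
        }

        foreach (SlideId slideId in presentationPart.Presentation.SlideIdList.Elements<SlideId>())
        {
            List<string> paragraphs = new List<string>();

            // Resolve the slide part through its relationship ID, if present.
            SlidePart slidePart = null;
            if (slideId.RelationshipId != null)
            {
                slidePart = presentationPart.GetPartById(slideId.RelationshipId.Value) as SlidePart;
            }

            if (slidePart != null && slidePart.Slide != null)
            {
                foreach (A.Paragraph paragraph in slidePart.Slide.Descendants<A.Paragraph>())
                {
                    // Concatenate the text elements of the paragraph.
                    StringBuilder paragraphText = new StringBuilder();
                    foreach (A.Text text in paragraph.Descendants<A.Text>())
                    {
                        paragraphText.Append(text.Text);
                    }

                    if (paragraphText.Length > 0)
                    {
                        paragraphs.Add(paragraphText.ToString());
                    }
                }
            }

            // Add an entry even for empty slides so indexes match slide order.
            result.Add(paragraphs);
        }
        return result;
    }
}
```

Opening the document once and iterating the slide ID list avoids reopening the file for every slide, which calling the file-path overload in a loop would do.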