Text and images are the most commonly used elements in PowerPoint documents. Sometimes you need to extract them so they can be reused in another document. This article demonstrates how to extract text and images from a PowerPoint document using C# and VB.NET.
The following topics will be covered:
- Extract Text from a Specific PowerPoint Slide in C# and VB.NET
- Extract All Text from a PowerPoint Document in C# and VB.NET
- Extract Images from a Specific PowerPoint Slide in C# and VB.NET
- Extract All Images from a PowerPoint Document in C# and VB.NET
Installation
To work with PowerPoint documents, I will use the Spire.Presentation for .NET API. Its DLL files can be downloaded from the official website or installed via NuGet: select Tools > NuGet Package Manager > Package Manager Console and run the following command:
PM> Install-Package Spire.Presentation
Extract Text from a Specific PowerPoint Slide in C# and VB.NET
The following are the main steps to extract text from a specific PowerPoint Slide:
- Create an instance of Presentation class.
- Load a PowerPoint document using Presentation.LoadFromFile() method.
- Get the desired slide by its index using Presentation.Slides[index] property.
- Loop through the shapes on the slide using ISlide.Shapes collection.
- If the current shape is an IAutoShape, loop through the paragraphs in the shape using IAutoShape.TextFrame.Paragraphs collection, then extract the text of each paragraph using TextParagraph.Text property and add them into a StringBuilder instance.
- Finally, write the extracted text to a .txt file.
C#
using Spire.Presentation;
using System.IO;
using System.Text;

namespace ExtractSlideText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Presentation instance
            Presentation presentation = new Presentation();
            //Load a PowerPoint document
            presentation.LoadFromFile("Input.pptx");
            //Get the first slide
            ISlide slide = presentation.Slides[0];
            //Create a StringBuilder instance
            StringBuilder sb = new StringBuilder();
            //Loop through the shapes on the slide
            foreach (IShape shape in slide.Shapes)
            {
                //If the shape is an IAutoShape
                if (shape is IAutoShape)
                {
                    //Loop through the paragraphs in the shape
                    foreach (TextParagraph tp in (shape as IAutoShape).TextFrame.Paragraphs)
                    {
                        //Append the text of each paragraph to the StringBuilder instance
                        sb.AppendLine(tp.Text);
                    }
                }
            }
            //Write text to a .txt file
            File.WriteAllText("Result.txt", sb.ToString());
        }
    }
}
VB.NET
Imports Spire.Presentation
Imports System.IO
Imports System.Text

Namespace ExtractSlideText
    Friend Class Program
        Private Shared Sub Main(ByVal args As String())
            'Create a Presentation instance
            Dim presentation As Presentation = New Presentation()
            'Load a PowerPoint document
            presentation.LoadFromFile("Input.pptx")
            'Get the first slide
            Dim slide As ISlide = presentation.Slides(0)
            'Create a StringBuilder instance
            Dim sb As StringBuilder = New StringBuilder()
            'Loop through the shapes on the slide
            For Each shape As IShape In slide.Shapes
                'If the shape is an IAutoShape
                If TypeOf shape Is IAutoShape Then
                    'Loop through the paragraphs in the shape
                    For Each tp As TextParagraph In TryCast(shape, IAutoShape).TextFrame.Paragraphs
                        'Append the text of each paragraph to the StringBuilder instance
                        sb.AppendLine(tp.Text)
                    Next
                End If
            Next
            'Write text to a .txt file
            File.WriteAllText("Result.txt", sb.ToString())
        End Sub
    End Class
End Namespace
Extract All Text from a PowerPoint Document in C# and VB.NET
The code to extract all text from a PowerPoint document is very similar to the above code. The difference is that you need to loop through the slides in the PowerPoint document (instead of getting a desired slide) using Presentation.Slides collection.
C#
using Spire.Presentation;
using System.IO;
using System.Text;

namespace ExtractPptText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Presentation object
            Presentation presentation = new Presentation();
            //Load the sample PowerPoint file
            presentation.LoadFromFile("Input.pptx");
            //Create a StringBuilder object
            StringBuilder sb = new StringBuilder();
            //Loop through the slides
            foreach (ISlide slide in presentation.Slides)
            {
                //Write a header line that identifies the slide
                sb.AppendLine("slide" + slide.SlideNumber);
                //Loop through the shapes
                foreach (IShape shape in slide.Shapes)
                {
                    //Check whether the shape is an IAutoShape
                    if (shape is IAutoShape)
                    {
                        //Loop through the paragraphs of the shape
                        foreach (TextParagraph tp in (shape as IAutoShape).TextFrame.Paragraphs)
                        {
                            //Append the text of the paragraph to the StringBuilder
                            sb.AppendLine(tp.Text);
                        }
                    }
                }
            }
            //Write text to a .txt file
            File.WriteAllText("Result.txt", sb.ToString());
        }
    }
}
VB.NET
Imports Spire.Presentation
Imports System.IO
Imports System.Text

Namespace ExtractPptText
    Friend Class Program
        Private Shared Sub Main(ByVal args As String())
            'Create a Presentation object
            Dim presentation As Presentation = New Presentation()
            'Load the sample PowerPoint file
            presentation.LoadFromFile("Input.pptx")
            'Create a StringBuilder object
            Dim sb As StringBuilder = New StringBuilder()
            'Loop through the slides
            For Each slide As ISlide In presentation.Slides
                'Write a header line that identifies the slide
                sb.AppendLine("slide" & slide.SlideNumber)
                'Loop through the shapes
                For Each shape As IShape In slide.Shapes
                    'Check whether the shape is an IAutoShape
                    If TypeOf shape Is IAutoShape Then
                        'Loop through the paragraphs of the shape
                        For Each tp As TextParagraph In TryCast(shape, IAutoShape).TextFrame.Paragraphs
                            'Append the text of the paragraph to the StringBuilder
                            sb.AppendLine(tp.Text)
                        Next
                    End If
                Next
            Next
            'Write text to a .txt file
            File.WriteAllText("Result.txt", sb.ToString())
        End Sub
    End Class
End Namespace
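A slide may contain empty placeholders or blank paragraphs, in which case the loops above write blank lines to Result.txt. If that is unwanted, one possible approach is sketched below. TextCollector and AppendIfNotBlank are hypothetical names introduced for illustration; they are not part of Spire.Presentation.

```csharp
using System.Text;

static class TextCollector
{
    // Appends the text as its own line unless it is null, empty, or whitespace only.
    // Returns true when a line was appended (hypothetical helper, not a Spire API).
    public static bool AppendIfNotBlank(StringBuilder sb, string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }
        sb.AppendLine(text.Trim());
        return true;
    }
}
```

With this helper, the call sb.AppendLine(tp.Text) in the examples above would become TextCollector.AppendIfNotBlank(sb, tp.Text).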
Extract Images from a Specific PowerPoint Slide in C# and VB.NET
On a PowerPoint slide, an image can be added as a shape or a slide background. To extract images, you need to process the shapes and the background of the slide.
You can refer to the following steps:
- Create an instance of Presentation class.
- Load a PowerPoint document using Presentation.LoadFromFile() method.
- Get the desired slide by its index using Presentation.Slides[index] property.
- If the fill type of the slide background is picture, extract the background image using ISlide.SlideBackground.Fill.PictureFill.Picture.EmbedImage.Image.Save() method.
- Loop through the shapes on the slide.
- If the current shape is an IAutoShape, and its fill type is picture, extract the image in the shape using IAutoShape.Fill.PictureFill.Picture.EmbedImage.Image.Save() method.
- If the current shape is a SlidePicture, extract the image in the shape using SlidePicture.PictureFill.Picture.EmbedImage.Image.Save() method.
C#
using Spire.Presentation;

namespace ExtractSlideImages
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create an instance of Presentation class
            Presentation ppt = new Presentation();
            //Load a PowerPoint document
            ppt.LoadFromFile("Input.pptx");
            //Get the first slide
            ISlide slide = ppt.Slides[0];
            //Extract image from slide background
            //If the slide has an image background
            if (slide.SlideBackground.Fill.FillType == Spire.Presentation.Drawing.FillFormatType.Picture)
            {
                //Extract the image
                slide.SlideBackground.Fill.PictureFill.Picture.EmbedImage.Image.Save("Background.png");
            }
            //Extract images from shapes
            //Loop through the shapes on the slide
            for (int i = 0; i < slide.Shapes.Count; i++)
            {
                IShape s = slide.Shapes[i];
                //If the shape is an IAutoShape
                if (s is IAutoShape)
                {
                    IAutoShape autoShape = s as IAutoShape;
                    //If the shape is filled with an image
                    if (autoShape.Fill.FillType == Spire.Presentation.Drawing.FillFormatType.Picture)
                    {
                        //Extract the image
                        autoShape.Fill.PictureFill.Picture.EmbedImage.Image.Save(string.Format("Shape{0}.png", i));
                    }
                }
                //If the shape is a SlidePicture
                if (s is SlidePicture)
                {
                    //Extract the image
                    SlidePicture ps = s as SlidePicture;
                    ps.PictureFill.Picture.EmbedImage.Image.Save(string.Format("Picture{0}.png", i));
                }
            }
        }
    }
}
VB.NET
Imports Spire.Presentation

Namespace ExtractSlideImages
    Friend Class Program
        Private Shared Sub Main(ByVal args As String())
            'Create an instance of Presentation class
            Dim ppt As Presentation = New Presentation()
            'Load a PowerPoint document
            ppt.LoadFromFile("Input.pptx")
            'Get the first slide
            Dim slide As ISlide = ppt.Slides(0)
            'Extract image from slide background
            'If the slide has an image background (enum values are compared with =, not Is)
            If slide.SlideBackground.Fill.FillType = Spire.Presentation.Drawing.FillFormatType.Picture Then
                'Extract the image
                slide.SlideBackground.Fill.PictureFill.Picture.EmbedImage.Image.Save("Background.png")
            End If
            'Extract images from shapes
            'Loop through the shapes on the slide
            For i As Integer = 0 To slide.Shapes.Count - 1
                Dim s As IShape = slide.Shapes(i)
                'If the shape is an IAutoShape
                If TypeOf s Is IAutoShape Then
                    Dim autoShape As IAutoShape = TryCast(s, IAutoShape)
                    'If the shape is filled with an image
                    If autoShape.Fill.FillType = Spire.Presentation.Drawing.FillFormatType.Picture Then
                        'Extract the image
                        autoShape.Fill.PictureFill.Picture.EmbedImage.Image.Save(String.Format("Shape{0}.png", i))
                    End If
                End If
                'If the shape is a SlidePicture
                If TypeOf s Is SlidePicture Then
                    'Extract the image
                    Dim ps As SlidePicture = TryCast(s, SlidePicture)
                    ps.PictureFill.Picture.EmbedImage.Image.Save(String.Format("Picture{0}.png", i))
                End If
            Next
        End Sub
    End Class
End Namespace
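The examples above write Background.png, Shape{0}.png and Picture{0}.png into the working directory. If you would rather collect the extracted files in a dedicated folder, a small path helper along these lines might be enough. OutputPaths and ForImage are hypothetical names used only for illustration, and the helper relies solely on System.IO.

```csharp
using System.IO;

static class OutputPaths
{
    // Returns "<folder>/<prefix><index>.png" and creates the folder if it is missing.
    // Hypothetical helper for illustration; not part of Spire.Presentation.
    public static string ForImage(string folder, string prefix, int index)
    {
        Directory.CreateDirectory(folder);
        return Path.Combine(folder, string.Format("{0}{1}.png", prefix, index));
    }
}
```

A call such as ps.PictureFill.Picture.EmbedImage.Image.Save(OutputPaths.ForImage("Images", "Picture", i)) would then save the picture under the Images folder instead of the working directory.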
Extract All Images from a PowerPoint Document in C# and VB.NET
The following are the steps to extract all images from a PowerPoint document:
- Create an instance of Presentation class.
- Load a PowerPoint document using Presentation.LoadFromFile() method.
- Loop through the images in the document, using Presentation.Images.Count property as the upper bound of the loop.
- Save each image to file using Presentation.Images[index].Image.Save() method.
C#
using Spire.Presentation;
using System.Drawing;

namespace ExtractPptImages
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create an instance of Presentation class
            Presentation ppt = new Presentation();
            //Load a PowerPoint document
            ppt.LoadFromFile("Input.pptx");
            //Loop through the images in the document
            for (int i = 0; i < ppt.Images.Count; i++)
            {
                //Save each image to file
                ppt.Images[i].Image.Save(string.Format("Images{0}.png", i));
            }
        }
    }
}
VB.NET
Imports Spire.Presentation

Namespace ExtractPptImages
    Friend Class Program
        Private Shared Sub Main(ByVal args As String())
            'Create an instance of Presentation class
            Dim ppt As Presentation = New Presentation()
            'Load a PowerPoint document
            ppt.LoadFromFile("Input.pptx")
            'Loop through the images in the document
            For i As Integer = 0 To ppt.Images.Count - 1
                'Save each image to file
                ppt.Images(i).Image.Save(String.Format("Images{0}.png", i))
            Next
        End Sub
    End Class
End Namespace
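One caveat about the .png file names used in this article: System.Drawing.Image.Save(string) chooses its encoder from the image's own raw format, not from the file extension, so an embedded JPEG would be written as JPEG data under a .png name. The sketch below forces PNG encoding. It assumes that the Image property returns a System.Drawing.Image, as the using System.Drawing directive in the C# example suggests; this may differ between Spire.Presentation editions. ImageExport and SaveAsPng are hypothetical names.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class ImageExport
{
    // Saves the image with the PNG encoder regardless of its original format.
    // Hypothetical helper; assumes the image is a System.Drawing.Image.
    public static void SaveAsPng(Image image, string path)
    {
        image.Save(path, ImageFormat.Png);
    }
}
```

Under that assumption, the loop body above could call ImageExport.SaveAsPng(ppt.Images[i].Image, string.Format("Images{0}.png", i)) so that the file content matches its extension.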