Create and Read PDF Portfolio in Java

A PDF portfolio lets you combine multiple types of files (such as Word and Excel files, images and PowerPoint presentations) into a single master PDF file. The files in a PDF portfolio retain their individual identities, so you can open, read, edit, and format each file independently of the others. In this article, I am going to show how to work with PDF portfolios in Java, specifically how to create a PDF portfolio and how to read/extract files from one programmatically.

Add Dependencies

To create and read PDF portfolios, I use the Free Spire.PDF for Java library. There are two ways to include Free Spire.PDF for Java in your Java project:

For Maven projects:
Specify the following dependencies in your project’s pom.xml file:

<repositories>    
    <repository>    
        <id>com.e-iceblue</id>    
        <name>e-iceblue</name>    
        <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>    
    </repository>    
</repositories>    
<dependencies>    
    <dependency>    
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf.free</artifactId>    
        <version>4.4.1</version>    
    </dependency>    
</dependencies>

The latest version of Free Spire.PDF for Java is 4.4.1 (at the time of writing this article).

For non-Maven projects:
Download the Free Spire.PDF for Java package from its download page, extract the zip file, then add Spire.Pdf.jar from the lib folder to your project as a dependency.

Create PDF Portfolio

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

import java.io.IOException;

public class CreatePortfolio {
    public static void main(String []args) throws IOException {

        String[] files = new String[] { "sample.pdf", "sample.docx", "sample.xlsx","sample.pptx","image.jpg" };

        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        //Create a PDF portfolio and add files to it
        for (int i = 0; i < files.length; i++)
        {
            pdf.getCollection().addFile(files[i]);
        }

        //Save the result file
        pdf.saveToFile("Portfolio.pdf", FileFormat.PDF);
        pdf.dispose();
    }
}

Output:

Create Portfolio
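Before handing the paths to the library, it can help to check that each input file actually exists. The following is a minimal sketch using only the Java standard library. The class name PortfolioInputs and the method existingFiles are hypothetical helpers for illustration and are not part of Free Spire.PDF for Java.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Free Spire.PDF for Java
class PortfolioInputs {
    // Returns only those paths that point at existing, readable regular files
    static List<String> existingFiles(String... candidates) {
        List<String> found = new ArrayList<>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (Files.isRegularFile(path) && Files.isReadable(path)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> files = existingFiles(
                "sample.pdf", "sample.docx", "sample.xlsx", "sample.pptx", "image.jpg");
        //Only these paths would then be passed to pdf.getCollection().addFile(...)
        System.out.println("Files that would be added: " + files);
    }
}
```

The returned list can be looped over in place of the files array used in the example above.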

Read/Extract Files from PDF Portfolio

import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.PdfAttachment;

import java.io.*;

public class ReadPortfolio {
    public static void main(String []args) throws IOException {
        //Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();
        //Load the PDF file
        pdf.loadFromFile("Portfolio.pdf");

        //Loop through the attachments in the file
        for(PdfAttachment attachment : (Iterable<PdfAttachment>)pdf.getAttachments()){
            //Extract each file; the "extract" folder must already exist
            String fileName = attachment.getFileName();
            //try-with-resources closes the stream even if the write fails
            try (OutputStream fos = new FileOutputStream("extract/" + fileName)) {
                fos.write(attachment.getData());
            }
        }
        pdf.dispose();
    }
}

Output:

Read Portfolio
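The extraction code writes each attachment under a name taken from the PDF itself, and it needs the target folder to exist. A minimal sketch of one way to handle both points with the standard library is shown here. The class AttachmentWriter, its methods and the fallback name attachment.bin are assumptions made for illustration, not part of Free Spire.PDF for Java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Free Spire.PDF for Java
class AttachmentWriter {
    // Keeps only the last path segment so a name like "../x" cannot leave the target folder
    static String safeName(String rawName) {
        String normalized = rawName.replace('\\', '/');
        String name = normalized.substring(normalized.lastIndexOf('/') + 1);
        boolean unusable = name.isEmpty() || name.equals(".") || name.equals("..");
        return unusable ? "attachment.bin" : name;
    }

    // Creates the target folder if needed and writes the bytes into it
    static Path write(Path targetDir, String rawName, byte[] data) throws IOException {
        Files.createDirectories(targetDir);
        return Files.write(targetDir.resolve(safeName(rawName)), data);
    }

    public static void main(String[] args) throws IOException {
        //In the example above, the name and bytes would come from
        //attachment.getFileName() and attachment.getData()
        Path written = write(Paths.get("extract"), "../report.docx", new byte[] {1, 2, 3});
        System.out.println("Wrote " + written);
    }
}
```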

C# Read/Extract Text from Image with OCR

At some point, you may want to read text from images. In this article, I will show how to read text from an image programmatically in C# using OCR.

Installation

To read text from an image, I used the Spire.OCR for .NET library. The following steps show how to include Spire.OCR in a .NET Core project.

Step 1: Create a .NET Core project in Visual Studio (recommended target framework: .NET Core 3.0 or above).

Step 2: Add a reference to the Spire.OCR for .NET DLLs in your project.

You can install Spire.OCR for .NET through the NuGet Package Manager by following these steps:

  • In Solution Explorer, right-click the project or “Dependencies” and select “Manage NuGet Packages”.
  • Click the “Browse” tab and search for Spire.OCR.
  • Install Spire.OCR.

Step 3: Copy the dependency DLLs of Spire.OCR to the running directory of your project.

If your project’s target framework is .NET Core 3.0 or above, build the project, then copy the 6 DLLs from the bin\Debug\netcoreapp3.0\runtimes\win-x64\native folder to the running directory, such as bin\Debug\netcoreapp3.0 or C:\Windows\System32.

If your project’s target framework is below .NET Core 3.0, download Spire.OCR from the official website, unzip the package, and then copy the 6 DLLs from the Spire.OCR\Spire.OCR_Dependency\x64 folder to the running directory, such as bin\Debug\netcoreapp2.1 or C:\Windows\System32.

After finishing the above steps, you have successfully included Spire.OCR in your project. Now let’s start coding.

Implementation

By default, Spire.OCR supports English and Chinese, but it also supports other languages such as Korean, French, Japanese and German. To read text in a language other than English or Chinese, call the OcrScanner.LoadLanguageFile() method to load the language package before calling the OcrScanner.Scan() method.

The following code example shows how to read English text from an image using Spire.OCR.

using Spire.OCR;
using System.IO;

namespace SpireOCR
{
    class Program
    {
        static void Main(string[] args)
        {
            OcrScanner scanner = new OcrScanner();            
            scanner.Scan("image.png");
            File.WriteAllText("output.txt", scanner.Text.ToString());
        }
    }
}

Thanks for taking time to read my article. If you encounter any problems when using Spire.OCR, please contact support@e-iceblue.com.

Convert Excel to Image (PNG, JPEG, TIFF and SVG) in Java

In this article, I will show you how to convert Excel to common image formats such as PNG, JPEG, TIFF and SVG programmatically in a Java application. The article is divided into the following three parts:

  • Convert Excel to Image (PNG, JPEG)
  • Convert Excel to TIFF
  • Convert Excel to SVG

Add dependencies

The Free Spire.XLS for Java library is used to implement this task. If you use Maven, specify the following dependencies for Free Spire.XLS for Java in your project’s pom.xml file.

<repositories>    
    <repository>    
        <id>com.e-iceblue</id>    
        <name>e-iceblue</name>    
        <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>    
    </repository>    
</repositories>    
<dependencies>    
    <dependency>    
        <groupId>e-iceblue</groupId>
        <artifactId>spire.xls.free</artifactId>    
        <version>3.9.1</version>    
    </dependency>    
</dependencies>

For non-Maven projects, download the Free Spire.XLS for Java package from its download page and add Spire.Xls.jar from the lib folder to your project as a dependency.

The input Excel file

Input Excel

Convert Excel to Image (PNG, JPEG)

import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

public class ExcelToImage {
    public static void main(String []args) throws Exception {
        //Load the Excel file
        Workbook workbook = new Workbook();
        workbook.loadFromFile("Input.xlsx");

        //Loop through worksheets
        for (int i = 0; i < workbook.getWorksheets().size(); i++) {
            //Convert worksheet to image
            Worksheet sheet = workbook.getWorksheets().get(i);
            BufferedImage bufferedImage = sheet.toImage(sheet.getFirstRow(), sheet.getFirstColumn(), sheet.getLastRow(), sheet.getLastColumn());
            ImageIO.write(bufferedImage,"PNG",new File("image/SheetToImage"+i+".png"));
        }
    }
}

Output:

Convert Excel To Image
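The example above writes PNG files. For JPEG output, one thing to keep in mind is that JPEG has no alpha channel, and depending on the JDK, writing a BufferedImage that carries one can fail or produce wrong colors. The sketch below, which uses only the standard library, copies the image onto an opaque white RGB canvas before writing. The class name JpegWriter and its methods are hypothetical, and whether the image returned by sheet.toImage() actually has an alpha channel is not something this article confirms.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Free Spire.XLS for Java
class JpegWriter {
    // Copies the image onto an opaque white RGB canvas, since JPEG has no alpha channel
    static BufferedImage toOpaqueRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            //Transparent pixels are filled with the white background color
            g.drawImage(source, 0, 0, Color.WHITE, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Writes the image as JPEG; returns false if no suitable writer was found
    static boolean writeJpeg(BufferedImage image, File target) throws IOException {
        return ImageIO.write(toOpaqueRgb(image), "jpg", target);
    }

    public static void main(String[] args) throws IOException {
        //Stand-in for the BufferedImage returned by sheet.toImage(...)
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        boolean written = writeJpeg(image, new File("SheetToImage0.jpg"));
        System.out.println("JPEG written: " + written);
    }
}
```

ImageIO.write returns a boolean, so checking the result is a cheap way to notice when nothing was written.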

Convert Excel to TIFF

import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExcelToTiff {
    public static void main(String []args) throws Exception {
        //Load the Excel file
        Workbook workbook = new Workbook();
        workbook.loadFromFile("Input.xlsx");

        //Loop through worksheets
        for (int i = 0; i < workbook.getWorksheets().size(); i++) {
            //Get the current worksheet
            Worksheet sheet = workbook.getWorksheets().get(i);
            //Save the worksheet to TIFF
            sheet.saveToTiff("tiff/SheetToTiff" + i + ".tif");
        }
    }
}

Output:

Convert Excel to Tiff

Convert Excel to SVG

import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

import java.io.FileOutputStream;

public class ExcelToSvg {
    public static void main(String []args) throws Exception {
        //Load the Excel file
        Workbook workbook = new Workbook();
        workbook.loadFromFile("Input.xlsx");
        //Traverse worksheets
        for (int i = 0; i < workbook.getWorksheets().size(); i++) {
            FileOutputStream stream = new FileOutputStream("svg/sheet" + i + ".svg");
            //Convert worksheet to svg file
            Worksheet sheet = workbook.getWorksheets().get(i);
            sheet.toSVGStream(stream, sheet.getFirstRow(), sheet.getFirstColumn(), sheet.getLastRow(), sheet.getLastColumn());
            stream.flush();
            stream.close();
        }
    }
}
 

Output:

Convert Excel to Svg

Convert HTML and HTML String to Word in Java

In this article, I am going to demonstrate two approaches to converting HTML to Word in Java applications:

  • Convert HTML to Word
  • Convert HTML String to Word

Add Dependencies

The Free Spire.Doc for Java library is used to implement this task. If you use Maven, specify the following dependencies in your project’s pom.xml file to include Free Spire.Doc for Java in your Java project.

<repositories>    
    <repository>    
        <id>com.e-iceblue</id>    
        <name>e-iceblue</name>    
        <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>    
    </repository>    
</repositories>    
<dependencies>    
    <dependency>    
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc.free</artifactId>    
        <version>3.9.0</version>    
    </dependency>    
</dependencies> 

For non-Maven projects, download the Free Spire.Doc for Java package from its download page and add Spire.Doc.jar from the lib folder to your project as a dependency.

Convert HTML to Word

The Free Spire.Doc for Java library provides a Document class that represents a Word document. After an HTML file is loaded with loadFromFile(), the class's saveToFile(String, FileFormat) method can be used to save it as a Word document.

The input HTML document:

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.documents.XHTMLValidationType;

public class HtmlToWord {
    public static void main(String []args){
        //Create a Document instance
        Document document = new Document();
        //Load a Html file
        document.loadFromFile("Input.html", FileFormat.Html, XHTMLValidationType.None);

        //Save Html to Word
        document.saveToFile("HtmlToWord.docx",FileFormat.Docx_2013);
    }
}

The output Word document:

Convert HTML String to Word

To convert an HTML string to Word, invoke the Paragraph.appendHTML(String) method.

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;

public class HtmlStringToWord {
    public static void main(String []args){
        //Create a Document instance
        Document document = new Document();
        //Add a section
        Section sec = document.addSection();

        //HTML string
        String htmlString = "<html><head/><body> <h1>Html Heading</h1><p>This is an html document in a string literal.</p></body></html>";

        //Add a paragraph to the section and append a html string to the paragraph
        sec.addParagraph().appendHTML(htmlString);

        //Save the result document
        document.saveToFile("HTMLstringToWord.docx", FileFormat.Docx_2013);
    }
}

The output document:
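When the HTML string is assembled from variable text rather than written as a literal, characters such as < and & in that text need to be escaped first, otherwise they would be read as markup. A minimal standard-library sketch follows. The class HtmlSnippets and its methods are hypothetical helpers, and the resulting string would be passed to appendHTML() as in the example above.

```java
// Hypothetical helper, not part of Free Spire.Doc for Java
class HtmlSnippets {
    // Escapes the characters that have special meaning in HTML text
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps an escaped heading and paragraph in a minimal HTML document
    static String page(String heading, String paragraph) {
        return "<html><head/><body><h1>" + escape(heading) + "</h1><p>"
                + escape(paragraph) + "</p></body></html>";
    }

    public static void main(String[] args) {
        String htmlString = page("Html Heading", "Prices < 10 & in stock");
        System.out.println(htmlString);
    }
}
```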

Easy Way to Convert Word to Password Protected PDF in C#, VB.NET

The Microsoft Word API doesn’t support the password protection of PDF documents. In this article, I will introduce an easy solution to convert Word to password protected PDF in C# and VB.NET using a third-party API called Spire.Doc for .NET.

Installation

First of all, you need to add a reference to the Spire.Doc for .NET DLL in your project or install it using the NuGet Package Manager.

Implementation

The ToPdfParameterList class is used to customize Word to PDF conversion. You can invoke the Encrypt() method, reached through the PdfSecurity property of ToPdfParameterList in the code below, to encrypt the PDF document during the conversion. Encrypt() has four overloads to cover different requirements:

  • public void Encrypt(string openPassword);
  • public void Encrypt(string permissionPassword, PdfPermissionsFlags permissions);
  • public void Encrypt(string openPassword, string permissionPassword, PdfPermissionsFlags permissions, PdfEncryptionKeySize keySize);       
  • public void Encrypt(string openPassword, string permissionPassword, PdfPermissionsFlags permissions, PdfEncryptionKeySize keySize, string originalPermissionPassword);

The following are the steps to convert Word to password protected PDF.

Step 1: Create a Document instance.

Step 2: Load a Word document with the LoadFromFile(String) method.

Step 3: Create a ToPdfParameterList instance.

Step 4: Call the Encrypt(String) method to encrypt the PDF document during conversion.

Step 5: Call the SaveToFile(String, ToPdfParameterList) method to save the Word document as a password protected PDF.

C# Code

using Spire.Doc;

namespace ConvertWordToPasswordProtectedPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Document instance
            Document doc = new Document();
            //Load a Word document
            doc.LoadFromFile("Sample.docx");

            //Create a ToPdfParameterList instance
            ToPdfParameterList ps = new ToPdfParameterList();
            //Encrypt the PDF document with open password when converting Word to PDF           
            ps.PdfSecurity.Encrypt("123456");

            //Convert Word to password protected PDF
            doc.SaveToFile("PasswordProtectedPDF.pdf", ps);
        }
    }
}

VB.NET Code

Imports Spire.Doc

Namespace ConvertWordToPasswordProtectedPDF
    Class Program
        Private Shared Sub Main(ByVal args As String())
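            'Create a Document instance and load a Word document
            Dim doc As Document = New Document()
            doc.LoadFromFile("Sample.docx")
            'Create a ToPdfParameterList instance and set the open password
            Dim ps As ToPdfParameterList = New ToPdfParameterList()
            ps.PdfSecurity.Encrypt("123456")
            'Convert Word to password protected PDF
            doc.SaveToFile("PasswordProtectedPDF.pdf", ps)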
        End Sub
    End Class
End Namespace