PDF connector methods

Release version: Australia

Updated March 12, 2026

Summary of PDF Connector Methods

The PDF connector in RPA Desktop Design Studio enables efficient processing and automation of PDF documents. It provides various methods for loading, converting, extracting, and manipulating PDF content, facilitating seamless document automation workflows.

Key Features

Load: Initializes a PDF for interaction, requiring the file path and optionally a password for protected documents.
Close: Releases resources associated with the PDF after operations are complete.
ConvertToExcel: Converts PDF content to Excel format, with options to convert only tables.
ConvertToHTML: Converts a specified page (or all pages) of a PDF to HTML format.
ConvertToImage: Converts a specified page of a PDF to an image, allowing customization of DPI and quality.
ConvertToWord: Generates a Word document from a PDF file.
ExtractImages: Extracts images from specified pages of a PDF.
GetAllTables: Retrieves all tables from a PDF as a list of DataTables.
GetText: Extracts text from a specified range of PDF pages.
Merge: Combines multiple PDF files into a single document.
Split: Divides a single PDF into separate files for each page.

Key Outcomes

By leveraging these PDF connector methods, ServiceNow customers can streamline document processing, enhance data extraction capabilities, and improve overall efficiency in handling PDF files. This enables better data management and integration within automated workflows, ultimately supporting business objectives and operational efficiency.

Accelerate PDF processing for your document automation by using the methods of the PDF connector in RPA Desktop Design Studio.

Prerequisites for using the PDF connector

Use the Load method of the PDF connector before using any other method. Call Load with the full path to the PDF file (FilePath) and optionally provide a password (Password) if the PDF is protected.

Close

Closes the resources associated with the PDF document. Use this method to release any references and resources after using the Load method.

Call this method when you no longer need to use the PDF document or after completing operations with it.
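
The ordering described so far (Load first, other methods next, Close last) can be modeled as a small state check. The following Python sketch is illustrative only: `PdfSession`, its method names, and the file path are hypothetical stand-ins rather than the connector's API, and no PDF is actually opened.

```python
class PdfSession:
    """Hypothetical stand-in that models the Load -> use -> Close ordering."""

    def __init__(self):
        self._loaded_path = None  # None means no document is loaded

    def load(self, file_path, password=""):
        # Mirrors Load: a full path is required, the password is optional.
        if not file_path:
            raise ValueError("FilePath must not be empty")
        self._loaded_path = file_path

    def get_page_count(self):
        # Any other method is only valid after load().
        if self._loaded_path is None:
            raise RuntimeError("Call load() before using other methods")
        return 0  # placeholder: this model does not read a real PDF

    def close(self):
        # Mirrors Close: drop the reference to the document.
        self._loaded_path = None

    @property
    def is_loaded(self):
        return self._loaded_path is not None


session = PdfSession()
session.load(r"C:\Docs\invoice.pdf")  # hypothetical path
try:
    session.get_page_count()
finally:
    session.close()  # release the document even if an operation fails
```

Placing the close call in a `finally` block reflects the guidance to release resources once operations are complete, whether or not they succeeded.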

ConvertToExcel

Converts a PDF document to a Microsoft Excel document. Optionally, only tables can be converted if specified.

Call this method with the file path where the converted Excel document must be saved, and optionally set ConvertTablesOnly to True if only tables must be converted.

Table 1. Parameters of the ConvertToExcel method
Parameter	Description	Data type
ExcelFilepath	The file path where the converted Excel document (.xlsx) is saved. Ensure the file path includes the file name and extension.	String
ConvertTablesOnly	If set to True, only tables from the PDF document are converted to Excel. Default is True.	Boolean
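
Because the output path must include a file name and extension, it can help to check the path before passing it to the method. The following Python sketch shows one way to do that; the helper name and the example paths are assumptions made for illustration and are not part of the connector.

```python
from pathlib import PureWindowsPath


def has_file_name_and_extension(path, expected_suffix):
    """Return True if path ends in a file name with the expected extension."""
    parsed = PureWindowsPath(path)
    return bool(parsed.stem) and parsed.suffix.lower() == expected_suffix.lower()


# Hypothetical paths: the first names a file, the second names only a folder.
print(has_file_name_and_extension(r"C:\Output\report.xlsx", ".xlsx"))
print(has_file_name_and_extension(r"C:\Output", ".xlsx"))
```

The same check applies to the other methods that take a full output file path, with the extension changed to match the target format.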

ConvertToHTML

Converts a specified page of a PDF to HTML format. If the page number is less than or equal to 0, all pages of the PDF are converted to HTML.

Call this method with the number of the page that you want to convert, or with a value less than or equal to 0 to convert the entire PDF. The method returns the HTML content as a string.

Table 2. Parameters of the ConvertToHTML method
Parameter	Description	Data type
PageNumber (Data In)	The page number of the PDF to be converted to HTML. If this parameter is less than or equal to 0, all pages of the PDF are converted to HTML. Page numbers typically start from 1.	Int32
Return (Data Out)	This method returns the HTML content as a string, representing the content of the PDF file.	String
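
The PageNumber rule can be expressed as a short function. This Python sketch only models which pages are selected; it assumes that pages are numbered from 1 and that a page number beyond the end of the document is an error, neither of which is confirmed here for the connector. The function name is hypothetical.

```python
def pages_to_convert(page_number, page_count):
    """Model the PageNumber rule: <= 0 selects every page, otherwise one page."""
    if page_number <= 0:
        # All pages, numbered from 1 (assumption: 1-based numbering).
        return list(range(1, page_count + 1))
    if page_number > page_count:
        # Assumption of this sketch: an out-of-range page is rejected.
        raise ValueError(f"Page {page_number} is outside 1..{page_count}")
    return [page_number]


print(pages_to_convert(0, 3))  # selects all three pages
print(pages_to_convert(2, 3))  # selects only page 2
```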

ConvertToImage

Converts a specified page of a PDF document to an image. Optionally, specify the image path where the image is saved, DPI (dots per inch), and image quality.

Call this method with the page number of the PDF to convert, the file path where the image must be saved, and optionally adjust the DPI and image quality parameters.

Table 3. Parameters of the ConvertToImage method
Parameter	Description	Data type
PageNumber	The page number of the PDF to be converted to an image. Page numbers typically start from 1.	Int32
ImagePath	The file path where the converted image is saved. Ensure the file path includes the file name and extension.	String
Dpi	The DPI (dots per inch) resolution for the generated image. Default is 200 DPI.	Int32
Quality	The quality level of the generated image, ranging from 0 (lowest) to 100 (highest). Default is 95.	Int32
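
The optional parameters and their documented defaults can be captured in a small value object that rejects out-of-range input early. This is a Python sketch with a hypothetical `ImageOptions` class. The lower bound on Dpi is an assumption, because the table states only the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageOptions:
    """Hypothetical holder for the optional ConvertToImage settings."""

    dpi: int = 200     # documented default
    quality: int = 95  # documented default

    def __post_init__(self):
        if self.dpi <= 0:
            # Assumption: a resolution must be a positive number.
            raise ValueError("Dpi must be a positive integer")
        if not 0 <= self.quality <= 100:
            raise ValueError("Quality must be between 0 and 100")


defaults = ImageOptions()
high_resolution = ImageOptions(dpi=300, quality=100)
print(defaults, high_resolution)
```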

ConvertToImages

Converts a PDF document to images. Optionally, specify the folder path where the images are saved, DPI (dots per inch), image quality, and an optional list to store the generated file names.

Call this method with the folder path where the images must be saved. Optionally, adjust the DPI and image quality parameters. If you provide a list as the FileNames parameter, it is populated with the names of the generated image files.

Table 4. Parameters of the ConvertToImages method
Parameter	Description	Data type
Folderpath	The folder path where the converted images are saved. Ensure the folder exists and has appropriate write permissions. For example, `/Users/Username/Documents/MyFolder`.	String
Dpi	The DPI (dots per inch) resolution for the generated images. Default is 200 DPI.	Int32
Quality	The quality level of the generated images, ranging from 0 (lowest) to 100 (highest). Default is 95.	Int32

ConvertToWord

Converts a PDF to a Microsoft Word document.

Call this method with the file path where the converted Word document must be saved. The method creates a Word document from the PDF content at the specified path.

Table 5. Parameter of the ConvertToWord method
Parameter	Description	Data type
WordFilepath	The file path where the converted Word document (.doc) is saved. Ensure the file path includes the file name and extension.	String

ConvertToXml

Converts a specified page of a PDF document to Microsoft XML format. Optionally, only tables can be converted if specified.

Call this method with the page number of the PDF to convert, the file path where the XML output must be saved, and optionally set ConvertTablesOnly to True if only tables must be converted.

Table 6. Parameters of the ConvertToXml method
Parameter	Description	Data type
PageNumber	The page number of the PDF to be converted to XML format. Page numbers typically start from 1.	Int32
XmlFilePath	The file path where the converted XML document is saved. Ensure the file path includes the file name and extension.	String
ConvertTablesOnly	If set to True, only tables from the specified page are converted to XML. Default is True.	Boolean

ExtractImages

Extracts images from specified pages of a PDF document. Optionally, specify the folder path where the images are saved and an output list to store the generated file names.

Call this method with the folder path where the images must be saved, the starting and ending page numbers from which to extract images, and an empty list to store the file names of the extracted images.

Table 7. Parameters of the ExtractImages method
Parameter	Description	Data type
Folderpath	The folder path where the extracted images are saved. Ensure the folder exists and has appropriate write permissions.	String
FromPage	The starting page number from which to extract images. Page numbers typically start from 1.	Int32
ToPage	The ending page number up to which images must be extracted. This number must be greater than or equal to the FromPage number.	Int32
FileNames	An output parameter that stores the file names of the extracted images.	List`1
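
The constraints on FromPage and ToPage can be checked before the call. The following Python sketch uses a hypothetical helper name and assumes 1-based page numbers. The optional upper-bound check against the page count is also an assumption.

```python
def validate_page_range(from_page, to_page, page_count=None):
    """Return the pages covered by FromPage..ToPage, or raise if invalid."""
    if from_page < 1:
        # Assumption: page numbers start from 1.
        raise ValueError("FromPage must be 1 or greater")
    if to_page < from_page:
        raise ValueError("ToPage must be greater than or equal to FromPage")
    if page_count is not None and to_page > page_count:
        # Assumption: a range beyond the last page is rejected.
        raise ValueError(f"ToPage must not exceed the page count ({page_count})")
    return range(from_page, to_page + 1)


print(list(validate_page_range(2, 4)))
```

If the page count is available, for example from GetPageCount, passing it as the third argument also catches ranges that run past the end of the document.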

GetAllTables

Extracts all tables from a PDF document and returns them as a list of DataTables.

Use the Return parameter to retrieve the extracted table data as a list.

Call this method without any parameters to retrieve all tables from the PDF document. The method returns a list of DataTables, where each DataTable represents a table extracted from the PDF.

Table 8. Parameter of the GetAllTables method
Parameter	Description	Data type
Return	This method returns a list of DataTable objects, each representing a table extracted from the PDF file.	List`1

GetPageAsImage

Returns a specified page of a PDF document as an in-memory image.

Call this method with the page number of the PDF to retrieve the page as an image. The method returns the page as a System.Drawing.Image object.

Table 9. Parameters of the GetPageAsImage method
Parameter	Description	Data type
PageNumber	The page number of the PDF to be converted to an image. Page numbers typically start from 1.	Int32
Return	This method returns an image that represents the specified page of the PDF file.	System.Drawing.Image

GetPageCount

Retrieves the total number of pages in a PDF document. You must use the Return parameter to retrieve the total page count in the PDF as an integer.

Table 10. Parameter of the GetPageCount method
Parameter	Description	Data type
Return	This method returns an integer representing the number of pages in the PDF file.	Int32

GetTable

Extracts a table from a PDF and returns it as a DataTable. The extraction method is specified by the ExtractBy parameter.

Call this method with the extraction type and the corresponding value. The method returns the extracted table as a DataTable.

Table 11. Parameters of the GetTable method
Parameter	Description	Data type
ExtractBy	The method of extraction to use. This parameter must be an ExtractType value, which has the following options: Index (0), which extracts by page number, and ContainsText (1), which extracts by matching text.	ExtractType
Value	The value corresponding to the extraction type. For example, if ExtractBy is Index, this would be the page number as a string; if ExtractBy is ContainsText, this would be the text to match.	String
Return	This method returns a DataTable that represents a table extracted from the PDF file.	DataTable
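
The pairing of ExtractBy and Value can be sketched as an enumeration plus a small argument builder. The enumeration mirrors the two documented options and their numeric values. The helper function and the digit check for the Index option are assumptions of this Python sketch and not connector behavior.

```python
from enum import IntEnum


class ExtractType(IntEnum):
    """Mirror of the two documented ExtractBy options."""

    Index = 0         # Value holds a page number, passed as a string
    ContainsText = 1  # Value holds the text to match


def get_table_arguments(extract_by, value):
    """Return (ExtractBy, Value) with Value converted to a string."""
    extract_by = ExtractType(extract_by)
    text = str(value)
    if extract_by is ExtractType.Index and not text.isdigit():
        # Assumption: an Index value must be a whole number.
        raise ValueError("Value must be a page number when ExtractBy is Index")
    return extract_by, text


print(get_table_arguments(ExtractType.Index, 2))
print(get_table_arguments(ExtractType.ContainsText, "Invoice total"))
```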

GetText

Retrieves text from the given range of PDF pages.

Call this method with the starting and ending page numbers to retrieve text from those pages. The method returns the extracted text as a string.

Table 12. Parameters of the GetText method
Parameter	Description	Data type
FromPage	The starting page number of the range from which to extract text. Page numbers typically start from 1.	Int32
ToPage	The ending page number of the range from which to extract text. Ensure that the ToPage value is higher than the FromPage value.	Int32
Return	This method returns a string representing the text content of the PDF file.	String

Load

Loads a PDF file for interaction, enabling further operations such as extracting content.

Call this method with the full path to the PDF file (FilePath) and optionally provide a password (Password) if the PDF is protected.

Table 13. Parameters of the Load method
Parameter	Description	Data type
FilePath	The full path to the PDF file to be loaded. This must include the file name and extension.	String
Password	The password for the PDF file if it is protected. If the PDF is not password-protected, this parameter can be an empty string.	String

Merge

Merges a list of PDF files into a single PDF file.

Call this method with a list of file paths of the PDFs to be merged, the output file path, and an optional overwrite flag.

Table 14. Parameters of the Merge method
Parameter	Description	Data type
FileList	A list of file paths for the PDF files to be merged. Each path must be a valid path to a PDF file.	ArrayList
OutputFilePath	The file path where the merged PDF is saved. This must include the file name and extension.	String
Overwrite	If set to True, the method overwrites the existing file at the output path if it exists. If set to False, the method does not overwrite the existing file. Default is False.	Boolean

Note:

If any PDF file in the FileList parameter is password protected or in an incorrect format, the automation displays an error.
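
Some problems with the Merge inputs can be caught before the method runs. The following Python sketch is a hypothetical pre-flight check that looks only at file names, file existence, and the Overwrite flag. It cannot detect password protection or a corrupt PDF, so the error described in the note can still occur.

```python
import os


def merge_preflight(file_list, output_file_path, overwrite=False):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not file_list:
        problems.append("FileList is empty")
    for path in file_list:
        if not path.lower().endswith(".pdf"):
            problems.append(f"Not a .pdf path: {path}")
        elif not os.path.isfile(path):
            problems.append(f"File not found: {path}")
    if os.path.exists(output_file_path) and not overwrite:
        # With Overwrite set to False, an existing output file is not replaced.
        problems.append(f"Output exists and Overwrite is False: {output_file_path}")
    return problems


# Hypothetical input: a text file is flagged because it is not a PDF path.
print(merge_preflight(["notes.txt"], "merged-output-example.pdf"))
```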

Split

Splits a single PDF into multiple files, where each page in the PDF is saved as a separate file.

Call this method with the output folder path where the split PDF pages must be saved.

Table 15. Parameter of the Split method
Parameter	Description	Data type
OutputFolderPath	The path to the folder where the split PDF pages are saved. Ensure the folder exists and has appropriate permissions for writing files.	String
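
Whether an output folder exists and is writable can be verified before calling a method that writes files into it. This Python sketch uses a hypothetical helper name and takes the system temporary folder purely as an example. It is an illustration and not part of the connector.

```python
import os
import tempfile


def folder_is_ready(folder_path):
    """Return True if folder_path is an existing directory that can be written to."""
    return os.path.isdir(folder_path) and os.access(folder_path, os.W_OK)


# The system temporary folder is used here only as a convenient example.
print(folder_is_ready(tempfile.gettempdir()))
```

The same check suits the folder parameters of ConvertToImages and ExtractImages, which carry the same requirement.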