PDF connector methods
Summarize
Summary of PDF Connector Methods
The PDF connector in RPA Desktop Design Studio enables efficient processing and automation of PDF documents. It provides various methods for loading, converting, extracting, and manipulating PDF content, facilitating seamless document automation workflows.
Show less
Key Features
- Load: Initializes a PDF for interaction, requiring the file path and optionally a password for protected documents.
- Close: Releases resources associated with the PDF after operations are complete.
- ConvertToExcel: Converts PDF content to Excel format, with options to convert only tables.
- ConvertToHTML: Converts a specified page (or all pages) of a PDF to HTML format.
- ConvertToImage: Converts a specified page of a PDF to an image, allowing customization of DPI and quality.
- ConvertToWord: Generates a Word document from a PDF file.
- ExtractImages: Extracts images from specified pages of a PDF.
- GetAllTables: Retrieves all tables from a PDF as a list of DataTables.
- GetText: Extracts text from a specified range of PDF pages.
- Merge: Combines multiple PDF files into a single document.
- Split: Divides a single PDF into separate files for each page.
Key Outcomes
By leveraging these PDF connector methods, ServiceNow customers can streamline document processing, enhance data extraction capabilities, and improve overall efficiency in handling PDF files. This enables better data management and integration within automated workflows, ultimately supporting business objectives and operational efficiency.
Accelerate PDF processing for your document automation by using the various methods of PDF connector in RPA Desktop Design Studio.
Prerequisites for using the PDF connector
Use the Load method in PDF connector before using the other methods. Call this method with the full path to the PDF file (FilePath) and optionally provide a password (Password) if the PDF is protected.
Close
Closes the resources associated with the PDF document. Use this method to release any references and resources after using the Load method.
Call this method when you no longer need to use the PDF document or after completing operations with it.
ConvertToExcel
Converts a PDF document to a Microsoft Excel document. Optionally, only tables can be converted if specified.
Call this method with the file path where the converted Excel document must be saved, and optionally set ConvertTablesOnly to True if only tables must be converted.
| Parameter | Description | Data type |
|---|---|---|
| ExcelFilepath | The file path where the converted Excel document (.xlsx) is saved. Ensure the file path includes the file name and extension. | String |
| ConvertTablesOnly | If set to True, only tables from the PDF document are converted to Excel. Default is True. | Boolean |
ConvertToHTML
Converts a specified page of a PDF to HTML format. If the page number is less than or equal to 0, all pages of the PDF are converted to HTML.
Call this method with the page number of the PDF that you want to convert to HTML. If you pass a page number less than or equal to 0, the entire PDF will be converted to HTML. The method returns the HTML content as a string.
| Parameter | Description | Data type |
|---|---|---|
| PageNumber (Data In) | The page number of the PDF to be converted to HTML. If this parameter is less than or equal to 0, all pages of the PDF are converted to HTML. Page numbers typically start from 1. | Int32 |
| Return (Data Out) | This method returns the HTML content as a string, representing the content of the PDF file. | String |
ConvertToImage
Converts a specified page of a PDF document to an image. Optionally, specify the image path where the image is saved, DPI (dots per inch), and image quality.
Call this method with the page number of the PDF to convert, the file path where the image must be saved, and optionally adjust the DPI and image quality parameters.
| Parameter | Description | Data type |
|---|---|---|
| PageNumber | The page number of the PDF to be converted to an image. Page numbers typically start from 1. | Int32 |
| ImagePath | The file path where the converted image is saved. Ensure the file path includes the file name and extension | String |
| Dpi | The DPI (dots per inch) resolution for the generated image. Default is 200 DPI. | Int32 |
| Quality | The quality level of the generated image, ranging from 0 (lowest) to 100 (highest). Default is 95. | Int32 |
ConvertToImages
Converts a PDF document to images. Optionally, specify the folder path where the images are saved, DPI (dots per inch), image quality, and an optional list to store the generated file names.
Call this method with the folder path where the images must be saved. Optionally, adjust the DPI and image quality parameters. If you provide a list as the FileNames parameter, it is populated with the names of the generated image files.
| Parameter | Description | Data type |
|---|---|---|
| Folderpath | The folder path where the converted images will be saved. Ensure the folder exists and has appropriate write permissions. For example, /Users/Username/Documents/MyFolder |
String |
| Dpi | The DPI (dots per inch) resolution for the generated images. Default is 200 DPI. | Int32 |
| Quality | The quality level of the generated images, ranging from 0 (lowest) to 100 (highest). Default is 95. | Int32 |
ConvertToWord
Converts a PDF to a Microsoft Word document.
Call this method with the file path where the converted Word document must be saved. The method creates a Word document from the PDF content at the specified path.
| Parameter | Description | Data type |
|---|---|---|
| WordFilepath | The file path where the converted Word document (.doc) is saved. Ensure the file path includes the file name and extension. | String |
ConvertToXml
Converts a specified page of a PDF document to Microsoft XML format. Optionally, only tables can be converted if specified.
Call this method with the page number of the PDF to convert, the file path where the XML output must be saved, and optionally set ConvertTablesOnly to True if only tables must be converted.
| Parameter | Description | Data type |
|---|---|---|
| PageNumber | The page number of the PDF to be converted to XML format. Page numbers typically start from 1. | Int32 |
| XmlFilePath | The file path where the converted XML document will be saved. Ensure the file path includes the file name and extension | String |
| ConvertTablesOnly | If set to True, only tables from the specified page will be converted to XML. Default is True. | Boolean |
ExtractImages
Extracts images from specified pages of a PDF document. Optionally, specify the folder path where the images are saved and an output list to store the generated file names.
Call this method with the folder path where the images must be saved, the starting and ending page numbers from which to extract images, and an empty list to store the file names of the extracted images.
| Parameter | Description | Data type |
|---|---|---|
| Folderpath | The folder path where the extracted images are saved. Ensure the folder exists and has appropriate write permissions. | String |
| FromPage | The starting page number from which to extract images. Page numbers typically start from 1. | Int32 |
| ToPage | The ending page number up to which images must be extracted. This number must be greater than or equal to the FromPage number. | Int32 |
| FileNames | An output parameter that stores the file names of the extracted images. | List`1 |
GetAllTables
Extracts all tables from a PDF document and returns them as a list of DataTables.
Use the Return parameter to retrieve the extracted table data as a list.
Call this method without any parameters to retrieve all tables from the PDF document. The method returns a list of DataTables, where each DataTable represents a table extracted from the PDF.
| Parameter | Description | Data type |
|---|---|---|
| Return | This method returns list of DataTable that represents a tables extracted from the PDF file. | List`1 |
GetPageAsImage
Extracts data from a PDF document page and store it as an in-memory image.
Returns a specified page of a PDF document as an in-memory image.
Call this method with the page number of the PDF to retrieve the page as an image. The method returns the page as a System.Drawing.Image object.
| Parameter | Description | Data type |
|---|---|---|
| PageNumber | The page number of the PDF to be converted to an image. Page numbers typically start from 1. | Int32 |
| Return | This method returns an image that represents a specified page of the PDF file. | Drawing.Image |
GetPageCount
Retrieves the total number of pages in a PDF document. You must use the Return parameter to retrieve the total page count in the PDF as an integer.
| Parameter | Description | Data type |
|---|---|---|
| Return | This method returns an integer representing count of pages of the PDF file. | Int32 |
GetTable
Extracts a table from a PDF and returns it as a DataTable. The extraction method is specified by the ExtractBy parameter.
Call this method with the extraction type and the corresponding value. The method returns the extracted table as a DataTable.
| Parameter | Description | Data type |
|---|---|---|
| ExtractBy | The method of extraction to use. This parameter must be ExtractType, which includes the following options: Index (0) - extract by page number, and ContainsText (1) - extract by matching text. |
ExtractType |
| Value | The value corresponding to the extraction type. For example, if ExtractBy is Index, this would be the page number as a string; if ExtractBy is ContainsText, this would be the text to match. |
String |
| Return | This method returns a DataTable that represents a table extracted from the PDF file. | Table |
GetText
Retrieves text from the given range of PDF pages.
Call this method with the starting and ending page numbers to retrieve text from those pages. The method returns the extracted text as a string.
| Parameter | Description | Data type |
|---|---|---|
| FromPage | The starting page number of the range from which to extract text. Page numbers typically start from 1. | Int32 |
| ToPage | The page number to which you retrieve text from the start page. Note: Ensure that the ToPage value is higher than the FromPage value. |
Int32 |
| Return | This method returns a string representing the text content of the PDF file. | String |
Load
Loads a PDF file for interaction, enabling further operations such as extracting content.
Call this method with the full path to the PDF file (FilePath) and optionally provide a password (Password) if the PDF is protected.
| Parameter | Description | Data type |
|---|---|---|
| FilePath | The full path to the PDF file to be loaded. This must include the file name and extension. | String |
| Password | The password for the PDF file if it is protected. If the PDF is not password-protected, this parameter can be an empty string. | String |
Merge
Merges a list of PDF files into a single PDF file.
Call this method with a list of file paths of the PDFs to be merged, the output file path, and an optional overwrite flag.
| Parameter | Description | Data type |
|---|---|---|
| FileList | A list of file paths for the PDF files to be merged. Each path must be a valid path to a PDF file. | ArrayList |
| OutputFilePath | The file path where the merged PDF is saved. This must include the file name and extension. | String |
| Overwrite | If set to True, the method overwrites the existing file at the output path if it exists. If set to False, the method does not overwrite the existing file. Default is False. |
Boolean |
Split
Splits a single PDF into multiple files, where each page in the PDF is saved as a separate file.
Call this method with the output folder path where the split PDF pages must be saved.
| Parameter | Description | Data type |
|---|---|---|
| OutputFolderPath | The path to the folder where the split PDF pages are saved. Ensure the folder exists or has appropriate permissions for writing files. | String |