Extract Data from PDF stored in sys_attachment table.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎04-18-2023 03:33 AM
Is there a way to extract Data from PDF stored in sys_attachment table ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎04-18-2023 05:18 AM
@Community Alums
Within ServiceNow platform there is nothing which allows you to read data from PDF file.
Ankur
✨ Certified Technical Architect || ✨ 9x ServiceNow MVP || ✨ ServiceNow Community Leader
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎03-25-2025 04:12 AM
its 2025 is there any way or tool to extract the data of pdf, or we can use any api to extract the data??
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎03-26-2025 11:40 PM
Yes, it is possible to extract data from a PDF stored in the sys_attachment table. You can use the Google Gemini API for this purpose. Below are the steps to achieve this:
- Get the Google Gemini API key (it's free).
- Store the API key in the sys_properties table in your instance.
- Run the following script in the background, replacing the sys_id with the ID of your PDF file:
var att = new GlideRecord('sys_attachment'); att.get('357118d18364ee10aaa4c8d6feaad3c6'); var rawBytes = new GlideSysAttachment().getBytes(att); if (rawBytes) { var generateRequest = new sn_ws.RESTMessageV2(); generateRequest.setEndpoint('https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=' + gs.getProperty('gemini_api_key')); generateRequest.setHttpMethod('POST'); generateRequest.setRequestHeader('Content-Type', 'application/json'); generateRequest.setRequestBody(JSON.stringify({ "contents": [{ "parts": [ { "inlineData": { "mimeType": "application/pdf", "data": new Packages.java.lang.String(Packages.org.apache.commons.codec.binary.Base64.encodeBase64(rawBytes)).toString() } }, { "text": "Extract all text from this PDF and return it as plain text." } ] }] })); var generateResponse = generateRequest.execute(); if (generateResponse.getStatusCode() == 200) { gs.info('Extracted Text: ' + JSON.parse(generateResponse.getBody()).candidates[0].content.parts[0].text); } else { gs.error('Failed: ' + generateResponse.getBody()); } } else { gs.error('No content retrieved'); }
This script will help you extract text from the PDF.
If you find this solution helpful, please accept it and mark it as useful.
Thanks,
Sahil Negi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
‎08-07-2025 02:31 AM
Im not sure why this is return me a error
*** Script: Failed: { "error": { "code": 400, "message": "The document has no pages.", "status": "INVALID_ARGUMENT" } }