Extract Data from PDF stored in sys_attachment table.

Community Alums
Not applicable

Is there a way to extract Data from PDF stored in sys_attachment table ?

5 REPLIES 5

Ankur Bawiskar
Tera Patron
Tera Patron

@Community Alums 

Within ServiceNow platform there is nothing which allows you to read data from PDF file.

Regards,
Ankur
✨ Certified Technical Architect  ||  ✨ 9x ServiceNow MVP  ||  ✨ ServiceNow Community Leader

its 2025 is there any way or tool to extract the data of pdf, or we can use any api to extract the data??

sahilnegi
Tera Contributor

 

Yes, it is possible to extract data from a PDF stored in the sys_attachment table. You can use the Google Gemini API for this purpose. Below are the steps to achieve this:

  1. Get the Google Gemini API key (it's free).
  2. Store the API key in the sys_properties table in your instance.
  3. Run the following script in the background, replacing the sys_id with the ID of your PDF file:
var att = new GlideRecord('sys_attachment');
att.get('357118d18364ee10aaa4c8d6feaad3c6');

var rawBytes = new GlideSysAttachment().getBytes(att);
if (rawBytes) {
    var generateRequest = new sn_ws.RESTMessageV2();
    generateRequest.setEndpoint('https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=' + gs.getProperty('gemini_api_key'));
    generateRequest.setHttpMethod('POST');
    generateRequest.setRequestHeader('Content-Type', 'application/json');
    generateRequest.setRequestBody(JSON.stringify({
        "contents": [{
            "parts": [
                {
                    "inlineData": {
                        "mimeType": "application/pdf",
                        "data": new Packages.java.lang.String(Packages.org.apache.commons.codec.binary.Base64.encodeBase64(rawBytes)).toString()
                    }
                },
                {
                    "text": "Extract all text from this PDF and return it as plain text."
                }
            ]
        }]
    }));

    var generateResponse = generateRequest.execute();
    if (generateResponse.getStatusCode() == 200) {
        gs.info('Extracted Text: ' + JSON.parse(generateResponse.getBody()).candidates[0].content.parts[0].text);
    } else {
        gs.error('Failed: ' + generateResponse.getBody());
    }
} else {
    gs.error('No content retrieved');
}

This script will help you extract text from the PDF.

If you find this solution helpful, please accept it and mark it as useful.

Thanks,
Sahil Negi

Im not sure why this is return me a error

*** Script: Failed: {
  "error": {
    "code": 400,
    "message": "The document has no pages.",
    "status": "INVALID_ARGUMENT"
  }
}