
Reading the contents of a Word or PDF file

cnshum
Tera Contributor

I have a use case where I need to read the contents of CVs/resumes uploaded by users and pass them to an LLM to extract the skills. Currently, I can only do this with a .txt file (see the code below). When I use the same code on a PDF or Word file, the content comes out as garbled symbols. Can anyone point me in the right direction?

Code:

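extractSkillsFromCV: function() {
    var tableSysId = this.getParameter('sysparm_tableSysId');
    var grAttach = new GlideRecord("sys_attachment");
    grAttach.addQuery("table_sys_id", tableSysId);
    grAttach.query();
    if (grAttach.next()) {
        // Start from an empty string; an uninitialized variable would
        // prepend the text "undefined" to the result.
        var document = '';
        var attach = new GlideSysAttachment().getContentStream(grAttach.getUniqueValue());
        // GlideTextReader reads the stream line by line as text, so this only
        // gives readable output for plain-text files (.txt, .csv).
        var reader = new GlideTextReader(attach);
        var ln;
        while ((ln = reader.readLine()) != null) {
            document += ln + "\n";
        }
        return document;
    }
    return '';
},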
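The loop above hands every attachment to GlideTextReader, whatever its format. A minimal guard, sketched here under the assumption that you read the MIME type from the attachment record (for example `grAttach.getValue('content_type')`), could route only plain-text files to the text reader. The function name `classifyAttachment` is hypothetical, and the code is plain ES5 JavaScript so it can be tested outside an instance.

```javascript
// Hypothetical helper: decide how a CV attachment should be handled,
// based only on its MIME type (the sys_attachment content_type value).
function classifyAttachment(contentType) {
  // Normalize: lower-case and drop parameters such as "; charset=UTF-8".
  var type = String(contentType || "").toLowerCase().split(";")[0].trim();

  // Plain-text formats can safely go through GlideTextReader.
  if (type.indexOf("text/") === 0) {
    return "text";
  }

  // Binary document formats need a real parser before text extraction.
  var binaryDocs = {
    "application/pdf": "pdf",
    "application/msword": "doc",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx"
  };
  return binaryDocs.hasOwnProperty(type) ? binaryDocs[type] : "unknown";
}
```

Only the `text` result would continue into the GlideTextReader loop. The content type is recorded when the file is attached and may be a generic value such as `application/octet-stream`, so treat it as a hint rather than proof of the format.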

Tanushree Maiti
Giga Sage

Hi @cnshum,

You can try the following code, as mentioned in the following post:

getPayloadFileInBase64: function(sourceGrSysId, fileName) {
    var attGr = this.getAttGr(sourceGrSysId, fileName);
    if (attGr.next()) {
        this.attachmentGr = attGr;
        // Copy the attachment's raw bytes into memory.
        // Packages.* calls are only available in the global scope.
        var gsa = new GlideSysAttachmentInputStream(attGr.getValue("sys_id"));
        var baos = new Packages.java.io.ByteArrayOutputStream();
        gsa.writeTo(baos);
        baos.close();
        // Returns the whole file encoded as Base64: a binary-safe copy of
        // the file, not extracted text.
        var payload = GlideBase64.encode(baos.toByteArray());
        return payload;
    }
    return "";
},

getAttGr: function(sysId, fileName) {
    var attGr = new GlideRecord("sys_attachment");
    attGr.addQuery("table_sys_id", sysId);
    attGr.addQuery("file_name", fileName);
    attGr.query();
    return attGr;
},

Regards
Tanushree Maiti
ServiceNow Technical Architect

@Tanushree Maiti 

It would be nice if you could share screenshots of the outcome showing whether you were able to print the PDF or Word file content.

This makes it easier for members to know whether the approach worked.

Thanks

Regards,
Ankur
Certified Technical Architect  ||  10x ServiceNow MVP  ||  ServiceNow Community Leader

Hi, this is the output I get for a test file. I cannot read the contents:

UEsDBBQABgAIAAAAIQAfIwT7cAEAACIGAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC0
lMtOwzAQRfdI/EPkLUrcskAINe2CxxIqUT7AtSethV+yp6+/Z9K0EUKlEbTdRHJm7r1nLGsGo7U1
2RJi0t6VrF/0WAZOeqXdrGQfk5f8nmUJhVPCeAcl20Bio+H11WCyCZAyUrtUsjlieOA8yTlYkQof
wFGl8tEKpGOc8SDkp5gBv+317rj0DsFhjrUHGw6eoBILg9nzmn43JBFMYtlj01hnlUyEYLQUSHW+
dOpHSr5LKEi57UlzHdINNTB+MKGu/B6w073R1UStIBuLiK/CUhdf+ai48nJhSVkctznA6atKS2j1
tVuIXkJKdOfWFG3FCu32/L9yuIWdQiTl+UFa606IhBsD6fwEjW93PCCS4BIAO+dOhBVM3y9G8c28
E6Si3ImYGjg/RmvdCYG0BqD59k/m2Noci6TOcfQh0VqJ/xh7vzdqdU4DB4ioj7+6NpGsT54P6pWk
QP01Wy4SentyfGNzIJxvN/zwCwAA//8DAFBLAwQUAAYACAAAACEAmVV+Bf4AAADhAgAACwAIAl9y
ZWxzLy5yZWxzIKIEAiigAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAKySTUsDMRCG74L/Icy9O9sqItLdXkToTWT9AUMy+4GbD5Kptv/eKIou
1LWHHjN558kzQ9abvR3VK8c0eFfBsihBsdPeDK6r4Ll5WNyCSkLO0OgdV3DgBJv68mL9xCNJbkr9
EJLKFJcq6EXCHWLSPVtKhQ/s8k3royXJx9hhIP1CHeOqLG8w/mZAPWGqrakgbs0VqOYQ+BS2b9tB
873XO8tOjjyBvBd2hs0ixNwfZcjTqIZix1KB8foxlxNSCEVGAx43Wp1u9Pe0aFnIkBBqH3ne5yMx
J7Q854qmiR+bNx8Nmq/ynM31OW30Lom3/6znM/OthJOPWb8DAAD//wMAUEsDBBQABgAIAAAAIQBB
HgnKLw4AAAdOAAARAAAAd29yZC9kb2N1bWVudC54bWzkXOty47YV/t+ZvgNGnXaSiWXxJpFSs9uh
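The output above is not garbled text: it is the whole file encoded as Base64, which is what `GlideBase64.encode` returns. Its first characters, `UEsDB`, decode to the bytes `PK\x03\x04`, the signature of a ZIP local file header, because a .docx file is a ZIP package of XML parts. The following Node.js sketch (not ServiceNow code; `Buffer` is a Node API, and `firstZipEntryName` is a hypothetical helper) decodes the first line of that output and reads the name of the first archived entry, assuming the data starts with a ZIP local file header.

```javascript
// Node.js sketch: read the first entry name from a Base64-encoded ZIP.
// Assumes the payload begins with a ZIP local file header at offset 0.
function firstZipEntryName(base64) {
  var bytes = Buffer.from(base64, "base64");

  // Local file header signature is "PK\x03\x04" (0x04034b50, little-endian).
  if (bytes.length < 30 || bytes.readUInt32LE(0) !== 0x04034b50) {
    return null; // not a ZIP local file header
  }

  // The fixed header is 30 bytes; the file name length is a
  // little-endian 16-bit value at offset 26, and the name follows the header.
  var nameLength = bytes.readUInt16LE(26);
  if (bytes.length < 30 + nameLength) {
    return null; // name is cut off in this sample
  }
  return bytes.toString("latin1", 30, 30 + nameLength);
}

// First line of the output posted above.
var sample = "UEsDBBQABgAIAAAAIQAfIwT7cAEAACIGAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC";
console.log(firstZipEntryName(sample)); // prints [Content_Types].xml
```

`[Content_Types].xml` is the manifest every Office Open XML package starts with. Further down, the output also contains the Base64 form of `word/document.xml` (`d29yZC9kb2N1bWVudC54bWz...`), the package part that holds the body text of a .docx.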
 

 

Ankur Bawiskar
Tera Patron

@cnshum 

The issue is that ServiceNow doesn't have any OOTB API/class to read PDF or Word documents.

So, as far as I know from experience, you can't read PDF/Word content within ServiceNow.

TXT and CSV are plain-text files, so you can read their actual content directly.
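To tell these plain-text files apart from PDF or Word content in a script, one option is to check the file's leading signature bytes (its "magic number") instead of trusting the file name. A minimal sketch, assuming the content is already available as a Base64 string (as returned by the `GlideBase64.encode(...)` code earlier in the thread); `sniffDocumentType` is a hypothetical name. Each prefix below is the Base64 encoding of the start of a known signature, so no decoding is needed and the code stays plain ES5.

```javascript
// Hypothetical helper: guess a document format from the start of its
// Base64-encoded content. Each prefix is the Base64 form of the format's
// leading signature bytes, so no decoding is required.
function sniffDocumentType(base64) {
  // Only the start matters; strip whitespace/line breaks that Base64
  // output often contains (e.g. 76-character MIME lines).
  var head = String(base64 || "").slice(0, 64).replace(/\s+/g, "");

  var signatures = [
    { prefix: "JVBER", type: "pdf" },     // "%PDF"
    { prefix: "UEsDB", type: "zip" },     // "PK\x03\x04": .docx, .xlsx, .pptx, ...
    { prefix: "0M8R4KGx", type: "ole2" }  // D0 CF 11 E0 A1 B1: legacy .doc, .xls
  ];

  for (var i = 0; i < signatures.length; i++) {
    if (head.indexOf(signatures[i].prefix) === 0) {
      return signatures[i].type;
    }
  }
  // No known binary signature: possibly plain text, but not guaranteed.
  return "unknown";
}
```

A `zip` result only says the file is a ZIP container, since .docx, .xlsx and .pptx share that signature, and an `unknown` result does not prove the file is text.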

For Excel, ServiceNow has the GlideExcelParser API.

Workaround: use a MID Server with an external Java library to read the content, or use a JavaScript library loaded through a UI Script.
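Whichever library you choose, for a .docx it has to unzip the package and read `word/document.xml`, where text is stored in `<w:t>` elements inside runs (`<w:r>`) and paragraphs (`<w:p>`). The sketch below is a deliberately naive illustration of only that last step, assuming the XML has already been extracted from the ZIP; `docxXmlToText` and `decodeBasicEntities` are hypothetical names. It uses regular expressions instead of an XML parser, ignores tabs, line breaks and numeric character references, and does not cover headers, footers or footnotes (which live in other parts of the package), so it is not a substitute for a real library.

```javascript
// Naive sketch: pull visible text out of a WordprocessingML string
// (the contents of word/document.xml inside a .docx package).
function docxXmlToText(xml) {
  var paragraphs = [];
  // Each <w:p ...>...</w:p> element is one paragraph; "[\s>]" avoids <w:pPr>.
  var paraRe = /<w:p[\s>][\s\S]*?<\/w:p>/g;
  // <w:t> or <w:t xml:space="preserve"> holds the run text; skips <w:tab/>, <w:tbl>.
  var textRe = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g;
  var para, run;

  while ((para = paraRe.exec(xml)) !== null) {
    var text = "";
    textRe.lastIndex = 0;
    while ((run = textRe.exec(para[0])) !== null) {
      text += run[1];
    }
    paragraphs.push(decodeBasicEntities(text));
  }
  return paragraphs.join("\n");
}

// Decode only the five predefined XML entities (&amp; last, to avoid double-decoding).
function decodeBasicEntities(s) {
  return s.replace(/&lt;/g, "<")
          .replace(/&gt;/g, ">")
          .replace(/&quot;/g, '"')
          .replace(/&apos;/g, "'")
          .replace(/&amp;/g, "&");
}
```

The resulting plain text is what you would then pass to the LLM prompt, in place of the raw file content.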


Regards,
Ankur