
Reading contents of Word, PDF file

cnshum
Tera Contributor

I have a use case where I need to read the contents of a CV/resume uploaded by a user and pass it to an LLM to extract the skills. Currently, I can only do this with a .txt file (see the code below). When I run the same code on a PDF or Word file, the contents come out as weird symbols. Can anyone point me in the right direction?

code:

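extractSkillsFromCV: function() {
        var tableSysId = this.getParameter('sysparm_tableSysId');
        var grAttach = new GlideRecord("sys_attachment");
        grAttach.addQuery("table_sys_id", tableSysId);
        grAttach.query();
        if (grAttach.next()) {
            // Start from an empty string; an uninitialized variable would prefix the result with "undefined".
            var document = '';
            var attach = new GlideSysAttachment().getContentStream(grAttach.getUniqueValue());
            // GlideTextReader decodes the stream as text, so this only works for plain-text attachments.
            var reader = new GlideTextReader(attach);
            var ln;
            while ((ln = reader.readLine()) != null) {
                document += ln + "\n";
            }
            return document;
        }
        // No attachment found on the record.
        return '';
    },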
1 REPLY

Tanushree Maiti
Giga Sage

Hi @cnshum

You can try the following code, as mentioned in the following post:

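getPayloadFileInBase64: function(sourceGrSysId, fileName) {
    var attGr = this.getAttGr(sourceGrSysId, fileName);
    if (attGr.next()) {
        this.attachmentGr = attGr;
        // Read the raw attachment bytes; PDF and Word files are binary and cannot be read line by line as text.
        var gsa = new GlideSysAttachmentInputStream(attGr.getValue("sys_id"));
        var baos = new Packages.java.io.ByteArrayOutputStream();
        gsa.writeTo(baos);
        baos.close();
        // Base64-encode the bytes so they survive intact, e.g. inside a request payload.
        var payload = GlideBase64.encode(baos.toByteArray());
        return payload;
    }
    // No matching attachment found.
    return "";
},

// Returns a queried GlideRecord of attachments on the given record with the given file name.
getAttGr: function(sysId, fileName) {
    var attGr = new GlideRecord("sys_attachment");
    attGr.addQuery("table_sys_id", sysId);
    attGr.addQuery("file_name", fileName);
    attGr.query();
    return attGr;
},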
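
Note that the Base64 payload above is still the raw file, not the extracted text. If it helps, here is a minimal sketch for checking what kind of file you actually have before deciding how to read it. `detectFileType` and `SIGNATURES` are hypothetical names for illustration, not part of any ServiceNow API. The only assumption is that you pass in the first few bytes of the attachment (for example from `baos.toByteArray()`), and that anything without a known signature is treated as plain text.

```javascript
// Hypothetical helper for illustration; not a ServiceNow API.
// Leading "magic" bytes of the formats discussed in this thread.
var SIGNATURES = [
    { type: 'pdf',  bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
    { type: 'docx', bytes: [0x50, 0x4B, 0x03, 0x04] }, // ZIP container (also used by .xlsx, .pptx)
    { type: 'doc',  bytes: [0xD0, 0xCF, 0x11, 0xE0] }  // legacy OLE2 container
];

function detectFileType(bytes) {
    for (var i = 0; i < SIGNATURES.length; i++) {
        var sig = SIGNATURES[i].bytes;
        var matches = bytes.length >= sig.length;
        for (var j = 0; matches && j < sig.length; j++) {
            // Mask to 0..255 because Java byte arrays hold signed values (-128..127).
            matches = (bytes[j] & 0xFF) === sig[j];
        }
        if (matches) {
            return SIGNATURES[i].type;
        }
    }
    // Assumption: no known signature means the file can be read as plain text.
    return 'text';
}
```

With this, a result of `'text'` could go through the `GlideTextReader` approach from the question, while `'pdf'`, `'docx'`, or `'doc'` would need the Base64 route plus a parser or service that understands those formats.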
 
 
 
 

 

Please mark this response as Helpful & Accept it as solution if it assisted you with your question.
Regards
Tanushree Maiti
ServiceNow Technical Architect