Search PDFs within a KB

Bill Bonnett
Kilo Expert

Colleagues,

I have about 4000 documents (PDFs) that were scanned years ago, and they are stored in a KB. Since the pages are images, how can I search their contents? Can ServiceNow search PDF contents within a KB? I have the KB search on a portal page. I searched for "five years", which I know appears in one of the PDFs, but that file is an old photocopy, so the text exists only as an image.


Here is a screenshot of the PDF with "five years" in it:


Thanks in advance.

1 ACCEPTED SOLUTION

Shiva Thomas
Kilo Sage

Hi Bill,



To retrieve the text from scanned images, you must first process the images or the PDFs with an OCR (Optical Character Recognition) engine.


ServiceNow doesn't include an OCR engine.


Some software, such as Adobe Acrobat Pro DC, includes one, and some scanner drivers now run OCR on the scanned document automatically. OCR software usually requires an image resolution of at least 300 dpi, so you may need to check the resolution of the images inside your PDFs.
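To check a scan against that threshold, you can divide the image's pixel size by the page size in inches (PDF page sizes are given in points, 72 per inch). This is a minimal sketch that assumes each scanned image fills its whole page; you still need a PDF tool to read the pixel and page dimensions (poppler's `pdfimages -list`, for example, reports x-ppi and y-ppi directly). The function names are illustrative only.

```python
from __future__ import annotations

# PDF page sizes are expressed in points: 72 points = 1 inch.
POINTS_PER_INCH = 72.0


def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_pt: float, page_height_pt: float) -> tuple[float, float]:
    """Return (horizontal, vertical) DPI of a scanned image that fills the page."""
    width_in = page_width_pt / POINTS_PER_INCH
    height_in = page_height_pt / POINTS_PER_INCH
    return pixel_width / width_in, pixel_height / height_in


def ocr_ready(pixel_width: int, pixel_height: int,
              page_width_pt: float, page_height_pt: float,
              min_dpi: float = 300.0) -> bool:
    """True if both axes reach the minimum resolution usually recommended for OCR."""
    dpi_x, dpi_y = effective_dpi(pixel_width, pixel_height, page_width_pt, page_height_pt)
    return min(dpi_x, dpi_y) >= min_dpi


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points (8.5 x 11 inches).
    print(effective_dpi(2550, 3300, 612, 792))  # (300.0, 300.0)
    print(ocr_ready(1275, 1650, 612, 792))      # False: only 150 dpi
```

Using the minimum of the two axes means a page only passes if both directions are sharp enough.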



If your images are 300 dpi or more, my recommendation is to run all your old PDFs through Acrobat Pro or another OCR tool. (Procedure here: Batch OCR multiple PDFs in Acrobat DC)
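As an illustration of "another OCR tool", here is a minimal Python sketch that batch-processes a folder with the open-source OCRmyPDF command-line tool (built on the Tesseract engine) instead of Acrobat. It assumes `ocrmypdf` and the Tesseract English language pack are installed and on the PATH; the helper names and folder layout are hypothetical. Output goes to a separate folder so the original scans are never overwritten.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Command line for OCRmyPDF, which adds an invisible text layer to a scanned PDF."""
    return [
        "ocrmypdf",
        "--skip-text",   # leave pages that already contain text untouched
        "-l", language,  # Tesseract language pack to use
        str(src),
        str(dst),
    ]


def plan_batch(src_dir: Path, dst_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every .pdf/.PDF file in src_dir with a same-named output path in dst_dir."""
    pdfs = sorted(p for p in src_dir.iterdir() if p.suffix.lower() == ".pdf")
    return [(pdf, dst_dir / pdf.name) for pdf in pdfs]


def run_batch(src_dir: Path, dst_dir: Path) -> list[Path]:
    """OCR every PDF and return the inputs that failed, so they can be retried."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src, dst in plan_batch(src_dir, dst_dir):
        result = subprocess.run(build_ocr_command(src, dst), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(src)
            print(f"FAILED {src.name}: {result.stderr.strip()[:200]}", file=sys.stderr)
    return failed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python batch_ocr.py <scanned_dir> <output_dir>
    failures = run_batch(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"{len(failures)} file(s) failed")
```

Collecting failures instead of stopping on the first error matters with 4000 files: a few badly scanned pages should not block the rest of the batch.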


Once your PDFs are processed, they will be searchable and indexable: the recognized text is added as an invisible layer on top of each page image.



The good news is that PDF attachments are searchable, so ServiceNow will recognize and index your updated PDFs. Any OCR'ed PDF attached to a KB article will be included in the search results. (Source: Feature to Import a PDF Document?)


The bad news is that you have about 4000 documents to run through OCR and re-attach to your KB 😕
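For the re-attach step, one option is the ServiceNow REST Attachment API (`POST /api/now/attachment/file`) rather than uploading every file by hand. This minimal sketch only builds and sends the request; it assumes basic authentication, a user allowed to add attachments to `kb_knowledge` records, and that you already know which article sys_id each PDF belongs to. The instance name, credentials, and sys_id are placeholders.

```python
from __future__ import annotations

import base64
import urllib.parse
import urllib.request
from pathlib import Path


def build_upload_request(instance: str, user: str, password: str,
                         article_sys_id: str, pdf_path: Path) -> urllib.request.Request:
    """Build a POST that attaches pdf_path to the kb_knowledge record article_sys_id."""
    query = urllib.parse.urlencode({
        "table_name": "kb_knowledge",
        "table_sys_id": article_sys_id,
        "file_name": pdf_path.name,
    })
    url = f"https://{instance}.service-now.com/api/now/attachment/file?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=pdf_path.read_bytes(),  # the raw PDF bytes are the request body
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/pdf",
            "Accept": "application/json",
        },
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status (201 Created is expected on success)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A driver loop would call `send(build_upload_request(...))` once per PDF/article pair. Note that this adds the OCR'ed file next to the old image-only attachment; removing the old one (for example with `DELETE /api/now/attachment/{sys_id}`) is a separate step.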




Thanks, Shiva. I really appreciate you taking the time to reply and for the solution.



Thanks,



Bill


You're welcome. ^_^


lior grinberg2
Tera Expert

Hi Bill,

There is an application in the ServiceNow Store that does OCR. It is called "Data extract"; here is the link.

Best regards,

Lior Grinberg