Barcode Extraction from PDFs & Images using ZXing on ServiceNow MID Server
In Part 1 (Extracting Text from PDF Attachments using Apache PDFBox using ServiceNow Midserver), we extracted text from PDF attachments using PDFBox. But many documents also contain information encoded in barcodes, which text extraction misses entirely. Medical supply orders have UDI barcodes, shipping manifests carry tracking barcodes, and warehouse receipts use QR codes. This data is invisible to text extractors.
This article adds barcode scanning to the MID Server pipeline using ZXing (“Zebra Crossing”), one of the most widely used open-source barcode libraries. Combined with PDFBox for rendering PDF pages as images, it can detect and decode barcodes embedded in PDF documents as well as in standalone image attachments.
What You Will Build
• A MID Server Script Include that scans PDFs and images for barcodes
• Support for 12 barcode formats: QR Code, Code 128, Code 39, Code 93, EAN-13, EAN-8, UPC-A, UPC-E, Data Matrix, PDF 417, Aztec, and ITF
• Multi-barcode detection per page (GenericMultipleBarcodeReader)
• A combined extraction method that runs text + barcode scanning in a single MID Server call
• GTIN-to-record correlation using serial number prefix matching
Prerequisites
• Completed Part 1 (PDFBox installed, GlobalAttachmentHelper created)
• Or: MID Server with agent/extlib directory access
• Basic understanding of the ECC Queue and JavascriptProbe pattern from Part 1
Architecture
The barcode pipeline extends Part 1 by adding a rendering step between PDF loading and data extraction:
Stage What Happens
PDF Text (Part 1) PDFBox loads PDF → PDFTextStripper extracts text directly
PDF Barcode (Part 2) PDFBox loads PDF → PDFRenderer renders page as image at 600 DPI → ZXing scans image for barcodes
Image Barcode (Part 2) ImageIO reads PNG/JPG directly → ZXing scans for barcodes
Step 1: Install ZXing JAR Libraries
This pipeline needs two ZXing JARs:
JAR File Size Purpose
core-3.5.3.jar ~600 KB Core barcode processing engine — format detection, decoding algorithms, hint system
javase-3.5.3.jar ~38 KB Java SE helpers — BufferedImageLuminanceSource for converting images to ZXing’s internal format
Download URLs
https://repo1.maven.org/maven2/com/google/zxing/core/3.5.3/core-3.5.3.jar
https://repo1.maven.org/maven2/com/google/zxing/javase/3.5.3/javase-3.5.3.jar
Install
1. Copy both JARs to your MID Server’s agent/extlib/ directory
2. Register each in the ecc_agent_jar table (same process as PDFBox in Part 1)
3. Restart the MID Server
⚠ Reminder: JARs MUST be registered in ecc_agent_jar or FileSync will delete them on restart.
If You Also Need PDFBox
To scan barcodes embedded in PDFs (not just standalone images), you also need PDFBox for rendering PDF pages as images. If you followed Part 1, pdfbox-app-2.0.31.jar is already installed. If starting fresh, install all three JARs.
Validate All Classes Load
var probe = new JavascriptProbe("YOUR_MID_SERVER");
probe.setName("ZXingValidation");
// In Rhino, a class that is not on the classpath resolves to a JavaPackage
// object instead of throwing, so try/catch cannot detect it. A loaded Java
// class reports typeof "function"; a missing one reports "object".
probe.setJavascript(
"var r = [];" +
"function check(label, cls) {" +
" r.push((typeof cls === 'function' ? '[OK] ' : '[FAIL] ') + label); }" +
"check('ZXing core', Packages.com.google.zxing.MultiFormatReader);" +
"check('ZXing javase', Packages.com.google.zxing.client.j2se.BufferedImageLuminanceSource);" +
"check('PDFBox renderer', Packages.org.apache.pdfbox.rendering.PDFRenderer);" +
"r.join('\\n');"
);
var eccId = probe.create();
gs.info("ECC ID: " + eccId);
✅ Expected: [OK] ZXing core, [OK] ZXing javase, [OK] PDFBox renderer
Step 2: How ZXing Works
The Decoding Pipeline
ZXing processes barcodes through a series of transformations:
# Component What It Does
1 BufferedImage The source image (from ImageIO.read for files, or PDFRenderer for PDF pages)
2 LuminanceSource Converts color image to grayscale luminance values. BufferedImageLuminanceSource handles this.
3 Binarizer Converts grayscale to black/white. HybridBinarizer works best for photos and rendered pages.
4 BinaryBitmap The final black/white bitmap that barcode decoders analyze
5 MultiFormatReader Tries all enabled barcode formats against the bitmap. TRY_HARDER hint improves detection.
6 GenericMultipleBarcodeReader Wraps MultiFormatReader to detect ALL barcodes on a page, not just the first one
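To make stages 2 to 4 more concrete, here is a deliberately simplified sketch in plain JavaScript (no ZXing involved). It converts packed 0xRRGGBB pixels to 8-bit luminance using an integer form of the familiar 0.299/0.587/0.114 weights, then binarizes with a single global mean threshold. This is an illustration under assumptions, not ZXing's code: HybridBinarizer computes local thresholds per block of pixels, which is why it copes better with uneven lighting. The function names are hypothetical.

```javascript
// Simplified illustration of LuminanceSource -> Binarizer -> BinaryBitmap.
// NOT ZXing's implementation: HybridBinarizer uses local (per-block) thresholds.

// Convert one packed 0xRRGGBB pixel to 8-bit luminance (0 = black, 255 = white).
// Integer weights 306/601/117 out of 1024 approximate 0.299/0.587/0.114;
// adding 512 (0.5 * 1024) rounds to nearest instead of truncating.
function toLuminance(rgb) {
  var r = (rgb >> 16) & 0xFF;
  var g = (rgb >> 8) & 0xFF;
  var b = rgb & 0xFF;
  return (306 * r + 601 * g + 117 * b + 512) >> 10;
}

// Binarize a luminance array with one global threshold (the mean).
// Returns booleans where true means "black" (darker than the threshold).
function binarizeGlobal(luminances) {
  var sum = 0;
  for (var i = 0; i < luminances.length; i++) sum += luminances[i];
  var threshold = sum / luminances.length;
  var bits = [];
  for (var j = 0; j < luminances.length; j++) {
    bits.push(luminances[j] < threshold);
  }
  return bits;
}

// A tiny one-row "image": white, black, dark grey, light grey.
var row = [0xFFFFFF, 0x000000, 0x333333, 0xCCCCCC].map(toLuminance);
console.log(row);                 // [ 255, 0, 51, 204 ]
console.log(binarizeGlobal(row)); // [ false, true, true, false ]
```

A single global threshold fails when one part of a scanned page is much darker than another; that is the situation local thresholding is designed for.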
Supported Barcode Formats
Format Type Common Use
QR Code 2D Matrix URLs, product info, serial numbers, structured data (up to 4,296 alphanumeric or 7,089 numeric characters)
Code 128 1D Linear Shipping labels, product tracking, GS1-128 supply chain identifiers
Code 39 1D Linear Military, automotive, healthcare (alphanumeric)
EAN-13 1D Linear International retail products (13-digit)
UPC-A / UPC-E 1D Linear North American retail products (12-digit / compressed 8-digit)
Data Matrix 2D Matrix Medical device UDI, electronic components, small item marking
PDF 417 2D Stacked ID cards, shipping labels, boarding passes (high capacity)
Aztec 2D Matrix Transportation tickets, mobile boarding passes
ITF 1D Linear Carton/case labeling, distribution (Interleaved 2 of 5)
The DPI Problem
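Discovery: In testing, barcode scanning at 300 DPI failed for most embedded barcodes; 600 DPI was the reliable minimum for consistent detection. Small barcodes or those with thin bars may need even higher resolution. Note that 300 DPI is not a PDFBox default: PDFRenderer.renderImage() without a scale renders at 72 DPI, which is why the code below always passes the DPI explicitly to renderImageWithDPI().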
When PDFBox renders a PDF page as an image, the DPI setting determines pixel density. A barcode that is 1 inch wide renders as 300 pixels at 300 DPI, but 600 pixels at 600 DPI. ZXing needs sufficient pixel resolution to distinguish individual bars.
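The same arithmetic applies to the narrowest bar of a barcode (its X-dimension) and to the memory a rendered page needs. The sketch below is plain JavaScript for illustration only; the US Letter page size, the 10 mil X-dimension and the 4 bytes per pixel (the RGB image that renderImageWithDPI() produces by default) are assumptions, and renderEstimate is a hypothetical name.

```javascript
// Back-of-the-envelope numbers for rendering one PDF page at a given DPI.
// Assumptions: US Letter page (8.5 x 11 in), 4 bytes per pixel (RGB image),
// barcode X-dimension (narrowest bar) given in mils (1 mil = 0.001 in).
function renderEstimate(dpi, xDimensionMils) {
  var widthPx = Math.floor(8.5 * dpi);
  var heightPx = Math.floor(11 * dpi);
  return {
    dpi: dpi,
    widthPx: widthPx,
    heightPx: heightPx,
    // Decimal megabytes for the uncompressed page image
    megabytes: Math.round((widthPx * heightPx * 4) / 1e6),
    // How many pixels wide the narrowest bar becomes
    pixelsPerNarrowBar: (xDimensionMils * dpi) / 1000
  };
}

[300, 600].forEach(function (dpi) {
  console.log(renderEstimate(dpi, 10)); // assumed 10 mil barcode
});
```

Under these assumptions, 300 DPI gives a 2550 × 3300 pixel page (about 34 MB) in which each narrow bar is 3 pixels wide, while 600 DPI gives 5100 × 6600 pixels (about 135 MB) and 6-pixel bars. Doubling the DPI doubles the pixels per bar but quadruples the memory per page.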
Step 3: Create the Barcode Scanner Script Include
Navigate to MID Server > Script Includes > New (or add to your existing script):
Name: BarcodeScanner
Complete Script:
var BarcodeScanner = Class.create();
BarcodeScanner.prototype = {
initialize: function() {
this.ImageIO = Packages.javax.imageio.ImageIO;
this.MultiFormatReader = Packages.com.google.zxing.MultiFormatReader;
this.BinaryBitmap = Packages.com.google.zxing.BinaryBitmap;
this.HybridBinarizer = Packages.com.google.zxing.common.HybridBinarizer;
this.BufferedImageLuminanceSource =
Packages.com.google.zxing.client.j2se.BufferedImageLuminanceSource;
this.DecodeHintType = Packages.com.google.zxing.DecodeHintType;
this.BarcodeFormat = Packages.com.google.zxing.BarcodeFormat;
this.GenericMultipleBarcodeReader =
Packages.com.google.zxing.multi.GenericMultipleBarcodeReader;
this.PDDocument = Packages.org.apache.pdfbox.pdmodel.PDDocument;
this.PDFRenderer = Packages.org.apache.pdfbox.rendering.PDFRenderer;
},
/*
* Configure decode hints with all supported formats
*/
_getHints: function() {
var hints = new Packages.java.util.HashMap();
var formats = new Packages.java.util.ArrayList();
formats.add(this.BarcodeFormat.QR_CODE);
formats.add(this.BarcodeFormat.CODE_128);
formats.add(this.BarcodeFormat.CODE_39);
formats.add(this.BarcodeFormat.CODE_93);
formats.add(this.BarcodeFormat.EAN_13);
formats.add(this.BarcodeFormat.EAN_8);
formats.add(this.BarcodeFormat.UPC_A);
formats.add(this.BarcodeFormat.UPC_E);
formats.add(this.BarcodeFormat.DATA_MATRIX);
formats.add(this.BarcodeFormat.PDF_417);
formats.add(this.BarcodeFormat.AZTEC);
formats.add(this.BarcodeFormat.ITF);
hints.put(this.DecodeHintType.POSSIBLE_FORMATS, formats);
hints.put(this.DecodeHintType.TRY_HARDER,
Packages.java.lang.Boolean.TRUE);
return hints;
},
/*
* Detect ALL barcodes in a BufferedImage; returns [] if there are none
*/
_decodeMultiple: function(bufferedImage) {
var source = new this.BufferedImageLuminanceSource(bufferedImage);
var bitmap = new this.BinaryBitmap(
new this.HybridBinarizer(source));
var reader = new this.MultiFormatReader();
var multi = new this.GenericMultipleBarcodeReader(reader);
var results;
try {
results = multi.decodeMultiple(bitmap, this._getHints());
} catch (e) {
// ZXing throws NotFoundException when the image has no barcode:
// treat that as an empty result and rethrow anything else
if (("" + e).indexOf("NotFoundException") > -1) return [];
throw e;
}
var output = [];
for (var i = 0; i < results.length; i++) {
output.push({
text: "" + results[i].getText(),
format: "" + results[i].getBarcodeFormat()
});
}
return output;
},
Method 1: Scan Barcodes in a PDF
/*
* Scan all pages of a PDF for barcodes
* Handles: chunk decoding, gzip decompression, page rendering, barcode detection
*/
scanPDF: function(chunksJson, dpi) {
dpi = dpi || 600;
var response = { status: "success", pages: [] };
var document = null;
try {
// Decode chunks and decompress (same as Part 1)
var chunks = JSON.parse(chunksJson);
var decoder = Packages.java.util.Base64.getDecoder();
var baos = new Packages.java.io.ByteArrayOutputStream();
for (var i = 0; i < chunks.length; i++) {
var bytes = decoder.decode(chunks[i]);
baos.write(bytes, 0, bytes.length);
}
var allBytes = baos.toByteArray();
var isGzip = (allBytes.length > 2
&& (allBytes[0] & 0xFF) == 0x1F
&& (allBytes[1] & 0xFF) == 0x8B);
var pdfBytes;
if (isGzip) {
var gzis = new Packages.java.util.zip.GZIPInputStream(
new Packages.java.io.ByteArrayInputStream(allBytes));
var out = new Packages.java.io.ByteArrayOutputStream();
var buf = Packages.java.lang.reflect.Array.newInstance(
Packages.java.lang.Byte.TYPE, 4096);
var n;
while ((n = gzis.read(buf)) != -1) out.write(buf, 0, n);
gzis.close();
pdfBytes = out.toByteArray();
} else { pdfBytes = allBytes; }
// Load PDF and render each page as image
document = this.PDDocument.load(
new Packages.java.io.ByteArrayInputStream(pdfBytes));
var renderer = new this.PDFRenderer(document);
var pageCount = document.getNumberOfPages();
for (var p = 0; p < pageCount; p++) {
var pageResult = { page: p + 1, barcodes: [] };
try {
// Render page at specified DPI
var image = renderer.renderImageWithDPI(p, dpi);
pageResult.barcodes = this._decodeMultiple(image);
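} catch (pageErr) {
// NotFoundException only means this page has no barcode;
// report any other error (rendering, memory) as it is
var msg = "" + pageErr;
pageResult.error = msg.indexOf("NotFoundException") > -1
? "No barcode found" : msg;
}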
response.pages.push(pageResult);
}
} catch (e) {
response.status = "error";
response.error = "" + e.message;
} finally {
if (document != null) document.close();
}
return JSON.stringify(response);
},
Method 2: Scan Barcodes in an Image (PNG/JPG)
/*
* Scan a standalone image for barcodes
* No PDFBox needed - just ZXing + ImageIO
*/
scanImage: function(chunksJson) {
var response = { status: "success", barcodes: [] };
try {
var chunks = JSON.parse(chunksJson);
var decoder = Packages.java.util.Base64.getDecoder();
var baos = new Packages.java.io.ByteArrayOutputStream();
for (var i = 0; i < chunks.length; i++) {
var bytes = decoder.decode(chunks[i]);
baos.write(bytes, 0, bytes.length);
}
var allBytes = baos.toByteArray();
// Decompress if gzipped
var isGzip = (allBytes.length > 2
&& (allBytes[0] & 0xFF) == 0x1F
&& (allBytes[1] & 0xFF) == 0x8B);
var imgBytes;
if (isGzip) {
var gzis = new Packages.java.util.zip.GZIPInputStream(
new Packages.java.io.ByteArrayInputStream(allBytes));
var out = new Packages.java.io.ByteArrayOutputStream();
var buf = Packages.java.lang.reflect.Array.newInstance(
Packages.java.lang.Byte.TYPE, 4096);
var n;
while ((n = gzis.read(buf)) != -1) out.write(buf, 0, n);
gzis.close();
imgBytes = out.toByteArray();
} else { imgBytes = allBytes; }
var bais = new Packages.java.io.ByteArrayInputStream(imgBytes);
var image = this.ImageIO.read(bais);
if (image == null) throw new Error("Unable to read image");
response.barcodes = this._decodeMultiple(image);
} catch (e) {
response.status = "error";
response.error = "" + e.message;
}
return JSON.stringify(response);
},
type: "BarcodeScanner"
};
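When debugging, it can help to reproduce the MID-side byte handling off-platform. The following Node.js sketch mirrors what scanPDF and scanImage do before handing bytes to PDFBox or ImageIO: decode each Base64 chunk separately, concatenate the bytes, and gunzip only if the result starts with the gzip magic bytes 0x1F 0x8B. It assumes, as the MID code does, that each chunk was Base64-encoded independently. decodeChunks and encodeChunks are hypothetical names, and zlib is Node's built-in module, so this does not run on the MID Server itself.

```javascript
var zlib = require("zlib");

// Mirror of the MID-side logic: chunksJson is a JSON array of Base64 strings,
// each encoding an independent slice of the (possibly gzipped) file.
function decodeChunks(chunksJson) {
  var chunks = JSON.parse(chunksJson);
  var buffers = chunks.map(function (c) { return Buffer.from(c, "base64"); });
  var all = Buffer.concat(buffers);
  // Gzip streams always start with 0x1F 0x8B.
  var isGzip = all.length >= 2 && all[0] === 0x1F && all[1] === 0x8B;
  return isGzip ? zlib.gunzipSync(all) : all;
}

// Build test input in the shape the MID code expects:
// optionally gzip, slice into chunks, Base64-encode each slice on its own.
function encodeChunks(bytes, chunkSize, gzip) {
  var data = gzip ? zlib.gzipSync(bytes) : bytes;
  var chunks = [];
  for (var i = 0; i < data.length; i += chunkSize) {
    chunks.push(data.subarray(i, i + chunkSize).toString("base64"));
  }
  return JSON.stringify(chunks);
}

var original = Buffer.from("%PDF-1.7 fake content for a round-trip test");
var roundTrip = decodeChunks(encodeChunks(original, 7, true));
console.log(roundTrip.equals(original)); // true
```

Because each slice is encoded on its own, the chunk size does not need to be a multiple of 3; the Base64 padding stays inside each chunk.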
Step 4: Test Barcode Scanning
4A: Test with a PDF Containing Barcodes
// Get chunks for a PDF attachment
var helper = new global.GlobalAttachmentHelper();
var chunksJson = helper.getAttachmentChunksJson("PDF_ATTACHMENT_SYSID");
// Send barcode scan probe to MID Server
var script = 'var scanner = new BarcodeScanner();' +
'var result = scanner.scanPDF(probe.getParameter("chunks"), 600);' +
'result;';
var eccId = helper.submitMIDProbe("YOUR_MID_SERVER", "BarcodeScan",
script, JSON.stringify({ chunks: chunksJson }));
gs.info("ECC ID: " + eccId);
// Check result after 30 seconds:
var output = helper.getProbeResult("ECC_ID_HERE");
gs.info("Result: " + output);
Sample Output
{
"status": "success",
"pages": [
{
"page": 1,
"barcodes": [
{ "text": "62449010030203", "format": "CODE_128" },
{ "text": "64624619978346", "format": "CODE_128" }
]
}
]
}
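Downstream code usually wants a flat list rather than the per-page structure. A minimal sketch, assuming the scanPDF JSON shape shown above; flattenBarcodes is a hypothetical helper name. It keeps the page number for traceability, and pages with no barcodes (or an error) simply contribute nothing.

```javascript
// Flatten a scanPDF response into [{ page, text, format }, ...].
function flattenBarcodes(resultJson) {
  var result = JSON.parse(resultJson);
  if (result.status !== "success") {
    throw new Error("Barcode scan failed: " + result.error);
  }
  var flat = [];
  result.pages.forEach(function (p) {
    (p.barcodes || []).forEach(function (b) {
      flat.push({ page: p.page, text: b.text, format: b.format });
    });
  });
  return flat;
}

// The sample output above, as the MID Server would return it
var sample = JSON.stringify({
  status: "success",
  pages: [{ page: 1, barcodes: [
    { text: "62449010030203", format: "CODE_128" },
    { text: "64624619978346", format: "CODE_128" }
  ] }]
});
var gtinValues = flattenBarcodes(sample).map(function (b) { return b.text; });
console.log(gtinValues); // [ '62449010030203', '64624619978346' ]
```

An array like gtinValues is the kind of input the GTIN matching in Step 6 works from.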
4B: Test with an Image (PNG/JPG)
var helper = new global.GlobalAttachmentHelper();
var chunksJson = helper.getAttachmentChunksJson("IMAGE_ATTACHMENT_SYSID");
var script = 'var scanner = new BarcodeScanner();' +
'var result = scanner.scanImage(probe.getParameter("chunks"));' +
'result;';
var eccId = helper.submitMIDProbe("YOUR_MID_SERVER", "ImageScan",
script, JSON.stringify({ chunks: chunksJson }));
gs.info("ECC ID: " + eccId);
Sample Output
{
"status": "success",
"barcodes": [
{ "text": "16719618981739", "format": "CODE_128" }
]
}
Step 5: Combined Text + Barcode Extraction
For production use, extract both text and barcodes in a single MID Server call so the same attachment data is not sent twice. Add this method to the BarcodeScanner Script Include, before the type property:
/*
* Extract text AND scan barcodes in one probe call
* Requires both PDFBox and ZXing
*/
extractAll: function(chunksJson, dpi) {
dpi = dpi || 600;
var response = { status: "success", text: {}, barcodes: {} };
try {
var textResult = JSON.parse(
this.extractText(chunksJson)); // Part 1's extractText(), copied into this Script Include
var barcodeResult = JSON.parse(
this.scanPDF(chunksJson, dpi));
response.text = textResult;
response.barcodes = barcodeResult;
} catch (e) {
response.status = "error";
response.error = "" + e.message;
}
return JSON.stringify(response);
},
⚠ This method processes the PDF twice (once for text, once for barcode rendering). For very large PDFs (50+ pages), consider extracting text first and only scanning pages that need barcode detection.
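One way to act on that advice, sketched under assumptions: if your text extraction can return text per page (an assumption about Part 1's output), a regular expression for your serial format can pick out the pages worth rendering. pagesToScan and SERIAL_PATTERN are hypothetical; adapt the pattern to your documents. scanPDF as written scans every page, so you would pass the selected indices into a modified version of it.

```javascript
// Return 0-based indices of pages whose text contains a serial number.
// Assumed serial format: 4 digits, hyphen, 6 digits (e.g. 8346-101219).
var SERIAL_PATTERN = /\b\d{4}-\d{6}\b/;

function pagesToScan(pageTexts) {
  var indices = [];
  for (var i = 0; i < pageTexts.length; i++) {
    if (SERIAL_PATTERN.test(pageTexts[i])) indices.push(i);
  }
  return indices;
}

var pageTexts = [
  "Purchase order cover sheet",
  "Titanium Femoral Stem  SN 8346-101219",
  "Terms and conditions",
  "Ceramic Acetabular Cup  SN 0203-101277"
];
console.log(pagesToScan(pageTexts)); // [ 1, 3 ]
```

The indices are 0-based, matching the page index that renderImageWithDPI() expects, so only those pages need to be rendered at 600 DPI.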
Step 6: Correlating Barcodes with Extracted Records
The most valuable insight from this implementation: barcode values can be automatically linked to the correct line item using a prefix matching algorithm.
The Pattern
In the supply chain documents processed here, barcode-tracked items (medical implants, serialized devices) carry both a serial number in the text and a GTIN barcode. The serial number’s first segment appears within the barcode’s GTIN value:
Item Serial (from text) GTIN (from barcode)
Titanium Femoral Stem 8346-101219 64624619978346
Ceramic Acetabular Cup 0203-101277 62449010030203
Notice: serial prefix “8346” is the last four digits of GTIN “64624619978346” (6462461997 + 8346), and “0203” is the last four digits of “62449010030203” (6244901003 + 0203).
Matching Algorithm
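// After parsing text records and barcode results:
// records = parsed line items; barcodeResult = parsed scanPDF JSON
var gtinValues = []; // Collected from barcode scan results
for (var p = 0; p < barcodeResult.pages.length; p++) {
var pageCodes = barcodeResult.pages[p].barcodes || [];
for (var c = 0; c < pageCodes.length; c++) gtinValues.push(pageCodes[c].text);
}
// For each text record that has a serial number:
for (var r = 0; r < records.length; r++) {
var record = records[r];
if (!record.serial_number || gtinValues.length === 0) continue;
var serialPrefix = record.serial_number.split("-")[0];
for (var g = 0; g < gtinValues.length; g++) {
if (gtinValues[g].indexOf(serialPrefix) > -1) {
record.gtin_number = gtinValues[g];
gtinValues.splice(g, 1); // Remove matched GTIN
break;
}
}
}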
The splice() call ensures each GTIN is matched only once. This prevents a single GTIN from being assigned to multiple records when several serial numbers share the same prefix.
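indexOf() accepts the prefix anywhere in the GTIN. In the examples above the serial prefix is always the GTIN's last four digits, so if your documents follow that pattern, anchoring the check to the end of the GTIN reduces accidental matches. A sketch under that assumption: matchGtins and its useSuffix flag are hypothetical, and the GTIN "12083460000203" is made up purely to show a mid-string collision. It uses slice() rather than endsWith() so it also runs in an ES5 engine.

```javascript
// Assign each GTIN to at most one record whose serial prefix it contains.
// useSuffix = true requires the prefix to be at the END of the GTIN.
function matchGtins(records, gtinValues, useSuffix) {
  var remaining = gtinValues.slice(); // don't mutate the caller's array
  records.forEach(function (record) {
    if (!record.serial_number || remaining.length === 0) return;
    var serialPrefix = record.serial_number.split("-")[0];
    if (!serialPrefix) return; // nothing to match on
    for (var g = 0; g < remaining.length; g++) {
      var gtin = remaining[g];
      var hit = useSuffix
        ? gtin.slice(-serialPrefix.length) === serialPrefix
        : gtin.indexOf(serialPrefix) > -1;
      if (hit) {
        record.gtin_number = gtin;
        remaining.splice(g, 1); // each GTIN is used once
        break;
      }
    }
  });
  return records;
}

// Made-up GTIN "12083460000203" contains "8346" mid-string and ends in "0203".
var gtins = ["12083460000203", "64624619978346"];
function freshRecords() {
  return [{ serial_number: "8346-101219" }, { serial_number: "0203-101277" }];
}
console.log(matchGtins(freshRecords(), gtins, false));
console.log(matchGtins(freshRecords(), gtins, true));
```

With indexOf(), the first record takes the wrong GTIN and the second is left unmatched; with the suffix check, both records pair correctly.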
Step 7: Integrate into the Automated Pipeline
Update the Submit PDF Extraction Action
Modify the Flow Designer action to classify attachments and call the appropriate method:
// Inside the action script:
var fileType = "" + gr.file_type;
var dp = new YOUR_SCOPE.DocumentProcessor();
if (fileType === "pdf") {
// Combined text + barcode extraction
eccId = dp.submitFullExtraction("" + gr.sys_id);
} else if (fileType === "image") {
// Image barcode scan only
eccId = dp.submitImageBarcodeScan("" + gr.sys_id);
}
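If file_type is not already populated on your record, one option is to classify by the file's leading bytes instead of trusting the file extension. A minimal sketch, assuming you can obtain the first few bytes as an array of numbers; detectFileType is hypothetical. The signatures are standard (PDF starts with "%PDF", PNG with 0x89 "PNG", JPEG with 0xFF 0xD8 0xFF), but some PDFs have a few junk bytes before "%PDF", which this sketch does not handle.

```javascript
// Classify a file as "pdf", "image" or null from its first bytes.
// Bytes may be signed (a Java byte[] gives -128..127), so mask each with 0xFF.
function detectFileType(bytes) {
  function b(i) { return i < bytes.length ? (bytes[i] & 0xFF) : -1; }
  // "%PDF" = 0x25 0x50 0x44 0x46
  if (b(0) === 0x25 && b(1) === 0x50 && b(2) === 0x44 && b(3) === 0x46) {
    return "pdf";
  }
  // PNG signature starts 0x89 "PNG" = 0x89 0x50 0x4E 0x47
  if (b(0) === 0x89 && b(1) === 0x50 && b(2) === 0x4E && b(3) === 0x47) {
    return "image";
  }
  // JPEG starts 0xFF 0xD8 0xFF
  if (b(0) === 0xFF && b(1) === 0xD8 && b(2) === 0xFF) {
    return "image";
  }
  return null; // unknown: skip, or route for manual review
}

console.log(detectFileType([0x25, 0x50, 0x44, 0x46, 0x2D])); // pdf
console.log(detectFileType([-119, 80, 78, 71]));              // image (signed PNG bytes)
console.log(detectFileType([0x50, 0x4B, 0x03, 0x04]));        // null (ZIP)
```

Returning null for unknown types keeps unsupported files (Office documents, archives) out of both the PDFBox and the ImageIO path.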
Update the Process PDF Result Action
Handle both PDF and image results:
if (fileType === "pdf") {
// Result has both text and barcodes
var barcodeData = result.barcodes || null;
var records = dp.parseExtractedText(result.text.fullText, barcodeData);
dp.insertRecords(records, emailSysId, fileName);
} else if (fileType === "image") {
// Result has barcodes only
if (result.barcodes && result.barcodes.length > 0) {
for (var b = 0; b < result.barcodes.length; b++) {
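// Create record with GTIN from barcode (a separate variable, so the
// outer gr record used elsewhere in the action is not overwritten)
var target = new GlideRecord("your_target_table");
target.initialize();
target.gtin_number = result.barcodes[b].text;
target.source_file = fileName;
target.insert();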
}
}
}
Troubleshooting
Problem Solution
No barcode found at 300 DPI Increase to 600 DPI. Small or thin-bar barcodes need higher pixel resolution for ZXing to decode.
ClassNotFound: BufferedImageLuminanceSource Missing javase-3.5.3.jar. This JAR provides the bridge between Java images and ZXing.
PDFRenderer is not a function PDFBox JAR was deleted by FileSync. Re-register in ecc_agent_jar and restart MID.
NotFoundException from ZXing No barcode detected in image. Normal for pages without barcodes. Wrapped in try/catch per page.
Only one barcode found per page MultiFormatReader.decode() returns only the first match. Use GenericMultipleBarcodeReader.decodeMultiple() instead.
Wrong barcode format detected Restrict POSSIBLE_FORMATS hint to only the formats you expect. Fewer formats = faster and more accurate.
GTIN not matching to records Check serial number format. Prefix matching assumes format like '8346-101219' where first segment is the lookup key.
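Large PDF times out Rendering at 600 DPI is memory-intensive: a US Letter page becomes a 5100 × 6600 pixel image, about 135 MB in RGB. For 50+ page PDFs, process specific pages, or reduce DPI to 400 at some cost in detection reliability.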
Image barcode works but PDF barcode fails PDF rendering issue. Check that PDFBox can render the page: try text extraction first to verify the PDF loads.
Key Takeaways
1. DPI is everything: In testing, 300 DPI failed and 600 DPI worked. This single setting made the difference between barcode detection working and not working on PDF documents.
2. Use GenericMultipleBarcodeReader: The basic MultiFormatReader.decode() only finds the first barcode per image. Real documents often have multiple barcodes per page.
3. TRY_HARDER matters: This hint tells ZXing to trade speed for thoroughness. It scans more rows of the image and, for 1D formats, also retries with the image rotated 90 degrees, which catches barcodes that a quick pass misses.
4. GTIN correlation via serial prefix: The most valuable non-obvious insight. Barcode GTIN values can be automatically matched to the correct line item by searching for the serial number’s first segment within the GTIN string.
5. Combined extraction saves round-trips: Running text extraction and barcode scanning in a single MID Server probe call sends the attachment data through the ECC Queue once instead of twice and needs one probe round-trip instead of two.
6. Image vs PDF is a routing decision: PNG/JPG images go directly to ZXing. PDFs go through PDFBox rendering first, then ZXing. Your Flow Designer action should classify and route accordingly.
