Document extraction - Now Assist Document Intelligence
3 weeks ago
I have given an instruction to extract data from an invoice PDF, but I'm not getting any output. Any guidance is appreciated.
[
  'system',
  `You read invoice documents and return the information as a structured JSON. Use as many item lines as necessary. Output your answer as JSON that
matches the given schema: \n{schema}\n.`,
],
const schema = `
{
  invNum: string - 'The Invoice Number',
  invDate: string - 'The Date the Invoice Was Created',
  tax: number - 'Total Tax on the invoice can be 0',
  invTotal: number - 'Total Value of the invoice',
  freight: number - 'Freight or Shipping Charges on Invoice',
  supplier: {
    name: string - 'The Name of the Invoice Supplier',
    addr1: string - 'The Street of the Invoice Supplier',
    city: string - 'The City of the Invoice Supplier',
    state: string - 'The State of the Invoice Supplier',
    zip: string - 'The Zip Code of the Invoice Supplier',
    phone: string - 'The Phone of the Invoice Supplier',
    email: string - 'The Email of the Invoice Supplier',
    acctNum: string - 'The Account Number of the Receiver',
    contactName: string - 'The Contact Name of the Supplier',
  },
  lineItems: [
    {
      supPartNum: string - 'Supplier Part Number',
      itemDescr: string - 'Description of the Item on the Invoice',
      uom: string - 'The Unit of Measurement per quantity, default to EA (each)',
      qty: number - 'The Quantity Ordered',
      unitPrice: number - 'The Price Per Unit',
      extPrice: number - 'The Total of the Item',
    }
  ],
}`;
3 weeks ago
Hi @DhathriC,
To get the desired JSON output from invoice PDFs, make sure your instructions and your extraction tool are both aligned with the JSON schema you provide. Common reasons for getting no output include:
Schema Consistency: Confirm the schema you provide in the instruction exactly matches the expected structure and field types your extraction system can handle. The detailed schema you shared is comprehensive and should be compatible if the tool supports nested objects and arrays.
Clear Instruction: The instruction "You read invoice documents and return the information as a structured JSON" followed by the schema is correct, but adding explicit examples or highlighting the key fields to prioritize may improve the output.
Tool Capability: Ensure the extraction tool or AI model you are using supports:
  - Reading multi-line item tables as arrays.
  - Nested supplier details objects.
  - Parsing numeric fields correctly, including zero values like tax.
  - Outputting JSON formatted text only (no additional commentary).
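If the tool cannot be made to return JSON only, one workaround is to pull the JSON object out of the reply afterwards. Here is a minimal sketch in plain JavaScript. The function name extractJson and the sample reply are hypothetical, and this is not a Now Assist Document Intelligence API, just generic string handling:

```javascript
// Hypothetical helper: pull a JSON object out of a model reply that may
// contain commentary before or after it.
function extractJson(reply) {
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) {
    return null; // no JSON object found in the reply
  }
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch (err) {
    return null; // braces were present, but the content was not valid JSON
  }
}

// Sample reply with made-up values, for illustration only.
const sampleReply =
  'Here is the result: {"invNum": "INV-001", "tax": 0} Let me know if you need more.';
const parsed = extractJson(sampleReply);
console.log(parsed);
```

A null result means the reply contained no parsable JSON object, which is a useful signal in itself when troubleshooting.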
Input Quality: The PDF must be processed properly—if it is scanned or image-based, OCR quality is critical to recognize key elements like invoice number, dates, supplier info, and line items.
Troubleshooting Tips:
  - Test extraction on a simple invoice PDF to check if any output is generated.
  - Validate the JSON output manually by comparing it with the expected schema.
  - If possible, view model confidence or diagnostic logs from the extraction system.
  - Use smaller chunks of prompt or split by data sections (supplier, header, lines) if one full pass is problematic.
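The manual validation step can also be scripted. The sketch below checks a parsed result against the top-level field names and types from the schema in the question. The helper name checkInvoice and the sample values are made up for illustration, and it only covers the top level, not every nested field:

```javascript
// Expected top-level fields and their types, taken from the schema in the question.
const expectedTypes = {
  invNum: 'string',
  invDate: 'string',
  tax: 'number',
  invTotal: 'number',
  freight: 'number',
};

// Hypothetical helper: return a list of problems found in a parsed result.
function checkInvoice(result) {
  const problems = [];
  for (const [key, type] of Object.entries(expectedTypes)) {
    // typeof 0 is 'number', so a zero tax passes this check.
    if (typeof result[key] !== type) {
      problems.push(`${key}: expected ${type}, got ${typeof result[key]}`);
    }
  }
  if (typeof result.supplier !== 'object' || result.supplier === null) {
    problems.push('supplier: expected an object');
  }
  if (!Array.isArray(result.lineItems)) {
    problems.push('lineItems: expected an array');
  }
  return problems;
}

// Made-up sample: tax comes back as a string and lineItems is missing.
const issues = checkInvoice({
  invNum: 'INV-001',
  invDate: '2024-01-15',
  tax: '0',
  invTotal: 120.5,
  freight: 0,
  supplier: { name: 'Example Supplier' },
});
console.log(issues);
```

An empty list means the top-level structure looks right. Anything else points at the field to investigate first.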
If the system supports it, consider:
  - Including explicit extraction keywords matching your schema keys in the instruction.
  - Providing a few example JSON outputs of sample invoices.
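As a rough illustration of the example-based approach, the sketch below appends one worked example to the instruction text. The helper buildInstruction and every value in the example are invented, and whether your extraction system accepts examples in the instruction is an assumption you would need to verify:

```javascript
// Hypothetical helper: append one worked example to the instruction text.
function buildInstruction(baseInstruction, exampleOutput) {
  return [
    baseInstruction.trim(),
    'Return only JSON, with no extra commentary.',
    'Example output:',
    JSON.stringify(exampleOutput, null, 2),
  ].join('\n');
}

// All values below are invented for illustration; keys follow the schema in the question.
const exampleOutput = {
  invNum: 'INV-001',
  invDate: '2024-01-15',
  tax: 0,
  invTotal: 25,
  freight: 0,
  supplier: { name: 'Example Supplier', city: 'Springfield' },
  lineItems: [
    { supPartNum: 'P-100', itemDescr: 'Widget', uom: 'EA', qty: 5, unitPrice: 5, extPrice: 25 },
  ],
};

const instruction = buildInstruction(
  'You read invoice documents and return the information as a structured JSON.',
  exampleOutput
);
console.log(instruction);
```

Using the schema's own key names in the example keeps the instruction and the expected output consistent.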
In summary, your approach with a detailed schema is correct. The lack of output likely relates to parsing issues, mismatched prompt instructions versus tool capability, or input quality of the PDF. Trying a simpler schema or example-based prompt alongside validating input text could help isolate the problem, leading to correct JSON extraction results.
If you can indicate which system or extraction service you're using, more specific configuration or troubleshooting steps can be provided.
If this is helpful, please hit the thumbs-up button and accept it as the correct solution, so that others who refer to this thread in the future can benefit from it.
Thanks & Regards,
Mohammed Mustaq Shaik - ServiceNow Consultant - Let's connect on LinkedIn: https://www.linkedin.com/in/shaik-mohammed-mustaq-43ab63238/
3 weeks ago
I uploaded the invoice and the same instruction to an LLM directly, and it gives the desired output. When I put the same instruction in the field description in Now Assist Document Intelligence, I'm not able to get the data as JSON.
I am able to extract the data in text format.
