Preparing your articles for consumption by AI

Leri Andrews · 10-05-2023

Hello all,

I am keen to future-proof my articles for consumption by technologies such as Moveworks or ServiceNow's own AI search assist. How can I optimise my content so that these technologies can trawl it for relevant 'snippets' to surface in a chatbot, for example, or so that a generative AI product can rewrite a suitable answer based on information it has found in my articles?

Articles I have seen suggest some basic strategies I already follow such as:

1. Don't present text as images

2. Avoid putting content in attached PDFs

3. Write well (active voice, short sentences, no jargon, etc.)

4. Use heading styles to structure the content

Based on your own experiences in this area, is there anything else you would add? Have you run into issues with content in tables, for example? Do templated articles or knowledge blocks cause issues? Have you encountered difficulties working in Chinese or Korean?

Many thanks

Js15 · 10-06-2023

I'm happy to see you asking this question, though I wish there were more information available. I recently watched some material about the Vancouver release, and it got me excited about the possibility of using AI/LLMs trained against our knowledge base and Incident/Task records for automated support.

I started looking at Microsoft's documentation on Azure OpenAI, since it was listed as working with ServiceNow. On the page about using your own data with Azure OpenAI, there is a snippet that states:

Azure OpenAI on your data supports the following filetypes:

.txt
.md
.html
Microsoft Word files
Microsoft PowerPoint files
PDF
There is an upload limit, and there are some caveats about document structure and how it might affect the quality of responses from the model:

The model provides the best citation titles from markdown (.md) files.

If a document is a PDF file, the text contents are extracted as a preprocessing step (unless you're connecting your own Azure Cognitive Search index). If your document contains images, graphs, or other visual content, the model's response quality depends on the quality of the text that can be extracted from them.
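The note about Markdown giving the best citation titles hints at why heading structure matters. As a minimal sketch, assuming a retrieval pipeline that splits each article into one chunk per heading and uses the heading text as that chunk's title (an illustration only, not Azure's actual preprocessing), it might look like this in Python. The function name `split_by_headings` is hypothetical.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Reset your password".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headings(markdown: str) -> list[tuple[str, str]]:
    """Split a Markdown document into (title, body) chunks, one per heading.

    Text that appears before the first heading is filed under "(untitled)".
    """
    chunks: list[tuple[str, str]] = []
    title = "(untitled)"
    lines: list[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk (if it has any text) before starting a new one.
            body = "\n".join(lines).strip()
            if body:
                chunks.append((title, body))
            title = match.group(2)
            lines = []
        else:
            lines.append(line)
    # Close the final chunk.
    body = "\n".join(lines).strip()
    if body:
        chunks.append((title, body))
    return chunks


article = """# Reset your password
Go to the portal and select Forgot password.

## If you are locked out
Contact the Service Desk.
"""

for title, body in split_by_headings(article):
    print(f"{title}: {body}")
# Reset your password: Go to the portal and select Forgot password.
# If you are locked out: Contact the Service Desk.
```

Each chunk keeps its own heading, so a search step can return a focused passage with a meaningful title. An article with no headings comes back as a single untitled chunk, which is one plausible reason heading styles help.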

Perhaps there is good information there that also applies to other AI models? I also asked ChatGPT how best to format articles for indexing by an LLM. It said:

To ensure that your company's knowledge base is effectively indexed and accessible by a large language model like ChatGPT, it's essential to structure and format your documents in a way that facilitates natural language understanding and retrieval. Here are some guidelines:

Use Clear and Consistent Formatting:

Ensure consistent formatting throughout the documents. Use headings, subheadings, bullet points, and numbered lists to organize information.
Use a legible font and maintain a consistent font size.

Natural Language:

Write content in natural, plain language. Avoid overly technical jargon or complex terminology unless necessary.
Use complete sentences and paragraphs to present information coherently.

Structured Data:

Organize your documents into sections with clear headings and subheadings. This helps the language model understand the document's structure.
Use metadata such as tags or labels to categorize and classify documents.

Tables and Lists:

Utilize tables and lists for presenting structured data, such as specifications, comparisons, or step-by-step instructions.

Hyperlinks:

Include hyperlinks to relevant external sources or internal documents where necessary.
Ensure that hyperlinks are descriptive and provide context about the linked content.

Images and Diagrams:

Use images, diagrams, and charts when visual aids can enhance understanding. Ensure that these are appropriately labeled and described in the text.

Consistent Naming Conventions:

Adopt a consistent naming convention for your documents and files. Clear file names make it easier to search for and retrieve specific documents.

Searchable Keywords and Phrases:

Incorporate relevant keywords and key phrases naturally throughout the document. This helps in retrieving specific information during searches.

Quality Control:

Review and proofread documents for accuracy and clarity.
Keep documents up to date to reflect the latest information.

Document Metadata:

Include metadata such as document titles, author names, creation dates, and version numbers. This information can be useful for search and organization.

Version Control:

Implement a version control system for your documents to keep track of revisions and updates.

Accessibility:

Ensure that your documents are accessible to all users, including those with disabilities. Use accessible document formats and provide alternative text for images and diagrams.

Document Size:

Avoid extremely long documents, as they may be more challenging to process for both humans and language models. Consider breaking lengthy topics into smaller, more focused documents.

Consider Document Encoding:

If your knowledge base relies on a specific document format, such as HTML or Markdown, ensure that it adheres to standard practices for those formats.

User Feedback:

Gather feedback from users to continually improve the usability and accessibility of your knowledge base.

By following these guidelines, you can create a knowledge base that is well-structured and easily digestible by large language models like ChatGPT, making it more effective for indexing and retrieval of information. Additionally, the structured nature of the content will enhance the overall user experience.

Some of this is obvious, and I would hope most teams' knowledge articles already follow these principles, but there are some good reminders and things to think about.

Sorry, I can't share any first-hand experience with this. Our org only upgrades to every other ServiceNow release, and I am chomping at the bit to play with the new Vancouver Virtual Agent toys, but alas I will have to wait for Washington. In the meantime, we can use this time to prepare for the coming changes. Looking forward to seeing what thoughts everyone has.

Laurent5 · 06-20-2024

This is a very interesting topic, and as we work on more customer engagements, best practices are starting to emerge.

All the points mentioned by @Leri Andrews are very valid.

With regard to images, as we are now starting to see more multi-modal LLMs, I suspect it won't be long before images are also taken into account by the LLM (we've already seen some examples of this for workflow creation at K24).

Another important element to consider is ensuring the KB articles use the same terminology/intents as users are likely to use in their requests (although this can be mitigated with NLU, synonyms, etc., it is easier to handle at the source).
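The synonym mitigation mentioned above can be pictured as a query-rewriting step. This is a minimal Python sketch, assuming a hand-maintained synonym table that maps user phrasing to the terms the knowledge base actually uses before the search runs. `SYNONYMS` and `normalise_query` are hypothetical names, not a ServiceNow NLU or AI Search API.

```python
import re

# Hypothetical synonym table: user phrasing -> term used in the knowledge base.
SYNONYMS = {
    "pwd": "password",
    "passcode": "password",
    "laptop": "computer",
    "vpn client": "remote access",
}


def normalise_query(query: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Rewrite user terms to the knowledge base's preferred terminology."""
    result = query.lower()
    # Replace longer phrases first so a multi-word term is not split by a shorter match.
    for term in sorted(synonyms, key=len, reverse=True):
        replacement = synonyms[term]
        # \b keeps "pwd" from matching inside a longer word.
        result = re.sub(rf"\b{re.escape(term)}\b", lambda _m: replacement, result)
    return result


print(normalise_query("How do I reset my pwd on my laptop?"))
# how do i reset my password on my computer?
```

In practice you would more likely lean on the search engine's own synonym features; the sketch only shows the idea of mapping user language onto KB language, and it is still easier if the article already uses the users' words.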

Finally, make sure the KB articles can be found easily and check the scoring, etc., because the LLM's output will only be as good as the content provided to it (i.e. the top search results retrieved for the prompt). So make sure you fine-tune AI Search so that it surfaces the right content.
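To make the "only as good as the top results" point concrete, here is a toy Python sketch. It assumes a simple term-count relevance score (not AI Search's actual ranking) and a hypothetical `top_k` helper that decides which articles would be handed to the LLM as context; the article numbers and texts are made up.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, article: str) -> int:
    """Toy relevance score: total occurrences of the query's terms in the article."""
    counts = Counter(tokens(article))
    return sum(counts[term] for term in set(tokens(query)))


def top_k(query: str, articles: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k article titles with a non-zero score, best first."""
    scores = {title: score(query, body) for title, body in articles.items()}
    ranked = sorted((t for t in scores if scores[t] > 0), key=scores.get, reverse=True)
    return ranked[:k]


kb = {
    "KB001 Reset your password": "To reset your password, open the portal and select Forgot password.",
    "KB002 Request a new laptop": "Order a laptop from the hardware catalog.",
    "KB003 Password policy": "Your password must be at least 12 characters long.",
}

# Only these articles would be passed to the LLM as context.
print(top_k("How do I reset my password?", kb))
# ['KB001 Reset your password', 'KB003 Password policy']
```

With this toy score, the password policy article lands in second place for a "reset my password" question even though it doesn't explain how to reset one. That is the kind of result worth catching when you check and tune the search scoring, since whatever ranks highest is all the LLM gets to work with.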

Per-Kristian BM · 06-27-2024

Very good points indeed. I would also add a point about old content and automating retirement of articles.

In our experience, to motivate users to use self-service, they have to feel that it offers them added service. This means the knowledge base needs to be more or less complete, i.e. most known issues have a knowledge article, not only the top 40. We found that the more articles we added, the greater the strain on the approvals for both publishing and retiring articles. We now work according to KCS (Knowledge-Centered Service), and since we started attaching KB articles to tickets and updating them frequently, we found that we could automate retirement with a script that runs once a month and retires every article that, in the last 12 months, has not been attached to a ticket, has not been updated, and has not been viewed more than 5 times. After we automated this in our knowledge base, we could concentrate more on publishing new content to fill the KB, which gives the ESC more value for users.
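The monthly retirement rule described above can be expressed as a simple predicate. This minimal Python sketch assumes a hypothetical `Article` record holding the relevant activity fields, approximates "12 months" as 365 days, and reads the rule as "retire only when none of the three signals is present". It does not use ServiceNow's scripting API; a real scheduled job on the platform would query the knowledge tables instead, and the article numbers below are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Article:
    """Hypothetical summary of a knowledge article's recent activity."""
    number: str
    last_attached: Optional[date]  # last time it was attached to a ticket, if ever
    last_updated: date
    views_last_12_months: int


def should_retire(article: Article, today: date, max_views: int = 5) -> bool:
    """Return True if, in the last 12 months, the article was not attached to a
    ticket, not updated, and not viewed more than max_views times."""
    cutoff = today - timedelta(days=365)  # "12 months" approximated as 365 days
    attached = article.last_attached is not None and article.last_attached >= cutoff
    updated = article.last_updated >= cutoff
    popular = article.views_last_12_months > max_views
    return not (attached or updated or popular)


today = date(2024, 6, 27)
articles = [
    Article("KB0010001", None, date(2022, 3, 1), 2),               # no recent activity
    Article("KB0010002", date(2024, 5, 2), date(2021, 1, 10), 0),  # attached recently
    Article("KB0010003", None, date(2023, 1, 5), 40),              # viewed often
]

to_retire = [a.number for a in articles if should_retire(a, today)]
print(to_retire)  # ['KB0010001']
```

Keeping the rule in one small function makes the thresholds (view count, time window) easy to review with knowledge owners before the job is allowed to retire anything automatically.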

Rachel38 · 06-28-2024

This is a very interesting topic. We started using HTML code to add expandable/collapsible sections to our knowledge articles. Do you think these will be compatible with AI consumption?

Laurie12 · 06-28-2024

I didn't know expandable/collapsible sections were possible. This is another topic, but could you tell me how you did this?
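On the expandable/collapsible question: `<details>` and `<summary>` are the standard HTML elements for a collapsible section, though whether Rachel's team used them or a script-based approach is an assumption. The minimal sketch below uses Python's built-in `html.parser` to show that text inside a collapsed `<details>` element is still part of the markup, so a simple text extractor picks it up. It cannot confirm how any particular AI search product handles such sections; `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text; collapsed <details> content is ordinary text here.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


article_html = """
<h2>Connect to Wi-Fi</h2>
<details>
  <summary>Troubleshooting steps</summary>
  <p>Forget the network, then reconnect.</p>
</details>
"""

print(extract_text(article_html))
# Connect to Wi-Fi Troubleshooting steps Forget the network, then reconnect.
```

Because the collapsed text is ordinary markup rather than content loaded on click, it is at least available to anything that reads the article's HTML. Sections that are populated by a script at view time would be a different story and worth testing separately.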