

When I started working for ServiceNow, I was struck by how fast the Knowledge Management system was. This is far from universal in our industry, so I began to consider why.
Have you ever written an article or piece of text in Word and pasted it directly into a website CMS, say WordPress? All the formatting copies across, even images – pretty nifty, right? Well, if you’ve ever examined the resulting HTML, you’ll know it’s not really. I know the expression is totally overused, but I’m embracing it today: a picture is worth a thousand words. Or perhaps a thousand kilobytes. Now, I’m all in favour of using images to explain a problem or illustrate a point in a knowledge article. Screenshots, diagrams: if it clarifies the solution, by all means go for it.
But images tend to be big and not always easy to store. Back in the day when I was still building HTML pages by hand, adding an image was quite an undertaking: you resized it, uploaded it to FTP, got the URL, updated your image tag, set all the attributes. It wasn’t easy and it was pretty time-consuming.
Then Microsoft Word came along and we all started writing our documents with all the fancy formatting Word allows and just copied and pasted everything right into the rich-text editor. Not only did it retain all the formatting, it also copied the images. But as easy as this was in the user interface, the result was often disastrous under the bonnet. It might look clean to the uninitiated, but it ballooned the size of the resulting HTML file. I know what you’re thinking: does document size really matter? Surely a few kilobytes are not going to make much of a difference. Let’s have a look at exactly how Word behaves.
Behind the (HTML) scenes of copying/pasting from Word
Word uses a fair amount of formatting which isn’t really necessary, and it adds up. A 10 KB article easily becomes a 50 KB article. Sure, we’re still talking kilobytes, but let’s check what’s happening to those images that are so crucial to our article. You see, that whole manual process of uploading images to a dedicated folder made sense, and it kept the article body nice and lean. That’s not something that happens automatically when you copy/paste from Word. To keep the images inside a single document, the pasted HTML uses base-64 encoding. This makes it possible to include images in the HTML page itself: the images become part of the HTML and no additional files have to be downloaded. The encoding turns a file into a long series of letters, digits and a few symbols which is safe to use in HTML. So rather than this:
<p><img src="My%20Photo.jpg" align="baseline" border="1" /></p>
You’d see this:
<p><img src=data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wgARCAKAAyADASIAAhEBAxEB/8QAGwAAAgMBAQEAAAAAAAAAAAAAAwQBAgUABgf/xA...
Now here’s the problem: some of these images are really big, and base-64 encoding doesn’t compress the file at all. In fact, it makes the data roughly a third larger, because every three bytes of the image become four characters of text. So the resulting HTML code is really large as well. Add to this the less-than-ideal formatting Word generates and you end up with sizable HTML documents. For example, for a one-page Word document with a series of images, the combined size of the HTML can easily reach 20 MB. Not something you want to store in your Knowledge Management system.
The counter argument is that HTML with in-line images loads faster, but that doesn’t work for Knowledge Management. To process knowledge records, the data is passed from the database to the platform via a series of HTTP calls, and for this to work the records need to be reasonably small. You can see this when you click on View Article in a knowledge record: to display the article, the system gets the data and sends it as the payload of a POST request. The system is designed to handle small, compact articles; that's how it retains its good performance. Very large articles slow it down.
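To put a number on that size growth, here is a minimal Python sketch. It is only an illustration: the 300,000 random bytes stand in for a real screenshot, and the function name is made up for this example.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return how many times larger the base-64 text is than the raw bytes."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Assumption: random bytes as a stand-in for an already-compressed JPEG.
image_bytes = os.urandom(300_000)
ratio = base64_overhead(image_bytes)

print(f"raw: {len(image_bytes)} bytes")
print(f"encoded: {len(base64.b64encode(image_bytes))} characters")
print(f"growth factor: {ratio:.2f}")
```

The growth factor comes out at about 1.33 whatever the image is, since the encoding maps each group of three bytes to four characters.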
What happens when you add an image to a knowledge article?
So how do we ensure this doesn’t happen? Don’t worry, the days of uploading images via FTP are long behind us, and we came up with a few clever solutions. If you create an article from scratch, the rich-text editor has a button to add an image. In the background it uploads the image to a table and references the image in the HTML code:
<img src="sys_attachment.do?sys_id=76bef1dedb4…
I’m sure you expected as much, but check what happens when you copy/paste a document directly from Word.
Pretty impressive, right? It doesn’t just copy/paste the HTML version of the Word document: it extracts all the images, uploads them to the sys_attachment table and replaces them with references. To top it all off, it even offers to clean up the formatting.
Nice lean HTML code with no long base-64 encoding. Just the way we like it. It’s worth adopting as a best practice when your users prefer to write their articles in Word: always allow the rich-text editor to clean up the formatting. That’s how we keep these articles compact and that’s how we ensure Knowledge Management performs well.
An exception to the rule
There’s one caveat to this story, and that’s when you’re importing articles from other knowledge management applications. We take every precaution to ensure articles never end up with those large base-64 data strings, but the same doesn’t always go for third-party applications. If articles containing in-line images are imported in the background, they’ll render correctly, but they might cause delays as they can be quite large. This is something you need to consider before you import your articles. Work with the provider to ensure images are separated from the content.
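If you want to check or pre-process an export yourself before importing, the idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how the platform does it: the function name and the image_1.png style file names are hypothetical placeholders, and in a real import you would upload each image to the target system and use whatever reference it gives you. The pattern only handles quoted src attributes.

```python
import base64
import re

# Matches an <img> tag whose quoted src is a base-64 data URI.
DATA_URI_IMG = re.compile(
    r'(<img\b[^>]*?\bsrc=)(["\'])data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)\2',
    re.IGNORECASE,
)


def extract_inline_images(html):
    """Return (lean_html, images); images is a list of (file_name, raw_bytes)."""
    images = []

    def _swap(match):
        prefix, quote, subtype, payload = match.groups()
        # Hypothetical placeholder name; a real import would use the
        # reference returned by the target system after uploading.
        name = f"image_{len(images) + 1}.{subtype.lower()}"
        images.append((name, base64.b64decode(payload)))
        return f"{prefix}{quote}{name}{quote}"

    lean_html = DATA_URI_IMG.sub(_swap, html)
    return lean_html, images


# Usage: a tiny fake payload standing in for a real image.
payload = base64.b64encode(b"fake-image-bytes").decode("ascii")
article = f'<p><img src="data:image/png;base64,{payload}" border="1" /></p>'
lean_article, extracted = extract_inline_images(article)
print(lean_article)
print([(name, len(data)) for name, data in extracted])
```

Even if you never rewrite anything, counting the matches per article is a quick way to spot which exported articles still carry in-line images.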
Best wishes and until next time,
Justin