Remove HTML formatting in a script the same way the "Remove formatting" pop-up does

Jan Ujcic1
Tera Contributor

Hi,

When you paste text copied from somewhere else (Word, Outlook, etc.) into the comments field in the Workspace, you get the "Paste Formatting Options" pop-up (Remove formatting/Keep Formatting). Choosing "Remove formatting" usually removes the HTML tags. I would like to use the same or a similar function in one of my scripts.

 

Use case: I am retrieving journal fields from tickets and sending them to external systems using flows and scripts. Some of the external systems cannot render HTML, only Markdown. People often forget to choose "Remove formatting", which creates very messy text in the other system. I want to strip the HTML tags in the script that retrieves the latest journal entry. I could use regex, but I'm saving that as a last resort since it can become complicated given all the different things people can paste into that field.

 

I have already tried the function below. It does the job, but it also removes all structure from the text, such as line breaks, which often makes the text hard to read in other systems.

GlideSPScriptable().stripHTML(htmlText);

The "Remove formatting" option in the pop-up does a much better job, even if it is not perfect for my use case. Is there a way to use it directly? I know that ServiceNow uses the TinyMCE editor, but I'm having a hard time finding the exact function(s). I also welcome any other recommendations on how to handle this requirement.

1 ACCEPTED SOLUTION

-O-
Kilo Patron

I have a strong feeling the pop-up uses the browser DOM to strip the HTML.

A library called HTMLParser could help in transforming HTML into text while keeping the basic structure (one line per block element, with list items indented).

I took the library and added a function to it: HTMLtoText.

With the help of this function, the HTML

<p><span style="font-size: 10.0pt;">The x1 Carbon is Lenovo&#39;s lightest ThinkPad yet. It provides a QHD display that fights glare and weighs less than three pounds. Ideal for most computing tasks, and highly mobile. </span></p>
<p><span style="font-size: 10.0pt;">Technical Specs:</span></p>
<ul><li><span style="font-size: 10.0pt;">Intel core i5 processor</span></li><li><span style="font-size: 10.0pt;">512GB solid state drive (SSD) </span></li><li><span style="font-size: 10.0pt;">Backlit keyboard</span></li></ul>
<p><span style="font-size: 10.0pt;"><img style="align: baseline;" title="" src="picture.jpgx" alt="" width="256" height="229" align="baseline" border="" hspace="" vspace="" /></span></p>
<p> </p>
<p><span style="font-size: 10.0pt;"><img style="align: baseline;" title="" src="banking_services_login_bgpic.jpgx" alt="" width="564" height="376" align="baseline" border="" hspace="" vspace="" /></span></p>

prints like:

The x1 Carbon is Lenovo&#39;s lightest ThinkPad yet. It provides a QHD display that fights glare and weighs less than three pounds. Ideal for most computing tasks, and highly mobile.
Technical Specs:
	Intel core i5 processor
	512GB solid state drive (SSD)
	Backlit keyboard

 

And here's the code:

/*
 * HTML5 Parser By Sam Blowes
 *
 * Designed for HTML5 documents
 *
 * Original code by John Resig (ejohn.org)
 * http://ejohn.org/blog/pure-javascript-html-parser/
 * Original code by Erik Arvidsson, Mozilla Public License
 * http://erik.eae.net/simplehtmlparser/simplehtmlparser.js
 *
 * ----------------------------------------------------------------------------
 * License
 * ----------------------------------------------------------------------------
 *
 * This code is triple licensed using Apache Software License 2.0,
 * Mozilla Public License or GNU Public License
 *
 * ////////////////////////////////////////////////////////////////////////////
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License.  You may obtain a copy
 * of the License at http://www.apache.org/licenses/LICENSE-2.0
 *
 * ////////////////////////////////////////////////////////////////////////////
 *
 * The contents of this file are subject to the Mozilla Public License
 * Version 1.1 (the "License"); you may not use this file except in
 * compliance with the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS"
 * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
 * License for the specific language governing rights and limitations
 * under the License.
 *
 * The Original Code is Simple HTML Parser.
 *
 * The Initial Developer of the Original Code is Erik Arvidsson.
 * Portions created by Erik Arvidssson are Copyright (C) 2004. All Rights
 * Reserved.
 *
 * ////////////////////////////////////////////////////////////////////////////
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
 *
 * ----------------------------------------------------------------------------
 * Usage
 * ----------------------------------------------------------------------------
 *
 * // Use like so:
 * HTMLParser(htmlString, {
 *     start: function(tag, attrs, unary) {},
 *     end: function(tag) {},
 *     chars: function(text) {},
 *     comment: function(text) {}
 * });
 *
 * // or to get an XML string:
 * HTMLtoXML(htmlString);
 *
 * // or to get an XML DOM Document
 * HTMLtoDOM(htmlString);
 *
 * // or to inject into an existing document/DOM node
 * HTMLtoDOM(htmlString, document);
 * HTMLtoDOM(htmlString, document.body);
 *
 */

(function () {
	// Regular Expressions for parsing tags and attributes
	var startTag = /^<([-A-Za-z0-9_]+)((?:\s+[a-zA-Z_:][-a-zA-Z0-9_:.]*(?:\s*=\s*(?:(?:"[^"]*")|(?:'[^']*')|[^>\s]+))?)*)\s*(\/?)>/,
	    endTag = /^<\/([-A-Za-z0-9_]+)[^>]*>/,
	    attr = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)(?:\s*=\s*(?:(?:"((?:\\.|[^"])*)")|(?:'((?:\\.|[^'])*)')|([^>\s]+)))?/g;

	// Empty Elements - HTML 5
	var empty = make_map("area,base,basefont,br,col,frame,hr,img,input,link,meta,param,embed,command,keygen,source,track,wbr");

	// Block Elements - HTML 5
	var block = make_map("address,article,applet,aside,audio,blockquote,button,canvas,center,dd,del,dir,div,dl,dt,fieldset,figcaption,figure,footer,form,frameset,h1,h2,h3,h4,h5,h6,header,hgroup,hr,iframe,ins,isindex,li,map,menu,noframes,noscript,object,ol,output,p,pre,section,script,table,tbody,td,tfoot,th,thead,tr,ul,video");

	// Inline Elements - HTML 5
	var inline = make_map("a,abbr,acronym,applet,b,basefont,bdo,big,br,button,cite,code,del,dfn,em,font,i,iframe,img,input,ins,kbd,label,map,object,q,s,samp,script,select,small,span,strike,strong,sub,sup,textarea,tt,u,var");

	// Elements that you can, intentionally, leave open
	// (and which close themselves)
	var closeSelf = make_map("colgroup,dd,dt,li,option,p,td,tfoot,th,thead,tr");

	// Attributes that have their values filled in disabled="disabled"
	var fillAttrs = make_map("checked,compact,declare,defer,disabled,ismap,multiple,nohref,noresize,noshade,nowrap,readonly,selected");

	// Special Elements (can contain anything)
	var special = make_map("script,style");

	var HTMLParser = this.HTMLParser = function (html, handler) {
		var index,
		    chars,
		    match,
		    stack = [],
		    last = html;

		stack.last = function () {
			return this[this.length - 1];
		};

		while (html) {
			chars = true;

			// Make sure we're not in a script or style element
			if (!stack.last() || !special[stack.last()]) {
				// Comment
				if (html.indexOf("<!--") == 0) {
					index = html.indexOf("-->");

					if (index >= 0) {
						if (handler.comment)
							handler.comment(html.substring(4, index));

						html = html.substring(index + 3);
						chars = false;
					}
				}
				else
					if (html.indexOf("</") == 0) {
						match = html.match(endTag);

						if (match) {
							html = html.substring(match[0].length);
							match[0].replace(endTag, parseEndTag);
							chars = false;
						}
					}
					else
						if (html.indexOf("<") == 0) {
							match = html.match(startTag);

							if (match) {
								html = html.substring(match[0].length);
								match[0].replace(startTag, parseStartTag);
								chars = false;
							}
						}

				if (chars) {
					index = html.indexOf("<");

					var text = index < 0 ? html : html.substring(0, index);

					html = index < 0 ? "" : html.substring(index);

					if (handler.chars)
						handler.chars(text);
				}
			}
			else {
				html = html.replace(new RegExp("([\\s\\S]*?)<\/" + stack.last() + "[^>]*>"), function (all, text) {
					text = text.replace(/<!--([\s\S]*?)-->|<!\[CDATA\[([\s\S]*?)]]>/g, "$1$2");

					if (handler.chars)
						handler.chars(text);

					return "";
				});

				parseEndTag("", stack.last());
			}

			if (html == last)
				throw new Error("Parse Error: " + html);

			last = html;
		}

		// Clean up any remaining tags
		parseEndTag();

		function parseStartTag (tag, tagName, rest, unary) {
			tagName = tagName.toLowerCase();

			if (block[tagName])
				while (stack.last() && inline[stack.last()])
					parseEndTag("", stack.last());

			if (closeSelf[tagName] && stack.last() == tagName)
				parseEndTag("", tagName);

			unary = empty[tagName] || !!unary;

			if (!unary)
				stack.push(tagName);

			if (handler.start) {
				var attrs = [];

				rest.replace(attr, function (match, name) {
					var value = arguments[2] ? arguments[2] : arguments[3] ? arguments[3] : arguments[4] ? arguments[4] : fillAttrs[name] ? name : "";

					attrs.push({ 'name': name, 'value': value, 'escaped': value.replace(/(^|[^\\])"/g, '$1\\\"') });
				});

				if (handler.start)
					handler.start(tagName, attrs, unary);
			}
		}

		function parseEndTag (tag, tagName) {
			// If no tag name is provided, clean shop
			if (!tagName)
				var pos = 0;

			// Find the closest opened tag of the same type
			else
				for (var pos = stack.length - 1; pos >= 0; pos--)
					if (stack[pos] == tagName)
						break;

			if (pos >= 0) {
				// Close all the open elements, up the stack
				for (var i = stack.length - 1; i >= pos; i--)
					if (handler.end)
						handler.end(stack[i]);

				// Remove the open elements from the stack
				stack.length = pos;
			}
		}
	};

	this.HTMLtoXML = function (html) {
		var results = "";

		HTMLParser(html, {
			start: function (tag, attrs, unary) {
				results += "<" + tag;

				for (var i = 0; i < attrs.length; i++)
					results += " " + attrs[i].name + '="' + attrs[i].escaped + '"';

				results += ">";
			},

			end: function (tag) {
				results += "</" + tag + ">";
			},

			chars: function (text) {
				results += text;
			},

			comment: function (text) {
				results += "<!--" + text + "-->";
			}
		});

		return results;
	};

	this.HTMLtoDOM = function (html, doc) {
		// There can be only one of these elements
		var one = make_map("html,head,body,title");

		// Enforce a structure for the document
		var structure = { 'link': "head", 'base': "head" };

		if (!doc) {
			if (typeof DOMDocument != "undefined") {
				doc = new DOMDocument();
			}
			else
				if (typeof document != "undefined" && document.implementation && document.implementation.createDocument) {
					doc = document.implementation.createDocument("", "", null);
				}
				else
					if (typeof ActiveXObject != "undefined") {
						doc = new ActiveXObject("Msxml.DOMDocument");
					}
		}
		else
			doc = doc.ownerDocument || doc.getOwnerDocument && doc.getOwnerDocument() || doc;

		var elems = [],
		    documentElement = doc.documentElement || doc.getDocumentElement && doc.getDocumentElement();

		// If we're dealing with an empty document then we
		// need to pre-populate it with the HTML document structure
		if (!documentElement && doc.createElement)(function () {
			var html = doc.createElement("html");
			var head = doc.createElement("head");

			head.appendChild(doc.createElement("title"));
			html.appendChild(head);
			html.appendChild(doc.createElement("body"));

			doc.appendChild(html);
		})();

		// Find all the unique elements
		if (doc.getElementsByTagName)
			for (var i in one)
				one[i] = doc.getElementsByTagName(i)[0];

		// If we're working with a document, inject contents into
		// the body element
		var curParentNode = one.body;

		HTMLParser(html, {
			start: function (tagName, attrs, unary) {
				// If it's a pre-built element, then we can ignore
				// its construction
				if (one[tagName]) {
					curParentNode = one[tagName];

					if (!unary)
						elems.push(curParentNode);

					return;
				}

				var elem = doc.createElement(tagName);

				for (var attr in attrs)
					elem.setAttribute(attrs[attr].name, attrs[attr].value);

				if (structure[tagName] && typeof one[structure[tagName]] != "boolean")
					one[structure[tagName]].appendChild(elem);
				else
					if (curParentNode && curParentNode.appendChild)
						curParentNode.appendChild(elem);

				if (!unary) {
					elems.push(elem);
					curParentNode = elem;
				}
			},

			end: function (tag) {
				elems.length -= 1;

				// Init the new parentNode
				curParentNode = elems[elems.length - 1];
			},

			chars: function (text) {
				curParentNode.appendChild(doc.createTextNode(text));
			},

			comment: function (text) {
				// create comment node
			}
		});

		return doc;
	};

	this.HTMLtoText = function (html) {
		var indent = 0,
		    text = '',
		    text_arr = [];

		HTMLParser(html, {
			start: function (tag, attr_arr, is_unary) {
				// Treat <br> as a line break too; it is not in the block map
				if (block[tag] || tag == 'br') {
					text = finalize_line(tag, text_arr, text, indent);

					if (is_list(tag))
						indent++;
				}
			},

			end: function (tag) {
				if (block[tag]) {
					text = finalize_line(tag, text_arr, text, indent);

					if (is_list(tag))
						indent--;
				}
			},

			chars: function (ch) {
				text = get_text(text, ch);
			}
		});

		if (text != '')
			text_arr.push(text);

		return text_arr.join('\n');
	};

	function finalize_line (tag, text_arr, text, indent) {
		if (text != '')
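			// new Array(n + 1).join('\t') instead of String.prototype.repeat (ES2015), for ES5-only engines
			text_arr.push(new Array(indent + 1).join('\t') + text);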

		return '';
	}

	function get_text (text, ch) {
		var head = text.trim(),
		    tail = ch.trim();

		if (supress_head_blank(head) || supress_tail_blank(tail))
			return [head, tail].filter(function (text) {
				return text != '';
			}).join('');
		else
			return [head, tail].filter(function (text) {
				return text != '';
			}).join(' ');
	}

	function is_list (tag) {
		return Boolean(~['dl', 'ol', 'ul'].indexOf(tag)).valueOf();
	}

	function make_map (str) {
		var obj = {},
		    items = str.split(",");

		for (var i = 0; i < items.length; i++)
			obj[items[i]] = true;

		return obj;
	}

	function supress_head_blank (text) {
		return Boolean(~['(', '{', '['].indexOf(text.substr(-1, 1))).valueOf();
	}

	function supress_tail_blank (text) {
		return Boolean(~[')', '}', ']', ',', '.', ';', ':', '!', '?'].indexOf(text.substr(0, 1))).valueOf();
	}
})();

var htmlString = '<p><span style="font-size: 10.0pt;">The x1 Carbon is Lenovo&#39;s lightest ThinkPad yet. It provides a QHD display that fights glare and weighs less than three pounds. Ideal for most computing tasks, and highly mobile. </span></p>\
<p><span style="font-size: 10.0pt;">Technical Specs:</span></p>\
<ul><li><span style="font-size: 10.0pt;">Intel core i5 processor</span></li><li><span style="font-size: 10.0pt;">512GB solid state drive (SSD) </span></li><li><span style="font-size: 10.0pt;">Backlit keyboard</span></li></ul>\
<p><span style="font-size: 10.0pt;"><img style="align: baseline;" title="" src="picture.jpgx" alt="" width="256" height="229" align="baseline" border="" hspace="" vspace="" /></span></p>\
<p> </p>\
<p><span style="font-size: 10.0pt;"><img style="align: baseline;" title="" src="banking_services_login_bgpic.jpgx" alt="" width="564" height="376" align="baseline" border="" hspace="" vspace="" /></span></p>';

print(HTMLtoText(htmlString));
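
The sample output above shows two things that may matter for a Markdown target: character references such as `&#39;` pass through HTMLtoText untouched, and list items come back tab-indented rather than as Markdown bullets. A possible post-processing step is sketched below. Both `decodeEntities` and `toMarkdownBullets` are hypothetical helpers, not part of the HTMLParser library; the entity table is deliberately tiny, and mapping `&nbsp;` to a plain space is an assumption about what the receiving system expects. The code sticks to ES5 so it does not depend on newer engine features.

```javascript
// Hypothetical helpers for post-processing the plain text; not part of HTMLParser.

// Small, incomplete table of named entities (extend as needed).
// Assumption: a non-breaking space may be sent as a regular space.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ' };

// Decode numeric (&#39; / &#x27;) and a few named (&amp;) character references.
function decodeEntities(text) {
	return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, function (all, body) {
		if (body.charAt(0) === '#') {
			var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
			var code = parseInt(body.substring(isHex ? 2 : 1), isHex ? 16 : 10);

			// Code points outside the Basic Multilingual Plane are left as-is
			return code > 0 && code <= 0xFFFF ? String.fromCharCode(code) : all;
		}

		// Unknown names are left untouched rather than guessed at
		return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body) ? NAMED_ENTITIES[body] : all;
	});
}

// Turn tab-indented lines into Markdown bullets:
// one tab is a top-level item, each extra tab adds two spaces of nesting.
function toMarkdownBullets(text) {
	return text.split('\n').map(function (line) {
		var match = /^(\t+)(.*)$/.exec(line);

		if (!match)
			return line;

		return new Array(match[1].length).join('  ') + '- ' + match[2];
	}).join('\n');
}
```

It would be applied to the converter's result, for example `toMarkdownBullets(decodeEntities(HTMLtoText(htmlString)))`. Decoding happens in a single pass, so `&amp;#39;` becomes the literal text `&#39;` rather than an apostrophe.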


3 REPLIES

Jan Ujcic1
Tera Contributor

Hi, thank you for the idea. I'll try to test this and see how it behaves in my situation. 

 

Do you know of any out-of-the-box (OOTB) options that ServiceNow or TinyMCE offers? I would be more inclined to use something that is in line with what ServiceNow is building rather than handle issues that might arise from using an old function, but it is still a great backup plan.

-O-
Kilo Patron

As far as I can tell, TinyMCE relies heavily on the DOM, which is not available server side. Besides the API you already mentioned, I believe GlideStringUtil also had such a method, stripHTML, but it does not "format" the output either, and it is no longer available in newer versions of ServiceNow. So given the constraint that the code must run server side, the option I have described is the only one I know of.
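
For the use case in the question (cleaning the latest journal entry in a server-side script), the wiring could look roughly like the sketch below. `cleanJournalEntry` is a hypothetical helper name. The converter is passed in as an argument so the snippet stays self-contained, and the `[code]` handling assumes the instance stores pasted HTML in journal fields wrapped in `[code]...[/code]` markers, which is worth verifying against real entries.

```javascript
// Hypothetical helper: prepares one journal entry and hands it to a converter.
// entry  - the raw journal text (may be null or empty)
// toText - a function that turns an HTML string into plain text, e.g. HTMLtoText
function cleanJournalEntry(entry, toText) {
	// Coerce to a JavaScript string and guard against null/undefined
	var body = String(entry || '');

	// Assumption: HTML in journal fields is wrapped in [code]...[/code] markers
	body = body.replace(/\[\/?code\]/gi, '');

	return toText(body);
}

// Possible use in a server-side script, assuming `current` is a GlideRecord
// with a comments journal field and the HTMLParser library is loaded:
//
// var latest = current.comments.getJournalEntry(1);
// var plain = cleanJournalEntry(latest, HTMLtoText);
```

Passing the converter in also makes it easy to swap HTMLtoText for another approach later without touching the calling script.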