Narsing1
Mega Sage

Background: For a long time I looked for a reliable way to read HTML tags on the server side without using the DOM. Regular expressions can handle most of that work. The example below shows how regular expressions can fill the gap when no ready-made solution is available.

Example: I use the Microsoft Support search as the example, reading its results with regular expressions instead of a Microsoft Graph connector or the Microsoft Graph API. I have not found a solution for this scenario in either the Microsoft Graph API or the community (please leave a comment if one exists).

Microsoft Support website: https://support.microsoft.com

Here I searched for "Microsoft Teams" and got the matching results.

If you look at the search results URL, it will be something like this:

https://support.microsoft.com/en-US/search/results?query=microsoft+teams&isEnrichedQuery=false 
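
The URL pattern above can be reproduced from any search term. Here is a minimal sketch in plain JavaScript; the helper name `buildSearchUrl` is hypothetical, and it assumes the site keeps accepting the `query` and `isEnrichedQuery` parameters exactly as they appear in the URL above.

```javascript
// Hypothetical helper: rebuilds the results URL shown above for a search term.
function buildSearchUrl(term) {
    var base = "https://support.microsoft.com/en-US/search/results";
    // encodeURIComponent escapes spaces as %20; the site's own URLs use "+".
    var encoded = encodeURIComponent(term).replace(/%20/g, "+");
    return base + "?query=" + encoded + "&isEnrichedQuery=false";
}

var url = buildSearchUrl("microsoft teams");
// url === "https://support.microsoft.com/en-US/search/results?query=microsoft+teams&isEnrichedQuery=false"
```

Inside a ServiceNow script the encoding step would be done with the platform's own utilities; the sketch only shows how the term ends up in the query string.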

If you right-click the page and choose "View page source", you will see the full HTML of the results.

Now we will see how these results can be added as an "External Search Source" for the portal search in ServiceNow.

Open your instance, go to Portals, open the "SP" portal, and under "Search Sources" create a new search source called "Microsoft Support Search".

On this search source, copy the code below into "Search page template".

Search page template

<div>
  <a href="{{item.url}}" class="h4 text-primary m-b-sm block" target="_blank">
    <span ng-bind-html="highlight(item.primary, data.q)"></span>
    
  </a>
  <span>{{item.short_desc}}</span>
  
</div>

In "Data fetch script", use the script below. Make sure "Is scripted source" is checked, then copy the code and save the search source.

Data fetch script

(function(query) {
    var results = [];
    try {
        // Build the search URL from the URL-encoded portal query.
        var enQuery = GlideStringUtil.urlEncode(query);
        var eURL = "https://support.microsoft.com/en-US/search/results?query=" + enQuery;
        var ws = new sn_ws.RESTMessageV2();
        ws.setHttpMethod("get");
        ws.setEndpoint(eURL);
        var response = ws.execute();
        if (response) {
            // Stringify the body, then strip escaped line breaks, tabs,
            // backslashes, and quotes so the markup can be matched as one flat string.
            var responseBody = JSON.stringify(response.getBody());
            responseBody = responseBody.replaceAll('\\r', ' ');
            responseBody = responseBody.replaceAll('\\t', ' ');
            responseBody = responseBody.replaceAll('\\n', ' ');
            responseBody = responseBody.replaceAll('\\', '');
            responseBody = responseBody.replaceAll('"', '');
            // Mark the start of each result link, so that every result
            // sits between "data-si-area" and "data-bi-area".
            responseBody = responseBody.replaceAll("<a class=header href=", "data-si-area");
            var allurls = responseBody.match(/data-si-area(.*?)data-bi-area/g);
            if (JSUtil.notNil(allurls)) {
                for (var u = 0; u < allurls.length; u++) {
                    var mm = allurls[u].toString().replaceAll("data-bi-area", "");
                    mm = mm.replaceAll("data-si-area", "");
                    // Text before "aria-label=" is the URL; text after it is the title.
                    var labelurl = mm.split("aria-label=");
                    if (labelurl.length < 2) {
                        continue; // Skip links that have no aria-label.
                    }
                    var title = labelurl[1].toString().trim();
                    var sdesc = title.replaceAll("<b>", "");
                    try {
                        sdesc = removeTags(sdesc);
                    } catch (ee) {
                        sdesc = title.replaceAll("<b>", "");
                    }
                    sdesc = sdesc.replaceAll("</b>", "");
                    var rslt = {
                        "url": labelurl[0].toString().trim(),
                        "target": "_blank",
                        "primary": title,
                        "short_desc": sdesc
                    };
                    results.push(rslt);
                }
            }
        }
    } catch (e) {
        // Log the failure instead of silently returning no results.
        gs.error("Microsoft Support Search failed: " + e);
    }

    return results;
})(query);

function removeTags(str) {
    var a = str.replace(/<\/?[^>]+(>|$)/g, ""); 
    var b = a.replace(/&amp;/g, '&'); 
    return b.replace(/&#(\d+);/g, function(match, dec) {
        return String.fromCharCode(dec);
    });
}

Now go to the SP portal and search for "Microsoft teams", just as you did on the Microsoft Support site. You will see results similar to the ones on that site.

Note: The results may vary depending on how Microsoft ranks its articles at that moment, but everything you see comes from the Microsoft Support site, obtained just by reading the HTML response.

As the code above shows, the HTML tags are read entirely with regular expressions and string functions. If Microsoft Support changes its markup in the future, adjust the matching logic accordingly.
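
The same idea can be tried outside the instance. Below is a minimal sketch in plain JavaScript that pulls the link and the title out of anchor tags with a single regular expression and capture groups. The sample markup and the function name `extractResults` are invented for illustration; the real page's attributes and their order may differ, so treat the pattern as an assumption.

```javascript
// Invented sample markup; the real Microsoft Support page may look different.
var sampleHtml =
    '<a class="header" href="https://example.com/a" aria-label="First <b>result</b>" data-bi-area="x">First</a>' +
    '<a class="header" href="https://example.com/b" aria-label="Second result" data-bi-area="y">Second</a>';

// Hypothetical helper: collects the href and aria-label of every matching anchor.
function extractResults(html) {
    // Group 1 captures the href value, group 2 the aria-label value.
    var pattern = /<a class="header" href="([^"]*)" aria-label="([^"]*)"/g;
    var found = [];
    var m;
    while ((m = pattern.exec(html)) !== null) {
        found.push({ url: m[1], primary: m[2] });
    }
    return found;
}

var items = extractResults(sampleHtml);
// items has two entries, one per anchor in the sample markup.
```

Capture groups avoid the marker-replacement and split steps, but the pattern is just as sensitive to markup changes as the string-based approach.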

Results in Servicenow Portal

Please leave a comment if you know a better way to read HTML tags on the server side.

Thanks,

Narsing

Comments
Job1
Tera Contributor

Wonderful! We have been looking for close to 6 months for this! 

Thanks a lot!!!!

Job1
Tera Contributor

Hi

 

What was the reason for including this code?

 

function removeTags(str) {
    var a = str.replace(/<\/?[^>]+(>|$)/g, ""); 
    var b = a.replace(/&amp;/g, '&'); 
    return b.replace(/&#(\d+);/g, function(match, dec) {
        return String.fromCharCode(dec);
    });
}

It works fine for English searches, but a French search with special characters returns a bad result.

 

E.g., search term: créer un site dans sharepoint

 

It brings back this URL:

https://support.microsoft.com/fr-fr/office/cr&#xE9;er-un-site-communautaire-dans-sharepoint-8a890f58...

Narsing1
Mega Sage

Hi,

I am not converting the URL; it looks like the script takes the exact URL from the page source.

Narsing1_0-1666206781600.png

I think it needs a small tweak to convert the URL to Unicode, which should resolve the issue.

As for your other question, "removeTags" makes sure the short description contains no HTML tags. Some tags still came through despite the other filters, so I added this function to strip them.
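
To make the behaviour concrete, here is a small check of "removeTags" on two sample strings, runnable in any JavaScript engine. The sample inputs are made up for illustration. It shows that tags, "&amp;", and decimal character references are handled, while a hex reference such as "&#xE9;" is left as it is because the last pattern only matches digits.

```javascript
// Same function as in the Data fetch script.
function removeTags(str) {
    var a = str.replace(/<\/?[^>]+(>|$)/g, "");   // strip HTML tags
    var b = a.replace(/&amp;/g, "&");             // decode &amp;
    return b.replace(/&#(\d+);/g, function(match, dec) {
        return String.fromCharCode(dec);          // decode decimal references
    });
}

// Made-up inputs for illustration.
var cleaned = removeTags("<b>Teams</b> &amp; <i>chat</i> &#233;t&#233;");
// cleaned === "Teams & chat été"

var untouched = removeTags("cr&#xE9;er");
// untouched === "cr&#xE9;er" (hex references are not matched by \d+)
```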

 

The Microsoft Support site automatically converts the hex code to the corresponding character. The same conversion needs to be done in the code.

Narsing1_0-1666227685190.png

 

I will try from my end.

 

Thanks for your inputs.

Narsing

Srilatha2
Tera Explorer

function removeTags(str) {
    // Strip HTML tags.
    var a = str.replace(/<\/?[^>]+(>|$)/g, "");
    // Decode &amp; to &.
    var b = a.replace(/&amp;/g, '&');
    // Decode decimal character references such as &#233;.
    return b.replace(/&#(\d+);/g, function(match, dec) {
        return String.fromCharCode(dec);
    });
}

Job1
Tera Contributor

Hi @Narsing1, did you have any luck? 

Narsing1
Mega Sage

Hi @Job1 ,

I spent some time on this over the weekend. The URL needs to be converted to UTF-8 so that it redirects to the correct page. Here is the solution.

Create a Script Include named "utf8Utils" and copy the script below into it.

 

var utf8Utils = Class.create();
utf8Utils.prototype = {
    initialize: function() {},
    // Replace each hex character reference (e.g. &#xE9;) with the
    // percent-encoded UTF-8 bytes of that character (e.g. %C3%A9).
    findHexAndConverttoUTF8: function(str) {
        var self = this;
        return str.replace(/&#x([0-9a-f]+);/gi, function(match, hex) {
            var code = parseInt(hex, 16);
            var ch;
            if (code > 0xffff) {
                // Characters above U+FFFF need a surrogate pair.
                code -= 0x10000;
                ch = String.fromCharCode(0xd800 + (code >> 10), 0xdc00 + (code & 0x3ff));
            } else {
                ch = String.fromCharCode(code);
            }
            return self.toUTF8(ch);
        });
    },
    // Encode a string as UTF-8 and return the bytes as %XX escapes.
    toUTF8: function(str) {
        var utf8 = [];
        for (var i = 0; i < str.length; i++) {
            var charcode = str.charCodeAt(i);
            if (charcode < 0x80) utf8.push(charcode);
            else if (charcode < 0x800) {
                utf8.push(0xc0 | (charcode >> 6),
                    0x80 | (charcode & 0x3f));
            } else if (charcode < 0xd800 || charcode >= 0xe000) {
                utf8.push(0xe0 | (charcode >> 12),
                    0x80 | ((charcode >> 6) & 0x3f),
                    0x80 | (charcode & 0x3f));
            }
            // surrogate pair: combine both halves into one code point
            else {
                i++;
                charcode = 0x10000 + (((charcode & 0x3ff) << 10) | (str.charCodeAt(i) & 0x3ff));
                utf8.push(0xf0 | (charcode >> 18),
                    0x80 | ((charcode >> 12) & 0x3f),
                    0x80 | ((charcode >> 6) & 0x3f),
                    0x80 | (charcode & 0x3f));
            }
        }
        // Emit each byte as a %XX escape so the result is safe inside a URL.
        var out = "";
        for (var j = 0; j < utf8.length; j++) {
            out += "%" + (utf8[j] < 16 ? "0" : "") + utf8[j].toString(16).toUpperCase();
        }
        return out;
    },
    type: 'utf8Utils'
};

On the Data Fetch Script, use like this

Narsing1_0-1667747957603.png

Example: (for testing, run this in "Scripts - Background" after saving the Script Include above, and check the returned value)

var s = "https://support.microsoft.com/fr-fr/office/cr&#xE9;er-un-site-communautaire-dans-sharepoint-8a890f58-9492-4be1-b6b3-481fb0f9b4a5";
s = new utf8Utils().findHexAndConverttoUTF8(s);
gs.print(s);

Output:

*** Script: https://support.microsoft.com/fr-fr/office/cr%C3%A9er-un-site-communautaire-dans-sharepoint-8a890f58-9492-4be1-b6b3-481fb0f9b4a5

 

Thanks,

Narsing

Job1
Tera Contributor

Thanks! Will try it out!

Version history
Last update: 09-17-2022 11:14 AM