‎10-15-2019 06:00 AM
This is more of a general Javascript question (probably) than one specifically related to what happens in SN, but nothing I have found in an entire day's research has gotten me closer to an answer, so I'm wondering if anybody here has encountered (and conquered) this particular dragon.
What I am tasked with is figuring out how to remove embedded HTML tags from a field. And I'm almost there. In fact, removing tags is a relatively simple exercise (once one gets over the fear of regex), but there's a bit of wisdom I heard once that states that the stories of most personal tragedies begin with the words "I decided"... And thus it is in this story. "I decided" that it would be a good idea to replace line breaks and paragraphs with actual newlines. Which again seems like it would be a simple replacement, right? Something like this:
.replace(/<\s*\/?br\s*[\/]?>/gi, '\n')
Now I'll begin by telling you the regex works. I confirmed this by replacing the '\n' with 'line break' and with the input:
Niiiice<br/><div tabindex="-1" class="_rp_f1 _rp_g1" id="ItemHeader.ToContainer" role="heading"><span class="_rp_i1 ms-fwt-sb ms-font-color-black _rp_h1">To:</span> Somebody whose name is unimportant
that (plus some other replacements not shown here for brevity) results in the output
Niiiice line break To: Somebody whose name is unimportant
So it replaced the <br> with the literal "line break" which tells me the search was successful. But returning to the original .replace() call, the output when I specify '\n' as the replacement string looks like:
Niiiice To: Somebody whose name is unimportant
No line break. Big frowny face.
Knowing that the script is finding the line break successfully, I'm forced to conclude that Javascript simply doesn't like outputting a '\n' as a newline. But wait, there's more. It also doesn't like any of these options -- all of which stubbornly yield the same result:
- "\n",
- '\r''\n',
- '\r\n'
- "\r\n",
- '\r',
- '\0x0a' (or 0d0a)
- '\u000a' (or 000d 000a)
- os.EOL (require os)
And given that this also happens in Node JS at the command prompt if I log output to the console, I'm starting to think this is just one more example of Javascript being Javascript, and wondering if it really matters that much if the breaks don't break. But every blog/forum/post/tutorial I run across that has an answer that sounds even remotely on point says that what I originally did should work. And yet. So before I tap out entirely I figured I'd put the question to the brethren here and see if anyone else has run up against this.
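One way to check whether the newline is really in the string, rather than trusting how a console or form field renders it, is to print the string with JSON.stringify, which shows control characters as visible escapes. A minimal sketch, using the regex from the post and a shortened version of the sample input; the variable names and the generic tag strip are assumptions, since the full script isn't shown:

```javascript
// Sample input from the post, with the attributes shortened for readability.
const input = 'Niiiice<br/><div id="ItemHeader.ToContainer"><span>To:</span> Somebody';

// The line-break replacement from the post, followed by a generic tag strip
// (the tag strip is an assumption standing in for the replacements not shown).
const text = input
  .replace(/<\s*\/?br\s*[\/]?>/gi, '\n')
  .replace(/<[^>]+>/g, '');

// JSON.stringify renders the newline as a visible \n escape.
console.log(JSON.stringify(text)); // "Niiiice\nTo: Somebody"
```

If the escape shows up here but the final output still has no break, the newline is probably being removed by a later step rather than never being inserted.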
Labels: Scripting and Coding
‎10-21-2019 11:25 AM
I see what did it now. It wasn't that the newlines weren't being inserted; it was that the last replacement before the trim was taking them back out again.
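For anyone hitting the same thing: in a JavaScript regex, `\s` matches newlines as well as spaces and tabs, so a whitespace-collapsing step placed after the `<br>` replacement can swallow the newlines it just inserted. The sketch below assumes the culprit is a collapse along the lines of `/\s{2,}/g` (the exact expression in the original script isn't shown in this thread), and both function names are hypothetical:

```javascript
// Collapses every run of 2+ whitespace characters, newlines included.
function collapseAll(s) {
  return s.replace(/\s{2,}/g, ' ').trim();
}

// Collapses only horizontal whitespace. [^\S\r\n] reads as
// "whitespace that is not a carriage return or line feed".
function collapseKeepingNewlines(s) {
  return s
    .replace(/[^\S\r\n]{2,}/g, ' ')            // squeeze runs of spaces/tabs
    .replace(/[^\S\r\n]*\n[^\S\r\n]*/g, '\n')  // tidy spaces around each newline
    .trim();
}

const sample = 'Niiiice\n  To:   Somebody';
console.log(JSON.stringify(collapseAll(sample)));             // "Niiiice To: Somebody"
console.log(JSON.stringify(collapseKeepingNewlines(sample))); // "Niiiice\nTo: Somebody"
```

The first function loses the break because the newline plus the indentation after it counts as one whitespace run; the second leaves line feeds alone.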
‎10-21-2019 07:53 AM
Did you try my suggested amendment above? It added line breaks for me....
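function removeTags(string) {
  return string.replace(/<br\/>/gm, '\n')   // <br/> becomes a newline
    .replace(/(<([^>]+)>)/ig, '')           // strip all remaining tags
    .replace(/&nbsp;/gi, ' ')
    .replace(/&tab;/gi, ' ')
    .replace(/&amp;/gi, '&')
    .replace(/&gt;/gi, '>')
    .replace(/&lt;/gi, '<')
    .replace(/&#39;/gi, "'")                // entity lost in page rendering; &#39; is assumed
    .replace(/\s{2,}/g, ' ')                // note: \s also matches newlines
    .trim();
}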
‎10-21-2019 08:17 AM
Still testing with it. It appears to work when the one specific formulation of <br/> is used, but if I try to expand it to capture other permutations:
it fails.
Also apparently problematic is using the backtick (``) to define a multi-line literal. When I try that, even yours fails. That's supposed to be legit in JS, and I haven't gotten any complaints about it from the interpreter, so I assume it's a legal expression, but it's throwing a wrench in the works for this purpose.
As long as everything is in one line, it seems to work until I try to include </br> or <br /> or <br> all of which are technically "legal".
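The variants listed here (`<br>`, `<br/>`, `<br />`, `</br>`) can be covered by a single pattern that allows an optional slash and optional whitespace on either side of the tag name. This is a sketch run under plain Node, not inside ServiceNow; `BR_TAG` and `brToNewline` are made-up names, and a `<br>` carrying attributes would need a looser pattern:

```javascript
// Optional leading slash, "br", optional whitespace, optional trailing slash.
// \b keeps the pattern from matching longer tag names that start with "br".
const BR_TAG = /<\s*\/?\s*br\b\s*\/?\s*>/gi;

function brToNewline(html) {
  return html.replace(BR_TAG, '\n');
}

const variants = ['<br>', '<br/>', '<br />', '</br>', '<BR>'];
for (const tag of variants) {
  // Each variant prints "a\nb".
  console.log(JSON.stringify(brToNewline('a' + tag + 'b')));
}
```

On an older server-side engine that only supports ES5, `const` and `for...of` would have to become `var` and an indexed loop; the regex itself stays the same.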
‎10-21-2019 08:30 AM
Okay, as long as I keep it all on one line, the
‎10-21-2019 09:20 AM
Hi there,
Try the code below; I usually use it when dealing with HTML.
// Now remove all the HTML tags from the HTML body
htmlcode = htmlcode.replace(/<style([\s\S]*?)<\/style>/gi, '');
htmlcode = htmlcode.replace(/<script>/gi, '');
htmlcode = htmlcode.replace(/<\/div>/ig, '\n');
htmlcode = htmlcode.replace(/<\/li>/ig, '\n');
htmlcode = htmlcode.replace(/<li>/ig, ' * ');
htmlcode = htmlcode.replace(/<\/ul>/ig, '\n');
htmlcode = htmlcode.replace(/<\/p>/ig, '\n');
htmlcode = htmlcode.replace(/<br\s*[\/]?>/gi, "\n");
htmlcode = htmlcode.replace(/<[^>]+>/ig, '');
htmlcode = htmlcode.replace(/&nbsp;/gi, '');
If this resolves your query, please mark my comments as correct and helpful.
Regards,
Ajay Chavan
‎10-21-2019 10:24 AM
I saw this on your profile when you suggested it before. The problem I've been running into isn't with finding the tags -- that part works. It's been with inserting the new line in the output when the regex detects a </p> or a variation of <br /> (<br>, </br>, <br />, <br/>...). And in that regard I'm already doing what you suggested. I suspect it's working as well as it's going to. And it seems to be working well enough for what it was needed for.