10-15-2019 06:00 AM
This is more of a general Javascript question (probably) than one specifically related to what happens in SN, but nothing I have found in an entire day's research has gotten me closer to an answer, so I'm wondering if anybody here has encountered (and conquered) this particular dragon.
What I am tasked with is figuring out how to remove embedded HTML tags from a field. And I'm almost there. In fact, removing tags is a relatively simple exercise (once one gets over the fear of regex), but there's a bit of wisdom I heard once that states that the stories of most personal tragedies begin with the words "I decided"... And thus it is in this story. "I decided" that it would be a good idea to replace line breaks and paragraphs with actual newlines. Which again seems like it would be a simple replacement, right? Something like this:
.replace(/<\s*\/?br\s*[\/]?>/gi, '\n')
Now I'll begin by telling you the regex works. I confirmed this by replacing the '\n' with 'line break' and with the input:
Niiiice<br/><div tabindex="-1" class="_rp_f1 _rp_g1" id="ItemHeader.ToContainer" role="heading"><span class="_rp_i1 ms-fwt-sb ms-font-color-black _rp_h1">To:</span> Somebody whose name is unimportant
that (plus some other replacements not shown here for brevity) results in the output
Niiiice line break To: Somebody whose name is unimportant
So it replaced the <br> with the literal "line break" which tells me the search was successful. But returning to the original .replace() call, the output when I specify '\n' as the replacement string looks like:
Niiiice To: Somebody whose name is unimportant
No line break. Big frowny face.
Knowing that the script is finding the line break successfully, I'm forced to conclude that Javascript simply doesn't like outputting a '\n' as a newline. But wait, there's more. It also doesn't like any of these options -- all of which stubbornly yield the same result:
- "\n",
- '\r''\n',
- '\r\n'
- "\r\n",
- '\r',
- '\0x0a' (or 0d0a)
- '\u000a' (or 000d 000a)
- os.EOL (require os)
And given that this also happens in Node JS at the command prompt if I log output to the console, I'm starting to think this is just one more example of Javascript being Javascript, and wondering if it really matters that much if the breaks don't break. But every blog/forum/post/tutorial I run across that has an answer that sounds even remotely on point says that what I originally did should work. And yet. So before I tap out entirely I figured I'd put the question to the brethren here and see if anyone else has run up against this.
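One way to check whether the newline is really in the string, rather than being lost when the value is displayed, is to print it with JSON.stringify, which shows control characters as visible escapes. This is only a minimal sketch using a shortened version of the input above; the variable names `input` and `replaced` are made up for illustration.

```javascript
// Shortened sample input; `input` and `replaced` are illustrative names.
var input = 'Niiiice<br/>To: Somebody';

// Same break-tag replacement as in the post.
var replaced = input.replace(/<\s*\/?br\s*[\/]?>/gi, '\n');

// JSON.stringify renders a real newline as the two characters \n,
// so it is visible even where the display would swallow it.
console.log(JSON.stringify(replaced));

// Splitting on the newline yields two pieces if the replacement happened.
console.log(replaced.split('\n').length);
```

If the escape shows up here but disappears later, the culprit is probably a later step in the chain rather than the replacement itself.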
10-21-2019 11:25 AM
I see what did it now. It wasn't that the newlines weren't being inserted; it was that the last replacement before the trim was taking them back out again.
10-21-2019 11:29 AM
Actually, when I broke the function down to individual assignments as you did, I was able to figure out what happened, so this was helpful after all. Turns out \s in a regex matches a newline just like any other whitespace, so the last replacement I was doing (the \s{2,} one) basically recombined the line I had split earlier. smh.
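The effect can be reproduced in isolation. The sketch below assumes a string where the newline is followed by a few spaces, roughly what is left once the tags are stripped; the names `sample`, `collapsed` and `preserved` are illustrative. Note that \s{2,} only removes a newline when it sits next to other whitespace, since a lone newline is a run of one.

```javascript
// A newline followed by spaces, similar to what remains after tag removal.
var sample = 'Niiiice\n   To: Somebody';

// \s matches \n, so the newline and the spaces form one run of 2+ whitespace
// characters and the whole run is replaced by a single space.
var collapsed = sample.replace(/\s{2,}/g, ' ');

// Matching only spaces and tabs leaves the newline in place.
var preserved = sample.replace(/[ \t]{2,}/g, ' ');

console.log(JSON.stringify(collapsed));
console.log(JSON.stringify(preserved));
```

Restricting the character class to spaces and tabs is one option; another would be to collapse whitespace before inserting the newlines.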
10-21-2019 10:34 AM
This may not add anything constructive, as I believe you have already discounted the code I am about to provide above. However, I wanted to jump in because we are doing the exact same thing, i.e. taking the Acceptance Criteria field and copying it to a string field. What worked for us was just this:
text = text
    .replace(/<\/p>/g, '\n')
    .replace(/<\/li>/g, '\n');
10-21-2019 10:50 AM
By all accounts that I have seen, that should work. And when I test it with my favorite regex validation site (https://regex101.com/ -- excellent resource, by the way) it appears for all the world like it would do the job perfectly. What's no end of frustrating is that the much more complex expression I used for converting break tags works. Which leads me to wonder if it isn't something in the sequence of .replace() calls that is the culprit here. My current version after taking all of the input here into account looks like this:
function removeTags(string){
    return string.replace(/<\s*\/?br\s*\/?>/gim, '\n')
        .replace(/<\/?p\/?>/gim, '\n')
        .replace(/(<([^>]+)>)/gi, '')
        .replace(/&nbsp;/gi,' ')
        .replace(/&tab;/gi,' ')
        .replace(/&amp;/gi,'&')
        .replace(/&gt;/gi,'>')
        .replace(/&lt;/gi,'<')
        .replace(/&#39;/gi,"'")
        .replace(/\s{2,}/g, ' ')
        .trim();
}
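Given the diagnosis elsewhere in this thread (the final whitespace replacement taking the newlines back out), a revised version might look like the sketch below. It is not a tested ServiceNow script: the entity names in the patterns are assumptions about what the function is meant to decode, the sample input `html` is made up, and moving the ampersand replacement to the end is an extra change so that already-escaped text such as "&amp;lt;" is decoded only once.

```javascript
// Sketch of a revised removeTags; entity names in the patterns are assumed.
function removeTags(string) {
    return string
        .replace(/<\s*\/?br\s*\/?>/gi, '\n')  // break tags become newlines
        .replace(/<\/?p\/?>/gi, '\n')         // bare paragraph tags become newlines
        .replace(/<[^>]+>/g, '')              // drop every remaining tag
        .replace(/&nbsp;/gi, ' ')
        .replace(/&tab;/gi, ' ')
        .replace(/&gt;/gi, '>')
        .replace(/&lt;/gi, '<')
        .replace(/&#39;/g, "'")
        .replace(/&amp;/gi, '&')              // last, so "&amp;lt;" ends up as "&lt;"
        .replace(/[ \t]{2,}/g, ' ')           // collapse spaces and tabs only, not newlines
        .trim();
}

// Hypothetical input modeled on the example in the first post.
var html = 'Niiiice<br/><div tabindex="-1"><span class="x">To:</span>   Somebody</div>';
console.log(JSON.stringify(removeTags(html)));
```

The only change that matters for the missing line breaks is the second-to-last step, which no longer uses \s and therefore leaves the inserted newlines alone.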