Why does Javascript hate me?

James Bengel
Giga Expert

This is more of a general Javascript question (probably) than one specifically related to what happens in SN, but nothing I have found in an entire day's research has gotten me closer to an answer, so I'm wondering if anybody here has encountered (and conquered) this particular dragon.

What I am tasked with is figuring out how to remove embedded HTML tags from a field.  And I'm almost there.  In fact, removing tags is a relatively simple exercise (once one gets over the fear of regex), but there's a bit of wisdom I heard once that states that the stories of most personal tragedies begin with the words "I decided"...  And thus it is in this story.  "I decided" that it would be a good idea to replace line breaks and paragraphs with actual newlines.  Which again seems like it would be a simple replacement, right?  Something like this:

.replace(/<\s*\/?br\s*[\/]?>/gi, '\n')

Now I'll begin by telling you the regex works. I confirmed this by replacing the '\n' with 'line break' and with the input:

Niiiice<br/><div tabindex="-1" class="_rp_f1 _rp_g1" id="ItemHeader.ToContainer" role="heading"><span class="_rp_i1 ms-fwt-sb ms-font-color-black _rp_h1">To:</span> Somebody whose name is unimportant

that (plus some other replacements not shown here for brevity) results in the output 

Niiiice line break To: Somebody whose name is unimportant

So it replaced the <br> with the literal "line break" which tells me the search was successful.  But returning to the original .replace() call, the output when I specify '\n' as the replacement string looks like:

Niiiice To: Somebody whose name is unimportant

No line break.  Big frowny face.

Knowing that the script is finding the line break successfully, I'm forced to conclude that Javascript simply doesn't like outputting a '\n' as a newline.  But wait, there's more. It also doesn't like any of these options -- all of which stubbornly yield the same result:

  • "\n",
  • '\r''\n',
  • '\r\n'
  • "\r\n",
  • '\r',
  • '\0x0a' (or 0d0a)
  • '\u000a' (or 000d 000a)
  • os.EOL (require os)

And given that this also happens in Node JS at the command prompt if I log output to the console, I'm starting to think this is just one more example of Javascript being Javascript, and wondering if it really matters that much if the breaks don't break.  But every blog/forum/post/tutorial I run across that has an answer that sounds even remotely on point says that what I originally did should work. And yet. So before I tap out entirely I figured I'd put the question to the brethren here and see if anyone else has run up against this.

1 ACCEPTED SOLUTION

James Bengel
Giga Expert

I see what did it now. It wasn't that the newlines weren't being inserted; it was the last replacement before the trim that was taking them back out again:

.replace(/\s{2,}/g, ' ');
 
Once the tags had been replaced by line breaks, the \s{2,} was seeing them as whitespace, and pulling the line back up again, to collapse the extra spaces into a single space.  And since the <br/> tag wasn't followed by a space and the </p> tag was, there appeared to be some inexplicable discrepancy between the two replacements.  But it was explicable once I broke the collection of .replace() calls into separate assignments and stuck a console.log between each one so I could see what the output from each call was.  With that done, I could see that the </p> DID initially convert as expected, and that somewhere below that it was "unconverted". At that point, it was just a matter of working back through the list of replacements to see which one preceded the "unconversion", and then it was obvious what had happened.
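
To make the <br/> versus </p> discrepancy concrete, here is a minimal sketch. The two input strings are made up for illustration; they stand in for what the text might look like just before the final whitespace replacement. A lone newline survives, but a newline next to a space forms a run of two whitespace characters and gets collapsed.

```javascript
// \s matches newlines as well as spaces, so a "\n " pair counts as a run of two.
const collapseAll = (s) => s.replace(/\s{2,}/g, ' ');

// Hypothetical intermediate strings, not taken from the real field data.
const afterBr = 'Niiiice\nTo: Somebody';  // tag was not followed by a space
const afterP = 'Niiiice\n To: Somebody';  // tag was followed by a space

// JSON.stringify makes the newline visible as \n in the log output.
console.log(JSON.stringify(collapseAll(afterBr))); // "Niiiice\nTo: Somebody"
console.log(JSON.stringify(collapseAll(afterP)));  // "Niiiice To: Somebody"
console.log(/\s/.test('\n'));                      // true
```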
 
BUT, since the /\s{2,}/ is the final replacement before the trim, and every other tag that isn't converted to a discrete value is converted to a single space, all I had to do was replace the /\s{2,}/ with /  / (two literal spaces), et voila!
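
One way the fix might look in code is sketched below. This is not the exact function from the thread: the name removeTagsKeepBreaks is hypothetical, the entity handling is cut down to a single case, and the pattern / {2,}/g is an assumption about what "two literal spaces" amounts to once the global flag and longer runs are accounted for. The extra step that trims spaces around each newline is also an addition, included so a line does not start with a stray space.

```javascript
// Hypothetical variant: collapse runs of spaces only, so newlines are left alone.
function removeTagsKeepBreaks(html) {
  return html
    .replace(/<\s*\/?br\s*\/?>/gi, '\n') // <br>, <br/>, <br /> become newlines
    .replace(/<\/p\s*>/gi, '\n')         // closing paragraph tags become newlines
    .replace(/<[^>]+>/g, ' ')            // every other tag becomes a single space
    .replace(/&nbsp;/gi, ' ')
    .replace(/ {2,}/g, ' ')              // literal spaces only, not \s
    .replace(/ *\n */g, '\n')            // assumption: drop spaces hugging a newline
    .trim();
}

const sample =
  'Niiiice<br/><div tabindex="-1"><span class="x">To:</span> Somebody';
console.log(JSON.stringify(removeTagsKeepBreaks(sample)));
// "Niiiice\nTo: Somebody"
```

If tabs or other horizontal whitespace can show up in the field, a class such as [ \t] in place of the literal space would cover them while still leaving \n and \r untouched.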
 
My thanks to all who replied; every suggestion offered something to try, which led to the clues that ultimately led to a solution.


18 REPLIES

Actually, when I broke the function down to individual assignments as you did, I was able to figure out what happened, so this was helpful after all. Turns out regex treats a newline as whitespace, so the last replacement I was doing basically recombined the line I had split earlier. smh.
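
The break-it-down approach described here can be sketched roughly as follows. The helper traceReplacements, the step labels, and the sample input are all hypothetical; the idea is only to apply each replacement separately and log the intermediate string, so the step that removes the newline stands out.

```javascript
// Hypothetical helper: run replacements one at a time and record each result.
function traceReplacements(input, steps) {
  const trace = [];
  let current = input;
  for (const [label, pattern, replacement] of steps) {
    current = current.replace(pattern, replacement);
    trace.push({ label, value: current });
    // JSON.stringify shows newlines as \n instead of breaking the log line.
    console.log(label.padEnd(10), JSON.stringify(current));
  }
  return trace;
}

const trace = traceReplacements('One</p> <p>Two', [
  ['p', /<\/?p\/?>/gi, '\n'],
  ['tags', /<[^>]+>/g, ''],
  ['collapse', /\s{2,}/g, ' '],
]);
// The log shows "One\n \nTwo" after the first two steps
// and "One Two" after the collapse step.
```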

Duncan Pederse1
Giga Expert

Not that this is adding anything constructive as I believe above you have already discounted the code that I am about to provide. However, wanted to jump in as we are doing the exact same thing, i.e. taking the Acceptance Criteria field, copying it to a string field. What worked for us was just this:

text = text
.replace(/<\/p>/g, '\n')
.replace(/<\/li>/g, '\n')

By all accounts that I have seen, that should work.  And when I test it with my favorite regex validation site (https://regex101.com/ -- excellent resource, by the way) it appears for all the world like it would do the job perfectly.  What's no end of frustrating is that the much more complex expression I used for converting break tags works.  Which leads me to wonder if it isn't something in the sequence of .replace() calls that is the culprit here.  My current version after taking all of the input here into account looks like this:

 
function removeTags(string){
    return string.replace(/<\s*\/?br\s*\/?>/gim, '\n')
                 .replace(/<\/?p\/?>/gim, '\n')
                 .replace(/(<([^>]+)>)/gi, '')
                 .replace(/&nbsp;/gi,' ')
                 .replace(/&tab;/gi,' ')
                 .replace(/&amp;/gi,'&')
                 .replace(/&gt;/gi,'>')
                 .replace(/&lt;/gi,'<')
                 .replace(/&apos;/gi,"'")
                 .replace(/\s{2,}/g, ' ')
                 .trim();
}
And I think it's working about as well as it's going to.  I may try breaking it down to individual replace calls as AjayChavan has done above, just so I can log the results of each one, but functionally there should be no difference between the two.  I think before I do that amount of refactoring I'll try just reversing the </p> and <br /> replacements and see if that makes a difference.  Worst case, the newlines aren't in the output and the line wraps in the report.  Which appears to be what's happening now.
