How to parse an incoming email in HTML format

Community Alums · ‎10-14-2024

A field in the incoming email arrives as shown below:

<td valign="top" align="left" class="updates-diff-label" style="border-collapse:collapse; border-spacing:0px; color:#172b4d; padding:0px; max-width:150px; vertical-align:top; color:#5e6c84; text-align:left; padding-right:10px; padding-bottom:5px">
Detected in:</td>

As you can see above, there is a "Detected in" field that I need to extract. Its value comes in a separate <td> element, as below:

<td valign="top" align="left" class="updates-diff-content" style="border-collapse:collapse; border-spacing:0px; color:#172b4d; padding:0px; vertical-align:top; text-align:left; padding-bottom:5px">
Development</td>

The value I need to extract is "Development".

I wrote the inbound action script below just to check whether I can extract anything:

var det=email.body.detected_in;

When I try to print the variable above, nothing is printed. Can you guide me on how to read the values of such fields? All the fields in the incoming email arrive in this format.

Please suggest.

Shubham_Jain · ‎10-14-2024

Approach

You can use regular expressions to extract both the label (Detected in) and its corresponding value (Development). Since the content is HTML, we can look for the <td> tags that contain these values.

// Get the email body as a string
var emailBody = email.body_html; // Use body_html to get the HTML content

// Capture the value cell that follows the "Detected in:" label cell.
// [^>]* keeps each match inside a single tag, and \s* / [\s\S] tolerate the
// line breaks that sit between the tags and their text in the email.
var detectedInPattern = /<td[^>]*>\s*Detected in:\s*<\/td>\s*<td[^>]*>\s*([\s\S]*?)\s*<\/td>/i;

// Execute the regex pattern on the email body
var detectedInMatch = detectedInPattern.exec(emailBody);

if (detectedInMatch && detectedInMatch[1]) {
    // Extracted value of "Detected in" field
    var detectedInValue = detectedInMatch[1].trim();
    gs.log('Detected in: ' + detectedInValue);  // Should log "Development" for the sample above
} else {
    gs.log('No match found for Detected in field');
}
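
Since all the fields in the email arrive in the same label/value layout, the same idea can be extended to collect every pair in one pass. The following is only a rough sketch: extractDiffFields and stripTags are hypothetical helper names, the "Priority" row is made-up sample data, and it assumes each updates-diff-label cell is immediately followed by its updates-diff-content cell, as in the HTML posted in the question. In an inbound action you would pass email.body_html instead of sampleHtml.

```javascript
// Hypothetical helper: removes tags and collapses whitespace in a cell's content.
function stripTags(fragment) {
    return fragment.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: returns an object mapping each label to its value.
// Assumes every label cell is directly followed by its content cell.
function extractDiffFields(html) {
    var fields = {};
    // [\s\S] is used instead of "." so that line breaks inside a cell are matched too.
    var pairPattern = /<td[^>]*class="updates-diff-label"[^>]*>([\s\S]*?)<\/td>\s*<td[^>]*class="updates-diff-content"[^>]*>([\s\S]*?)<\/td>/gi;
    var match;
    while ((match = pairPattern.exec(html)) !== null) {
        var label = stripTags(match[1]).replace(/:\s*$/, ''); // drop the trailing colon
        fields[label] = stripTags(match[2]);
    }
    return fields;
}

// Made-up sample: the first row mirrors the question, the second row is illustrative only.
var sampleHtml =
    '<tr><td valign="top" class="updates-diff-label" style="padding:0px">\nDetected in:</td>\n' +
    '<td valign="top" class="updates-diff-content" style="padding:0px">\nDevelopment</td></tr>' +
    '<tr><td class="updates-diff-label">Priority:</td>' +
    '<td class="updates-diff-content"><b>High</b></td></tr>';

var fields = extractDiffFields(sampleHtml);
```

With this, fields['Detected in'] holds the value for that label, and any other field that follows the same layout can be read the same way without writing a separate regex for each one.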

✔️ If this solves your issue, please mark it as Correct.

✔️ If you found it helpful, please mark it as Helpful.

—
Shubham Jain


Amit Verma · ‎10-14-2024

Hi @Community Alums

I suggest first stripping the HTML tags from the email body using the regex

/<[^>]+>/g

After that, you will be left with Detected in:Development. You can then split on : and extract Development. Refer to the snippet below:

var htmlString = '<td valign="top" align="left" class="updates-diff-label" style="border-collapse:collapse; border-spacing:0px; color:#172b4d; padding:0px; max-width:150px; vertical-align:top; color:#5e6c84; text-align:left; padding-right:10px; padding-bottom:5px">Detected in:</td><td valign="top" align="left" class="updates-diff-content" style="border-collapse:collapse; border-spacing:0px; color:#172b4d; padding:0px; vertical-align:top; text-align:left; padding-bottom:5px">Development</td>';
var plainText = htmlString.replace(/<[^>]+>/g,'').trim();
var detectedIn = (plainText.split(':')[1]).trim();
gs.print(detectedIn);

Output: Development
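
One thing to watch with split(':')[1] is that it keeps only the text between the first and second colon, so a value that itself contains a colon would be cut short. A possible variation, shown here only as a sketch, is to split on the first colon with indexOf. The helper name valueAfterLabel and the "Detected at" example are hypothetical and not from the original email.

```javascript
// Hypothetical helper: strips the tags, then returns everything after the FIRST colon.
function valueAfterLabel(htmlFragment) {
    var plainText = htmlFragment.replace(/<[^>]+>/g, '').trim();
    var separatorIndex = plainText.indexOf(':');
    if (separatorIndex === -1) {
        return null; // no "label:value" shape found
    }
    return plainText.substring(separatorIndex + 1).trim();
}

// Same shape as the snippet above, shortened for readability.
var simple = valueAfterLabel('<td>Detected in:</td><td>Development</td>');

// Made-up example where the value itself contains a colon.
var withColon = valueAfterLabel('<td>Detected at:</td><td>10:30 UTC</td>');
```

Here simple is "Development" and withColon keeps the full "10:30 UTC", whereas split(':')[1] would return only "10" for the second case. Like the snippet above, this works on a single label/value fragment, not on a whole email body with many fields.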

Thanks and Regards

Amit Verma

Please mark this response as correct and helpful if it assisted you with your question.