Advanced Parsing of Emails for Inbound Email Actions

jmiskey · ‎05-29-2025

I have received some great help in the past here in parsing some inbound emails to update certain fields on the user table.

The Subject Line of the email looks like this:

Security Notification - FirstName LastName - EmployeeNumber - Effective Date - yyyy mm dd - ACTION REQUIRED

where FirstName LastName, EmployeeNumber, and yyyy mm dd are variables. So an example looks like this:

Security Notification - John Doe - AB01234 - Effective Date - 2025 06 07 - ACTION REQUIRED

And the beginning of the body of the email looks like this:

Regarding: John Doe
Former Name: [not available]
Preferred Last Name: Doe
Preferred First Name: John
Employee Number: AB01234
Effective Date of Change: 2025 06 07
Previous Manager: Jim Smith
Proposed Manager: Barb Jones

The issue we have is with the Employee Number. It used to be exactly 6 or 7 characters long, and some people here helped me to come up with some great code that parses it perfectly. However, due to recent changes, now the Employee Number can also be 8 or 9 characters long.

Here are all the different formats it can be (where "A" represents a letter and "0" represents a number):

000000

0000000

AA00000

AA000000

AAA000000

We have two different Inbound Email Actions currently, which work on Employee Numbers that are exactly 6 or 7 characters long.

Here is the section of script that pulls the Employee Number from the Subject Line:

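//get values from email subject
var esub = email.subject;

// Defaults, so the log line below still prints something sensible if the subject does not match
var ee_name = '';
var ee_num = '';
var tdate = '';

// Regular expression to match the pattern
var regex = /Security Notification - (.+?) - (\d{6,7}) - Effective Date - (\d{4} \d{2} \d{2}) - ACTION REQUIRED/;
var match = esub.match(regex);

if (match) {
    ee_name = match[1]; // FirstName LastName
    ee_num = match[2];  // employee number
    tdate = match[3];   // effective date, yyyy mm dd
}
gs.log("ADD TRANSFER DATE: employee number = " + ee_num);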
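For anyone who wants to try this outside ServiceNow, here is a sketch of the same subject regex wrapped in a plain function. The name parseSubject is hypothetical, and email.subject / gs.log are replaced by a string argument and console.log so it runs in Node or a browser. Note that the sample subject from the top of this post (AB01234) does not match, because \d only accepts digits.

```javascript
// Hypothetical helper: the same subject regex as above, as a plain function.
function parseSubject(subject) {
  var regex = /Security Notification - (.+?) - (\d{6,7}) - Effective Date - (\d{4} \d{2} \d{2}) - ACTION REQUIRED/;
  var match = subject.match(regex);
  if (!match) {
    return null; // the pattern did not match, so there is nothing to extract
  }
  return { name: match[1], number: match[2], date: match[3] };
}

var numeric = parseSubject(
  "Security Notification - John Doe - 1234567 - Effective Date - 2025 06 07 - ACTION REQUIRED");
console.log(numeric.number); // 1234567

// The example subject from the question: "AB01234" contains letters, so \d{6,7} fails.
var alpha = parseSubject(
  "Security Notification - John Doe - AB01234 - Effective Date - 2025 06 07 - ACTION REQUIRED");
console.log(alpha); // null
```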

Our second Inbound Email Action pulls it from the body of the email, and that section of script looks like this:

//get values from email body
var ebody = email.body_text;

//find employee number
var ee_num = '';
var match = ebody.match(/Employee Number:\s*(\d{6,7})/);
if (match) {
    ee_num = match[1];
}
gs.log("ADD TRANSFER DATE: employee number = " + ee_num);

I first tried updating the \d{6,7} part in each script to \d{6,7,8,9} and then tried \d{6,9}. Neither one worked (I was hoping it would be that easy, but I guess that shows that I do not really understand how that code works).
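For reference, a regex quantifier takes at most two bounds, {min,max}, so {6,9} is the right way to say "6 to 9 characters" and {6,7,8,9} is not a valid quantifier. The reason \d{6,9} still fails is the character class: \d accepts only the digits 0-9. A minimal sketch using the body pattern (extractDigitsOnly is a hypothetical name):

```javascript
// The body pattern with the quantifier widened to {6,9}.
// \d still means "one digit 0-9", so any value containing a letter is rejected.
var DIGITS_ONLY = /Employee Number:\s*(\d{6,9})/;

function extractDigitsOnly(body) {
  var match = body.match(DIGITS_ONLY);
  return match ? match[1] : ''; // '' when the pattern finds nothing
}

console.log(JSON.stringify(extractDigitsOnly("Employee Number: 123456")));    // "123456"
console.log(JSON.stringify(extractDigitsOnly("Employee Number: AB01234")));   // ""
console.log(JSON.stringify(extractDigitsOnly("Employee Number: AAA000000"))); // ""
```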

How can I update my code so that it works for all the different lengths/formats of my Employee Number (which can be between 6 and 9 characters long)? I really don't care if we update the Subject or the Body, I can use either one.

Thanks

John Gilmore · ‎05-29-2025

@jmiskey \d{6,9} would be correct if the value were all numerals: it matches a string 6 to 9 characters long made up only of the digits 0-9. Given the way you are using the regex, we need to update it to accept letters in addition to digits.

\w{6,9} would be the simplest option, but \w also matches the underscore character. If you aren't worried about catching a value that doesn't fit the formats you presented, then you can simply use this.
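A sketch of both patterns with \d{6,7} swapped for \w{6,9}, run against the sample subject and body from the question. The helper names are hypothetical, and the ServiceNow email object is replaced with plain strings; inside an inbound action the calls would presumably become employeeNumberFromSubject(email.subject) and employeeNumberFromBody(email.body_text).

```javascript
// The question's two patterns with \d{6,7} replaced by \w{6,9}.
// \w{6,9} = 6 to 9 letters, digits, or underscores in a row.
var SUBJECT_REGEX = /Security Notification - (.+?) - (\w{6,9}) - Effective Date - (\d{4} \d{2} \d{2}) - ACTION REQUIRED/;
var BODY_REGEX = /Employee Number:\s*(\w{6,9})/;

// Hypothetical helpers; each returns '' when its pattern does not match.
function employeeNumberFromSubject(subject) {
  var match = subject.match(SUBJECT_REGEX);
  return match ? match[2] : '';
}

function employeeNumberFromBody(body) {
  var match = body.match(BODY_REGEX);
  return match ? match[1] : '';
}

// Sample subject and body copied from the question.
var subject = "Security Notification - John Doe - AB01234 - Effective Date - 2025 06 07 - ACTION REQUIRED";
var body = [
  "Regarding: John Doe",
  "Former Name: [not available]",
  "Preferred Last Name: Doe",
  "Preferred First Name: John",
  "Employee Number: AB01234",
  "Effective Date of Change: 2025 06 07",
  "Previous Manager: Jim Smith",
  "Proposed Manager: Barb Jones"
].join("\n");

console.log(employeeNumberFromSubject(subject)); // AB01234
console.log(employeeNumberFromBody(body));       // AB01234
```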

If you need it to be specific about the exact position of the letters, then you will need a much more complex regex that spells out all the possible formats. If this is the case, let me know.
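If the exact formats do need enforcing, one possible pattern simply lists the five shapes from the question as alternatives. This is a hypothetical sketch that assumes the letters are always uppercase A-Z; isValidEmployeeNumber is an illustrative name.

```javascript
// Hypothetical strict pattern for the five formats listed in the question:
//   000000, 0000000     -> 6 or 7 digits
//   AA00000, AA000000   -> 2 letters + 5 or 6 digits
//   AAA000000           -> 3 letters + 6 digits
// Assumption: letters are uppercase A-Z only.
var STRICT_NUMBER = /^(?:\d{6,7}|[A-Z]{2}\d{5,6}|[A-Z]{3}\d{6})$/;

function isValidEmployeeNumber(value) {
  return STRICT_NUMBER.test(value);
}

["123456", "1234567", "AB01234", "AB012345", "ABC012345", "AB_1234", "A1234567"]
  .forEach(function (value) {
    console.log(value + ": " + isValidEmployeeNumber(value));
  });
```

The last two values are 7 and 8 word characters long, so \w{6,9} would accept them; the strict pattern rejects both.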

jmiskey · ‎05-29-2025

Thanks for the reply. No, I do not need to do any sort of validation of the values; I just need to pull them, whatever they may be (ignoring any trailing spaces).
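On the trailing-space point: \w never matches whitespace, so the capture stops at the last letter or digit and any spaces after it are left out, while \s* already absorbs spaces before the value. A short sketch (the helper name is hypothetical):

```javascript
// \s* soaks up any spaces before the value; \w{6,9} stops at the first
// character that is not a letter, digit, or underscore, so trailing spaces
// are never part of the captured value.
var BODY_REGEX = /Employee Number:\s*(\w{6,9})/;

function employeeNumberFromBody(body) { // hypothetical helper name
  var match = body.match(BODY_REGEX);
  return match ? match[1] : '';
}

var padded = "Employee Number:   AA000000   \nEffective Date of Change: 2025 06 07";
console.log("[" + employeeNumberFromBody(padded) + "]"); // [AA000000]
```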

I will make your suggested change, and run some testing on the various values and report back if it seems to work or not.

jmiskey · ‎05-29-2025

John,

After testing all scenarios, it appears that the formula works exactly as I need it to.

Thank you!

Would you mind explaining what the different arguments/values mean (i.e. "d" versus "w")?

James Chun · ‎05-29-2025

\w means 'any word character' (a letter, a digit, or the underscore, as John mentioned), whereas \d means 'any digit'.

So \w{6,9} matches any run of word characters whose length is between 6 and 9.
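A quick way to see the difference is to compare the shorthands with their explicit character classes. In standard JavaScript without the u or i flags, \d is [0-9] and \w is [A-Za-z0-9_]; the sketch below checks that for every ASCII character and then applies the {6,9} length limit. It assumes ServiceNow's script engine treats these basic patterns the same way, which is worth confirming on your instance.

```javascript
// Without the u or i flags:
//   \d is the same as [0-9]
//   \w is the same as [A-Za-z0-9_]
// This loop checks that claim for every ASCII character.
function shorthandMatchesExplicitClass() {
  for (var code = 0; code < 128; code++) {
    var ch = String.fromCharCode(code);
    if (/\d/.test(ch) !== /[0-9]/.test(ch)) return false;
    if (/\w/.test(ch) !== /[A-Za-z0-9_]/.test(ch)) return false;
  }
  return true;
}

console.log(shorthandMatchesExplicitClass()); // true

// {6,9} then limits how many of those characters may appear in a row.
console.log(/^\w{6,9}$/.test("AB01234"));    // true  (7 word characters)
console.log(/^\w{6,9}$/.test("AB_123"));     // true  (underscore counts as a word character)
console.log(/^\d{6,9}$/.test("AB01234"));    // false (A and B are not digits)
console.log(/^\w{6,9}$/.test("ABC0123456")); // false (10 characters is too many)
```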

As John mentioned, if you need to enforce the format (e.g. the first 3 characters must be letters), you will need a more complex regex.

FYI, you can use regex101.com to test your regex.