
Posted on 02-26-2021 01:56 PM
Retrieving and Processing Files from Amazon S3
Because FTP is so last year
In an increasingly “as a service” world, we found that importing and exporting data files to and from other systems meant building jobs outside of the platform to move those files between Amazon S3 and FTP. So we decided to cut out the middleman.
What
Our objective was to be able to list the contents of an Amazon S3 bucket, and then selectively read and process files (objects) from the bucket, for the purpose of creating and updating records in our ServiceNow instance. Given that we have IntegrationHub Enterprise, we had the option to leverage the Amazon S3 Spoke on the ServiceNow Store.
Note: the approach here can be adapted to work via scheduled jobs, script includes, and business rules, but if you have the option to use IntegrationHub and Flow Designer, that is a much better way to go – the interface makes it really easy to organize your work and build re-usable functions.
How
Step 1: Download and Install the Amazon S3 Spoke from the ServiceNow Store
We’ll be building a number of custom actions, but starting with the spoke gives us some solid core functionality, and allows us to work in an existing application scope.
Step 2: Configure your Connection and Credential Alias
This goes without saying, but it’s easy to overlook. If you add the OOTB spoke action “List Objects By Bucket” and get errors, or the “Bucket Name” choice list comes up empty, you probably don’t have the right connection settings or permissions. For the process described here, you’ll need an AWS Access Key and Secret Key for a user with at least AmazonS3ReadOnlyAccess (if you want to put files in the bucket, you’ll obviously need to adjust this accordingly).
Step 3: Plan your Flow
Our use case involves creating and updating Customer Account and Account Relationship records. The bucket contains a large number of .JSON files. We need to go to the bucket, list all of the files, and determine whether they meet our naming criteria and whether they’ve been updated since the last time we executed the flow. We then process each file that meets those criteria.
The overall Flow we are building is outlined step by step below and summarized in the “All Together Now” section at the end of this article.
Step 4: Create a System Property
We’ll need someplace to store the last date and time we ran the flow. Why not use a system property? You can put your property in Global or the Amazon S3 Spoke, but remember you’ll need to set cross-scope access depending on where you put it.
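If you prefer to seed the property from a script rather than through the UI, a minimal sketch looks like this (the property name sn_amazon_s3_spoke.last_run is just an example; use whatever name and scope fit your setup, and note the initial value matches the fallback date used later in Action 3):
// One-time setup, e.g. run as a background script. The property name is an example only.
var prop = new GlideRecord("sys_properties");
prop.initialize();
prop.name = "sn_amazon_s3_spoke.last_run";
prop.type = "string";
prop.description = "Last date/time the S3 file processing flow ran";
prop.value = "2000-01-01 00:00:00";
prop.insert();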
Step 5: Build some Actions
The OOTB Actions for Amazon S3 are great, but aren’t quite suited to what we need. You can list objects in a bucket, but you can’t filter by date updated. You can download an object, but you can only attach it to an Incident record. No worries though – we can fix that! Keep in mind that, since we’re working in a vendor’s scope and all of our Actions are custom, we’ll want to use some sort of naming convention to keep track of which ones are ours.
Action 1: Custom List Objects By Bucket JSON
This action is very similar to an OOTB action, but has the added benefit of including the last modified date of the S3 object, which we need so we can be smart about what objects we process (keep those API calls to a minimum).
- Start with OOTB Action “List Objects By Bucket JSON”.
- Edit step 3 “Post-Processing & Error Handling”.
- On line 6, replace label with modified.
- On line 13, replace label with modified.
- Replace the entire contents of line 24 with "modified": contents[i]['LastModified'].toString().
Now you have an action that will go to your S3 bucket and put all of the object names and last modified dates in a JSON object that you can work with.
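For reference, after these edits the action output is shaped roughly like the object below (the file names and timestamps are invented, and the exact timestamp format depends on what S3 returns). Action 3 below reads the name and modified fields from each entry in data.
// Illustrative output shape only; the values here are made up.
var exampleOutput = {
    "data": [
        { "name": "account_main_20210225.json", "modified": "2021-02-25T14:03:12.000Z" },
        { "name": "account_rel_20210225.json", "modified": "2021-02-25T16:41:55.000Z" }
    ]
};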
Action 2: Custom Get System Property
Create a utility action you can re-use whenever you need it. This action has a simple script step that grabs the system property we created earlier and outputs it to a Flow variable we’ll use in the next step.
- Inputs
- Label: Property; Name: property; Type: String; Mandatory: true
- Script Step
- Input Variables
- Name: prop_name; Value: action > Property
- Script
- outputs.prop_val = gs.getProperty(inputs.prop_name);
- Output Variables
- Label: Property Value; Name: prop_val; Type: String; Mandatory: true
- Outputs
- Label: Property Value; Name: prop_val; Type: String; Mandatory: true; Value: step > Script step > Property Value
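Put together, the script step amounts to this minimal sketch (the second argument to gs.getProperty is just a fallback value in case the property doesn’t exist yet):
(function execute(inputs, outputs) {
    // Read the named system property; fall back to an empty string if it is not set
    outputs.prop_val = gs.getProperty(inputs.prop_name, "");
})(inputs, outputs);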
Action 3: Custom Build S3 File List
Get all of the files from Action 1 that meet the date criteria from Action 2. We’ll also do some optional file name filtering so we just get the objects we want.
- Inputs
- Label: File JSON; Name: file_json; Type: JSON; Mandatory: true
- Label: Last Run; Name: last_run; Type: String; Mandatory: false
- Label: Object Name Filter; Name: object_name_filter; Type: String; Mandatory: false
- Label: File Extension; Name: file_extension; Type: String; Mandatory: false
- Script Step
- Input Variables
- Name: fileJson; Value: action > File JSON
- Name: lastRun; Value: action > Last Run
- Name: nameFilter; Value: action > Object Name Filter
- Name: fileExtension; Value: action > File Extension
- Script
(function execute(inputs, outputs) {
    var lpVal = inputs.lastRun || "2000-01-01 00:00:00";
    var nameFilter = inputs.nameFilter || "";
    var extension = inputs.fileExtension || "";
    var fileString = JSON.stringify(inputs.fileJson);
    var processList = [];
    var debug = false;
    var lastProcessed = new GlideDateTime(lpVal);

    gs.info("Flow Designer Action: CH Build S3 File List - starting processing of JSON File: " + fileString + "; Contents length: " + inputs.fileJson.data.length);

    for (var i = 0; i < inputs.fileJson.data.length; i++) {
        var file = inputs.fileJson.data[i].name;
        var fmVal = inputs.fileJson.data[i].modified || "2000-01-02 00:00:00";
        if (debug) {
            gs.info("Flow Designer Action: CH Build S3 File List - Processing file " + file + ", last modified " + fmVal);
        }
        var modified = new GlideDateTime(fmVal);
        var compRes = "before";
        var filterRes = "No file filter was set";
        var action = "Ignoring";
        if (modified > lastProcessed && file.includes(extension)) {
            compRes = "after";
            action = "Processing";
            if (nameFilter && nameFilter > "") {
                if (file.includes(nameFilter)) {
                    filterRes = "File name matches filter";
                } else {
                    filterRes = "File name does not meet filter criteria";
                    action = "Ignoring";
                }
            }
        }
        if (debug) {
            gs.info("Flow Designer Action: CH Build S3 File List - For file " + file + ", last modified is " + modified + " which is " + compRes + " the last processing date of " + lastProcessed + "; " + filterRes + ". " + action + " this file");
        }
        if (action == "Processing") processList.push(file);
    }

    gs.info("Flow Designer Action: CH Build S3 File List - Finding files modified after " + lastProcessed + " using file name filter '" + nameFilter + "'; File list contains " + processList.length + " records: " + processList.toString());

    outputs.files = processList;
    outputs.records = processList.length;
})(inputs, outputs);
- Output Variables
- Label: files; Name: files; Type: Array.String; Mandatory: true
- Label: file; Name: file_record; Type: String
- Label: records; Name: records; Type: String; Mandatory: true
- Outputs
- Label: Files; Name: files; Type: Array.String; Mandatory: true; Value: step > Build Outputs > files
- Label: File Key; Name: variable_child0; Type: String (the child element of the Files array)
- Label: File Count; Name: file_count; Type: String; Mandatory: true; Value: step > Build Outputs > records
Note: The code above uses the last modified timestamp of the S3 object exactly as AWS returns it. When fmVal is passed to the GlideDateTime constructor, the time portion comes out as 00:00:00, so effectively only the date is compared. We’re just processing once a day, so that doesn’t matter for us, but it’s something you will want to address if you are processing multiple times per day.
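If you do need intra-day precision, one possible tweak, assuming the timestamp comes back in ISO 8601 format, is to normalize the value before handing it to GlideDateTime inside the loop:
// Sketch only: convert e.g. "2021-02-25T14:03:12.000Z" to "2021-02-25 14:03:12"
// so GlideDateTime keeps the time portion for the comparison.
var fmVal = inputs.fileJson.data[i].modified || "2000-01-02 00:00:00";
var normalized = fmVal.replace("T", " ").replace(/\.\d+Z?$/, "").replace("Z", "");
var modified = new GlideDateTime(normalized);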
Flow Logic: If any files need to be processed
For Each loops error when they have nothing to process. We need to know if there were any files that passed the check in the previous action, which is why we have the File Count output.
Condition 1: [3 > File Count] is not 0
Referring back to the overview of the Flow, the next 3 nodes will fall into this check.
Flow Logic: For Each Item in 3 > Files
Go through the array of files we created in Action 3. This is where we will actually read in the objects from the bucket that met our criteria, and do something with them.
Action 4: Custom Get S3 Object
This is where you’ll start to do your own thing, based on your use case. We’ll all start out the same way, getting the object from the bucket. What you do with the object is up to you. In my case, the objects are text files containing JSON-formatted data. I end this action by getting the text and placing it in an output string variable, so I can use it later.
- Start with OOTB Action “Download S3 Object to ServiceNow Record”.
- In the Inputs section, remove “Record” and “Download As”.
- In the Pre-Processing script, remove Input and Output Variables “key” and “file_name”. Your script should look like this:
(function execute(inputs, outputs) {
    outputs.bucket_name = new AmazonS3Utils().ValidateBucketName(inputs.bucket_name);
    outputs.bucket_region = new AmazonS3Utils().getBucketRegion(outputs.bucket_name);
})(inputs, outputs);
- Rename the REST Step from “Download S3 Object to ServiceNow Record” to “Get Object by Key”.
- Under Request Details, replace the Resource Path variable with action > File Key (make sure you keep the “/” before it).
- Under Response Handling, uncheck Save as Attachment.
- Update the Post Processing & Error Handling step. This is where we’ll make the object’s contents usable for whatever we need, rather than attaching it to an Incident record.
- After line 3, insert outputs.contents = inputs.responsebody;
- Add an Output Variable
- Label: contents; Name: contents; Type: (will depend on what you are doing with the contents; in my case, String)
Actions 5 through X: Process Files
This part is really up to you and your use case. For me, I needed to take the contents output in Action 4, convert the string to JSON, and use the resulting JSON object to create and update customer accounts, and to create relationships to other accounts.
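As an illustration only, a script step for that kind of processing might look something like the sketch below. The customer_account table is the CSM Account table, but the field names I pull from the JSON (number, name) are assumptions about my file layout, so adjust everything for your own data.
(function execute(inputs, outputs) {
    // inputs.contents is the string output from Action 4 (Custom Get S3 Object)
    var data = JSON.parse(inputs.contents);
    for (var i = 0; i < data.length; i++) {
        var acct = new GlideRecord("customer_account");
        acct.addQuery("number", data[i].number);
        acct.query();
        if (acct.next()) {
            // Existing account: update it
            acct.name = data[i].name;
            acct.update();
        } else {
            // New account: create it
            acct.initialize();
            acct.number = data[i].number;
            acct.name = data[i].name;
            acct.insert();
        }
    }
    outputs.records_processed = data.length;
})(inputs, outputs);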
Action X + 1: Get Current Date and Time
Now that we’re done processing, we need to set the system property we retrieved in Action 2. To do this, we need to get the current date and time.
- Inputs (none)
- Script Step
- Input Variables (none)
- Script
(function execute(inputs, outputs) {
    var now = new GlideDateTime();
    outputs.nowdtasdatetime = now;
    outputs.nowdtasstring = now.toString();
})(inputs, outputs);
- Output Variables
- Label: Now Date Time as Date/Time; Name: nowdtasdatetime; Type: Date/Time; Mandatory: true
- Label: Now Date Time as String; Name: nowdtasstring; Type: String; Mandatory: true
- Outputs
- Label: Now Date Time as Date/Time; Name: nowdtasdatetime; Type: Date/Time; Mandatory: true; Value: step > Script step > Now Date Time as Date/Time
- Label: Now Date Time as String; Name: nowdtasstring; Type: String; Mandatory: true; Value: step > Script step > Now Date Time as String
Action X + 2: Set System Property
Finally, put the current date and time in our system property, so we know when we last processed.
- Inputs
- Label: Property; Name: property; Type: String; Mandatory: True
- Label: Value; Name: value; Type: String; Mandatory: True
- Script Step
- Input Variables
- Name: property; Value: action > Property
- Name: value; Value: action > Value
- Script
(function execute(inputs, outputs) {
    var gr = new GlideRecord("sys_properties");
    gr.addQuery("name", inputs.property);
    gr.query();
    if (gr.next()) {
        gr.value = inputs.value;
        gr.update();
        outputs.status = "success";
    } else {
        outputs.status = "error";
        outputs.message = "Unable to set system property " + inputs.property + "; please ensure this is a valid system property.";
    }
})(inputs, outputs);
- Output Variables
- Label: status; Name: status; Type: string; Mandatory: true
- Label: message; Name: message; Type: string; Mandatory: true
- Outputs
- Label: status; Name: status; Type: string; Mandatory: true; Value: step > Script Status > status
- Label: message; Name: message; Type: string; Mandatory: true; Value: step > Script Status > message
All Together Now
Trigger
Whatever trigger conditions you want, but this was built to be run on a schedule.
Actions (and their inputs)
- Custom List Objects by Bucket JSON
- Bucket Name: (select from list based on your AWS permissions)
- Custom Get System Property
- Property: (name of system property used to store last run date/time)
- Custom Build S3 File List
- File JSON: [1 > Contents]
- Last Run: [2 > Property Value]
- Object Name Filter: (optional literal value)
- File Extension: (optional literal value)
- If at Least One File was Returned by Custom Build S3 File List
- [3 > File Count] is not 0
- For each Item in
- Items [3 > Files]
- Custom Get S3 Object
- Bucket Name: (same value as 1a)
- File Key: [5 > File Key]
- Use Case Processing
- Contents: [6 > Contents]
- Custom Get Current Date and Time
- No inputs
- Custom Set System Property
- Property: (name of system property used in step 2)
- Value: [8 > Now Date Time as String]
And there you have it. Have any suggestions to improve this? Did you find it useful? Let me know!
Great Article!

Really appreciate your taking the time to document your process in such detail for the community!!
Now, what if I do not have Integration Hub? Interestingly, I am going to be working with Amazon, so they will be open to whatever method works best to get the data to us.
Our need is to get the file only on a weekly basis, so no real-time updates are happening. For our use case I think FTP should be fine, but I still have a lot of questions.
I will start a new thread on my questions and hope you can respond to that.
Thanks again for your article. It has some really good info for me to use even if I do not have IH.

Hi Mahesh -
I responded on your other thread. Specific to FTP, you're probably better off using a scheduled import (calling it from a script will allow you to make it more robust than a standard scheduled import).
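For example, something along these lines kicks off a scheduled import set from a script, so you can wrap it in your own retry and notification logic (a rough sketch only; the job name is made up, and SncTriggerSynchronizer is a widely used but undocumented class, so test it in your instance):
// Sketch only: run from a scheduled script job in global scope.
var job = new GlideRecord("scheduled_import_set");
if (job.get("name", "My FTP Import")) {
    // Queue the scheduled import to run now
    SncTriggerSynchronizer.executeNow(job);
} else {
    gs.error("Scheduled import 'My FTP Import' not found");
}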
That aside, while IntegrationHub is really nice, you CAN do the same things using old-fashioned Business Rules. Below is some of the code I wrote when I needed to build an integration with Salesforce, to push CSM Case Tasks back and forth. The snippet is from a client-callable script include I wrote (we're using OAuth, which is what the token stuff is all about):
// Get an OAuth token for Salesforce (token helper elsewhere in the same script include)
var token = getSFToken("createCase");

// Build the outbound REST message and populate its parameters from the Case Task
var r = new sn_ws.RESTMessageV2('SkyTouch Salesforce', 'CreateCaseFromCaseTask');
r.setRequestHeader("Authorization", token);
r.setStringParameterNoEscape('serverDomain', sfDomain);
r.setStringParameterNoEscape('ApiVersion', sfApiVersion);
r.setStringParameterNoEscape('propAccountID', sfAcctID);
r.setStringParameterNoEscape('ownerID', owner);
r.setStringParameterNoEscape('shortDescription', shortDescription);
r.setStringParameterNoEscape('description', description);
r.setStringParameterNoEscape('priority', priority);
r.setStringParameterNoEscape('TaskNumber', current.number);
r.setStringParameterNoEscape('URL', urlVal);
r.setStringParameterNoEscape('AssignDate', sfDateTime());
r.setStringParameterNoEscape('recordTypeID', sfRecordTypeID);
r.setStringParameterNoEscape('SysID', current.sys_id);
r.setStringParameterNoEscape('nowTime', sfDateTime());
r.setStringParameterNoEscape('createdBy', current.opened_by.getDisplayValue());
r.setStringParameterNoEscape('caseNumber', current.parent.number);

// Execute the call and capture the response for downstream handling
var response = r.execute();
var responseBody = response.getBody();
var httpStatus = response.getStatusCode();
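From there it's standard response handling; for example (the status checks, field name, and log text are illustrative only):
// Basic response handling for the call above (illustrative only)
if (httpStatus == 200 || httpStatus == 201) {
    var result = JSON.parse(responseBody);
    // e.g. copy the returned Salesforce Id back onto the Case Task (field name is an assumption)
    // current.u_sf_case_id = result.id;
} else {
    gs.error("CreateCaseFromCaseTask failed with status " + httpStatus + ": " + responseBody);
}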

Thanks a bunch Tomko!
Hi @Tomko ,
Great article. Thanks for providing us all the required details. I have a requirement to download files from an S3 bucket to a ServiceNow table. I've added credentials in the Amazon S3 Credential Alias. When I use the "List Bucket" action in my flow, I'm getting access denied errors, as it's using "https://s3.amazonaws.com" as the base URL instead of "https://[bucket name].s3.amazonaws.com". I don't see any option to configure the connection URL in the credential alias. It would be great if you could throw some light on this. Also, when I try to use the "Download S3 Object to ServiceNow Record" action, a dummy bucket list is displayed.
Hi @Chitra23 ,
I am facing the same issue. Did you manage to figure out how to solve this problem?
I am also only able to see the dummy buckets.
In addition to making sure you have list permissions on the bucket, be aware that the "Bucket Name" input on both the OOTB S3 actions and my custom action is a Dynamic Choice that uses the "List Buckets JSON" OOTB action. I have a love/hate relationship with Dynamic Choices since, if the action fails or doesn't do exactly what you want, you end up with a situation where the value you need doesn't show up, even though you know you have permissions. I'd recommend looking at the "List Buckets JSON" action and running it by itself to make sure you're getting a valid list of buckets.
@Chitra23 @John Tomko Thank you for your answers. I'm still struggling to make the connection.
Here are the steps I followed:
- In ServiceNow: I registered my AWS credentials in Connection & Credential Aliases -> Amazon S3 (or sn_amazon_s3_spoke.Amazon_S3). Here I entered a valid Access Key ID and Secret Access Key for my user.
- In AWS: I attached a policy with s3:List*, Read, and List to my user, as shown in the image below, roughly following the link you shared: https://linuxhint.com/configure-s3-bucket-permissions-aws/
- Finally, when I try "List Buckets" or "Download S3 Object to Record" with the connector, I can't find my bucket, similar to the image @Chitra23 showed.
Do you think this is a permission problem from AWS I have configured? I am wondering because the only information ServiceNow has are my access/secret key from AWS.
@Chitra23 @John Tomko I have to correct myself! It was an AWS permissions problem.
Thank you both for your help! Now the connection between ServiceNow and AWS works 🙂