Transform takes hours to run - your comments/advice please

thomaskennedy
Tera Guru

I'm standing up a transform that will replace the manual processing of an Excel (xlsx) file from a vendor. In the current state this file is reviewed every few weeks, and manual updates are done to alm_asset based on the file's contents. In the future state we will receive the file in our ServiceNow tenant, attach it to a Data Source, and run a transform map against it. Run Business Rules is unchecked.

 

A design objective is to keep all the intelligence about how the vendor's key values ('Part ID') map to our references ('Model', 'Model Category') on our side, so the vendor can simply send us everything. A great deal of what they send will have to be filtered out, because there are devices we don't care about, and so on.

 

A second design objective is that nothing is hard coded. So the list of vendor Part IDs to ignore is not hard coded, and neither is the Model that a retained Part ID maps to.

 

I use Decision Tables to do both these things: if a given Part ID is in the Model-Ignore decision table, that row gets ignored. A row is also ignored if its Part Group does not match anything in the Model Category decision table, and so on. This is implemented in the transform map's Script section. The thing to note here is that I am using Decision Tables as filters:

(function transformRow(source, target, map, log, isUpdate) {

    // Ignore this row if the Part ID is on the ignore list
    var decisionSysId = new x_acso_spencer_imp.DecisionUtility().getDecisionSysId("Model-ignore");
    var dt = new sn_dt.DecisionTableAPI();
    var inputs = {};
    inputs["u_part_id_is"] = source.u_part_id;
    var response = dt.getDecision(decisionSysId, inputs);
    if (response) {
        // 'ignore' is the transform map's built-in flag; setting it skips this row
        ignore = true;
        return;
    }

    // several more such decision tests

})(source, target, map, log, isUpdate);

Most of the Field Maps are also scripted, and they use the same Decision Tables, this time as maps. I can safely assume the map returns a value here, because the transform script has already tested for it:

answer = (function transformEntry(source) {
    var decisionSysId = "";
    var dt;
    var inputs;
    var gr_decision;
    var result = "";

    // Look up the Model category mapped to this part group

    // Get the decision table sys_id
    gr_decision = new GlideRecord("sys_decision");
    gr_decision.addQuery("name", "Model Category");
    gr_decision.setLimit(1);
    gr_decision.query();
    gr_decision.next();
    decisionSysId = gr_decision.getValue("sys_id");

    dt = new sn_dt.DecisionTableAPI();
    inputs = {};
    inputs["u_part_group"] = source.u_part_group;
    var response = dt.getDecision(decisionSysId, inputs);
    result = response.result_elements.u_model_category;

    return result;
})(source);

 

This works fine, and allows the business user (given the right role) to update the decision tables any time a new Model mapping is needed or a model is retired. But it's extravagantly CPU-intensive, with the same decision tables being instantiated over and over. The net of this is that the transform takes three hours to run in our dev environment for a file of about 8,000 rows. That should be acceptable here, since we only need to run it on Saturdays.

 

I thought of backing out the Transform Map and instead using a script execution that parses the Data Source with GlideExcelParser and does all the decision-table work itself. It seems to me that by keeping the decision tables instantiated for the duration, it would be much faster. I haven't tried this.
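As a rough illustration of that parse-and-filter loop: the sketch below takes the parser as a parameter, where anything with the GlideExcelParser-style `next()`/`getRow()` contract works. In ServiceNow the parser would be built (my assumption, not tested against this file) roughly as `var p = new sn_impex.GlideExcelParser(); p.parse(new GlideSysAttachment().getContentStream(attachmentSysId));`. The `isIgnored` and `handleRow` callbacks are hypothetical names for the caller's filter and row processing.

```javascript
// Sketch only: 'parser' is any object exposing next()/getRow() like
// sn_impex.GlideExcelParser; 'isIgnored' and 'handleRow' are hypothetical
// callbacks supplied by the caller.
function processVendorFile(parser, isIgnored, handleRow) {
    var kept = 0;
    while (parser.next()) {
        var row = parser.getRow();       // object keyed by column header
        if (isIgnored(row['Part ID'])) {
            continue;                    // skip uninteresting devices early
        }
        handleRow(row);
        kept++;
    }
    return kept;                         // rows that survived the filter
}
```

Keeping the loop separate from the lookups this way also makes the filtering logic easy to exercise outside the platform.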

 

When I check whether a given row should be ignored, I do it by instantiating the Model-Ignore decision table through the API (see code snippet #1 above). Is there a way I can pull all the entries from this table once, and cache them for the duration of the transform in an object, so I'm not instantiating that table 8000 times?

 

Can anyone comment on putting in a transform that takes this long to run, even if it only runs on a weekend (when our system is very quiet)? Is there anything I can do to speed this up short of dropping the transform in favor of a script?

2 REPLIES

Ben_G
Tera Expert

Is it just a 1-to-1 relationship for the decision table, i.e. one input and one output? If so, it may be overkill; a decision table excels when multiple inputs are needed to get the right output.

 

A data lookup table may be more efficient with a Get call.

It is 1-to-1 for now; there has been some discussion that it will become more complex in the future, but nothing definite.

I'm refactoring it to have a Script Include pull the data in from the attachment using GlideExcelParser. That makes it easy to skip rows I don't care about (most of them), and I can maintain a cache of mappings (ex: Part Id -> Model) so I don't have to repeat a lookup. In other words I'm trying to remove as much complexity from the transform map as possible.
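For a simple 1-to-1 mapping, the cache can also be preloaded with a single query instead of lazy lookups. A sketch, with `gr` standing in for anything exposing the GlideRecord `next()`/`getValue()` contract; the table and field names in the comment are hypothetical:

```javascript
// One-query preload of a mapping table into a plain object. In ServiceNow
// 'gr' would be a queried GlideRecord, e.g. (hypothetical table name):
//   var gr = new GlideRecord('u_part_model_map'); gr.query();
function buildLookupMap(gr, keyField, valueField) {
    var map = {};
    while (gr.next()) {
        map[gr.getValue(keyField)] = gr.getValue(valueField);
    }
    return map;                          // e.g. { 'P100': 'Laptop X', ... }
}
```

After that, each row's Model is a plain object property access rather than a table lookup.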

But then I have to set up my import set through the API objects as shown here, and I found that GlideImportSet is not allowed in scoped apps. I would prefer not to put this in the global scope, so I'm scratching my head trying to work out how to do this without that class.