
Laurent5
ServiceNow Employee
on 11-08-2019 08:55 AM
This is the 3rd and final article on building a TensorFlow prediction model with ServiceNow data.
Please refer to parts 1 and 2 for data preparation.
Once our data is prepared, we can start building the Machine Learning model in order to run our predictions.
There are various ways this can be done in TensorFlow. It is a fast-evolving framework, and chances are that by the time you read this another approach will be available. Essentially, you can access various API layers at different levels of abstraction. One such approach is to use the TF Estimator API.
This is a high-level API that includes a large number of ready-to-use algorithms, or allows you to build your own. It takes care of much of the plumbing (such as creating sessions) and should be easier to use. Example estimators available include LinearRegressor, DNNRegressor, LinearClassifier (for logistic regression) and BoostedTrees (gradient boosted decision trees).
So, having imported our data from a CSV into our notebook (for this final example I have created a new data set with many more data points, 30,000 to be exact), we need to import TensorFlow in our notebook. This is easily done with import tensorflow as tf.
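As a rough sketch (the file name and column names below are assumptions based on my export, so adapt them to your own data set):

import numpy as np
import pandas as pd
import tensorflow as tf

# Load the exported ServiceNow survey data (file name is an example)
df = pd.read_csv('survey_data.csv')
x_data = df['duration']                 # Time to Fix (TTF)
y_data = df['metricres_string_value']   # Survey score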
We then need to create feature columns which will hold our data (features). This is a specific TensorFlow object that is used to store our raw data and make it available to the estimator. It enables further manipulation, which is beyond the scope of this article. For now, we will create feature_columns of type numeric_column.
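A minimal example, assuming a single numeric feature that we name 'x':

# One numeric feature column named 'x' to hold our duration values
feat_cols = [tf.feature_column.numeric_column('x')]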
We then instantiate our estimator, which offers ready-to-use algorithms. In this example we will use a linear regression (the LinearRegressor estimator). We also specify which feature columns it will use.
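Something along these lines:

# Instantiate a ready-made linear regression estimator with our feature columns
estimator = tf.estimator.LinearRegressor(feature_columns=feat_cols)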
We then need to do our test-train split, as discussed in previous articles.
Note that in this instance, I am converting our data to Numpy using the to_numpy() function. This is because when creating our inputs for the Estimator in the next step, it expects numpy arrays. It is possible to create inputs directly from Pandas but I haven’t explored this as yet!
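For illustration, a possible split (the 70/30 ratio and random_state are just example values):

from sklearn.model_selection import train_test_split

# Convert the Pandas series to numpy arrays and keep 30% aside for evaluation
x_train, x_eval, y_train, y_eval = train_test_split(
    x_data.to_numpy(), y_data.to_numpy(), test_size=0.3, random_state=101)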
Next, we need to create input functions for the estimator that will feed our data for training and evaluation of our model.
They contain critical information such as the data source (x_train, y_train and x_eval, y_eval from our train-test split above), but also the batch size (which sends smaller batches to the estimator rather than all the data in one go, in order to optimise processing) and the number of epochs. An epoch is one iteration over the entire training data, so that number defines how many times the model is trained over the data to reduce the error rate. Usually, the more epochs, the lower the error rate and the more accurate the prediction. There is a point, however, where adding epochs will not lead to any better predictions and will just consume resources.
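A sketch of the two input functions (the batch size of 8 is an example value):

# Training input: batches of 8, repeated indefinitely (num_epochs=None) until training stops
input_func = tf.estimator.inputs.numpy_input_fn(
    {'x': x_train}, y_train, batch_size=8, num_epochs=None, shuffle=True)

# Evaluation input: a single pass over the evaluation data, no shuffling
eval_input_func = tf.estimator.inputs.numpy_input_fn(
    {'x': x_eval}, y_eval, batch_size=8, num_epochs=1, shuffle=False)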
And finally we need to train our model which is done, simply enough, with the estimator.train function.
One point to highlight here is that we are defining the number of steps.
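In code, the training call is simply something like this (1000 steps, as used in this example):

# Train the model for 1000 steps (i.e. 1000 batches)
estimator.train(input_fn=input_func, steps=1000)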
What is the difference between steps and epochs you may ask? That is indeed a very good question and an area of slight confusion, as can be seen in this thread:
My understanding is that a step corresponds to processing one batch, so the number of steps defines how many batches are fed to the model during training, whereas an epoch is a full pass over the entire training data.
Whilst training our model, we can see the progress over our 1000 steps and how the loss is reduced over time.
The next step is, you guessed it, to evaluate our model once it’s been trained:
To review our progress, we can print some stats with the print function:
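For example (evaluating against both the training and the evaluation data so we can compare them):

# Evaluate on the training data and on the held-out evaluation data
train_metrics = estimator.evaluate(input_fn=input_func, steps=1000)
eval_metrics = estimator.evaluate(input_fn=eval_input_func, steps=1000)

print('Training metrics: {}'.format(train_metrics))
print('Evaluation metrics: {}'.format(eval_metrics))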
That’s it! Our model is built!
We can now use the estimator.predict function to run some predictions.
For that, we will create a numpy array of some test data and pass it to the estimator.
So, as a reminder, what we are trying to predict here is the Survey Result (between 1 and 5, 1 being the highest), based on the time to fix (TTF).
In our example, x_data was the duration and y_data our score (metricres_string_value). So let’s create a small array between the min and max values in our data set.
input_data_predict=np.linspace(x_data.min(),x_data.max(),5)
Then we create an input function for prediction and pass it our test data array:
input_fn_predict = tf.estimator.inputs.numpy_input_fn({'x':input_data_predict},shuffle=False)
We then need to create a new array (predictions_result) which will store the resulting predictions.
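The predict function returns a generator of dictionaries, each holding a 'predictions' key, so one way to collect them is:

# Collect each prediction from the generator into a plain list
predictions_result = []
for pred in estimator.predict(input_fn=input_fn_predict):
    predictions_result.append(pred['predictions'])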
This way, we can visualise our results (basically, the expected survey score for a given Time to Fix).
To do so, we can plot our predicted y values (the score) against the x values (durations) we just created and overlay our data set (or a sample of it, depending on the size of the data set), as in the example below.
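A possible way to do this with matplotlib (the sample size of 300 and the column names are assumptions):

import matplotlib.pyplot as plt

# Scatter a sample of the raw data and overlay the predicted regression line
sample = df.sample(n=300)
plt.scatter(sample['duration'], sample['metricres_string_value'], alpha=0.3)
plt.plot(input_data_predict, predictions_result, 'r')
plt.xlabel('Time to Fix (duration)')
plt.ylabel('Survey score')
plt.show()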
And voila!
Conclusions
This concludes part 3 of this series.
Obviously this is not the end, and there is still a lot to learn in this Machine Learning journey. For a start, more time is needed to scrutinise the model, and I am sure there are other, more efficient ways to go about it. Feel free to comment and share your feedback and experiences.
I also need to explore how to package and deliver the model and how the overall ML pipeline could be automated using the Now platform (so maybe the subject of another article!).
However, my ML journey so far has highlighted how important understanding and preparing your data is.
If you stick to using pre-canned algorithms like the ones offered by the tf.estimator API (or their equivalents on other ML platforms), then the bulk of the time should be spent understanding your data, deciding what outcome you want from your model and carefully preparing your training data. The old adage of GIGO (garbage in, garbage out) applies to ML as much as to any other data-related project. Also be mindful of bias: does your training data accurately reflect the real world, for instance all regions/employees/services?
That’s it for now, but please don’t hesitate to leave your comments and share your experiences building models on the platform or using external ML frameworks.
Until next time, happy coding!