Editor’s note: In their book, “Data Science: Concepts and Practice,” authors Vijay Kotu and Bala Deshpande explain the core principles and applications of modern data science. Kotu is vice president of analytics at ServiceNow; Deshpande is a data scientist and consultant. This article, which focuses on the difference between data science and machine learning, is adapted with permission.
Artificial intelligence (AI), machine learning, and data science are closely related. Unsurprisingly, the terms are often used interchangeably and conflated in popular media and business communication. However, they refer to distinct fields, and which term fits depends on the context.
AI is about giving machines the capability of mimicking human behavior, particularly cognitive functions. Examples include facial recognition, automated driving, and sorting mail based on postal code. In some cases, machines have far exceeded human capabilities (sorting thousands of pieces of mail in seconds); in others, we have barely scratched the surface (search "artificial stupidity").
A wide range of techniques fall under AI: linguistics, natural language processing, decision science, bias, vision, robotics, planning, etc. Learning is an important part of human capability, and in fact many other living organisms can learn as well.
Machine learning can be considered either a subfield of AI or one of its tools. It gives machines the capability of learning from experience, and experience for machines comes in the form of data. The data used to teach machines is called training data. Machine learning turns the traditional programming model upside down. A program, a set of instructions to a computer, transforms input signals into output signals using predetermined rules and relationships.
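As a minimal sketch of the traditional model, assume a hypothetical content filter: the programmer writes the rule (here, a fixed list of banned words) and the program merely applies it to each input. The word list and the function name `is_abusive_handwritten` are illustrative assumptions, not part of the original text.

```python
# Traditional programming: a human writes the rule; the program only applies it.
# BANNED_WORDS and is_abusive_handwritten are hypothetical, for illustration only.
BANNED_WORDS = {"idiot", "loser"}


def is_abusive_handwritten(post: str) -> bool:
    """Flag a post if it contains any word from the predetermined list."""
    words = set(post.lower().split())
    # The rule is fixed in advance: any overlap with the banned list means "abusive".
    return not words.isdisjoint(BANNED_WORDS)


print(is_abusive_handwritten("You are a loser"))        # True
print(is_abusive_handwritten("Great article, thanks"))  # False
```

Every rule here had to be anticipated and typed in by a person; changing the program's behavior means editing the code.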
How machines learn
Machine learning algorithms, also called "learners," take both the known input and the known output (training data) and work out a model for the program that converts input to output. For example, many organizations, such as social media platforms, review sites, and forums, are required to moderate posts and remove abusive content.
How can machines be taught to automate the removal of abusive content? They need to be shown examples of both abusive and non-abusive posts, each clearly labeled as abusive or not. The learner then generalizes a pattern from certain words or sequences of words to decide whether a post as a whole is abusive. The resulting model can take the form of a set of "if-then" rules. Once the rules or model have been developed, machines can start categorizing any new post.
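The idea of learning "if-then" rules from labeled posts can be sketched in a few lines of Python. This is a deliberately simple rule learner, not the authors' method or any production moderation technique: it assumes a hypothetical, tiny training set, and it turns a word into the rule "if a post contains this word, then it is abusive" when the word appears in at least `min_count` abusive posts and in no non-abusive post. The names `TRAINING_DATA`, `learn_rules`, `classify`, and the `min_count` threshold are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical labeled training data: (post, is_abusive).
TRAINING_DATA = [
    ("you are an idiot", True),
    ("what a loser", True),
    ("idiot post from a loser", True),
    ("great article thanks", False),
    ("thanks for the clear post", False),
    ("what a great idea", False),
]


def learn_rules(examples, min_count=2):
    """Learn 'if post contains WORD then abusive' rules from labeled examples.

    A word becomes a rule if it occurs in at least min_count abusive posts
    and never occurs in a non-abusive post.
    """
    abusive_counts = Counter()  # word -> number of abusive posts containing it
    clean_words = set()         # every word seen in a non-abusive post
    for post, is_abusive in examples:
        words = set(post.lower().split())
        if is_abusive:
            abusive_counts.update(words)
        else:
            clean_words |= words
    return {w for w, c in abusive_counts.items()
            if c >= min_count and w not in clean_words}


def classify(post, rules):
    """Apply the learned if-then rules to a new, unseen post."""
    return any(word in rules for word in post.lower().split())


rules = learn_rules(TRAINING_DATA)
print(sorted(rules))                              # ['idiot', 'loser']
print(classify("stop being an idiot", rules))     # True
print(classify("thanks, great read", rules))      # False
```

Note that the word "a" occurs in two abusive posts but is rejected because it also appears in a non-abusive one; that is the learner generalizing from both kinds of examples. The rules come from the labeled data rather than from the programmer, so different training data yields different rules without any change to the code.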
Data science is the business application of machine learning, AI, and other quantitative fields such as statistics, visualization, and mathematics. It is an interdisciplinary field that extracts value from data. As practiced today, data science relies heavily on machine learning and is sometimes called data mining. Examples of data science use cases include a recommendation engine that suggests movies for a particular user, a fraud alert model that detects fraudulent credit card transactions, a model that finds the customers most likely to churn next month, and a model that predicts revenue for the next quarter.