Hi everyone, and thanks for attending this webinar. My name is Rebecca Grollman, I’m a data scientist and today I’m going to talk about how to apply anomaly detection to your IoT data. So, at Bsquare I typically work with customers to help them build predictive models using machine learning. I also research and build tools to accelerate the data science process.
Today, we will introduce you to anomalies and anomaly detection. Then we’ll look at how anomaly detection can improve your ROI, followed by tips for accelerating anomaly detection and a case study looking at an apartment HVAC system.
To start, anomalies are everywhere. You can find them in any industry, from transportation, to energy, to oil and gas, to retail, to manufacturing.
There are three types of anomalies. The first type you might be most familiar with, and it’s called a point anomaly, which is some measurement that falls above or below a given threshold. Let’s say you’re measuring pump source pressure and suddenly it spikes up and then drops back down. This could indicate that your sensor may be malfunctioning.
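As a minimal sketch of that idea, a point anomaly check can be as simple as comparing each reading against fixed limits. The pressure values and the limits below are made up for illustration, and `find_point_anomalies` is a hypothetical helper, not part of any particular tool.

```python
def find_point_anomalies(readings, lower, upper):
    """Return the indices of readings that fall outside [lower, upper]."""
    return [i for i, value in enumerate(readings) if value < lower or value > upper]


# Hypothetical pump source pressure readings; units and limits are assumptions.
pressure = [50.1, 49.8, 50.3, 92.7, 50.0, 49.9]
spikes = find_point_anomalies(pressure, lower=40.0, upper=60.0)
print(spikes)  # [3]
```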
The next type of anomaly is a contextual anomaly. This is where your measurement makes sense because it falls within some expected range. However, in context or compared to nearby data, it doesn’t make sense. For example, let’s say you’re monitoring the number of road calls at a particular time, and that volume usually follows some cyclical pattern. Then all of a sudden you have a spike in your number of road calls, which could indicate that the roads were particularly icy at that point in time.
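One way to sketch a contextual check is to compare each reading only against other readings from the same point in the cycle. This is an illustrative approach rather than a specific product algorithm. The road call counts, the cycle length, and the `z_threshold` and `min_spread` parameters are all assumptions.

```python
from statistics import mean, stdev


def find_contextual_anomalies(values, period, z_threshold=3.0, min_spread=1.0):
    """Flag readings that are unusual for their position in a repeating cycle."""
    flagged = []
    for i, value in enumerate(values):
        # Peers are the other readings taken at the same phase of the cycle.
        peers = [v for j, v in enumerate(values) if j != i and j % period == i % period]
        if len(peers) < 2:
            continue
        # min_spread avoids dividing by a spread of zero (or nearly zero).
        spread = max(stdev(peers), min_spread)
        if abs(value - mean(peers)) / spread > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical road calls per six-hour block over five days (period = 4).
road_calls = [
    5, 20, 30, 10,
    6, 21, 29, 11,
    4, 19, 31, 9,
    25, 20, 30, 10,  # 25 is inside the overall range but unusual for that block
    5, 22, 28, 10,
]
print(find_contextual_anomalies(road_calls, period=4))  # [12]
```

Note that 25 would pass a simple threshold check, since other blocks routinely reach 30. It only stands out in context.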
The last type of anomaly is a collective anomaly. Now this is similar to a contextual anomaly, but it’s more like a bunch of contextual anomalies in a row. For example, let’s say you’re measuring whether something is light or dark with a sensor, then you suddenly have some intermediate state. This can indicate your sensor has a leak.
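A rough sketch of a collective check is to look for runs of consecutive intermediate readings rather than single odd values. The band limits, the minimum run length, and the sample signal are assumptions chosen for illustration.

```python
def find_collective_anomalies(readings, low=0.2, high=0.8, min_run=3):
    """Return (start, end) index pairs for runs of intermediate readings.

    A run counts only if at least min_run consecutive readings fall
    strictly between low and high.
    """
    runs = []
    start = None
    for i, value in enumerate(readings):
        intermediate = low < value < high
        if intermediate and start is None:
            start = i
        elif not intermediate and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the data.
    if start is not None and len(readings) - start >= min_run:
        runs.append((start, len(readings) - 1))
    return runs


# Hypothetical light/dark sensor: 0.0 is dark, 1.0 is light.
signal = [0.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.4, 0.5, 0.6, 0.5, 1.0, 0.0]
print(find_collective_anomalies(signal))  # [(6, 9)]
```

The single 0.5 at index 3 is ignored. Only the sustained intermediate state is reported.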
So why anomaly detection? First of all, anomaly detection is great for when you want to know how your system is running, and whether it’s acting as expected or not. Even determining what normal operating standards look like is a difficult task that anomaly detection can simplify by providing actionable insights. Other examples of applications for the resulting insights include things like quality assessment, predictive maintenance, and predictive sales.
Broadly, there are two basic use cases when employing anomaly detection. The first is when you have very little historical data. In this case, it’s really easy to start learning about normal operating conditions. You can even begin making sense of your data without having that massive historical data set. The second is when you already have a massive historical data set, and maybe even labeled anomalies. That’s great. You can actually start building a robust anomaly detector with this historical data and may even find that traditional machine learning methods are unnecessary.
Now, there are other ways to handle your data and extract actionable insights. One such method is through rule engines, but this also requires some sort of subject matter expertise to write these rules. Without them, you’re a little lost. However, if you use anomaly detection, you can actually build rules based on your anomalies.
Machine learning is another method, but it requires a well-defined problem to solve as well as relevant historical data sets. Without these two things, machine learning is a step too far, making anomaly detection a great place to start. While you set up your anomaly detector you can start building that historical data set and exploring your data to understand what problem you may want to solve with machine learning later.
So how can anomaly detection improve your ROI? First of all, you can establish normal operating parameters, identify anomalies, and determine what actually needs investigating. You can also create rules to generate actionable insights. Here are a few examples of anomaly detection applications across industries and how each can enhance your ROI.
Across the transportation industry, companies are collecting lots of data for things like tire pressure and temperature, the number of road calls, and maybe even the number of forced regenerations. From there, you could fix faulty tires, overhaul vehicles in need, and even address a recurring problem.
In manufacturing, companies measure things like different sensor readings, the number of rejected products, and even the time to complete a cycle. From there you can fix faulty sensors or equipment, overhaul your production cycle, and implement a new production cycle if necessary.
In the oil and gas industry, companies collect data on things like flow rate, source pressure, and pump vibration levels. From there you may learn that you need to inspect the pump early, fix a sensor on a pump, or even alert junior technicians of a problem.
Lastly, in retail, a company may be interested in tracking things like customer traffic, the number of drinks ordered per day, and products ordered across all stores. And from this type of data, you can figure out if you need to increase the number of employees on the floor, order more of a particular beverage, or even warn an individual store that they’re not performing at the same standard as other stores.
Across all of these industries there are lots of common themes, one of which is increasing revenue. For example, you may be able to identify periods of increased activity through anomaly detection. Another example is fixing labor shortages by identifying anomalies that only more experienced employees would have detected. You can also save costs by fixing equipment before a catastrophic failure occurs as well as enhance customer satisfaction by preparing for customers’ needs in advance to meet demand more effectively.
Now let’s look at how to accelerate the anomaly detection process, starting with speeding up the delivery timeline for creating an anomaly detector.
To create an anomaly detector, start by determining which anomaly detection algorithms are best suited for your industrial IoT use case. From there, you can optimize these anomaly detectors for your needs. To identify the best anomaly detectors, you need to find some benchmark industrial IoT data sets to test your algorithms. However, this can be difficult given the lack of data sets available. Most publicly available data sets are from non-time series data, or things like server metrics. In other words, not industrial IoT. Once you find applicable data sets and are able to test all these algorithms, you need to build your framework.
There are lots of ways you can optimize an anomaly detector to your needs. Here are some examples:
First, look at how frequently you want to be alerted. In other words, how sensitive of a detector do you want? Let’s say you have a pump that fails every five years on average. You probably don’t want alerts every week saying your pump is about to fail. On the other hand, if you’re dealing with something like a truck that fails a little more frequently, then those frequent alerts are actually necessary and important.
Also, consider the value of speed versus accuracy. Can you sacrifice a little speed to gain some accuracy? How much lead time is actually appropriate between sending data and getting a response? A large sheet machine may need an immediate shutdown in case of emergency. But maybe you have something else that’s safe to continue running, even if an anomaly is detected, for another 10 minutes or an hour.
Finally, consider your tolerance for false positives. Can you endure a few false positives to identify a few more real anomalies? If you have a really expensive part, you probably don’t want many false positives due to the high cost of making those replacements. Conversely, if the replacement is really cheap and easy, a few false positives may be worthwhile, even if it means replacing some parts that don’t need to be replaced, because you’re actually avoiding unplanned downtime and improving overall productivity.
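To make the false positive trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical, and the simple cost model (each alert triggers a replacement, each true positive avoids one downtime event) is an assumption made only for illustration.

```python
def net_benefit(true_positives, false_positives, replacement_cost, downtime_cost):
    """Downtime avoided by real detections minus the cost of every replacement."""
    avoided = true_positives * downtime_cost
    spent = (true_positives + false_positives) * replacement_cost
    return avoided - spent


# Two hypothetical detector settings: (true positives, false positives).
conservative = (3, 1)
sensitive = (5, 6)

# Cheap part (50 per replacement): the sensitive setting comes out ahead.
print(net_benefit(*conservative, replacement_cost=50, downtime_cost=10_000))  # 29800
print(net_benefit(*sensitive, replacement_cost=50, downtime_cost=10_000))     # 49450

# Expensive part (4,000 per replacement): the conservative setting comes out ahead.
print(net_benefit(*conservative, replacement_cost=4_000, downtime_cost=10_000))  # 14000
print(net_benefit(*sensitive, replacement_cost=4_000, downtime_cost=10_000))     # 6000
```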
Some useful features to look for in an anomaly detection framework include an algorithm recommender, configuration recommender, and even interactive plots for testing. Again, these should all be based on extensive research for industrial IoT applications. To demonstrate, we’ve actually built an anomaly detection tool, and I’d like to show you what some of the interactive plots look like to give you a feel for why a tool like this is beneficial.
In this example we’re looking at the outdoor temperature over time. In gray, you can see that we are looking at the actual temperature recorded. In orange, we’re looking at the moving average of that temperature, and those purple dots are all of our detected anomalies.
Let’s say that I put in my configuration parameters and this is the model that I got. So, these are the recommended parameters and algorithm that my anomaly detection framework told me to use. Here I find that four anomalies have been detected. But let’s say now that I have this interactive plot, I’m looking at my data and realize I would actually expect to see more anomalies and I’d like to see what that looks like.
By adjusting something like the threshold, many more anomalies appear. This sort of functionality is important in an anomaly detection tool because it gives us a way to see what’s actually happening by allowing us to tweak things and get a visual of the results and the effects of different parameters.
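The effect of the threshold can be sketched with a very simple detector that flags a reading when it differs from the trailing moving average by more than some amount. This is an assumed stand-in, not the algorithm behind the plots described here. The temperatures, the window size, and the threshold values are made up.

```python
def moving_average_anomalies(values, window, threshold):
    """Flag readings that differ from the average of the previous
    `window` readings by more than `threshold`."""
    flagged = []
    for i in range(window, len(values)):
        average = sum(values[i - window:i]) / window
        if abs(values[i] - average) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical outdoor temperatures.
temps = [10.0, 10.5, 10.2, 14.0, 10.3, 10.1, 7.5, 10.0, 10.4, 12.0]

print(moving_average_anomalies(temps, window=3, threshold=3.0))  # [3, 6]
print(moving_average_anomalies(temps, window=3, threshold=1.0))  # [3, 4, 5, 6, 8, 9]
```

Lowering the threshold from 3.0 to 1.0 triples the number of flagged points, which is the kind of change an interactive plot makes easy to see.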
Next, I’d like to dive into a case study looking at anomaly detection and an HVAC system using two data sources. The first, compiled by the Fraunhofer Center for Sustainable Energy Systems, is HVAC data from one apartment building with 79 units in Revere, MA over the winter of 2012. The second is weather data compiled by NOAA at a local Boston weather station.
For this example, we need to analyze three main datasets. The first is apartment data – unit number, floor, and thermostat type. The second is HVAC data – whether the system is on or off. This allows us to determine things like how long the HVAC system is active and how many times it’s turned on and off. The third is temperature and relative humidity for both the inside and outside of each apartment.
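As a sketch of how those two HVAC metrics could be derived, assume the on/off state is sampled at a fixed interval. The sampling interval and the sample states below are assumptions. The actual layout of the data set is not described here.

```python
def summarize_hvac(states, minutes_per_sample):
    """Return (minutes the HVAC was on, number of times it turned on).

    `states` is a sequence of 0/1 samples taken at a fixed interval.
    """
    on_minutes = sum(states) * minutes_per_sample
    starts = 0
    previous = 0  # assume the system was off before the first sample
    for current in states:
        if current and not previous:
            starts += 1
        previous = current
    return on_minutes, starts


# Hypothetical samples taken every 15 minutes.
states = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
print(summarize_hvac(states, minutes_per_sample=15))  # (90, 3)
```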
Using some basic exploratory analysis, we can look at things like the average indoor relative humidity against the average indoor temperature. And here we can see that there’s a slight negative correlation between these two things. So, in other words, as temperature increases, the relative humidity slightly decreases.
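A correlation like this can be quantified with the Pearson correlation coefficient. The sketch below computes it by hand on made-up apartment averages. The values are not from the case study data and were chosen only so the result shows a slight negative correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-apartment averages.
avg_indoor_temp = [20, 21, 22, 23, 24, 25]
avg_indoor_humidity = [35, 31, 36, 33, 30, 34]

r = pearson(avg_indoor_temp, avg_indoor_humidity)
print(round(r, 2))  # -0.25
```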
Now we can start asking questions like: What should the apartment temperature be if we know the outside relative humidity and desired indoor relative humidity? We also notice these four points off to the side that are anomalies in the data. These would be great apartments to investigate to make sure everything is running smoothly.
Another way to look at anomalies is to look at where anomalies line up across multiple sensors. For example, let’s say we select one specific apartment and analyze the outside temperature, the time HVAC is on, and the indoor temperature. Again, this is using the same color coding as before, where the gray represents the actual measurements, orange is the moving average, and all those purple dots are detected anomalies.
In this green box you can see that there’s an anomaly in the outside temperature. The outside temperature spikes, the time the HVAC is on also spikes, and there’s an anomaly detected. However, the apartment temperature is not affected. So, this is a great example where the HVAC is actually functioning well.
In the next case, if you look at the orange box, you can see that the outside temperature drops, but there is no anomaly detected for the HVAC. However, there is an anomaly detected for the apartment – it’s abnormally cold. Here you can write a rule, something like if we detect an anomaly in the outside temperature, but not in the time the HVAC is on, then you need to send an alert to the apartment manager to avoid cold temperatures. Then the apartment manager could do things like tell residents to adjust their thermostats or even investigate if the thermostat and HVAC system are working.
So, while this is one way to write rules, it would be really time consuming to look at every single apartment and every single metric collected and try to find where those different types of anomalies overlap. Instead, try this approach:
For example, let’s say we’re looking over time and an anomaly is detected in sensor one. You’ll see each of these lightning bolts represents a different sensor based on its color, and as we go through, we see that sensor one and sensor two tend to overlap with each other. So now, maybe we can write a rule based on these two anomalies being detected together.
Here are the top five groupings of anomalies detected in the same three-hour window for that HVAC use case. Let’s look a little bit closer into three potential rules you could create using this data.
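One way such groupings could be computed is to bucket anomaly events into fixed three-hour windows and count which sets of sensors show up together. This is a sketch under assumptions: the sensor names and event times are hypothetical, and fixed (rather than sliding) windows are a simplification.

```python
from collections import Counter, defaultdict


def top_anomaly_groupings(events, window_hours=3, top_n=5):
    """Count which sets of sensors have anomalies in the same time window.

    `events` is a list of (hours_since_start, sensor_name) pairs.
    """
    sensors_by_window = defaultdict(set)
    for hour, sensor in events:
        sensors_by_window[int(hour // window_hours)].add(sensor)
    # Only windows with more than one anomalous sensor form a grouping.
    counts = Counter(
        frozenset(sensors) for sensors in sensors_by_window.values() if len(sensors) > 1
    )
    return [(sorted(group), count) for group, count in counts.most_common(top_n)]


# Hypothetical anomaly events for one apartment.
events = [
    (0.5, "hvac_activity"), (1.0, "apartment_temp"),
    (7.0, "outside_temp"), (8.5, "outside_humidity"),
    (12.2, "hvac_activity"), (13.9, "apartment_temp"),
    (20.0, "apartment_temp"),
    (25.0, "outside_temp"), (26.0, "outside_humidity"),
    (30.5, "hvac_activity"), (31.0, "apartment_temp"),
]
for group, count in top_anomaly_groupings(events):
    print(group, count)
# ['apartment_temp', 'hvac_activity'] 3
# ['outside_humidity', 'outside_temp'] 2
```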
The first is finding that there’s an anomaly detected in HVAC activity and in the apartment temperature. If this were to happen, it may mean – after talking with an SME or subject matter expert – that the HVAC is overcompensating unnecessarily and that the action should be to inspect the HVAC.
For the next rule, let’s say the apartment temperature and apartment relative humidity show an anomaly in the same three-hour window. However, everything else is running as expected. In this situation, the HVAC is not addressing the apartment temperature and humidity. This could result in two potential actions: Check your thermostat, and if the thermostat seems okay then inspect the HVAC system.
The last rule I’d like to call out is where there’s an anomaly detected in the outside temperature and outside relative humidity, but there are no anomalies detected inside the apartment. This is a great example where the HVAC is functioning well, and you do not need to send an alert. So even though anomalies are detected, you don’t want to overwhelm your system with lots of alerts.
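These three rules can be sketched as a small function that maps the set of sensors with anomalies in a window to an action. The sensor names are hypothetical labels, and the exact matching logic is an assumption. In practice the conditions would come from conversations with a subject matter expert.

```python
INSIDE = {"apartment_temp", "apartment_humidity"}
OUTSIDE = {"outside_temp", "outside_humidity"}


def recommend_action(anomalous_sensors):
    """Map the sensors with anomalies in one window to an action.

    Returns None when no alert should be sent.
    """
    anomalous = set(anomalous_sensors)
    # Rule 1: HVAC activity and apartment temperature are both anomalous.
    if anomalous == {"hvac_activity", "apartment_temp"}:
        return "Inspect the HVAC"
    # Rule 2: apartment temperature and humidity are anomalous, nothing else is.
    if anomalous == INSIDE:
        return "Check the thermostat; if it seems okay, inspect the HVAC system"
    # Rule 3: outside conditions are anomalous but the apartment is unaffected.
    if OUTSIDE <= anomalous and not (anomalous & INSIDE):
        return None
    # No rule matches, so no alert is sent.
    return None


print(recommend_action(["hvac_activity", "apartment_temp"]))    # Inspect the HVAC
print(recommend_action(["outside_temp", "outside_humidity"]))   # None
```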
To recap the case study, we started with three types of data: apartment data, HVAC data, and temperature and relative humidity. Through anomaly detection we were able to understand normal operating parameters, identify anomalies in data collected for each apartment, and build rules to take action.
And we saw all of this in play with an HVAC case study using anomaly detection with rules.
Thank you so much for joining and we look forward to seeing you next time.
Any questions? We’d love to connect: bsquare.com/contact
Rebecca Grollman, PhD | Data Scientist – Bsquare
Rebecca works with companies deploying IoT initiatives to help them understand their data capabilities and build actionable predictive models using machine learning. She also researches and builds tools to accelerate data science processes and provide deeper insights. Rebecca holds both an MS and PhD in physics from Oregon State University.
Thank you for your interest in our Data Science Webinar Series.