Anomaly Detection with Machine Learning

Murat Hacioglu, Ebru Sevik, İsmail Denizli 01.07.2020
Tags: AutoML, B2Metric AI, Anomaly Detection, Unbalanced Data Analysis, Insurance Fraud Detection, Telecom Fraud Detection, Finance Fraud Detection, Machine Learning, Anomaly Detection on Mobile Payment Systems

Anomaly detection is a method for noticing anomalous actions or data. It predominantly addresses the problem of recognizing intrusive attacks based on activities that diverge from the “normal activity profile” of a system. It also checks whether an activity falls within predetermined acceptable conditions; if it does not, the activity is treated as unacceptable or bad behavior. A common anomaly detection method consists of the following components:
1-) A baseline of all acceptable behaviors and situations is created. All activities observed while establishing this baseline are taken as its basis.
2-) The current situation is configured as a model that can be interpreted by the applicable technologies.
3-) The observed activity is compared with the baseline model to check whether there is a deviation.
4-) Situations observed as anomalies are reported.
In other words, anomaly detection finds rare observations or situations that differ from the rest of the observations and may therefore raise suspicion. Such "abnormal" situations typically translate into some kind of problem, such as a faulty machine on a server, a cyber attack, failures in a cloud network, financial fraud, unusual mobile sensor data, or deviations in statistical process control (SPC) for production.
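The baseline, compare, and report steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production method: it assumes a single numeric metric, summarizes the baseline as a mean and standard deviation, and uses a hypothetical threshold of three standard deviations. The function names and the login counts are made up for the example.

```python
import statistics

def build_baseline(normal_values):
    """Summarize observed normal activity as (mean, standard deviation)."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def find_anomalies(values, baseline, threshold=3.0):
    """Return (index, value) pairs deviating more than `threshold` standard deviations."""
    mean, std = baseline
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > threshold * std]

# Hypothetical daily login counts observed while establishing the baseline.
normal_logins = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]
baseline = build_baseline(normal_logins)

# New observations are compared with the baseline; deviations are reported.
observed = [101, 99, 180, 97]
anomalies = find_anomalies(observed, baseline)
print(anomalies)  # [(2, 180)]
```

A fixed mean and standard deviation only work when normal behavior is stable; a system whose behavior drifts would need the baseline to be refreshed.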
The best anomaly detection framework should:
1-) Estimate main errors with up to 95% accuracy,
2-) Notice uncommon changes in system behavior automatically,
3-) Present root cause analysis that is easy to understand, because service providers need to know exactly how to fix issues.


Challenges in Anomaly Detection Models

Several factors make anomaly detection difficult. Machine learning algorithms often need large amounts of data, yet anomalies are rare by nature: they make up a statistically small share of the observations, so data sets are often imbalanced. The training and test data available to models developed for detecting anomalies may be limited, and it may also be unlabeled. For example, in many cases normal behavior far outnumbers abnormal behavior, which causes additional difficulties when training models that detect and predict abnormalities. An anomaly detection system also has to be dynamic so that it can serve fast-growing user bases. In addition, as the underlying system evolves, the detector has to update its notion of normal behavior over time.
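One practical consequence of this imbalance is that plain accuracy can be misleading. The sketch below assumes a hypothetical data set with 1% anomalies and a trivial "model" that labels everything as normal; it shows why a metric such as recall on the anomaly class is worth checking as well. All names and numbers are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return 0.0
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)

# Hypothetical imbalanced data: 990 normal samples (0) and 10 anomalies (1).
labels = [0] * 990 + [1] * 10

# A useless detector that never raises an alarm.
always_normal = [0] * len(labels)

acc = accuracy(labels, always_normal)
rec = recall(labels, always_normal)
print(f"accuracy={acc:.2f}, anomaly recall={rec:.2f}")  # accuracy=0.99, anomaly recall=0.00
```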

How Does the Anomaly Detection Approach Work?

Technically, the most distinctive criterion between a normal and an abnormal data point is whether there are similar data points around it in the analytical plane. In this context, regions where similar points cluster densely are considered normal, and regions where points are sparse are considered abnormal. This is where the inference capabilities of machine learning algorithms come into play: once the algorithms have determined these regions, they can predict whether a data point is abnormal.
Figure: Unbalanced data visualization, taken from Towards Data Science.
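The idea of dense versus sparse regions can be illustrated with a simple nearest-neighbor score. The sketch below is one possible way to measure sparsity, not the only one: it assumes two-dimensional points and Euclidean distance, and scores each point by the mean distance to its k nearest neighbors, so points in sparse regions receive high scores. The data and the function name are hypothetical.

```python
import math

def knn_score(point, data, k=3):
    """Mean distance from `point` to its k nearest neighbors in `data` (itself excluded)."""
    distances = sorted(math.dist(point, other) for other in data if other is not point)
    return sum(distances[:k]) / k

# Hypothetical observations: five points in a dense region and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (0.8, 0.9), (6.0, 6.0)]

# A higher score means the point sits in a sparser region.
scores = {p: knn_score(p, points) for p in points}
most_isolated = max(scores, key=scores.get)
print(most_isolated)  # (6.0, 6.0)
```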

Many machine learning algorithms have been developed over the years to identify these regions. So what distinguishes the algorithms used for anomaly detection? There are two major ways to detect abnormal patterns: supervised and unsupervised anomaly detection.

Supervised Anomaly Detection:

Supervised anomaly detection requires a labeled data set that includes both normal and abnormal samples, which is used to build a prediction model that can classify future data points. Support Vector Machines, supervised neural networks, and the K-Nearest Neighbors classifier are frequently used for this purpose.
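As a rough illustration of the supervised setting, the following sketch implements a tiny K-Nearest Neighbors classifier by hand. It assumes a small, hypothetical labeled set of transactions with two features; in practice a library implementation, feature scaling, and far more data would be used.

```python
import math
from collections import Counter

def knn_classify(sample, labeled_data, k=3):
    """Predict the majority label among the k labeled points nearest to `sample`."""
    nearest = sorted(labeled_data, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled transactions: (amount in thousands, transactions per hour).
training = [
    ((0.5, 1.0), "normal"), ((0.7, 2.0), "normal"),
    ((0.4, 1.0), "normal"), ((0.6, 3.0), "normal"),
    ((9.0, 20.0), "anomaly"), ((8.5, 25.0), "anomaly"), ((9.5, 22.0), "anomaly"),
]

print(knn_classify((0.55, 2.0), training))  # normal
print(knn_classify((9.2, 21.0), training))  # anomaly
```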

Unsupervised Anomaly Detection:

This method does not require any labeled training data. Instead, unsupervised anomaly detection makes two assumptions about the data: only a small percentage of the data is abnormal, and any anomaly differs markedly from normal samples. Based on these assumptions, the data is clustered using a similarity measure, and data points that lie far from the clusters are treated as anomalies. Supervised algorithms, by contrast, need large labeled data sets to be trained and to achieve high-performance estimates. Such large-scale labeled data sets are difficult to obtain, and domain knowledge from professionals is necessary for the labeling process.

The strong performance of supervised learning in previous years has also helped unsupervised learning achieve very good results. Although there is a growing tendency to adopt unsupervised approaches, ML-based anomaly detection efforts still generally focus on supervised models. The scarcity of labeled data is increasingly driving the development of unsupervised learning models. B2Metric Machine Learning Studio can be applied to these and many other problems and allows you to detect anomalies as accurately as possible.

Real-World Scenarios Where Anomaly Detection Is Used

Anomaly detection affects business decisions across sectors. Insurance, finance, telecom, manufacturing, and banking are the main sectors in which anomaly detection is of great importance. The main real-world scenarios are the detection and prevention of abnormally high purchases or deposits, fraudulent spending, revenue fraud, abuse, and service disruptions.

Insurance Fraud

According to FBI reports, insurance fraud causes about $40 billion in losses in the United States every year.

Anomaly detection in the insurance sector addresses basic problems in different fields of insurance. For instance, identifying fraud in insurance and securities and detecting irregularities in health services data are within its scope. In addition, growing cybersecurity demands have become a concern for the insurance industry in recent years. With these developments, claims fraud detection efforts have started in the insurance industry. In short, anomaly detection is a method used for insurance fraud detection; for instance, insurance companies can use it to identify suspicious user behavior in the insurer's network.

Anomaly Detection for the Telecommunications Industry

As the telecommunications sector has developed, it has started to produce and collect huge amounts of data, so much that it is impossible to process manually. Data mining technologies for the telecommunications sector have developed as a result. Abnormal situations such as network failures and unusual customer calls are called anomalies, and detecting them plays an important role in the telecom sector.

Cyber Security Anomaly Detection

Network monitoring tools used by cyber security systems can learn normal network behavior thanks to the large amount of data they collect. Unusual entries and intrusions are called anomalies, and they must be detected and acted upon to ensure cyber security. A denial-of-service (DoS) attack is one example of an anomaly. DoS attackers do not necessarily break into a system or steal data; instead, they aim to take a network down and deny service to legitimate users. Launching a DoS attack is easy. The attack prevents users from getting proper service by exhausting physical resources or network connections: the service is flooded with too much traffic or data. DoS attacks must therefore be detected. To do so, normal behavior in the system is specified first; the system then raises an alarm when behavior deviates from normal toward an anomaly (DoS).
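The "learn normal behavior, then alarm on deviation" approach can be sketched for request traffic as follows. This is a simplified illustration under stated assumptions: traffic is summarized as requests per second, normal behavior is tracked with an exponentially weighted moving average, and an alarm is raised when a count exceeds a hypothetical ratio of three times that baseline. The parameters `alpha`, `ratio`, and `warmup` and the traffic numbers are made up; real DoS detection considers many more signals.

```python
def detect_traffic_spikes(counts, alpha=0.2, ratio=3.0, warmup=3):
    """Return indices of intervals whose count exceeds `ratio` times the running baseline."""
    alarms = []
    baseline = float(counts[0])
    for i, count in enumerate(counts[1:], start=1):
        if i >= warmup and count > ratio * baseline:
            # Suspicious interval: raise an alarm and do not learn from it.
            alarms.append(i)
        else:
            # Normal interval: move the baseline slightly toward the new count.
            baseline = alpha * count + (1 - alpha) * baseline
    return alarms

# Hypothetical requests per second; intervals 5 and 6 simulate a flood.
requests_per_second = [120, 130, 125, 118, 122, 900, 950, 127, 121]

alarms = detect_traffic_spikes(requests_per_second)
print(alarms)  # [5, 6]
```

Skipping the baseline update during an alarm keeps a sustained flood from gradually being learned as normal behavior.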

Network Faults Anomaly Detection

To provide high-quality service in IP networks, service downtime after a network fault should be kept as short as possible. However, some network faults cannot be detected by operators simply by monitoring device states. Anomaly detection is needed to catch such faults.