Five-minute guide to artificial intelligence in cybersecurity
There is a lot of buzz around analytics in cybersecurity. Here is a quick guide to how it is being applied.
Let's begin by defining a few terms used when discussing artificial intelligence (AI)-based security and cyber-attacks. In discussions of analytics, you will often find the terms AI, machine learning, and data science used interchangeably. They are three different concepts, which can be defined as follows:
- Artificial intelligence – This is the broadest term. It covers getting machines to mimic humans, that is, giving them the ability to sense, understand, and respond. To mimic humans, an artificially intelligent machine primarily needs two things: knowledge and action.
- Machine learning – How do machines build knowledge? One approach is to codify the rules that human experts know, as an expert system does. But no finite set of rules can model our complex world. So, can we make machines learn from past data and create their own knowledge? This is what machine learning does. It is a subset of AI, but the most critical part of it, which is why many people confuse AI with machine learning.
- Data science – To make machine learning work, you need to define datasets, choose appropriate variables and metrics, and carry out data engineering tasks such as data collection, preparation, integration, and visualization, as well as measure algorithm performance. Data science covers all of these.
In this post, when I use the term analytics, I mean the combination of data science practices and machine learning algorithms.
AI application to cybersecurity
You can look at the use of analytics in cybersecurity and AI-driven security from three different perspectives: the sources of data to which analytics is applied, the machine learning methods used, or the results the analytics is intended to achieve.
Based on data sources
If you are more inclined toward logs, security events, and data, you may find classifying analytics and AI for cybersecurity based on types of data sources more meaningful. The security industry currently describes analytics mostly from this perspective. See diagram 1 below.
Diagram 1 - Analytics based on data sources
Based on machine learning methods
If you are inclined toward algorithms and mathematics, you may find classifying analytics based on types of algorithms more meaningful. See diagram 2 below, which describes the main types of algorithms used for security analytics.
Diagram 2 - Cyber analytics based on models, algorithms
We can illustrate with some use cases. Spam filtering and phishing detection use Bayesian techniques to separate legitimate emails from spam. Fraud detection uses neural networks and decision trees to decide whether activity is fraudulent. Insider threats, such as abnormal user access or data exfiltration, are detected with clustering techniques. Bots can be spotted with an entropy function, which can reveal machine-to-machine communication patterns. Association analysis can reveal attacker groups that use similar attack methods in your network.
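To make the entropy use case concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the signal of interest is the timing between requests: a bot that beacons on a fixed schedule produces very regular gaps (low entropy), while human activity tends to be irregular (higher entropy). The function names, the 5-second bin size, and the timestamps are all hypothetical, and real products may apply entropy to quite different features.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    # p * log2(1/p) summed over each distinct value
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def interval_entropy(timestamps, bin_seconds=5):
    """Entropy of the gaps between consecutive requests, grouped into bins."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    bins = [int(gap // bin_seconds) for gap in gaps]
    return shannon_entropy(bins)

# Hypothetical request times, in seconds since the start of a capture.
bot_times = [0, 60, 120, 180, 240, 300, 360]   # beacons every 60 seconds
human_times = [0, 4, 35, 41, 97, 160, 163]     # irregular browsing

bot_score = interval_entropy(bot_times)
human_score = interval_entropy(human_times)
print(f"bot: {bot_score:.2f} bits, human: {human_score:.2f} bits")
```

In this toy data the bot's gaps all fall into one bin, so its entropy is zero, while the human's gaps spread over several bins. A low score on its own is only a hint; legitimate scheduled jobs also look regular.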
There is another way to categorize the machine learning models above: by how they learn. In supervised learning, machines learn from past data that humans have already labeled as good or bad, attack or false positive, fraud or normal. In unsupervised learning, no labeled past data exists, and the machine has to find structure in the data on its own. In reinforcement learning, the machine learns from feedback on its longer-term results. Supervised learning includes classification, regression, and deep learning. Unsupervised learning includes clustering, association, and pattern matching. With this approach, diagram 2 becomes diagram 3 below.
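The supervised case can be sketched with the spam-filtering use case mentioned earlier. The Python example below is a toy naive Bayes classifier with Laplace smoothing, trained on a handful of hand-labeled emails. The training sentences, the labels "spam" and "good", and the function names are assumptions made up for illustration; a production filter would use far more data and features.

```python
import math
from collections import Counter

def train(labeled_emails):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "good": Counter()}
    label_counts = Counter()
    for text, label in labeled_emails:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-posterior score."""
    vocab = set(word_counts["spam"]) | set(word_counts["good"])
    total_docs = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, hand-labeled training data (the "past data" humans have labeled).
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "good"),
    ("please review the quarterly report", "good"),
]
word_counts, label_counts = train(training)

first = classify("claim your free prize", word_counts, label_counts)
second = classify("agenda for the report review", word_counts, label_counts)
print(first, second)
```

The point of the sketch is the role of the labels: the model only knows what "spam" looks like because a human tagged examples of it. An unsupervised method would receive the same emails without the second element of each pair.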
Diagram 3 - Cyber analytics based on learning
If you are wondering why there are so many algorithms (the list above is only partial) and why a single machine learning method cannot be used everywhere, you are ahead of your time. Currently, no one algorithm works best for every problem; this is known as the 'no free lunch' theorem in machine learning. Depending on the type of data you have and the end objective of the analytics, you will need to try out multiple algorithms and choose the best fit.
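"Try out multiple algorithms and choose the best fit" can be sketched as a simple comparison on held-out data. The Python example below pits two deliberately simple candidates against each other: a single-feature threshold rule and a 1-nearest-neighbor classifier. The features (failed logins, megabytes uploaded), the data points, and the function names are hypothetical, and the winner here says nothing about which method would win on real data.

```python
def threshold_model(train):
    """Flag an attack (1) when the first feature exceeds the midpoint of the class means."""
    attack = [x[0] for x, y in train if y == 1]
    normal = [x[0] for x, y in train if y == 0]
    cut = (sum(attack) / len(attack) + sum(normal) / len(normal)) / 2
    return lambda x: 1 if x[0] > cut else 0

def nearest_neighbor_model(train):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        nearest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
        return nearest[1]
    return predict

def accuracy(model, data):
    """Fraction of examples for which the model predicts the true label."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical events: (failed_logins, megabytes_uploaded), label 1 = attack.
train_set = [((1, 5), 0), ((2, 8), 0), ((0, 3), 0),
             ((9, 4), 1), ((8, 6), 1), ((1, 90), 1)]
holdout_set = [((1, 6), 0), ((10, 5), 1), ((2, 85), 1), ((0, 4), 0)]

candidates = {
    "threshold": threshold_model,
    "nearest_neighbor": nearest_neighbor_model,
}
# Fit each candidate on the training set and score it on data it has not seen.
scores = {name: accuracy(build(train_set), holdout_set)
          for name, build in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

On this toy data the threshold rule misses the exfiltration-style event because it looks only at failed logins, so the nearest-neighbor model scores higher. With a different dataset or objective the ranking could easily reverse, which is why the comparison has to be rerun for each problem.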
Based on end objective of analytics
If you are focused on business results, you may find classifying analytics based on end objectives more meaningful. See diagram 4 below.
Diagram 4 - Cyber analytics based on end objective
Many current threat-hunting tools and analytics products, such as endpoint detection and response (EDR) and network forensics, are good examples of diagnostic and detective analytics. User and entity behavior tools can provide predictive analytics based on past risk behaviors.
What should you use from the above?
The adage that there is no silver bullet for security, or for AI security, also holds for security analytics. No single type of analytics can solve every security challenge. For example, no single security analytics can detect all of today's top nine attack methods, which include AI-based cyber-attacks. Diagram 5 below shows the relevance of each security analytics to the top threats. Here, analytics is defined by the source of data; it could equally have been defined by type of algorithm or by end objective.
Diagram 5 - Security analytics to detect the top 9 attack methods