How Machine Learning Is Used In Cyber Security

Last Updated on 4 May 2022 by admin

As cybercriminals utilise new technology in their attacks, information security professionals must also adapt and implement new methods in their cyber defence. This game of cat and mouse means that cyber security is always at the cutting edge of technology.

In recent years, machine learning has been used in cyber security to predict and identify attacks as they happen. However, cybercriminals are also utilising machine learning to hide their malware or launch the most convincing phishing campaigns. In this two-part blog, we will look at how machine learning plays a crucial role in cyber security and cyber-attacks.

In this first part, we explore ways in which machine learning is used in cyber defence. Although machine learning should play a crucial role in any cyber defence strategy, recent reports suggest GCHQ is not utilising it enough.

What is machine learning?

Machine learning models are trained on sets of data to learn the underlying patterns, so that they can predict what future data should look like or classify data into groups. In the context of cyber security, this could mean classifying network traffic as malicious or normal.
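To make the idea of classification concrete, here is a minimal sketch of a nearest-centroid classifier written in plain Python. The two features (packets per second and mean packet size), the training values and the function names are all assumptions made up for illustration. A real system would use far more data, scaled features and a more capable model.

```python
from math import dist
from statistics import mean

# Hypothetical, hand-made training data:
# each sample is (packets per second, mean packet size in bytes).
TRAINING = {
    "normal": [(20, 800), (35, 750), (25, 900), (30, 820)],
    "malicious": [(900, 60), (1200, 64), (1000, 70), (1100, 58)],
}

def fit_centroids(samples_by_label):
    """Learn one 'average' point per class from the labelled examples."""
    return {
        label: tuple(mean(column) for column in zip(*samples))
        for label, samples in samples_by_label.items()
    }

def classify(centroids, sample):
    """Assign the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))

centroids = fit_centroids(TRAINING)
print(classify(centroids, (28, 850)))    # a quiet flow with large packets
print(classify(centroids, (1050, 62)))   # a flood of tiny packets
```

The "learning" here is simply averaging the labelled examples, but the workflow is the same as in larger systems: fit a model on known data, then ask it to label traffic it has not seen before.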

How is machine learning used in cyber security?

Networks generate a lot of traffic, too much for any single team of security professionals to analyse meaningfully. Machine learning can be used to learn the underlying trends in the data, allowing predictions to be made about future developments such as changes to malware. Machine learning models are also excellent at classifying threats and differentiating between malicious and normal network traffic.

Machine learning can also be used to detect whether a person has clicked on a malicious link that may lead to a phishing attack, or whether a website is hosting malicious content, by analysing the content of the site.

Below we provide some important examples of how machine learning can support your cyber security strategy.

Threat detection of malware

As a particular piece of malware becomes more familiar to the security industry, antivirus software becomes aware of it and can easily identify its signature. However, this heavy reliance on previous knowledge means that small variations in the malware can enable it to avoid detection. Machine learning models can take previously known malware and learn its underlying characteristics. From this, a model can detect malware even after it has been altered to avoid detection. Approaches of this kind have been implemented in many antivirus solutions, where they are often known as heuristic detection. Researchers have reported accuracy of over 85% using this method.

The power of machine learning comes from the fact that it can detect new types of malware that have not been seen before. It can also detect known threats with a higher degree of accuracy than previous tools were capable of. This is because machine learning uses huge amounts of data to analyse the behaviour of a file, rather than relying on human experts to recognise malicious code.
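The difference between an exact signature and a learned notion of similarity can be sketched in a few lines of Python. This example uses Jaccard similarity over byte n-grams, which is a simple similarity measure rather than a trained model, and it is not how any particular antivirus product works. The sample "files", the function names and the 0.6 threshold are all hypothetical.

```python
import hashlib

def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte sequences in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Similarity between two sets: 1.0 is identical, 0.0 is disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def signature(data: bytes) -> str:
    """Exact-match signature: any change to the data changes the hash."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-ins for file contents; real samples would be binaries.
known_malware = b"connect(c2.example);download(payload);encrypt(user_files);demand(ransom)"
variant = b"connect(c2.example);download(payload);encrypt(user_files);demand(ransom);#v2"
benign = b"open(report.docx);spellcheck(document);save(report.docx);close()"

def looks_like_known_malware(sample: bytes, threshold: float = 0.6) -> bool:
    """Flag samples that share most of their n-grams with the known sample."""
    similarity = jaccard(byte_ngrams(sample), byte_ngrams(known_malware))
    return similarity >= threshold

# The altered variant no longer matches the exact signature...
print(signature(variant) == signature(known_malware))
# ...but it is still flagged as similar, while the benign sample is not.
print(looks_like_known_malware(variant))
print(looks_like_known_malware(benign))
```

A trained model generalises in a richer way than this, but the principle is the same: detection is based on what a sample resembles rather than on an exact match.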

Phishing page and URL detection

Phishing attacks are an extremely common and successful way of stealing a victim’s credentials. A website is crafted to look like the target site, such as a fake banking application, with the aim of tricking a user into entering their credentials. URLs leading to phishing sites are often embedded in web applications, waiting for a user to click on them and enter their sensitive data. Machine learning algorithms can analyse a URL and classify it as malicious or benign. Other attributes such as geolocation, website contents and word analysis can improve the accuracy of the prediction.
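Before a model can classify a URL, the URL has to be turned into numbers. The sketch below shows one possible way of doing that with Python's standard library. The chosen features, the function name and the example URLs are assumptions for illustration, not a definitive feature set, and the output would still need to be fed into a classifier trained on labelled URLs.

```python
import re
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """Turn a URL into simple numeric features a classifier could be trained on."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "uses_https": int(parsed.scheme == "https"),
        "url_length": len(url),
        "dots_in_host": host.count("."),
        # A bare IPv4 address instead of a domain name is often treated as suspicious.
        "host_is_ip": int(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None),
        "has_at_symbol": int("@" in url),
        "hyphens_in_host": host.count("-"),
        "digits_in_host": sum(ch.isdigit() for ch in host),
    }

# Hypothetical URLs for illustration only.
print(url_features("https://www.example-bank.com/login"))
print(url_features("http://192.0.2.10/example-bank.com/login.php?id=1"))
```

No single feature is decisive on its own. The value of a trained model lies in weighing many weak signals like these together.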

There are two main challenges for machine learning algorithms when detecting phishing pages:

  • Firstly, if a page looks genuine, it is difficult to identify it as malicious. A good example of this is the lack of “https” in the URL. Human eyes usually see this as a clear indicator that the website may not be genuine or safe, whereas an algorithmic approach may overlook it.
  • Secondly, while certain patterns and indicators can be used to distinguish a genuine page from a compromised page, they are context-specific and may not apply to every case. For example, a logo at the top of a genuine website might appear slightly different on a malicious page. This can make it hard for automated detection algorithms to identify which page is genuine and which is fake.

Bot detection

One of the most devastating DDoS attacks was carried out by the Mirai botnet, which used thousands of devices to perform large-scale attacks. One way of preventing such an attack from happening again is to analyse the traffic of all devices on a network. When a botnet attack occurs, the traffic will deviate from standard use. This anomaly-based detection has achieved over 99.9% accuracy with some machine learning models.
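Anomaly-based detection can be illustrated with a very simple statistical baseline. The sketch below learns the mean and standard deviation of a device's normal traffic and flags anything far outside that range. It is a z-score check rather than one of the machine learning models referred to above, and the traffic figures, function names and threshold of three standard deviations are hypothetical.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn the normal level and spread of traffic from past observations."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical requests-per-minute counts for one device during normal use.
normal_traffic = [42, 38, 45, 40, 39, 44, 41, 43, 37, 40]
baseline = fit_baseline(normal_traffic)

print(is_anomalous(46, baseline))    # a slightly busy minute
print(is_anomalous(900, baseline))   # a flood-like burst
```

Real deployments track many signals per device and let the baseline adapt over time, but the underlying question is the same: how far does current behaviour deviate from what has been learned as normal?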

User behaviour analysis

Similar to bot detection, the day-to-day traffic of a user can be monitored. A large anomaly in a user's activity may indicate a compromised account. A standard user's network footprint will be varied and complex due to the wide range of applications in use, so the false positive rate may be high. Even so, this monitoring allows security teams to be notified and to act on the potential threat.

A frequently raised issue is how ethical the mass surveillance of employees is. Students in Australia have voiced privacy concerns over software intended to analyse their actions during examinations taken from home. Such software should strike a balance between user privacy and effective detection of malicious activity. Data anonymisation may help alleviate some of the concerns.

Optimising the human analysis

Machine learning does not remove the need for security analysts. Instead, it completes the easier tasks and empowers analysts to draw on the highest quality data available. For example, machine learning models can generalise trends from logs and highlight points of interest for the analyst. Another common issue for security analysts is alarm fatigue. This phenomenon results from repeatedly seeing false positive threats, so that when a legitimate threat arises, analysts are not mentally prepared to deal with it. Giving the analyst higher quality data and reducing the noise can help eliminate the fatigue.


It is evident that machine learning is being used effectively in cyber security in a wide range of applications. But how are criminals using the same technology to their advantage? In the next article, we will explore how machine learning is used in malicious contexts.