Machine learning is not only utilized by security professionals, but also by adversaries with malicious intent. How are they using it to improve their cyber attacks?
In the previous post, we explored the many ways security analysts and organizations are using machine learning to prevent cybercrime. The ability of machine learning algorithms to classify previously unseen data and predict future data gives them a wide range of uses in cyber defense. However, those same traits can equally be exploited in malicious contexts.
In this second post, we will explore the different ways attackers are leveraging machine learning to improve social engineering and hide their malware from antivirus software.
What is machine learning?
To recap, machine learning is the process of feeding a model sets of data with useful features in order for it to 'learn' the underlying concepts and make predictions about future data. It can predict what future data may look like, or classify previously unseen data points. In the context of cybercrime, it may be used to generate sophisticated and targeted phishing emails.
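The "learn from labelled data, then classify unseen data" idea can be sketched in a few lines of pure Python. This is a minimal 1-nearest-neighbour classifier; the features (message length, link count) and labels are invented purely for illustration:

```python
# Minimal 1-nearest-neighbour classifier: "learn" from labelled examples,
# then classify a previously unseen point. The data is invented: each
# point is (message_length, link_count) and the label says whether a
# hypothetical email was phishing.

def nearest_neighbour(train, point):
    """Return the label of the training example closest to `point`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], point))
    return label

training_data = [
    ((120, 0), "legitimate"),
    ((90, 1), "legitimate"),
    ((40, 3), "phishing"),
    ((35, 4), "phishing"),
]

# A previously unseen email: short, with several links.
print(nearest_neighbour(training_data, (38, 3)))  # phishing
```

Real models are far more sophisticated, but the principle is the same: past data shapes the decision made about new data.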
How is machine learning used in cyber attacks?
Where cyber security uses machine learning to identify similar malware and malicious links, cybercriminals use it to evade filters, bypass CAPTCHA checks, and generate targeted phishing emails. Comparing the two, cyber security currently has much more consolidated uses for machine learning, but the trend towards evasive malware and automated phishing may pose a serious threat to the industry.
Bypassing CAPTCHAs
CAPTCHAs are present in web applications across the internet, with the purpose of preventing automated scripts from brute-forcing logins or mass-registering bot accounts for malicious purposes. A CAPTCHA asks the user to complete a simple challenge that a robot should find difficult. However, as machine learning becomes more advanced, models can become extremely efficient at solving these challenges and bypassing the protection, enabling brute-force attacks that may compromise accounts.
This quickly becomes a game of cat and mouse: the CAPTCHAs become more obscure and difficult to answer, and the machine learning models are retrained to solve them. A paper published in 2018 achieved 100% accuracy against certain CAPTCHA implementations and over 92% against Amazon's.
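The classic form of this attack is "segment then classify": split the CAPTCHA image into per-character cells, then match each cell against known character shapes. As a toy sketch (not the 2018 paper's method, which trained far more capable models), here the "glyphs" are invented 3x3 bitmaps standing in for real character images:

```python
# Sketch of the "segment then classify" CAPTCHA attack: each cell of the
# image is matched against known glyph templates. Glyphs are invented
# 3x3 bitmaps for illustration; real attacks classify pixel images.

GLYPHS = {
    "1": (0,1,0, 0,1,0, 0,1,0),
    "7": (1,1,1, 0,0,1, 0,0,1),
    "L": (1,0,0, 1,0,0, 1,1,1),
}

def classify_cell(cell):
    """Return the glyph whose template differs from `cell` in the fewest pixels."""
    def mismatch(glyph):
        return sum(a != b for a, b in zip(GLYPHS[glyph], cell))
    return min(GLYPHS, key=mismatch)

def solve(captcha_cells):
    return "".join(classify_cell(c) for c in captcha_cells)

# A "CAPTCHA" of two cells: a 7 with one pixel of noise, then a clean L.
noisy_7 = (1,1,1, 0,0,1, 0,1,1)
clean_L = (1,0,0, 1,0,0, 1,1,1)
print(solve([noisy_7, clean_L]))  # 7L
```

Distorting the characters defeats naive template matching like this, which is exactly why modern solvers moved to learned models that tolerate noise and warping.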
Password brute force
Machine learning algorithms can also be used to generate new data that resembles a given dataset. This is especially useful for generating candidate passwords from terms that relate to a target user, a method showcased in the popular TV show Mr Robot and imitated by many online tools. Combining Open Source Intelligence (OSINT) with machine learning produces far more targeted password lists, making a successful brute-force attempt more likely. In one method, 13% of the generated passwords matched passwords used in the real world, which is highly impressive for brute-forcing.
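One simple way to "generate data that resembles a dataset" is a character-level Markov model: learn which character tends to follow which in known passwords, then sample new candidates. The seed list below is invented; a real attack would train on large leaked-password corpora plus OSINT terms about the target:

```python
import random

# Sketch of a character-level Markov password generator. The seed list
# is invented; real tools train on leaked-password corpora and on
# OSINT terms gathered about the target.

def train(passwords):
    model = {}
    for pw in passwords:
        padded = "^" + pw + "$"          # ^ = start marker, $ = end marker
        for a, b in zip(padded, padded[1:]):
            model.setdefault(a, []).append(b)
    return model

def generate(model, rng, max_len=16):
    out, ch = [], "^"
    while len(out) < max_len:
        ch = rng.choice(model[ch])
        if ch == "$":                    # end marker: candidate complete
            break
        out.append(ch)
    return "".join(out)

seeds = ["alice1990", "alice123", "Alice!", "bobcat99"]
model = train(seeds)
rng = random.Random(7)
candidates = [generate(model, rng) for _ in range(5)]
print(candidates)
```

Because every transition was observed in the seed data, the candidates stay "password-shaped" rather than being random strings, which is what makes ML-generated wordlists so much more efficient than exhaustive brute force.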
Evading malware detection
Machine learning used in the detection of malware could be considered one of the first true applications of artificial intelligence in the cyber security industry. However, recent papers have shown how malware can be crafted to evade these detections through the use of machine learning. One paper used a machine learning algorithm to modify malware samples and successfully evaded detection in over 45% of cases. The repercussions of this type of malware evolving would be serious and felt around the world. Fortunately, these techniques have not yet been observed succeeding in the wild (apart from malware implementing AI phishing scams), and current antivirus software is holding its own.
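The core trick in these papers is to perturb the features a detector sees while preserving the malware's behaviour. A toy sketch: a linear "detector" scores binary file features, and a greedy search flips only the features an attacker could plausibly change without breaking the payload. The feature names, weights, and threshold are all invented for illustration:

```python
# Toy sketch of ML detector evasion: greedily flip attacker-controllable
# features to push a linear detector's score below its alert threshold.
# Feature names, weights, and the threshold are invented.

WEIGHTS = {
    "calls_network_api": 2.0,
    "packed": 1.5,
    "has_valid_signature": -2.0,
    "large_benign_overlay": -1.0,
    "writes_registry": 1.0,
}
THRESHOLD = 1.0                      # score >= threshold => flagged as malware
MUTABLE = ["packed", "has_valid_signature", "large_benign_overlay"]

def score(features):
    return sum(WEIGHTS[f] * v for f, v in features.items())

def evade(features):
    """Flip mutable features, keeping only flips that lower the score."""
    features = dict(features)
    for f in MUTABLE:
        if score(features) < THRESHOLD:
            break                    # no longer flagged; stop perturbing
        flipped = dict(features, **{f: 1 - features[f]})
        if score(flipped) < score(features):
            features = flipped
    return features

sample = {"calls_network_api": 1, "packed": 1,
          "has_valid_signature": 0, "large_benign_overlay": 0,
          "writes_registry": 1}
print(score(sample))                 # 4.5 -> flagged
evaded = evade(sample)
print(score(evaded) < THRESHOLD)     # True -> slips past the detector
```

Real attacks search a far larger space of byte- and structure-level modifications, and the detector is a black box, but the greedy score-lowering loop captures the idea.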
Targeted phishing
Phishing attacks have been around for over 20 years and continue to succeed even as public awareness of them has increased. To increase the likelihood of a successful scam, manual information gathering can be performed on a target to craft the perfect tweet or email, achieving click-through rates of up to 45% on a malicious link. However, this process is much slower than automated methods.
A paper presented at Black Hat used machine learning to automate this process, scraping a Twitter user's profile and generating targeted tweets from bot accounts. This method was found to be four times faster than manual phishing while maintaining a high click-through rate of between 33% and 66%, making it as believable as manual phishing in some scenarios, if not more so. This is a real-world example of how automation and AI can be used to target and scam specific users.
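The automation pipeline boils down to: mine the target's posts for a topic they care about, then build a lure around it. A heavily simplified sketch (the tweets and template are invented; the Black Hat tool generated full messages with a language model rather than filling a fixed template):

```python
import re
from collections import Counter

# Sketch of automated spear-phishing targeting: find the word a target
# posts about most, then drop it into a lure. Tweets and the template
# are invented for illustration.

STOPWORDS = {"the", "a", "to", "and", "my", "at", "is", "for", "of", "i", "this"}

def top_topic(posts):
    """Return the most frequent non-stopword across the target's posts."""
    words = Counter()
    for post in posts:
        for w in re.findall(r"[a-z']+", post.lower()):
            if w not in STOPWORDS:
                words[w] += 1
    return words.most_common(1)[0][0]

def craft_lure(posts, link):
    topic = top_topic(posts)
    return f"Saw this and thought of you, big {topic} news: {link}"

tweets = [
    "Great marathon training run this morning",
    "Counting down the days to the city marathon",
    "New shoes for marathon day!",
]
print(craft_lure(tweets, "http://example.com/x"))
```

Even this crude frequency count personalizes the bait; pairing the same idea with generated natural language is what pushed the automated click-through rates so high.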
With social media being a battleground for fake news and the spreading of misinformation, it is inevitable that AI bots are being used for spreading such information in the most believable way. Equally, however, machine learning may hold the key to fact-checking and preventing the spread.
Cyber attacks using machine learning are only recently being explored and developed. Evasive malware, which poses the most serious risk, has yet to be seen in a real-world attack. In absolute terms, machine learning in cyber security is more developed and widely implemented than its cyber-attack counterpart. However, the next few years may see a shift in malicious activity, with artificial intelligence at the heart of it.