Data Security in Machine Learning

The Fourth Industrial Revolution is under way. It is marked by technology that blurs the line between human and machine, and its most sought-after commodity is data. Data security and privacy are therefore of the utmost concern in the digital economy.

Developments in Machine Learning (ML) are rapid, and the opportunities this technology creates are wondrous. The pace of recent breakthroughs has no historical precedent and is affecting almost every industry and interaction. Smart technologies operate in our homes (Amazon Echo with Alexa, for example) and in our cars (think Tesla). They customise our online shopping experiences (Amazon) and will soon do the same in store (with facial recognition technology). They support our military and our infrastructure, our agriculture and our financial services. Technology is shaping almost every aspect of human experience, interacting with, and learning from, the people it encounters.

But our love of technology comes at a price. The more of it we adopt, the more vulnerable we are to cyber attacks. ML uses data to train its systems, and the better the training data, the more accurate the results. Clean and reliable data is a valuable commodity, and the sheer amount of it also makes it an easy target. Cyberattack agents are a constant and real threat, but this doesn’t mean that AI technologies are unsafe. They just require a heightened level of data security.

Fighting fire with fire

The best way to protect data privacy and an organisation’s network is to use the power of ML; in other words, to match the technology that the opponent is using.

Cybersecurity strategies that employ ML can monitor large volumes of data for anomalies. ML models can be trained on “good data”, meaning data that shows how the network should be operating. When incoming data does not fit this baseline, the system can quickly recognise and flag the issue.
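To make the idea of training on “good data” concrete, here is a minimal sketch in Python. It is not a production detector: a simple statistical baseline (mean and standard deviation) stands in for a trained ML model, and the traffic figures, function names and threshold are all hypothetical.

```python
from statistics import mean, stdev

def fit_baseline(normal_values):
    """Learn the mean and standard deviation of known-good observations."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # All training values were identical, so any difference is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical "good data": requests per minute during normal operation.
normal_requests_per_minute = [98, 102, 101, 97, 100, 103, 99, 100, 96, 104]
baseline = fit_baseline(normal_requests_per_minute)

# New observations; the third is a burst far outside the learned baseline.
observed = [101, 99, 450, 102]
flags = [is_anomalous(v, baseline) for v in observed]
print(flags)
```

A real deployment would learn from many signals at once rather than a single count, but the principle is the same: learn what normal looks like, then flag what does not fit.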

Many cyber attacks begin with a phishing scheme, in which sensitive information is stolen through malicious emails or other communications, and it is human error that lets the threat agent in. ML systems can learn from, and adapt to, human behaviour. The continuous user feedback the system receives helps it watch for warning signs and recognise breaches far faster than non-ML security measures.
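As an illustration of learning from user feedback, the sketch below shows a tiny naive Bayes text filter in Python that updates every time a user labels a message. It is an assumption about how such a loop might look, not a description of any particular product; the class name, labels and sample messages are hypothetical.

```python
import math
from collections import Counter

class FeedbackPhishingFilter:
    """Tiny naive Bayes filter that learns from each user-labelled message."""

    def __init__(self):
        self.word_counts = {"phish": Counter(), "ok": Counter()}
        self.message_counts = {"phish": 0, "ok": 0}

    def report(self, text, label):
        """Record user feedback; `label` is 'phish' or 'ok'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def score(self, text):
        """Return the log-odds that `text` is phishing (positive means suspicious)."""
        vocab = set(self.word_counts["phish"]) | set(self.word_counts["ok"])
        total_messages = sum(self.message_counts.values())
        log_odds = 0.0
        for label, sign in (("phish", 1), ("ok", -1)):
            # Add-one smoothing keeps unseen labels and words from zeroing the result.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            log_prob = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_odds += sign * log_prob
        return log_odds

mail_filter = FeedbackPhishingFilter()
score_before = mail_filter.score("verify your password now")  # no feedback yet

# Hypothetical feedback from users flagging or clearing messages.
mail_filter.report("urgent verify your password now", "phish")
mail_filter.report("verify your account password immediately", "phish")
mail_filter.report("team lunch meeting at noon", "ok")
mail_filter.report("agenda for the project meeting", "ok")

score_suspicious = mail_filter.score("verify your password now")
score_benign = mail_filter.score("lunch meeting agenda")
```

Before any feedback the filter has no opinion; after a handful of reports it leans towards flagging the password request and clearing the meeting note. Each further report shifts the scores again, which is the adaptive behaviour described above.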

Limiting the attack surface is a fundamental security practice.

It involves restricting access, employing layered defences and placing smart monitors at each point of weakness in a system.

An attack surface is the sum of all the points of entry (attack vectors) and vulnerabilities in a software environment through which a hacker can get data out or get malware in. By shrinking the attack surface, you are looking for a needle in a few pieces of hay instead of in a whole haystack.
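One simple way to picture this is as an inventory of entry points. The Python sketch below counts the internet-facing entry points before and after the unnecessary ones are switched off. The inventory, the field names and the rule that anything not required is removed are all hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntryPoint:
    name: str
    port: int
    internet_facing: bool
    required: bool

def attack_surface(entry_points):
    """Return the entry points an outside attacker could reach."""
    return [e for e in entry_points if e.internet_facing]

def shrink(entry_points):
    """Keep only the entry points the organisation actually needs."""
    return [e for e in entry_points if e.required]

# Hypothetical inventory of services on a network.
inventory = [
    EntryPoint("web shop", 443, internet_facing=True, required=True),
    EntryPoint("legacy FTP", 21, internet_facing=True, required=False),
    EntryPoint("telnet console", 23, internet_facing=True, required=False),
    EntryPoint("database", 5432, internet_facing=False, required=True),
]

before = attack_surface(inventory)
after = attack_surface(shrink(inventory))
print(len(before), "exposed entry points before,", len(after), "after")
```

With fewer exposed entry points, monitoring can concentrate on the ones that remain.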

ML, when used to support good cybersecurity practices, helps to shrink the attack surface.

Reducing the amount of time an adversary has inside your system to carry out a cyber attack is another good way to limit your exposure.

Limiting infiltration time involves the continuous monitoring of an organisation’s data.

Having an automated cybersecurity detection system based on trained ML models means a faster, more efficient response to a breach. At its best, such a system can interact with the threat agent, lure them away from valuable assets, and trap them in a duplicate environment created for that purpose.
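Continuous monitoring can be sketched as a sliding window over a stream of events. The Python example below raises an alert as soon as too many failed logins fall within a short window, which bounds how long a brute-force attempt goes unnoticed. A fixed rule is used here in place of a trained model, and the event format, window size and limit are hypothetical.

```python
from collections import deque

def first_alert_time(events, window_seconds=60, max_failures=3):
    """Return the timestamp at which failed logins within the window first
    exceed `max_failures`, or None if that never happens.

    `events` is an iterable of (timestamp_in_seconds, succeeded) pairs in time order.
    """
    recent_failures = deque()
    for timestamp, succeeded in events:
        if succeeded:
            continue
        recent_failures.append(timestamp)
        # Discard failures that have slid out of the monitoring window.
        while timestamp - recent_failures[0] > window_seconds:
            recent_failures.popleft()
        if len(recent_failures) > max_failures:
            return timestamp
    return None

# Hypothetical login log: an isolated failure, then a burst starting at t=200.
login_events = [
    (0, True), (10, False),
    (200, False), (205, False), (210, False), (215, False),
    (400, True),
]

alert_at = first_alert_time(login_events)
print("alert raised at t =", alert_at)
```

The gap between the start of the burst and the alert is the time the attacker has before anyone reacts; tightening the window or the limit shortens it, at the cost of more false alarms.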

Getting educated about cybersecurity

Technology is constantly trying to keep up with the blackhats and vice versa, but the best cyber defence is still education. According to Varonis, 71% of cyberattacks begin with spear-phishing emails, where the weakest link in the defence is always a human. Through proper education on safe cybersecurity practices, we can decrease the number of successful phishing attacks.

Cyber-education should be mandatory for every employee, including executives, who are often targets for “whaling” scams. Whaling is spear phishing aimed at executives, who have access to privileged data and corporate funds. Part of any cyber-hygiene training should use phishing simulation exercises to teach employees how to identify the indicators of phishing and whaling schemes, and what to do if they receive such a message.


To truly stay on top of the game we need to bring out the big guns: educating our employees to work as a line of defence, and employing the best cybersecurity technologies out there. The benefits that Machine Learning will bring to humanity far outweigh the potential detriments to data security and privacy, but it is essential to understand the threats involved. With companies relying on technology more and more, the potential attack surface is growing, but fighting fire with fire is possible when ML technologies are employed to monitor and react to the most sophisticated attacks.