Securing Encrypted Traffic on a Global Scale
Introduction
How many engineers does it take to find malware in encrypted traffic? In case of Cisco, the core of machine learning team that enables Encrypted Traffic Analysis (ETA) is about 50 engineers, security researchers and AI researchers based in Europe and United States. Their goal is pretty simple to describe, but far more difficult to achieve. They make sure that Cisco customers will be protected against malware despite the fact that most of the network traffic is now encrypted and an even larger majority of malicious traffic uses TLS, Tor or other encrypted protocols.
Our answer is based on ETA telemetry produced by a growing range of Cisco routers and switches. ETA telemetry is conceptually very simple. In addition to the normal Netflow export, the switch or router produces additional data fields that add information to specifically address our targeted use cases. The Initial Data Packet (IDP) contains the payload from the first packet of each unidirectional flow. In the case of TLS, the client and server IDP would contain the client and server hello’s, and in the case of HTTP, it would contain the initial request and response. The Sequence of Packet Lengths and Times (SPLT) contains information about the sizes and timing of the connection’s initial packets. This information allows us to differentiate between different categories of communication inside the TLS tunnel, e.g., web browsing, email downloads, file uploads and downloads, and many others. The Netflow and ETA information elements are processed by the on-premise Stealthwatch appliance, but the malware discovery is done in the Cisco cloud.
The Engine that Powers ETA
So, how do we secure something that we can’t decrypt and inspect on the content level? Our answer is based on a combination of modern machine learning and automated communication analysis performed on a global scale. ETA malware detection has not been built from scratch, but is based on Cognitive Threat Analytics (CTA), a cloud-based behavioral detection engine.
The CTA engine receives and processes the behavioral data from Cisco clients and returns them the list of suspected ongoing infections of their devices, with an estimation of risk, malware actor attribution and other information that is critical for the investigation. As our input, we are processing more than 10 billion Netflow records and proxy logs per day, together with information coming from additional channels. All the flows and requests are attributed to a specific customer, and most of the engine’s processing is customer-specific. This allows us to guard against information leakage that might implicitly arise if we were to process the behavioral data across multiple customers.
The combination of different categories of classifiers is intentional. The specific classifiers allow the system to achieve short detection times in case of known and easy-to-detect attacks. For most of the specific classifiers, they also annotate the finding with risk assessment, threat information and predict the files that can be found on the infected host, as well as other system damage. On the other hand, the generic, long-term classifiers give the system the ability to find new malware, distinguish it from known attacks and to start building its behavior models as discussed below.
Global visibility
The next steps in processing are no longer customer-specific. The CTA system consolidates the incidents identified across all the customers, which allows some incidents to be attributed to known malicious actors. Given this information, the behavioral models representing these actors’ techniques, tactics, and resources are updated. The behaviors of incidents that cannot be attributed to known actors are then cross-references with information coming from other Cisco security products such as AMP and ThreatGrid. This preliminary model can then be confirmed to create a model of a new malicious actor. These models are then transformed into additional classifiers that are applied to the data in the classification phase.
CTA leverages the global intelligence collected across several products offered by Cisco. In particular, the above-mentioned correlation of data from endpoint-based security products such as AMP and ThreatGrid and data from network-based security products allows Cisco to track malicious campaigns as well as server groups used by legitimate applications in real time.
While monitoring network behavior of known and well established legitimate applications on the endpoints may seem redundant, it is actually crucial for the effectiveness of our system. CTA is aware of legitimate servers typically used by these applications at any point in time. At the same time, we create behavioral models of such servers and applications. Both the servers models and application behavioral models are then used in ETA classifiers in order to improve their precision and reduce false positives.
Tracking the network behavior of known malware families allows us to create behavioral models of malicious behavior. Consequently, any abrupt change in the behavior of any malware family is promptly observed and behavioral models are updated; leaving little to no space to evade detection. As such, this improves the recall of the classifiers.
Conclusions
Globally, our system is based on two main principles. It is built as an cascade of hundreds of classifiers, progressively analyzing the data and discarding anything found normal. This allows us to solve the base-rate problem (as malware constitutes only a small part of traffic) while maintaining the system’s ability to discover the new malware. The other principle is feedback. The knowledge generated by complex classifiers and subsystems deep in our funnel is used to constantly generate updated classifiers and parameters for the early stages of the funnel. Updated classifiers then produce even better data and information and start the self-improvement cycle.
Finding malware in encrypted traffic (or even non-encrypted traffic, for that matter) is not an easy job, but it is the job that we love and can be very passionate about. It is not magic. Anything that we achieve is the result of the hard work of a passionate and dedicated team. Our engineers release 5 or 6 new versions of the software every week and make sure that the platform can constantly receive, process and destroy the vast amounts of input data. Our researchers constantly design new algorithms to find even mole malware for you. And our security researchers are engaged in a daily battle of wits against the attackers, to make sure that the results we give you are as complete and as actionable as possible.
https://blogs.cisco.com/security/securing-encrypted-traffic-on-worldwide-scale