Data Collection: The Key to Fluency’s Stance as a Next-Gen SIEM Innovator

by Christopher Jordan

July 6, 2020

The Internet is Your Network

Security operations centers need to consider the Internet as their network. It used to be that security-savvy companies had well-defined networks: their data resided on office systems, and they communicated with data centers and remote offices via virtual private networks (VPNs). But competitive businesses changed the definition of the corporate network. Now, data resides as much outside the network perimeter as it does inside it.

Although your data and processing are now dispersed, your collection still needs to pull everything into a central location. Central log management is the core capability of a SIEM. SIEM technology often makes reference to collectors, but you should never need one. All systems can produce syslog; Windows systems need a little help but can still send it. Cloud products use RESTful connections. This means that no matter where in the world a device is, if it can connect to the Internet, you can get its logs.
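
To make the idea concrete, here is a minimal sketch of forwarding a log line to a central collector as syslog over UDP. The message layout follows RFC 5424; the hostname, app name, and collector address are hypothetical placeholders, not anything specific to Fluency.

```python
import socket
from datetime import datetime, timezone

def format_syslog(message, hostname="app01", app="demo",
                  facility=1, severity=6):
    """Build an RFC 5424-style syslog line.
    facility 1 = user-level, severity 6 = informational."""
    pri = facility * 8 + severity
    ts = datetime.now(timezone.utc).isoformat()
    return f"<{pri}>1 {ts} {hostname} {app} - - - {message}"

def send_syslog(line, server, port=514):
    """Forward one formatted line to a central collector over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(line.encode("utf-8"), (server, port))

line = format_syslog("user login accepted")
# send_syslog(line, "logs.example.com")  # hypothetical collector address
```

Because syslog is just a line of text on a well-known port, any device with Internet connectivity can reach a central collector this way.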

Having an agnostic collection strategy is key to business flexibility and growth. Well-formatted logs used to mean vendor-defined formats such as the Common Event Format (CEF) or Log Event Extended Format (LEEF), which allowed SIEM vendors to parse and normalize data more accurately. That was then. Now you will find systems using JavaScript Object Notation (JSON), the record format of the web. More syslog servers support it, all web servers do, and it is the expected format of RESTful services. With the move to a common format not tied to a vendor, central log management has become easier to implement while producing better results.
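
The contrast is easy to see in code. A JSON record parses directly into structured fields with the standard library, while a CEF line needs a format-specific split of its header and extension. The sample lines below are invented for illustration; the CEF parser is a minimal sketch that ignores edge cases such as extension values containing spaces.

```python
import json

# A JSON record parses straight into structured fields.
json_line = '{"src_ip": "10.0.0.5", "action": "deny", "port": 443}'
record = json.loads(json_line)

# A CEF line requires knowledge of its pipe-delimited layout.
cef_line = ("CEF:0|Vendor|Product|1.0|100|Connection denied|5|"
            "src=10.0.0.5 dpt=443")

def parse_cef(line):
    """Minimal CEF parser: split the seven pipe-delimited header
    fields, then read key=value pairs from the extension."""
    parts = line.split("|", 7)
    header = dict(zip(
        ["cef", "vendor", "product", "version",
         "signature_id", "name", "severity"], parts[:7]))
    ext = dict(kv.split("=", 1) for kv in parts[7].split())
    return header, ext

header, ext = parse_cef(cef_line)
```

One generic `json.loads` call handles every JSON-emitting source, which is exactly what makes the format vendor-agnostic.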

Fluency’s award-winning analytics and log management solution collects everything through data streams rather than data lakes. That way, the solution is able to make use of all available data. The data ingest process includes the important step of masking Personally Identifiable Information (PII) and Personal Health Information (PHI) that might affect compliance. Fluency’s masking capabilities serve as a notable differentiator in the industry.
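
As a rough illustration of masking at ingest, the sketch below replaces two common PII patterns with fixed tokens before a record would be stored. The two regexes are hypothetical examples; a real pipeline would be driven by compliance policy and cover far more patterns.

```python
import re

# Hypothetical masking rules for illustration only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text):
    """Replace PII with fixed tokens before the record is stored."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

masked = mask_pii("login by jane.doe@example.com, ssn 123-45-6789")
```

Masking during ingest, before storage, means the sensitive values never land on disk where they could create a compliance exposure.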

Just following what happens to a record as it goes through a system gives a good indication of whether the product views that data as a stream or a lake. Stream designs move the data through a chain of processing, trying to avoid using the database. Storing the data should be a final step, as once it is stored it requires more effort to search and analyze. This is why some products do well in a lab but cannot perform in real environments.
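
A stream design can be sketched as a chain of generators: each record flows through every stage one at a time, and the store is touched only at the end. This is an illustrative pattern, not Fluency's implementation.

```python
def parse(lines):
    """First stage: turn raw lines into records, one at a time."""
    for line in lines:
        ip, action = line.split()
        yield {"ip": ip, "action": action}

def filter_denies(records):
    """Second stage: keep only the events worth analyzing."""
    for rec in records:
        if rec["action"] == "deny":
            yield rec

def pipeline(lines, store):
    """Chain the stages; writing to the store is the final step,
    never an intermediate one."""
    for rec in filter_denies(parse(lines)):
        store.append(rec)
    return store

stored = pipeline(["10.0.0.5 deny", "10.0.0.6 allow"], [])
```

Because no stage waits on a database read or write, the chain keeps up with the input rate; a lake design that stores first and queries later pays that cost on every analysis.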

What makes good log management is not the collection, but how the data is handled. As stated, the collection infrastructure should be agnostic. How the data is processed makes the difference in how the data can be used, and how well the system can scale. When data processing is done by the user, the industry refers to this as Security Orchestration, Automation and Response (SOAR). But many SOAR concepts should be automated by the log management itself. Internally, these are referred to as:

  • Data Enhancement
  • Data Association
  • Data Correlation
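
The three steps above can be sketched as small functions chained over each record. The lookup tables here are hypothetical stand-ins for real enrichment sources such as geo-IP databases and asset inventories.

```python
GEO = {"10.0.0.5": "internal"}          # hypothetical geo-IP table
ASSET_OWNER = {"10.0.0.5": "jdoe"}      # hypothetical asset inventory

def enhance(rec):
    """Data Enhancement: add derived fields such as geo-IP."""
    rec["geo"] = GEO.get(rec["ip"], "unknown")
    return rec

def associate(rec):
    """Data Association: tie the event to a known user or asset."""
    rec["user"] = ASSET_OWNER.get(rec["ip"])
    return rec

def correlate(records, key="user"):
    """Data Correlation: group events that share an attribute."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    return groups

events = [associate(enhance({"ip": "10.0.0.5", "action": "deny"}))]
groups = correlate(events)
```

When the log management layer performs these steps automatically, the analyst receives records that already carry context, instead of building that context by hand in a SOAR playbook.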

Fluency flexibly offers the ability to run in the AWS cloud or on-premises. The AWS option enables high availability and high durability, and each client works within its own dedicated cloud instance. AWS also facilitates significant capacity, rated at 250+ TB/day and up to 12 million events per second (EPS). The solution supports Internet Protocol version 6 (IPv6), the most recent version of the Internet Protocol, for identifying and locating systems on networks and routing traffic across the Internet.

Fluency is able to ingest security data from so many varied sources due to its use of standards-based RESTful APIs. With an API-based architecture, the solution pulls data from virtually any source and eliminates the need for custom-coded connectors. Standards-based APIs also bring significant flexibility for adding and modifying connections with data sources.
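
A generic REST pull looks roughly like the sketch below: request a page of events, then follow the source's pagination link until it runs out. The endpoint URL, bearer-token auth, and `events`/`next` field names are assumptions for illustration; real vendor APIs vary.

```python
import json
from urllib import request

def fetch_page(url, token):
    """Pull one page of log events from a RESTful source.
    URL and auth scheme are placeholders; real vendors differ."""
    req = request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with request.urlopen(req) as resp:
        return json.load(resp)

def collect(fetch, base_url, token):
    """Follow 'next' links until the source has no more pages."""
    events, url = [], base_url
    while url:
        page = fetch(url, token)
        events.extend(page["events"])
        url = page.get("next")
    return events

# Usage with the real fetcher (hypothetical endpoint):
# events = collect(fetch_page, "https://api.example.com/logs", "TOKEN")
```

Because `collect` takes the fetch function as a parameter, swapping in a new data source means changing a URL and credentials, not writing a custom-coded connector.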

The varied data streams coming into Fluency are fused into a meaningful form: the many incoming data sets are first normalized, and then a correlated event record is built that spans multiple data inputs. For example, a data point from a network firewall log, meaningless on its own, may correlate in the fused event record with a simultaneous event in an IDS. Together, these two correlated events may reveal a threat that deserves attention. The creation of the comprehensive correlated event record is only the first, most basic step in running analytics on security data. Fluency makes effective use of this data and interprets what’s going on in order to spot potential issues before they turn into serious security problems. That is where Fluency’s threat scoring processes come into play.
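
The firewall-plus-IDS example can be sketched as a join on source IP within a short time window. The event records and the five-second window are invented for illustration.

```python
# Hypothetical events: alone, a firewall deny and an IDS alert mean
# little; together, for the same source IP in the same short window,
# they form one correlated record worth an analyst's attention.
firewall = [{"src": "203.0.113.9", "t": 100, "event": "deny"}]
ids = [{"src": "203.0.113.9", "t": 102, "event": "port-scan"}]

def correlate(fw_events, ids_events, window=5):
    """Join events from two feeds on source IP within a time window."""
    fused = []
    for fw in fw_events:
        for alert in ids_events:
            if fw["src"] == alert["src"] and abs(fw["t"] - alert["t"]) <= window:
                fused.append({"src": fw["src"],
                              "signals": [fw["event"], alert["event"]]})
    return fused

records = correlate(firewall, ids)
```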

Once Fluency has identified a threat, it assigns a score based on numerous parameters. The initial discovery may occur due to the recognition of a known threat signature or bad IP address. Fluency uses leading reputation services such as VirusTotal and Webroot for this purpose. At other times, AI and ML work together to identify anomalies in the data that suggest the presence of a threat. Figure 2 shows an example of how Fluency compares actual versus expected system event activity to detect possible threats.
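
In spirit, such a score combines a reputation verdict with an ML anomaly signal into a single number. The weights and the bad-IP set below are hypothetical; Fluency's actual scoring model is its own.

```python
# Hypothetical reputation data and weights, for illustration only.
BAD_IPS = {"198.51.100.7"}

def score(event, anomaly=0.0):
    """Combine a reputation hit with an anomaly score into one
    0-100 number an analyst can sort by."""
    s = 0.0
    if event.get("ip") in BAD_IPS:
        s += 60            # known-bad reputation signal
    s += 40 * anomaly      # ML anomaly score in [0, 1]
    return min(s, 100)

high = score({"ip": "198.51.100.7"}, anomaly=0.8)
low = score({"ip": "10.0.0.5"}, anomaly=0.1)
```

Scoring lets the SOC rank notifications rather than treating every signature match and anomaly as equally urgent.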

Figure 2 -- Example of how Fluency compares actual versus expected system event activity to detect possible threats.

The Fluency approach is distinctly multi-layered. For the end user, however, the process is transparent. While the solution looks at relationships between events and ties in data about historical activities, use of devices, user identities and so forth, the user simply sees a list of high-priority notifications based on threat scores. The analyst can use the Fluency EventWatch toolset to dive into a suspected threat and explore events and attributes that were not part of the original issue but may be related, creating a more complete picture of potential compromise.

Using advanced data analytics, Fluency fuses and correlates anomalies, signatures and reputation data from all possible sources, giving the SOC ground truth comprising:

  • All aspects of the data – raw data plus derived data such as alerts and process insights
  • Intelligence – third-party threat intel, reputation data, geo-IP, etc.
  • Association – what’s known from inside the system, the user and so forth

Ground truth gives analysts real-time situational awareness. This comes from Fluency’s ability to perform analytics in-flight. Figure 3 offers an example of Fluency’s ability to identify the most urgent situations based on an AI-derived certainty level.

Fluency identifies and highlights the issues that need the most attention. Alerts may come to the SOC from Fluency in the form of an ML-based (i.e., inferred) threat such as a newly discovered domain or IP address. There can be reputation alerts, such as an unknown file hash, as well as systemic alerts and suggestions that the analyst add a block at the firewall or IPS. Fluency also brings new capabilities for forensics.

Continuously improving threat detection and response is crucial for maintaining a robust security posture. With this in mind, analysts can use Fluency to investigate past security incidents and constantly enhance their ability to detect threats.