Like many other security analysts, I got my first peek at the Amazon Security Lake (ASL) last week in a private briefing ahead of the official release on May 30.
So just what is this ASL thing, and why is it different from the myriad of data lakes and log management systems of the past?
ASL is a fully managed security data lake service that can automatically centralize security data from AWS environments, SaaS providers, on-premises data centers, cloud providers and third-party sources into a data lake that’s stored in your AWS account, the company said.
Basically, ASL is a cloud-based data lake hosted on AWS, using S3 buckets for storage. Though that’s the basic makeup, I believe ASL may represent a big step forward for cybersecurity data management and analytics. The ASL could lead to a number of improvements. Here are six possibilities, in no particular order:
- Create a central data service owned by the security team. Security teams collect, process and analyze huge amounts of data using a multitude of tools such as endpoint detection and response, network detection and response, security information and event management (SIEM) and vulnerability management systems. It’s pretty common that they collect the same data multiple times with different analytics systems for different use cases. Recognizing the redundancy and waste within this methodology, TechTarget’s Enterprise Strategy Group (ESG) envisioned a common distributed storage service as the basis for its security operations and analytics platform architecture (SOAPA) in 2016. ASL isn’t exactly the distributed storage service we imagined, but it can act as a common repository for all security data. It can also be owned by security teams, so they no longer need to request data access from disparate IT and software development teams each time they need it. When security owns the data, they are likely to figure out new and creative ways to use it.
- Change the economics of security data storage. Security data management has always been a Faustian bargain between retaining data and paying storage costs. Since stealthy cyber attacks can remain undetected for months or years, storage limitations — like retaining online data for 90 days — can affect the accuracy and efficacy of retrospective forensic investigations. By utilizing S3 buckets, Amazon changes the economics of data retention, making long-term data retention more attractive. To be clear, this isn’t a new development, as Google Chronicle had a similar value proposition when it was released in 2018. Nevertheless, lower security storage prices are good news — especially for users betting heavily on AWS for storage of Virtual Private Cloud flow logs, CloudTrail, and so on.
- Push adoption of the Open Cybersecurity Schema Framework (OCSF). As a reminder, OCSF is a common data schema announced at Black Hat 2022. This was a promising potential security data standard, but there hasn’t been much news around it since last August. No longer: Amazon claims 138 members are involved in OCSF, including technology vendors such as CrowdStrike, IBM, Salesforce and Splunk, as well as large financial services firms and government agencies. According to our research, 30% of enterprise organizations use more than 15 different tools for security operations, with most using their own proprietary logging format. This saddles security engineers with tasks like data normalization, transformation and management. OCSF could alleviate this bottleneck by creating a lingua franca for security data. Beyond ASL, Amazon is also promoting OCSF by delivering an open source tool for mapping data into the schema. Thus, ASL could combine a common data schema with a central security data management service, a potential new foundation for security analytics and operations.
- Accelerate and accentuate detection engineering. This one gets into the benefits of a common schema. ESG research reveals that 48% of enterprise organizations develop a significant number of custom detection rules on top of those their security vendors provide. Once again, security tool sprawl gets in the way, as individual tools tend to have their own query language and/or detection rule syntax. Yes, there are ways around this with standards like Sigma or by working with vendors such as Anvilogic, CardinalOps or SOC Prime, but OCSF could make queries (and detection rules) transportable across analytics tools. This could unleash higher volume, distribution and sharing of detection rules.
- Initiate new types of analytics. Every security analytics vendor must develop a high-speed, scalable data pipeline before they can start analyzing the data. Again, there are workarounds like building on top of Databricks or Snowflake, but OCSF, S3 and a slew of other Amazon services make ASL an attractive security data repository and development platform. I could see tools like cloud detection and response, extended detection and response, and security asset management systems such as Axonius, Brinqa, JupiterOne and Panaseer sitting on top of ASL in the future.
- Serve as a basis for AI models. Training large language models requires a lot of data, processing and storage. Check, check and check for ASL. It may be a while until enterprises are ready to build custom AI models, but when they do, ASL will be ready.
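To make the common-schema argument in the OCSF and detection engineering items above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Amazon's mapping tool or the actual OCSF schema: the two vendor formats, the sample events, the normalized field names (`time`, `user`, `src_ip`, `status`) and every function name are hypothetical and only loosely inspired by the idea of a shared schema. The point it shows is that once events from different tools are normalized, one detection rule can run over all of them.

```python
from collections import Counter

# Hypothetical raw login events from two imaginary tools.
# Each tool uses its own field names and its own vocabulary for outcomes.
VENDOR_A_EVENTS = [
    {"ts": 1700000000, "usr": "alice", "ip": "10.0.0.5", "result": "FAIL"},
    {"ts": 1700000060, "usr": "alice", "ip": "10.0.0.5", "result": "FAIL"},
    {"ts": 1700000120, "usr": "bob", "ip": "10.0.0.9", "result": "OK"},
]
VENDOR_B_EVENTS = [
    {"eventTime": 1700000180, "account": "alice", "sourceAddress": "10.0.0.5", "outcome": "denied"},
    {"eventTime": 1700000240, "account": "bob", "sourceAddress": "10.0.0.9", "outcome": "denied"},
    {"eventTime": 1700000300, "account": "carol", "sourceAddress": "10.0.0.7", "outcome": "granted"},
]


def normalize_vendor_a(event):
    """Map a (hypothetical) vendor A event onto the simplified common schema."""
    return {
        "time": event["ts"],
        "user": event["usr"],
        "src_ip": event["ip"],
        "status": "failure" if event["result"] == "FAIL" else "success",
    }


def normalize_vendor_b(event):
    """Map a (hypothetical) vendor B event onto the simplified common schema."""
    return {
        "time": event["eventTime"],
        "user": event["account"],
        "src_ip": event["sourceAddress"],
        "status": "failure" if event["outcome"] == "denied" else "success",
    }


def failed_logins_by_user(events, threshold):
    """One detection rule, written once against the common schema.

    Returns users whose failed-login count is at or above the threshold.
    """
    counts = Counter(e["user"] for e in events if e["status"] == "failure")
    return {user: n for user, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    normalized = [normalize_vendor_a(e) for e in VENDOR_A_EVENTS]
    normalized += [normalize_vendor_b(e) for e in VENDOR_B_EVENTS]
    print(failed_logins_by_user(normalized, threshold=3))
```

Without the normalization step, the same rule would have to be written and maintained twice, once per vendor format. That duplicated effort is what a shared schema is meant to remove.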
Amazon made two things extremely clear upon the announcement of ASL: 1) It is a data lake/log management system, not a SIEM service, so bring your own security analytics; and 2) customers own their own data, and Amazon won’t look over their shoulders for its own data mining or model-creation purposes.
While ASL is not a SIEM or a public data repository, it is an affordable, high-performance, scalable security data lake built on open standards. That alone represents real progress.
Jon Oltsik is a distinguished analyst, fellow and the founder of Enterprise Strategy Group’s cybersecurity service.