This is a follow-on to my previous post here.
In the last few months I have talked to a number of other companies and received a lot of great inbound responses to the post about disk forensics in the modern era. To summarize that post:
“Our goal is to move towards a streaming model for disk forensics by updating various existing tools that are built around the concept of imaging a single machine and creating a file. In a cloud environment we need to image multiple (even hundreds) of machines and process all those images into meaningful event data. Acquisition is fairly painless via API, but then what to do with the resulting snapshots?”
While we were hoping for a cloud-agnostic solution, it seems we will need to solve this locally first and then look for areas to reuse code and integrate our systems in the future. We continue to approach this problem by decoupling acquisition, processing, and analysis. We also hope to implement this in a scalable manner, ideally leaning into serverless techniques to minimize infrastructure maintenance.
Google is investing in DFTimewolf as part of the log2timeline/Plaso ecosystem, along with GRR and Rekall. I think the Rekall agent architecture is the way to go. Where past agents relied on a callback infrastructure (generally an HTTPS server somewhere), the new way forward is to use cloud object storage (S3 in AWS) to backhaul evidence and provide tasking. In our case we will likely go a step further and provide deployment and tasking through native orchestration mechanisms instead of via an object store.
We have been working on a project, DextIR, that leverages AWS native features like the snapshot API and Amazon Simple Systems Manager (SSM) to perform acquisition. By either snapshotting the underlying volumes or executing live response scripts through SSM (a subsystem we call AWSler), we are able to acquire specific files of interest and move them to S3 for later processing. This targeted approach gives us a speed increase over Plaso, which tries to ‘brute force’ parse every file on the disk, at the cost of coverage. We hope to address coverage with differential analysis, either with a hash tree or perhaps by leveraging filesystem capabilities (ZFS or Docker layers, for example). This is practical because we deploy ephemeral/immutable instances and thus have a gold image handy to compare against.
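To make the differential idea concrete, here is a minimal sketch of comparing a target filesystem against a gold image. It uses a flat manifest of SHA-256 digests rather than a real hash tree, and the function names (`hash_file`, `build_manifest`, `diff_manifests`) are hypothetical, not part of DextIR.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def diff_manifests(gold, target):
    """Return the paths added, removed, or modified relative to the gold image."""
    gold_paths, target_paths = set(gold), set(target)
    return {
        "added": sorted(target_paths - gold_paths),
        "removed": sorted(gold_paths - target_paths),
        "modified": sorted(
            p for p in gold_paths & target_paths if gold[p] != target[p]
        ),
    }
```

Only the paths in `added` and `modified` would then need to be collected and parsed, which is where the coverage is recovered without parsing the whole disk. A true hash tree would also let whole unchanged directories be skipped with a single comparison.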
We have a Python module that ingests Forensic Artifact YAML definitions and moves the specified files and command outputs into an S3 bucket. Setting up the various IAM policies to support this in a multi-account environment was an interesting look into the world of S3. TL;DR: we use a role in the forensics account that any other role in our environment can assume, and this role allows writing files to our evidence bucket. The use of a role ensures that the owner of the bucket is the same as the owner of the key (file) written into it, which saves us some grief when we come back to analyze the files.
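As a rough illustration of that flow (not our actual module), the sketch below turns one already-parsed artifact definition into a list of uploads and then writes them using credentials from the assumed role. The function names, the `evidence/<instance>/<artifact>/<path>` key layout, and the session name are assumptions made for the example. It also skips glob and variable expansion in artifact paths, which a real collector would need.

```python
import posixpath


def plan_uploads(artifact, instance_id, prefix="evidence"):
    """Turn one parsed artifact definition into (local_path, s3_key) pairs."""
    plan = []
    for source in artifact.get("sources", []):
        if source.get("type") != "FILE":
            continue  # COMMAND and other source types need their own handling
        for path in source.get("attributes", {}).get("paths", []):
            key = posixpath.join(
                prefix, instance_id, artifact["name"], path.lstrip("/")
            )
            plan.append((path, key))
    return plan


def upload_with_role(plan, role_arn, bucket):
    """Assume the forensics role, then upload each planned file with it."""
    import boto3  # imported lazily so planning works without the AWS SDK

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="evidence-upload"
    )["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    for local_path, key in plan:
        s3.upload_file(local_path, bucket, key)
```

Because the write is made with the forensics account's role rather than the source account's own identity, the resulting objects are owned by the same account as the bucket.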
We are also working with Facebook and the osquery project to see if we can leverage their great work and deploy osqueryi as a trusted binary via SSM. This way we would not need to trust binaries on the target, and we would get clean JSON output of interesting system state. For example, as opposed to running lsmod and parsing the stdout, we get a JSON blob from osquery with all the kernel modules.
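The difference shows up clearly in code. Below is a small sketch contrasting the two approaches: a hand-rolled parser for `lsmod` text, and a call to `osqueryi --json` against the `kernel_modules` table. The function names are made up for this post, and the default binary path is an assumption; in our case it would point at the trusted copy we push to the host.

```python
import json
import subprocess

QUERY = "SELECT name, size, used_by, status FROM kernel_modules;"


def parse_lsmod(stdout):
    """Parse `lsmod` text: a header row, then name, size, use count, users."""
    modules = []
    for line in stdout.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # not a module row
        used_by = fields[3] if len(fields) > 3 else ""
        modules.append(
            {"name": fields[0], "size": fields[1], "used_by": used_by}
        )
    return modules


def query_kernel_modules(binary="osqueryi", query=QUERY):
    """Run a trusted osqueryi binary and return its rows as a list of dicts."""
    result = subprocess.run(
        [binary, "--json", query],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

The text parser depends on column order and whitespace, and it trusts whatever `lsmod` binary is on the target. The osquery path needs no format-specific parsing at all, since the output is already a JSON array of row objects.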
Google is investing in Turbinia, a distributed worker model built on top of some cool Google / GCP capabilities. It will allow a piece of evidence to be passed to multiple workers, which could be functions or full VMs. Right now they support Plaso workers that can grind through a disk image and extract events. Laika BOSS is another distributed framework for forensics, maintained by Lockheed Martin and used by a couple of tech firms in the Bay Area. These look like compelling options but are not built with AWS in mind.
We are looking to extend Airbnb’s BinaryAlert to support multiple parser types. BinaryAlert is entirely serverless and built on top of AWS. Deploying it was as easy as running a Terraform script, so for us, as an AWS-focused company, this makes a ton of sense. Currently BinaryAlert waits for files in an S3 bucket and then executes a set of compiled YARA signatures against them. This allows asynchronous collection of files and on-demand scaling to process them. As it is all built from Lambda functions and SQS queues, there are no server racks idling while waiting for files. With a slight tweak we hope to be able to route files to a broader set of Lambdas based on metadata tagging of the S3 keys. Most of these functions will implement file parsers, but they could also spin up VMs for heavier tasks (detonation, etc.) or forward on to external systems like Turbinia or vendor APIs.
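Here is a minimal sketch of what tag-based routing could look like. It is not BinaryAlert code: the `parser` tag name, the route table, and the function names are all hypothetical, and the tag lookup and invocation are passed in as callables so the routing logic can be exercised without AWS.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from a "parser" tag value to a Lambda function name.
ROUTES = {
    "yara": "yara_analyzer",
    "passwd": "passwd_parser",
    "disk-image": "turbinia_forwarder",
}
DEFAULT_ROUTE = "yara_analyzer"


def objects_in_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def choose_route(tags):
    """Pick a target function from an object's tags, falling back to YARA."""
    return ROUTES.get(tags.get("parser", ""), DEFAULT_ROUTE)


def dispatch(event, get_tags, invoke):
    """Route every object in the event; return (key, target) pairs."""
    routed = []
    for bucket, key in objects_in_event(event):
        target = choose_route(get_tags(bucket, key))
        invoke(target, {"bucket": bucket, "key": key})
        routed.append((key, target))
    return routed
```

In a deployed dispatcher, `get_tags` would likely wrap the S3 `get_object_tagging` call and `invoke` would wrap an asynchronous Lambda invocation. Untagged files fall through to the YARA analyzer, which keeps the existing behavior as the default.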
These parsing functions offer a great opportunity for code reuse. One of the big problems in detection and alerting (and security / devops / etc.) is the integrations problem. Whether it is integrating with a new API or ingesting a custom log format, we have so many systems to interoperate with that getting them to talk to each other is a huge task. For this use case we generally have configuration files that we want to convert from various legacy Linux/Unix line formats to JSON. We don’t have to go too heavy on the JSON schema (no one wants to recreate CEF), but it would be nice to share a common set of principles on how these system characteristics should be represented. osquery paved the way here, so we will try to fit with their data structures as much as possible.
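As an example of the kind of parser we have in mind, this sketch converts `/etc/passwd` lines into JSON-ready rows whose field names follow the columns of osquery's `users` table. The function name and the decision to drop the password placeholder field are choices made for this example, not a settled schema.

```python
import json

# Colon-separated fields of /etc/passwd, named after osquery's users table.
PASSWD_FIELDS = (
    "username", "password", "uid", "gid", "description", "directory", "shell",
)


def parse_passwd(text):
    """Convert /etc/passwd content into a list of dicts, one per account."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(":")
        if len(parts) != len(PASSWD_FIELDS):
            continue  # skip malformed lines rather than guess at fields
        row = dict(zip(PASSWD_FIELDS, parts))
        row.pop("password")  # placeholder field, not useful as evidence
        rows.append(row)
    return rows


def to_json(rows):
    """Serialize parsed rows with stable key order for easy diffing."""
    return json.dumps(rows, sort_keys=True)
```

Values are left as strings so the parser makes no type decisions of its own. Most legacy line formats would follow the same pattern: a field list, a delimiter, and a rule for skipping lines that do not fit.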
Call to Action!
If this sounds interesting please reach out. We have plenty of opportunities to collaborate on parsers and other aspects of the system, and I am hiring a responder with development skills to help build this out if you are interested: job description.