Memory Forensics in Clouds and Containers

At Netflix my targets are primarily Linux VMs running in a microservices environment. I rarely have to pursue criminal prosecution but often want to figure out what is happening. Much of the work in memory forensics has targeted Windows systems and forensically sound capture methodologies that pull the entire system memory. This focus makes sense given the attack surface that user systems represent; however, given my goals I believe that userland memory acquisition and targeted analysis of memory at process granularity could yield results more efficiently. Microservices, CI/CD deployments of immutable and ephemeral instances, the trend toward containers, and various hardening efforts in the kernel all confer advantages on the defender, and we must leverage them to keep up with the scale and velocity of our environment. Another underdeveloped area is the analysis of edge-facing target processes that manage their own memory, as the runtimes of many of today’s popular languages do. I am curious what folks are doing in this space, so I wanted to mention some trends I see and solicit feedback:

Desktops vs Datacenters vs Microservices

User systems are the most difficult to secure, particularly general compute devices operated by a technical workforce that needs full access to their systems and will install all manner of tooling from around the web to accomplish their jobs. The trend to ‘sandboxed’ systems tied to a signed and vetted app store (like iOS, ChromeOS, Win10S) will shrink the attack surface for less technically demanding staff, and will also lock out most forensic analysis. It will be interesting to see what happens to the EDR space.

Servers, assuming they are not being used to browse the web, should have far less variation in their workloads. Further, microservice nodes run very few processes and, if deployed to scale horizontally, have a group of peers built from the same disk image against which to compare for ‘known good’. ASLR and other factors will of course lead to differences in live memory layout; however, there will be many similarities we can leverage to eliminate areas from inspection. In a serverless future we may be similarly locked out of forensic access and limited to distributed tracing (logging), but I think that is further off. There is still room for impact in server-focused memory forensics.
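As a rough illustration of the peer comparison idea, here is a minimal Python sketch. It assumes you have already collected, for each node, the set of file paths backing executable mappings in the process of interest (for example from /proc/PID/maps). Paths are compared rather than addresses because ASLR moves the addresses around. The function name, node names, and the 50% threshold are hypothetical choices for illustration, not an established method.

```python
from collections import Counter
from typing import Dict, Set


def unusual_mappings(peers: Dict[str, Set[str]],
                     min_share: float = 0.5) -> Dict[str, Set[str]]:
    """Return, per node, executable mappings seen on fewer than
    min_share of the peer group (hypothetical threshold)."""
    # Count on how many nodes each mapped path appears.
    counts = Counter(path for paths in peers.values() for path in paths)
    threshold = min_share * len(peers)
    flagged = {}
    for node, paths in peers.items():
        rare = {p for p in paths if counts[p] < threshold}
        if rare:
            flagged[node] = rare
    return flagged


# Hypothetical data: three peers built from the same image.
peers = {
    "node-a": {"/usr/bin/java", "/lib/libc.so.6"},
    "node-b": {"/usr/bin/java", "/lib/libc.so.6"},
    "node-c": {"/usr/bin/java", "/lib/libc.so.6", "/tmp/.x/implant.so"},
}
print(unusual_mappings(peers))
```

Anything the peers agree on can be set aside, which leaves only the outliers for closer inspection.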

System vs Process Memory

There is a body of work on system introspection, as opposed to forensic analysis - things like LibVMI - which focuses on system-wide visibility with more flexibility in how things are captured. Process-level capture can be performed by carving the process out of a system-wide memory capture, but that offers no performance benefit because the full capture still has to be taken first. Instead, using core dumps, process_vm_readv(), or ptrace to capture a single process reduces the time and resources required to perform acquisition. Projects like ECFS can decorate a core dump with information from the ELF file and produce a rich set of information for detecting memory abnormalities.
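To make single-process capture concrete, here is a minimal Python sketch. It uses the /proc/PID/maps and /proc/PID/mem interfaces instead of the core dump, process_vm_readv(), or ptrace routes named above, simply because they are easy to reach from a script. It assumes a Linux guest and enough privilege to read another process's memory (typically root or CAP_SYS_PTRACE). The function names and the output layout are hypothetical, and it makes no attempt to freeze the target, so the result is not a consistent snapshot.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


# address-range perms offset dev inode [pathname]
_MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$")


def parse_maps(text: str) -> Iterator[Region]:
    """Parse the contents of /proc/PID/maps into Region tuples."""
    for line in text.splitlines():
        m = _MAPS_LINE.match(line)
        if m:
            yield Region(int(m.group(1), 16), int(m.group(2), 16),
                         m.group(3), m.group(4))


def dump_process(pid: int, out_path: str) -> List[Tuple[int, int]]:
    """Append readable regions of pid to out_path.

    Returns (start_address, bytes_written) pairs so the original
    layout can be recovered from the concatenated file.
    """
    with open(f"/proc/{pid}/maps") as f:
        regions = list(parse_maps(f.read()))
    index = []
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem, \
            open(out_path, "wb") as out:
        for r in regions:
            if not r.perms.startswith("r"):
                continue
            try:
                mem.seek(r.start)
                data = mem.read(r.end - r.start)  # may be a short read
            except (OSError, OverflowError, ValueError):
                continue  # some regions (e.g. [vsyscall]) cannot be read
            out.write(data)
            index.append((r.start, len(data)))
    return index
```

Calling dump_process(1234, "/tmp/1234.mem") would write only that process's readable regions, which is usually far smaller than a full-system image.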

Cloud Deployments

Most commonly today you will deploy a virtual machine into a cloud service, in our case AWS. As part of the shared responsibility model, the provider maintains the host and you get full access to the guest. Unfortunately you cannot use host functions to access guest memory on most public cloud platforms, although I encourage all the providers to look into giving their customers a LibVMI API. Since you also do not have access to the racks in the data center, you do not have a practical way to take a traditional forensically sound image from outside the guest itself. You have to operate on the guest to image the guest, and that violates some basic principles - fortunately, as I mentioned, this isn’t a blocker to my goals of detection and investigation rather than prosecution.

So we are on the guest. We can use LiME or another tool to snapshot the entire memory of the guest… but do we really need the entire memory footprint? Even the kernel is a process, so if we want to look for rootkits we don’t need the 2GB of memory mapped by Java.

Containers

Containers are simply groups of processes from the host perspective. They provide an interesting opportunity to inspect a container from the relative safety of the ‘host’. In a simple case you can run ‘docker top CONTAINERID’ to get the global PIDs of the processes running within that container, and then dump their memory. It would be interesting to see a set of memory analysis tools that were ‘container aware’ and could decorate dumps with context from the container manager as well as the system.
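The ‘docker top’ step can be scripted. Below is a minimal Python sketch that shells out to the Docker CLI and pulls the host-side PIDs out of its output. It assumes the docker binary is on the PATH, that the caller can talk to the Docker daemon, and that the output includes a PID column as it does with the default format. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_docker_top(output: str) -> List[int]:
    """Extract the PID column from `docker top` output."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")  # raises ValueError if no PID column
    pids = []
    for line in lines[1:]:
        # Limit the split so a command line containing spaces
        # stays in the final field.
        fields = line.split(None, len(header) - 1)
        pids.append(int(fields[pid_col]))
    return pids


def container_pids(container_id: str) -> List[int]:
    """Return host-namespace PIDs of the processes in a container."""
    result = subprocess.run(["docker", "top", container_id],
                            capture_output=True, text=True, check=True)
    return parse_docker_top(result.stdout)
```

Each PID returned is visible from the host, so it can be handed to whatever process-level acquisition tool you prefer without running anything inside the container.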

Interpreted Languages

Many popular languages for developing web applications these days make use of bytecode interpreters that manage their own memory space. It is not uncommon to see a virtual machine whose memory is basically just one giant Java process. All the good stuff is happening within Java (or Python, Node, Ruby, etc.), where the runtime’s own heap layout hides it from tools that only understand OS-level structures. What have folks found to address this?

Conclusion

These are a few of the areas we are investigating, and I would be curious to hear what folks have found useful for detecting in-memory attacks in their server infrastructure.