Android Application (Dalvik) Memory Analysis & The Chuli Malware

April 1, 2013

Introduction

In this blog post, we present new functionality that will be incorporated into the next major Volatility release after version 2.3. This functionality allows deep analysis of application internals on the Android operating system. All Android applications, such as those downloaded from Google Play, are powered by Dalvik, Google’s version of the Java Virtual Machine (JVM). Each application runs in a separate process and is given its own instance of Dalvik. By performing memory analysis on a specific Dalvik instance, we can recover the internal state of an application, such as its loaded classes and the static and instance variables and methods (functions) of each class.

Background

Back in 2011, our friend and co-researcher Andrew Case (@attrc) performed initial research on this topic and presented it at SOURCE Seattle 2011. However, that work was incomplete: although he could recover loaded classes and information about class variables, he could only recover the values of base types (int, char, etc.). As a result, the initial effort could not handle application-specific types and class objects, which placed obvious limits on deep investigation.
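
To illustrate why base types were the easy case, here is a minimal sketch (not the Volatility implementation) of decoding a field from an object's raw bytes given its Dalvik type descriptor. It assumes a 32-bit, little-endian target, which holds for the x86 emulator used later in this post and for typical ARM devices. Primitive descriptors (Z, B, S, C, I, F, J, D) decode directly, while a reference (a descriptor starting with L or [) yields only a pointer that must still be followed into the heap and resolved to a class. The helper name decode_field is hypothetical.

```python
import struct

# Dalvik type descriptors for the primitive ("base") types, mapped to
# little-endian struct formats. Assumes a 32-bit little-endian target.
PRIMITIVE_FORMATS = {
    "Z": "<?",  # boolean
    "B": "<b",  # byte
    "S": "<h",  # short
    "C": "<H",  # char (one UTF-16 code unit)
    "I": "<i",  # int
    "F": "<f",  # float
    "J": "<q",  # long (64-bit)
    "D": "<d",  # double (64-bit)
}


def decode_field(descriptor: str, raw: bytes, offset: int):
    """Decode one field stored at `offset` in an object's raw bytes (hypothetical helper)."""
    if descriptor in PRIMITIVE_FORMATS:
        (value,) = struct.unpack_from(PRIMITIVE_FORMATS[descriptor], raw, offset)
        return chr(value) if descriptor == "C" else value
    if descriptor.startswith(("L", "[")):
        # Object or array reference: the field only holds a 32-bit pointer.
        # Recovering the value means following it into the heap and resolving
        # the target object's class, which a base-type decoder cannot do.
        (ptr,) = struct.unpack_from("<I", raw, offset)
        return ("ref", descriptor, ptr)
    raise ValueError(f"not a field type descriptor: {descriptor!r}")
```

The returned ("ref", …) tuple marks exactly the point where the 2011 proof of concept stopped.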

Fast forward to late last year, when 504ENSICS Labs received a six-month DARPA Cyber Fast Track award to build on this proof of concept. Our goal was to recover all Dalvik types and to ensure that the code would be compatible with the latest version of Volatility.

Through collaboration between 504ENSICS Labs and Andrew, infrastructure was built into Volatility that can produce a SQLite database of all Dalvik information stored in memory for a process or set of processes. We also developed a GUI, Dalvik Inspector (DI), that presents this information hierarchically, organized by loaded class. For each class, it shows the static variables (shared by every instance) along with every instance and its own instance variables. The GUI also supports searching across class names, variable names, and method names, as well as the contents of each instance’s variables. The tools can find historical instances of classes as well as current ones. As long as the data these historical instances point to has not been overwritten, we can recover information from far back in the application’s use.
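
The database-backed search can be sketched with Python's built-in sqlite3 module. The table layout below is a hypothetical simplification, not the schema Volatility or DI actually uses: each row is one recovered field (static or per-instance) with its value rendered as text, so a single query covers class names, variable names, and variable contents. Method names would need a separate table and are left out.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the actual DI database layout): one row
# per recovered field, with the value rendered as text.
SCHEMA = """
CREATE TABLE fields (
    class_name TEXT NOT NULL,
    instance   INTEGER,          -- object address; NULL for static fields
    field_name TEXT NOT NULL,
    value      TEXT
);
"""


def search(conn: sqlite3.Connection, needle: str):
    """Return rows whose class name, field name, or value contains `needle`."""
    # Escape LIKE wildcards so the needle is matched literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    return conn.execute(
        """SELECT class_name, instance, field_name, value FROM fields
           WHERE class_name LIKE ? ESCAPE '\\'
              OR field_name LIKE ? ESCAPE '\\'
              OR value      LIKE ? ESCAPE '\\'
           ORDER BY class_name, instance, field_name""",
        (pattern, pattern, pattern),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fields VALUES (?, ?, ?, ?)",
        [
            ("com.example.Net", 0x41A01000, "endpoint", "http://example.com/api"),
            ("com.example.Net", None, "TIMEOUT", "30"),
        ],
    )
    for row in search(conn, "http"):
        print(row)
```

SQLite's LIKE is case-insensitive for ASCII by default, which suits quick searches such as “http”.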

Malware and Deep Application Analysis

By using this functionality and the associated GUI, analysts can very quickly uncover the deep internals of a running Android application. These internals include the exact classes and variables used to store important information, such as configuration data, usernames and passwords, data to be exfiltrated, and much more. Compared to manual reverse engineering, this can save considerable time and effort. It can also uncover information that is only determined at runtime, such as data downloaded from C&C servers and data read from files and databases on disk.

To help automate the analysis of interesting variables and classes across many samples and cases, we also developed an API within Volatility: the user only has to fill in a template plugin with the class(es) of interest and the class member(s) of interest. The API then recovers this information and reports it on the command line.
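
A minimal sketch of the idea behind such a template, assuming recovered instances are already available as simple records. RecoveredObject and Template are hypothetical names and do not reflect Volatility's actual plugin interface; the point is only that the investigator supplies a class name and a list of members, and generic code does the rest.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Tuple


@dataclass
class RecoveredObject:
    """Hypothetical stand-in for one class instance recovered from Dalvik memory."""
    class_name: str
    pid: int
    fields: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Template:
    """The part an investigator fills in: one class and the members to report."""
    class_name: str
    members: List[str]

    def extract(self, objects: Iterable[RecoveredObject]) -> Iterator[Tuple[Any, ...]]:
        """Yield (pid, member1, member2, ...) for every instance of the class."""
        for obj in objects:
            if obj.class_name != self.class_name:
                continue
            # Report a missing member as None instead of failing, since a
            # recovered (possibly historical) instance may be incomplete.
            yield (obj.pid, *(obj.fields.get(m) for m in self.members))
```

With this split, a new report is a one-line declaration such as Template("com.example.Config", ["server", "port"]), where the class and member names are placeholders.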

Analysis of the Chuli Malware

In a recent blog post on SecureList, Kaspersky malware experts described a malware sample that had strong political ties (see the post for those details) as well as interesting technical properties, such as a C&C protocol that incorporated text messages for issuing commands, as well as the exfiltration of users’ sensitive data.

In their post they discuss the reversing that was done in order to recover relevant information, such as the hostname of the C&C server, the phone number to receive command texts from, and the set of possible C&C commands.

To investigate this malware with DI, we obtained a sample, infected our emulator, and then took a physical RAM capture with LiME. We then simply searched for “http” inside instance variables and were immediately shown the class com.google.services.PhoneService, with the instance variables nativenumber, which is described in the Kaspersky post, and hostname, which holds the hostname of the C&C server:
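
DI searches the structured data it has recovered from Dalvik. For comparison, a much cruder technique that needs no Dalvik parsing at all is to scan the raw capture for UTF-16LE text, since Dalvik stores the contents of java.lang.String objects as arrays of 16-bit chars. The sketch below implements that swapped-in technique, not DI's method: it finds strings beginning with “http” but cannot say which class or instance variable holds them, which is exactly what DI adds. The function name and the 8-character minimum are arbitrary choices.

```python
import re


def find_utf16le_strings(data, prefix: str = "http", min_chars: int = 8):
    """Yield (offset, text) for UTF-16LE runs of printable ASCII starting with `prefix`.

    `data` may be bytes or any buffer that `re` accepts, such as an mmap.
    """
    # In UTF-16LE, each printable ASCII character is that byte followed by 0x00.
    pattern = re.compile(
        re.escape(prefix.encode("utf-16-le")) + rb"(?:[\x21-\x7e]\x00)*"
    )
    for m in pattern.finditer(data):
        text = m.group().decode("utf-16-le")
        if len(text) >= min_chars:
            yield m.start(), text


if __name__ == "__main__":
    import mmap
    import sys

    # Usage: python find_http.py memory.lime
    with open(sys.argv[1], "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        for offset, text in find_utf16le_strings(mm):
            print(f"{offset:#012x}  {text}")
```

Memory-mapping the capture avoids reading a multi-gigabyte image into memory at once.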

Note: We blanked the phone number value in the picture to avoid revealing sensitive information.

We also found the class that holds the valid C&C commands and their current values:

To automate such analysis, we developed a chuli_info plugin that recovers this data from memory automatically (note: the phone number is masked with #s):

# python vol.py --profile=Linuxemulatorx86 -f ../infected.lime dalvik_chuli_info -a 997 -b /root/chuli.good.db
Volatile Systems Volatility Framework 2.3_alpha
Application                    PID             Phone Number              URL
------------------------------ --------------- ------------------------- ----------------------------------------
google.services                        997 phone#######        http://64.78.161.133

This output shows the phone number and the C&C URL. Thanks to the simplicity of our API, the entire plugin is 27 lines of code, 26 of which come from a template; the investigator only had to fill in the class name and the instance variable names of interest, which can be gathered within the DI GUI.
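
The plugin source is not reproduced in this post. Purely as an illustration of the same division of labor, the sketch below pairs the filled-in values (the class name and the nativenumber and hostname members from the analysis above) with generic code that filters recovered instances and prints them in the column layout of the sample output. The tuple input format and the function names are hypothetical.

```python
# Illustrative sketch of a filled-in template. CLASS_NAME and MEMBERS come
# from the analysis above; the input format and helper names are hypothetical.
CLASS_NAME = "com.google.services.PhoneService"
MEMBERS = ["nativenumber", "hostname"]

# Column titles and widths copied from the sample plugin output above.
COLUMNS = [("Application", 30), ("PID", 15), ("Phone Number", 25), ("URL", 40)]


def chuli_rows(objects):
    """objects: iterable of (app, pid, class_name, fields_dict) tuples."""
    for app, pid, class_name, fields in objects:
        if class_name == CLASS_NAME:
            yield (app, pid, *(fields.get(m, "") for m in MEMBERS))


def render(rows):
    """Format rows as a fixed-width table with a dashed separator line."""
    lines = [
        " ".join(f"{title:<{width}}" for title, width in COLUMNS),
        " ".join("-" * width for _, width in COLUMNS),
    ]
    for app, pid, phone, url in rows:
        lines.append(f"{str(app):<30} {str(pid):>15} {str(phone):<25} {str(url):<40}")
    # Drop the padding after the last column.
    return "\n".join(line.rstrip() for line in lines)
```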

Other Interesting Data

During the course of our research, we found a number of in-memory classes that are useful across nearly all Android applications. The first is android.content.pm.ApplicationInfo and its members dataDir, installLocation, processName, sharedLibraryFiles, and uid. These tell you the process name the application runs as, where it was installed, any shared libraries (native code) it uses, and the user ID it runs as. Shared libraries matter because they can contain code outside of Dalvik that requires more general reversing (e.g., with IDA Pro), and the uid is the user ID under which the application creates files on the phone’s filesystem. This can be used in conjunction with disk forensics to identify any files dropped by the malware. We have a plugin for this data that will be released when the code is pushed into Volatility SVN later this year.
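
As a sketch of the disk-forensics side, assume the device's data partition image has been mounted read-only, with ownership preserved, at a hypothetical path such as /mnt/userdata. The following walks the tree and lists everything owned by the uid recovered from ApplicationInfo (on Android, installed applications are normally assigned UIDs of 10000 and above). This is a generic filesystem walk, not part of the plugin described above, and the uid in the usage comment is a placeholder.

```python
import os
import stat


def files_owned_by(root: str, uid: int):
    """Yield (path, mode_string, size) for entries under `root` owned by `uid`.

    Symbolic links are examined with lstat and never followed.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or cannot be stat'ed
            if st.st_uid == uid:
                yield path, stat.filemode(st.st_mode), st.st_size


if __name__ == "__main__":
    import sys

    # Usage (placeholder uid): python owned_by.py /mnt/userdata 10057
    for path, mode, size in files_owned_by(sys.argv[1], int(sys.argv[2])):
        print(f"{mode} {size:>10} {path}")
```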

Another interesting class is java.util.regex.Matcher and its input member, which holds recent data that was passed through the regex class. In our tests, this member often acts as a ‘keylogger’ for each application. For example, in the text message app it holds parts of text messages and the contacts typed in as recipients, in the phone app it holds dialed numbers, and in the browser app it holds information typed into the URL bar and other forms. It also holds information displayed in applications; in the Chuli sample, a number of instances hold the message displayed on the screen when the malware is executed. We have also developed a plugin to automatically enumerate objects of this class type and parse their input members.
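
Once Matcher input values have been recovered, some triage helps. The sketch below applies a simple heuristic, which is an assumption rather than something stated above: if the same text was captured at several points while it was being typed, the shorter copies are prefixes of the longer ones, so only the longest copy of each is kept. The helper name is hypothetical.

```python
from typing import Iterable, List


def collapse_inputs(inputs: Iterable[str]) -> List[str]:
    """Keep recovered Matcher.input strings that are not a prefix of another one.

    Heuristic (assumption): partial copies captured while text was being typed
    ("hel", "hello") collapse to the longest copy. Empty strings and exact
    duplicates are dropped; first-seen order is preserved.
    """
    unique = list(dict.fromkeys(s for s in inputs if s))
    return [
        s for s in unique
        if not any(other != s and other.startswith(s) for other in unique)
    ]
```

The pairwise comparison is quadratic, which is acceptable for the modest number of Matcher instances in a single process.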

Finally, we have explored a number of classes related to network information. These let you determine the URLs and ports used for network connections, the amount of data sent, and the context used. These will all be explained later, but we can’t give everything away in a teaser, right?

Other, Related Research Efforts

Earlier this year, two other groups of researchers emailed the Volatility Developers list about their own efforts to enumerate information from Dalvik instances. Much of Holger’s original research overlaps parts of the 504ENSICS effort, while in some places we take different approaches. Holger’s work eventually led to the publication of his thesis in January, which is a must-read for anyone interested in Dalvik internals and how they relate to memory forensics.

Closing Thoughts

We have given a very brief glimpse of the Android memory analysis capabilities that will be integrated into Volatility over the next several months. Dalvik analysis is extremely powerful and allows investigators to uncover, in minutes, data that would normally take hours or days of reverse engineering to find. The search capabilities make it possible to quickly locate the interesting parts of an application. Another interesting aspect (to be detailed in a later post) is how this work can be used to bypass “packed” and anti-debugging malware. Since we are inspecting the core components of Dalvik, we do not have to care at all about the on-disk data or any related countermeasures that malware authors may use. This applies equally to analyzing commercial Android applications that attempt to protect themselves from reversing.

About 504ENSICS Labs

504ENSICS Labs is a privately owned firm that specializes in cutting-edge research and development of tools and techniques for digital forensics and computer security. In order to keep current, 504ENSICS Labs also provides digital forensics, network security, and application security services, and offers training in the same areas.

The co-founders, Lodovico Marziale, Ph.D. and Joe T. Sylve, M.S., are well known in the digital forensics and computer security research community, have published research in several peer-reviewed academic journals, and have presented their research at many conferences across the country.