We present sandbox mining, a technique to confine an application to resources accessed during automatic testing. Sandbox mining first explores software behavior by means of automatic test generation, and extracts the set of resources accessed during these tests. This set is then used as a sandbox, blocking access to resources not used during testing. The mined sandbox thus protects against behavior changes such as the activation of latent malware, infections, targeted attacks, or malicious updates.
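
To make the mining/enforcement split concrete, here is a minimal sketch; the `MinedSandbox` class and the API strings are illustrative placeholders, not BOXMATE's actual implementation.

```kotlin
// Minimal sketch of the two phases of sandbox mining.
// MinedSandbox and the API names are hypothetical, not BOXMATE's API.
class MinedSandbox {
    private val allowed = mutableSetOf<String>()

    // Phase 1 (mining): record every sensitive API access observed
    // while the test generator exercises the app.
    fun record(sensitiveApi: String) {
        allowed += sensitiveApi
    }

    // Phase 2 (enforcement): accesses seen during mining pass;
    // anything new is blocked or flagged for confirmation.
    fun isAllowed(sensitiveApi: String): Boolean = sensitiveApi in allowed
}

fun main() {
    val sandbox = MinedSandbox()
    // Mining: the generated tests triggered a location access.
    sandbox.record("LocationManager.getLastKnownLocation")
    // Production: the mined access passes, an unseen one is blocked.
    check(sandbox.isAllowed("LocationManager.getLastKnownLocation"))
    check(!sandbox.isAllowed("SmsManager.sendTextMessage"))
}
```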

The use of test generation makes sandbox mining a fully automatic process that can be run by vendors and end users alike. Our BOXMATE prototype requires less than one hour to extract a sandbox from an Android app, with few to no false alarms on frequently used functionality.

Watch Andreas Zeller present BOXMATE at the TCE conference:

Vision paper

You can obtain our "Visions of 2025 and Beyond" submission here.

Technical report

In our paper (submitted for publication) we introduce the notion of sandbox mining. The anonymized experimental data used in the paper can be accessed here.

The data contains:

  • summaries and comparisons of our test generator (DroidMate) runs to human-written use cases, including data on which API calls were observed and when;
  • false positives observed during use case runs;
  • raw data points used to generate charts;
  • raw logcat logs obtained from the Android device while conducting the human-written use cases;
  • the AppGuard API list.


Frequently asked questions

I want to write malware. How can I stay in business?
With BOXMATE, you are in a “disclose or die” dilemma. Either you expose malicious behavior during mining, and then it becomes explicit for scrutiny and discussion; or you do not, and then the sandbox prevents it.

How about I ship a permissive sandbox with my malware?
With BOXMATE, anyone can safely assess all resource accesses of your program as well as your provided sandbox before installation.

I could craft and propagate an “official” rule that allows my attack.
Your rule would have to withstand public scrutiny, very much like patches to open source programs.

I could hide my malicious access in a myriad of others.
Minimizing the sandbox would reveal which accesses are really required, and a large difference would be suspicious by itself.
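
One way to picture such a minimization, as a sketch only: assume a `testsPass` oracle that re-runs the generated tests under a candidate sandbox; BOXMATE's actual reduction may differ.

```kotlin
// Hypothetical greedy minimization: try dropping each access rule and
// keep the drop if the generated tests still pass without it.
fun minimize(
    rules: Set<String>,
    testsPass: (Set<String>) -> Boolean  // assumed test re-run oracle
): Set<String> {
    val required = rules.toMutableSet()
    for (rule in rules) {
        if (testsPass(required - rule)) {
            required -= rule  // the tests never needed this access
        }
    }
    return required
}
```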

I am a Secret Service. How can I do my job?
BOXMATE only detects behavior changes in programs; it relies on the environment (operating system, hardware, network...) to be uncompromised.

I am a user. I am getting a false alarm. What can I do?
Use an “official” sandbox provided by the vendor. Or re-run BOXMATE to have your sandbox include the legitimate behavior.

How can I trust a supplied sandbox?
You can run BOXMATE yourself and compare; if the supplied sandbox allows more resource accesses than your sandbox, there should be a legitimate reason.
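
Conceptually, the comparison is a set difference; in this illustrative sketch a sandbox is just a set of sensitive API names:

```kotlin
// Accesses a supplied sandbox allows beyond your locally mined one;
// every entry in the result calls for a legitimate explanation.
fun extraAccesses(supplied: Set<String>, mined: Set<String>): Set<String> =
    supplied - mined
```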

If BOXMATE exercises a backdoor during mining, it becomes part of the sandbox, right?
A backdoor is typically designed such that testing would not find it. On top of that, you can always have BOXMATE reduce the sandbox to a minimum, preventing backdoor usage.

Does BOXMATE collect usage data?
BOXMATE is set to detect differences in program behavior, not usage behavior. It neither collects nor assesses program usage, and the mined rules are set to generalize from any user-provided data.

Does BOXMATE track information flow?
For performance reasons, BOXMATE does not analyze how accessed sensitive data is processed; it thus conservatively assumes it may leak to all accessed sinks.
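
This conservative assumption can be modeled as pairing every observed source with every observed sink; a hypothetical illustration:

```kotlin
// Without information-flow analysis, assume every accessed sensitive
// source may leak to every accessed sink (illustrative model only).
fun assumedFlows(sources: Set<String>, sinks: Set<String>): Set<Pair<String, String>> =
    sources.flatMap { src -> sinks.map { sink -> src to sink } }.toSet()
```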

I am a vendor. How can I ensure the mined sandbox encompasses all legitimate behavior?
Modern test generators achieve high coverage; driving them with additional rule sets gets even higher coverage faster. Complement this with your own extensive test suite, and ship the mined sandbox with your program or upload it to the shared repository, merging the accesses from all runs as sketched below.
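
The shipped sandbox would then be the union of the accesses mined across all runs, generated and hand-written alike; sketched here with sandboxes as plain sets:

```kotlin
// Merge the accesses mined from the test generator run and from the
// vendor's own test suite (hypothetical set representation).
fun shippedSandbox(minedRuns: List<Set<String>>): Set<String> =
    minedRuns.fold(emptySet<String>()) { acc, run -> acc union run }
```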

How will BOXMATE ever get 100% coverage, say of exceptional behavior?
We only need to cover sensitive resource accesses, which form a small subset of behavior. “Benign” exceptional behavior would rarely access previously unseen sensitive resources, in contrast to backdoors, for instance.

Automatic learners for intrusion detection are hardly used in practice.
BOXMATE uses neither automated classifiers nor training on usage data or usage profiles: using test generation, it can systematically and automatically explore all normal program behavior well before production.

Can I use BOXMATE to protect an embedded system?
If the interaction of your system fits the BOXMATE model, BOXMATE could run as a watchdog process or device, monitoring system behavior.

How can I protect my intellectual property?
By design, BOXMATE relies only on externally visible dynamic interaction; the implementation can remain unchanged and arbitrarily obfuscated.