[Soot-list] [Android][FlowDroid][SPARK] A question about the precision of taint analysis in Flowdroid (and possible false negatives in spark)

Arzt, Steven steven.arzt at sit.fraunhofer.de
Tue Apr 16 14:28:46 EDT 2019


Hi Sumaya,



The hand-annotated training set for SuSi might still work reasonably well, because Android has not changed completely. Most of the methods have been around for a while and will remain in the code base for quite a while. Further, the Android project still follows the same, or at least sufficiently similar, coding styles for, e.g., naming classes and methods. Therefore, the model for the learning algorithm didn’t change much. In the paper, we even reported good results for APIs that weren’t directly Android, but rather Chromecast or the like.



SPARK isn’t complete, but it is fairly precise. The incompleteness lies within the algorithm itself; it’s worth reading the original SPARK paper by Ondrej Lhotak for the details. In short, SPARK first looks for allocation sites. It then propagates the type information through the CFG. Whenever it encounters a call site, it knows the precise type of the base object from that propagation and can add a call graph edge. This significantly reduces spurious edges. However, if an object is used, but there is no visible allocation site (e.g., because the object was created through a factory method that is not part of the app code, but rather lies within the Android framework), there is no type information that SPARK could propagate. Consequently, for call sites with this object as the base object, no edges will be created, and you end up with an incomplete CG. The only alternative would be to over-approximate those cases and fall back to CHA or CTA for just those call sites. Adding such a feature to SPARK would probably be worthwhile. FlowDroid uses a different approach and instead relies on external domain models generated by StubDroid, which is more precise.
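
To make the factory case concrete, here is a minimal Java sketch (my own illustration, not from the original mail; the JDK factory call is just an example) of a call site whose base object has no allocation site in the analyzed code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FactoryCallSite {
        public static void main(String[] args) {
            // The concrete ExecutorService implementation is allocated inside
            // the library, not in the code that SPARK sees when library code
            // is excluded or stubbed.
            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Call on a base object without a visible allocation site: with no
            // type information to propagate, SPARK may create no edge here, so
            // the call graph stays incomplete. A CHA fallback would instead
            // over-approximate and add edges to all possible targets.
            pool.submit(() -> System.out.println("hello"));
            pool.shutdown();
        }
    }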



Best regards,

  Steven



From: Sumaya Abdullah A Almanee <salmanee at uci.edu>
Sent: Tuesday, April 16, 2019 8:16 PM
To: Arzt, Steven <steven.arzt at sit.fraunhofer.de>
Cc: soot-list at cs.mcgill.ca
Subject: Re: [Soot-list] [Android][FlowDroid][SPARK] A question about the precision of taint analysis in Flowdroid (and possible false negatives in spark)



Thank you so much for your response, Steven!



I've actually changed my approach since I sent the previous email. Now I mark native methods as sinks and use the sources produced by SuSi, which brings me to my next question:

I ran SuSi on API 28 and used the same ground truth listed in SuSi's GitHub repo. The recall and precision reported in the output (the result of the ten-fold cross-validation) were high (89% for both). This is not what I expected, given that I didn't hand-annotate a ground truth for Android's API 28. I did notice that the number of methods per bucket was relatively small in API 28 compared to API 17, and that some entities were purged before running the ten-fold cross-validation. I suspect that there will be more false negatives than the reported recall suggests.



Regarding SPARK's call graph: is the lack of completeness in SPARK due to implementation reasons, or is it an inherent issue with the algorithm? I was under the impression that SPARK is sound and more precise than the other call-graph algorithms, which is why I opted for it when running FlowDroid. Since reducing false negatives is more vital for my analysis, I'll consider using VTA instead.



Thanks again for your valuable feedback Steven.



On Mon, Apr 15, 2019 at 6:40 AM Arzt, Steven <steven.arzt at sit.fraunhofer.de<mailto:steven.arzt at sit.fraunhofer.de>> wrote:

   Hi Sumaya,



   I’m not sure that your approach is likely to scale to realistic apps. For every source, FlowDroid needs to track a taint abstraction through the program. With thousands of sources, I don’t think the analysis will terminate in any realistic time frame. You normally have a few dozen or maybe a few hundred sources that apply to a single application, but not thousands.



   I’d suggest that you only specify those methods as sources that you are actually interested in. It might very well be the case that data from some method foo() is passed to native code, but if the return value of foo() is not of interest, it doesn’t really matter what the native code does with it.
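
   For illustration, a trimmed source/sink definition in FlowDroid's plain-text format might look like the following (the entries are only examples I picked, not a recommended list; the native method in the last line is hypothetical):

      <android.telephony.TelephonyManager: java.lang.String getDeviceId()> -> _SOURCE_
      <android.location.Location: double getLatitude()> -> _SOURCE_
      <com.example.NativeBridge: void nativeSend(java.lang.String)> -> _SINK_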



   Secondly, concerning the callgraph: SPARK’s callgraph is incomplete because it needs to propagate type information from allocation sites to call sites. Therefore, if there is no allocation site (e.g., because the allocation is hidden inside a factory method in the OS), the calls on the respective base object are missing from the CG. FlowDroid handles these cases through StubDroid summaries and does not rely on the SPARK CG alone.



   Best regards,

     Steven



   From: Soot-list <soot-list-bounces at cs.mcgill.ca<mailto:soot-list-bounces at cs.mcgill.ca>> On Behalf Of Sumaya Abdullah A Almanee
   Sent: Thursday, April 11, 2019 12:35 AM
   To: soot-list at cs.mcgill.ca<mailto:soot-list at cs.mcgill.ca>
   Subject: [Soot-list] [Android][FlowDroid][SPARK] A question about the precision of taint analysis in Flowdroid (and possible false negatives in spark)





   I'm currently using FlowDroid simply to track taint propagations between certain sources and sinks. Since I'm performing a separate analysis on some native libraries of Android APKs, I've decided to leverage FlowDroid to track any taints passed/leaked from the Dalvik side to the native side.

   The way I configured the Source_Sink files is by first examining the reachable functions in the call graph generated by FlowDroid (using SPARK) and then marking these reachable functions as follows: any native function is marked as _SINK_ and everything else as _SOURCE_.
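
   A rough sketch of how this enumeration could be done on top of Soot (my own illustration; it assumes the Scene and call graph have already been built, e.g. by FlowDroid):

      import soot.MethodOrMethodContext;
      import soot.Scene;
      import soot.SootMethod;
      import soot.util.queue.QueueReader;

      public class NativeSinkCollector {
          public static void printSourceSinkLines() {
              // Walk all methods reachable in the current call graph.
              QueueReader<MethodOrMethodContext> reachable =
                      Scene.v().getReachableMethods().listener();
              while (reachable.hasNext()) {
                  SootMethod m = reachable.next().method();
                  // Native methods become sinks, everything else a source.
                  String marker = m.isNative() ? " -> _SINK_" : " -> _SOURCE_";
                  System.out.println(m.getSignature() + marker);
              }
          }
      }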



   I obtained some initial results. A small snippet of these results is shown below (the results highlighted in yellow are the ones I'm mainly interested in):

   [Results snippet attached as image001.png in the original message.]

   Based on the way I constructed the sources and sinks config file, I was expecting more leaks to be reported. If I understand correctly, these results might contain false positives, for example in the case of arrays or collections, due to over-approximations (a small sketch of such a case follows the list below). However, FlowDroid is unlikely to miss any leaks (a low false-negative rate). Is this correct? What I'm trying to figure out here is:

   1) An estimate of false positives or false negatives in FlowDroid's reported leaks.

   2) Possible reasons why some leaks might be missing (false negatives)?

   3) Since FlowDroid relies on the call graph (in this case SPARK) for reporting taints, and since the absence of a node in the graph might also result in missing reported leaks, I was wondering whether there's also an estimate of false negatives in SPARK?
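
   As a small sketch of the array/collection case mentioned above (my own example, not taken from FlowDroid's documentation): once any element of an array is tainted, the whole array is typically treated as tainted, so reading a different, untainted element may still be reported as a leak.

      public class ArrayOverApproximation {
          static String source() { return "secret"; }            // hypothetical source
          static void sink(String s) { System.out.println(s); }  // hypothetical sink

          public static void main(String[] args) {
              String[] buf = new String[2];
              buf[0] = source();   // taints the array as a whole
              buf[1] = "public";
              sink(buf[1]);        // leaks nothing at runtime, but may be reported
          }
      }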



   I really appreciate your time and help with this!



   Best,

   Sumaya
