[Soot-list] Missing call graph edges

Peter Kim chpkim at gmail.com
Tue Feb 10 15:13:10 EST 2015


Hi Steven,

Thanks for your clarification. What I was really trying to ask was whether
there is an easy way to manually fill in the gaps that library stubs leave
in call graph construction with Spark for Android apps. Basically, I am
looking for a model of the Android framework, like taint wrappers, but for
precise Android call graph construction.
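Here is a minimal sketch of what such a hand-written model could look like. It assumes the analysis can be made to load model classes in place of the SDK stubs, for example by putting them ahead of the stub JAR on the analysis classpath. How Soot resolves duplicate classes depends on your setup, so that part is an assumption. `ModelList` is a hypothetical name; a real model would have to replace the actual framework class, such as `java.util.ArrayList`. The point is only that the bodies contain real data flow. A points-to analysis can then connect the objects passed to `add()` with the value returned by `get()`.

```java
import java.util.Arrays;

// Hypothetical hand-written model of a list class (name made up for this sketch).
// Unlike a stub, the bodies have real data flow: add() stores into elems and
// get() reads from it, so a points-to analysis can see that get() returns
// objects that were previously passed to add().
class ModelList<E> {
    private Object[] elems = new Object[4];
    private int size;

    public boolean add(E e) {
        // Grow the backing array when full.
        if (size == elems.length) elems = Arrays.copyOf(elems, size * 2);
        elems[size++] = e;
        return true;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return (E) elems[index];
    }

    public E remove(int index) {
        E old = get(index);
        // Shift the tail left by one and clear the last slot.
        System.arraycopy(elems, index + 1, elems, index, size - index - 1);
        elems[--size] = null;
        return old;
    }

    public int size() {
        return size;
    }
}
```

Points-to analyses typically merge all array contents into one abstract location, so every element aliases every other. That costs precision, but it is enough to produce call edges for objects read back out of the list.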



On Tue, Feb 10, 2015 at 7:45 PM, Steven Arzt <Steven.Arzt at cased.de> wrote:

> Hi Peter,
>
>
>
> There still seems to be some misunderstanding about the concept of taint
> wrappers. You write that you do not want to perform taint tracking. In that
> case, the taint wrappers provided by FlowDroid will not be of much help, so
> it doesn’t even matter what you put into EasyTaintWrapperSource.txt. Again:
> Taint wrappers have nothing to do with callgraph construction. They are a
> means for the taint analysis to get along with an incomplete callgraph and
> “fill the gaps” with respect to the semantics of taint tracking.
>
>
>
> The conceptual problem I described in my last e-mail has nothing to do
> with how you obtain the callgraph in Soot either. Your callgraph will be
> incomplete; that is a consequence of how the SPARK callgraph algorithm
> works.
>
>
>
> The methods I explained in my last e-mail are ways to deal with the
> problem. You can either analyze your apps together with full OS / library
> implementations (with all the downsides this has), or you can extend your
> client analysis to work with an incomplete callgraph, which is what I
> recommend. Besides that, if you are willing to sacrifice a lot of callgraph
> precision, you might also give CHA a try. CHA does not depend on allocation
> sites, so the CHA callgraph will at least be somewhat sound (given the
> usual small print). For most analyses, however, CHA is not an option due to
> its very heavy over-approximation.
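To make the CHA trade-off concrete, here is a toy resolver over a hand-written class hierarchy. It is not Soot's CHA or SPARK code. `BaseTweet` comes from this thread, while `TextTweet` is a hypothetical subclass added for illustration. CHA resolves a virtual call from the declared receiver type alone. Allocation-site-based resolution uses only the types whose allocations reach the receiver, so an empty points-to set yields no targets.

```java
import java.util.*;

// Toy class hierarchy contrasting CHA with allocation-site-based resolution.
// All class and method names are hypothetical inputs, not data read from Soot.
class ToyHierarchy {
    private final Map<String, String> superOf = new HashMap<>();       // class -> superclass (null at root)
    private final Map<String, Set<String>> declares = new HashMap<>(); // class -> methods it declares

    ToyHierarchy add(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declares.put(name, new HashSet<>(Arrays.asList(methods)));
        return this;
    }

    // Virtual dispatch: walk up from the runtime type to the first class declaring the method.
    String dispatch(String runtimeType, String method) {
        for (String c = runtimeType; c != null; c = superOf.get(c)) {
            if (declares.getOrDefault(c, Collections.emptySet()).contains(method)) return c + "." + method;
        }
        return null;
    }

    boolean isSubtype(String c, String ancestor) {
        for (; c != null; c = superOf.get(c)) if (c.equals(ancestor)) return true;
        return false;
    }

    // CHA: any subtype of the declared receiver type may show up at runtime.
    Set<String> cha(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String c : superOf.keySet()) {
            if (isSubtype(c, declaredType)) {
                String t = dispatch(c, method);
                if (t != null) targets.add(t);
            }
        }
        return targets;
    }

    // Points-to based: only types of allocation sites that reach the receiver.
    Set<String> fromAllocations(Set<String> allocTypes, String method) {
        Set<String> targets = new TreeSet<>();
        for (String c : allocTypes) {
            String t = dispatch(c, method);
            if (t != null) targets.add(t);
        }
        return targets;
    }
}

class ChaDemo {
    public static void main(String[] args) {
        ToyHierarchy h = new ToyHierarchy()
                .add("BaseTweet", null, "isFinished", "free")
                .add("TextTweet", "BaseTweet", "free");
        // CHA ignores allocation sites: both implementations are possible targets.
        System.out.println(h.cha("BaseTweet", "free"));                         // [BaseTweet.free, TextTweet.free]
        // Nothing allocated reaches the receiver (e.g. it came out of a stub): no targets.
        System.out.println(h.fromAllocations(Collections.emptySet(), "free")); // []
    }
}
```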
>
>
>
> If you choose to include the library code in your analysis, this has
> nothing to do with where you get your apps from. By “library”, I mean the
> Android platform implementation, i.e., the code that is preinstalled on
> your phone. By the way, “just an APK” is never sufficient. You always need
> some kind of library model; usually, you just use a very minimalistic one
> provided through the stub JAR files from the Android SDK.
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On behalf of* Peter Kim
> *Sent:* Tuesday, February 10, 2015 19:13
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Steven,
>
>
>
> Thanks for your response. I have some follow up questions:
>
>
>
> - I changed "EasyTaintWrapperSource.txt" to include "<java.util.ArrayList:
> java.lang.Object get(int)>" but it's still not working. Is this the right
> way to add a taint wrapper?
>
>
>
> - I am not interested in taint analysis. I only want a call graph for an
> Android app. Is what I am doing, i.e. extending Infoflow and getting call
> graph through Scene, the best way to get the call graph or is there a
> better way? Note that I'm getting the call graph *before* FlowDroid starts
> looking at sources/sinks.
>
>
>
> - Suppose that I'm analyzing APKs found on the web. Will these ever come
> with full library implementations or will they have just stubs? So if I
> wanted to transform/analyze library code, it seems that it would not be
> possible to do so with just an APK.
>
>
>
>
>
> On Tue, Feb 10, 2015 at 9:56 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> The callgraph edges have nothing to do with basic blocks. The difference
> here is that “remove” is called on the base object “objects” and “free” is
> called on “obj”. What you observe is that you have outgoing call edges for
> all calls on “objects”, but none for calls on “obj”. This usually means
> that the SPARK callgraph algorithm is unable to propagate allocation sites
> to the base object “obj”. Therefore, it cannot decide where the calls
> should go.
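The following toy sketch mimics the allocation-site propagation described above. It is a simplified Andersen-style analysis written for illustration, not SPARK's implementation. Variables map to sets of allocated types, and statements are applied repeatedly until nothing changes. A virtual call then gets one target per type that reaches its receiver. The variable `tmp` is hypothetical; `objects`, `obj`, and `BaseTweet` come from the snippet in this thread.

```java
import java.util.*;

// Toy allocation-site propagation in the spirit of SPARK (not its implementation).
// Statements form an unordered list, so the analysis is flow-insensitive:
// branch conditions and statement order are ignored.
class ToyPointsTo {
    private final List<String[]> stmts = new ArrayList<>();      // {"new", x, T} or {"copy", x, y}
    private final Map<String, Set<String>> pts = new HashMap<>(); // variable -> allocated types

    ToyPointsTo alloc(String var, String type) { stmts.add(new String[] {"new", var, type}); return this; }
    ToyPointsTo copy(String to, String from) { stmts.add(new String[] {"copy", to, from}); return this; }

    Set<String> ptsOf(String var) { return pts.computeIfAbsent(var, k -> new TreeSet<>()); }

    // Re-apply every statement until no points-to set grows (a fixed point).
    ToyPointsTo solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                if (s[0].equals("new")) changed |= ptsOf(s[1]).add(s[2]);
                else changed |= ptsOf(s[1]).addAll(ptsOf(s[2]));
            }
        }
        return this;
    }

    // Resolve receiver.method(): one target per allocated type (inheritance ignored).
    Set<String> targets(String receiver, String method) {
        Set<String> out = new TreeSet<>();
        for (String type : ptsOf(receiver)) out.add(type + "." + method);
        return out;
    }
}

class PointsToDemo {
    public static void main(String[] args) {
        // objects = new ArrayList();  obj = objects.get(i);
        // get() is a stub that only throws, so it contributes no statement for obj.
        ToyPointsTo p = new ToyPointsTo().alloc("objects", "ArrayList").solve();
        System.out.println(p.targets("objects", "remove")); // [ArrayList.remove]
        System.out.println(p.targets("obj", "free"));       // []

        // Once an allocation reaches obj, the edge appears; statement order does not matter.
        ToyPointsTo q = new ToyPointsTo().copy("obj", "tmp").alloc("tmp", "BaseTweet").solve();
        System.out.println(q.targets("obj", "free"));       // [BaseTweet.free]
    }
}
```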
>
>
>
> The question now is why the allocation site propagation fails for “obj”. I
> guess you are running FlowDroid with the Android JAR files from the
> official SDK? These JAR files are only stubs, so there is no real
> implementation of “java.util.ArrayList”. All methods inside this class (and
> all other system classes) only throw an exception (RuntimeException("Stub!")).
> Therefore, SPARK cannot know which objects “get()” returns; it could be
> anything. In such a case, SPARK does not fall back to an over-approximation
> using CHA, but simply leaves out the respective edges.
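For illustration, this is roughly the shape of a method body in the SDK's android.jar, where stubbed methods throw `RuntimeException("Stub!")` instead of doing any work. `StubList` is a hypothetical stand-in for the stubbed `java.util.ArrayList`. Since `get()` never returns a value, a points-to analysis computes an empty set for its result. That is why calls on `obj` get no edges.

```java
// Hypothetical stand-in for a stubbed library class: the signatures are kept,
// but every body just throws, so no allocated object ever reaches a caller.
class StubList<E> {
    public boolean add(E e) {
        throw new RuntimeException("Stub!");
    }

    public E get(int index) {
        // A points-to analysis finds no object flowing out of this method.
        throw new RuntimeException("Stub!");
    }
}
```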
>
>
>
> How to get around the problem? Firstly, you could use a full
> implementation of the Android system classes and analyze them along with
> your program. The disadvantage of this approach is that you are analyzing
> tens of megabytes of system code together with an app of a few kilobytes.
> Your memory consumption will likely blow up to tens of gigabytes due to all
> the allocation site propagation inside the system libraries and you will
> have to wait a long time for your callgraph.
>
>
>
> Another idea would be to just live with the incomplete callgraph. That’s
> what FlowDroid does. We know that we don’t have call edges for some call
> sites. If we encounter such a situation during the taint propagation, we
> query a store of explicit models for library methods on how to continue
> with the taint propagation. You might want to read up on Taint Wrappers in
> the FlowDroid paper. This approach is fast and scalable, with the downside
> of having to provide these models by hand. In FlowDroid, we currently have
> a rule set that works pretty well for most of the Android API.
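Here is a toy sketch of the idea behind taint wrappers. When the call graph has no edge for a library call, the taint analysis looks up a hand-written rule for the callee's signature instead of analyzing its body. The `ToyTaintWrapper` class, its rule names, and its behavior are hypothetical. They do not reproduce FlowDroid's `EasyTaintWrapper` API or its rule file format; only the Soot-style method signatures are real.

```java
import java.util.*;

// Toy taint-wrapper lookup (hypothetical API, not FlowDroid's EasyTaintWrapper).
class ToyTaintWrapper {
    enum Rule { BASE_TO_RETURN, ARG_TO_BASE }

    private final Map<String, EnumSet<Rule>> rules = new HashMap<>();

    ToyTaintWrapper rule(String signature, Rule... rs) {
        rules.computeIfAbsent(signature, k -> EnumSet.noneOf(Rule.class)).addAll(Arrays.asList(rs));
        return this;
    }

    boolean isModeled(String signature) {
        return rules.containsKey(signature);
    }

    // Given which inputs are tainted, return the tainted outputs ("base" and/or "return").
    Set<String> apply(String signature, boolean baseTainted, boolean argTainted) {
        EnumSet<Rule> rs = rules.getOrDefault(signature, EnumSet.noneOf(Rule.class));
        Set<String> out = new TreeSet<>();
        if (baseTainted) out.add("base"); // assumption: taint on the receiver survives the call
        if (baseTainted && rs.contains(Rule.BASE_TO_RETURN)) out.add("return");
        if (argTainted && rs.contains(Rule.ARG_TO_BASE)) out.add("base");
        return out;
    }
}

class TaintWrapperDemo {
    public static void main(String[] args) {
        String add = "<java.util.ArrayList: boolean add(java.lang.Object)>";
        String get = "<java.util.ArrayList: java.lang.Object get(int)>";
        ToyTaintWrapper w = new ToyTaintWrapper()
                .rule(add, ToyTaintWrapper.Rule.ARG_TO_BASE)
                .rule(get, ToyTaintWrapper.Rule.BASE_TO_RETURN);
        System.out.println(w.apply(add, false, true)); // [base]   list.add(secret) taints the list
        System.out.println(w.apply(get, true, false)); // [base, return]   reading from a tainted list
    }
}
```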
>
>
>
> This explanation is, by the way, also consistent with your observation
> that you get a call edge if you change “free” to a static method: In that
> case, you get a StaticInvokeExpr instead of a VirtualInvokeExpr, and call
> resolution becomes trivial because the target does not depend on a
> receiver object.
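A toy illustration of why the static case is easy. The class and field names are hypothetical, not Soot's `StaticInvokeExpr`/`VirtualInvokeExpr` API. A static call site names exactly one target. A virtual call site needs the set of types that reach its receiver, and that set is empty for `obj` in this thread.

```java
import java.util.*;

// Toy contrast between static and virtual call resolution (hypothetical names).
class ToyCallSite {
    private final boolean isStatic;
    private final String declaringClass;
    private final String method;
    private final Set<String> receiverTypes; // types reaching the receiver; unused for static calls

    ToyCallSite(boolean isStatic, String declaringClass, String method, Set<String> receiverTypes) {
        this.isStatic = isStatic;
        this.declaringClass = declaringClass;
        this.method = method;
        this.receiverTypes = receiverTypes;
    }

    Set<String> targets() {
        // Static call: the signature alone names exactly one target.
        if (isStatic) return Collections.singleton(declaringClass + "." + method);
        // Virtual call: one target per runtime type reaching the receiver (inheritance ignored).
        Set<String> out = new TreeSet<>();
        for (String t : receiverTypes) out.add(t + "." + method);
        return out;
    }
}

class CallSiteDemo {
    public static void main(String[] args) {
        Set<String> nothingReachesObj = Collections.emptySet();
        System.out.println(new ToyCallSite(true, "BaseTweet", "free", nothingReachesObj).targets());  // [BaseTweet.free]
        System.out.println(new ToyCallSite(false, "BaseTweet", "free", nothingReachesObj).targets()); // []
    }
}
```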
>
>
>
> Best regards,
>
>   Steven
>
>
>
>
>
> M.Sc. M.Sc. Steven Arzt
>
> Secure Software Engineering Group (SSE)
>
> European Center for Security and Privacy by Design (EC SPRIDE)
>
> Rheinstraße 75
>
> D-64293 Darmstadt
>
> Phone: +49 61 51 869-336
>
> Fax: +49 61 51 16-72118
>
> eMail: steven.arzt at ec-spride.de
>
> Web: http://sse.ec-spride.de
>
>
> *From:* Peter Kim [mailto:chpkim at gmail.com]
> *Sent:* Monday, February 9, 2015 22:38
> *To:* Sam Blackshear
> *Cc:* Steven Arzt; soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Sam,
>
>
>
> Yes, I changed the code, re-ran Soot, and Soot still doesn't report the
> edges.
>
>
>
>
>
> Hi Steve,
>
>
>
> Note that when I change "free()" to a static method, the edge is reported,
> but when it is an instance method, it is not. In light of the discussion
> with Sam, I want to make it absolutely clear that the code runs fine even
> when it is an instance method. In my view, this is either a bug or
> Infoflow is constructing a call graph that differs from the traditional
> one; since you told me that you changed it back to return a traditional
> call graph, I think it deserves an investigation.
>
>
>
> Thanks.
>
>
>
> On Mon, Feb 9, 2015 at 9:31 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Call graph construction is typically flow-insensitive, so it is not
> precise enough to do the kind of reasoning you are doing in your head (i.e.,
> for "objects.remove()" to be included in the call graph, "obj" cannot be
> null). If you are not familiar with the flow-insensitive call graph
> construction algorithms used by tools like Soot, this
> <http://manu.sridharan.net/files/aliasAnalysisChapter.pdf> is a good
> place to start.
>
>
>
> Now, if you changed the code in the way you described (adding one or
> more objects to the objects list), re-ran Soot, and Soot still does not
> report the edges, that is a problem for Soot (and thus goes beyond my
> expertise :)). But for the example program you posted, Soot's result is as
> expected for a flow-insensitive call graph construction algorithm.
>
>
>
> - Sam
>
>
>
> On Mon, Feb 9, 2015 at 2:22 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Sam,
>
>
>
> The code snippet is the following:
>
>
>
> BaseTweet<?> obj = objects.get(i);
> if (obj.isFinished() && obj.isAutoRemoveEnabled) {
>   objects.remove(i);
>   obj.free();
> }
>
>
>
> For "objects.remove()" to be included in the call graph, "obj" cannot be
> null. So even without any object in the list, if "objects.remove()" is
> included, then "obj.free()" should be included as well.
>
>
>
> Just to be absolutely sure though, I just ran the app with one object in
> the list and made sure that the true branch is executed. Soot still returns
> only "remove()" in the call graph. I also made sure that "free()" prints
> output (meaning it shouldn't be excluded from the call graph).
>
>
>
>
>
> On Mon, Feb 9, 2015 at 9:08 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Peter,
>
>   What you observe is consistent with what I explained. The
> objects.remove() method is included because objects is initialized to a
> non-null ArrayList. However, obj.free() is not included because the
> analysis is smart enough to determine that there is no possible concrete
> execution in which the method obj.free() will be called (the statement
> obj.free() will throw an exception *if* it ever executes, because obj will
> always be null).
>
>
>
> - Sam
>
>
>
> On Mon, Feb 9, 2015 at 1:54 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Sam,
>
>
>
> The loop iteration is executed only if there is an item in the list, so it
> shouldn't matter if no object has been added to the list. The code runs
> without exception. The problem is that of two call graph edges that are
> part of the same basic block (remove() and free()), only one call graph
> edge (remove()) is being returned, which is strange.
>
>
>
> On Mon, Feb 9, 2015 at 8:19 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Hi Peter,
>
>   My suspicion is that the callgraph is correct here. You never add
> anything to the objects ArrayList, so whenever you try to read a BaseTweet
> object out of the list, the analysis (correctly) concludes that only null
> could be returned. If you call a method on null (like isFinished), the call
> graph (correctly) concludes that this would result in an NPE and thus does
> not add the edge. If you want to see these edges in the callgraph, extend
> your code to add something to the objects ArrayList:
>
>
>
> objects.add(new BaseTweet());
>
>
>
> When debugging your static analysis results, it's often helpful to
> concretely execute your target program and be sure that it behaves as you
> expect!
>
>
>
> - Sam
>
>
>
>
>
> On Mon, Feb 9, 2015 at 12:40 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Steven,
>
>
>
> Here is a complete minimal example as an Eclipse project (just import into
> your workspace):
> https://drive.google.com/file/d/0B9KLXcAovVUHa0FuN3gzRGJETmc/view
>
>
>
> I retrieve the call graph of this app in Infoflow.runAnalysis(final
> ISourceSinkManager sourcesSinks, final Set<String> additionalSeeds),
> calling "CallGraph cg = Scene.v().getCallGraph();" right before "iCfg =
> icfgFactory.buildBiDirICFG(callgraphAlgorithm);". I use cg, not iCfg.
>
>
>
> The edges out of com.example.toyandroid.ChpkimMainActivity.chpkimUpdate()
> I get are:
>
>
>
> <java.util.ArrayList: int size()>
>
> <java.util.ArrayList: java.lang.Object get(int)>
>
> <java.util.ArrayList: java.lang.Object remove(int)>
>
>
>
> But they should be:
>
>
>
> <java.util.ArrayList: int size()>
>
> <java.util.ArrayList: java.lang.Object get(int)>
>
> <java.util.ArrayList: java.lang.Object remove(int)>
>
> <com.example.toyandroid.BaseTweet: boolean isFinished()>
>
> <com.example.toyandroid.BaseTweet: void free()>
>
> <com.example.toyandroid.BaseTweet: void update(float)>
>
>
>
> Thanks for your help.
>
>
>
> On Mon, Feb 9, 2015 at 8:40 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> Can you please send me a more complete minimal example with which I can
> reproduce the issue?
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On behalf of* Peter Kim
> *Sent:* Sunday, February 8, 2015 19:05
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> eliminateDeadCode() is *not* being called and I'm still running into the
> problem. Thanks in advance for your help.
>
>
>
> On Sun, Feb 8, 2015 at 5:37 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Steven,
>
>
>
> I'm still running into the same problem after pulling from Github.
>
>
>
>
>
> On Fri, Feb 6, 2015 at 9:24 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> That might have to do with an optimization I added recently. In short,
> FlowDroid removes those callgraph edges for which it can easily decide that
> they do not influence the outcome of the taint analysis. I can, however,
> fully understand that this might lead to surprising results if you are
> using the FlowDroid components for other analyses, so I decided to make
> this optimization optional and turn it off by default.
>
>
>
> The new code is on Github and a new nightly build will be available
> tomorrow.
>
>
>
> Best regards,
>
>   Steven
>
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On behalf of* Peter Kim
> *Sent:* Friday, February 6, 2015 00:05
> *To:* soot-list at cs.mcgill.ca
> *Subject:* [Soot-list] Missing call graph edges
>
>
>
> Hi,
>
>
>
> I'm extending FlowDroid to construct an Android app's call graph. More
> specifically, I get the call graph by modifying Infoflow.runAnalysis(final
> ISourceSinkManager sourcesSinks, final Set<String> additionalSeeds) to call
> Scene.v().getCallGraph(). The call graph is missing edges in an odd way:
> for a function, the graph has some outgoing edges but is missing others
> that should be there. Namely, given the following function (shown in Java
> rather than Jimple for readability), the called methods should be "get()",
> "isFinished()", "remove()", "free()", "size()", "update()", but I'm only
> getting "get()", "size()", and "remove()". I don't understand why
> "remove()" is included but "free()" is not since they are in the same basic
> block. I'm using soot.jimple.toolkits.callgraph.TransitiveTargets to
> analyze the call graph.
>
>
>
> public void update(float x) {
>   for (... size() ...) {
>     get();
>     if (isFinished()) {
>       remove();
>       free();
>     }
>   }
>
>   if (y) {
>     if (x) {
>       for (... size() ...) get().update(x);
>     } else {
>       for (... size() ...) get().update(x);
>     }
>   }
> }
>
>
>
> Thank you for your help.
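For reference, a transitive-targets query like the TransitiveTargets one mentioned above amounts to a reachability search over call-graph edges. The sketch below uses plain strings in place of Soot's `SootMethod` and `Edge` objects and is not the `TransitiveTargets` implementation. `Pool.release` is a hypothetical callee of `free()`. It shows why a single missing edge matters: everything reachable only through `free()` disappears from the result as well.

```java
import java.util.*;

// Toy transitive-targets query: every method reachable from a start method
// by following call-graph edges (worklist search). Strings stand in for methods.
class ToyTransitiveTargets {
    private final Map<String, List<String>> edges = new HashMap<>();

    ToyTransitiveTargets edge(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
        return this;
    }

    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(edges.getOrDefault(start, Collections.emptyList()));
        while (!work.isEmpty()) {
            String m = work.pop();
            // Expand each method only the first time it is reached.
            if (seen.add(m)) work.addAll(edges.getOrDefault(m, Collections.emptyList()));
        }
        return seen;
    }
}

class TransitiveDemo {
    public static void main(String[] args) {
        ToyTransitiveTargets cg = new ToyTransitiveTargets()
                .edge("update", "ArrayList.size")
                .edge("update", "ArrayList.get")
                .edge("update", "ArrayList.remove")
                .edge("BaseTweet.free", "Pool.release");
        // Without the update -> free edge, neither free nor its callees are reported.
        System.out.println(cg.reachableFrom("update"));
        cg.edge("update", "BaseTweet.free");
        System.out.println(cg.reachableFrom("update").contains("Pool.release")); // true
    }
}
```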
>
>
> _______________________________________________
> Soot-list mailing list
> Soot-list at CS.McGill.CA
> https://mailman.CS.McGill.CA/mailman/listinfo/soot-list