[Soot-list] Missing call graph edges

Peter Kim chpkim at gmail.com
Tue Feb 24 15:37:49 EST 2015


I got it to work by commenting out excludedPackages.add("java.*") in
Scene.determineExcludedPackages().

On Tue, Feb 24, 2015 at 9:06 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:

> Hi Peter,
>
>
>
> Do not add any transformers to the “cg” pack, that is too early. Only add
> your SceneTransformers to the “wjtp” pack which runs after the callgraph
> has been constructed and all necessary classes have already been loaded.
>
>
>
> Also make sure that you have indeed enabled whole-program mode.
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* Peter Kim [mailto:chpkim at gmail.com]
> *Sent:* Monday, February 23, 2015 22:59
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Steven,
>
>
>
> I pointed to the fully implemented JAR files, but I'm still having the
> same problem. To see if the method implementations are actually being
> picked up by Soot, I tried to see if I can retrieve the body of
> java.util.ArrayList.get() in a SceneTransformer added to the call graph
> pack. I'm getting the following error:
>
>
>
> This operation requires resolving level BODIES but java.util.ArrayList is
> at resolving level SIGNATURES
>
> If you are extending Soot, try to add the following call before calling
> soot.Main.main(..):
>
> Scene.v().addBasicClass(java.util.ArrayList,BODIES);
>
> Otherwise, try whole-program mode (-w)
>
>
>
> I tried addBasicClass(), whole program mode, and forceResolve() as well
> against "java.util.ArrayList", but I still cannot retrieve the body. Could
> you please let me know how I can get Soot to pick up the bodies of library
> implementations?
>
>
>
> Thanks.
>
>
>
> On Mon, Feb 23, 2015 at 1:56 PM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> There’s not much to modify. Just point FlowDroid to fully implemented JAR
> files instead of specifying the platforms directory of your Android SDK
> which contains only the stub versions.
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* Peter Kim [mailto:chpkim at gmail.com]
> *Sent:* Thursday, February 19, 2015 19:11
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Steven,
>
>
>
> Could you please tell me how/where to change Infoflow.java or a related
> file so that full, non-stub, library jars, i.e. android.jar, rt.jar, and
> jce.jar, are picked up?
>
>
>
> Thanks
>
>
>
> On Tue, Feb 10, 2015 at 9:14 PM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> What a suitable filler for the gap should look like depends on your
> analysis problem. If you are ok with treating the library methods as black
> boxes, but want the incoming and outgoing call edges (i.e., those edges
> that cross the boundary of the library’s interface), there has been some
> work in the area. I am not aware of any readily available implementation on
> top of Soot, but for WALA there is Averroes:
>
>
>
>
> http://link.springer.com/chapter/10.1007/978-3-642-39038-8_16
>
>
>
> The basic idea behind Averroes is to take a full implementation of a
> library and throw away everything that is not required for callgraph
> construction. This gives you a very lightweight stub that you can use when
> analyzing client programs while still maintaining a full callgraph.
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On Behalf Of* Peter Kim
> *Sent:* Tuesday, February 10, 2015 21:13
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Steven,
>
>
>
> Thanks for your clarification. What I was really trying to ask was if
> there is an easy way to manually fill in the gaps, missing due to library
> stubs, for call graph construction using Spark for Android apps. Basically,
> a model of the Android framework like taint wrappers, but for precise
> Android call graph construction.
>
> On Tue, Feb 10, 2015 at 7:45 PM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> There still seems to be some misunderstanding about the concept of taint
> wrappers. You write that you do not want to perform taint tracking. In that
> case, the taint wrappers provided by FlowDroid will not be of much help, so
> it doesn’t even matter what you put into EasyTaintWrapperSource.txt. Again:
> Taint wrappers have nothing to do with callgraph construction. They are a
> means for the taint analysis to get along with an incomplete callgraph and
> “fill the gaps” with respect to the semantics of taint tracking.
>
>
>
> The conceptual problem I described in my last e-mail has nothing to do
> with how you obtain the callgraph in Soot either. Your callgraph will be
> incomplete. That’s what happens because of how the SPARK callgraph
> algorithm works.
>
>
>
> The methods I explained in my last e-mail are ways to deal with the
> problem. You can either analyze your apps together with full OS / library
> implementations (with all the downsides this has), or you can extend your
> client analysis to work with an incomplete callgraph which is what I
> recommend. Besides that, if you are willing to greatly sacrifice callgraph
> precision, you might also give CHA a try. CHA does not depend on allocation
> sites, so the CHA callgraph will at least be somewhat sound (given the
> usual small print). For most analyses, CHA is however not an option due to
> its really heavy over-approximation.
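
To illustrate why CHA does not depend on allocation sites, here is a minimal sketch of class hierarchy analysis on a toy hierarchy. It is not Soot's implementation: ChaSketch and the subclasses FadeTweet and MoveTweet are hypothetical names invented for this example, and every class is treated as instantiable. The call is resolved from the declared type of the receiver alone, which is also where the over-approximation comes from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy class hierarchy analysis (CHA); a sketch, not Soot's code. */
final class ChaSketch {
    // class name -> direct superclass (null for a root of the toy hierarchy)
    private final Map<String, String> superOf = new HashMap<>();
    // class name -> methods implemented directly in that class
    private final Map<String, Set<String>> declared = new HashMap<>();

    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    private boolean isSubtypeOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    /** Walks up the hierarchy to the implementation an object of class cls would run. */
    private String dispatch(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (declared.getOrDefault(c, Collections.<String>emptySet()).contains(method)) {
                return c + "." + method;
            }
        }
        return null;
    }

    /** CHA: every subtype of the declared receiver type contributes a possible target. */
    Set<String> targets(String declaredType, String method) {
        Set<String> result = new TreeSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubtypeOf(cls, declaredType)) {
                String target = dispatch(cls, method);
                if (target != null) result.add(target);
            }
        }
        return result;
    }

    /** Hypothetical hierarchy: FadeTweet overrides free(), MoveTweet inherits it. */
    static ChaSketch example() {
        ChaSketch cha = new ChaSketch();
        cha.addClass("BaseTweet", null, "free()", "isFinished()");
        cha.addClass("FadeTweet", "BaseTweet", "free()");
        cha.addClass("MoveTweet", "BaseTweet");
        return cha;
    }

    public static void main(String[] args) {
        // Prints [BaseTweet.free(), FadeTweet.free()]
        System.out.println(example().targets("BaseTweet", "free()"));
    }
}
```

Both implementations of free() are reported even though no allocation of either class was ever inspected. With a declared type such as java.lang.Object, the same procedure would return every implementation in the program, which is the imprecision mentioned above.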
>
>
>
> If you choose to include the library code in your analysis, this has
> nothing to do with where you get your apps from. By “library”, I mean the
> Android platform implementation, the code that is already installed on your
> phone. By the way, “just an APK” is never sufficient. You always need
> some kind of library model; usually, you just use a very minimalistic one
> provided through the stub JAR files from the Android SDK.
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On Behalf Of* Peter Kim
> *Sent:* Tuesday, February 10, 2015 19:13
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Steven,
>
>
>
> Thanks for your response. I have some follow up questions:
>
>
>
> - I changed "EasyTaintWrapperSource.txt" to include "<java.util.ArrayList:
> java.lang.Object get(int)>" but it's still not working. Is this the right
> way to add a taint wrapper?
>
>
>
> - I am not interested in taint analysis. I only want a call graph for an
> Android app. Is what I am doing, i.e. extending Infoflow and getting call
> graph through Scene, the best way to get the call graph or is there a
> better way? Note that I'm getting the call graph *before* FlowDroid starts
> looking at sources/sinks.
>
>
>
> - Suppose that I'm analyzing APKs found on the web. Will these ever come
> with full library implementations or will they have just stubs? So if I
> wanted to transform/analyze library code, it seems that it would not be
> possible to do so with just an APK.
>
>
>
>
>
> On Tue, Feb 10, 2015 at 9:56 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> The callgraph edges have nothing to do with basic blocks. The difference
> here is that “remove” is called on the base object “objects” and “free” is
> called on “obj”. What you observe is that you have outgoing call edges for
> all calls on “objects”, but none for calls on “obj”. This usually means
> that the SPARK callgraph algorithm is unable to propagate allocation sites
> to the base object “obj”. Therefore, it cannot decide where the calls
> should go.
>
>
>
> The question now is why the allocation site propagation fails for “obj”. I
> guess you are running FlowDroid with the Android JAR files from the
> official SDK? These JAR files are only stubs, so there is no real
> implementation of “java.util.ArrayList”. All methods inside this class (and
> all other system classes) only throw a RuntimeException("Stub!").
> Therefore, SPARK cannot know what the runtime type of the object returned
> by “get()” would be: it could be anything. In such a case, SPARK does not
> fall back to an over-approximation using CHA, but simply leaves out the
> respective edges.
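
The effect of stub bodies on allocation site propagation can be sketched with a toy model. This is a deliberately simplified illustration and not SPARK's actual algorithm: the class PointsToSketch and the variable names t and elementData are assumptions made for the example, and virtual dispatch is reduced to "one edge per type in the receiver's points-to set".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy allocation-site propagation; a sketch, not Soot's SPARK implementation. */
final class PointsToSketch {
    // variable -> types of the allocation sites that may reach it
    private final Map<String, Set<String>> pointsTo = new HashMap<>();
    // source variable -> variables it is assigned to ("dst = src")
    private final Map<String, Set<String>> assignEdges = new HashMap<>();

    void alloc(String var, String type) {
        pointsTo.computeIfAbsent(var, k -> new TreeSet<>()).add(type);
    }

    void assign(String dst, String src) {
        assignEdges.computeIfAbsent(src, k -> new TreeSet<>()).add(dst);
    }

    /** Pushes allocation sites along assignment edges until nothing changes. */
    void propagate() {
        Deque<String> worklist = new ArrayDeque<>(pointsTo.keySet());
        while (!worklist.isEmpty()) {
            String src = worklist.poll();
            Set<String> srcTypes = pointsTo.getOrDefault(src, Collections.<String>emptySet());
            for (String dst : assignEdges.getOrDefault(src, Collections.<String>emptySet())) {
                Set<String> dstTypes = pointsTo.computeIfAbsent(dst, k -> new TreeSet<>());
                if (dstTypes.addAll(srcTypes)) worklist.add(dst);
            }
        }
    }

    /** A virtual call gets one edge per type that reaches the receiver variable. */
    List<String> targets(String receiver, String method) {
        List<String> edges = new ArrayList<>();
        for (String type : pointsTo.getOrDefault(receiver, Collections.<String>emptySet())) {
            edges.add(type + "." + method);
        }
        return edges;
    }

    /** Models: t = new BaseTweet(); objects.add(t); obj = objects.get(i); */
    static PointsToSketch analyze(boolean libraryHasBodies) {
        PointsToSketch a = new PointsToSketch();
        a.alloc("t", "BaseTweet");
        if (libraryHasBodies) {
            a.assign("elementData", "t");   // body of add(): stores its argument
            a.assign("obj", "elementData"); // body of get(): returns a stored element
        }
        // With stub bodies that only throw, no flow connects t to obj.
        a.propagate();
        return a;
    }

    public static void main(String[] args) {
        System.out.println("stubs: " + analyze(false).targets("obj", "free()")); // stubs: []
        System.out.println("full:  " + analyze(true).targets("obj", "free()"));  // full:  [BaseTweet.free()]
    }
}
```

With stubs, the points-to set of obj stays empty, so the call obj.free() has no type to dispatch on and no edge is produced. With library bodies available, the allocation site reaches obj and the edge appears.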
>
>
>
> How to get around the problem? Firstly, you could use a full
> implementation of the Android system classes and analyze them along with
> your program. The disadvantage of this approach is that you are analyzing
> tens of megabytes of system code together with an app of a few kilobytes.
> Your memory consumption will likely blow up to tens of gigabytes due to all
> the allocation site propagation inside the system libraries and you will
> have to wait a long time for your callgraph.
>
>
>
> Another idea would be to just live with the incomplete callgraph. That’s
> what FlowDroid does. We know that we don’t have call edges for some call
> sites. If we encounter such a situation during the taint propagation, we
> query a store of explicit models for library methods on how to continue
> with the taint propagation. You might want to read up on Taint Wrappers in
> the FlowDroid paper. This approach is fast and scalable, with the downside
> of having to provide these models by hand. In FlowDroid, we currently have
> a rule set that works pretty well for most of the Android API.
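
The idea of consulting an explicit model at a call site without a call edge can be sketched as follows. This is only an illustration of the concept, not FlowDroid's taint wrapper interface or rule format: the class TaintWrapperSketch, its methods, and the single rule "if the receiver or the argument is tainted, the receiver and the result become tainted" are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy taint wrapper: a hand-written summary used instead of analyzing the callee. */
final class TaintWrapperSketch {
    // Signatures of library methods covered by the (assumed) propagation rule
    private final Set<String> wrappedMethods = new HashSet<>();

    void addRule(String signature) {
        wrappedMethods.add(signature);
    }

    /**
     * Models "result = base.method(arg)" without a call edge into the callee.
     * result and arg may be null when the call has no result or no argument.
     * Returns the set of tainted variables after the call.
     */
    Set<String> afterCall(String signature, String result, String base, String arg,
                          Set<String> taintedBefore) {
        Set<String> tainted = new TreeSet<>(taintedBefore);
        boolean incoming = taintedBefore.contains(base)
                || (arg != null && taintedBefore.contains(arg));
        if (incoming && wrappedMethods.contains(signature)) {
            tainted.add(base);
            if (result != null) tainted.add(result);
        }
        return tainted;
    }

    /** Models: objects.add(secret); obj = objects.get(i); with secret tainted. */
    static Set<String> demo() {
        TaintWrapperSketch wrapper = new TaintWrapperSketch();
        wrapper.addRule("<java.util.ArrayList: boolean add(java.lang.Object)>");
        wrapper.addRule("<java.util.ArrayList: java.lang.Object get(int)>");

        Set<String> tainted = new TreeSet<>(Arrays.asList("secret"));
        tainted = wrapper.afterCall("<java.util.ArrayList: boolean add(java.lang.Object)>",
                null, "objects", "secret", tainted);
        tainted = wrapper.afterCall("<java.util.ArrayList: java.lang.Object get(int)>",
                "obj", "objects", "i", tainted);
        return tainted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [obj, objects, secret]
    }
}
```

The taint reaches obj although neither add() nor get() was analyzed. Such a model only answers taint questions, though. It does not add any edges to the callgraph.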
>
>
>
> This explanation is, by the way, also consistent with your observation that
> you get a call edge if you change “free” to a static method: In that case,
> you get a StaticInvokeExpr instead of a VirtualInvokeExpr, so the target is
> known from the call site alone and no points-to information is needed.
>
>
>
> Best regards,
>
>   Steven
>
>
>
>
>
> M.Sc. M.Sc. Steven Arzt
>
> Secure Software Engineering Group (SSE)
>
> European Center for Security and Privacy by Design (EC SPRIDE)
>
> Rheinstraße 75
>
> D-64293 Darmstadt
>
> Phone: +49 61 51 869-336
>
> Fax: +49 61 51 16-72118
>
> eMail: steven.arzt at ec-spride.de
>
> Web: http://sse.ec-spride.de
>
> *From:* Peter Kim [mailto:chpkim at gmail.com]
> *Sent:* Monday, February 9, 2015 22:38
> *To:* Sam Blackshear
> *Cc:* Steven Arzt; soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> Hi Sam,
>
>
>
> Yes, I changed the code, re-ran Soot, and Soot still doesn't report the
> edges.
>
>
>
>
>
> Hi Steve,
>
>
>
> Note that when I change "free()" to a static method, the edge is reported,
> but when it is an instance method, it is not reported. In light of the
> discussion with Sam, I want to make it absolutely clear that the code runs
> fine even when it is an instance method, so in my view, it seems to be a
> bug or perhaps Infoflow is constructing a call graph that is different from
> the traditional call graph, but since you told me that you changed it back
> to return a traditional call graph, I think it deserves an investigation.
>
>
>
> Thanks.
>
>
>
> On Mon, Feb 9, 2015 at 9:31 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Call graph construction is typically flow-insensitive, so it is not
> precise enough to do the kind of reasoning you are doing in your head (i.e.,
> for "objects.remove()" to be included in the call graph, "obj" cannot be
> null). If you are not familiar with the flow-insensitive call graph
> construction algorithms used by tools like Soot, this
> <http://manu.sridharan.net/files/aliasAnalysisChapter.pdf> is a good
> place to start.
>
>
>
> Now, if you changed the code in the way that you described (adding one or
> more objects to the objects list), re-ran Soot, and Soot still does not
> report the edges, that is a problem for Soot (and thus goes beyond my
> expertise :)). But for the example program you posted, Soot's result is as
> expected for a flow-insensitive call graph construction algorithm.
>
>
>
> - Sam
>
>
>
> On Mon, Feb 9, 2015 at 2:22 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Sam,
>
>
>
> The code snippet is the following:
>
>
>
> BaseTweet<?> obj = objects.get(i);
> if (obj.isFinished() && obj.isAutoRemoveEnabled) {
>   objects.remove(i);
>   obj.free();
> }
>
>
>
> For "objects.remove()" to be included in the call graph, "obj" cannot be
> null. So even without any object in the list, if "objects.remove()" is
> included, then "obj.free()" should be included as well.
>
>
>
> Just to be absolutely sure though, I just ran the app with one object in
> the list and made sure that the true branch is executed. Soot still returns
> only "remove()" in the call graph. I also made sure that "free()" prints
> output (meaning it shouldn't be excluded from the call graph).
>
>
>
>
>
> On Mon, Feb 9, 2015 at 9:08 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Peter,
>
>   What you observe is consistent with what I explained. The
> objects.remove() method is included because objects is initialized to a
> non-null ArrayList. However, obj.free() is not included because the
> analysis is smart enough to determine that there is no possible concrete
> execution in which the method obj.free() will be called (the statement
> obj.free() will throw an exception *if* it ever executes, because obj will
> always be null).
>
>
>
> - Sam
>
>
>
> On Mon, Feb 9, 2015 at 1:54 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Sam,
>
>
>
> The loop iteration is executed only if there is an item in the list, so it
> shouldn't matter if no object has been added to the list. The code runs
> without exception. The problem is that of two call graph edges that are
> part of the same basic block (remove() and free()), only one call graph
> edge (remove()) is being returned, which is strange.
>
>
>
> On Mon, Feb 9, 2015 at 8:19 PM, Sam Blackshear <
> samuel.blackshear at colorado.edu> wrote:
>
> Hi Peter,
>
>   My suspicion is that the callgraph is correct here. You never add
> anything to the objects ArrayList, so whenever you try to read a BaseTweet
> object out of the list, the analysis (correctly) concludes that only null
> could be returned. If you call a method on null (like isFinished), the call
> graph (correctly) concludes that this would result in an NPE and thus does
> not add the edge. If you want to see these edges in the callgraph, extend
> your code to add something to the objects ArrayList:
>
>
>
> objects.add(new BaseTweet());
>
>
>
> When debugging your static analysis results, it's often helpful to
> concretely execute your target program and be sure that it behaves as you
> expect!
>
>
>
> - Sam
>
>
>
>
>
> On Mon, Feb 9, 2015 at 12:40 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Steven,
>
>
>
> Here is a complete minimal example as an Eclipse project (just import into
> your workspace):
> https://drive.google.com/file/d/0B9KLXcAovVUHa0FuN3gzRGJETmc/view
>
>
>
> I retrieve the call graph of this app at Infoflow.runAnalysis(final
> ISourceSinkManager sourcesSinks, final Set<String> additionalSeeds),
> calling "CallGraph cg = Scene.v().getCallGraph();" right before "iCfg =
> icfgFactory.buildBiDirICFG(callgraphAlgorithm);". I use cg, not iCfg.
>
>
>
> The edges out of com.example.toyandroid.ChpkimMainActivity.chpkimUpdate()
> I get are:
>
>
>
> <java.util.ArrayList: int size()>
>
> <java.util.ArrayList: java.lang.Object get(int)>
>
> <java.util.ArrayList: java.lang.Object remove(int)>
>
>
>
> But they should be:
>
>
>
> <java.util.ArrayList: int size()>
>
> <java.util.ArrayList: java.lang.Object get(int)>
>
> <java.util.ArrayList: java.lang.Object remove(int)>
>
> <com.example.toyandroid.BaseTweet: boolean isFinished()>
>
> <com.example.toyandroid.BaseTweet: void free()>
>
> <com.example.toyandroid.BaseTweet: void update(float)>
>
>
>
> Thanks for your help.
>
> On Mon, Feb 9, 2015 at 8:40 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> Can you please send me a more complete minimal example with which I can
> reproduce the issue?
>
>
>
> Best regards,
>
>   Steven
>
>
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On Behalf Of* Peter Kim
> *Sent:* Sunday, February 8, 2015 19:05
> *To:* Steven Arzt
> *Cc:* soot-list at cs.mcgill.ca
> *Subject:* Re: [Soot-list] Missing call graph edges
>
>
>
> eliminateDeadCode() is *not* being called and I'm still running into the
> problem. Thanks in advance for your help.
>
>
>
> On Sun, Feb 8, 2015 at 5:37 PM, Peter Kim <chpkim at gmail.com> wrote:
>
> Hi Steven,
>
>
>
> I'm still running into the same problem after pulling from Github.
>
>
>
>
>
> On Fri, Feb 6, 2015 at 9:24 AM, Steven Arzt <Steven.Arzt at cased.de> wrote:
>
> Hi Peter,
>
>
>
> That might have to do with an optimization I added recently. In short,
> FlowDroid removes those callgraph edges for which it can easily decide that
> having them does not influence the outcome of the taint analysis. I can
> however fully understand that this might lead to surprising results if you
> are using the FlowDroid components for other analyses, so I decided to make
> this optimization optional and turn it off by default.
>
>
>
> The new code is on Github and a new nightly build will be available
> tomorrow.
>
>
>
> Best regards,
>
>   Steven
>
> *From:* soot-list-bounces at CS.McGill.CA [mailto:
> soot-list-bounces at CS.McGill.CA] *On Behalf Of* Peter Kim
> *Sent:* Friday, February 6, 2015 00:05
> *To:* soot-list at cs.mcgill.ca
> *Subject:* [Soot-list] Missing call graph edges
>
>
>
> Hi,
>
>
>
> I'm extending FlowDroid to construct an Android app's call graph. More
> specifically, I get the call graph by modifying Infoflow.runAnalysis(final
> ISourceSinkManager sourcesSinks, final Set<String> additionalSeeds) to call
> Scene.v().getCallGraph(). The call graph is missing edges in an odd way -
> for a function, the graph has some outgoing edges but is missing ones that
> should be there. Namely, given the following function (shown in Java rather
> than jimple for readability), the called methods should be "get()",
> "isFinished()", "remove()", "free()", "size()", "update()", but I'm only
> getting "get()", "size()", and "remove()". I don't understand why
> "remove()" is included but "free()" is not since they are in the same basic
> block. I'm using soot.jimple.toolkits.callgraph.TransitiveTargets to
> analyze the call graph.
>
>
>
> public void update(float x) {
>   for (...size()..) {
>     get();
>     if (isFinished()) {
>       remove();
>       free();
>     }
>   }
>
>   if (y) {
>     if (x) {
>       for (... size()...)  get().update(x);
>     } else {
>       for (...size()...)  get().update(x);
>     }
>   }
> }
>
>
>
> Thank you for your help.