[Soot-list] PathReconstructionMode vs PathBuildingAlgorithm

Arzt, Steven steven.arzt at sit.fraunhofer.de
Wed Apr 11 15:09:54 EDT 2018


Hi Miguel,

The path reconstructor begins at the sink and then follows the chain of predecessors towards the source. Each abstraction stores a pointer to its predecessor in the taint graph. The method you mentioned checks whether this traversal has already reached a source. A source is an abstraction that does not have a predecessor. So for the majority of taint abstractions (i.e., all those that are not at a source), the "then" branch is the correct branch to take.
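
To illustrate the traversal described above, here is a minimal sketch. It is not FlowDroid's actual code: the Node and PredecessorWalk classes and their methods are hypothetical stand-ins for the real abstraction and path builder classes, and the sketch assumes exactly one predecessor per abstraction, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a taint abstraction (not FlowDroid's Abstraction class).
final class Node {
    final String statement;
    final Node predecessor; // null means this abstraction is at a source

    Node(String statement, Node predecessor) {
        this.statement = statement;
        this.predecessor = predecessor;
    }
}

class PredecessorWalk {
    // Mirrors the idea of the check discussed in this thread: an abstraction
    // is at a source only if it has no predecessor.
    static boolean isAtSource(Node abs) {
        return abs.predecessor == null;
    }

    // Starts at the sink, follows predecessor pointers until a source is
    // reached, and returns the statements in source-to-sink order.
    static List<String> walkToSource(Node sink) {
        List<String> path = new ArrayList<>();
        Node cur = sink;
        path.add(cur.statement);
        while (!isAtSource(cur)) { // true for most abstractions on the path
            cur = cur.predecessor;
            path.add(cur.statement);
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        Node source = new Node("x = source()", null);
        Node copy = new Node("y = x", source);
        Node sink = new Node("sink(y)", copy);
        System.out.println(walkToSource(sink));
    }
}
```

In this sketch, only the last abstraction visited has no predecessor, which is why the "has a predecessor" branch is taken for almost every abstraction on a path.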

The main question now is as follows: Why does the traversal never reach an abstraction at a source, i.e., an abstraction without a predecessor? This could be caused by a cutoff, or it could be a plain old threading issue. Do you have a minimal example that I can look at? It's crucial that the example is as small as possible. Otherwise, it's really hard to see anything here because of the large number of paths to inspect during traversal.
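
As a rough illustration of how a cutoff could produce this symptom, consider the sketch below. It is an assumption-laden toy, not FlowDroid's implementation: the Abs and CutoffWalk classes and the maxSteps parameter are hypothetical names invented for this example. If the traversal gives up after following a fixed number of predecessor links, it never sees an abstraction without a predecessor, and nothing is reported for that sink.

```java
import java.util.Optional;

// Hypothetical stand-in for a taint abstraction (not FlowDroid's Abstraction class).
final class Abs {
    final String statement;
    final Abs predecessor; // null means this abstraction is at a source

    Abs(String statement, Abs predecessor) {
        this.statement = statement;
        this.predecessor = predecessor;
    }
}

class CutoffWalk {
    // Follows at most maxSteps predecessor links (hypothetical cutoff).
    // Returns the source statement, or empty if the cutoff is hit first.
    static Optional<String> findSource(Abs sink, int maxSteps) {
        Abs cur = sink;
        int followed = 0;
        while (cur.predecessor != null) {
            if (followed == maxSteps) {
                return Optional.empty(); // cutoff hit: path is abandoned
            }
            cur = cur.predecessor;
            followed++;
        }
        return Optional.of(cur.statement); // no predecessor: this is a source
    }

    public static void main(String[] args) {
        Abs source = new Abs("x = source()", null);
        Abs copy = new Abs("y = x", source);
        Abs sink = new Abs("sink(y)", copy);
        System.out.println(findSource(sink, 2)); // enough steps to reach the source
        System.out.println(findSource(sink, 1)); // cutoff hit before the source
    }
}
```

With the smaller limit, every abstraction inspected still has a predecessor, which matches the behavior observed in the debugger. A threading issue would need a different kind of investigation.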

Best regards,
  Steven

-----Original Message-----
From: Miguel Velez <mvelezce at cs.cmu.edu> 
Sent: Wednesday, April 11, 2018 8:53 PM
To: Arzt, Steven <steven.arzt at sit.fraunhofer.de>; soot-list at cs.mcgill.ca
Subject: Re: [Soot-list] PathReconstructionMode vs PathBuildingAlgorithm

I have been debugging the path reconstruction and I think the problem lies in the following method (I checked both the ContextSensitivePathBuilder and the ContextInsensitivePathBuilder):

private boolean checkForSource(Abstraction abs, SourceContextAndPath scap) {
    if (abs.getPredecessor() != null)
        return false;
    ....
}

I see that source and sink pairs are added to the final result if the "then" branch is not executed. However, for the sinks that are missing from the final results, I see that every time this method gets executed, it executes the "then" branch and no result is added (i.e., none of the abstractions corresponds to a source). Do you know why a sink that was reached during propagation would not have an abstraction for the source? Does it have to do with the cutoffs during path reconstruction?

Thanks,

Regards,

Miguel Velez
On 4/11/18 1:32 PM, Arzt, Steven wrote:
> Hi Miguel,
>
> The TaintPropagationResults class is an internal data structure that stores taint abstractions that arrive at sink statements. This map serves as an input to the path reconstruction algorithm. While the taint propagation is running, this data structure is expected to change whenever a new taint arrives at a sink. Note that FlowDroid propagates taints point-wise, so this happens gradually.
>
> The InfoflowResults object contains the final result, i.e., the result generated by the path reconstruction algorithm. Depending on the algorithm you chose, this result might be imprecise. Since some taint graphs can be very large and/or complex, FlowDroid uses various cutoffs during path reconstruction. If such a cutoff is hit, the corresponding path is no longer followed, and consequently, no such source-to-sink connection can be reported. If you are missing flows, and you are very sure that abstractions originally arrived at the respective sinks, this might be a reason. Of course, there might also be a bug in the path reconstructor.
>
> You can try to set a breakpoint in the path reconstructor and check the sinks there. If you do not find a source for one of your sinks, you can have a look at the respective taint abstraction and try to find out why the path reconstruction fails. Setting breakpoints at SourceContextAndPath.extendPath() might also help to see whether a cutoff is triggered.
>
> Best regards,
>    Steven
>
>
> -----Original Message-----
> From: Miguel Velez <mvelezce at cs.cmu.edu>
> Sent: Wednesday, April 11, 2018 6:50 PM
> To: Arzt, Steven <steven.arzt at sit.fraunhofer.de>; soot-list at cs.mcgill.ca
> Subject: Re: [Soot-list] PathReconstructionMode vs PathBuildingAlgorithm
>
> Hi Steven,
>
> The explanation helped a lot. Thanks!
>
> However, I am running into problems when obtaining the results of which sources flow into which sinks. For instance, when comparing the map in the TaintPropagationResults class with the map in InfoflowResults, they do not match in the number of entries. Now, based on your explanation of how taint abstractions are propagated and paths are constructed, this should be expected since, as taints get propagated, new taint abstractions are created and they are also propagated through the program.
>
> However, if I just look at the unique sinks in both of these maps, they do not match. In some cases, there are sinks missing from the results in InfoflowResults. I believe that is a bug. Even weirder is that if I run the same program multiple times with the same settings and the same sources and sinks, I get different results (i.e., even more sinks are missing from the InfoflowResults map, or I get all of the correct sinks). I am not sure if this behavior is similar to the issue described here:
>
> https://github.com/secure-software-engineering/soot-infoflow-android/issues/31
>
> Thanks,
>
> Regards,
>
> Miguel Velez
> On 4/11/18 12:08 PM, Arzt, Steven wrote:
>> Hi Miguel,
>>
>> The two concepts are indeed related. There are different algorithms that can be used for building source-to-sink paths from a taint graph. FlowDroid always first propagates the taint abstractions from the sources through the interprocedural control flow graph, which yields a taint graph. It then needs to find paths through this graph to connect a taint abstraction at a sink with a taint abstraction at a source through a chain of predecessors in the taint graph. There are multiple approaches to this problem, modeled as different algorithms. Some algorithms, for example, are context-sensitive, while others are not. Some are faster than others, etc.
>>
>> On the other hand, there are also choices you can make regardless of the chosen algorithm. The details are complex, but in case you do not need completely precise paths, you can abstract away from a few things along the path. You will still get the correct source-to-sink connection, but you must be OK with losing a few statements on the path in exchange for better performance. That's what the modes are about.
>>
>> Unless you run into problems, I'd suggest leaving both settings alone and seeing whether what you get is already useful. If not, you most commonly want a different algorithm. Only under rare circumstances do you need a different mode. In fact, the modes were implemented because of a very specific problem one of my students encountered when using the tool for his thesis.
>>
>> Best regards,
>>     Steven
>>
>> -----Original Message-----
>> From: Soot-list <soot-list-bounces at cs.mcgill.ca> On Behalf Of Miguel Velez
>> Sent: Wednesday, April 11, 2018 2:42 PM
>> To: soot-list at cs.mcgill.ca
>> Subject: [Soot-list] PathReconstructionMode vs PathBuildingAlgorithm
>>
>> Hello,
>>
>> I do not understand the difference between PathReconstructionMode and PathBuildingAlgorithm. There has to be some difference between them, but "reconstructing" and "building" a path seem similar to me. Are they somehow related to one another? Do they interact in the results that we obtain from the analysis? Do the individual settings affect the results (i.e., would the settings change which sources flow into which sinks)?
>>
>> Thanks,
>>
>> --
>> Regards,
>>
>> Miguel Velez
>> _______________________________________________
>> Soot-list mailing list
>> Soot-list at CS.McGill.CA
>> https://mailman.CS.McGill.CA/mailman/listinfo/soot-list


