Last active
February 28, 2019 22:36
-
-
Save dvanhorn/8909f5ad8d44a4d1a5eabe6848c1e6ae to your computer and use it in GitHub Desktop.
Feedback on Google proposal
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Specifying keep rules for your own code is hard, but the really | |
hard problem is specifying keep rules for the reflection in a | |
library that you did not write yourself. Understanding which keep | |
rules are necessary (and why) when you are not the author of the | |
code in question is highly non-trivial. We know that the majority | |
of Android applications are shipped without shrinking and | |
optimization, probably due to the difficulty in getting rules | |
right. To me, one of the most compelling possible outcomes of this | |
research would be a tool that does a safe over-approximation of | |
the keep rules necessary so that users have an easy way to get | |
started using Proguard or R8. Of course that safe over-approximation | |
should be as tight as possible and not just be the trivial keep | |
everything (!) over-approximation. Since the optimal keep rules | |
for a program is an interesting problem to solve it is also | |
undecidable (as with any interesting static analysis), therefore, | |
improving the approximation could become a year long interesting | |
research topic. For the evaluation, it will be important to find | |
real-life examples of keep rules for actual shipping products that | |
look reasonable. Keep rules have a bunch of modifiers and the | |
specification language is fairly rich. Therefore, finding reasonable | |
benchmarks for the evaluation could be non-trivial (how do you | |
evaluate how good a set of baseline keep rules are?) and will be | |
important to get a realistic picture of the results. Although R8 | |
is generating an optimized _dex_ code for now, it aims to be a | |
general Java bytecode optimizer, and soon generates an optimized | |
Java bytecode too. That is, inside R8 (and Proguard too I believe), | |
we're not using any sorts of modeling or assumptions regarding | |
Android. Moreover, modeling Android framework and/or runtime | |
itself is a huge, challenging research topic that has been tried | |
but never finished nor achieved in a production level. Thus, I | |
would recommend PI to narrow down the scope to Java reflections | |
only. - I like the clear statement of a fairly narrow problem that | |
will have impact for existing applications. I think the authors did | |
very well to include examples of where a failure of entry point | |
inference has caused real, publicly visible bugs. - Though I quite | |
like the problem presented, I felt there was a significant jump | |
between the problem as stated and the idea to solve it using symbolic | |
execution. There are two specific types of clarifications I would | |
have liked to see in the proposal. a. It seems like a lot of | |
conceptual work to have symbolic execution work for event-driven | |
programs. Given that this hasn't been done before, and requires | |
modelling the event queue, and several other semantic details | |
related to event handlers, I feel it is not a small task. That's | |
just the conceptual aspect. The Android-specific implementation | |
will add more difficulties. I looked at Reference [2] which is | |
recent work on inference of libraries for symbolic execution and | |
I got some idea of how that might work. Would the idea be to | |
reuse an existing implementation to model the entire Android | |
event handling system? b. I wonder why symbolic execution is | |
the approach to solve this problem. I am missing an intuition | |
as to why it makes sense. For providing sound input to optimization | |
systems, I would expect something more like a whole program analysis. | |
Symbolic execution is not guaranteed to cover all behaviours and | |
unless the authors plan to include some other procedure for invariant | |
inference that will either be heavyweight (quantifier elimination or | |
interpolation) or heuristic, in which case incomplete, this approach | |
will produce false negatives and not actually solve the main problem | |
being addressed. c. I'm somewhat skeptical that the main challenge | |
for having this work _in practice_ is a rich theory for reasoning | |
about strings. I think it would be useful to distinguish between | |
idiomatic, benign uses of reflection and adversarial ones, where | |
reflection is used for obfuscation. In the idiomatic cases, I | |
wonder if it would be enough to use value-propagation in some | |
distributive analysis framework. For the adversarial cases, I | |
can imagine that string reasoning is required. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment