dvanhorn · February 28, 2019 22:36
diff --git a/feedback.txt b/feedback.txt
 Specifying keep rules for your own code is hard, but the really
 hard problem is specifying keep rules for the reflection in a 
 library that you did not write yourself. Understanding which keep
 rules are necessary (and why) when you are not the author of the
 code in question is highly non-trivial. We know that the majority
 of Android applications are shipped without shrinking and 
 optimization, probably due to the difficulty in getting rules 
 right. To me, one of the most compelling possible outcomes of this
 research would be a tool that does a safe over-approximation of 
 the keep rules necessary so that users have an easy way to get 
 started using Proguard or R8. Of course that safe over-approximation 
 should be as tight as possible and not just be the trivial keep 
 everything (!) over-approximation. Since the optimal keep rules 
 for a program is an interesting problem to solve it is also 
 undecidable (as with any interesting static analysis), therefore, 
 improving the approximation could become a year long interesting 
 research topic. For the evaluation, it will be important to find 
 real-life examples of keep rules for actual shipping products that
 look reasonable. Keep rules have a bunch of modifiers and the 
 specification language is fairly rich. Therefore, finding reasonable 
 benchmarks for the evaluation could be non-trivial (how do you 
 evaluate how good a set of baseline keep rules are?) and will be 
 important to get a realistic picture of the results. Although R8 
 is generating an optimized _dex_ code for now, it aims to be a 
 general Java bytecode optimizer, and soon generates an optimized 
 Java bytecode too. That is, inside R8 (and Proguard too I believe), 
 we're not using any sorts of modeling or assumptions regarding 
 Android. Moreover, modeling Android framework and/or runtime 
 itself is a huge, challenging research topic that has been tried 
 but never finished nor achieved in a production level. Thus, I 
 would recommend PI to narrow down the scope to Java reflections 
 only. - I like the clear statement of a fairly narrow problem that
 will have impact for existing applications. I think the authors did 
 very well to include examples of where a failure of entry point 
 inference has caused real, publicly visible bugs. - Though I quite 
 like the problem presented, I felt there was a significant jump 
 between the problem as stated and the idea to solve it using symbolic 
 execution. There are two specific types of clarifications I would 
 have liked to see in the proposal. a. It seems like a lot of 
 conceptual work to have symbolic execution work for event-driven 
 programs. Given that this hasn't been done before, and requires 
 modelling the event queue, and several other semantic details 
 related to event handlers, I feel it is not a small task. That's 
 just the conceptual aspect. The Android-specific implementation 
 will add more difficulties. I looked at Reference [2] which is 
 recent work on inference of libraries for symbolic execution and 
 I got some idea of how that might work. Would the idea be to 
 reuse an existing implementation to model the entire Android 
 event handling system? b. I wonder why symbolic execution is 
 the approach to solve this problem. I am missing an intuition 
 as to why it makes sense. For providing sound input to optimization 
 systems, I would expect something more like a whole program analysis. 
 Symbolic execution is not guaranteed to cover all behaviours and 
 unless the authors plan to include some other procedure for invariant 
 inference that will either be heavyweight (quantifier elimination or 
 interpolation) or heuristic, in which case incomplete, this approach 
 will produce false negatives and not actually solve the main problem 
 being addressed. c. I'm somewhat skeptical that the main challenge 
 for having this work _in practice_ is a rich theory for reasoning 
 about strings. I think it would be useful to distinguish between 
 idiomatic, benign uses of reflection and adversarial ones, where 
 reflection is used for obfuscation. In the idiomatic cases, I 
 wonder if it would be enough to use value-propagation in some 
 distributive analysis framework. For the adversarial cases, I 
 can imagine that string reasoning is required.
	Specifying keep rules for your own code is hard, but the really
	hard problem is specifying keep rules for the reflection in a
	library that you did not write yourself. Understanding which keep
	rules are necessary (and why) when you are not the author of the
	code in question is highly non-trivial. We know that the majority
	of Android applications are shipped without shrinking and
	optimization, probably due to the difficulty in getting rules
	right. To me, one of the most compelling possible outcomes of this
	research would be a tool that does a safe over-approximation of
	the keep rules necessary so that users have an easy way to get
	started using Proguard or R8. Of course that safe over-approximation
	should be as tight as possible and not just be the trivial keep
	everything (!) over-approximation. Since the optimal keep rules
	for a program is an interesting problem to solve it is also
	undecidable (as with any interesting static analysis), therefore,
	improving the approximation could become a year long interesting
	research topic. For the evaluation, it will be important to find
	real-life examples of keep rules for actual shipping products that
	look reasonable. Keep rules have a bunch of modifiers and the
	specification language is fairly rich. Therefore, finding reasonable
	benchmarks for the evaluation could be non-trivial (how do you
	evaluate how good a set of baseline keep rules are?) and will be
	important to get a realistic picture of the results. Although R8
	is generating an optimized _dex_ code for now, it aims to be a
	general Java bytecode optimizer, and soon generates an optimized
	Java bytecode too. That is, inside R8 (and Proguard too I believe),
	we're not using any sorts of modeling or assumptions regarding
	Android. Moreover, modeling Android framework and/or runtime
	itself is a huge, challenging research topic that has been tried
	but never finished nor achieved in a production level. Thus, I
	would recommend PI to narrow down the scope to Java reflections
	only. - I like the clear statement of a fairly narrow problem that
	will have impact for existing applications. I think the authors did
	very well to include examples of where a failure of entry point
	inference has caused real, publicly visible bugs. - Though I quite
	like the problem presented, I felt there was a significant jump
	between the problem as stated and the idea to solve it using symbolic
	execution. There are two specific types of clarifications I would
	have liked to see in the proposal. a. It seems like a lot of
	conceptual work to have symbolic execution work for event-driven
	programs. Given that this hasn't been done before, and requires
	modelling the event queue, and several other semantic details
	related to event handlers, I feel it is not a small task. That's
	just the conceptual aspect. The Android-specific implementation
	will add more difficulties. I looked at Reference [2] which is
	recent work on inference of libraries for symbolic execution and
	I got some idea of how that might work. Would the idea be to
	reuse an existing implementation to model the entire Android
	event handling system? b. I wonder why symbolic execution is
	the approach to solve this problem. I am missing an intuition
	as to why it makes sense. For providing sound input to optimization
	systems, I would expect something more like a whole program analysis.
	Symbolic execution is not guaranteed to cover all behaviours and
	unless the authors plan to include some other procedure for invariant
	inference that will either be heavyweight (quantifier elimination or
	interpolation) or heuristic, in which case incomplete, this approach
	will produce false negatives and not actually solve the main problem
	being addressed. c. I'm somewhat skeptical that the main challenge
	for having this work _in practice_ is a rich theory for reasoning
	about strings. I think it would be useful to distinguish between
	idiomatic, benign uses of reflection and adversarial ones, where
	reflection is used for obfuscation. In the idiomatic cases, I
	wonder if it would be enough to use value-propagation in some
	distributive analysis framework. For the adversarial cases, I
	can imagine that string reasoning is required.