# rim.extreme-data-domain-separation
' created by Leonard Pauli, 15 oct 2019
# rim extreme data separation model
' A data domain separation model, for risk-free use of untrusted code
Background:
- Software systems can be modeled as a stateless black box, with state stored in a companion "memory box", and with input and output data streams. (TODO: cf. Mealy/Moore machines from digital design?)
- Large systems/boxes may be refactored into smaller boxes/systems connected to each other.
- Initially, all data streams may be seen as arbitrary, and thus, without validation, might lead to "unwanted behaviour".
- When refactored into small enough boxes, the wanted behaviour of a box may be precisely described (eg. a specific sorting function).
- To ensure wanted behaviour, the input data has to be constrained to the expected type.
- This may be done by adding an intermediate box before the input to the main box, ensuring that only expected data gets in (see the validator sketch after this list).
- Creating a graph/diagram of the boxes (nodes) and their respective data flows (edges) allows for, eg., data flow analysis.
- (Using a declarative (vs imperative) description of a system might make this box refactoring and graph creation easier.)
- Each external (to the system as a whole) input stream may have different classification domains (eg. sensitive password data from one app should not be readable from another by default)
- Same for each external output stream (eg. monitor, memory, network)
- Using the data flow graph, paths from one external input domain to one external output domain may be found
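A minimal sketch of such an intermediate validator box, in TypeScript (the names and the single-function "box" are hypothetical simplifications): only input that matches the expected type ever reaches the untrusted logic.

```typescript
// Hypothetical sketch of an "intermediate validator box": the untrusted
// box only ever sees input matching its expected type.

// The untrusted box claims to sort numbers; we don't trust its internals.
type UntrustedSort = (xs: number[]) => number[];

// The validator constrains the arbitrary input stream to the expected type
// before it reaches the untrusted box.
function validated(sort: UntrustedSort): (input: unknown) => number[] {
  return (input: unknown) => {
    const ok =
      Array.isArray(input) &&
      input.every((x) => typeof x === "number" && Number.isFinite(x));
    if (!ok) throw new Error("validator: input is not a finite number array");
    return sort(input.slice()); // pass a copy, so the box can't mutate the caller's data
  };
}

// usage
const sortBox = validated((xs) => xs.sort((a, b) => a - b));
console.log(sortBox([3, 1, 2])); // [1, 2, 3]
// sortBox("not an array") would throw before reaching the untrusted code
```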
---
Theoretically, assuming:
- each box is its own air-gapped system
- each box's input/output streams go through the intermediate (also air-gapped) validator box (eg. with a one-directional optical connection)
- the validator box works as wanted (eg. a formally verified FPGA implementation (thus always taking the same time, with a constrained value set), shielded from external effects)
- (data security: if processing an input takes a different time/size, that meta information may be partly used to deduce information)
- the actual boxes may contain untrusted logic.
Then:
- Before integrating a box in a larger system, the relevant subsequent subgraph of data flow may be derived
- To allow the box to be integrated, the paths to relevant external input/output domains must be permitted
  eg. memory with sensitive password (input domain)
  -> password manager
  -> to (output domain) screen memory
  -> (then as input domain) to the box
     (marketed as a keyboard-shortcut-based automation tool, with the ability to eg. move the cursor to a button on the screen (using image recognition) and press it on shortcut press)
  -> to (output domain) mouse/keyboard
  -> (then as input domain) to movie application
  -> to (output domain) memory
  -> (then as input domain) to movie-uploader application
  -> to (output domain) internet
- In this example, the immediate external domains are screen memory -> mouse/keyboard, which might not seem dangerous at first glance, though analysis of the full system shows that the box may be used to transmit a sensitive password to a malicious receiver (memory with sensitive password -> ... -> internet); see the path-finding sketch after this list
- This may be solved by splitting the screen memory into domain-specific screen memories (layers) + a compositing box
- Then, adding another compositing box that excludes paths from the "memory with sensitive password" input domain allows us to still use the untrusted box without risk of leaking the password (the password would then show up on the screen, but not be visible (eg. a fixed-width/password-padded rectangle) to the automation software/box); see the compositing sketch further below
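A minimal sketch of this path analysis, in TypeScript (node names taken from the example above; the graph representation and API are hypothetical simplifications):

```typescript
// Hypothetical sketch: boxes and external domains as graph nodes,
// data flows as directed edges; find a path from a sensitive external
// input domain to an external output domain.

type Node = string;
const edges: [Node, Node][] = [
  ["memory:password", "password-manager"],
  ["password-manager", "screen-memory"],
  ["screen-memory", "automation-tool"], // the untrusted box reads the screen
  ["automation-tool", "mouse/keyboard"],
  ["mouse/keyboard", "movie-app"],
  ["movie-app", "memory"],
  ["memory", "movie-uploader"],
  ["movie-uploader", "internet"],
];

// Depth-first search for one path from `from` to `to`.
function findPath(from: Node, to: Node, seen = new Set<Node>()): Node[] | null {
  if (from === to) return [to];
  seen.add(from);
  for (const [a, b] of edges) {
    if (a !== from || seen.has(b)) continue;
    const rest = findPath(b, to, seen);
    if (rest) return [from, ...rest];
  }
  return null;
}

// The automation tool's immediate domains (screen memory -> mouse/keyboard)
// look harmless; the full-system analysis exposes the leak:
console.log(findPath("memory:password", "internet"));
// -> ["memory:password", "password-manager", "screen-memory",
//     "automation-tool", "mouse/keyboard", "movie-app", "memory",
//     "movie-uploader", "internet"]
```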
Thus:
- allowing us to use untrusted code and input without any risk of leaking classified data, except when explicitly permitted, and without limiting end-user functionality
---
In practice:
- Such a solution would be expensive hardware-wise
- There might be bugs in the validators
- Human developers might be tempted to give an untrusted module higher permissions than necessary
  - Might be mostly mitigated by a clear permission user interface, permission reviews, and...
  - ...dynamic domain-specific path generation for (especially external) input/output streams
    eg. in the example above, using a compositing box before the display output will:
    - allow all modules to read/write to all external inputs/outputs...
    - ...including data from domains they have permission to
    - ...without leaking data from domains they don't
    - ...though possibly leaking that there is data it does not have access to
      (eg. the automation box in the example above might detect the gray box)
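A minimal sketch of the domain-layer compositing idea, in TypeScript (all names hypothetical; strings stand in for pixel buffers):

```typescript
// Hypothetical sketch: screen memory split into per-domain layers,
// plus a compositing box that filters by the reader's permissions.

type Domain = string;
interface Layer { domain: Domain; content: string; } // content stands in for pixel data

// Compose a view for a reader: layers from unpermitted domains are replaced
// by a fixed placeholder rectangle, so the reader may notice *that* something
// is there (the "gray box"), but never its content.
function compositeFor(permitted: Set<Domain>, layers: Layer[]): string[] {
  return layers.map((l) =>
    permitted.has(l.domain) ? l.content : "[redacted rectangle]",
  );
}

const layers: Layer[] = [
  { domain: "movie-app", content: "movie frame" },
  { domain: "memory:password", content: "hunter2" },
];

// The trusted display compositor sees everything; the automation box does not.
console.log(compositeFor(new Set(["movie-app", "memory:password"]), layers));
// -> ["movie frame", "hunter2"]
console.log(compositeFor(new Set(["movie-app"]), layers));
// -> ["movie frame", "[redacted rectangle]"]
```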
On common hardware:
- there is no air-gap separation model
  -> different levels of untrusted logic must share hardware
- the hardware cannot be fully trusted
  - no formal proofs widely available
    - in many cases, not even of the hardware architecture
    - though RISC-V looks promising (an open instruction set architecture)
  - new hardware bugs usable for software attacks and/or data leaks are discovered continuously
    (eg. memory side-channel attacks, timing-based attacks, etc)
- possibly mitigated by transforming untrusted logic to only use a limited instruction set in a limited way (eg. with bounds checks); see the interpreter sketch below
  - such that, based on some assumptions about the hardware, a formal proof could be made
  - showing that, on hardware where the assumptions hold, and assuming the corresponding "software validator box" works, the behaviour of the instructions will be equivalent to the theoretical extreme air-gapped system
  -> thus reducing the attack surface to the hardware assumptions and humans giving higher permissions than necessary
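A minimal sketch of such a transformation target, in TypeScript (the three-instruction set is invented for illustration, in the spirit of software fault isolation): every memory access is bounds-checked, so, assuming the host faithfully executes the interpreter, the untrusted program can only touch its own sandboxed memory.

```typescript
// Hypothetical sketch: untrusted logic expressed in a tiny restricted
// instruction set; every memory access is bounds-checked before use.
// (Register-index validation is omitted for brevity.)

type Instr =
  | { op: "load"; reg: number; addr: number }   // reg <- mem[addr]
  | { op: "store"; reg: number; addr: number }  // mem[addr] <- reg
  | { op: "addi"; reg: number; value: number }; // reg <- reg + value

function run(program: Instr[], memSize = 16): number[] {
  const mem: number[] = new Array(memSize).fill(0);
  const regs = [0, 0, 0, 0];
  const check = (addr: number): number => {
    if (!Number.isInteger(addr) || addr < 0 || addr >= memSize)
      throw new Error(`bounds check failed: ${addr}`);
    return addr;
  };
  for (const i of program) {
    switch (i.op) {
      case "load": regs[i.reg] = mem[check(i.addr)]; break;
      case "store": mem[check(i.addr)] = regs[i.reg]; break;
      case "addi": regs[i.reg] += i.value; break;
    }
  }
  return mem;
}

// usage: writes 7 to cell 3; a store to addr 99 would be rejected
console.log(run([
  { op: "addi", reg: 0, value: 7 },
  { op: "store", reg: 0, addr: 3 },
]));
```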
My goal is to build a declarative language + execution environment making it possible to create systems with this data flow analysis, dynamic domain splitting, and logic-to-instructions generation.
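Purely as a hypothetical illustration (no such language exists yet), such a declarative system description might start out as plain data, from which the data flow graph, domain splits, and permission prompts are derived:

```typescript
// Purely hypothetical: a declarative description of the example system
// as plain data; all field names are invented for illustration.
const system = {
  boxes: {
    "password-manager": { trusted: true, reads: ["memory:password"], writes: ["screen-memory"] },
    "automation-tool": { trusted: false, reads: ["screen-memory"], writes: ["mouse/keyboard"] },
  },
  permissions: {
    // explicit, user-reviewed domain exclusions per untrusted box
    "automation-tool": { deniedInputDomains: ["memory:password"] },
  },
};
```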