FlorianHeigl · June 29, 2017 21:14
diff --git a/testing layers b/testing layers
 --------------------------------------
 ||| Test return to stable state
 --------------------------------------
 ||| External Fault injection tests
 --------------------------------------
 ||| System health
 --------------------------------------
             \========/
 Below:        \      /
   "insufficient tests for OS code
    (nice to have, no strong meaning)"
                 \/
 --------------------------------------
 || System testing
 --------------------------------------
 || Data tests / Fault injection tests
 --------------------------------------
             \======/
 Below:        \    /
   "can't trust this to run safely"
                \/
 --------------------------------------
 | Unit tests
 --------------------------------------
 | Coding standards
 --------------------------------------
 | "it compiles"
 --------------------------------------
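
One possible reading of the diagram, as a rough sketch: the layers are grouped into three tiers (the `|`, `||` and `|||` markers), and the two "Below:" funnels label what can be said about code that has not yet reached the tier above. The tier numbers, the function names and the "all layers covered" label are assumptions made for illustration, not part of the original notes.

```python
# Layers from bottom to top, with the tier implied by the number of '|' marks.
LAYERS = [
    ("it compiles", 1),
    ("Coding standards", 1),
    ("Unit tests", 1),
    ("Data tests / Fault injection tests", 2),
    ("System testing", 2),
    ("System health", 3),
    ("External Fault injection tests", 3),
    ("Test return to stable state", 3),
]


def highest_tier(passed):
    """Return the highest tier for which all layers up to that tier passed."""
    tier = 0
    for candidate in (1, 2, 3):
        if all(name in passed for name, t in LAYERS if t <= candidate):
            tier = candidate
        else:
            break
    return tier


def verdict(passed):
    """Map the set of passed layers to the label from the diagram's funnels."""
    tier = highest_tier(passed)
    if tier < 2:
        return "can't trust this to run safely"
    if tier < 3:
        return "insufficient tests for OS code"
    return "all layers covered"  # hypothetical label, not in the notes


tier1 = {name for name, t in LAYERS if t == 1}
tier1_and_2 = {name for name, t in LAYERS if t <= 2}
everything = {name for name, _ in LAYERS}
```

With this reading, code that only compiles, follows coding standards and has unit tests still lands in the "can't trust this to run safely" bucket.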

 Design and error handling (fix, continue, abort)
 How we act on an error has to match the critical code paths and the managed objects.
 Is the operation a single transaction? Not? Step by step?
 => do stop a SAN rescan on a single host if the first path doesn't come back
 => don't stop a core router reconfiguration on error if the rest of the config might fix the issue!
 What's the desired outcome?
 Is it possible to reach the desired outcome partially?
 => don't stop starting VMs because the first one failed!
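
A minimal sketch of the abort-versus-continue choice above, assuming each operation is a list of named steps and the error policy is picked per operation. `OnError`, `Result`, `run_steps` and the step names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class OnError(Enum):
    ABORT = "abort"        # stop at the first failure
    CONTINUE = "continue"  # keep going, a partial outcome is still useful


@dataclass
class Result:
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def run_steps(steps: List[Tuple[str, Callable[[], None]]],
              on_error: OnError) -> Result:
    """Run named steps in order, applying one error policy to the operation."""
    result = Result()
    aborted = False
    for name, action in steps:
        if aborted:
            result.skipped.append(name)
            continue
        try:
            action()
            result.succeeded.append(name)
        except Exception:
            result.failed.append(name)
            if on_error is OnError.ABORT:
                aborted = True
    return result


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("simulated failure")


# Starting VMs: continue past the first failure, the other VMs still matter.
vm_result = run_steps([("vm1", boom), ("vm2", ok), ("vm3", ok)],
                      OnError.CONTINUE)

# SAN rescan on one host: stop once the first path does not come back.
san_result = run_steps([("path0", boom), ("path1", ok)], OnError.ABORT)
```

Keeping the policy as an argument, rather than hard-coding it in the loop, forces the "what's the desired outcome?" question to be answered for every operation.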

 On error while adding supporting elements:
 abort what costs redundancy
 retry what could gain redundancy
 continue on error if it could limit the scope of an issue
 continue on error if it could solve the issue
 don't block on retrying a redundant component if the consumers are unavailable
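
The rules above could be written down as a small decision function. This is only a sketch: the field names, the precedence between the rules, the "retry_in_background" outcome and the default of aborting when no rule applies are all assumptions, since the notes do not say how the rules rank against each other.

```python
from dataclasses import dataclass


@dataclass
class ErrorContext:
    # Hypothetical flags, one per condition named in the notes.
    costs_redundancy: bool = False       # carrying on would reduce redundancy
    could_gain_redundancy: bool = False  # a retry could add redundancy back
    consumers_available: bool = True     # something can use the component
    could_limit_scope: bool = False      # carrying on could contain the issue
    could_solve_issue: bool = False      # carrying on could resolve the issue


def decide(ctx: ErrorContext) -> str:
    """Return 'abort', 'retry', 'retry_in_background' or 'continue'."""
    if ctx.costs_redundancy:
        return "abort"
    if ctx.could_gain_redundancy:
        # Don't block on the retry when no consumer is there to benefit.
        if ctx.consumers_available:
            return "retry"
        return "retry_in_background"
    if ctx.could_limit_scope or ctx.could_solve_issue:
        return "continue"
    return "abort"  # assumed default when no rule applies
```

For example, `decide(ErrorContext(could_gain_redundancy=True, consumers_available=False))` picks the non-blocking retry.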