PJK · October 2, 2019 12:44
diff --git a/_readme.md b/_readme.md
diff --git a/questions.tex b/questions.tex
 \documentclass[a4paper,10pt,titlepage]{article} \usepackage[utf8]{inputenc}
 \usepackage{a4wide}
 \usepackage[small,compact]{titlesec}

 \usepackage{graphicx}
 \usepackage{amsthm}
 \usepackage{amsmath}
 \usepackage{amsfonts}
 \usepackage{amssymb}
 \usepackage{csquotes}
 \usepackage{hyperref}
 \usepackage{pdfpages}

 \newcommand{\task}[1]{ \framebox{#1}}
 \newcommand{\q}[0]{ \textbf{??}  }
 \usepackage[inline]{enumitem}
 \newcommand{\s}[1]{\includegraphics[page=#1,width=\textwidth]{slides}}

 \begin{document} \pagestyle{empty}	
 \section*{Syscon fall 15 questions}
 Based on \url{http://lec.inf.ethz.ch/syscon/2015/}, \url{http://www-oldurls.inf.ethz.ch/personal/felixf/WorkshopTajpeh/SystemsOnChipWorkshopFriedrich.pdf}
 \url{http://e-collection.library.ethz.ch/eserv/eth:26082/eth-26082-02.pdf}
 \begin{enumerate}
 \item Some Properties of the ARM instruction set? What is Thumb? Why is it implemented? Can Thumb and ARM work together? How?

 32b RISC, 32 or mixed 16/32 instruction encoding in Thumb, register-register style. Biendian arithmetics, little-e by default. Supports modes (IRQ, user, privileged, etc). Load-store architecture (separate instructions for memory access (unlike e.g. x86)), generally 3 operands. Index optimized instructions, e.g.  \href{http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/BABCAEDD.html}{STMDB - store multiple register to address (starting from the operand), decreasing before or after each access, and optionally write the final address back to register}. PC-relative addressing (provide offset -- useful for e.g. constants). Link register: register dedicated to storing return address after subroutine. Coprocessor access instructions.

 Thumb and Thumb2 IS: 16b IS with 2 operands, instructions are less powerful, goal of increasing code density (decreased memory cost per `functionality'). Strict subset of ARM IS. Implemented using a dedicated decoder that expands the instructions to the 32b equivalents (code can be mixed). The CPU can be switched to and from \emph{Thumb state}.


 %============================================================ %

 \item Why does a procedure that shall be used for an interrupt have to be flagged as interrupt procedure? On Minos / Arm an additional parameter has to be supplied, why?

 Because it has a different calling convention (CPU mode has to be changed and later restored, so return cannot be a simple branch/return \url{https://www.inf.ethz.ch/personal/wirth/Oberon/Interrupts.pdf}). This is used to correctly `install' the procedure to interrupt table (aka \href{http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211h/Babfeega.html}{exception vector} on ARM). This is then used to set the appropriate CPU mode (e.g. FIQ/IRQ) and invoke the correct handler.

 %============================================================ %

 \item How do you bring a boot file to the hardware? Alternatives?

 \q Either on e.g. SD card, or via some other (generally writable) memory, possibly internal to the device (e.g. EEPROM). This might require some HW support (UART, JTAG).

 %============================================================ %

 \item Assume you have a programming language that is claimed to be safe. What kind of functionality has to be supported in OS development that is unsafe?

 Direct access to memory (to handle memory mapped IO and setup e.g. interrupt vector), special calling convention for interrupt procedures and global initialization,  unsafe type casts (e.g. input devices `bytes' to representation), register manipulation (link register, PC), alignment and linking support (giving fixed location of fragment), inline assembly, untyped/unchecked pointers and manual memory management.

 %============================================================ %

 \item Properties of the ARM architecture particularly suitable for operating systems?. What is a FIQ / Fast Interrupt on ARM?

 Modes that can be switched through interrupts (even external), dedicated MMU. FIQ is a higher priority interrupt that can occur even during other interrupts' handling and cannot itself be interrupted. It has it's dedicated set of registers that can be used in the handler. (Jargon: 6 banked registers, nonmaskable). It can be used to enforce priority handling of critical (either performance or safety) events.

 %============================================================ %

 \item What does it mean to activate a command in an environment with dynamic (module) loading, what happens when a command is activated? What kind of information is necessary during loading and what kind of information for module compilation? How to check consistency at load time?

 (In A2) Module are either statically linked or dynamically loaded. If the module is not loaded yet, it will be loaded (transitively if required) and its body will be executed.

 Compilation: modules are compiled to (i) symbol files (interface) and (ii) object files. Objects are fingerprinted -- when linking a dependent module, the fprint can be checked to ensure version match.

 Linking (s. 70) -- compilation \& linking of dependencies, fixups (resolution of symbolic imports to actual addresses), \q execution of body.

 %============================================================ %

 \item What does an object file in a system with dynamic loading consist of (dependency information: imports / exports, entries to be fixuped, code and data) What is a fixup?

 Note: Fixup = relocation and relocation table. I don't understand the choice.

 Header (name, key, selffixup), Imports (name, key, fixup root), \q commands, \q entries, size info, code and data.

 Fixup -- replacement of symbolic references (stored in imports table) with actual addresses based on the data from the other modules (stored in entry table).


 %============================================================ %

 \item How is a multi-processor system typically booted?

 A master processor initiates the boot sequence and only engages other cores later. Boot steps:

 \begin{enumerate}
 	\item HW init
 	\item Boot image copied to RAM
 	\item OS initializer started (usually a conventional address)
 	\item  Set stack registers for all processor modes  (handlers of interrupts have separate stacks)
 	\item Setup free heap list and module list
 	\item Initialize MMU \& page table
 	\item Setup interrupt handlers \& runtime vectors
 	\item  Start timer \& enable interrupts
 	\item Initialize other runtime data structures 
 	\item Initialize UARTs
 	\item Initialize RAM disk
 	\item Enter scheduling loop on OS
 \end{enumerate}

 (s 207): Boot processor:

 \begin{enumerate}
 	\item Setup boot program
 	\item Enter processor IDs into table
 \item Send startup message to P via APIC
 \item Wait with timeout on started flag by P
 \end{enumerate}

 All processor:

 \begin{enumerate}
 \item Set 32-bit runtime environment
 \item Initialize control registers, memory management, interrupt handling, APIC 
 \item Set started flag
 \item  Setup Scheduler
 \end{enumerate}

 \item Describe the advantages of the Active Cells model in comparison to more “conventional” approaches like using General Purpose Architectures or resorting to High Level Synthesis

 Modularity and composability, predictable performance (no resource sharing between processors, no time sharing), safety and predictability guarantees (no thread synchronization, no memory sharing, no interrupts), scales massively and predictably, simplicity of cells, easy to create bespoke systems, no operating system, implicit data parallelism, low energy consumption. SW-HW codesign, ability to simply use special components (`engines'). Think actors each running on its own machine, with a reliable interconnect.

 %============================================================ %

 \item Describe the hybrid compilation approach of Active Cells.

 The system is described in a HLL. Cells correspond to TRMs or similar and their software, cellnets are hardware `wirings' of these. Cells get compiled to IC and binaries, cellnets/code containing `hardware implementation' gets compiled to HDL. The binaries can be easily \emph{patched} into the bitstream, whereas changes in the HDL require full synthesis. The idea is that the former happens much more often.

 \item Sketch the process state diagram for processes in A2.

 (They are called \emph{activities}. IMHO the important think to notice is that the scheduler is aware of the synchronization protocol.)

 \s{209}

 %============================================================ %

 \item Where are the data structures for a scheduler typically stored?

 \q In kernel's heap.

 \s{210}

 %============================================================ %

 \item Where are data structures for monitors stored?

 \q In object headers.
 %============================================================ %


 \item Classical memory layout of a uniprocessor system? Contrast to Memory layout of a multiprocessor system (A2)?

 \s{86}

 Multiprocessor: We need need multiple active stacks at the same time. We allocate the stacks by pages on the \emph{page heap}. That way virtual addresses for stacks remain contiguous and constant even as the stack grows and shrinks (\emph{virtual stacks}. Implication: DMA buffers spanning more than one page cannot be stack-allocated. Implication: The address range is known in advance and is limited. Cf. stack reallocation on overflow.

 \s{215}

 %============================================================ %


 \item Why is it in general necessary to have one stack per process? How is it achieved to schedule tasks in a preemptive way by using only one stack? Preconditions?

 One stack: Higher priority tasks run to completion. Lower priority task might get preempted when creating a new stackframe. Then then preempting activity leaves the stack as it was (because the original one won't get scheduled until the higher one finished, i.e. emptied the stack).

 If scheduling doesn't behave like this, then it's necessary to have a stack per thread to ensure that the previous stackframe belongs to us (or we would have to handle stack frame `interleaving', but then one would have to keep track of the fragmented space). 


 %============================================================ %


 \item Difference between spinlocks and scheduling based locks? Rough (!) sketch of implementations in A2.

 \q (Assume that by scheduling based locks we mean monitors)
 Spinlocks keep the thread active (busywait/spinning).  Monitors allow for conditional signaling and are handled by the scheduler (thread yields/awaits). Common pitfall: signaling without locking to ensure actual mutual exclusion leads to races. Common pitfall: interruption handled by exceptions.

 Implementation: Object headers contain: \verb|lockedBy, awaitingLock, awaitingCondition|. These are used to keep track of who waits for whom. Upon unlocking/signalling, the scheduler goes through the list of blocked/awaiting threads/objects and moves them to Ready state.

 %============================================================ %


 \item How do you protect data structures against concurrent access in a multi-processor system? Simpler way in single-processor system?

 Locking! If there's no preemption (cooperative scheduling or certain `critical' regions can be transitively protected from preemption), then we don't have to worry about concurrent modification.

 \q Is this it?

 %============================================================ %


 \item Programs shall be written for a target operating system that is not on the host machine. How can this be done without always rebuilding the system / boot image).

 \emph{Crosscompilation}. In a normal system, getting the crosscompiled binary to the target is enough (UART). In Oberon, there are no programs as such and the process is supported by dynamic module loading.

 %============================================================ %


 \item How can a system be flashed without direct connection to the hardware via JTAG or the like? (Flash-System in pre-boot environment)

 \q Boot firmware (RPI: running from VC) copies the boot image from any supported storage (RPI: SD) to RAM.

 In general, the hardware may be such that it looks for the boot image elsewhere (soft boot ROM). Then it will likely support something similar.

 Sidenote: Either I'm completely missing the point of these flashing questions or they're quite silly (?)


 %============================================================ %


 \item Do you remember why the first page of our system was unmapped? Pitfall?

 To trap \verb|NIL| pointers \cite[p 40]{aos}. \q Doesn't help with other invalid references.

 %============================================================ %


 \item Sketch a classical memory layout of a system. Why unmapped page between top and bottom in Minos Memory Layout?

 See above.

 First page: Trapping \verb|NIL|. First 1MB: conventional (graphics memory). Stack page: trapping stack overflows.

 %============================================================ %


 \item How can dynamic stack allocation be implemented?

 Reallocation (move and grow stack elsewhere), pitfall: all pointers have to be updated. Virtual stack: reserve address space and gradually allocate pages, pitfall: not contiguous, limited space, addresses are virtual.

 %============================================================ %


 \item What is a type descriptor and what is it used for? How to implement a type check / type guard? Sketch a type descriptor lookup.



 If we have a type system with subtyping, the actual runtime type may not be known. Implementing guards (glorified \verb|typeof|/dynamic dispatch) requires knowing the actual type. The instances have tags pointing to the decriptor strucutres (one per type).

 Oberon: links to all members of subtree within the descriptor. Warning: I believe the first type guard should compile to \verb|  CMP t.tag.ext[0], adr(typedesc T11)| because we are asking for the lowest descendant. The version on the slide is definitely wrong as it evaluates to false.

 \s{78}

 %============================================================ %



 \item Describe a mechanism to check if a system is still alive / to check if a certain process does something "often enough". What if the hardware does not provide this functionality?

 APIC (Advanced Programmable Interrupt Controller) can provide periodical interrupts and the OS can react accordingly. Typically the interval is the LCM of all periods, scheduler preempts lower priority/background tasks accordingly. APIC \emph{local interrupts} can interrupt a particular core.

 ARM \href{http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0479b/I1017629.html}{watchdog} also provides a simple counter that generates interrupts.

 \q If there are no timers, the best we can do is to inject checks in the compiler. 


 %============================================================ %



 \item Describe what has to be done to implement an Unload functionality in a system with dynamic module loading

 The module loader has to keep track of the number of references in the dependency DAG. If no modules depend on it, it can be removed. Statically linked module cannot be removed. Dependency DAG has to be updated. \q What do we do about the memory?

 %============================================================ %


 \item Describe a very simple scheduler that supports tasks with (periodic) high-, low- and (non-periodic) background-priority. What is a sufficient condition to make this scheduler work? (high- and low- tasks have to run to completion within their periods.) Sketch a running szenario and comment on the stack. Is one stack enough?

 One stack is enough if higher-priority tasks run to completion (q. 15) and there is only one of each priority level at a time. Task can preemt ones with lower priorities. The scheduler keeps a linked list of tasks to be executed sorted by \verb|(period, priority)|

 \s{115}

 In that case, scheduler state and saved system state can also be stored on the stack:

 \s{124}

 (s. 122 makes the assumption that the lo-pri interval is a multiple of hi-pri interval)

 %============================================================ %



 \item On a (symmetric) multiprocessor system there is usually a need for local and global interrupts. How is this implemented? Sketch the APIC system architecture. What are the APICs used for?

 \s{170}

 Messages to processors:
 \begin{itemize}
 \item Start Processor -- Activation and Initialization of individual processors (see mutliprocessor boot sequence)d
 \item Halt Processor -- Deactivation of individual processors
 \item Halt Process, schedule new process -- Interrupt in order to transfer control to scheduler
 \item  Local timers -- Periodical interrupts
 \end{itemize}

 The local and global APIC is configured separately by either \emph{local APIC register region} (basically a memmapped table) or the \emph{ACPI table}.

 %============================================================ %



 \item What is an active object. What is the advantage of implementing active objects in the language (and not as library => peer managed vs. system managed implementation). Semantics of active objects.

 An object with its own thread attached to it. Promotes parallelism and avoids the pitfalls of transferring ownership between threads.

 Advantages of built-in implementation: easier to use, hiding complexity, can be analyzed statically, compiler can automatically provide appropriate synchronization (\verb|EXCLUSIVE|), compiler can `understand' \verb|AWAIT| statements and possibly optimize for them (in library you have to `poll' with the predicate). The runtime support can cooperate with the scheduler based on the semantics of the AO model.

 \s{182}

 %============================================================ %



 \item Schedule the life cycles of activities in the active object system / a system with monitor (157) When does which transition take place?

 \s{209}
 \s{230}

 %============================================================ %



 \item What kind of data structures do support threads / processes in a system with a monitor / the active object sytem?. What kind of action takes place during transitions in the process state diagram

 Array of processes running on CPUs. Each object: list of objects processes on conditional variable (\q how exactly is it associated with the object) and the locking process. Ready to run queues.

 \s{236}

 %============================================================ %



 \item How can stack allocation be supported in a system with virtual memory for multiple processes ? Is the stack conceptually unbounded?

 Either reallocation or slicing VAS. Yes, no. See 15.

 %============================================================ %



 \item In the course we distinguished between synchronous and asynchronous context switches. Why? Explain what has to be done for context switch (sync->sync, async->sync, async->async, sync->async)
  Synchronous:
  \begin{itemize}
  	\item Explicit: Terminate,Yield   
  	\item Implicit: Implicit
  Awaiting condition   Mutual exclusion
 \end{itemize}
  Asynchronous:
    \begin{itemize}
  \item Preemption: Priority handling , Timeslicing
 \end{itemize}

 \s{218}
 \s{219}
 \s{220}

 Synchronous context switch happens on `stackframe boundary', so saving FP, SP suffices. Async switches have to save all registers.

 %============================================================ %



 \item In A2, certain objects provide a monitor. Where is the monitor information (lock queues etc.) stored?

 (in the header of the object, not in the type descriptor).
 
 \q Why? I assume it is b/c of signal-and-exit semantics where we need the queues at actual objects.

 %============================================================ %



 \item In a system with (active or passive) objects, objects have to provide certain metadata. Which?. Where are the metadata stored?

 Type, GC info, possibly owner thread. Possibly locking/synchronization info. Methods, fields, initializers, finalizers. Hierarchy info. In instance and type desc, ofc.

 %============================================================ %



 \item When monitors are implemented using constructs such as "Lock", "Unlock" and "Await", behind the scenes fine-granular locks have to be used. Why and which?

 \q Does this mean the locking queue? 

 Lock: disable preemption, acquire header lock, set \verb|lockedBy|.
 Unlock: unset, switch to any objects in the lock queue.
 Await: disable preemption, acquire header lock, set \verb|lockedBy| transitively, set condition.

 Reason: keeping consistency of these structures? I don't understand how you can always acquire the lock/not deadlock when preemption is disabled....

 %============================================================ %



 \item In A2 waiting conditions are evaluated when a process exits the lock of an object. How is that accomplished.

 \begin{displayquote}
 	To suspend the running process, its descriptor is added to the await- ingCondition queue of the object. The condition procedure and frame pointer are saved in the process descriptor for later re-evaluation of the condition. Then the object header lock is released, preemption is enabled and a new process is switched to, as was done at the end of the Lock system call.
 	
 	...

 To evaluate a condition, a normal procedure call is made through procedure variable condition stored in the process descriptor. As the condition procedure is compiled as a normal self-contained procedure, no full context switch has to be made. The frame pointer passed to the procedure gives it access to the local variables of the procedure containing the await statement and the fields of the relevant object. In a sense this is a lightweight synchronous context switch to the process that executed the await statement.
 \end{displayquote} \cite[p 93]{aos}

 %============================================================ %



 \item Sketch a scenario with priority inversion. What is the problem?

 P1 (lopri) acquires lock and is preempted by P3 (hipri) which fails to acquire the lock and yields to P2. P2 runs excessively long/livelocks (it has priority over P1), preventing P1 from releasing the lock and unblocking P3.

 %============================================================ %

 \item What kind of data structures are necessary to implement priority inheritance?

 Priority inheritance = process owning the lock gets at least as high priority as any other process competing for the lock. Processes have \verb|waitingOn| ptr and array counter of resoucres by the highest priority of process waiting for them, resources have an array counter of blocked processes by priority and \verb|lockedBy| ptr.

 \s{245}

 %============================================================ %



 \item Where are the following metadata stored: type tag, array descriptor, lock/await queus?

 Instance, \q what does this mean, instance / object header (I assume that's the same thing).

 %============================================================ %



 \item What kind of information does a garbage collector need to work? Where is this information stored?

 A2 mark and sweep:

 \s{316}

 %============================================================ %



 \item What types of parallelism may be exploited in a dataflow programming approach ?

 Dataflow, explicit, instruction-level.

 Pipelining/stream para, task parallelism (data), loop/local data parallelism (SIMD), instruction parallelism.

 %============================================================ %



 \item Problems with current commercial multicore processor architecture? 

 Memory Bottleneck, Synchronisation overhead, Cache invalidation.

 %============================================================ %



 \item What is the advantage of using 18-bits for each instruction on TRM? Disadvantage?

 Tribute to PDP :) High density, low decoding overhead. Problem: Long intermediates and jumps cannot be encoded within 1 instruction. Fix: store in memory/encode by multiple instructions, chained jumps.

 %============================================================ %



 \item What is the advantage of dynamically building hardware for dataflow programs?

 \begin{itemize}
 	\item No global memory
 	\item No OS, no interrupts
 	\item Easy to integrate specific HW (Engines), performance
 	\item Low overhead, low power consumption (due to configurable interconnect)
 \end{itemize}

 %============================================================ %



 \item What happens during execution of a command in Oberon?

 \q Module is loaded if necessary and the appropriate procedure executed?

 %============================================================ %



 \item Can you generate (conceptually) arbitrarily many processes in A2? If not, why not?

 Depends on the definition of conceptually. The number is limited by the number of stacks that can be allocated, which depends on the size of VAS and maximum stack size. \cite[p 43]{aos}

 %============================================================ %



 \item What is a monitor? What is the purpose of waiting and locking queues?

 Synchronization structure, essential a mutex paired with a condvar. Condvar is used to `push' signal the availability of the mutex. Wait queue =  list of threads waiting on the variable. Lock queue = list of threads in AwaitingLock state on a locked object.

 %============================================================ %



 \item Assume a seven digit LED display with four characters where only one character at a time can be controlled by hardware. How do you implement a display of four distinct characters? What do you do in hardware and what in software?

 Refresh it fast enough. (Ass 11) This can be either a software loop or a hardware component (no general processing needed). Then it suffices to save the 4 states and alternate the output based on a timer.

 \end{enumerate}

 \begin{thebibliography}{1}
 	
 \bibitem{aos} Muller {\em AOS}  \url{http://e-collection.library.ethz.ch/eserv/eth:26082/eth-26082-02.pdf}.
 	
 \end{thebibliography}

 \end{document}
diff --git a/slides.pdf b/slides.pdf