Software that provides a programmer-friendly interface between application programs and the hardware by providing a virtual environment to applications.
Software that handles resource requests from application programs and prevents applications from trampling each other.
Hardware interface - The operating system provides an interface for device drivers to carry out I/O tasks on our behalf.
Software interface - The operating system provides userland space for user applications.
A trap is generated by userland applications when an exception is thrown. Examples include division by zero, invalid memory access or system calls.
An interrupt is generated by hardware when the hardware wants the attention of the CPU. Example: a key being pressed on a keyboard.
- System calls
- Invalid memory access
- Division by zero
Userland code is executed in unprivileged mode.
Kernel code is executed in privileged mode.
The difference between these modes is something that the hardware understands and enforces.
The OS runs in privileged mode; it can interact with hardware directly and can manipulate hardware abstractions.
User programs run in unprivileged mode. They cannot interact with hardware directly and see the machine as a virtual machine.
A process is a program in execution. A process includes the code, the process's state and the memory it can access.
Process control block.
The OS must know specific information about processes in order to manage and control them and to implement the process model. The OS maintains a table called the process table, with one entry per process. These entries are called PCBs. They contain information about the process's state, its program counter, stack pointer, memory allocation, the state of its open files, scheduling information and everything else that must be saved when the process is switched from running to ready or blocked state so that it can be restarted later as if it had never been stopped.
The size of the PCB is about 1.7 KB.
- Exclusive CPU
- Process address space
A process thinks it is the only thing running on a computer. It believes that it has full access to the CPU and complete memory access.
+----------------------------------+
|                                  |
|              KERNEL              |
|                                  |
+----------------------------------+
|              STACK               |
+----------------------------------+
|                |                 |
|                v                 |
|                                  |
|                ^                 |
|                |                 |
+----------------------------------+
|                                  |
|              HEAP                |
|                                  |
+----------------------------------+
|                                  |
|              DATA                |
|                                  |
+----------------------------------+
|              TEXT                |
+----------------------------------+
Stack - Function parameters and local variables.
Heap - Dynamically allocated memory.
Data - Static variables.
Text - Machine executable functions.
The heap
Processes are created by first forking another process and then calling exec in the newly cloned process. Exec replaces the current process image with the new program.
By the PID returned by fork. If it is the child process this will be 0; if it is the parent process it will be the ID of the child.
Executes the exec command and replaces itself with the new process.
Yes, only duplicate process memory if the process changes something from its parent. (Copy on write)
A process can exit by:
- Calling exit.
- Returning from main
- Calling abort
- Receiving a signal from its parent
After exiting the process returns an exit status to the parent process.
When the original process and all its children have terminated.
The exit status is stored within the process's own PCB until the parent collects it with wait.
Its parent.
If the parent doesn't exist the process is adopted by init and it collects its return status.
It goes into a zombie state and waits for them to exit.
Interruptible - Responds to signals.
Uninterruptible - Does not respond to signals.
Copy on write is where a child process only copies its parent's memory when it writes to it.
This is done in order to maximise the effectiveness of copy on write. If the parent ran before its child and modified all of its pages, a separate copy of each would have to be made; if the child then immediately called exec to replace itself with a new program, all of that copying would have been unnecessary. Running the child first avoids this.
CTRL+C will send a SIGINT to the process running.
There are 31 signals.
- SIGHUP
- SIGINT
- SIGALRM
- SIGUSR1
- SIGUSR2
Terminate the process and dump its core.
Mask it with sigprocmask
Signals are useful to programmers because they help avoid the issue of busy waiting: by sleeping until a signal is received the process avoids the need to loop and repeatedly check whether the event has occurred.
A signal is sent to a process by the kernel marking it as pending in its PCB.
When the process next returns from the kernel the signal handler for the pending signal will run.
- sigaction
- sigset
sigprocmask
To avoid deadlocking and infinite signal calls.
static void handler(int signo) {
    printf("Wake-up!\n");       /* print wake-up message */
    alarm(5);                   /* set alarm for 5 seconds from now */
}

int main() {
    sigset_t set;               /* signal set used while waiting */
    sigemptyset(&set);          /* empty the set */
    sigset(SIGALRM, handler);   /* set a handler for SIGALRM */
    alarm(5);                   /* set alarm for 5 seconds from now */
    while (1) {
        sigsuspend(&set);       /* release blocked signals and wait for interrupt */
    }
    return 0;
}
Program will print "Wake-up!" every 5 seconds due to the alarms being fired.
- Access to multiple cores
- Threads execute in parallel
- Design by module
- Other thread can run should another one be blocked
+----------------------------------+
|                                  |
|              KERNEL              |
|                                  |
+----------------------------------+
|              STACK               |
+----------------------------------+
|              STACK               |
+----------------------------------+
|                |                 |
|                v                 |
|                                  |
|                ^                 |
|                |                 |
+----------------------------------+
|                                  |
|              HEAP                |
|                                  |
+----------------------------------+
|                                  |
|              DATA                |
|                                  |
+----------------------------------+
|              TEXT                |
+----------------------------------+
Userspace using libraries. Kernel space using system calls
- More efficient, kernel is not involved.
- Fewer system calls are made.
- Single thread can block whole process
- Programmer has to schedule threads
n to 1
kernel
1 to 1
- Kernel can see threads
- Single thread does not block whole process
- Lots of system calls
hybrid
m to n model
unbound threads
bound thread
Linux sees them as processes sharing an address space.
Advantages:
- Simplistic
- No new data structures
Disadvantages:
- There may have been an efficiency gain in implementing explicit thread support.
clone
pthreads are Linux's implementation of POSIX threads. They first appeared in 1996.
Yes, there is a Windows implementation of it, winthread.
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
void pthread_exit(void *value_ptr);
- Pointer to thread
- Thread attributes such as stack size, priority, detached/non-detached.
- handler/function name
- handler/function arguments
typedef struct { int a, b; } pair_t;

void *add(void *p_in) {
    pair_t *p = (pair_t *) p_in;
    printf("Answer is: %d\n", p->a + p->b);
    return NULL;
}

void adder(int x, int y) {
    pthread_t t;
    pair_t p;
    p.a = x;
    p.b = y;
    pthread_create(&t, NULL, add, (void *) &p);
    pthread_join(t, NULL);  /* p lives on adder's stack, so wait before returning */
}
Global data or pass a struct containing the data.
pthread_exit
pthread_join
Yes, a thread that has terminated but not yet been joined is a zombie thread.
Detach it.
pthread_t t;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 20 * 1024 * 1024);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&t, &attr, startroute, arg);
We create a detached thread with a custom stack size.
void *foo(void *p) {
pthread_mutex_lock(&mutex);
// Update p
pthread_mutex_unlock(&mutex);
return 0;
}
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);
pthread_create(&t1, NULL, foo, (void *) n);
pthread_create(&t2, NULL, foo, (void *) n);
We create two threads that modify shared data, each acquiring the mutex before doing so.
Condition variables allow a thread to wait for something to become available.
pthread_cond_wait
First - the condition variable being waited on. Second - the mutex.
The condition that is being waited on.
pthread_cond_signal
pthread_mutex_lock(&mutex);
while(occupied == size) {
pthread_cond_wait(&room, &mutex);
}
/* insert data */
pthread_cond_signal(&data);
pthread_mutex_unlock(&mutex);
It locks the mutex and waits on the condition variable while the buffer is full. When there is room it inserts the data, signals other threads that data has been added, and unlocks the mutex.
To avoid a spontaneous wake up
A spontaneous wake up is when a thread returns from pthread_cond_wait without any thread having called pthread_cond_signal or pthread_cond_broadcast.
Being woken only serves as a hint that the condition being waited on might have changed.
hint
pthread_cancel to request termination from another thread. pthread_exit to exit from the thread itself.
pthread_cancel and pending cancel
- It might be in the process of changing shared data
- It might hold some lock
Cancellation state
Enabled or Disabled
If the cancellation state is enabled the effect of the pending cancel depends on whether the ___ is ___ or ___
Cancel type is async or deferred
If the cancel type is ___ the cancel is acted upon immediately and any ___ are executed and the thread terminates.
- async
- clean up handlers
- deferred
- cancellation point
pthread_setcancelstate
pthread_cond_wait
If we do not know where cancellation points exist it is possible that we may end up never releasing a mutex.
pthread_mutex_lock(&mutex);
while(should_wait) {
pthread_cond_wait(&cv, &mutex);
}
pthread_mutex_unlock(&mutex);
The thread might cancel and the mutex will still be locked.
pthread_mutex_lock(&mutex);
pthread_cleanup_push((void (*)(void *)) pthread_mutex_unlock, &mutex);
while(should_wait) {
    pthread_cond_wait(&cv, &mutex);
}
pthread_cleanup_pop(1); /* pops the handler and runs it, unlocking the mutex */
We register a cleanup handler that unlocks the mutex if the thread is cancelled at a cancellation point.
Each thread should be given its own errno.
Some libraries were originally written under the assumption that they would only ever be called from a single-threaded process. Their functions were not ___
thread-safe
gethostbyname - not thread safe.
gethostbyname_r - is thread safe.
computation_state_t state;
void handler(int signal) {
display(&state);
}
void procedure(void) {
sigset(SIGINT, handler);
while(true) {
update_state(&state);
}
}
The state could be corrupted. update_state is not async-safe.
No, this will result in a deadlock.
async signal safe.
functions ending with _r are async-signal-safe
Use threads.
Does a process block signals or do threads do so individually? If we send a signal to a multithreaded process, which thread receives the signal?
A process blocks signals.
The thread that gets the signal is chosen randomly from the set that does not have the signal blocked.
computation_state_t state;
sigset_t set;
int main () {
pthread_t thread;
sigemptyset(&set);
sigaddset(&set, SIGINT);
pthread_sigmask(SIG_BLOCK, &set, 0);
pthread_create(&thread, 0 , monitor, 0);
long_running_procedure();
}
void long_running_procedure() {
while(true) {
pthread_mutex_lock(&mutex);
update_state(&state);
pthread_mutex_unlock(&mutex);
}
}
void *monitor() {
int sig;
while(true) {
sigwait(&set, &sig);
pthread_mutex_lock(&mutex);
display(&state);
pthread_mutex_unlock(&mutex);
}
return NULL;
}
We solved the problem here by using threads. We update our state in one thread and wrap a mutex around the update; a dedicated monitor thread uses sigwait to handle the signal.
Now even if our function isn't thread-safe it will still safely update the state, because it must acquire the mutex before it can do so.
They do not mean the same thing. Reentrant is stricter.
- Block Devices
- Character Devices
A block device stores information in fixed-size blocks, each with its own address, and transfers are in terms of one or more blocks.
Hard disks, CD-ROMs and USB sticks are block devices.
A character device delivers or accepts a stream of bytes, without regard to any block structure.
Printers, modems and mice are all character devices
A controller is hardware or firmware that controls the operation of a hardware device. An example is a SCSI controller.
It talks to the device controller, so we do not need a different driver for every single device.
A bus is a set of wires and a well-defined protocol for sending a set of messages across those wires.
This includes things such as:
- PCI
- USB
- SCSI
- IDE
Instead of having special methods for accessing the values to be read or written, just get them from memory or put them to memory.
The device is connected directly to certain main memory locations
DMA, direct memory access allows I/O requests to be processed without tying up the CPU.
A DMA controller is hardware that allows for DMA. It may be for one device or shared with multiple.
It varies based on the DMA configuration.
The CPU stops what it's doing and uses the device number to index into the interrupt vector, which leads it to the interrupt service routine for that interrupt.
Programmed I/O is where the CPU does all the work to make an I/O request.
The process stays on the CPU until all data is received. This results in a busy wait.
Interrupt I/O allows the CPU to do something useful while data is being received.
The process gives up the CPU, sits on a queue and waits for its I/O request to be completed. When the I/O operation completes, an interrupt is generated which marks the process as runnable again.
The operating system exposes a well-defined driver interface which third parties use to write their drivers.
If drivers did not implement the same interface, what would be required every time a new device was to be supported?
Recompiling the kernel
In addition to looking the same to the OS, Unix devices look the same on the outside too. What do they look like? What are major vs. minor device numbers?
In Unix all devices look like files.
A device file's inode contains a major device number and a minor device number. The major number selects a driver and the minor number identifies a particular device.
We are repeatedly blocking/unblocking a process for a single character. It is inefficient.
There is a possibility that newly arriving characters will overwrite earlier ones
Same problem again but it is delayed.
While one buffer is full and being copied to userspace, another is being filled with arriving data.
By the time the second buffer is full the first has been emptied and can start accepting data again.
platters surface
tracks
sectors
Not on modern disks
heads cylinder
- Position the heads over the correct cylinder. Seek time
- Rotate the disk until the desired sector is under the head. Rotational latency
- Transfer the sector under the head. Transfer time
Seek time is the main bottleneck.
Cylinder skew is the amount by which sector 0 in one track is offset from sector 0 in neighbouring tracks.
- Elevator
- First in first out
- Shortest seek time first
rotation speed: 10,000 RPM
number of surfaces: 8
sector size: 512 byte
sector/track: 750 (average)
tracks/surface: 100,000
average seek time: 4ms
one track seek time: 0.2ms
maximum seek time: 10 ms
Calculate:
capacity:
surfaces * tracks * sectors * sector size
8 * 100,000 * 750 * 512 = 307200000000 bytes
maximum transfer rate (single track):
bytes read in 1 second = tracks in 1 second * bytes per track
= (RPM / 60) * sectors per track * bytes per sector
= (10,000 / 60) * 750 * 512
= 64000000 bytes
maximum transfer rate (single cylinder):
Due to head skew it's the same.
maximum transfer rate (multiple cylinders):
bytes per cylinder / (time to read 1 cylinder + time to move to next cylinder)
bytes per cylinder = bytes per track * track per cylinder * number of surfaces
= 750 * 512 * 8
= 3072000
Time to read 1 track = 60 / RPM
= 60 / 10000
= 0.006 s
Time to read 1 cylinder = time to read 1 track * tracks per cylinder
= 0.006 * 8
= 0.048 s
Time to move to next cylinder = 0.0002 s
Time to read 1 cylinder + time to move to next cylinder = 0.0482 s
So we have: 3072000 / 0.0482
= 63.7 million bytes/s
cylinder skew:
(one track seek time * RPM * sectors per track) / 60
(0.0002 * 10000 * 750) / 60 = 25
maximum transfer rate (multiple cylinders with no cylinder skew):
bytes per cylinder / (time to read 1 cylinder + time for 1 revolution)
total time = time to read 1 cylinder + time for 1 revolution
= 0.048 + time for 1 revolution
= 0.048 + ( 60/10000 )
= 0.048 + 0.006
= 0.054s
So we have: 3072000 / 0.054
= 56.9 million bytes/s
Combining CPUs in parallel has proven a good idea. Can we do the same with disks to improve I/O performance? What is this approach called?
Yes, this approach is known as RAID, redundant array of independent disks.
RAID - Redundant array of independent disks.
SLED - Single large expensive drive
A SCSI controller handles all implementation. The array is just exposed to the operating system as a single disk.
The operating system is not aware, all of the implementation is taken care of by the SCSI controller.
Data striping describes how files will be stored. It involves splitting files across several disks.
It allows requests to be serviced in parallel
It allows for reading in parallel
Fine grained arrays interleave data in small units so all I/O requests, regardless of size, access all disks in the array.
This gives the advantage of high transfer rates for all I/O requests however it has two disadvantages:
- Only one I/O request can be serviced at any instant
- All disks must waste time positioning for each request
Coarse grained arrays interleave data in large units. This means:
- Small I/O requests access only a subset of the disks and several can be served simultaneously
- Large requests benefit from high transfer rates
Every disk is involved resulting in high transfer rates
Only one I/O request can be serviced at any instant. All disks must waste time positioning for each request.
- Small I/O requests access only a subset of the disks and several can be served simultaneously.
- Large requests benefit from high transfer rates
- Not all requests will benefit
- Increased chance of failure
We assign extra disks for redundancy purposes.
- Do a direct 1:1 mirror of the disk(s).
- Use parity information to figure out what data has been lost.
- Data striped across all disks in an array
- No parity
Advantages:
- Good performance: With N disks, roughly N times speedup.
Disadvantages:
- Poor reliability: one disk failure results in data loss.
It is not considered true RAID as it provides no redundancy.
Keep a mirrored copy of the data
Advantages:
- Good reliability: one disk failure is OK.
- Good read performance
Disadvantages:
- High cost: one data disk requires one mirror disk
A data sector is striped across the data disks; error-correcting parity is computed and stored in parity disks.
Advantages:
- Good reliability with higher storage utilisation than mirroring.
Disadvantages:
- Unnecessary cost: disk can already detect failure
- Poor random performance
It is never implemented
Single parity disk (XOR of each stripe of a data sector)
Advantages:
- Same reliability with one disk failure as RAID2, since the disk controller can determine which disk failed.
- Higher storage utilisation
Disadvantages:
- Poor random performance
A set of data sectors striped across the data disks
Advantages
- Same reliability as RAID3
- Good random read performance
Disadvantages:
- Poor random write and read-modify-write performance
Parity sectors distributed across all disks
Advantages
- Good performance
Master boot record.
It is located at sector 0 of the disk. It contains the partition table.
The BIOS reads it.
It is the datastructure that represents a file.
It contains information like size, permissions, owner, group, accessed time, modified time, created time.
File of directory entries
Data structure representing the contents of a directory
Filenames are of variable length. A filename is stored within a directory entry or within the heap.
Looking up a file name involves linearly searching a directory from beginning to end.
We can use a hash table within each directory to speed up the process
A shared file shows up in more than one directory listing and thus requires two directory entries. Sharing poses a problem where a directory entry contains pointers to blocks. What is the problem?
When one version gets edited, how does the other one get updated?
The filename and the inode pointer
Hard link, soft link
Hard Link
Soft Link
Use buffer cache which is a collection of disk blocks retained in memory.
Tree or Hash
The buffer cache is of finite size, so occasionally blocks must be removed. In what order do we maintain the cache to facilitate such removal?
Least recently used to Most recently used
If an inode block were regularly modified it would remain in the cache for an extended period. Why is this a potential problem?
It would stay in the cache, meaning we could experience data loss should the machine crash.
- How likely is it to be used again
- Is it essential to file system consistency
Blocks are likely to be required again in the near future go to the ___ while blocks unlikely to be used again in the near future go to the ___ of the buffer cache.
back, front.
Irrespective of where it goes in the list, what happens with blocks essential to the file system consistency?
They get written out straight away
If a modified block is not essential to file system consistency how long should it be kept in the buffer cache before we write it back to disk?
30 seconds
Linux - 30 seconds worth Windows - None
Read blocks based on assumptions
Cylinder groups involve clustering files and their inodes in the same cylinder group. This results in fast reading.
- Remove the directory entry for the file
- Mark the file's inode as free
- Mark the file's data blocks as free
The log is used to hold pending actions to be carried out. It allows us to run its contents when restoring from a system crash.
It means that running them several times should have no harmful side effects.
It must have this property as log operations can be carried out multiple times.
Unix merges different file systems into a single seamless directory structure. Does Windows do the same thing?
No, within Windows each file system has its own distinct drive name.
What is it that enables these different file systems to co-exist and be used without any modification to the Linux kernel?
The virtual file system. It abstracts the common operations that all file systems must support and moves them into an implementation-independent layer that calls the file system implementation.
To log onto a Unix workstation a user enters a username and password and hits return. What happens between hitting return and a command shell appearing?
The password entered is hashed and compared with the stored hash for that username in the password file; if they match, the user is granted access.
A process has 6 IDs associated with it.
- Effective User
- Effective Group
- Real User
- Real Group
- Saved User
- Saved Group
They come from a mapping in the password file, specifically /etc/passwd.
Real and effective IDs are inherited. The saved ID is copied from the effective.
The process's effective UID and GID.
Within its inode.
A directory is simply a file of directory entries.
A directory entry is composed of fields.
It contains a filename and an inode pointer
Read write and execute
Read write and traverse
The user will need execute access on a and b. They will need write access on c.
A little, this means it is possible for people to guess file names within your directory and open them.
Write a command to set the permissions on the file foo such that you and your classmates can read it but no one else can. Change it to allow you to read and write it. How does the kernel determine whether to grant or deny access?
chmod 440 foo
chmod 640 foo
The kernel grants or denies access based on the following:
- If the effective user ID is 0, grant access.
- If the effective user ID matches the owner of the file, grant access if the appropriate owner permission bit is set, otherwise deny access.
- If the effective group ID of the process or one of its supplementary group IDs matches the group ID of the file, grant access if the appropriate group permission bit is set, otherwise deny access.
- If the appropriate other permission bit is set, grant access, otherwise deny access.
They are taken from the effective UID and GID of the process. The permissions are determined based on the umask value.
A multiuser workstation has a single shared /tmp directory for holding temporary files. What's unusual about this directory?
All users are allowed to write to it however users are not able to delete each others files. This is done by setting a sticky bit.
The program used to change the password, passwd, has its setuid bit set. This bit says that when the program is executed the effective user and group should be taken from the owner and group of passwd rather than from the user executing it.
This is known as setuid.
The effective user and group are taken from the executable's owner/group.
You load a CD containing a setuser ID root program on a lab machine. You run the program. Does it run with root privileges?
No, we disable setuid operations on mount.
No, it has full root permissions.
In order to avoid accidentally giving somebody root privileges by not correctly dropping access.
Permanently - setreuid will modify the real and effective user IDs, and if need be the saved user ID.
Temporarily - seteuid will modify only the effective UID. We can regain privileges later by restoring the effective ID from the saved ID.
In some cases it will modify the real, effective and saved user IDs. In others it will only modify the real and effective.
When and how often are access permissions on a requested resource checked? Given file descriptors are inherited across exec, what are the security implications?
Permissions are only checked at the open system call. Since file descriptors are inherited across exec, an untrusted program may access an already open file descriptor.
- Temporarily drop privileges and become the user.
Your setuser-ID-root program wishes to check if the real user should be allowed to access a particular file. What are the potential security pitfalls in coding the check?
An attacker could replace the file between the time access and open are executed. A better approach is to temporarily drop privileges before calling open.
Why are buffer overflow vulnerabilities so serious? What does a buffer overflow vulnerability allow an attacker to do?
They allow the execution of arbitrary code by an attacker on a victim's machine.
Poor C programming practices combined with a failure to properly sanitise user input.
- gets
- strcpy
- strcat
- scanf
- memcpy
char *gets(char *dest) {
    int c = getchar();
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;
        c = getchar();
    }
    *p = '\0';
    return dest;
}
We use the gets function and there is no way to specify a limit on the number of characters to be read. Thus we could keep writing to the stack and overwrite the return address.
void foo() {
    char buffer[8];
    puts("Please enter your name:");
    gets(buffer);
}
+----------------------------------+
| |
| Stack Frame for main |
| |
+----------------------------------+
| Return address |
+----------------------------------+
| Saved %ebp |
+----------------------------------+
| [7][6][5][4][3][2][1][0] |
+----------------------------------+
Modify the return address to run arbitrary code that will launch a shell.
static char sneakystring[] =
"\xeb\x12\x5e\x31\xc0\x88\x46\x07"
"\x50\x56\x31\xd2\x89\xe1\x89\xf3"
"\xb0\x0b\xcd\x80\xe8\xe9\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68";
Machine code to launch a shell
Placing a payload into the address space, then jumping to that address.
- Write correct code
- Make areas of process address space non-executable
- Use the compiler to do bounds checking
- Address space randomisation
+----------------------------------+
| |
| Stack Frame for main |
| |
+----------------------------------+
| Return address |
+----------------------------------+
| Canary |
+----------------------------------+
| Saved %ebp |
+----------------------------------+
| [7][6][5][4][3][2][1][0] |
+----------------------------------+
- Terminator canaries: detect runaway strings
- Random canaries: detect sequential writes
- Random XOR canaries: Detect random access memory writes
Canary is only checked on return
We can still modify local variables below the canary, such as a clearance flag, and the damage is done before the check on return.
Address space randomisation makes the stack segment and the start locations of other segments random. Guessing a new return address on the stack is made more difficult if for each invocation of a program the stack starts at a new random address.
The object owner has full control over their own files.
Mary downloads and runs Angry Birds. What are the risks under DAC? With what privileges does the AngryBirds process run?
AngryBirds can access all of Mary's files.
It runs with all of Mary's privileges.
Only give the needed access to the processes.
Linux does not apply it.
- Firewalls
- Patches
- Strong passwords
- A zero-day is a new vulnerability for which no patch exists.
- A zero-day attack is an attack that exploits such a vulnerability.
- Can defend against them but can't prevent them
Using multiple layers of security
Access to files is mandated by a system policy
A linux module made by the NSA for MAC in linux.
- Security Policy
- Access Vector Cache
- Policy Enforcement Server
- selinuxfs
Expressly permitted. Denied
The main form of permission control is type enforcement. This sandboxes programs by consigning each program to a domain and restricting each domain to privileges sufficient only for its purpose.
Role-based access control is also available. It is used to prevent unauthorised SELinux users entering particular domains.
A domain is a sandbox. A sandbox constrains access based on two different things:
- Role-based access control prevents users from entering unauthorized domains.
- Type enforcement limits each domain to only those privileges it requires in order to do its job.
Transitions Entry point
When the role is authorized to access the domain
You only sandbox high risk applications
A security context which contains an selinux user, a role and a domain.
The blink program blinks messages on the LED display. Ordinary users can run the blink executable. Write the SELinux policy rules to enforce this behaviour. What is the benefit?
/usr/bin/blink {system_u, object_r, blink_exec_t}
/etc/blink/msgs.txt {system_u, object_r, blink_data_t}
/dev/led {system_u, object_r, blink_data_t}
allow blink_t blink_data_t : file {read write};
allow user_t blink_exec_t : file {execute};
allow blink_t blink_exec_t : file {entrypoint};
allow user_t blink_t : process {transition};
type_transition user_t blink_exec_t : process blink_t;
role user_t types blink_t
This results in the blink application executing within a sandbox. Users cannot directly access /dev/led.
- Access decisions state what the user is allowed access to.
- Transition decisions state if the user is allowed to transition from one domain to another.
execute transition
                 execute                       transition
bash ----------------------> /bin/blink ----------------------> blink
{user_u, user_r, user_t}     {system_u,                         {user_u, user_r, blink_t}
                              object_r,                                    |
                              blink_exec_t}                                v
                                                   /dev/led   /etc/blink/messages.txt
                                                   {system_u, object_r, blink_data_t}
It is complex and difficult to maintain.
A monolithic kernel is one where all kernel components are linked together into a single program that runs in a privileged mode.
Windows and Linux
- Data structures can be easily shared and kernel components can easily communicate with one another, as there is just one address space and all kernel components inhabit it together.
- A bug in one component can affect the others. A bug can bring down the entire OS.
A microkernel implements only the lowest-level details of memory management and virtual memory, threads and I/O.
Higher level abstractions such as processes, file systems and device drivers are implemented in servers that run in user space.
Applications communicate with these servers to invoke kernel services rather than with the kernel directly.
Mach and Chorus
- Increased security, if a buggy component fails it does not affect the entire operating system.
- Development is straightforward: debuggers are userland applications, so debugging a file system implemented in userland is simpler than debugging one in kernel mode.
Virtual means it does not physically exist but is emulated by software.
IBM invented them. They did so because they were having difficulty building a multiuser time-sharing system.
A hypervisor is a virtual machine monitor.
- virtual machines
- single-user time sharing system
- A hypervisor running on real hardware is a type-1 hypervisor
- A hypervisor running on a host OS is a type-2 hypervisor
Type-1 hypervisors include:
- Xen Server
- VMWare ESXi
Type 2 hypervisors include:
- VMWare workstation
- VirtualBox
#### Why is virtualization good for a company running a number of servers that provide critical services? What causes a machine to crash: hardware or software?
It is a good idea because of server consolidation and service isolation. Instead of the company having a large number of physical machines to provide critical services, they have one very powerful machine that runs a hypervisor. They then use virtual machines to isolate the critical services from each other.
Software normally causes a machine to crash.
#### What is a virtual appliance? What can you find at turnkeylinux.org?
A virtual appliance is a runtime environment and an operating system that is bundled together and shipped as a single unit. This is done to save on installation headaches.
Virtual appliances.
Some applications may require legacy hardware. By using virtualization we can emulate this hardware.
Software developers can test their software on a single machine against a variety of operating systems in a convenient manner without the need to reboot
Debugging and testing operating systems that run as applications on virtual machines is simpler than debugging a live operating system executing on real hardware.
The operating system asks what hardware is available on bootup.
Intercept requests to the host hardware and spoof responses
Memory, CPU and I/O
#### In what mode is the guest operating system executing? Why is this unusual?
Userland. This is unusual because the kernel of the guest operating system is going to need to fulfil system calls.
This is where the hypervisor takes control and spoofs a response for the hardware.
Popek and Goldberg discovered that it will not always work due to sensitive and privileged instructions.
A privileged instruction executes fully when executed in privileged mode, but causes a trap when executed outside of privileged mode.
Sensitive instructions are those that affect the allocation of real system resources, or whose effects depend on the state of the real hardware.
#### If P is the set of privileged instructions and S is the set of sensitive instructions, what must be true for a "trap-and-emulate" (pure virtualization) approach to be possible?
Sensitive instructions must be a subset of privileged instructions (S ⊆ P), so that every sensitive instruction traps when run in user mode and can be emulated by the hypervisor.
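The Popek–Goldberg criterion is just a set-inclusion check. A minimal sketch, using invented instruction names (any real ISA has far more):

```python
# Popek & Goldberg's criterion as a set check -- illustrative names only.
def pure_virtualizable(sensitive, privileged):
    return sensitive <= privileged   # S must be a subset of P

# A well-behaved ISA: every sensitive instruction also traps in user mode.
assert pure_virtualizable({"lgdt", "hlt"}, {"lgdt", "hlt", "in", "out"})

# Classic x86: popf is sensitive but not privileged, so S is not a subset
# of P, and trap-and-emulate alone is not enough.
assert not pure_virtualizable({"lgdt", "popf"}, {"lgdt", "hlt"})
```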
#### What happens when a privileged instruction is executed in a virtual machine running a type-1 hypervisor? In what two ways may this event be handled by the hypervisor?
The real processor traps to the hypervisor. The hypervisor checks the state of the virtual machine. If it is in virtual user mode, then control is passed to the guest OS. If it is in virtual kernel mode, then the effect of the instruction is emulated by the hypervisor.
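The two cases above can be sketched as a trap handler that branches on the VM's *virtual* mode. The data structures and instruction names are invented, not taken from a real hypervisor:

```python
# Sketch of a type-1 hypervisor's trap handler -- illustrative only.
def handle_trap(vm, instruction):
    if vm["virtual_mode"] == "user":
        # The guest thinks one of its applications trapped: reflect the
        # trap into the guest OS (e.g. its system-call handler).
        return ("guest_os", instruction)
    else:
        # The guest kernel itself ran a privileged instruction:
        # the hypervisor emulates its effect on the virtual hardware.
        return ("hypervisor_emulates", instruction)

assert handle_trap({"virtual_mode": "user"}, "int 0x80") == ("guest_os", "int 0x80")
assert handle_trap({"virtual_mode": "kernel"}, "lgdt") == ("hypervisor_emulates", "lgdt")
```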
#### What does the x86 popf instruction do? What is unusual about it? Why is this a problem for "trap-and-emulate" / pure virtualization?
popf pops the flags register off the stack. When executed in user mode, the change to the interrupt-enable flag is silently ignored instead of causing a trap. It is therefore sensitive but not privileged: it will not cause a trap, so the hypervisor never gets the chance to emulate it.
No, there are workarounds.
VMware
It uses binary translation. It scans for sensitive instructions that are not privileged and rewrites them as calls into the hypervisor.
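A toy version of this translation pass, assuming an invented instruction set: scan a basic block and rewrite the sensitive-but-unprivileged instructions into explicit hypervisor calls, leaving innocuous instructions to run natively.

```python
# Toy binary-translation pass -- instruction names are illustrative.
SENSITIVE_NOT_PRIVILEGED = {"popf", "sgdt", "smsw"}

def translate(block):
    out = []
    for insn in block:
        if insn in SENSITIVE_NOT_PRIVILEGED:
            out.append(f"call hypervisor_{insn}")  # emulate in the VMM
        else:
            out.append(insn)                       # safe to run natively
    return out

assert translate(["mov", "popf", "add"]) == ["mov", "call hypervisor_popf", "add"]
```

Real binary translators work on machine code, cache translated blocks, and patch branch targets, but the core idea is this rewrite.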
VT-x allows the processor to notify a hypervisor when a configurable set of events occur.
Binary translation, as it is less expensive.
- The trap goes to the hypervisor which sees it was generated by code executing in virtual user mode.
- The hypervisor hands control to the system call handling code in the guest operating system
- The guest operating system executes the system call.
#### What happens when a privileged instruction is executed in a virtual machine running on a type-2 hypervisor? In what two ways may this event be handled by the hypervisor?
The host operating system passes control to the hypervisor, which hands control to the system-call-handling code in the guest operating system. The guest operating system then executes the system call.
- A software interrupt is generated
- A trap goes to the host operating system
- The hypervisor registers itself as a debugger with the host and is passed control by the host in response to the trap.
- The hypervisor determines that the trap was caused by code executing in virtual user mode.
- The hypervisor hands control to the system call handling code in the guest operating system.
- The guest operating system executes the system call.
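The six steps above form a delegation chain: each party forwards the event until the guest OS's system-call handler finally runs. A sketch, with invented step names:

```python
# The type-2 system-call path as a trace -- illustrative only.
def type2_syscall_path(syscall):
    return [
        "software interrupt generated",
        "trap to host OS",
        "host OS -> hypervisor (registered as a debugger)",
        "hypervisor: trap came from virtual user mode",
        "hypervisor -> guest OS system-call handler",
        f"guest OS executes {syscall}",
    ]

assert type2_syscall_path("read()")[-1] == "guest OS executes read()"
assert len(type2_syscall_path("read()")) == 6
```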