CS 439H

Gheith's spinLock

class SpinLock{
    Atomic<bool> isTaken = false;
public:
    lock(){
        while(isTaken.exchange(true));
        // exchange returns value at location
    }
    unlock(){
        isTaken.set(false);
    }
}

If isTaken is true to start with you will just an infinite loop. If initially false, and is changed to true and that is returned, we know we are allowed to go and do work.

Semaphore

class Semaphore {
    uint32_t count;
    Queue<Thread> blocked
public:
    Semaphore (uint32_t count) : count(count), blocked(new Queue<Thread>());
    void down(){
        if (count == 0){
            blocked.add(Thread::current);
            Thread::yield(); // don't add self to readyQ
        }else{
            count--; // being atomic would not help much
        }
    }
    void up(){}
}

You cannot surround down() with a lock i.e.

lock.lock(){

}lock.unlock

System would spin, and there would be no way to unlock the lock. Semaphore would be done with. Must build mechanism to release the Semaphore

//down:
lock();
if(count == 0){
    blocked.add(me);
    unlock();
}

Thread A is running the former. Around the same time Thread B (runs the latter) gets spinlock and tries to help Thread A onto the readyQ

//up:
lock();
if(someoneIsWaiting){
    readyQ->add(someone);
}

What if someone changes count while a thread is trying to block.
- Run yield before unlock, and only unlock if yield is successful
- Add a new DND state that tells other threads not to bother you. Allowing you to finish your job

Create idleThread for cores to run when they have nothing to do.

if(first){
    //  .
    //  .
    //  .
    kernelMain();
}else{
    while(true)Thread::yield();
}

currentThread using PerCPU

Figuring out who you are using SMP::me()

Other uses of Semaphores

Networking

Consider multiple threads running on server, producing data
- Called Producers
Consumers, or workers
- pick up what the produceres output and do work on given data
  - Called the Producer/Consumer problem
Any data structure that does not forget, like a Queue, sits between them
- Keep it atomic
Put backpressure on Producers (flow control)
- When consumers cannot handle work they tell Producers to Stop It
- Have to be able to tell Producers to wait if the finite-queue is full

Bounded-buffer

Can handle the Q and telling the sender a status

template<typename T, int N> //N is # of elements in buffer
class BB{
    T data[N];
    void put(T e){ ... }
    T get(){ ... }
}

Both put() and get() are able to block
- In the scenario that the buffer is full on put(), or if the buffer is empty on get()
- Must have two flags instead of semaphore to say whether semaphore is empty/full

class B <N>{
    Semaphore nEmpty = N;
    Semaphore nFull = 0;
    Semaphore lock; // just used for protection of data structure, could also be a spin

    put(T e){
        nEmpty.down(); // if nEmpty is 0, then you block
        lock.down();
        // need an atomic Q
        // .
        // . compete over it
        // .
        lock.up();
        nFull.up();
    }

    T get(){
        nFull.down();
        //compete over the Q
        nEmpty.up();

    }
}

By competing over the semaphore, we know that the right number of threads are allowed in.

lock.down();
// need an atomic Q
// .
// . compete over it
// .
lock.up();

This is simply a thread safe Q

Example of bad put and get

//put:
lock();
nEmpty.down();
//.
//.
//.
nFull.up();
unlock();

//get:
lock();
nFull.down();
//.
//.
//.
nEmpty.up();
unlock();

By swapping pairs of commands, we can be deadlock free

Barrier

Barrier(int n) : n(n){}
void sync(){
    //up to n - 1 threads can sync
    //nth thread makes every other thread wake up
    n = n - 1;
    if(n.add_fetch(-1) == 0){
        wait.up();
    }else{
        wait.down();
        wait.up();
    }// add check to see if less than 0 for invariant
    
}

Monitor

Monitor_bounded_buffer{
    condition queueIsNotFull;
    condition queueIsNotEmpty;
    //gurantees mutual exclusion
    void put(T v){
        [lock]
        while(queueIsFull()){
            [unlock]
            wait queueIsNotFull; // wait is a keyword
            [lock]
            // condition is pulse not a state. have to wait for the signal
            // once out of wait we know q is not full
        }
        signal queueIsNotEmpty;
    }
    T get(){
        if(queueIsEmpty()){
            wait queueIsNotEmpty;

        }
        <<is not empty>>
        // do stuff to get from queue
        <<isNotFull>>
        // q cannot be full since something is removed
        signal queueIsNotFull;
        // hand off lock
        // what if I want to continue doing work here
    }
}
bounded_buffer{
    synchronized void put(){}
    synchronized void get(){}
}

Have to say synchronized
Would be a good alternative that upon wake up you hand off the lock
When signaled, the lock is given away to other monitor so no one can sneak in and change the state
When singaling, we expect the other monitor to check the state of the Queue

Problems

If want to work after I signal
- just make it illegal to work after the signal
actually very difficult to atomically unlock and then wait
- if not atomic there will be race conditions

Fundamental Problem

Nested Monitor Problem

P6

For every data structure we have to ask ourselves:

is it global
per per thread
per CPU Try to make everything per thread.

Examine every data structure in P4 and check what scope it can have

active thread ptr, should be per core
- use PerCpu to get

Figure out locking mechanism
Find shortest time needed to hold the lock, otherwise race conditions will arise

if(readyQ.isEmpty()){
    // stuff
}

Need lock to preserve invariants. Want to presrve the fact that the Q is empty while dealing with the Q (that is supposed to be empty)

Semaphore details

//down:
spin.lock(); // lock should be of the same scope as the data structure
if(count == 0){
    waitingForMe.add(Thread::current());
    // cannot unlock here because thread could be picked up before contextSwitch happens
    // set state that youre in the state of blocking
    yieldButDontReady(); // without unlock its a deadlock
}
//up:
spin.lock();

Ask next thread on readyQ to unlock for you

Use the callback class

Deadlocks

Dining philosophers problem Reason for deadlocks

Mutual exclusion
- avoid this to never have deadlocks

Things don't exclude each other from getting work done

Hold and wait
- Someone holds a resource and prevents others from using it
- never grab more than one thing at a time. Set one lock at a time build hierarchical locks
Non-preemption
- notion of transactions so that there is some history of some transaction
- 2 phase locking
  - get all locks up front.
  - You tell me all the locks you want, in any order you like. Go through phase where we get all your resources. While doing this, if system detecs a deadlock, it can take away resources to give to someone else. Then later these resources can be given back.
Cycles
- cyclical waiting
- change natural order and remove one

If we construct of resource graph, and we have a cycle, we immediately have a deadlock.

Have to start implementing tech for recovert

Midterm

monitor BB{

    condition c1;
    condition c2;
    /* cannot tough condition from outside */

    void m1(){

    }
    void m2(){
        if(notHappy) return;

    }
}

class Barrier{
    Atomic<uint32_t> ctr;
    Semaphore wait;
public:
    Barrier(uint32_t ctr): ctr(ctr), wait(0){

    }
    void sync(){
        if(ctr.add_fetch(-1) == 0 ){ 
            // terminal state
            wait.up();
        }else{
            wait.down();
            wait.up();
        }
    }
}

monitor Barrier{
    condition allHere;
    uint32_t ctr;

    void sync(){
        if(ctr == 0) return;
        ctr--;
        if(ctr == 0){
            signal allHere;
        }else{
            while(ctr != 0){
                wait allHere;
            }
            signal allHere;
        }
    }
}

sem2.down()l
bool isSecond = firstIsHere(true);
if(isSecond){
    y = data;
    secondIsHere.up();
    xIsReady.down();

    return x;
}else{
    x = data;
    xIsReady.up()
    secondIsHere.down();
    T temp = y
    sem2.up();
    return y;
}

P8

Program
- file with executable things in it
- file
  - bytes that belong together that can be written to and read from
- executable
  - has meta-data that describes the contents of the file
    - magic number
    - headers that show what needs to be loaded from memory
  - most linux exec.'s are ELF files
  - data
  - instructions
- after being read in, executable is converted to a process
  - process is a program loaded in memory
  - give it a thread
  - set address space
- n/a
  - fork
    - creates copy of process
    - copy will have a different pid
    - when copy returns it returns again (since it is a copy of running process)
    - ultimately they run independently
    - has ppid (parent pid)

int id = fork();
if (id < 0){
    /* parent,
    no child, failed */
    printf("%d\n", errno);
}else if(id == 0){
    /* child */
}else{
    /* parent */
    /* id is the child pid */
}

exec

start running new program from entry point exec???("gcc",...)

wait

wait(...) waitpid(. ) - parent waits (join)

system calls needed for p8

fork exec wait ( family ) kill alarm ( read about these ) open (files ) creat

-o -e: fork, open files, change stdout/err,

Virtual Memory

We have managed to get processes to run in parallel etc. Now what we want to do is

Give the illusion that they are using the full address space
Also want to isolate them from other processes
That there is more virtual than physical memory
- give 64MB but make it seem that there is 4GB
less virtual memory than physical
demand paging
- Only bring things into memory when they are requested.
- Don't have memory so will have to kick someone out
protection
- mechanism that allows reading but not writing
optimize program loading
- thanks to demand paging, don't have to load the whole thing
copy-on-write (COW)

MMU (Memory management unit)

Handles the memory mapping for different processes Takes an address (virtual address) and outputs an address (physical address)

base register is added to virtual address and creates physical address
- this is a simple memory management method
- depending on the process running, you put different values in the reigster
- doesn't achieve isolation
  - typical solution is to add a limit register which specifies the highest limit on how much memory one can use
  - have a compare circuit that checks the virtual address with the limit. if VA is greater than limit we throw a segmentation violation
  - cannot do demand paging though
When the system resets, the "trustMeImTheKernel" register is set to 1
- This needs to be set to 0 right before a user program called
- Hardware has to implement switch of this register on call to kernel
- In x86 there is a CPL, which is a number from 0-4

Problems with external fragmentation

Cannot have some number of registers that have multiple bases

What if we just split up the address space into equal sized blocks of 1 MB
- make pages 1000 B
- VA = 1700
- virtual page # (VPN) = VA / page size = 1700 / 1000 = 1
  - By restricting page size to a power of two, all you have to do is shift by $\log_2(\text{pagesize})$
  - Now how do we get the physical page number?
- (PPN, ok) = MMU(VPN) for some function in the MMU
  - "ok" tells you whether there is an actual corresponding page number
```
(ppn, ok) = pageTable[vpn]
// with multi level (ppn,ok) = PD[l1][l2]
if(ok){
    pa = ppn * pagesize + offset;
}else{
    pageFault
}
```
- PA = PPN * pageSize + offset
  - in this case... offset = VA % pageSize
- All these calculations are done by the hardware
- in x86 the head of the multi level tables is called the page directory (PD)
  - children are called page table (PT)
  - the whole thing is also called the page table (PT)
- for context switching, a register is dedicated to pointing to the page directory
  - this register is called CR3
- PDE is a page directory entry (x86)
  - contains PPN of following PT
  - stored in high order 20 bits
  - starting at the least significant bit:
    - present bit. 0 = invalid, 1 = valid
    - RW, read/write, 0 = read-only, 1 = write
    - US, user supervisor, 0 = supervisor (CPL == 0 [most privileged process]), 1 = user
    - D, dirty, if a program is writing to this PDE hardware will set the dirty bit.
    - A, accessed, 0 = has not been accessed, 1 = has been accessed
  - Redirects us to a PTE, this has the same flags as the PDE

Page Tables

Cannot have a table with an entry per memory address. So we increase the page size; however, now we increase the page size.

Physical pages
- AKA
  - page frames
  - frames
  - physical frames The table index is representative of the VPN, while the returned value within the table is the PPN Don't use magic values for things being empty etc.
TLB
- translation look-aside buffer
- cache of finite size. while traversing page table (walking), answers are put in the TLB
  - TLB holds the VPN & PPN (along with extra bits)
- every core has its own TLB
- only way virtual memory actually works
- TLB is not aware of modification to the Virtual Memory
- x86 has INVLPG instruction which clears TLB page address
  - TLB shootdown, have to go invalidate other TLB if there are multi processors

User process

Need to give it some address space - give it a tree

Have to have a loader
- The thing that reads ELF file
- This action sits behind exec

Disk

* is a just an array of sectors
* used to be 512 byte sectors
* in modenr times the sectors are 4k

Put files in a tree
- Files in this hierarchy have permissions (and other things)
- files are first created in a sector, where the last 4 bytes are a pointer to the next sector containing more data
- the super block has information about where the information about the file system is
  - This is usually stored at the beginning of the file system
- meta data
  - file name
  - starting block
  - size
  - permissions
  - type
    1. directory
    2. file Similar to virtual memory. The offeset in the beginnin gets you into the file sytem to get the first sector of the file

i-node

mode
type
- regular
- directory
size
~~pointer to some block 0, for a file~~ direct block
- if overflow use indirect block
- after that we have an even deeper one (3 levels of inderection)
  - block that points to block that points data block

ext2 filesystem

has super block, needs place to store indoes and data blocks
- hence there exists an i-table, which is full of inodes
- the index of the inode will tell you which one it is in the table
  - i.e. the inodes are acutally stored in the itable, there aren't any pointers there or anthing
often one will run out of inodes, even though there is a lot of storage space left
file system memory
- super block
  - the ~~following data~~ super block and following data is called a group and repeats multiple times
- to know where there are entries in the i table we have a i-node bitmap
  - the bitmap is at the beginning of the filesystem somehwere right arond where the super block is
- data bitmap
  - is after the i-node bitmap
IDE
- integrated device electronicse
mounting a filesystem is similar to opening a file
- mount opens the device, looks for superblock (index 0), make sure it is "bobfs"
- questions, does inode represent file, directory, how many bytes are there in the file, how many things point to the inode
every system has certain end of file activty

size of i-table

consider block size of 1kb on disk we have

super lbock
data bitmap
inode bitmap
itable
data

number of inode = $2^10$*8 = 2^13 = 8kb sinode bytes per itable = 2^13 * 16 = 2^17 = 128k blocks per itable = 2^17/2^10 = 2^7 = 128 blocks number of data blocks ≤ 8k max data ≤ 8k (data blocks) * 1kb (block size) = 8MB can store 8k - 1 files since we have to reserve one for a the root

what is the largst file you can have?
- depends on levels of indirection, we only have one
- in a our first indirect block we can have 256 more blocks, plus the single direct block
  - we can have 257 blocks representing a signle file
- largest possible file = 257 blocks = 257 kb
- total space = 257k data + 1k indirection block + 16 bytes for inode + 258 bits for data bitmap + 1 bit for inode bitmap
  - actual data
    - 257k data
  - over head
    - 1k indirection block + 16 bytes for inode + 258 bits for data bitmap + 1 bit for inode bitmap
i nodes don't have names, don't know what their name is when inside the block
linux only allows one link going into files
- directories can only have one name so .. can simply list parent by name
- some entries in directories contain files while others point to other directories
files
- can have as many links to them as they want

devices are access by mount <what> (/dev/sdl) <where> (/volume/backup)

Two types of devices
- block devices
- character device
- networks are kind of another, kinda like a character device
- graphics have their own special behavior too

i node

there are different types

special node
- needs to know how to lie about file system operation
- store major and minor number inside special i-node
  - major number represents the device you represent
  - minor number

Multiple groups

order of design
- super block
- group descriptor
- block bitmap
- inode bitmap
- inode
- block
i-node
- mode (2 bytes)
  - first 4 bytes, file directory, link, BLK, CHAR, socket
  - last 12, permission
- owner (2 bytes)
  - user will have UID
- owner GID (2 bytes)
- size
- block 0- 11
- one level
- two level
- three level
- gen
- a time ( 4 bytes ), data
  - last time accessed
- m time ( 4 bytes ), data
  - last time modified
- c time, i node
  - ?

Process

read, write, 9 bits, 3 apply to user, 3 apply to group, 3 apply to others

user
- rwx
group
- rwx
others
- rwx
3 more bits
- set uid, can run with privileges of uid given.
- sticky bit was a hack at first
  - tells the kernel that something should stick in memory (usually runs a lot) access control list - list of actors/processes allowed to modify resources/ passive entities
fd is index into file descriptor list int fd = open(name, flags, node) fd represents the "capability box" which has to be signed

general form of capabilities

who
what
whom
when
where

f = m^p mod n. signature is f.

entries

name and a correspodning number
ability to read something is the ability to read these entries

Distance from hardware

Python vs C. C is not safe because classes can stop being classes etc.

Rings
- Hw
- Ring 0
  - supervisor, kernel
    - trap, interrupt (software interrupts), faults, aborts
    - TSS - task state segment
      - stack will have 3 more pointers to 3 other stacks
    - TSSR points to the TSS (TSSR is protected)
    - IDTR points to IDT
      - who ever writes the system says where to point to when you have a page fault
        
        it is exactly like signals at the hardware level
- ring 1 (CPL 1)
- Ring 2 (CPL 2)
  - Python runtime
- ring 3 (CPL 3) Whenever there is an iterrupt, the harware decides whetehr its witching stacks. and if so it saves all the stack information (whichi si two parts, area where stack lives). NOw its safe to switch stacks since we know its safe to switch. Nothing is pushed to the user stack, it is not trusted.
- EIP (3)
- CS (3)
- EFLAGS (3)
- ESP (3)
- SS (3) Trap and interrupt are very similar in structure.

IDK

there are virtual machines and containers.

Containers grab a process and tell the machine that they are actually a different operating system.
can create a jail, so that we make a seperate file system specific to a single process within a file system
- command called chroot in unix
- once in jail, cannot get out
- problem is that PIDs are system wide
  - keep mapping inside kernel

process dfines an adress space, that is the virtual memory for that process process is also something that runs, so it better have a thread that executes. and that thread should know who it belongs to what ever reosuces that belong to the process, semaphore, etc. this is where file descriptors would go etc. the process class better tie all of this together

user level process is similar to kernel level thread - also ask yourself where it gets the stack from?

TODO

Read about phantom references check page 872 783 - mkdir, rmdir, link, unlink ( read this ) iret

michaelasper/notes.md