class SpinLock{
Atomic<bool> isTaken = false;
public:
lock(){
while(isTaken.exchange(true));
// exchange returns value at location
}
unlock(){
isTaken.set(false);
}
}
If isTaken
is true to start with you will just an infinite loop. If initially false, and is changed to true and that is returned, we know we are allowed to go and do work.
class Semaphore {
uint32_t count;
Queue<Thread> blocked
public:
Semaphore (uint32_t count) : count(count), blocked(new Queue<Thread>());
void down(){
if (count == 0){
blocked.add(Thread::current);
Thread::yield(); // don't add self to readyQ
}else{
count--; // being atomic would not help much
}
}
void up(){}
}
You cannot surround down()
with a lock i.e.
lock.lock(){
}lock.unlock
System would spin, and there would be no way to unlock the lock. Semaphore would be done with. Must build mechanism to release the Semaphore
//down:
lock();
if(count == 0){
blocked.add(me);
unlock();
}
Thread A is running the former. Around the same time Thread B (runs the latter) gets spinlock and tries to help Thread A onto the readyQ
//up:
lock();
if(someoneIsWaiting){
readyQ->add(someone);
}
- What if someone changes
count
while a thread is trying to block.- Run yield before unlock, and only unlock if yield is successful
- Add a new DND state that tells other threads not to bother you. Allowing you to finish your job
Create idleThread for cores to run when they have nothing to do.
if(first){
// .
// .
// .
kernelMain();
}else{
while(true)Thread::yield();
}
currentThread
using PerCPU
Figuring out who you are using SMP::me()
- Consider multiple threads running on server, producing data
- Called Producers
- Consumers, or workers
- pick up what the produceres output and do work on given data
- Called the Producer/Consumer problem
- pick up what the produceres output and do work on given data
- Any data structure that does not forget, like a Queue, sits between them
- Keep it atomic
- Put backpressure on Producers (flow control)
- When consumers cannot handle work they tell Producers to
Stop It
- Have to be able to tell Producers to wait if the finite-queue is full
- When consumers cannot handle work they tell Producers to
Can handle the Q and telling the sender a status
template<typename T, int N> //N is # of elements in buffer
class BB{
T data[N];
void put(T e){ ... }
T get(){ ... }
}
- Both
put()
andget()
are able to block- In the scenario that the buffer is full on
put()
, or if the buffer is empty onget()
- Must have two flags instead of semaphore to say whether semaphore is empty/full
- In the scenario that the buffer is full on
class B <N>{
Semaphore nEmpty = N;
Semaphore nFull = 0;
Semaphore lock; // just used for protection of data structure, could also be a spin
put(T e){
nEmpty.down(); // if nEmpty is 0, then you block
lock.down();
// need an atomic Q
// .
// . compete over it
// .
lock.up();
nFull.up();
}
T get(){
nFull.down();
//compete over the Q
nEmpty.up();
}
}
By competing over the semaphore, we know that the right number of threads are allowed in.
lock.down();
// need an atomic Q
// .
// . compete over it
// .
lock.up();
This is simply a thread safe Q
//put:
lock();
nEmpty.down();
//.
//.
//.
nFull.up();
unlock();
//get:
lock();
nFull.down();
//.
//.
//.
nEmpty.up();
unlock();
By swapping pairs of commands, we can be deadlock free
Barrier(int n) : n(n){}
void sync(){
//up to n - 1 threads can sync
//nth thread makes every other thread wake up
n = n - 1;
if(n.add_fetch(-1) == 0){
wait.up();
}else{
wait.down();
wait.up();
}// add check to see if less than 0 for invariant
}
Monitor_bounded_buffer{
condition queueIsNotFull;
condition queueIsNotEmpty;
//gurantees mutual exclusion
void put(T v){
[lock]
while(queueIsFull()){
[unlock]
wait queueIsNotFull; // wait is a keyword
[lock]
// condition is pulse not a state. have to wait for the signal
// once out of wait we know q is not full
}
signal queueIsNotEmpty;
}
T get(){
if(queueIsEmpty()){
wait queueIsNotEmpty;
}
<<is not empty>>
// do stuff to get from queue
<<isNotFull>>
// q cannot be full since something is removed
signal queueIsNotFull;
// hand off lock
// what if I want to continue doing work here
}
}
bounded_buffer{
synchronized void put(){}
synchronized void get(){}
}
- Have to say synchronized
- Would be a good alternative that upon wake up you hand off the lock
- When signaled, the lock is given away to other monitor so no one can sneak in and change the state
- When singaling, we expect the other monitor to check the state of the Queue
- If want to work after I signal
- just make it illegal to work after the signal
- actually very difficult to atomically unlock and then wait
- if not atomic there will be race conditions
- Nested Monitor Problem
For every data structure we have to ask ourselves:
- is it global
- per per thread
- per CPU Try to make everything per thread.
- Examine every data structure in P4 and check what scope it can have
- active thread ptr, should be per core
- use PerCpu to get
- Figure out locking mechanism
- Find shortest time needed to hold the lock, otherwise race conditions will arise
if(readyQ.isEmpty()){
// stuff
}
Need lock to preserve invariants. Want to presrve the fact that the Q is empty while dealing with the Q (that is supposed to be empty)
//down:
spin.lock(); // lock should be of the same scope as the data structure
if(count == 0){
waitingForMe.add(Thread::current());
// cannot unlock here because thread could be picked up before contextSwitch happens
// set state that youre in the state of blocking
yieldButDontReady(); // without unlock its a deadlock
}
//up:
spin.lock();
Ask next thread on readyQ to unlock for you
- Use the callback class
Dining philosophers problem Reason for deadlocks
- Mutual exclusion
- avoid this to never have deadlocks
- Things don't exclude each other from getting work done
- Hold and wait
- Someone holds a resource and prevents others from using it
- never grab more than one thing at a time. Set one lock at a time build hierarchical locks
- Non-preemption
- notion of transactions so that there is some history of some transaction
- 2 phase locking
- get all locks up front.
- You tell me all the locks you want, in any order you like. Go through phase where we get all your resources. While doing this, if system detecs a deadlock, it can take away resources to give to someone else. Then later these resources can be given back.
- Cycles
- cyclical waiting
- change natural order and remove one
If we construct of resource graph, and we have a cycle, we immediately have a deadlock.
Have to start implementing tech for recovert
monitor BB{
condition c1;
condition c2;
/* cannot tough condition from outside */
void m1(){
}
void m2(){
if(notHappy) return;
}
}
class Barrier{
Atomic<uint32_t> ctr;
Semaphore wait;
public:
Barrier(uint32_t ctr): ctr(ctr), wait(0){
}
void sync(){
if(ctr.add_fetch(-1) == 0 ){
// terminal state
wait.up();
}else{
wait.down();
wait.up();
}
}
}
monitor Barrier{
condition allHere;
uint32_t ctr;
void sync(){
if(ctr == 0) return;
ctr--;
if(ctr == 0){
signal allHere;
}else{
while(ctr != 0){
wait allHere;
}
signal allHere;
}
}
}
sem2.down()l
bool isSecond = firstIsHere(true);
if(isSecond){
y = data;
secondIsHere.up();
xIsReady.down();
return x;
}else{
x = data;
xIsReady.up()
secondIsHere.down();
T temp = y
sem2.up();
return y;
}
- Program
- file with executable things in it
- file
- bytes that belong together that can be written to and read from
- executable
- has meta-data that describes the contents of the file
- magic number
- headers that show what needs to be loaded from memory
- most linux exec.'s are ELF files
- data
- instructions
- has meta-data that describes the contents of the file
- after being read in, executable is converted to a process
- process is a program loaded in memory
- give it a thread
- set address space
- n/a
- fork
- creates copy of process
- copy will have a different pid
- when copy returns it returns again (since it is a copy of running process)
- ultimately they run independently
- has ppid (parent pid)
- fork
int id = fork();
if (id < 0){
/* parent,
no child, failed */
printf("%d\n", errno);
}else if(id == 0){
/* child */
}else{
/* parent */
/* id is the child pid */
}
start running new program from entry point exec???("gcc",...)
wait(...) waitpid(. ) - parent waits (join)
fork exec wait ( family ) kill alarm ( read about these ) open (files ) creat
-o -e: fork, open files, change stdout/err,
We have managed to get processes to run in parallel etc. Now what we want to do is
- Give the illusion that they are using the full address space
- Also want to isolate them from other processes
- That there is more virtual than physical memory
- give 64MB but make it seem that there is 4GB
- less virtual memory than physical
- demand paging
- Only bring things into memory when they are requested.
- Don't have memory so will have to kick someone out
- protection
- mechanism that allows reading but not writing
- optimize program loading
- thanks to demand paging, don't have to load the whole thing
- copy-on-write (COW)
Handles the memory mapping for different processes Takes an address (virtual address) and outputs an address (physical address)
- base register is added to virtual address and creates physical address
- this is a simple memory management method
- depending on the process running, you put different values in the reigster
- doesn't achieve isolation
- typical solution is to add a limit register which specifies the highest limit on how much memory one can use
- have a compare circuit that checks the virtual address with the limit. if VA is greater than limit we throw a segmentation violation
- cannot do demand paging though
- When the system resets, the "trustMeImTheKernel" register is set to 1
- This needs to be set to 0 right before a user program called
- Hardware has to implement switch of this register on call to kernel
- In x86 there is a CPL, which is a number from 0-4
Cannot have some number of registers that have multiple bases
- What if we just split up the address space into equal sized blocks of 1 MB
- make pages 1000 B
- VA = 1700
- virtual page # (VPN) = VA / page size = 1700 / 1000 = 1
- By restricting page size to a power of two, all you have to do is shift by
$\log_2(\text{pagesize})$ - Now how do we get the physical page number?
- By restricting page size to a power of two, all you have to do is shift by
- (PPN, ok) = MMU(VPN) for some function in the MMU
- "ok" tells you whether there is an actual corresponding page number
(ppn, ok) = pageTable[vpn] // with multi level (ppn,ok) = PD[l1][l2] if(ok){ pa = ppn * pagesize + offset; }else{ pageFault }
- PA = PPN * pageSize + offset
- in this case... offset = VA % pageSize
- All these calculations are done by the hardware
- in x86 the head of the multi level tables is called the page directory (PD)
- children are called page table (PT)
- the whole thing is also called the page table (PT)
- for context switching, a register is dedicated to pointing to the page directory
- this register is called CR3
- PDE is a page directory entry (x86)
- contains PPN of following PT
- stored in high order 20 bits
- starting at the least significant bit:
- present bit. 0 = invalid, 1 = valid
- RW, read/write, 0 = read-only, 1 = write
- US, user supervisor, 0 = supervisor (CPL == 0 [most privileged process]), 1 = user
- D, dirty, if a program is writing to this PDE hardware will set the dirty bit.
- A, accessed, 0 = has not been accessed, 1 = has been accessed
- Redirects us to a PTE, this has the same flags as the PDE
Cannot have a table with an entry per memory address. So we increase the page size; however, now we increase the page size.
- Physical pages
- AKA
- page frames
- frames
- physical frames The table index is representative of the VPN, while the returned value within the table is the PPN Don't use magic values for things being empty etc.
- AKA
- TLB
- translation look-aside buffer
- cache of finite size. while traversing page table (walking), answers are put in the TLB
- TLB holds the VPN & PPN (along with extra bits)
- every core has its own TLB
- only way virtual memory actually works
- TLB is not aware of modification to the Virtual Memory
- x86 has INVLPG instruction which clears TLB page address
- TLB shootdown, have to go invalidate other TLB if there are multi processors
Need to give it some address space - give it a tree
- Have to have a loader
- The thing that reads ELF file
- This action sits behind exec
* is a just an array of sectors
* used to be 512 byte sectors
* in modenr times the sectors are 4k
- Put files in a tree
- Files in this hierarchy have permissions (and other things)
- files are first created in a sector, where the last 4 bytes are a pointer to the next sector containing more data
- the super block has information about where the information about the file system is
- This is usually stored at the beginning of the file system
- meta data
- file name
- starting block
- size
- permissions
- type
- directory
- file Similar to virtual memory. The offeset in the beginnin gets you into the file sytem to get the first sector of the file
- mode
- type
- regular
- directory
- size
pointer to some block 0, for a filedirect block- if overflow use indirect block
- after that we have an even deeper one (3 levels of inderection)
- block that points to block that points data block
-
has super block, needs place to store indoes and data blocks
- hence there exists an i-table, which is full of inodes
- the index of the inode will tell you which one it is in the table
- i.e. the inodes are acutally stored in the itable, there aren't any pointers there or anthing
-
often one will run out of inodes, even though there is a lot of storage space left
-
file system memory
- super block
- the
following datasuper block and following data is called a group and repeats multiple times
- the
- to know where there are entries in the i table we have a i-node bitmap
- the bitmap is at the beginning of the filesystem somehwere right arond where the super block is
- data bitmap
- is after the i-node bitmap
- super block
-
IDE
- integrated device electronicse
-
mounting a filesystem is similar to opening a file
- mount opens the device, looks for superblock (index 0), make sure it is "bobfs"
- questions, does inode represent file, directory, how many bytes are there in the file, how many things point to the inode
-
every system has certain end of file activty
consider block size of 1kb on disk we have
- super lbock
- data bitmap
- inode bitmap
- itable
- data
number of inode =
- what is the largst file you can have?
- depends on levels of indirection, we only have one
- in a our first indirect block we can have 256 more blocks, plus the single direct block
- we can have 257 blocks representing a signle file
- largest possible file = 257 blocks = 257 kb
- total space = 257k data + 1k indirection block + 16 bytes for inode + 258 bits for data bitmap + 1 bit for inode bitmap
- actual data
- 257k data
- over head
- 1k indirection block + 16 bytes for inode + 258 bits for data bitmap + 1 bit for inode bitmap
- actual data
- i nodes don't have names, don't know what their name is when inside the block
- linux only allows one link going into files
- directories can only have one name so .. can simply list parent by name
- some entries in directories contain files while others point to other directories
- files
- can have as many links to them as they want
devices are access by mount <what> (/dev/sdl) <where> (/volume/backup)
- Two types of devices
- block devices
- character device
- networks are kind of another, kinda like a character device
- graphics have their own special behavior too
there are different types
- special node
- needs to know how to lie about file system operation
- store major and minor number inside special i-node
- major number represents the device you represent
- minor number
- order of design
- super block
- group descriptor
- block bitmap
- inode bitmap
- inode
- block
- i-node
- mode (2 bytes)
- first 4 bytes, file directory, link, BLK, CHAR, socket
- last 12, permission
- owner (2 bytes)
- user will have UID
- owner GID (2 bytes)
- size
- block 0- 11
- one level
- two level
- three level
- gen
- a time ( 4 bytes ), data
- last time accessed
- m time ( 4 bytes ), data
- last time modified
- c time, i node
- ?
- mode (2 bytes)
read, write, 9 bits, 3 apply to user, 3 apply to group, 3 apply to others
- user
- rwx
- group
- rwx
- others
- rwx
- 3 more bits
- set uid, can run with privileges of uid given.
- sticky bit was a hack at first
- tells the kernel that something should stick in memory (usually runs a lot) access control list - list of actors/processes allowed to modify resources/ passive entities
- fd is index into file descriptor list
int fd = open(name, flags, node)
fd represents the "capability box" which has to be signed
general form of capabilities
- who
- what
- whom
- when
- where
f = m^p mod n. signature is f.
entries
- name and a correspodning number
- ability to read something is the ability to read these entries
Python vs C. C is not safe because classes can stop being classes etc.
- Rings
- Hw
- Ring 0
- supervisor, kernel
- trap, interrupt (software interrupts), faults, aborts
- TSS - task state segment
- stack will have 3 more pointers to 3 other stacks
- TSSR points to the TSS (TSSR is protected)
- IDTR points to IDT
- who ever writes the system says where to point to when you have a page fault
- it is exactly like signals at the hardware level
- who ever writes the system says where to point to when you have a page fault
- supervisor, kernel
- ring 1 (CPL 1)
- Ring 2 (CPL 2)
- Python runtime
- ring 3 (CPL 3) Whenever there is an iterrupt, the harware decides whetehr its witching stacks. and if so it saves all the stack information (whichi si two parts, area where stack lives). NOw its safe to switch stacks since we know its safe to switch. Nothing is pushed to the user stack, it is not trusted.
- EIP (3)
- CS (3)
- EFLAGS (3)
- ESP (3)
- SS (3) Trap and interrupt are very similar in structure.
there are virtual machines and containers.
- Containers grab a process and tell the machine that they are actually a different operating system.
- can create a jail, so that we make a seperate file system specific to a single process within a file system
- command called
chroot
in unix - once in jail, cannot get out
- problem is that PIDs are system wide
- keep mapping inside kernel
- command called
process dfines an adress space, that is the virtual memory for that process process is also something that runs, so it better have a thread that executes. and that thread should know who it belongs to what ever reosuces that belong to the process, semaphore, etc. this is where file descriptors would go etc. the process class better tie all of this together
user level process is similar to kernel level thread - also ask yourself where it gets the stack from?
Read about phantom references check page 872 783 - mkdir, rmdir, link, unlink ( read this ) iret
thanks to marc