@mishudark
Created November 8, 2017 14:02
redred
Especially in things like Ryzen where every core can talk to any other core arbitrarily
09:58 travisg
ryzen is not special at all
travisg
it's the definition of a symmetric multiprocessing system: each cpu is equal
redred
Really? AFAIK with intel core to core communication goes through a ringbus
developerhdf
travisg: So the real upside though is security and being able to update an os separately from its drivers? ie reduce fragmentation and increase the lifespan of devices?
travisg
yes, but the interconnect between the cpus is hidden from software
travisg
except that there may be slightly higher penalties to move data between certain cores. ryzen is no different, since it puts things in clusters of 4 cores
redred
True but would the latency still not be a big issue?
travisg
that's called cpu topology
travisg
it absolutely is a big issue
travisg
i've been spending lots of time thinking about it
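[Editor's note: the cross-cluster penalty travisg mentions can be sketched with a toy topology model. The layout and cost numbers below are illustrative assumptions (8 cores in Ryzen-style clusters of 4), not measured values or Fuchsia code.]

```python
# Hypothetical topology model: 8 cores grouped into "CCX"-style clusters
# of 4. Hops inside a cluster are cheap; crossing clusters pays a higher
# penalty. Costs are arbitrary units for illustration, not measurements.
CLUSTER_SIZE = 4
INTRA_COST = 1   # same-cluster hop
INTER_COST = 4   # cross-cluster hop

def cluster_of(core: int) -> int:
    """Which cluster a core belongs to, given the assumed layout."""
    return core // CLUSTER_SIZE

def hop_cost(a: int, b: int) -> int:
    """Relative cost of moving data between two cores."""
    if a == b:
        return 0
    return INTRA_COST if cluster_of(a) == cluster_of(b) else INTER_COST
```

A scheduler that knows this topology would prefer to keep heavily-communicating threads where `hop_cost` stays low.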
redred
travisg, Wouldn't choosing which cores each process lands on to optimize inter-core communication be the task of the governor?
travisg
yep
travisg
that is precisely what i've been spending a lot of time on
travisg
or at least brain cycles on
travisg
it's a tough problem to solve, especially in a highly asynchronous IPC system
redred
So in theory it would be a matter of writing a governor that can fairly and efficiently pack processes onto core clusters
travisg
yep
travisg
that's a major component of the scheduler's task: finding the best place to run threads
redred
Just how in this case?
redred
It's not just A<->B, it's now A<->B<->C<->
travisg
well, it's a little hard to just describe in a few sentences on irc
travisg
if it were that easy...
redred
yea
developerhdf
travisg: does the sel4 microkernel have the same shortcomings as minix?
redred
Theoretically couldn't you abstract away the idea of hardware using a microkernel?
travisg
developerhdf: not a clue.
redred
So like what we do with the cloud at a lower layer
travisg
redred: i'm not sure what that means
redred
So instead of saying this server has 4 cores 8 threads and the next one is the same you could theoretically have them work as a seamless cluster
redred
So you would have a resource pool with 8 cores 16 threads
travisg
sure
travisg
in fact most code does not care what it's running on
travisg
it simply uses the api to communicate with services in other processes
travisg
the hardware it's running on is largely irrelevant
redred
But couldn't we truly abstract the idea further kinda like what plan9 wants
travisg
yep. and there's lots and lots of that in the fuchsia design
robtsuk has left IRC (Ping timeout: 264 seconds)
redred
So in the event a system starts to show signs of failure we can seamlessly move a running process
travisg
it's very abstract with services and who provides it, etc
travisg
what fuchsia does *not* do is attempt to be a clustering OS
redred
yea
travisg
it does not have built in support for transparently moving processes between kernels
robtsuk has joined (sid179084@gateway/web/irccloud.com/x-unxacvedhvgwtphu)
redred
Couldn't that be handled by a driver?
travisg
if it were that simple...
travisg
proxying services over a network, sure. but transparently migrating processes is a completely different can of worms
redred
let's pretend all the hard parts were abstracted away
developerhdf
travisg: I hope fuchsia pans out in the future. I'd like to run a Google os on my pc. Hopefully at that point my "Android" phone would also receive updates for longer than just two years :D
redred
Theoretically you could write a driver to handle that task without too much work on the kernel itself?
redred
developerhdf, don't remind me.
travisg
pretending the hard parts go away when it's my job to make the hard parts work is a bit of a stretch
travisg
redred: negative
redred
I'm trapped on 7.1.1
travisg
moving processes between machines would require substantial kernel assist
redred
:|
travisg
perhaps it's actually doable, but with the way we do shared memory and whatnot, it'd be fairly complex
redred
travisg, Any Google TPU driver support planned?
travisg
i have no idea
travisg
but back to the migrating processes thing. that's already thinking about things in a very process centric way
developerhdf
redred: If HMD keeps their promise, hopefully my nokia Android Nougat device will get Oreo and P
redred
That would be interesting to see, Fuchsia be used in places where AI acceleration is required
travisg
if you look at how the fuchsia ui modular framework works, i think that's already a level too deep
travisg
ie, what is a process? why can't you just start a new one somewhere else and move the data to it
travisg
ie, doesn't need to be a kernel level operation, necessarily. the modular framework uses processes as a tool, but not as the sort of identity of the user experience
raggi
Process migration definitely is not a good solution to that general problem area
raggi
There are platforms that do this already, but it comes with significant application side effects
travisg
right. my reptilian kernel brain sees the problem as really get the data to the user, wherever the user is
travisg
if they walk over to another computer or display or whatnot, the data should be there too
raggi
On gce we migrate whole vms between machines, which is much easier than migrating processes
travisg
and that has very little to do with kernel level support, that's very much a higher level concept
raggi
And I've debugged a ton of issues on both the kernel and app side of that, with most on the app side
raggi
Yep
travisg
yah migrating VMs is complex enough, though you can see how it's manageable. with our design it'd be fairly doable to move a process's handle table over, and figure out how to proxy ipc and whatnot (at whatever significant cost) but the way we pass around vmos, i dunno how you'd even go about that
raggi
Shared vmos being hell
redred
raggi, I've tried to understand Google's cloud
redred
I just can't.
raggi
Networks just can't keep up with page dirty rates
raggi
When gce does migrations, if the page dirty rate is too high, the VM gets throttled until the dirty rate drops below available migration bandwidth
travisg
even clustering systems like VMS i think had some fair amount of limitations
raggi
There's also a cap, where it'll just halt the vm
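[Editor's note: the pre-copy loop raggi describes — copy dirty pages, throttle the guest when the dirty rate outruns the link, halt past a cap — can be sketched as a toy simulation. All thresholds and numbers here are made up; this is not GCE's implementation.]

```python
# Toy pre-copy live-migration loop (illustrative only). Each round copies
# the currently-dirty pages; meanwhile the guest dirties more. If the
# dirty rate exceeds migration bandwidth, throttle the guest; past a
# round cap, give up and halt the VM for a final stop-and-copy.
def migrate(total_pages, dirty_rate, bandwidth, max_rounds=30):
    dirty = total_pages            # first round copies everything
    rounds = 0
    while dirty > 0:
        rounds += 1
        if rounds > max_rounds:    # cap hit: halt the VM, final copy
            return ("halted", rounds)
        copy_time = dirty / bandwidth
        redirtied = dirty_rate * copy_time
        if dirty_rate >= bandwidth:
            # throttle until the dirty rate is below migration bandwidth
            dirty_rate = bandwidth * 0.5
            redirtied = dirty_rate * copy_time
        dirty = int(redirtied)
    return ("converged", rounds)
```

With the dirty rate below bandwidth the dirty set shrinks each round and the migration converges; otherwise throttling (or, at the cap, halting) forces it to finish.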
travisg
it was more like a very tightly integrated shared user context/file system/etc built into the OS
redred
So