Each child section describes a possible different setup for this repo.
So you are now officially a Linux kernel hacker, way to go! Such failures are however unlikely, and you should be fine if you don’t see anything weird happening. Seriously though, if you want to be a real hardware hacker, it just can’t be done with open source tools as of 2018. The inline assembly is disabled with an #ifdef, so first modify the source to enable that. This is not a valid part of the generated Bash command however. See [gem5-vs-qemu] for a more thorough comparison. This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it.
If it has not been started previously, start it. This can also be done explicitly with:
+
./run-docker start
+
Quit the shell as usual with Ctrl-D.
+
This can be called multiple times from different host terminals to open multiple shells.
This might save a bit of CPU and RAM once you stop working on this project, but it should not be a lot. * ./run-docker DESTROY: delete the container and image. We install tmux by default in the container. The major blocking point is how to avoid distributing the kernel images twice: once for gem5 which uses vmlinux, and once for QEMU which uses arch/* images, see also:
cirosantilli/linux-kernel-module-cheat#79
[vmlinux-vs-bzimage-vs-zimage-vs-image].
This setup might be good enough for those developing simulators, as that requires less image modification. A full kernel build would also work however.

....
./build-modules --host
....
Compilation will likely fail for some modules because of kernel or toolchain differences that we can't control on the host. However, we soon started to notice that this had an increasing overlap with other userland test repositories: we were duplicating build and test infrastructure and even some examples.
sys_write, the change happened at d5a00528b58cdb2c71206e18bd021e34c4eab878. As of Linux v4.19, the function is called sys_write in arm, and __arm64_sys_write in aarch64.

=== tmux

tmux just makes things even more fun by allowing us to see both the terminal for:
* emulator stdout

at once without dragging windows around! We have working methods, but they are not ideal. Do a fresh boot and get the module:
....
./run --eval-after './pr_debug.sh;insmod fops.ko;./linux/poweroff.out'
....
The boot must be fresh, because the load address changes every time we insert, even after removing previous modules. This could be potentially very convenient. Both call do_init_module however, which is what lx-symbols hooks to.
===== GDB module_init add trap instruction
This is another possibility: we could modify the module source by adding a trap instruction of some kind. If you get a failure before that, it will be hard to see the print messages. This seems to be the case for gem5 as explained at: [gem5-host-to-guest-networking]
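To make the trap idea concrete, a related low-tech variant is to spin on a volatile flag inside `module_init` and have GDB release it, which avoids worrying about how the kernel reports the trap. This is an untested sketch of my own, not one of this repo's modules:

....
#include <linux/kernel.h>
#include <linux/module.h>

/* Release from GDB with: set var lkmc_wait_gdb = 0 */
static volatile int lkmc_wait_gdb = 1;

static int __init myinit(void)
{
    /* An arch specific trap such as asm volatile ("int3") on x86
     * could be placed here instead. */
    while (lkmc_wait_gdb)
        ;
    pr_info("init continuing after GDB released us\n");
    return 0;
}

static void __exit myexit(void) {}

module_init(myinit);
module_exit(myexit);
MODULE_LICENSE("GPL");
....

After `insmod` hangs, hit Ctrl-C in GDB to halt the guest, `set var lkmc_wait_gdb = 0`, place your breakpoints, and `continue`.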
* cannot see the start of the init process easily
* gdbserver alters the working of the kernel, and makes your run less representative
Known limitations of direct userland debugging:

* the kernel might switch context to another process or to the kernel itself e.g. on a system call, and then TODO confirm the PC would go to weird places and source code would be missing. Solutions to this are being researched at: [lx-ps].
* TODO step into shared libraries.
since GDB does not know that libc is loaded.

....
>>> call fdget_pos(fd)
No symbol "fdget_pos" in current context.
>>> b fdget_pos
Breakpoint 3 at 0xffffffff811615e3: fdget_pos.
....

So now we can set breakpoints and continue as usual. And now in GDB we do the usual:
....
break __x64_sys_write
continue
continue
continue
continue
....
And now you can count from KGDB! If you do: break __x64_sys_write immediately after ./run-gdb --kgdb, it fails with KGDB: BP remove failed: <address>. Advantage over KGDB: you can do everything in one serial. This can actually be important if you only have one serial for both shell and . Disadvantage: not as much functionality as GDB, especially when you use Python scripts. After boot finishes, run the usual:
....
./count.sh &
./kgdb.sh
....
And you are back in KDB.

==== KDB ARM

TODO: neither arm nor aarch64 are working as of 1cd1e58b023791606498ca509256cc48e95e4f5b + 1.

== gdbserver
Step debug userland processes to understand how they are talking to the kernel. If you want to revive and maintain it, send a pull request. The init process is then responsible for setting up the entire userland (or destroying everything when you want to have fun). systemd provides a "popular" init implementation for desktop distros as of 2017. But this fails when we are init itself! Except that --eval-after is smarter and uses base64 encoding. This can be observed with:
cat /proc/$$/cmdline
where $$ is the PID of the shell itself: https://stackoverflow.com/questions/21063765/get-pid-in-shell-bash
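To see why that output looks glued together, here is a small C sketch of my own (not one of the repo's userland examples) that prints its own `/proc/self/cmdline`, replacing the NUL separators between arguments with spaces:

....
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/cmdline", "r");
    if (!f) { perror("fopen"); return 1; }
    int c;
    /* Arguments are separated by NUL bytes, which is why raw cat
     * output appears glued together. */
    while ((c = fgetc(f)) != EOF)
        putchar(c == '\0' ? ' ' : c);
    putchar('\n');
    fclose(f);
    return 0;
}
....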
The kernel can boot from a CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference
The bootloader, which for us is provided by QEMU itself, is then configured to put that CPIO into memory, and tell the kernel that it is there. Try removing that -initrd option to watch the kernel panic without rootfs at the end of boot. https://unix.stackexchange.com/questions/89923/how-does-linux-load-the-initrd-image
Most modern desktop distributions have an initrd in their root disk to do early setup. Don’t forget that to stop using initramfs, you must rebuild the kernel without --initramfs to get rid of the attached CPIO image:
....
./build-linux
./run
....
Alternatively, consider using [linux-kernel-build-variants] if you need to switch between initramfs and non initramfs often:
....
./build-buildroot --initramfs
./build-linux --initramfs --linux-build-id initramfs
./run --initramfs --linux-build-id initramfs
....
Setting up initramfs is very easy: our scripts just set CONFIG_INITRAMFS_SOURCE to point to the CPIO path. In real hardware, those components are also often provided separately. So observing the device tree from the guest allows us to easily see what the emulator has generated.
....
cpu@0 {
cpu@1 {
....
The action seems to be happening at: `hw/arm/virt.c`. This means that this will likely only work for x86 guests since almost all development machines are x86 nowadays. We can test KVM on arm by running this repository inside an Ubuntu arm QEMU VM. See: xref:test-this-repo[xrefstyle=full] for more useful testing tips. Both gem5 and QEMU however allow setting the reported `uname` version from the command line, which we do to always match our toolchain. <<user-mode-static-executables>> also work. We pass `-L` by default, so everything just works.

=== syscall emulation mode program stdin

The following work on both QEMU and gem5 as of LKMC 99d6bc6bc19d4c7f62b172643be95d9c43c26145 + 1. Many of those are trivial to implement however. Instead, it just outputs a message to stdout just like for <<m5-fail>>:
....
Simulated exit code not 0!
....
finit is newer and was added only in v3.8.
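For concreteness, a hedged sketch of a minimal loader that goes through `finit_module(2)` (my own illustration, not part of this repo; run it as root inside the guest):

....
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s module.ko\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY | O_CLOEXEC);
    if (fd == -1) { perror("open"); return 1; }
    /* finit_module(2): the kernel reads the ELF image from the fd,
     * unlike the older init_module(2) which takes an in-memory buffer.
     * Older libcs have no wrapper, so go through syscall(2). */
    if (syscall(SYS_finit_module, fd, "", 0) == -1) {
        perror("finit_module");
        return 1;
    }
    close(fd);
    return 0;
}
....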
==== module-init-tools

Name of a predecessor set of tools. As the name suggests, OverlayFS allows you to merge multiple directories into one. If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock. You would have to leave chroot, remount, then come back. But should be doable with some automation. This is different from a display screen, where each character is a bunch of pixels, and it would be much harder to convert that into actual terminal text. Text mode has the following limitations over graphics mode:
* you can't see graphics such as those produced by [x11]
* very early kernel messages such as `early console in extract_kernel` only show on the GUI, since at such early stages, not even the serial has been setup.

===== QEMU graphic mode arm VGA
TODO: how to use VGA on ARM? TODO could not get it working on x86_64, only ARM. The key option to enable support in Linux is DRM_MALI_DISPLAY=y which we enable at linux_config/display.
Build the kernel exactly as for [graphic-mode-gem5-aarch64] and then run with:
./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15
We cannot use mainline Linux because the [gem5-arm-linux-kernel-patches] are required at least to provide the CONFIG_DRM_VIRT_ENCODER option.

....
[ 0.066252] [drm] No driver support for vblank timestamp query.
....

It would be cool to minimize it out to better understand the options. Build and run:
....
./build-buildroot --config-fragment buildroot_config/x11
./run --graphic
....
Inside QEMU:
startx
And then from the GUI you can start exciting graphical programs such as:
xcalc xeyes
We don’t build X11 by default because it takes a considerable amount of time (about 20%), and is not expected to be used by most users: you need to pass the -x flag to enable it. Needs bisection, on whatever commit last touched x11 stuff. We do have BR2_PACKAGE_XDRIVER_XF86_INPUT_MOUSE=y. * xf86-video-fbdev should work as well, but we need to make sure fbdev is enabled, and maybe add some line to the Xorg.conf
We disable networking by default because it starts an userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: [resource-tradeoff-guidelines]
To enable networking on Buildroot, simply run:
ifup -a
That command goes over all (-a) the interfaces in /etc/network/interfaces and brings them up.

* QEMU implements 9P natively, which makes it very stable and convenient, and must mean it is a simpler protocol than NFS as one would expect.

==== 9P getting started
As usual, we have already set everything up for you. [9p] is better with emulation, but let's just get this working for fun. The userland, however, should simply not break, as Linus enforces strict backwards compatibility of userland interfaces. This backwards compatibility is just awesome, it makes getting and running the latest master painless. Before committing, don't forget to update:
* the `linux_kernel_version` constant in common.py
* the tagline of this repository on:
** this README
** the GitHub project description
The kernel is not forward compatible, however, so downgrading the Linux kernel requires downgrading the userland too to the latest Buildroot branch that supports it. You should then look up if there is a branch that supports that kernel. We don't expect those changes to be very difficult.

==== norandmaps
Disable userland address space randomization. IOW passing "quiet" will be the equivalent of passing "loglevel=<CONSOLE_LOGLEVEL_QUIET>"
which explains the useless reason why that number is special. First get <<kernel-modules-buildroot-package>> working. It disables preemption so be careful if you intend to use it for long periods of time.

....
SyS_finit_module+0xa8/0xb0
do_syscall_64+0x6f/0x310
....

====== Exit QEMU on panic

Enabled by default with:

* `panic=-1` command line option which reboots the kernel immediately on panic, see: xref:reboot-on-panic[xrefstyle=full]
* QEMU `-no-reboot`, which makes QEMU exit when the guest tries to reboot

Also asked at https://unix.stackexchange.com/questions/443017/can-i-make-qemu-exit-with-failure-on-kernel-panic which also mentions the x86_64 `-device pvpanic`, but I don't see much advantage to it. One possibility that gets close would be to use <<gdb>> to break at the `panic` function, and then send a <<qemu-monitor-from-gdb>> `quit` command if that happens, but I don't see a way to exit with non-zero status to indicate error. It parses kernel symbols and detects when the PC reaches the address of the `panic` function. At gem5 ff52563a214c71fcd1e21e9f00ad839612032e3b (July 2018) behaviour was different, and it just exited 0: https://www.mail-archive.com/[email protected]/msg15870.html TODO find fixing commit. `0` to disable, `-1` to reboot immediately. I wasn't able to disable `CONFIG_KALLSYMS` to test this out however, it is being selected by some other option? TODO add a sample panic error message for each error type:

* https://askubuntu.com/questions/41930/kernel-panic-not-syncing-vfs-unable-to-mount-root-fs-on-unknown-block0-0/1048477#1048477

This is the diagnosis procedure:

* does the filesystem appear on the list of filesystems? If not, then likely you are missing either:
** the driver for that hardware type, e.g. hard drive / SSD type.

Just mount, set <<file-operations>>, and we are done. It is similar to a <<seq-file>> file operation, except that write is also implemented. Make a more complex example that shows what they can do. A single driver can drive multiple compatible devices. The `seq_file` API makes the process much easier for those trivial cases:
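For such trivial cases, a hedged sketch of a read-only `seq_file` file using the `single_open()` convenience wrapper and debugfs (names made up, untested here):

....
#include <linux/debugfs.h>
#include <linux/module.h>
#include <linux/seq_file.h>

static int myshow(struct seq_file *s, void *v)
{
    /* seq_printf does the buffering and offset handling that a raw
     * read file operation would have to do by hand. */
    seq_printf(s, "hello seq_file\n");
    return 0;
}

static int myopen(struct inode *inode, struct file *file)
{
    return single_open(file, myshow, NULL);
}

static const struct file_operations myops = {
    .owner = THIS_MODULE,
    .open = myopen,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = single_release,
};

static struct dentry *dir;

static int __init myinit(void)
{
    dir = debugfs_create_file("lkmc_seq_file_sketch", 0444, NULL, NULL, &myops);
    return 0;
}

static void __exit myexit(void)
{
    debugfs_remove(dir);
}

module_init(myinit);
module_exit(myexit);
MODULE_LICENSE("GPL");
....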
....
echo $?
....

Sources:
Typically, we are waiting for some hardware to make some piece of data available to the kernel. man ioctl documents:
+
Usually, on success zero is returned.
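To make the calling convention concrete, a hedged userland sketch; the device path and request macro are made up, a real driver exports its own in a header shared with userland:

....
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical request: drivers define theirs with _IO/_IOR/_IOW/_IOWR. */
#define LKMC_IOCTL_HELLO _IOWR('k', 0, int)

int main(void) {
    int fd = open("/dev/lkmc_ioctl", O_RDWR); /* hypothetical device node */
    if (fd == -1) { perror("open"); return EXIT_FAILURE; }
    int arg = 42;
    /* Returns 0 on success, as the man page quote above says; the
     * driver may also use arg as an in/out parameter. */
    if (ioctl(fd, LKMC_IOCTL_HELLO, &arg) == -1) {
        perror("ioctl");
        return EXIT_FAILURE;
    }
    printf("ioctl returned arg = %d\n", arg);
    close(fd);
    return EXIT_SUCCESS;
}
....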
Outcome: the test passes:
....
0
....

Sources:
Launch multiple user requests in parallel to stress our socket:
....
insmod netlink.ko sleep=1
for i in `seq 16`; do ./netlink.out & done
....

TODO: what is the advantage over read, write and poll?

Bibliography:

==== Workqueues

A more convenient front-end for [kthread]:

....
insmod workqueue_cheat.ko
....

Outcome: count from 0 to 9 infinitely many times.

Stop counting:

....
rmmod workqueue_cheat
....

Source: kernel_modules/workqueue_cheat.c
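For reference, a stripped-down one-shot sketch of the same idea (my own, simpler than `workqueue_cheat.c`, which re-queues its work item to count forever):

....
#include <linux/module.h>
#include <linux/workqueue.h>

static void work_func(struct work_struct *work);
static DECLARE_WORK(my_work, work_func);

static void work_func(struct work_struct *work)
{
    pr_info("hello from the shared workqueue\n");
}

static int __init myinit(void)
{
    /* Queue onto the kernel-global workqueue; a dedicated queue could
     * be created with alloc_workqueue instead. */
    schedule_work(&my_work);
    return 0;
}

static void __exit myexit(void)
{
    /* Make sure the work finished before the module text goes away. */
    flush_work(&my_work);
}

module_init(myinit);
module_exit(myexit);
MODULE_LICENSE("GPL");
....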
The workqueue thread is killed after the worker function returns. Therefore they produce more accurate timing than thread scheduling, which is more complex, but you can't do too much work inside of them. We could not find however how to write to memory from the QEMU monitor, boring.
* soft-dirty: TODO
* file/shared: TODO

==== ftrace

Trace a single function:
....
cd /sys/kernel/debug/tracing/
# Stop tracing.
echo 0 > tracing_on
# Clear previous trace.
cat available_tracers
echo function > current_tracer
# List all functions that can be traced
# cat available_filter_functions
# Choose one.
cat enabled_functions
echo 1 > tracing_on
# Latest events.
....

If I try to break there with:

....
./run-gdb *0x1000000
....

but I have no corresponding source line. This does not break at all:

....
./run-gdb extract_kernel
....

It only appears once on every log I've seen so far, checked with `grep 0x1000000 trace.txt`. Then when we count the instructions that run before the kernel entry point, there is only about 100k instructions, which is insignificant compared to the kernel boot itself. One easy way to do that now is to just run:

....
./run-gdb --userland "$(./getvar userland_build_dir)/linux/poweroff.out" main
....

And get that from the traces, e.g. if the address is `4003a0`, then we search:

....
grep -n 4003a0 trace.txt
....
I have observed a single match for that instruction, so it must be the init, and there were only 20k instructions after it, so the impact is negligible.

....
detected buffer overflow in strlen
------------[ cut here ]------------
....

followed by a trace. The kernel logs contain:
....
SELinux: Initializing.
....

Maybe some brave soul will send a pull request one day. This would be awesome to improve debuggability and safety of kernel modules. I currently don't understand the behaviour very well. This example should handle interrupts from userland and print a message to stdout:

....
./uio_read.sh
....

TODO: what is the expected behaviour? Default: 0 or 1 to enable / disable interrupts. I think most are implemented under: drivers/tty
TODO find all.
+
Set in uclibc C code with `reboot(RB_DISABLE_CAD)`, or from procfs with:
+
....
echo 0 > /proc/sys/kernel/ctrl-alt-del
....
+
Done by BusyBox' `reboot`.

....
#::respawn:/sbin/getty -L ttyS3 0 vt100
' >> rootfs_overlay/etc/inittab
./build-buildroot
./run --graphic -- \
  -serial telnet::1235,server,nowait \
  -serial vc:800x600 \
  -serial telnet::1236,server,nowait \
;
....

and on a second shell:
....
telnet localhost 1235
....

We don't add more TTYs by default because it would spawn more processes, even if we use `askfirst` instead of `respawn`. Next, we also have the following shells running on the serial ports, hit enter to activate them:

* `/dev/ttyS0`: first shell that was used to run QEMU, corresponds to QEMU's `-serial mon:stdio`.

although we cannot change between terminals from there. `reset` on the terminal then kills the poor penguins.

=== Linux kernel build system

==== vmlinux vs bzImage vs zImage vs Image
Between all archs on QEMU and gem5 we touch all of those kernel built output files. The current bad behaviour is that it prints just:
....
Boot-wrapper v0.2
....

and nothing else. We will also need `CONFIG_XEN=y` on the Linux kernel, but first Xen should print some Xen messages before the kernel is ever reached. So it seems that the configuration failure lies in the boot wrapper itself rather than Xen. Buildroot does not seem to support EDK 2. It is even the default [android] simulator that developers get with Android Studio 3 to develop apps without real hardware.

==== gem5 disk persistency

TODO how to make gem5 disk writes persistent? To test it out, log into the VM and run:
....
./run --eval-after 'umount /mnt/9p/*;./count.sh'
....

On another shell, take a snapshot:

....
./qemu-monitor savevm my_snap_id
....

The counting continues. This shows that CPU and memory states were reverted. The `edu` hardware model has that feature.

===== Manipulate PCI registers directly
In this section we will try to interact with PCI devices directly from userland without kernel modules. Each BAR corresponds to an address range that can be used to communicate with the PCI device. This is the saner method apparently, and what the edu device uses.
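A hedged sketch of that userland approach: mmap a BAR exposed by sysfs and poke registers directly. The PCI address and register offsets below are made up, adjust them to what `lspci` shows for your device:

....
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical device address; resource0 is BAR0. */
    const char *path = "/sys/bus/pci/devices/0000:00:04.0/resource0";
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd == -1) { perror("open"); return 1; }
    /* Map the first 4 KiB of the BAR and access it as 32-bit registers. */
    volatile uint32_t *bar = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }
    printf("reg[0] = 0x%08x\n", bar[0]);
    bar[1] = 0x12345678; /* write to hypothetical register at offset 4 */
    munmap((void *)bar, 0x1000);
    close(fd);
    return 0;
}
....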
QEMU does not have a very nice mechanism to observe GPIO activity: https://raspberrypi.stackexchange.com/questions/56373/is-it-possible-to-get-the-state-of-the-leds-and-gpios-in-a-qemu-emulation-like-t/69267#69267 The best you can do is to hack our build script to add:
....
HOST_QEMU_OPTS='--extra-cflags=-DDEBUG_PL061=1'
....

where PL061 is the dominating ARM Holdings hardware that handles GPIO. Rationale: we found out that the kernels that build for `qemu -M versatilepb` don't work on gem5 because `versatilepb` is an old pre-v7 platform, and gem5 requires armv7. depending on what else is available on the GUI: serial, parallel and frame buffer.
It makes execution 3x faster than the default trace backend which logs human readable data to stdout. `-d` tracing is cool because it does not require a messy recompile, and it can also show symbols. This would include the memory values read into the registers. In general QEMU is not designed to support this kind of monitoring of guest operations. Maybe we can reuse / extend the kernel's GDB Python scripts?? This awesome feature allows you to examine a single run as many times as you would like until you understand everything:

....
# Record a run.
....

It existed earlier but it rotted completely. Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours, which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed? Tested in b4879ae5b0b6644e6836b0881e4da05c64a6550d.

===== gem5 ExecAll trace format

This debug flag traces all instructions. Only shown if the `ExecMacro` flag is given. Breakdown:

* `25007500`: time count in some unit. Note how the microops execute at further timestamps.
+
`config.ini` has `--param 'system.multi_thread = True' --param 'system.cpu[0].numThreads = 2'`, but in <<arm-baremetal-multicore>> the first one alone does not produce `T1`, and with the second one simulation blows up with:
+
....
fatal: fatal condition interrupts.size() !
....
* `strxi_uop x29, [ureg0]`: microop disassembly.
* `MemWrite : D=0x0000000000000000 A=0xffffff8008913f90`: a memory write microop: `D` stands for data, and represents the value that was written to memory or to a register. `A` stands for address, and represents the address to which the value was written. It only shows when data is being written to memory, but not to registers.

The best way to verify all of this is to write some baremetal code.

===== gem5 Registers trace format
This flag shows a more detailed register usage than [gem5-execall-trace-format].
For example, if we run in LKMC 0323e81bff1d55b978a4b36b9701570b59b981eb:
....
./run --arch aarch64 --baremetal userland/arch/aarch64/add.
....

....
32000: system.cpu.[tid:0]: Reading int reg 1 (1) as 0x3.
....

* what do the numbers in parenthesis mean at `31 (34)`?

This allows us to:
+
--
*** do much more realistic performance benchmarking with it, which makes absolutely no sense in QEMU, which is purely functional
*** make certain functional observations that are not possible in QEMU, e.g.:
**** use Linux kernel APIs that flush cache memory like DMA, which are crucial for driver development.
** not sure: gem5 has BSD license while QEMU has GPL
+
This suits chip makers that want to distribute forks with secret IP to their customers. But TODO: how to get the output to check that it is correct without such IO cycles? Would be good to do a detailed assembly run analysis. This suggests that the simulation of cycles in which the CPU is waiting for memory to come back is faster. And we now see the boot messages, and then get a shell. Documentation: http://gem5.org/Checkpoints

To see it in action try:

....
./run --arch aarch64 --emulator gem5
....

In the guest, wait for the boot to end and run:

....
m5 checkpoint
....
where <<gem5-m5-executable>> is a guest utility present inside the gem5 tree which we cross-compiled and installed into the guest. To restore the checkpoint, kill the VM and run:

....
./run --arch arm --emulator gem5 --gem5-restore 1
....

The `--gem5-restore` option restores the checkpoint that was created most recently.

....
./run --arch aarch64 --emulator gem5 --static --userland userland/freestanding/gem5_checkpoint.S --trace-insts-stdout
....

....
Starting simulation...
1500: system.cpu: A0 T0 : @asm_main_after_prologue+12 : movz x0, #0, #0 : IntAlu : D=0x0000000000000000 flags=(IsInteger)
2000: system.cpu: A0 T0 : @asm_main_after_prologue+16 : m5exit : No_OpClass : flags=(IsInteger|IsNonSpeculative)
Exiting @ tick 2000 because m5_exit instruction encountered
....

Then, on the first restore run, the checkpoint is restored, and only instructions after the checkpoint are executed:
....
info: Entering event queue @ 1000.
....

....
printf 'echo "first benchmark";m5 exit' > "$(./getvar gem5_readfile_file)"
./run --emulator gem5 --gem5-restore 1
# Restore and run the second benchmark.
./run --emulator gem5 --eval './gem5.sh' --gem5-readfile 'echo "setup run"'
# Restore and run the first benchmark.
....
+
Usage:
+
....
# Boot, checkpoint and exit.
....

https://www.mail-archive.com/[email protected]/msg15233.html
==== gem5 restore checkpoint with a different CPU
gem5 can switch to a different CPU model when restoring a checkpoint. A common combo is to boot Linux with a fast CPU, make a checkpoint and then replay the benchmark of interest with a slower CPU. This can be observed interactively in full system with:
....
./run --arch aarch64 --emulator gem5
....

Then in the guest terminal after boot ends:

....
sh -c 'm5 checkpoint;sh'
m5 exit
....

And then restore the checkpoint with a different slower CPU:

....
./run --arch arm --emulator gem5 --gem5-restore 1 -- --caches --cpu-type=DerivO3CPU
....

And now you will notice that everything happens much slower in the guest terminal!

....
83000: O3CPU: system.cpu:
....
which is the `movz` after the checkpoint. The final `m5exit` does not appear due to DerivO3CPU logging insanity.

....
1500: O3CPU: system.switch_cpus: Scheduling next tick!
2000: O3CPU: system.switch_cpus: FullO3CPU: Ticking main, FullO3CPU.
....

It uses the <<m5ops-instructions>> as its backend. `m5` cannot / should not be used however:

* in bare metal setups
* when you want to call the instructions from inside interest points of your benchmark.

==== gem5 m5 executable

`m5` is a guest command line utility that is installed and run on the guest, that serves as a CLI front-end for the <<m5ops>> Its source is present in the gem5 tree: https://github.com/gem5/gem5/blob/6925bf55005c118dc2580ba83e0fa10b31839ef9/util/m5/m5.c It is possible to guess what most tools do from the corresponding <<m5ops>>, but let's at least document the less obvious ones here. Exit code is 1 and exits with status 0. We then parse that string ourselves in run and exit with the correct status…
TODO: it used to be like that, but it actually got changed to just print the message.

===== m5 initparam
Ermm, just another [m5-readfile] that only takes integers and only from CLI options? To make things simpler to understand, you can play around with our own minimized educational
m5 subset:

The instructions used by `./c/m5ops.out` are present in lkmc/m5ops.h in a very simple to understand and reuse inline assembly form. To use that file, first rebuild `m5ops.out` with the m5ops instructions enabled and install it on the root filesystem:

....
./build-userland \
  --arch aarch64 \
  --force-rebuild \
  userland/c/m5ops.c \
;
./build-buildroot --arch aarch64
....

We don't enable
`-DLKMC_M5OPS_ENABLE=1` by default on userland executables because we try to use a single image for both gem5, QEMU and native, and those instructions would break the latter two. Gets linked to `m5op_arm_A64.S` which defines a function for each m5op. Because you have to do more setup work by telling the kernel never to touch the magic page.
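For reference, the "deferred preprocessor" (X macro) idiom that such headers use to loop over a list of constants looks something like this generic sketch; the op names and numbers below are illustrative, not a copy of lkmc/m5ops.h:

....
#include <stdio.h>

/* One central list of (name, value) pairs. */
#define MY_M5OPS(X) \
    X(EXIT, 0x21) \
    X(DUMP_STATS, 0x41) \
    X(RESET_STATS, 0x40)

/* Expand it once to generate an enum... */
#define MY_ENUM_ENTRY(name, num) MY_OP_##name = num,
enum my_op { MY_M5OPS(MY_ENUM_ENTRY) };

/* ...and again to generate code that uses every entry. */
#define MY_PRINT_ENTRY(name, num) printf("%-12s 0x%02x\n", #name, num);

int main(void) {
    MY_M5OPS(MY_PRINT_ENTRY)
    return 0;
}
....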
Those values will loop over the magic constants defined in `m5ops.h` with the deferred preprocessor idiom. We ignore the `subfunc` since it is always 0 on the ops that interest us.

===== m5op annotations
include/gem5/asm/generic/m5ops.halso describes some annotation instructions. The patches are optional: the vanilla kernel does boot. But they add some interesting gem5-specific optimizations, instrumentations and device support. TODO understand why, especially if it is a config difference, or if it actually comes from a patch. desc.empty())and after that the file size went down to 21KB. Tested in gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d. Without it, nothing shows on terminal, and the simulation terminates with `simulate() limit reached @ 18446744073709551615`. The magic `vmlinux.vexpress_gem5_v1.20170616` works however without a DTB. [----------] Global test environment set-up. BasicReadWriteNoOverflow (0 ms) [ RUN ] CircleBufTest. MultiWriteOverflow [ OK ] CircleBufTest. .... so you can just copy paste the command. Note that the command and it's corresponding results don't need to show consecutively on stdout because tests are run in parallel. So this usually means all CPUs are in a sleep state, and no events are scheduled in the future, which usually indicates a bug in either gem5 or guest code, leading gem5 to blow up. The same scale as the ExecAll trace is used. As a result, the build and runtime will be way slower than normal, but that still might be the fastest way to solve undefined behaviour problems. `--without-tcmalloc` is needed / a good idea when using `--with-asan`: https://stackoverflow.com/questions/42712555/address-sanitizer-fsanitize-address-works-with-tcmalloc since both do more or less similar jobs, see also <<memory-leaks>>. ==== gem5 Ruby build gem5 has two types of memory system: * the classic memory system, which is used by default * the Ruby memory system The Ruby memory system includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby SLICC transpiles to C++ auto-generated files under `build/<isa>/mem/ruby/protocol/`. Since it is not the default, Ruby is generally less stable that the classic memory model. * otherwise, use the classic memory system. S \ --static \ --trace ExecAll,FmtFlag,Ruby,XBar \ -- \ --ruby \ ; cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"Then:
* when the `--ruby` flag is given, we see a gazillion Ruby related messages prefixed e.g. by `RubyPort:`. We also observe from `ExecEnable` lines that instruction timing is not simple anymore, so the memory system must have latencies
* without `--ruby`, we instead see `XBar` (Coherent Crossbar) related messages such as `CoherentXBar:`, which I believe is the more precise name for the memory model that the classic memory system uses: [gem5-crossbar-interconnect].

Certain features may not work in Ruby.

=== gem5 CPU types
gem5 has a few in tree CPU models for different purposes. There is no simple answer for "what is the best CPU", in theory you have to understand each model and decide which one is closer to your target system. They are therefore completely unrealistic. Implementations:

* AtomicSimpleCPU: the default one. Useful to boot Linux fast and then checkpoint and switch to a more detailed CPU.
* TimingSimpleCPU: memory accesses are realistic, but the CPU has no pipeline.

===== gem5 MinorCPU

Generic in-order core that does not model any specific CPU. The weird name "Minor" stands for "M (TODO what is M) IN ORder".
Created by Ashkan Tousi in 2017 while working at ARM.

* ex5_LITTLE: derived from MinorCPU.

The CLI option is named slightly differently as: --cpu-type O3_ARM_v7a_3. Each platform represents a different system with different devices, memory and interrupt setup. Could be used as an alternative to this repository. The best setup I've reached is with Eclipse. It is not perfect, and there is a learning curve, but is worth it.

==== gem5 entry point

The main is at: src/sim/main.cc.

==== gem5 event queue

gem5 is an event based simulator, and as such the event queue is one of the crucial elements in the system. The gem5 event queue stores one callback event for each future point in time.
Let's have a closer look at the initial magically scheduled events of the simulation.

....
= Idle) reschedule(tickEvent, curTick() + latency, true);
....

so it is interesting to learn where that `latency` comes from. This then shows on the <<gem5-config-ini,`config.ini`>> as:

....
type=SrcClockDomain
clock=500
....
====== AtomicSimpleCPU memory access It will be interesting to see how `AtomicSimpleCPU` makes memory access on GDB and to compare that with <<gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis,`TimingSimpleCPU`>>. We assume that the memory access still goes through the <<gem5-crossbar-interconnect,`CoherentXBar`>>, but instead of generating an event to model delayed response, it must be doing the access directly. ====== gem5 se.py page translation Happens on `EmulationPageTable`, and seems to happen atomically without making any extra memory requests. For now, we have just collected a bunch of data there, but needs interpreting. The CPU specifics in this section are already insightful however. [[config-dot-svg-timingsimplecpu]] .`config.dot.svg` for a TimingSimpleCPU without caches. Notably, the above tree contains the execution of the first two instructions. Observe how the events leading up to the second instruction are basically a copy of those of the first one, this is the basic `TimingSimpleCPU` event loop in action. Root.getInstance() for obj in root.descendants(): obj.startup()where
`simulate` happens after `m5.instantiate`, and both are called directly from the toplevel scripts, e.g. for se.py in configs/common/Simulation.py:

....
def run(options, root, testsys, cpu_class):
    ...
    exit_event = m5.simulate()
....

By looking up some variable definitions in the source, we now see some memory parameters clearly:

* ranks: `std::vector<DRAMCtrl::Rank*>` with 2 elements.

It is 7.8 us for a 64 ms refresh requirement (64 ms / 8192 refresh commands per retention window ≈ 7.8 µs). tREFI = Param. Our simulation ends way before that point however, so we will never know what it did, thank God.

====== TimingSimpleCPU analysis #5

Executes
`TimingSimpleCPU::fetch()`. The log shows that event ID 43 is now executing: we had previously seen event 43 get scheduled and had analyzed it to be the initial fetch. From:

....
p/x *pkt
....

we see:

....
addr = 0x78
....

which from [timingsimplecpu-analysis-5] we know is the physical address of the ELF entry point.

====== TimingSimpleCPU analysis #8
Executes
`DRAMCtrl::processNextReqEvent`. This must just be some power statistics stuff, as it does not schedule anything else.

====== TimingSimpleCPU analysis #21

Schedules `BaseXBar::Layer<SrcType, DstType>::releaseLayer` through:

....
EventManager::schedule
BaseXBar::Layer<MasterPort, SlavePort>::occupyLayer
BaseXBar::Layer<MasterPort, SlavePort>::succeededTiming
CoherentXBar::recvTimingResp
CoherentXBar::CoherentXBarMasterPort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....

====== TimingSimpleCPU analysis #22
Executes
`BaseXBar::Layer<SrcType, DstType>::releaseLayer`. To observe it we could create one well controlled workload with instructions that flush memory, and run it on two CPUs. The memory system part must be similar to that of TimingSimpleCPU that we previously studied [gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis]: the main thing we want to see is how the CPU pipeline speeds up execution by preventing some memory stalls. Many ThreadContext methods simply forward to ThreadState implementations. But it has been widely overused to insanity. Looks like a documentation mechanism to indicate that a certain symbol is ISA specific. Tested in gem5 2a242c5f59a54bc6b8953f82486f7e6fe0aa9b3d.

===== Why are all C++ symlinked into the gem5 build dir?

* it is written in Make and Bash rather than Python like LKMC

This repo basically wraps around that, and tries to make everything even more awesome for kernel developers by adding the capability of seamlessly running the stuff you've built on emulators usually via
`./run`. Must be passed every time you run `./build`. Note that dots cannot be used as in `1.5G`, so just use Megs as in `1500M` instead. For example, QEMU developers will only want to see the final QEMU command that you are running. Because the toolchain is so complex and tightly knitted with the rest of the system, this is more of an art than a science. However, it is not something to be feared, and you will get there without help in most cases. In this section we cover the most common cases.

==== Update GCC: GCC supported by Buildroot

This is of course the simplest case. Let's see how much Linux allows us to malloc. Then from [malloc-implementation] we see that
`malloc` is implemented with `mmap`. 2 is precisely documented but I'm lazy to do all calculations. It seems to be easier to use for compute parallelism and more language agnostic than POSIX threads. A quick grep shows many references to pthreads. For inputs large enough, the non-synchronized examples are extremely likely to produce "wrong" results, for example on [p51] Ubuntu 19.10 native with 2 threads and 10000 loops:

....
./fail.out 2 10000
....

we could get an output such as:

....
expect 20000 global 12676
....

The actual value is much smaller, because the threads have often overwritten one another with older values. So we just use the closest free drafts instead. Contrast with [pthreads] which are for threads.

===== getpid
The minimal interesting example is to use fork and observe different PIDs. Only run this on your host if you have saved all data you care about! So without further ado, let’s rock with either:
....
./run --eval-after './posix/fork_bomb.out danger'
./run --eval-after './fork_bomb.sh danger'
....

Sources:

Outcome for the C version on LKMC 762cd8d601b7db06aa289c0fca7b40696299a868 + 1: after a few seconds of an unresponsive shell, we get a visit from the [linux-out-of-memory-killer], and the system is restored!

==== pthreads
POSIX' multithreading API. Linux adds several POSIX extension flags to it. Which is infinitely better than a silent break in any case. Unmerged patch at: http://lists.busybox.net/pipermail/buildroot/2018-February/213282.html
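Back to pthreads: a minimal sketch of my own (compile with `-pthread`) of the unsynchronized counter race that `fail.out` demonstrated above:

....
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Deliberately unsynchronized shared state. */
static unsigned long global;
static unsigned long niters;

static void *thread_main(void *arg) {
    for (unsigned long i = 0; i < niters; ++i)
        global++; /* racy read-modify-write */
    return NULL;
}

int main(int argc, char **argv) {
    unsigned long nthreads = argc > 1 ? strtoul(argv[1], NULL, 0) : 2;
    niters = argc > 2 ? strtoul(argv[2], NULL, 0) : 10000;
    pthread_t *threads = malloc(nthreads * sizeof(*threads));
    for (unsigned long i = 0; i < nthreads; ++i)
        pthread_create(&threads[i], NULL, thread_main, NULL);
    for (unsigned long i = 0; i < nthreads; ++i)
        pthread_join(threads[i], NULL);
    printf("expect %lu global %lu\n", nthreads * niters, global);
    free(threads);
    return 0;
}
....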
There is a JamVM package though https://en.wikipedia.org/wiki/JamVM which is something Android started before moving to Dalvik,
Maybe some day other [android] Java runtimes will also become compilable.

....
./parse_output output < tmp.raw > tmp.o
# Compare the output to the Expected one.
cmp tmp.o test_data/8.e
# Same but now with a large randomly generated input.
....

The cache sizes were chosen to match the host [p51] to improve the comparison. TODO confirm with some kind of measurement. The benchmark also makes no syscalls except for measuring time and reporting results.

....
times[0 * ntimes + k] = mysecond();
#pragma omp parallel for
for (j=0; j<stream_array_size; j++)
    c[j] = a[j];
times[0 * ntimes + k] = mysecond() - times[0 * ntimes + k];
....
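The Scale kernel that comes next in STREAM has the same shape. This is a sketch based on the public STREAM source, so `scalar`, `b` and the `1 * ntimes` time slot are assumptions rather than a quote of our patched copy:

....
times[1 * ntimes + k] = mysecond();
#pragma omp parallel for
for (j=0; j<stream_array_size; j++)
    b[j] = scalar*c[j];
times[1 * ntimes + k] = mysecond() - times[1 * ntimes + k];
....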
Some of the benchmarks are segfaulting, they are documented in that repo. The rebuild is required because we unpack input files on the host. One option would be to do that inside the guest with QEMU. Buildroot was not designed to deal with large images, and currently cross rebuilds are a bit slow, due to some image generation and validation steps. before going for the cross compile build. This should be totally viable, and we should do it.
LKMC_ASSERT_EQtests link:userland/arch/x86_64/lkmc_assert_eq_fail. S[] link:userland/arch/aarch64/lkmc_assert_memcmp_fail. S[]Bibliography: [armarm7] A2.3 "ARM core registers". This indicates that the argument takes the value zero, but does not indicate that the ZR is implemented as a physical register. For this reason, there are sometimes multiple ways to do floating point operations in each ISA. In order to GDB step debug those executables, you will want to use
`--no-continue`, e.g.:

....
./run --arch aarch64 --userland userland/arch/aarch64/freestanding/linux/hello.S --gdb-wait
./run-gdb --arch aarch64 --no-continue --userland userland/arch/aarch64/freestanding/linux/hello.S:4
....

so I didn't really have a good question. Here is a tiny example that calls just `exit` from the C standard library: main.S

....
.global _start
_start:
    mov $0, %rdi
    call exit
....

Compile and run with:
gcc -ggdb3 -nostartfiles -static -o exit.out exit. Is it any easy to determine which functions I can use or not, in case there are any that I can't use? Bibliography: https://stackoverflow.com/questions/6514537/how-do-i-specify-immediate-floating-point-numbers-with-inline-assembly/52906126#52906126 * arm ** link:userland/arch/arm/inline_asm/inc.c[] ** link:userland/arch/arm/inline_asm/inc_memory.c[] ** link:userland/arch/arm/inline_asm/inc_memory_global.c[] ** link:userland/arch/arm/inline_asm/add.c[] * aarch64 ** link:userland/arch/aarch64/inline_asm/earlyclobber.c[] ** link:userland/arch/aarch64/inline_asm/inc.c[] ** link:userland/arch/aarch64/inline_asm/inc_32.c[]: how to use 32-bit `w` registers in aarch64. * arm ** link:lkmc/arm.h[] `ENTRY` and `EXIT` ** link:userland/arch/arm/linux/c_from_asm. S[] ==== GNU GAS assembler immediates Summary: * x86 always dollar `$` everywhere. S[] * link:userland/arch/arm/gas_data_sizes. The concept of unified assembly is mentioned in ARM's official assembler documentation: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/BABJIHGJ.html and is often called Unified Assembly Language (UAL). === NOP instructions * x86: link:userland/arch/x86_64/nop. S[NOP] * ARM: xref:arm-nop-instruction[xrefstyle=full] No OPeration. This is reflected in their names: * RAX: Accumulator. The general place where you add, subtract and otherwise manipulate results in-place. Cannot be tested simply from userland, so we won't talk about them here. S[]: MOVSX * link:userland/arch/x86_64/bswap. S[]: ADD ** link:userland/arch/x86_64/inc. S[]: INC ** link:userland/arch/x86_64/adc. S[]: SBB * link:userland/arch/x86_64/mul. S[]: NEG ** link:userland/arch/x86_64/imul. S[]: CMP === x86 logical instructions <<intel-manual-1>> 5.1.4 "Logical Instructions" * link:userland/arch/x86_64/and. S[]: AND * link:userland/arch/x86_64/not. S[SHL and SHR] + SHift left or Right and insert 0. S[SAL and SAR] + Application: signed multiply and divide by powers of 2. + Mnemonics: Shift Arithmetic Left and Right + Keeps the same sign on right shift. S[]: ROL and ROR + Rotates the bit that is going out around to the other side. S[]: BT + Bit test: test if the Nth bit a bit of a register is set and store the result in the CF FLAG. +CF = reg[N]
* link:userland/arch/x86_64/btr. S[]: BTR + Do a BT and then set the bit to 0. S[]: JMP ** link:userland/arch/x86_64/jmp_indirect. S[]: JMP indirect ==== x86 Jcc instructions link:userland/arch/x86_64/jcc. STOSD is called STOSL in GNU GAS as usual: https://stackoverflow.com/questions/6211629/gcc-inline-assembly-error-no-such-instruction-stosd * Further examples ** link:userland/arch/x86_64/cmps. S[]: CMPS: CoMPare Strings: compare two values in memory with addresses given by RSI and RDI. S[]: LODS: LOaD String: load from memory to register. REP and REPZ also additionally stop if the comparison operation they repeat fails. LEAVE is still emitted by some compilers. ==== x86 CPUID instruction Example: link:userland/arch/x86_64/cpuid. Instructions with the P suffix also Pop the stack. S[] FSCALE: `ST0 = ST0 * 2 ^ RoundTowardZero(ST1)` ** link:userland/arch/x86_64/fsqrt. Instructions such as FLDL convert standard <<ieee-754>> 64-bit values from memory into this custom 80-bit format. * https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions[SSE]: Streaming SIMD Extensions. YMM0–YMM15 256-bit registers in 64-bit mode. S[]: MOVUPS: like MOVAPS but also works for unaligned memory * link:userland/arch/x86_64/movss. S[]: MOVSS: move 32-bits between two XMM registeres or XMM registers and memory ===== x86 SSE packed arithmetic instructions <<intel-manual-1>> 5.5.1.2 "SSE Packed Arithmetic Instructions" * link:userland/arch/x86_64/addpd. ===== x86 PADDQ instruction link:userland/arch/x86_64/paddq. They were introduced into AVX512F however. === x86 system instructions <<intel-manual-1>> 5.20 "SYSTEM INSTRUCTIONS" ==== x86 RDTSC instruction Sources: * link:userland/arch/x86_64/rdtsc. S ./run --eval './arch/x86_64/rdtsc.out;m5 exit;' --emulator gem5 ./gem5-statRDTSC outputs a cycle count which we compare with gem5’s
`gem5-stat`:

....
3828578153: RDTSC
3830832635: gem5-stat
....

which gives pretty close results, and serves as a nice sanity check that the cycle counter is coherent.

* userland/arch/x86_64/intrinsics/rdtscp.c
We can observe its operation with the good and old
`taskset`, for example:

....
taskset -c 0 ./userland/arch/x86_64/rdtscp.out | tail -n 1
taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
....

produces:

....
0x00000000
0x00000001
....

There is also the RDPID instruction that reads just the processor ID, but it appears to be very new for QEMU 4.0.0 or [p51], as it fails with SIGILL on both.
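The intrinsics version boils down to something like this sketch of my own (`__rdtscp` comes from `x86intrin.h`, not a repo header):

....
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

int main(void) {
    unsigned int aux;
    /* RDTSCP returns the TSC and stores IA32_TSC_AUX into aux; Linux
     * initializes that MSR so its low bits give the CPU number, which
     * is what the taskset runs above are showing. */
    uint64_t tsc = __rdtscp(&aux);
    printf("tsc = %llu\n", (unsigned long long)tsc);
    printf("0x%08x\n", aux);
    return 0;
}
....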
A32: every instruction is 4 bytes long. * T32: most common instructions are 2 bytes long. Many others less common ones are 4 bytes long. This design can be contrasted with x86, which has widely variable instruction length. Does the linker then resolve thumbness with address relocation? However, everyone only uses little endian, so the big endian ecosystem is not as supported. S ./build-userland --arch aarch64 --ccflags=-mbig-endian userland/arch/aarch64/freestanding/linux/hello. Example: link:userland/arch/arm/ldr_pseudo. S[]: load half word
These also have signed and unsigned versions to either zero or one extend the result:
link:userland/arch/aarch64/ldrsw. S[]: load byte and sign extend
==== ARM STR instruction
Store from registers into memory. See SP alignment checking on page D1-2164. ` indicates that we want to update the register. The registers are encoded as single bits inside the instruction: each bit represents one register. As a consequence, the push order is fixed no matter how you write the assembly instruction: there is just not enough space to encode ordering. S[]: reverse byte order * link:userland/arch/arm/tst. S[]: count leading zeroes
===== ARM BIC instruction
Bitwise Bit Clear: clear some bits.

....
dest = left & ~right
....
Example: link:userland/arch/arm/bic. Has several simpler to understand aliases. ====== ARM BFI instruction Examples: * link:userland/arch/arm/bfi. This allows for constants such as 0xFF (0xFF rotated right by 0), 0xFF00 (0xFF rotated right by 24) or 0xF000000F (0xFF rotated right by 4). When they are not small they tend to be bit masks. Similar to <<arm-movw-and-movt-instructions>> in v7. Example: link:userland/arch/aarch64/movk. S[] The shift types are: * LSR and LFL: Logical Shift Right / Left. Therefore raise illegal instruction signal. TODO I think it was optional in ARMv7, find quote. It is not part of the ISA, but just an extension. TODO: that spec does not seem to have the instructions documented, and instruction like VMOV just live with the main instructions. S[] Convert between integers and floating point. ____ Notice how the opcode takes two types. ====== ARMv8 AArch32 VCVTA instruction Example: link:userland/arch/arm/vcvt. Also there was no ties to away mode in ARMv7. ==== ARMv8 Advanced SIMD and floating-point support The <<armarm8>> specifies floating point and SIMD support in the main architecture at A1.5 "Advanced SIMD and floating-point support". ____ Therefore it is in theory optional, but highly available. ____ ==== ARMv8 AArch64 floating point registers TODO example. <<armarm8>> B1.2.1 "Registers in AArch64 state" describes the registers:
32 SIMD&FP registers, V0 to V31. The instructions then allow:
* incrementing the loop index by the vector length without explicitly hardcoding it
* when the last loop is reached, extra bytes that are not multiples of the vector length get automatically masked out by the predicate register, and have no effect
Added to QEMU in 3.0.0 and gem5 in 2019 Q3. S --emulator gem5 — --param 'system.cpu[:].isa[:].sve_vl_se = 1' ./run --arch aarch64 --userland userland/arch/aarch64/sve_addvl. That is, SVE requires the implementation of ARMv8.2. The official comprehensive ARMv8 reference. S ./run --arch arm --baremetal baremetal/arch/arm/no_bootloader/semihost_exit. S[] * link:baremetal/arch/aarch64/no_bootloader/semihost_exit. The downside of semihosting is that it is ARM specific. === ARM baremetal
In this section we will focus on learning ARM architecture concepts that can only learnt on baremetal setups. S[]. That patch however enables SIMD in baremetal, which I feel is more important. TODO: how to select to use SP0 in an exception handler? ==== ARM SVC instruction
This is the most basic example of exception handling we have. With [qemu-d-tracing]:
....
./run \
  --arch aarch64 \
  --baremetal baremetal/arch/aarch64/svc.c \
  -- -d in_asm,int \
;
....
the output at 8f73910dd1fc1fa6dc6904ae406b7598cdcd96d7 contains:
....
----------------
IN: main
0x40002098:  d41579a1  svc #0xabcd

Taking exception 2 [SVC]
...from EL1 to EL1
...with ESR 0x15/0x5600abcd
...with ELR 0x4000209c
...to EL1 PC 0x40000a00 PSTATE 0x3c5

----------------
IN:
0x40000a00:  14000225  b #0x40001294

----------------
IN:
0x40001294:  a9bf7bfd  stp x29, x30, [sp, #-0x10]!
0x40001298:  a9bf73fb  stp x27, x28, [sp, #-0x10]!
0x400012c8:  a9bf13e3  stp x3, x4, [sp, #-0x10]!
0x400012d0:  d5384015  mrs x21, spsr_el1
0x400012d4:  a9bf03f5  stp x21, x0, [sp, #-0x10]!
....
+
TODO: why doesn't QEMU show our nice symbol names?
+
This reset value is defined by <<armarm8>> C5.2.2 "DAIF, Interrupt Mask Bits".

....
--cpus 2 --emulator gem5 # TODO not working, hangs.
....

Then try:
and watch it hang forever. S[] * link:userland/arch/aarch64/freestanding/linux/wfe_wfe. Note This is equivalent to issuing an SEVL instruction on the PE for which the monitor state has changed. In MTTCG mode we * just skip this instruction. In the shown in the `wfe_ldxr_stxr.cpp` example, which can only terminate in gem5 user mode simulation because due to this event. S
....
/*
 * Prepare SCTLR
 */
mov_q x0, SCTLR_EL1_SET
....
To reduce the number of instructions from our trace, first we boot, and then we restore a checkpoint after boot with <<gem5-restore-new-script>> with a restore command that runs link:userland/arch/aarch64/freestanding/linux/wfe_wfe. Fails with <<gem5-simulate-limit-reached>> at the first WFI done in main, which means that the interrupt is never raised. The key registers to keep in mind are: * `CNTVCT_EL0`: "Counter-timer Virtual Count register". * `CNTV_CTL_EL0`: "Counter-timer Virtual Timer Control register". * Our reset value matches the fixed frequency we implement the timer at. I needed the following minor patches: https://github.com/NienfengYao/armv8-bare-metal/pull/1 Handles an SVC and setups and handles the timer about once per second. It's integration into this repo will likely never be super good. matches that in which `-drive` were given to QEMU. https://stackoverflow.com/questions/9768103/make-persistent-changes-to-init-rc Tested on: `8.1.0_r60`. <<p51>> Ubuntu 19.10 LKMC b11e3cd9fb5df0e3fe61de28e8264bbc95ea9005 gem5 e779c19dbb51ad2f7699bd58a5c7827708e12b55 aarch64: 143s. Why huge increases from 70s on above table? But likely wouldn't be much more until after boot since we are almost already done by then! Therefore this vanilla kernel is much much faster! But then I checked out there, run it, and kernel panics before any messages come out. 5x slowdown observed with output to a hard disk. Slightly faster, but the bulk was still just in log format operations! ===== User mode vs full system benchmark Let's see if user mode runs considerably faster than full system or not, ignoring the kernel boot. Breakdown: 47% GCC, 15% Linux kernel, 9% uclibc, 5% host-binutils. Conclusions: * we have bloated our kernel build 3x with all those delicious features :-) * GCC time increased 1.5x by our bloat, but its percentage of the total was greatly reduced, due to new packages being introduced. I bet that this happens during heavy compilation. Same but gem5 d7d9bc240615625141cd6feddbadd392457e49eb (2018-06-17) hacked with `-Wnoerror`: 11m 37s. Slower as expected, since more optimizations are done at link time. Full specs and benchmark scores will be maintained at the latest version of: https://github.com/cirosantilli/notes/blob/0c038b0e430d0017f12d028c6a0e7c0b99ec957f/my-hardware.adoc#thinkpad-p51 === Benchmark Internets ==== 38Mbps internet 2c12b21b304178a81c9912817b782ead0286d282: * shallow clone of all submodules: 4 minutes. Maybe the message is there because as concluded in <<gem5-o3threadcontext>>, registeres for `DerivO3CPU` are stored in `DerivO3CPU` itself (`FullO3CPU`), and therefore there is no way to to currently represent multiple register sets per CPU. In most ISAs, this tends to be the minority of instructions, and is only used when something is going to modify memory that is known to be shared across threads. ==== Can caches snoop data from other caches? Either they can snoop only control, or both control and data can be snooped. In what follows I make some stuff up with design choice comparisons, needs confirmation. In this protocol, every cache only needs a single bit of state: validity. If there is another valid cache in another CPU, it services the request. Otherwise, goes the request goes to memory. Then, there are two possible design choices, either: * that read is marked as exclusive, and all caches that had it, snoop it become invalid. + Upside: no need to send the new data to the bus. 
+ Downside: much more data on bus, so likely this is not going to be the best choice. With this we would have: * V ** PrRd *** V *** ** PrWr *** V *** BusUpgr ** BusRd *** V *** BusData ** BusRdX *** I *** BusData ** BusUpgr *** I *** * I ** PrRd *** V *** BusRd ** PrWr *** V *** BusRdX ** BusRd *** I *** ** BusRdX *** I *** ** BusUpgr *** I *** Here Flush and BusData replies are omitted since those never lead to a change of state, nor to the sending of further messages. The system looks like this:
....
        ----
       |DRAM|
        ----
          ^
          |
          v
       --------
      |  BUS   |
       --------
       ^      ^
       |      |
       v      v
   ------    ------
  |CACHE1|  |CACHE2|
   ------    ------
      ^        ^
      |        |
    ----      ----
   |CPU1|    |CPU2|
    ----      ----
....
MSI stands for which states each cache can be in for a given cache line. This means generating bus traffic, which has a cost and must be kept to a minimum. The reply will contain the full data line.

** "Bus write": the cache wants to modify some data, and it does not have the line.
+
*** Move to: Shared
*** Send message: "Write back"
** "Bus write": someone else will write to our address.

Why not wait until later and try to gain something from this deferral?

== About this repo

=== Supported hosts

The host requirements depend a lot on which examples you want to run.

=== Default command line arguments

It gets annoying to retype `--arch aarch64` for every single command, or to remember `--config` setups. Section generation happens at `Section.generate_id` in Asciidoctor code. To nuke only one Buildroot package, we can use the https://buildroot.org/downloads/manual/manual.html#pkg-build-steps[`-dirclean`] Buildroot target:
e.g.:
Verify with:
ls "$(./getvar buildroot_build_build_dir)"
=== Custom build directory

For now there is no way to change the build directory from `out/` (resp. `out.docker` for <<docker>>) to something else.

* how big is my build, and how many build configurations do I need to keep around at a time?

To check if `ccache` is working, run this command while a build is running on another shell:
watch -n1 'make -C "$(./getvar buildroot_build_dir)" ccache-stats'
or if you have it installed on host and the environment variables exported simply with:
watch -n1 'ccache -s'
and then watch the miss or hit counts go up. + If a port is not free, it just crashes. ** gem5 automatically increments ports until it finds a free one. Instead, we provide the following safer process. As a result, we are currently using the following rule: * if something is only going to be used from C and not assembly, define it in a header which is easier to use + The slower compilation should be OK as long as split functionality amongst different headers and only include the required ones. ==== buildroot_packages directory Source: link:buildroot_packages/[]. Every directory inside it is a Buildroot package. [[patches-manual-directory]] ===== patches/manual directory Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation. This data is stored is stored in link:path_properties.py[] at `path_properties_tuples`. But one big Python dict was easier to implement so we started like this. And it allows factoring chunks out easily. The rationale is the same as for `./build all` and is explained in `./build --help`. We then bisected it as explained at: https://stackoverflow.com/questions/4713088/how-to-use-git-bisect/22592593#22592593 with the link:bisect-qemu-linux-boot[] script:
....
root_dir="$(pwd)"
cd "$(./getvar qemu_source_dir)"
git bisect start
....

Check that our test script fails on v3.0.0-rc3 as expected, and mark it as bad.

....
"${root_dir}/bisect-qemu-linux-boot"
....
Clean up after the bisection. For the minimal build to generate the files to be uploaded, see: [release-zip]
The clean build is necessary as it generates clean images since it is not possible to remove Buildroot packages
Run all tests in [non-automated-tests] just QEMU x86_64 and QEMU aarch64. Having at least one example per section is ideal, and it should be the very first thing in the section if possible. * accuracy: how accurate does the simulation represent real hardware?
It is easy to add new packages once you have the toolchain, and if you don’t there are infinitely many packages to cover and we can’t cover them all.
One possibility we could play with is to build loadable modules instead of built-in modules to reduce runtime, but make it easier to get started with the modules. Not much cross compilation information however. === Soft topics
Once upon a time, there was a boy called Linus. Linus made a super fun toy, and since he was not very humble, decided to call it Linux. Linux was an awesome toy, but it had one big problem: it was very difficult to learn how to play with it! As a result, only some weird kids who were very bored ended up playing with Linux, and everyone thought those kids were very cool, in their own weird way. THE END
In that sense, therefore, the kernel is not as open as one might want to believe. * it is impossible to become rich with this knowledge.
The key problem is that the entry cost of hardware design is just too insanely high for startups in general.
It is much easier to accept limitations of physics, and even natural selection in biology, which are produced by a sentient being (?). Are you fine with those points, and ready to continue wasting your life? Maybe we should just steal it since GPL licensed. Manually builds musl and BusyBox, no Buildroot. They have covered everything we do here basically, but with a more manual approach, while this repo automates everything.
