@dogtopus
Last active April 20, 2025 04:19
Besta RTOS reversing notes

Some random notes about Besta RTOS. Will probably end up on a wiki somewhere once Project Muteki becomes mostly usable.

Windows CE?

NOPE. Not even close.

Then why is there Windows CE stuff all over?

For unknown reasons, instead of going through the usual process of configuring a customizable compiler like GCC and making a dedicated toolchain for building applets for their custom OS, it seems that Besta decided to bend the MSVC CE toolchain to do so, despite the OS being not even remotely related to Windows CE. This is also likely the reason why coredll.dll is included in all the known system images: coredll.dll provides several OS-independent compiler helpers (namely software-based floating point arithmetic, 64-bit arithmetic and integer division routines) that the MSVC compiler needs. As an unfortunate side effect, this also breaks a lot of things, including C++ exceptions and threading/TLS, because the Windows CE-specific helpers obviously do not work on a completely different OS.

For developing Besta RTOS applets with the Windows CE toolchain, a custom CRT0 must be used and coredll functions must NOT be used for anything unless the functions are OS-independent, like the previously mentioned helper routines. All syscalls must either go through sdklib/krnllib or be invoked via bare SVCs (e.g. with help from mutekishims).

There are also some Win32 API-looking routines provided by sdklib/krnllib, although there is no technical reason for them to be Windows-ish, other than perhaps providing some familiarity to Windows devs. The fact that they are sometimes not completely compatible with the real Win32 APIs also, ironically, makes them a point of frustration for less experienced Besta RTOS developers.

If it's not Windows CE, what is it?

Looking at the scheduler code, it seems that Besta RTOS is based on a heavily modified uC/OS-II kernel with a drastically different set of OS APIs exposed to user code. The rest of the system seems to be developed in-house, sometimes utilizing various existing open-source (e.g. SQLite3, FFmpeg, YAFFS2) and closed-source (e.g. Voxware) components.

You mentioned FFmpeg. Does that mean I can GPL troll Besta?

Maybe. I haven't tried it yet and I'm not a lawyer but maybe.

Hidden modes (diagnostics and DFU)

Diagnostic mode can be used to identify the board type, which is usually a 5-character identifier that starts with BA, CA, EA, JA, KA, etc. and is different from the device model number. It can also be used to verify the integrity of the system ROM. On some newer TLCS-900 based systems where a diagnostic menu is known to exist (tested on CA736), it can also be used to dump the system ROM currently installed on the on-board NAND flash.

DFU mode may be usable to rescue a device bricked by corruption of the OS/system image (TODO: need to verify how much of the OS can be corrupted before DFU is also rendered useless. More NAND dumps would be needed for this).

Entering diagnostics mode

Open the TAD app (it's usually called Service Home, 服务中心 (服務中心), etc. depending on your language settings) and type "diagnostic" using the keyboard. This works on all Arm-based devices (with some notable exceptions) and on some later devices based on the TLCS-900 architecture.

For HP Prime calculators, holding the C, F and O buttons while pressing the Reset button enters the diagnostic mode.

For Sharp dictionaries (at least for JA739), the diagnostics mode is not present and diagnose.exe is not included in the system image.

Pocket Challenge seems to have an external diagnostic menu included on the update SD cards. The system image also includes diagnose.exe, but it's currently not certain whether it functions properly.

DFU mode

Hold the P button and press RESET to enter DFU mode.

On some systems, the key combo may be slightly different. Specifically on BA742, one needs to hold the P button and press the power button instead.

Syscall scheme

Once called (in Arm mode), push r0 then lr to the stack in this exact order (needs 2 instructions) and initiate an SVC call with the desired syscall #.

Syscalls will not work directly in THUMB mode due to the instruction size limit. Interworking is needed in order to make a syscall from THUMB code.
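
To make the sequence above concrete, here is a small host-side sketch that builds the three Arm-mode instruction words. The helper names and the example syscall number are made up for illustration. The encodings are the standard A32 ones (a single-register push is `str rN, [sp, #-4]!`, and `svc` carries a 24-bit immediate). The THUMB `svc` encoding only has an 8-bit immediate, which is presumably the size limit mentioned above. What happens after the `svc` (how the kernel returns to the caller) is not covered here.

```c
#include <stdint.h>

/* Standard Arm (A32) encodings with condition = AL. */
#define ARM_PUSH_R0   0xE52D0004u  /* str r0, [sp, #-4]! */
#define ARM_PUSH_LR   0xE52DE004u  /* str lr, [sp, #-4]! */
#define ARM_SVC_BASE  0xEF000000u  /* svc #imm24 */

/* Largest immediate each SVC encoding can carry. */
#define ARM_SVC_IMM_MAX    0x00FFFFFFu
#define THUMB_SVC_IMM_MAX  0x000000FFu

/* Hypothetical helper: writes "push r0; push lr; svc #syscall_no" to out.
 * Returns 0 on success, -1 if the number does not fit in 24 bits. */
int encode_syscall_stub(uint32_t syscall_no, uint32_t out[3]) {
    if (syscall_no > ARM_SVC_IMM_MAX) {
        return -1;
    }
    out[0] = ARM_PUSH_R0;   /* r0 goes on the stack first... */
    out[1] = ARM_PUSH_LR;   /* ...then lr, as a separate instruction */
    out[2] = ARM_SVC_BASE | syscall_no;
    return 0;
}

/* Nonzero if the number would fit in a THUMB svc immediate. */
int fits_in_thumb_svc(uint32_t syscall_no) {
    return syscall_no <= THUMB_SVC_IMM_MAX;
}
```

Two separate pushes are used because a single multi-register store would order r0 and lr by register number rather than by the order required here.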

Memory management

Besta RTOS uses a single address space memory layout with the kernel, applets and shared libraries all sharing the same address space. The MMU is not used even on SoCs that have one, so there's zero memory protection, meaning user space code can have direct access to hardware registers, etc. Beware that this also makes NULL a valid address, which makes NULL dereferences harder to debug.

The kernel mode and user mode share the same heap, although there are several allocators allocated on the main heap for several data structures such as file descriptors and UI elements, possibly to avoid heap fragmentation.

Applet executables

Applet executables are mainly in PE format, with ELF being an alternative option. The Windows CE subsystem type is not required, although elf2bestape adds it for consistency with some newer Besta RTOS applets. Applets can either be relocatable (true for most of the PE files) or loadable to an absolute base address, although an applet of the latter type is in practice only runnable if it is the "init" program (i.e. the first program to run after the OS is initialized).

It's unclear whether ELF applets support relocation or not, since the only ELF applet known to exist is Prime G1's armfir.elf and it loads to an absolute address.

Shared libraries

Like applets, shared libraries can either be in PE format or ELF. The former is used by applets, either statically via the import table or dynamically with _LoadLibrary*(). The latter is used mostly by the kernel as some sort of "kernel module". The _start function seems to get ignored when loading them.

Like under Windows, putting a shared library in the same directory as the applet overshadows the system version. This could be used to e.g. trace syscalls.

Threads

The thread model seems to be very similar to uC/OS-II (down to the algorithm level almost source-line-to-source-line), although the public API is totally different.

A total of 64 threads can exist at the same time, with 38 of them accessible directly via OSCreateThread().

Broken THUMB? (Maybe not)

The code to handle THUMB mode in the CPU context initializer, which is in stock uC/OS-II's Arm Generic port, seems to be missing. Does that mean THUMB mode is broken? (Maybe not, but using a THUMB function as an entry point might not work without workarounds, i.e. an interwork function or patching the saved CPSR. Given that we might need a stack aligner for EABI->OABI conversion anyway, this might not be so bad.)

Thread priority

Priority is implied by the natural order of the threads in the global thread table (uC/OS-II just calls this the priority table). Some slots in the table seem to be reserved (8 at the top and 18 at the bottom) and are not accessible by just allocating the thread with OSCreateThread(). The user can move threads to these reserved slots by calling the OSSetThreadPriority() function.
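
The slot arithmetic (64 total, minus 8 and 18 reserved, leaves 38) can be sketched as below. This assumes that "top" means the lowest slot indices and that both reserved ranges are contiguous; neither is confirmed here, and all names are made up for illustration.

```c
/* Slot counts taken from the notes above. */
#define THREAD_SLOTS_TOTAL            64
#define THREAD_SLOTS_RESERVED_TOP      8
#define THREAD_SLOTS_RESERVED_BOTTOM  18

typedef enum { SLOT_INVALID, SLOT_RESERVED, SLOT_GENERAL } slot_kind_t;

/* Number of slots reachable through plain thread creation. */
int general_slot_count(void) {
    return THREAD_SLOTS_TOTAL - THREAD_SLOTS_RESERVED_TOP
         - THREAD_SLOTS_RESERVED_BOTTOM;
}

/* Classify a slot index. ASSUMPTION: slot 0 is the top of the table. */
slot_kind_t classify_slot(int slot) {
    if (slot < 0 || slot >= THREAD_SLOTS_TOTAL) {
        return SLOT_INVALID;
    }
    if (slot < THREAD_SLOTS_RESERVED_TOP) {
        return SLOT_RESERVED;
    }
    if (slot >= THREAD_SLOTS_TOTAL - THREAD_SLOTS_RESERVED_BOTTOM) {
        return SLOT_RESERVED;
    }
    return SLOT_GENERAL;
}
```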

Scheduling

The scheduler always executes the task that has the highest priority, so using OSSleep() is necessary to prevent one thread getting hold of the CPU for too long.
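
For reference, stock uC/OS-II picks the next task from a two-level ready bitmap (one group byte plus eight table bytes, lower number meaning higher priority). The sketch below models that technique on the host, on the assumption that it carries over to Besta RTOS since the scheduler matches uC/OS-II so closely. The names are mine, and a plain loop stands in for the lookup table that uC/OS-II uses to find the lowest set bit.

```c
#include <stdint.h>

/* Two-level ready bitmap for 64 priorities (0 = highest). */
typedef struct {
    uint8_t grp;     /* bit y set: tbl[y] has at least one ready task */
    uint8_t tbl[8];  /* bit x of tbl[y] set: priority y*8+x is ready */
} ready_list_t;

/* Index of the lowest set bit, or -1 if no bit is set. */
static int lowest_set_bit(uint8_t v) {
    for (int i = 0; i < 8; i++) {
        if (v & (1u << i)) {
            return i;
        }
    }
    return -1;
}

/* Mark a priority (0..63) as ready to run. */
void ready_set(ready_list_t *r, int prio) {
    r->grp |= (uint8_t)(1u << (prio >> 3));
    r->tbl[prio >> 3] |= (uint8_t)(1u << (prio & 7));
}

/* Mark a priority (0..63) as not ready, e.g. because it went to sleep. */
void ready_clear(ready_list_t *r, int prio) {
    r->tbl[prio >> 3] &= (uint8_t)~(1u << (prio & 7));
    if (r->tbl[prio >> 3] == 0) {
        r->grp &= (uint8_t)~(1u << (prio >> 3));
    }
}

/* Highest-priority ready task, or -1 if nothing is ready. */
int ready_highest(const ready_list_t *r) {
    int y = lowest_set_bit(r->grp);
    if (y < 0) {
        return -1;
    }
    return (y << 3) + lowest_set_bit(r->tbl[y]);
}
```

In this model a lower-priority thread is only picked once every higher-priority one has been cleared from the bitmap, which is why a busy high-priority thread that never sleeps starves everything below it.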

(TODO figure out if the thread can be yielded when waiting for IO)

1 jiffy is 1ms.

Delay and OSSleep

The OSSleep(jiffies) syscall calls the OSTimeDly() function in the uC/OS-II scheduler, which then puts the thread to sleep for the specified number of jiffies. Since 1 jiffy is 1ms in Besta RTOS, this practically delays the thread for less than or equal to the specified number of milliseconds.

There's also a Delay() syscall that delays beyond INT16_MAX jiffies all within a single SVC call.
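
If only OSSleep() is at hand, a long wait has to be split into several calls. The helper below is a sketch of that, assuming OSSleep() takes a signed 16-bit jiffy count (suggested by the INT16_MAX remark above, not verified). The helper name is made up.

```c
#include <stdint.h>

/* Hypothetical helper: take the next chunk (at most INT16_MAX jiffies)
 * out of *remaining and return it. Returns 0 once nothing is left. */
int16_t next_sleep_chunk(uint32_t *remaining) {
    uint32_t chunk = *remaining;
    if (chunk > (uint32_t)INT16_MAX) {
        chunk = (uint32_t)INT16_MAX;
    }
    *remaining -= chunk;
    return (int16_t)chunk;
}

/* Intended use on the device (OSSleep signature assumed):
 *
 *   uint32_t remaining = 70000;  // 70 seconds at 1 jiffy = 1ms
 *   while (remaining > 0) {
 *       OSSleep(next_sleep_chunk(&remaining));
 *   }
 */
```

Unlike Delay(), this costs one SVC per chunk.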

Events

Events are a stripped-down version of uC/OS-II mboxes. They don't have the ability to pass arbitrary message pointers like mboxes do.

Events have one extra flag compared to mboxes. Once set by OSCreateEvent(), it prevents the event flag from being cleared once an OSWaitForEvent() call completes without hitting a timeout or error.
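
The effect of that extra flag can be shown with a tiny host-side model. This is only a model of the behavior described above, not the kernel's data structure or API: all names are made up, and the wait here is non-blocking. The behavior looks similar to the auto-reset versus manual-reset distinction of Win32 events.

```c
#include <stdbool.h>

/* Toy model of an event. Field names are hypothetical. */
typedef struct {
    bool signaled;       /* the event flag */
    bool keep_signaled;  /* the extra flag chosen at creation time */
} event_model_t;

/* Create an unsignaled event with the given "keep" behavior. */
void event_model_init(event_model_t *ev, bool keep_signaled) {
    ev->signaled = false;
    ev->keep_signaled = keep_signaled;
}

/* Signal the event. */
void event_model_set(event_model_t *ev) {
    ev->signaled = true;
}

/* Non-blocking stand-in for a wait: returns true if the wait would
 * complete. A completed wait clears the flag unless keep_signaled is set. */
bool event_model_try_wait(event_model_t *ev) {
    if (!ev->signaled) {
        return false;
    }
    if (!ev->keep_signaled) {
        ev->signaled = false;
    }
    return true;
}
```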

Critical sections (aka locks, mutexes, etc.)

Critical sections provide mutually exclusive access to shared resources between threads. They seem to be recursive (as the context struct seems to hold a copy of the reent struct whenever it enters from the same thread/has the same reent struct pointer).

When a thread acquires a free critical section, it only changes the state of that critical section and nothing else on the kernel side is touched. (Unless, of course, when another thread tries to acquire the same critical section. Then that thread will be set to wait for that critical section.)

They also have some kind of index value and a byte array, which at first seemed to serve an unknown purpose. These are standard uC/OS-II thread wait states.

Get current thread

Since the context holds a copy of the current thread pointer, it is possible to use critical sections to know which thread is currently running. To do this, create a critical section locally first. This ensures that no other thread is acquiring it. After this, simply acquire the descriptor with OSEnterCriticalSection() and read out the pointer.

One safe implementation (4 syscalls) is shown as follows:

#include <muteki/threading.h>

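thread_t *get_current_thread(void) {
  thread_t *thr = NULL;
  critical_section_t mutex;
  // A freshly created local critical section cannot be held by another thread.
  OSInitCriticalSection(&mutex);
  // Entering it records the calling thread in the descriptor.
  OSEnterCriticalSection(&mutex);
  thr = mutex.thr;
  OSLeaveCriticalSection(&mutex);
  OSDeleteCriticalSection(&mutex);
  return thr;
}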

There is also a faster but hackier way. It abuses an implementation detail of the critical section: no other resource is allocated and no other state is changed when a free critical section is acquired for the first time. By only barely initializing the critical section and calling OSEnterCriticalSection() without any cleanup, the number of syscalls required drops to just 1. This works on both CD-580+ and WuDi V7.

#include <muteki/threading.h>

thread_t *get_current_thread(void) {
  critical_section_t mutex;
  // Magic is not checked so not needed here
  mutex.thr = NULL;
  mutex.refcount = 0;
  OSEnterCriticalSection(&mutex);
  return mutex.thr;
}

Error code

The error code is stored in the thread descriptor.

Code set via OSSetLastError will have the flag 0x20000000 set when read back by _GetLastError().
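
A few helpers for dealing with that flag are sketched below. The macro and function names are made up, and the read-back function is only a model of the observed behavior. 0x20000000 is bit 29, which Win32 uses to mark application-defined error codes, so this may be the same convention (not confirmed).

```c
#include <stdint.h>
#include <stdbool.h>

/* Flag observed on codes set via OSSetLastError (name is mine). */
#define ERR_USER_FLAG 0x20000000u

/* Model of what reading back a user-set code yields. */
uint32_t err_model_read_back(uint32_t code_set) {
    return code_set | ERR_USER_FLAG;
}

/* True if the code carries the user-set flag. */
bool err_is_user_set(uint32_t err) {
    return (err & ERR_USER_FLAG) != 0;
}

/* Recover the value originally passed in. */
uint32_t err_strip_user_flag(uint32_t err) {
    return err & ~ERR_USER_FLAG;
}
```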

See muteki/errno.h for error codes documented by parsing FormatMessage() string table.

HCA

See hca.xxdm

Direct hardware access

Would be useful for e.g. emulators.

Framebuffer

GetActiveVRamAddress() returns a framebuffer descriptor. It includes the framebuffer as well as its format. This could be a potential way of accessing the framebuffer with e.g. a software renderer that has no ties to the kernel.

Audio

TODO.

(OpenPCMCodec and ClosePCMCodec look suspicious)

Load an executable at boot

This can be used to e.g. simplify syscall black box testing or implement untethered other OS booting.

Create the file C:\SYSTEM\DESKTOP.INI with DOS (CRLF) line endings and put

[DESKTOP SETTING]
ENTRY = <dos-8.3-path-to-exe-you-want-to-run>

into the file.
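
When generating the file from a host program, the CRLF line endings have to be written explicitly. A minimal sketch follows; the function name and the example path are made up, and whether the trailing CRLF after the last line is required is not known.

```c
#include <stdio.h>

/* Hypothetical helper: format the DESKTOP.INI contents with explicit
 * CRLF line endings. Returns the length written (excluding the NUL),
 * or -1 if the buffer is too small. */
int format_desktop_ini(char *buf, size_t buf_size, const char *entry_path) {
    int n = snprintf(buf, buf_size,
                     "[DESKTOP SETTING]\r\nENTRY = %s\r\n", entry_path);
    if (n < 0 || (size_t)n >= buf_size) {
        return -1;
    }
    return n;
}

/* Intended use on the host (path is an example only):
 *
 *   char ini[128];
 *   int len = format_desktop_ini(ini, sizeof ini, "C:\\APPS\\HELLO.EXE");
 *   FILE *f = fopen("DESKTOP.INI", "wb");  // binary mode keeps \r\n as-is
 *   if (f && len > 0) { fwrite(ini, 1, (size_t)len, f); }
 *   if (f) { fclose(f); }
 */
```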

WARNING: This will replace the home screen with the file you specified and might cause the system to not boot properly. If this happens, a full system reset (clearing settings and wiping the C: drive) will fix it, although it will erase all data in system memory and settings. Alternatively, if chainloading a secondary program is possible, you can also run a program that helps you recover from this situation (e.g. using \\.\EXPLORE.ROM to delete the ini file and reboot; this will not be possible on most systems without a loader that strips the v4 args).

PATH_MAX

PATH_MAX is 256 UTF-16 CUs (512 bytes) with NUL terminator. For the 8.3 paths used by CreateFile(), PATH_MAX is 80 bytes with NUL terminator. For 8.3 paths in CWD, PATH_MAX seems to be 64 bytes with NUL terminator.
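
These limits can be written down as constants with matching length checks. The sketch below assumes each limit includes the NUL terminator (my reading of "with NUL terminator"), so the longest usable path is one unit shorter. All names are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Limits from the notes above, NUL terminator included (assumed). */
#define BESTA_PATH_MAX_UTF16    256  /* UTF-16 code units (512 bytes) */
#define BESTA_PATH_MAX_DOS       80  /* bytes, 8.3 paths for CreateFile() */
#define BESTA_PATH_MAX_DOS_CWD   64  /* bytes, 8.3 paths in CWD */

/* True if an 8.3 path plus its NUL fits the CreateFile() limit. */
bool fits_dos_path(const char *path) {
    return strlen(path) < BESTA_PATH_MAX_DOS;
}

/* True if an 8.3 path plus its NUL fits the CWD limit. */
bool fits_dos_cwd(const char *path) {
    return strlen(path) < BESTA_PATH_MAX_DOS_CWD;
}

/* True if a UTF-16 path is NUL-terminated within the limit. */
bool fits_utf16_path(const uint16_t *path) {
    for (size_t i = 0; i < BESTA_PATH_MAX_UTF16; i++) {
        if (path[i] == 0) {
            return true;
        }
    }
    return false;
}
```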

I am not happy with this ROM.

Neither am I.

Known glitches/vulnerabilities

"Continuous Moan of Death"

When using the Chinese -> English full sentence translation feature, fill the text box with the Chinese character "哼" and press Enter. The system will crash shortly after.

The name of this glitch comes from the result of the translation, which contains a huge number of repetitions of the phrase "moan continually".

This is at least fixed on BA110L and not present on the HJ translation engine (legacy engine used on TLCS-900 and some very early Arm systems).

No length check in INI parser

_GetPrivateProfileString() does not check the size parameter. Therefore, if a string property is longer than the destination buffer, this will cause a stack/heap overflow.

No known system fixes this vulnerability.
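
For illustration, a copy that does respect the destination size looks like the sketch below. This is not code from the firmware, only an example of the kind of bound the parser appears to lack. The function name is made up.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst without writing more than dst_size bytes.
 * The result is always NUL-terminated when dst_size > 0 and is
 * truncated if needed. Returns the number of characters copied. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    size_t n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;  /* leave room for the terminator */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```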

hca.xxdm

.coding ascii
.endian little
// HCA format
// Related patent (Chinese): https://patents.google.com/patent/CN1281063C/zh
// Magic
"HCA"
// Pixel format. 0f: 4bpp, ff: 8bpp, c0: 12bpp
c0
// Height and width
(u2:8) (u2:8)
// Number of frames? Number of frame buffers?
(u1:1)
// Valid palette size
(u1:0)
// Number of frames? Number of frame buffers?
(u1:1)
// Index of transparent color (when in indexed mode). This color will be treated as transparent. In indexed mode
(u1:255)
// Palette
// The pixels in indexed image are always processed as a vector of 2 pixels.
// For 4bpp mode, the palette is indexed per-vector. This gives the fixed 1024 bytes palette size.
// For 8bpp mode, 2 palettes (one verbatim, one rotated 4 bits left) are used to take care of unaligned bits.
// (u4:0x00000000) ...
// Size of image data (excluding offset table)
(u4:268)
// Frame offsets (excluding the table)
// (u4:0x0) ...
(u4:0x0)
// Framebuffer 0
// FU: Uncompressed, FC: Compressed (?)
"FU" 00 00 00000000
// Framebuffer type (?) 00000000: Normal, 00264c00: Indexed
00000000
// Framebuffer data
// ...
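
The fixed part at the start of the template above (magic, pixel format, dimensions and four single-byte fields, 12 bytes in total) can be parsed as in the sketch below. The struct, the field names and the function names are mine, the two "frames?" fields are kept as unknowns, and everything after the first 12 bytes (palette, offsets, frame data) is left out because its layout is less certain.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HCA_FIXED_HEADER_SIZE 12

/* Fixed header fields, in file order. Names are hypothetical. */
typedef struct {
    uint8_t  pixel_format;       /* 0x0f: 4bpp, 0xff: 8bpp, 0xc0: 12bpp */
    uint16_t height;
    uint16_t width;
    uint8_t  unknown_a;          /* number of frames or frame buffers? */
    uint8_t  palette_size;       /* valid palette size */
    uint8_t  unknown_b;          /* number of frames or frame buffers? */
    uint8_t  transparent_index;  /* only meaningful in indexed mode */
} hca_header_t;

/* Read a little-endian 16-bit value. */
static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the first 12 bytes. Returns 0 on success, -1 on short input
 * or wrong magic. */
int hca_parse_header(const uint8_t *data, size_t len, hca_header_t *out) {
    if (len < HCA_FIXED_HEADER_SIZE || memcmp(data, "HCA", 3) != 0) {
        return -1;
    }
    out->pixel_format = data[3];
    out->height = read_u16le(data + 4);
    out->width = read_u16le(data + 6);
    out->unknown_a = data[8];
    out->palette_size = data[9];
    out->unknown_b = data[10];
    out->transparent_index = data[11];
    return 0;
}

/* Map the pixel format byte to bits per pixel, or -1 if unknown. */
int hca_bits_per_pixel(uint8_t pixel_format) {
    switch (pixel_format) {
    case 0x0f: return 4;
    case 0xff: return 8;
    case 0xc0: return 12;
    default:   return -1;
    }
}
```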

Nurian X90 and its "little sister" WuDi V7

Basic info

Some basic info about these devices.

Model number

Nurian X90 has the model number KA745 (board ID xA745), while WuDi V7 has the model number BA742 (board ID xA742).

Spec difference

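|               | xA742                                  | xA745                        |
|---------------|----------------------------------------|------------------------------|
| SoC           | NXP i.MX233                            | Telechips TCC8902            |
| Arch          | Arm926EJ-S                             | Arm1176JZF-S                 |
| GPU           | None (but a 2D accelerator is present) | Mali200                      |
| Video decoder | None                                   | H.264 (?)                    |
| RAM           | DDR 32MiB                              | DDR2 64MiB x2 (128MiB total) |
| eMMC          | 2GB                                    | 8GB                          |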

Other specs not listed here should be more or less the same for both devices.

Battery

A little bit of a safety warning ahead: The battery is attached to the mainboard with very strong adhesive. When removing the battery, discharging it beforehand, flooding the bottom of the battery with isopropyl alcohol and carefully cutting away the adhesive with a flat object (like an expired credit card or the iFixit battery removal card) is highly recommended. DO NOT reuse the battery that was removed, as this could be a fire hazard.

Both devices use a 1000mAh internal Li-ion battery with the designation 1ICPt30/48/62-1. A battery of this exact size can be hard to find because the size is an awkward middle ground between small wearable devices and full-fledged phones/tablets. Fortunately, in my testing, there's enough clearance for a battery up to 4mm thick and 63mm long, and this widens the possible choices for a replacement battery.

Due to this, upgrading to a larger capacity battery is also possible. Murata US404562H5 is a good candidate for this purpose. It's slightly narrower but 33% thicker, so the volume is still 25% larger than the original, and due to (seemingly) chemistry differences, this gives about a 60% capacity boost (1000mAh → 1580mAh).
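
The percentages can be checked with a few lines of arithmetic. The dimensions used below are assumptions read off the designations: 3.0 x 48 x 62 mm for the original cell (1ICPt30/48/62) and 4.0 x 45 x 62 mm for the US404562H5 (from the digits in its part number). The names are made up.

```c
/* Cell dimensions in millimetres. */
typedef struct {
    double thickness_mm;
    double width_mm;
    double length_mm;
} cell_dims_t;

/* Volume of the cell's bounding box in cubic millimetres. */
double cell_volume_mm3(cell_dims_t c) {
    return c.thickness_mm * c.width_mm * c.length_mm;
}

/* Relative increase from one value to another, in percent. */
double percent_increase(double from, double to) {
    return (to / from - 1.0) * 100.0;
}

/* With the assumed dimensions:
 *   thickness: 3.0 -> 4.0 mm     about +33%
 *   volume:    8928 -> 11160 mm3 +25%
 *   capacity:  1000 -> 1580 mAh  +58%, i.e. roughly 60%
 */
```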

To remove the battery, desolder the terminals, insulate and secure the wires with tape and do the removal procedure stated in the safety warning. To install a new battery, simply glue it back to the location as shown on the PCB silkscreen, and solder the wires to the mainboard.
