RT kernel for PiZero2W crashes: confirmation / help requested

Greetings,

I have been able to crash the RT kernel for the PiZero2W, i.e. raspberry_pi_zero2w_trixie_rt_64bit_251204.img,
from https://github.com/hzeller/rpi-rgb-led-matrix/tree/master/RT-kernel

I would like independent confirmation of this crash, and any ideas on how to fix it.

I was able to crash two independent PiZero2W setups.
I used different boards, different power supplies, separate SD cards, different hubs, etc.
I re-downloaded the .img file, and both copies were binary identical.

I also put raspberry_pi4_trixie_rt_64bit_251204.img on a Pi4, and it did NOT crash.

I am trying to refresh an rpi-rgb-led-matrix project from 2020, written in Python.
The older project used either Pi 2s or Pi 3s; I am now trying to use a PiZero2W instead.
This project needs to acquire display data via USB/serial.
The data comes from a microcontroller, typically over /dev/ttyACM0.

After detecting the crash, I spent a lot of time trying to re-create it with combinations of:
running rpi-rgb-led-matrix,
exercising python3-serial,
etc.

In the end, all I need to trigger the crash is:
minicom (a serial terminal program),
a suitable serial-providing device,
serial data being transferred.
Crashes typically occur within about 1 minute.

Below is a simple MicroPython program for a Raspberry Pi Pico that sends a line of output each time it receives an input character.
Follow the usual online instructions to install it on a Pico.

You will need a suitable USB hub to connect both the Pico and a keyboard.
Internet access, via Wi-Fi or Ethernet/USB, is needed to install minicom.

After booting raspberry_pi_zero2w_trixie_rt_64bit_251204.img on a PiZero2W:
1. Connect a keyboard.
2. Run: sudo apt install minicom
3. Connect the programmed Pico to a USB port.
4. Run: minicom -D /dev/ttyACM0
5. Press and HOLD various keys (so that keyboard auto-repeat kicks in).
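The key-holding step can also be scripted, which makes the load repeatable and turns the pauses into a number. The sketch below is a hypothetical host-side helper, not part of the original report: it assumes pyserial (packaged as python3-serial), the port /dev/ttyACM0, and the Pico program in this post, which replies with one line per received byte. The 30 ms delay is an assumed stand-in for a keyboard auto-repeat rate.

```python
import time


def exercise(port, chars=b"abcdefghij", rounds=600, delay=0.03):
    """Send one byte at a time, read the reply line, and track stalls.

    `port` is any object with pyserial-style write(bytes) and readline()
    methods. Returns (replies_received, longest_gap_between_replies_s).
    """
    replies = 0
    longest_gap = 0.0
    last = time.monotonic()
    for i in range(rounds):
        j = i % len(chars)
        port.write(chars[j:j + 1])  # one byte, like a single auto-repeated key
        line = port.readline()      # b"" if the read timed out with no reply
        now = time.monotonic()
        if line:
            replies += 1
            longest_gap = max(longest_gap, now - last)
            last = now
        time.sleep(delay)           # roughly a keyboard auto-repeat rate (assumed)
    return replies, longest_gap


def open_pico(path="/dev/ttyACM0"):
    """Open the Pico's USB serial port (hypothetical helper; needs pyserial)."""
    import serial  # pyserial, packaged as python3-serial; imported lazily

    return serial.Serial(path, timeout=1)
```

Close minicom first so only one program has the port open, then run from python3: `with open_pico() as port: print(exercise(port))`. A longest gap of around a second would match the pauses seen while holding a key.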

My observations:
While holding down a key, I sometimes see a couple of 1-second pauses.
Within about a minute, I then see a ~10-second freeze, followed by an OS reboot.

I’d appreciate confirmation and some help with this, as this OS build is unusable for me.

Thanks,
-Mike


```python
# Pico / MicroPython program to try to induce a (serial) crash in
# raspberry_pi_zero2w_trixie_rt_64bit_251204.img

import sys
import select
import machine

# Set up a polling object to monitor stdin (the USB serial link)
poll_obj = select.poll()
poll_obj.register(sys.stdin, select.POLLIN)

# Set up the onboard LED (GP25 on the original Pico;
# on a Pico W use machine.Pin("LED", machine.Pin.OUT) instead)
led = machine.Pin(25, machine.Pin.OUT)

count = 1

while True:

    # Check whether received data is waiting.
    # The argument is the timeout in milliseconds; 0 makes the call non-blocking.
    poll_results = poll_obj.poll(0)
    if poll_results:

        # Read one character from the USB buffer
        char = sys.stdin.read(1)
        # Reply with a longer line, so each received byte causes more outgoing traffic
        led.on()
        print(f"{count:5d} Received: {char}, {hex(ord(char))}")
        led.off()
        count += 1
```


I ran the journalctl command both before and after the crash; the results were almost identical. I could not identify anything in particular. Data available if needed.

I was trying to find a USB/TTL converter, to see whether some sort of serial console might help.

Great news that you might have solved this. When you are ready, post the proposed changes here and I can help test them.

Thanks,

-Mike

Sorry, same crash. It took maybe 2 minutes instead of 1.

(6.18.24-v8-rt+ #3 Sun Apr 26 18:34:41 BST)

-Mike

As a test, can you please try adding the following to the end of the (single) line in
/boot/firmware/cmdline.txt:

dwc_otg.fiq_fsm_enable=0

then reboot and retry.
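The detail that trips people up is that cmdline.txt holds one line of space-separated kernel parameters, so the new option must go on that same line rather than on a new one. A minimal sketch of that edit in Python; the helper names are hypothetical, it must run as root, and backing up the file first is a good idea:

```python
from pathlib import Path

CMDLINE = Path("/boot/firmware/cmdline.txt")


def add_kernel_param(cmdline, param):
    """Return cmdline with `param` appended to its single line, unless already present."""
    tokens = cmdline.split()
    if param in tokens:
        return cmdline
    return " ".join(tokens + [param]) + "\n"


def patch_cmdline(param="dwc_otg.fiq_fsm_enable=0", path=CMDLINE):
    """Rewrite the file in place if needed; a reboot is required afterwards."""
    old = path.read_text()
    new = add_kernel_param(old, param)
    if new != old:
        path.write_text(new)
    return new
```

Calling `patch_cmdline()` twice is harmless: the second call finds the parameter already present and leaves the file alone.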

I’ve also noticed that, when USB is in use, the RT kernel uses a lot of CPU, around 10% when viewed in top.

I’ve read elsewhere that there seem to be issues with RT and the dwc_otg USB driver.
I’m going to remove RT from the kernel, as it was not what fixed the flickering anyway.

The things that resolved the flickering were
a full dynticks (tickless) system and moving the interrupts off core 3.
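To check which cores actually service the USB controller's interrupts, a small parser over /proc/interrupts helps: its header row has one column per CPU, and each IRQ row lists per-CPU counts followed by a description. This is a hypothetical diagnostic helper, not the maintainer's tooling, and matching on the string "dwc_otg" is an assumption; check the exact IRQ names on your own system. The affinity itself is set through /proc/irq/<n>/smp_affinity_list.

```python
def irq_counts_per_cpu(text, match="dwc_otg"):
    """Parse /proc/interrupts text and return {irq_label: [count_cpu0, ...]}
    for rows whose description contains `match`."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = []
        for field in fields[1:1 + ncpus]:
            if not field.isdigit():
                break  # some rows (e.g. "Err:") have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[1 + len(counts):])
        if match in description:
            result[fields[0].rstrip(":")] = counts
    return result
```

For example, `print(irq_counts_per_cpu(open("/proc/interrupts").read()))`; a count that keeps growing in the last column (CPU3) would mean core 3 is still handling that interrupt.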

I added the dwc_otg line.

Good news: USB/serial didn’t crash; I gave it roughly 10 minutes of exercise.

Bad news: the keyboard is unusable. It took about 5 minutes to log in, until I understood that:
1. some (random?) keys are being dropped, and
2. some (random?) keys go into ~infinite repeat.

To stop an infinite key repeat, you have to press a different key. That kind of ruins your command line, especially when Backspace goes into infinite repeat during editing.

As an example, it took over 10 minutes to type:

sudo /opt/rpi-rgb-led-matrix/examples-api-use/demo --led-rows=64 --led-cols=64 -D 12

At that point it told me that the sound driver was still loaded. End of test.
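A quick way to check for this before launching the demo is to look for the on-board audio module in /proc/modules, where the module name is the first field of each line. A minimal sketch, assuming the module in question is snd_bcm2835 (the Raspberry Pi on-board audio driver); the function names are hypothetical:

```python
def loaded_modules(text):
    """Return the set of module names in /proc/modules text (first field per line)."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def sound_module_loaded(text, name="snd_bcm2835"):
    """True if the given audio module appears in /proc/modules text."""
    return name in loaded_modules(text)
```

For example, `sound_module_loaded(open("/proc/modules").read())`; if it returns True, the audio driver still needs disabling, e.g. with dtparam=audio=off in /boot/firmware/config.txt followed by a reboot.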

Thanks,

-Mike

OK, I’ve redone the kernel and kept the features that reduced flicker. The RT part wasn’t what actually fixed the flicker.

I now call it optimized rather than realtime.

Here are the new links if you can test.


Good news: keyboard OK, no serial crash. I ran it for over half an hour.

A couple of suggestions to make this easier for users:
Disable the audio driver.
Provide a SHA-256 hash on the GitHub page so users can verify their download.

Later, once its compilation is sorted out, maybe also install the Python package.
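On the user side, checking a published SHA-256 value only takes a few lines. A minimal sketch using Python's standard hashlib; the function name is hypothetical, and the file is read in chunks because the image is several gigabytes:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunk_size pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same hex string that `sha256sum raspberry_pi_zero2w_trixie_rt_64bit_251204.img` prints, so either can be compared against the value on the GitHub page.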

Thanks for solving this,

-Mike


Another suggestion is:
sudo chown -R user:user /opt/rpi-rgb-led-matrix
sudo chmod ug+w /opt/rpi-rgb-led-matrix

Thanks,
-Mike