Kernel message: PCIe Bus Error when receiving messages

CAN FD Interface for M.2 (PCIe)
wangroger0801
Posts: 1
Joined: Thu 21. Mar 2019, 17:09


Post by wangroger0801 » Thu 21. Mar 2019, 18:01

Hi, we have a PCAN-M.2 Four Channel (IPEH-004085, series no. C1/118) in our device, with two CAN ports connected. When they are receiving messages, the system reports the following errors:

Code: Select all

pcieport 0000:00:01.1: AER: Corrected error received: 0000:00:00.0
pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
pcieport 0000:00:01.1:    [ 6] BadTLP                

We are running a custom real-time kernel on Arch Linux with the CAN_PEAK_PCIEFD module included. Messages can be read from the card, but the flood of error messages is a real problem.
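
For reference, the `error status/mask` field in these messages is a pair of hex bitmasks taken from the PCIe Correctable Error Status and Mask registers. Status 00000040 means only bit 6 (Bad TLP) is set. The sketch below decodes such a field. The function names are hypothetical, and the bit-to-name table mirrors the short names the Linux AER driver prints (RxErr, BadTLP, ...):

```python
from __future__ import annotations

# Bits of the PCIe Correctable Error Status/Mask registers, with the short
# names Linux prints in "PCIe Bus Error" messages.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode(value: int) -> list[str]:
    """Return the names of all known bits set in a status or mask value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_mask(field: str) -> tuple[list[str], list[str]]:
    """Parse an 'error status/mask=XXXXXXXX/YYYYYYYY' field from dmesg."""
    status_hex, mask_hex = field.split("=", 1)[1].split("/")
    return decode(int(status_hex, 16)), decode(int(mask_hex, 16))


if __name__ == "__main__":
    # Values copied from the kernel messages in this thread.
    print(decode_status_mask("error status/mask=00000040/00006000"))
    print(decode_status_mask("error status/mask=00000041/00002000"))
```

Read this way, the first log reports a Bad TLP, with the Advisory Non-Fatal and Corrected Internal error bits masked.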

It works fine, without any error messages, with the standard kernel on the same hardware. Is there any specific kernel module we should include other than CAN_PEAK_PCIEFD?

We have enabled CAN_PEAK_PCI, CAN_PEAK_PCIEC, CAN_PEAK_PCIEFD and CAN_PEAK_USB.
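
To confirm which PEAK options the running kernel was actually built with, you can read the kernel configuration. This is a minimal sketch. It assumes the config is exposed at /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or that a copy exists at /boot/config-<release>. The helper names are hypothetical:

```python
from __future__ import annotations

import gzip
import platform


def kconfig_values(text: str, prefix: str = "CONFIG_CAN_PEAK") -> dict[str, str]:
    """Map each option starting with `prefix` to its value ('y', 'm', ... or 'n' if not set)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(prefix) and "=" in line:
            key, val = line.split("=", 1)
            values[key] = val
        elif line.startswith("# " + prefix) and line.endswith(" is not set"):
            # Disabled options appear as "# CONFIG_FOO is not set".
            values[line[2:-len(" is not set")]] = "n"
    return values


def read_running_config() -> str:
    """Try /proc/config.gz first, then /boot/config-<release> (assumed locations)."""
    try:
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    except OSError:
        with open(f"/boot/config-{platform.release()}") as f:
            return f.read()


if __name__ == "__main__":
    try:
        for key, val in sorted(kconfig_values(read_running_config()).items()):
            print(f"{key}={val}")
    except OSError as exc:
        print("kernel config not found:", exc)
```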

We tried adding kernel parameters via GRUB, such as pci=nommconf, pci=noaer, pcie_aspm=off or pci=nomsi. None of them helped.

In some experiments, taking one of the CAN interfaces down with "ip link set" made the error messages disappear. But after we reset the other CAN port, the messages flooded in again.

Here is some dmesg output related to PEAK:

Code: Select all

[    0.294121] CAN device driver interface
[    0.294134] usbcore: registered new interface driver peak_usb
[    0.294182] peak_pciefd 0000:01:00.0: 4x CAN-FD PCAN-PCIe FPGA v3.2.1:
[    0.294268] peak_pciefd 0000:01:00.0: can0 at reg_base=0x00000000efadab57 irq=46
[    0.294332] peak_pciefd 0000:01:00.0: can1 at reg_base=0x0000000011fceca6 irq=46
[    0.294388] peak_pciefd 0000:01:00.0: can2 at reg_base=0x0000000061e0cde5 irq=46
[    0.294444] peak_pciefd 0000:01:00.0: can3 at reg_base=0x00000000d7541f72 irq=46
[    0.294458] sja1000 CAN netdevice driver
[    0.294496] e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
[    0.294497] e100: Copyright(c) 1999-2006 Intel Corporation
[    0.294510] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    0.294511] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    0.294525] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.294525] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.294541] sky2: driver version 1.30
[    0.300212] libphy: r8169: probed
[    0.300418] r8169 0000:0a:00.0 eth0: RTL8168g/8111g, e0:d5:5e:24:90:8d, XID 4c0, IRQ 47
[    0.300420] r8169 0000:0a:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]

Hardware info:
Gigabyte motherboard: https://www.gigabyte.com/Motherboard/GA ... -rev-10#kf
AMD Ryzen 7 1700 Eight-Core Processor

Code: Select all

H/W path            Device     Class          Description
=========================================================
                               system         AB350N-Gaming WIFI (Default string)
/0                             bus            AB350N-Gaming WIFI-CF
/0/0                           memory         64KiB BIOS
/0/25                          memory         16GiB System Memory
/0/25/0                        memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
/0/25/1                        memory         [empty]
/0/25/2                        memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
/0/25/3                        memory         [empty]
/0/27                          memory         768KiB L1 cache
/0/28                          memory         4MiB L2 cache
/0/29                          memory         16MiB L3 cache
/0/2a                          processor      AMD Ryzen 7 1700 Eight-Core Processor
/0/100                         bridge         Family 17h (Models 00h-0fh) Root Complex
/0/100/0.2                     generic        Family 17h (Models 00h-0fh) I/O Memory Management Unit
/0/100/1.1                     bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/1.1/0        can1       network        PEAK-System Technik GmbH
/0/100/1.3                     bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/1.3/0                   bus            300 Series Chipset USB 3.1 xHCI Controller
/0/100/1.3/0/0      usb1       bus            xHCI Host Controller
/0/100/1.3/0/0/a               communication  Bluetooth wireless interface
/0/100/1.3/0/1      usb2       bus            xHCI Host Controller
/0/100/1.3/0.1                 storage        300 Series Chipset SATA Controller
/0/100/1.3/0.2                 bridge         Advanced Micro Devices, Inc. [AMD]
/0/100/1.3/0.2/0               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/1               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/5               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/5/0  wlp8s0     network        Wireless 3165
/0/100/1.3/0.2/6               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/7               bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/7/0  enp10s0    network        RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
/0/100/3.1                     bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/3.1/0                   display        GP106 [GeForce GTX 1060 6GB]
/0/100/3.1/0.1                 multimedia     GP106 High Definition Audio Controller
/0/100/7.1                     bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/7.1/0                   generic        Advanced Micro Devices, Inc. [AMD]
/0/100/7.1/0.2                 generic        Family 17h (Models 00h-0fh) Platform Security Processor
/0/100/7.1/0.3                 bus            Family 17h (Models 00h-0fh) USB 3.0 Host Controller
/0/100/7.1/0.3/0    usb3       bus            xHCI Host Controller
/0/100/7.1/0.3/1    usb4       bus            xHCI Host Controller
/0/100/8.1                     bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/8.1/0                   generic        Advanced Micro Devices, Inc. [AMD]
/0/100/8.1/0.2                 storage        FCH SATA Controller [AHCI mode]
/0/100/8.1/0.3                 multimedia     Family 17h (Models 00h-0fh) HD Audio Controller
/0/100/14                      bus            FCH SMBus Controller
/0/100/14.3                    bridge         FCH LPC Bridge
/0/101                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/102                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/103                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/104                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/105                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/106                         bridge         Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
/0/107                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
/0/108                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
/0/109                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
/0/10a                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
/0/10b                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
/0/10c                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
/0/10d                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
/0/10e                         bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
/0/1                scsi4      storage        
/0/1/0.0.0          /dev/sda   disk           500GB Samsung SSD 850
/0/1/0.0.0/1        /dev/sda1  volume         1023KiB BIOS Boot partition
/0/1/0.0.0/2        /dev/sda2  volume         465GiB EXT4 volume
/1                  docker0    network        Ethernet interface

Thanks!

Best regards,
Roger

S.Grosjean
Software Development
Posts: 357
Joined: Wed 4. Jul 2012, 17:02

Re: Kernel message: PCIe Bus Error when receiving messages

Post by S.Grosjean » Fri 22. Mar 2019, 11:16

Hi,

pci=noaer should normally do the job, since this argument prevents the AER driver from registering with the kernel. Are you sure about your kernel command line?

Code: Select all

$ dmesg | grep "Command line:"
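
The same check can be done with /proc/cmdline, which holds the command line of the running kernel. The minimal sketch below also handles the comma-separated form the kernel accepts for pci= (e.g. pci=noaer,nommconf). The helper names are hypothetical, and quoted arguments containing spaces are not handled:

```python
from __future__ import annotations


def pci_options(cmdline: str) -> list[str]:
    """Collect every value passed via pci=... tokens on a kernel command line."""
    opts: list[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= takes a comma-separated list, e.g. pci=noaer,nommconf
            opts.extend(v for v in token[len("pci="):].split(",") if v)
    return opts


def aer_disabled(cmdline: str) -> bool:
    return "noaer" in pci_options(cmdline)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            line = f.read()
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
    else:
        print("pci options:", pci_options(line))
        print("AER disabled:", aer_disabled(line))
```
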

Could you also give us your kernel version, please?

Code: Select all

$ uname -a

Is the RT version built from the linux-rt patch?

We can offer you an out-of-tree version of peak_pciefd that uses MSI rather than legacy interrupts. Could you please contact us by email at support@peak-system.com?
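
/proc/interrupts shows whether the card currently uses legacy (INTx) interrupts or MSI. In the dmesg output above, all four channels report irq=46, which is consistent with one shared legacy line. The rough sketch below assumes the usual x86 layout, where MSI vectors are listed with a chip name containing "MSI" (e.g. PCI-MSI) and legacy lines with IO-APIC. The helper name is hypothetical:

```python
from __future__ import annotations


def irq_modes(interrupts_text: str, driver: str = "peak_pciefd") -> dict[str, str]:
    """Map IRQ number -> 'MSI' or 'legacy' for /proc/interrupts lines naming `driver`."""
    modes: dict[str, str] = {}
    for line in interrupts_text.splitlines():
        if driver not in line or ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        # Heuristic (assumption): MSI chips are named e.g. PCI-MSI / IR-PCI-MSI,
        # legacy interrupts show IO-APIC / IR-IO-APIC.
        modes[irq.strip()] = "MSI" if "MSI" in rest else "legacy"
    return modes


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            print(irq_modes(f.read()) or "peak_pciefd not listed in /proc/interrupts")
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
```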

Regards,
— Stéphane

daniel.grim
Posts: 5
Joined: Thu 18. Jan 2024, 17:05

Re: Kernel message: PCIe Bus Error when receiving messages

Post by daniel.grim » Fri 19. Jan 2024, 12:26

Hello,
I have been running into these errors quite frequently now on multiple setups. It happens randomly, and a few setups work fine (for example, with the same hardware and image, one works and one doesn't).
We have around 20 devices with problems.

Code: Select all

 AER: Multiple Corrected error received: 0000:00:1a.0
Jan 19 10:29:05 linux kernel: [  190.486833] pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jan 19 10:29:05 linux kernel: [  190.486835] pcieport 0000:00:1a.0:   device [8086:7ac8] error status/mask=00000041/00002000
Jan 19 10:29:05 linux kernel: [  190.486838] pcieport 0000:00:1a.0:    [ 0] RxErr                  (First)
Jan 19 10:29:05 linux kernel: [  190.486840] pcieport 0000:00:1a.0:    [ 6] BadTLP
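
When the messages flood in, a per-port count helps show where they originate. In the first post, the reporting port 0000:00:01.1 is the GPP bridge directly above the PEAK card in the lshw tree (/0/100/1.1/0). A minimal sketch that counts "PCIe Bus Error" reports per port, severity and layer in a saved kernel log. The file-based workflow and helper names are assumptions:

```python
from __future__ import annotations

import re
import sys
from collections import Counter

# Matches lines such as:
# "pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
AER_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): PCIe Bus Error: "
    r"severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def summarize(log_text: str) -> Counter:
    """Count AER reports per (port, severity, layer) in kernel log text."""
    counts: Counter = Counter()
    for m in AER_RE.finditer(log_text):
        counts[(m["bdf"], m["severity"], m["type"])] += 1
    return counts


if __name__ == "__main__":
    # Usage (assumed): dmesg > kernel.log; python3 aer_summary.py kernel.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for (bdf, severity, layer), n in summarize(f.read()).most_common():
                print(f"{n:6d}  {bdf}  {severity}  {layer}")
```
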

I tried to reduce the setup to a minimum:
- 1x PCAN-M.2 quad
- kernel 6.0.0-0.deb11.2-amd64
- ASUS B660M-A WIFI D4
- Intel 13th gen CPU, no GPU
- M.2 storage + 2.5" OS drive
and fed it from a PCAN-USB Pro with 2 messages at cycle times of 10 and 50 (setting the cycle time to 500-2000 seems to lower the likelihood of failures, but I can't prove this well enough to state it as a fact).
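
To reproduce a similar load without a PCAN-USB Pro, you can use Python's built-in SocketCAN support (Linux only), as in the rough sketch below. The CAN IDs 0x100/0x200 are hypothetical, the periods are assumed to be in milliseconds, and the pacing is best-effort rather than real-time. The interface must already be configured and up, and it must be on the bus connected to the card under test:

```python
from __future__ import annotations

import socket
import struct
import sys
import time

# Classic CAN frame layout (struct can_frame): 32-bit ID, 8-bit length,
# 3 padding bytes, 8 data bytes. Same format string as in the Python docs.
CAN_FRAME_FMT = "=IB3x8s"


def build_frame(can_id: int, data: bytes) -> bytes:
    if len(data) > 8:
        raise ValueError("classic CAN payload is limited to 8 bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def due_ids(periods_ms: dict[int, int], tick_ms: int) -> list[int]:
    """CAN IDs whose period divides the current 1 ms tick."""
    return [can_id for can_id, period in periods_ms.items() if tick_ms % period == 0]


def run(ifname: str, periods_ms: dict[int, int], duration_ms: int) -> None:
    with socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as s:
        s.bind((ifname,))
        start = time.monotonic()
        for tick in range(duration_ms):
            for can_id in due_ids(periods_ms, tick):
                # Payload: the tick counter, so gaps are visible in a trace.
                s.send(build_frame(can_id, tick.to_bytes(4, "little")))
            # Best-effort pacing to the next 1 ms tick (not real-time accurate).
            delay = start + (tick + 1) / 1000 - time.monotonic()
            if delay > 0:
                time.sleep(delay)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical IDs; 10 and 50 are the cycle times from the post, assumed in ms.
    run(sys.argv[1], {0x100: 10, 0x200: 50}, duration_ms=60_000)
```

Usage (assumed): `python3 cyclic.py can0`. Without an interface argument, the script does nothing.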


- It happens on multiple hardware setups (AMD, Intel, different motherboard vendors and chipsets, also server boards) with the same behavior. It is mostly random: some PCAN-M.2 quad cards tend to run longer before failing, others fail sooner.
- Other setups are connected to different CAN devices and use up to 16 CAN channels per setup.
- It didn't happen as much in the past with older kernels; we started seeing these problems around kernel 5.x and later, all on Debian.
- Warm rebooting after a failure sometimes leaves the CAN cards undetected; even lspci doesn't list them. A cold power-on usually fixes it.
- Some PCAN-M.2 quad cards are older units reused in newer devices.
- Only PCAN-M.2 quad cards are used.
- The setups are used for logging.

Thank you,

Daniel.

M.Maidhof
Support
Posts: 1751
Joined: Wed 22. Sep 2010, 14:00

Re: Kernel message: PCIe Bus Error when receiving messages

Post by M.Maidhof » Fri 19. Jan 2024, 14:50

Hello Daniel,

please check in dmesg which firmware version is on the card. If it is older than 3.5.7, please update to 3.5.7. You can use PEAK-Flash for Windows, or send us the output of uname -a by email so that we can send you a suitable flash tool for Linux.
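
The firmware version appears in the probe message of peak_pciefd (e.g. "4x CAN-FD PCAN-PCIe FPGA v3.2.1"). If you have many machines to check, a minimal sketch can compare it against 3.5.7. The helper names and the file-based workflow are assumptions:

```python
from __future__ import annotations

import re
import sys

# Matches the probe message of peak_pciefd, e.g.
# "peak_pciefd 0000:0a:00.0: 4x CAN-FD PCAN-PCIe FPGA v3.2.1"
FPGA_RE = re.compile(r"peak_pciefd (\S+): .*FPGA v(\d+)\.(\d+)\.(\d+)")
RECOMMENDED = (3, 5, 7)  # version recommended in this thread


def fpga_versions(log_text: str) -> dict[str, tuple[int, int, int]]:
    """Return {PCI address: (major, minor, patch)} for each card found in the log."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FPGA_RE.finditer(log_text)
    }


def needs_update(version: tuple[int, int, int]) -> bool:
    # Integer tuples compare element by element, so (3, 10, 0) > (3, 5, 7),
    # which a plain string comparison would get wrong.
    return version < RECOMMENDED


if __name__ == "__main__":
    # Usage (assumed): dmesg > kernel.log; python3 fw_check.py kernel.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for bdf, ver in fpga_versions(f.read()).items():
                state = "update recommended" if needs_update(ver) else "ok"
                print(bdf, ".".join(map(str, ver)), state)
```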

regards

Michael

daniel.grim
Posts: 5
Joined: Thu 18. Jan 2024, 17:05

Re: Kernel message: PCIe Bus Error when receiving messages

Post by daniel.grim » Fri 19. Jan 2024, 15:04

Hello Michael,

thank you for such a quick answer.

My version is indeed older than 3.5.7. I will contact your support to request the Linux flashing tool and will reference this thread.

Code: Select all

[    4.737434] peak_pciefd 0000:0a:00.0: 4x CAN-FD PCAN-PCIe FPGA v3.2.1
 

M.Maidhof
Support
Posts: 1751
Joined: Wed 22. Sep 2010, 14:00

Re: Kernel message: PCIe Bus Error when receiving messages

Post by M.Maidhof » Fri 19. Jan 2024, 15:32

Hello Daniel,

The flash tool has been sent to you by email.

regards

Michael

daniel.grim
Posts: 5
Joined: Thu 18. Jan 2024, 17:05

Re: Kernel message: PCIe Bus Error when receiving messages

Post by daniel.grim » Fri 19. Jan 2024, 16:27

Continuing with another problem.

Code: Select all

sudo ./pcanflash-FW_3.5.7-x86_64 --info

Warning: this version flashes PCAN-PCI_Express_FD_v3.5.7.bin ONLY!
pcanflash: mapping virtual address space failed (errno=22)
pcanflash: unable to find any PEAK-System pci device in this host

Code: Select all

./pcanflash-FW_3.5.7-x86_64 --debug

Warning: this version flashes PCAN-PCI_Express_FD_v3.5.7.bin ONLY!
pcanflash: parsing #20 "/sys/bus/pci/devices/0000:02:00.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:02:00.0/vendor")
pcanflash: 0x001c found!
pcanflash: found "0000:02:00.0"
pcanflash: init_peak_pci_device("0000:02:00.0")
pcanflash: init_peak_device("0000:02:00.0")
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:02:00.0/device")
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:02:00.0/enable")
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:02:00.0/subsystem_device")
pcanflash: init_peak_pci_device("0000:02:00.0"): BAR0 = 0xffffffffffffffff
pcanflash: mapping virtual address space failed (errno=22)
pcanflash: parsing #19 "/sys/bus/pci/devices/0000:01:00.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:01:00.0/vendor")
pcanflash: 0x15b7 not PEAK
pcanflash: parsing #18 "/sys/bus/pci/devices/0000:00:1f.6"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1f.6/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #17 "/sys/bus/pci/devices/0000:00:1f.5"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1f.5/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #16 "/sys/bus/pci/devices/0000:00:1f.4"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1f.4/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #15 "/sys/bus/pci/devices/0000:00:1f.3"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1f.3/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #14 "/sys/bus/pci/devices/0000:00:1f.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1f.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #13 "/sys/bus/pci/devices/0000:00:1c.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1c.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #12 "/sys/bus/pci/devices/0000:00:1a.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:1a.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #11 "/sys/bus/pci/devices/0000:00:17.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:17.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #10 "/sys/bus/pci/devices/0000:00:16.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:16.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #9 "/sys/bus/pci/devices/0000:00:15.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:15.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #8 "/sys/bus/pci/devices/0000:00:14.3"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:14.3/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #7 "/sys/bus/pci/devices/0000:00:14.2"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:14.2/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #6 "/sys/bus/pci/devices/0000:00:14.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:14.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #5 "/sys/bus/pci/devices/0000:00:0e.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:0e.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #4 "/sys/bus/pci/devices/0000:00:0a.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:0a.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #3 "/sys/bus/pci/devices/0000:00:06.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:06.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: parsing #2 "/sys/bus/pci/devices/0000:00:00.0"
pcanflash: read_device_attr("/sys/bus/pci/devices/0000:00:00.0/vendor")
pcanflash: 0x8086 not PEAK
pcanflash: unable to find any PEAK-System pci device in this host

Thanks.
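
The debug output shows how pcanflash locates the card. It walks /sys/bus/pci/devices and matches vendor ID 0x001c. When a card seems to have disappeared (for example after the warm reboots mentioned earlier in the thread), you can run the same check independently of any PEAK tool; `lspci -d 1c:` should give the same answer. A minimal sketch with hypothetical helper names:

```python
from __future__ import annotations

import os

PEAK_VENDOR_ID = 0x001C  # the vendor ID pcanflash reports as "0x001c found!"


def parse_id(text: str) -> int:
    """sysfs vendor/device files contain a hex value such as 0x001c plus a newline."""
    return int(text.strip(), 16)


def find_peak_devices(root: str = "/sys/bus/pci/devices") -> list[tuple[str, int]]:
    """Return (PCI address, device ID) for every device with PEAK's vendor ID."""
    if not os.path.isdir(root):
        return []
    found = []
    for name in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, name, "vendor")) as f:
                if parse_id(f.read()) != PEAK_VENDOR_ID:
                    continue
            with open(os.path.join(root, name, "device")) as f:
                found.append((name, parse_id(f.read())))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
    return found


if __name__ == "__main__":
    devices = find_peak_devices()
    if not devices:
        print("no PEAK-System PCI device visible in sysfs")
    for bdf, dev in devices:
        print(f"{bdf}  vendor=0x{PEAK_VENDOR_ID:04x}  device=0x{dev:04x}")
```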

M.Maidhof
Support
Posts: 1751
Joined: Wed 22. Sep 2010, 14:00

Re: Kernel message: PCIe Bus Error when receiving messages

Post by M.Maidhof » Mon 22. Jan 2024, 13:30

Hi,

please unload all pcan and peak_pciefd drivers with rmmod and try again.

regards

Michael

daniel.grim
Posts: 5
Joined: Thu 18. Jan 2024, 17:05

Re: Kernel message: PCIe Bus Error when receiving messages

Post by daniel.grim » Wed 31. Jan 2024, 13:27

Hi,

I tried again with rmmod, but after a power cycle the version was still the old one.
However, using modprobe -r solved the issue and the flashing process worked. (I read that it "should also clear the dependencies"; I don't know if there are any, but it works.)
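
Regarding the dependencies: modprobe -r removes the named module and then also tries to remove the modules it depends on if they are no longer used (such as can_dev), whereas rmmod removes only the named module. Before flashing, it can help to confirm that nothing PEAK-related is still loaded. The minimal sketch below is based on the /proc/modules format. The helper names are hypothetical, and the pcan*/peak_* name filter is an assumption, not an official list:

```python
from __future__ import annotations


def loaded_modules(proc_modules: str) -> dict[str, list[str]]:
    """Parse /proc/modules into {module: [modules that use it]}."""
    mods: dict[str, list[str]] = {}
    for line in proc_modules.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # Fields: name, size, use count, users ("-" if none), state, address.
        name, users = fields[0], fields[3]
        mods[name] = [] if users == "-" else [u for u in users.split(",") if u]
    return mods


def peak_modules(mods: dict[str, list[str]]) -> list[str]:
    """Modules named pcan* or peak_* (a naming heuristic)."""
    return sorted(name for name in mods if name.startswith(("pcan", "peak_")))


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            mods = loaded_modules(f.read())
    except OSError:
        mods = {}
    leftovers = peak_modules(mods)
    print("still loaded:", ", ".join(leftovers) if leftovers else "none")
    if "can_dev" in mods:
        print("can_dev used by:", ", ".join(mods["can_dev"]) or "nothing")
```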

We have been testing the new 3.5.7 version on our quad M.2 CAN cards for a week now, and everything seems to be working as expected.

One last question: can new cards be ordered with version 3.5.7 already installed?

Thank you again for your help and I will post more information if anything changes.

Daniel

M.Maidhof
Support
Posts: 1751
Joined: Wed 22. Sep 2010, 14:00

Re: Kernel message: PCIe Bus Error when receiving messages

Post by M.Maidhof » Wed 31. Jan 2024, 14:43

Hi,

thanks for the feedback. At the moment, there are still batches with FW 3.2 in our stock. If you need version 3.5.7 on delivery, please add a separate position to your PO for that option.

regards

Michael
