PCAN-M.2 CAN errors in combination with an IFM RM9000

Post by **mbuijs** » Tue 20. Oct 2020, 18:13

Hello,

I'm using a PCAN-M.2 interface on an Intel NUC7i7DNBE to communicate with a machine over a 500 kbit/s CAN bus. The CAN bus on this machine has 2 custom boards and 3 IFM RM9000 CANopen encoders connected, so together with the NUC there are 6 nodes on the bus. The bus is properly terminated at both ends. The NUC runs Ubuntu 18.04 (kernel 5.4.0-51) with the mainline netdev driver.

As long as the NUC is not connected, communication between the 5 other nodes works fine: the software running on one of the custom boards can read encoder values, and the 2 custom boards can communicate with each other.
As long as the 3 encoders are not connected, the NUC can receive all communication on the bus from the 2 custom boards without any problems. I have not tried transmitting anything from the NUC yet, as I first wanted the bus to work fine with all nodes connected.

When I connect both the encoders and the NUC with the PCAN-M.2 interface, the PCAN-M.2 starts returning error frames as follows:

$ candump -tA -e can0,0~0,#FFFFFFFF
 (2020-10-20 13:14:44.043915)  can0  20000004   [8]  00 04 00 00 00 00 00 63   ERRORFRAME
	controller-problem{rx-error-warning}
	error-counter-tx-rx{{0}{99}}
 (2020-10-20 13:14:44.043916)  can0  20000004   [8]  00 10 00 00 00 00 00 87   ERRORFRAME
	controller-problem{rx-error-passive}
	error-counter-tx-rx{{0}{135}}
 (2020-10-20 13:14:44.050835)  can0  20000004   [8]  00 04 00 00 00 00 00 7F   ERRORFRAME
	controller-problem{rx-error-warning}
	error-counter-tx-rx{{0}{127}}
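
For readers who want to decode such error frames by hand, here is a minimal sketch based on the bit definitions in the Linux header `linux/can/error.h`. The function name `decode_error_frame` is hypothetical, and the label strings other than the two that appear in the candump output above are informal names chosen for illustration.

```python
# Bit values as defined in the Linux header linux/can/error.h.
CAN_ERR_FLAG = 0x20000000  # set in can_id for every error frame
CAN_ERR_CRTL = 0x00000004  # error class "controller problem", details in data[1]

# Meaning of the individual bits in data[1] for the controller error class.
CRTL_BITS = {
    0x01: "rx-overflow",
    0x02: "tx-overflow",
    0x04: "rx-error-warning",
    0x08: "tx-error-warning",
    0x10: "rx-error-passive",
    0x20: "tx-error-passive",
    0x40: "recovered-to-error-active",
}


def decode_error_frame(can_id, data):
    """Return (controller flags, tx counter, rx counter) for an error frame."""
    if not can_id & CAN_ERR_FLAG:
        raise ValueError("not an error frame")
    flags = []
    if can_id & CAN_ERR_CRTL:
        flags = [name for bit, name in CRTL_BITS.items() if data[1] & bit]
    # Drivers that report error counters put TX in data[6] and RX in data[7].
    return flags, data[6], data[7]


# Second frame from the log above: 00 10 00 00 00 00 00 87
frame = bytes.fromhex("00 10 00 00 00 00 00 87")
print(decode_error_frame(0x20000004, frame))
# (['rx-error-passive'], 0, 135)
```

The last data byte is the receive error counter in hexadecimal, so `63`, `87` and `7F` correspond to the 99, 135 and 127 that candump prints.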

I have attached a full log that I created using candump, in which I plug in the encoders and unplug them again at the end.

When I configure can0 as listen-only, there seems to be no problem and all messages are received.

My problem solving experience with CAN is limited and I don't really understand what this error means, so I didn't know where to start looking.
In an attempt to learn something, I tried the proprietary driver. Unfortunately it shows the same problem; I've attached a trace that I created using pcanview. In pcanview I also tried different clock speeds, but unfortunately without improvement.

For reference the output of ip link:

$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 85) restart-ms 100
	  bitrate 500000 sample-point 0.875
	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          1744       1720       0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    85548      14763    0       3537    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0
    can state ERROR-WARNING (berr-counter tx 0 rx 127) restart-ms 100
	  bitrate 500000 sample-point 0.875
	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          1819       1794       0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    87688      15083    0       3537    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0
    can state ERROR-PASSIVE (berr-counter tx 0 rx 135) restart-ms 100
	  bitrate 500000 sample-point 0.875
	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          1834       1809       0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    88580      15238    0       3537    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
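
The three states in the output above follow the usual CAN fault confinement thresholds: warning from 96, error passive from 128, and bus-off once the transmit counter reaches 256. A minimal sketch of that mapping, with the hypothetical helper name `can_state`:

```python
def can_state(tx_errors, rx_errors):
    """Map CAN error counters to the state names printed by `ip -d link show`."""
    # Only the transmit error counter can take a node to bus-off.
    if tx_errors >= 256:
        return "BUS-OFF"
    worst = max(tx_errors, rx_errors)
    if worst >= 128:
        return "ERROR-PASSIVE"
    if worst >= 96:
        return "ERROR-WARNING"
    return "ERROR-ACTIVE"


# The rx counter values from the three snapshots above.
for rx in (85, 127, 135):
    print(rx, can_state(0, rx))
```

Since the NUC never transmits in this setup, only the receive counter moves, which is why the interface alternates between warning and passive but never goes bus-off.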

Post by **M.Heidemann** » Wed 21. Oct 2020, 08:37

Hello,

Please check whether your CANopen nodes use internal termination and whether your bus is perhaps over-terminated.

Furthermore, does this error also occur if two SocketCAN nodes (one of them being the PCAN-M.2) are communicating with each other?

Please report back to me regarding this.

Best Regards

Marvin

Post by **mbuijs** » Thu 22. Oct 2020, 15:39

Hello,

The bus is properly terminated. As I mentioned in my message there is no problem when the PCAN-M.2 is connected with the 2 custom boards.

This morning I experimented a bit more with the setup, also having an additional RM9000 encoder available for testing. I found the following:

When encoder 1 is connected, the error occurs.
When encoder 2 is connected, the error occurs.
When encoder 3 is connected, the error does not occur.
When encoder 4 is connected, the error does not occur.
When encoders 3 and 4 are connected, the error does not occur.

Then I returned my setup to having encoders 1, 2 and 3 connected, as it was before, and started trying different bus timing configurations. Nothing seemed to help until I tried setting the bitrate to 501 kbit/s:

$ sudo ip link set can0 type can bitrate 501000
$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100 
	  bitrate 500000 sample-point 0.800 
	  tq 400 prop-seg 1 phase-seg1 2 phase-seg2 1 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    17836884   3090436  0       11635   0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    301016     40580    0       0       0       0

Using this configuration there are no errors, even with encoders 1 and 2 connected to the CAN bus.

Afterward I also tried other bitrates (502000, 500500, 501500), but none worked without side effects. It seems that either the clock rate of the encoders or the clock rate of the PCAN-M.2 is not accurate. Do you have any suggestions for investigating this on the PCAN-M.2 side?
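
Note that the output above still reports bitrate 500000, now with tq 400 and only 5 time quanta per bit, so the request for 501000 appears to have been rounded to a different prescaler setting at the same nominal rate. The sketch below lists every prescaler and quanta combination that hits a bitrate exactly on an 80 MHz clock. The limits are taken from the `peak_canfd` line (brp 1..1024, and 3 to 385 quanta per bit from tseg1 1..256 and tseg2 1..128 plus the sync segment). It does not reproduce the kernel's own selection algorithm, and `exact_timings` is a hypothetical name.

```python
def exact_timings(clock_hz, bitrate, min_quanta=3, max_quanta=385, max_brp=1024):
    """List (brp, quanta per bit, tq in ns) combinations that give `bitrate` exactly."""
    clocks_per_bit, remainder = divmod(clock_hz, bitrate)
    if remainder:
        return []  # the bitrate cannot be produced exactly from this clock
    options = []
    for brp in range(1, max_brp + 1):
        quanta, rest = divmod(clocks_per_bit, brp)
        if rest == 0 and min_quanta <= quanta <= max_quanta:
            options.append((brp, quanta, brp * 1e9 / clock_hz))
    return options


for brp, quanta, tq_ns in exact_timings(80_000_000, 500_000):
    print(f"brp {brp:3d}  quanta/bit {quanta:3d}  tq {tq_ns:6.1f} ns")

# 501 kbit/s has no exact solution on an 80 MHz clock.
print(exact_timings(80_000_000, 501_000))
```

The list for 500 kbit/s includes tq 12.5 ns with 160 quanta (the default shown earlier, where ip prints tq 12) and tq 400 ns with 5 quanta (the result above).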

Post by **PEAK-Support** » Fri 23. Oct 2020, 08:47

As Marvin wrote: check the cabling, measure the termination, etc.
This has nothing to do with our card; it looks like a local issue with your CANopen devices (duplicate node IDs, wrong cabling, over-termination, etc.). If the devices are real CANopen devices, the standard 500 kbit/s bitrate should work.

Post by **mbuijs** » Fri 23. Oct 2020, 13:06

Today I tried using a PCAN-USB adapter (IPEH-002022) and it didn't show any error. What is the difference between PCAN-USB and PCAN-M.2?

Edit:
So I noticed that the bit timing used by the PCAN-USB when configured to 500 kbit/s is the following:

          bitrate 500000 sample-point 0.875 
          tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
          pcan_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
          clock 8000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

While the PCAN-M.2 defaults to the following:

	  bitrate 500000 sample-point 0.875
	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000

When I configure the PCAN-M.2 to the same settings, there are no CAN errors either. I used the following commands:

$ sudo ip link set can0 type can tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
$ ip -d link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100 
	  bitrate 500000 sample-point 0.875 
	  tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
	  peak_canfd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp-inc 1
	  peak_canfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp-inc 1
	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
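
To make the difference between the two default timings easier to see, the following sketch converts the segment counts into nanoseconds. It assumes a prescaler (brp) of 1 for both devices, which is inferred from the clock and tq values rather than shown in the output: 1 / 8 MHz = 125 ns and 1 / 80 MHz = 12.5 ns (ip prints the latter as 12). The function name `describe` is hypothetical.

```python
from fractions import Fraction


def describe(clock_hz, brp, prop_seg, phase_seg1, phase_seg2, sjw):
    """Convert SocketCAN bit timing parameters into times in nanoseconds."""
    tq_ns = Fraction(brp * 10**9, clock_hz)
    quanta = 1 + prop_seg + phase_seg1 + phase_seg2  # the 1 is the sync segment
    return {
        "tq_ns": float(tq_ns),
        "quanta_per_bit": quanta,
        "bit_time_ns": float(tq_ns * quanta),
        "sample_point": (1 + prop_seg + phase_seg1) / quanta,
        "sjw_ns": float(tq_ns * sjw),
    }


pcan_usb = describe(8_000_000, brp=1, prop_seg=6, phase_seg1=7, phase_seg2=2, sjw=1)
pcan_m2 = describe(80_000_000, brp=1, prop_seg=69, phase_seg1=70, phase_seg2=20, sjw=1)

for name, timing in (("PCAN-USB", pcan_usb), ("PCAN-M.2", pcan_m2)):
    print(name, timing)
```

Both configurations give a 2000 ns bit time and a sample point of 0.875. The visible difference is the resynchronization jump width: SJW 1 corresponds to 125 ns on the PCAN-USB but only 12.5 ns with the PCAN-M.2 default.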

Post by **M.Heidemann** » Fri 23. Oct 2020, 14:32

Hello,

The PCAN-M.2 is an FD-capable device, whereas the PCAN-USB only supports CAN 2.0.

The two devices use different CAN controllers, which have different default configurations for 500 kbit/s at sample point 0.875 under SocketCAN.

As you noticed, the same configuration can be applied to the PCAN-M.2 as well, in case there is an incompatibility with the default bit timing at that sample point.

Best Regards

Marvin

Post by **mbuijs** » Mon 26. Oct 2020, 10:16

So I think I found the root cause.

When configuring the PCAN-M.2 to 501 kbit/s, it would choose tq 400 instead of tq 12, and there would be no problem in that case.
When changing SJW to 2 instead of the default 1, there seems to be no problem either.
When measuring the bit timing with an oscilloscope, I found that the 2 encoders that trigger the CAN errors use a bit time of 1988 ns. The other boards and the PCAN-USB use 2000 ns, just like the PCAN-M.2.
I believe the combination of the inaccurate encoders with the short tq of 12 ns and SJW 1 was causing the problem.

I decided to use the PCAN-USB settings (tq 125 ns, SJW 1) on the PCAN-M.2 for now, as I don't have any other CAN FD devices anyway.
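
As a rough plausibility check of this explanation, the sketch below applies the common worst-case rule from CAN bit timing guidance: with bit stuffing, up to 10 bit times can pass between two recessive-to-dominant edges, and the drift accumulated in that span should not exceed what one resynchronization can correct (SJW × tq). The 1988 ns and 2000 ns figures are the oscilloscope measurements above; the helper names are hypothetical, and this is an estimate, not a measurement.

```python
def max_drift_ns(bit_time_a_ns, bit_time_b_ns, bits_between_edges=10):
    """Worst-case phase drift accumulated between two resynchronization edges."""
    return abs(bit_time_a_ns - bit_time_b_ns) * bits_between_edges


def sjw_covers_drift(tq_ns, sjw, drift_ns):
    """True if a single resynchronization can absorb the accumulated drift."""
    return tq_ns * sjw >= drift_ns


drift = max_drift_ns(2000, 1988)  # 12 ns per bit over 10 bits

configs = {
    "default (tq 12.5 ns, SJW 1)": (12.5, 1),
    "SJW 2 (tq 12.5 ns)": (12.5, 2),
    "PCAN-USB timing (tq 125 ns, SJW 1)": (125, 1),
    "501 kbit request (tq 400 ns, SJW 1)": (400, 1),
}
for name, (tq_ns, sjw) in configs.items():
    ok = sjw_covers_drift(tq_ns, sjw, drift)
    print(f"{name}: SJW = {tq_ns * sjw} ns, covers {drift} ns drift: {ok}")
```

Under this rule the default timing falls well short, while the tq 125 and tq 400 settings satisfy it, which matches the observations. SJW 2 at tq 12.5 ns does not meet the worst-case bound either, so the fact that it seemed to work probably depends on the actual traffic producing edges more often than every 10 bits; setting a larger SJW or a coarser tq gives more margin.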

PEAK-System Forum