Can someone help me understand why cryosparc_io.so occasionally triggers invalid opcode traps that kill jobs? I have seen 21 of these so far this month:
01/all.gz:Oct 1 03:08:44 ga4 kernel: [3596257.176981] traps: python[237937] trap invalid opcode ip:2b47bae0feae sp:2b48d5a4c2c0 error:0 in cryosparc_io.so[2b47bae08000+2d000]
03/all.gz:Oct 3 09:19:49 ga7 kernel: [3791297.061444] traps: python[73443] trap invalid opcode ip:2b49fe804eae sp:2f43972622c0 error:0 in cryosparc_io.so[2b49fe7fd000+2d000]
03/all.gz:Oct 3 09:32:14 ga3 kernel: [3791913.314626] traps: python[211824] trap invalid opcode ip:2b1d1a86deae sp:2b1ea79a82c0 error:0 in cryosparc_io.so[2b1d1a866000+2d000]
03/all.gz:Oct 3 09:48:39 ga7 kernel: [3793026.671240] traps: python[75286] trap invalid opcode ip:2b6815cb55ca sp:2b69175394f0 error:0 in cryosparc_io.so[2b6815cae000+2d000]
03/all.gz:Oct 3 10:48:49 ga7 kernel: [3796636.763563] traps: python[77537] trap invalid opcode ip:2af653ba8eae sp:2af7551832c0 error:0 in cryosparc_io.so[2af653ba1000+2d000]
03/all.gz:Oct 3 12:02:11 ga19 kernel: [699512.990736] traps: python[86418] trap invalid opcode ip:2b8299c87eae sp:2b839b3372c0 error:0 in cryosparc_io.so[2b8299c80000+2d000]
03/all.gz:Oct 3 12:54:39 ga20 kernel: [3804078.859481] traps: python[102163] trap invalid opcode ip:2af4189c0eae sp:2af51be742c0 error:0 in cryosparc_io.so[2af4189b9000+2d000]
03/all.gz:Oct 3 13:47:45 ga17 kernel: [3807020.209252] traps: python[221885] trap invalid opcode ip:2b75a3c31eae sp:2b76a52832c0 error:0 in cryosparc_io.so[2b75a3c2a000+2d000]
03/all.gz:Oct 3 13:49:49 ga7 kernel: [3807497.504271] traps: python[87696] trap invalid opcode ip:2b950a51beae sp:2b960fd082c0 error:0 in cryosparc_io.so[2b950a514000+2d000]
03/all.gz:Oct 3 14:25:40 ga7 kernel: [3809648.975726] traps: python[89880] trap invalid opcode ip:2b9496f18eae sp:2b95aa6832c0 error:0 in cryosparc_io.so[2b9496f11000+2d000]
03/all.gz:Oct 3 14:27:13 ga17 kernel: [3809387.909975] traps: python[224094] trap invalid opcode ip:2b1fcbe99eae sp:2b20cfa8f2c0 error:0 in cryosparc_io.so[2b1fcbe92000+2d000]
03/all.gz:Oct 3 19:29:02 ga20 kernel: [3827741.776435] traps: python[121895] trap invalid opcode ip:2b31364f6eae sp:2b31e23102c0 error:0 in cryosparc_io.so[2b31364ef000+2d000]
03/all.gz:Oct 3 21:07:02 ga17 kernel: [3833376.526132] traps: python[246862] trap invalid opcode ip:2abf7eeb6eae sp:2ac029f7b2c0 error:0 in cryosparc_io.so[2abf7eeaf000+2d000]
04/all.gz:Oct 4 04:47:03 ga13 kernel: [3861134.840269] traps: python[26755] trap invalid opcode ip:2b37bfcf8eae sp:2b3853aa22c0 error:0 in cryosparc_io.so[2b37bfcf1000+2d000]
04/all.gz:Oct 4 04:55:12 ga8 kernel: [3861252.443258] traps: python[112030] trap invalid opcode ip:2b7822a35eae sp:2b7927fc22c0 error:0 in cryosparc_io.so[2b7822a2e000+2d000]
04/all.gz:Oct 4 05:33:20 ga17 kernel: [3863753.473876] traps: python[11436] trap invalid opcode ip:2b565bb0beae sp:2b57f56632c0 error:0 in cryosparc_io.so[2b565bb04000+2d000]
04/all.gz:Oct 4 05:54:45 ga8 kernel: [3864824.464155] traps: python[115475] trap invalid opcode ip:2b084b7a9eae sp:2b094fca12c0 error:0 in cryosparc_io.so[2b084b7a2000+2d000]
04/all.gz:Oct 4 17:06:53 ga11 kernel: [3905613.019307] traps: python[56718] trap invalid opcode ip:2ac88c096eae sp:2ac99f2a32c0 error:0 in cryosparc_io.so[2ac88c08f000+2d000]
05/all.gz:Oct 5 07:53:27 ga17 kernel: [3958556.529031] traps: python[95890] trap invalid opcode ip:2ba820dbdeae sp:2ba91d2142c0 error:0 in cryosparc_io.so[2ba820db6000+2d000]
05/all.gz:Oct 5 11:18:16 ga7 kernel: [3971213.810024] traps: python[235868] trap invalid opcode ip:2af346f7eeae sp:2af4856082c0 error:0 in cryosparc_io.so[2af346f77000+2d000]
06/all.gz:Oct 6 01:49:52 ga8 kernel: [4022916.264838] traps: python[1371] trap invalid opcode ip:2ab2fff435ca sp:2ab52caee8b0 error:0 in cryosparc_io.so[2ab2fff3c000+2d000]
I have also seen a similar trap from bin_motion.so in another log file:
traps: python[19125] trap invalid opcode ip:2ac6e4a92b64 sp:7fff57bee120 error:0 in bin_motion.so[2ac6e4a63000+44000]
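One way to narrow this down is to turn each trap into an offset inside the library by subtracting the mapping start shown in the brackets from the ip value. Doing that by hand on the lines above, 19 of the 21 cryosparc_io.so traps land at offset 0x7eae and the other two (ga7 on Oct 3 09:48 and ga8 on Oct 6 01:49) land at 0x75ca. The same one or two instructions are faulting every time. Below is a minimal Python sketch that tallies these offsets across the rotated syslog files. The regex is written against the line format shown above and is an assumption, not something taken from CryoSPARC or kernel documentation. The script name in the usage comment is hypothetical. Also note that the offset is relative to the start of the mapping the kernel printed, which may not equal the ELF virtual address if that mapping does not begin at file offset 0.

```python
import gzip
import re
from collections import Counter

# Assumed format, based on the pasted lines, e.g.:
#   traps: python[237937] trap invalid opcode ip:2b47bae0feae sp:... error:0 in cryosparc_io.so[2b47bae08000+2d000]
TRAP_RE = re.compile(
    r"trap invalid opcode ip:(?P<ip>[0-9a-f]+)\b.*?"
    r"in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def read_log(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        yield from fh


def trap_offsets(lines):
    """Count (library, ip - mapping_start) pairs across kernel trap lines."""
    counts = Counter()
    for line in lines:
        m = TRAP_RE.search(line)
        if m is None:
            continue  # not an invalid-opcode trap line
        ip = int(m["ip"], 16)
        base = int(m["base"], 16)
        counts[(m["lib"], hex(ip - base))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # e.g. python trap_offsets.py 01/all.gz 03/all.gz 04/all.gz  (hypothetical script name)
    lines = (line for path in sys.argv[1:] for line in read_log(path))
    for (lib, offset), n in trap_offsets(lines).most_common():
        print(f"{lib} +{offset}: {n}")
```

With an offset in hand, something like `objdump -d --start-address=0x7e90 --stop-address=0x7ec0 cryosparc_io.so` (after checking the mapping-to-address translation with `readelf -l`) should show whether the faulting instruction is an explicit `ud2` trap or an extension the CPU lacks. If the same code path runs cleanly in other jobs on the same nodes, that would argue against a pure missing-instruction explanation, because an unsupported instruction faults every time it executes.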
I am currently running CryoSPARC v4.5.3. cryosparc_io.so appears to be a precompiled library shipped in the worker tarball, so my guess is that it was compiled for CPU instructions that my AMD EPYC 74F3 24-core processors do not support.
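To test the CPU-compatibility theory more directly, it may help to compare the instruction-set flags each node reports. The traps show up on many different nodes (ga3, ga4, ga7, ga8, ga11, ga13, ga17, ga19, ga20). The sketch below reads /proc/cpuinfo on Linux and reports whether a handful of extensions are present. The flag list is an assumption chosen for illustration and is not known to be what cryosparc_io.so was built for. For reference, Zen 3 parts such as the EPYC 74F3 do not implement AVX-512. If the library had been built with AVX-512 enabled, the avx512* entries are the ones you would expect to come back missing.

```python
import os

# Illustrative list only (assumption): not taken from CryoSPARC build information.
FLAGS_OF_INTEREST = ("sse4_2", "avx", "avx2", "fma", "avx512f", "avx512bw", "avx512vl")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=FLAGS_OF_INTEREST):
    """Map each wanted flag name to True/False depending on whether the CPU lists it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        for name, present in report(fh.read()).items():
            print(f"{name:10s} {'yes' if present else 'MISSING'}")
```

Running this on each node and diffing the output would also catch the case where only some nodes differ, for example after a BIOS or microcode change.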
Any help tracking this down or mitigating it would be appreciated.