This issue may be linked to other, unresolved problems that we have been dealing with for almost a year now. Our Linux machine (CentOS 7, kernel 3.10) runs a 10980XE processor on an ASUS X299 SAGE 10GbE board with 128 GB RAM and two RTX 3090 cards. We previously had trouble getting both 3090s to run at the same time: jobs would crash mysteriously and untraceably after a few minutes of runtime.
A more recent problem, which appeared after the v3.3.2 update, produces a new error: “Token is invalid, another CS instance is running with the same license ID”. This is the only workstation we have, and the license has never been used on another machine. We never saw this error before the v3.3.2 update, and we did not see it after the update either until we tested 2-GPU jobs again while returning to troubleshoot the original issue.
Currently, we are able to start 1-2 jobs on GPU-0 (canceling the first one before starting the second) before we start to see the license error. If we try to start a job on GPU-1, the error is instant. If we try to use both GPU-0 and GPU-1 (either on different jobs or on the same job), the error is also instant. Restarting CryoSPARC (cryosparcm stop) doesn’t help. A full reboot of the system puts us back at square one, where the first 1-2 jobs on GPU-0 start before the error returns. So far I’ve made sure all CryoSPARC-related processes are dead before restarting, and I can’t find any orphaned processes that could be causing this.
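A check for leftover processes of the kind described above could be sketched roughly as follows. This is an illustration only, not the exact commands we ran: the function name find_matching_processes and the search string "cryosparc" are assumptions, and the script simply scans the Linux /proc directory for any process whose command line mentions that string.

```python
import os


def find_matching_processes(pattern="cryosparc", proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs for processes whose command line
    contains `pattern` (case-insensitive). Hypothetical helper, Linux only."""
    matches = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names under /proc correspond to processes.
        if not entry.isdigit():
            continue
        pid = int(entry)
        # Skip this script itself so it never reports its own command line.
        if pid == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            # The process exited in the meantime, or we may not read it.
            continue
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern.lower() in cmdline.lower():
            matches.append((pid, cmdline))
    return sorted(matches)


if __name__ == "__main__":
    leftovers = find_matching_processes()
    if not leftovers:
        print("no matching processes found")
    for pid, cmdline in leftovers:
        print(pid, cmdline)
```

Running this after cryosparcm stop should list nothing if everything shut down cleanly. Anything it does print would be a candidate for an orphaned process, though a match on the string alone does not prove the process belongs to CryoSPARC.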
We are also aware that CentOS 7 and kernel 3.10 are known to be implicated in a number of other CryoSPARC issues, and we are working to get Ubuntu 22.04 LTS running so we can check whether these problems are tied to the OS. In the meantime, any help or theories would be greatly appreciated.