GPU-Specific "Token is Invalid" Error, Single WS Setup

jr10 · April 27, 2022, 7:23pm

Hello,

This issue may be linked to other, unresolved issues that have been a problem for almost a year now. Our Linux machine (CentOS 7, kernel 3.10) runs a 10980XE processor on an ASUS X299 SAGE 10GbE board with 128 GB RAM and two RTX 3090 cards. We previously had trouble getting both 3090s to run at the same time: jobs would crash mysteriously and untraceably after a few minutes of runtime.

A more recent issue, following the v3.3.2 update, produces a new error: “Token is invalid, another CS instance is running with the same license ID”. This is the only workstation we have, and the license has never been used on another machine. We saw no such error before the v3.3.2 update, and none after it until we tested 2 GPU jobs again, when we returned to troubleshooting the original issue.

Currently, we can start 1-2 jobs (canceling the first one before starting the second) on GPU-0 before the license error appears. If we try to start a job on GPU-1, the error is instant. If we try to use both GPU-0 and GPU-1 (either in different jobs or in the same job), the error is instant. Restarting CryoSPARC (cryosparcm stop) doesn’t help. A full restart of the system puts us back at square one. So far I’ve made sure all CryoSPARC-related processes are dead before restarting, and I can’t find any orphaned processes that could be causing this.
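For anyone wanting to repeat the orphaned-process check, here is a minimal sketch of one way to do it on Linux. It simply scans /proc for command lines containing "cryosparc"; the function names and the search string are my own choices for illustration, not anything provided by CryoSPARC, and a match only means the string appears in the command line.

```python
import os


def matching_processes(processes, needle="cryosparc"):
    """Return the (pid, cmdline) pairs whose command line contains
    `needle`, compared case-insensitively."""
    needle = needle.lower()
    return [(pid, cmd) for pid, cmd in processes if needle in cmd.lower()]


def read_proc_cmdlines(proc_root="/proc"):
    """Collect (pid, cmdline) for every readable process under proc_root.

    Linux only. Returns an empty list if proc_root does not exist.
    """
    processes = []
    if not os.path.isdir(proc_root):
        return processes
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable by this user
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if cmd:  # kernel threads have an empty cmdline
            processes.append((int(entry), cmd))
    return processes


if __name__ == "__main__":
    own_pid = os.getpid()
    for pid, cmd in matching_processes(read_proc_cmdlines()):
        if pid != own_pid:  # skip this script itself
            print(pid, cmd)
```

Running it as the same user that owns the CryoSPARC installation (or as root) gives the most complete view, since command lines of other users' processes may not be readable.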

We are also aware that CentOS 7 and kernel 3.10 are known to be implicated in a number of other CryoSPARC issues, and we are working to get Ubuntu 22.04 LTS running to check whether these issues are tied to the OS. However, any help or theories on this would be greatly appreciated.

Best,
Justas

wtempel · May 3, 2022, 2:00pm

@jr10 Has the “Token is Invalid” error occurred again after the switch to Ubuntu 22.04? If not, have any workloads that would have triggered the error on CentOS 7 been handled successfully on Ubuntu 22.04?

jr10 · May 3, 2022, 5:08pm

Hi @wtempel,

Thankfully, the swap to Ubuntu 22.04 LTS seems to have fixed everything. The system can now run 2 GPU jobs without crashing, and we have seen no “Token is invalid” errors. Setup on 22.04 LTS was nearly a non-issue, with only one significant glitch (gcc/g++ was installed at version 11.2, when it must be <10 to work with CUDA 11.2). We are dealing with a residual issue that causes the cache drive to overheat, but that is simply a hardware matter: the placement of the M.2 drive.
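In case it helps others hitting the same compiler glitch, here is a rough sketch of a pre-install check on the gcc major version. The "<10" limit is just what we found to work with CUDA 11.2 and is passed in as a parameter rather than taken from NVIDIA's documentation; the function names are made up for this example. It relies only on `gcc -dumpversion`, which prints the compiler version.

```python
import re
import shutil
import subprocess


def parse_major(version_text):
    """Extract the major version from text such as '11.2.0' or '9'.
    Returns None if no number is found."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_text)
    return int(match.group(1)) if match else None


def compiler_ok(version_text, max_major_exclusive=10):
    """True if the major version is below the limit.

    The default of 10 mirrors the '<10' constraint mentioned in this
    post; treat it as an assumption and adjust for your CUDA version.
    """
    major = parse_major(version_text)
    return major is not None and major < max_major_exclusive


def installed_gcc_version(binary="gcc"):
    """Return the output of `<binary> -dumpversion`, or None if the
    compiler is not on PATH or prints nothing."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif compiler_ok(version):
        print(f"gcc {version} is below the assumed limit")
    else:
        print(f"gcc {version} is too new for the assumed limit; "
              "install an older gcc/g++ alongside it")
```

The same check can be pointed at g++ by passing `binary="g++"`.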

Best,
Justas

wtempel · May 3, 2022, 5:23pm

Thank you for the update @jr10, and for sharing early experiences regarding Ubuntu 22.04.