Just as another data point, I get the same error (Tensorflow not detecting GPUs) on my systems when running cryosparcw test, but GPU-dependent jobs seem to launch and run just fine…
Looking at the joblog from your attachment, this is the error:
2022-10-04 17:33:48.550500: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/nvme/cryoSPARC/cryosparc_worker/cryosparc_compute/blobio:/mnt/nvme/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/mnt/nvme/cryoSPARC/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda-11.6/lib64:/mnt/nvme/cryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/mnt/nvme/cryoSPARC/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
I checked the lib64 directory. It doesn’t contain libcusolver.so.10, but it does have these two: libcusolver.so.11 and libcusolver.so. Shall I create a hard link as advised, like this:
cd /usr/local/cuda-11.6/lib64
sudo ln libcusolver.so.11 libcusolver.so.10
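Before creating the link, it may help to confirm which libcusolver versions are actually present in each directory on LD_LIBRARY_PATH. The following is only a minimal sketch: the helper name find_library_versions is made up for illustration and is not part of CryoSPARC or TensorFlow.

```python
import glob
import os


def find_library_versions(ld_library_path, stem="libcusolver.so"):
    """Map each existing directory on the search path to the matching
    library file names it contains (hypothetical helper, for illustration)."""
    found = {}
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        pattern = os.path.join(directory, stem + "*")
        matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
        if matches:
            found[directory] = matches
    return found


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, names in find_library_versions(search_path).items():
        print(directory, names)
```

If no directory lists libcusolver.so.10, that matches the dlerror in the job log, and the link is what makes the requested name resolvable.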
We have 4 x RTX 2080 Ti for computing and 1 x GeForce GT 1030 for running the display in one of our workstations.
At first, the GPU test failed with 0 Tensorflow GPUs detected. After creating a link for libcusolver.so.10 in lib64 as mentioned above, the four 2080s are at least being detected by Tensorflow.
However, it seems that the Tensorflow GPU test fails unless every GPU in the system is detected by Tensorflow:
Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/home/user/software/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/instance_testing/run.py", line 161, in run_gpu_job
    assert devs == total_gpus, f"Tensorflow detected {devs} of {total_gpus} GPUs."
AssertionError: Tensorflow detected 4 of 5 GPUs.
FWIW, cryoSPARC lists only the 4 computing GPUs in the processing lane. It’d be great to be able to run the test just with the Tensorflow GPUs or skip the Tensorflow test for GPU(s) without Tensorflow capabilities.
Thank you very much for reporting this issue.
In the upcoming release of CryoSPARC, v4.0.1, testing Tensorflow will be optional through a command line flag: --test-tensorflow. In addition, Tensorflow capabilities will only be “checked” on GPUs that have been registered with CryoSPARC. This will ensure the test doesn’t fail if Tensorflow fails to start on a display GPU, for example.
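To illustrate the difference between the two checks, here is a rough sketch. It is not CryoSPARC's actual implementation: the function name and the GPU ids are assumptions chosen to resemble the workstation described above (four registered compute GPUs plus one unregistered display GPU).

```python
def missing_registered_gpus(tf_detected_ids, registered_ids):
    """Return the registered GPU ids that Tensorflow did not detect.

    Hypothetical helper: GPUs that are not registered (e.g. a display
    card) are ignored, so they cannot cause a failure.
    """
    return sorted(set(registered_ids) - set(tf_detected_ids))


# Assumed ids: GPUs 0-3 are the registered RTX 2080 Ti cards,
# GPU 4 is the unregistered display card that Tensorflow skips.
registered = [0, 1, 2, 3]
detected = [0, 1, 2, 3]
total_gpus = 5

# Check against every GPU in the system: fails (4 of 5 detected).
print(len(detected) == total_gpus)  # False

# Check against registered GPUs only: nothing is missing.
print(missing_registered_gpus(detected, registered))  # []
```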