Homogeneous refinement (new) fails with compile error in 2.13: error: identifier "__shfl_down_sync" is undefined

lawsond · February 13, 2020, 3:09pm

Hi All,

I have just upgraded from 2.11 to 2.13 and wanted to try out the new homogeneous refinement job. However, it fails with two different datasets - error messages below. It suggests some problem with compilation. The legacy version of homogeneous refinement is working, as is 2D classification and ab initio reconstruction. I haven’t tested anything else yet. I have tried restarting cryosparc but that makes no difference.
Thanks for any help
Dave Lawson

stephan · February 13, 2020, 3:11pm

Hi @lawsond,

What GPUs are you using? What CUDA version and NVIDIA Driver version are you running?

lawsond · February 13, 2020, 3:18pm

cat /usr/local/cuda/version.txt gives: CUDA Version 8.0.61

nvidia-smi gives: Driver Version: 390.48

Thanks

Dave

stephan · February 13, 2020, 3:20pm

Hey @lawsond,

Are you able to install CUDA 10.2 (latest)? This error occurs because the CUDA Toolkit you’re using does not provide a function (__shfl_down_sync) that the kernel in the new job relies on.
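
For context, the `*_sync` warp shuffle intrinsics, including `__shfl_down_sync`, first appeared in CUDA 9.0, so an 8.0 toolkit cannot compile a kernel that calls them. The following is a minimal sketch of checking a toolkit version string against that threshold. The helper names `cuda_major` and `supports_shfl_sync` are hypothetical, and the `version.txt` path is the one quoted earlier in this thread; newer toolkits may not ship that file.

```shell
#!/bin/sh
# Hypothetical helpers; not part of cryoSPARC or the CUDA Toolkit.

# Extract the major version from a line such as "CUDA Version 8.0.61".
cuda_major() {
    printf '%s\n' "$1" | sed -n 's/^CUDA Version \([0-9][0-9]*\)\..*$/\1/p'
}

# Succeed if the major version is at least 9, where the *_sync shuffles first appeared.
supports_shfl_sync() {
    major=$(cuda_major "$1")
    [ -n "$major" ] && [ "$major" -ge 9 ]
}

# Usage: read the version file if it exists (path taken from this thread).
version_file=/usr/local/cuda/version.txt
if [ -r "$version_file" ]; then
    line=$(head -n 1 "$version_file")
    if supports_shfl_sync "$line"; then
        echo "$line: __shfl_down_sync should be available"
    else
        echo "$line: too old for __shfl_down_sync (needs CUDA 9.0 or later)"
    fi
fi
```

The driver version reported by nvidia-smi is a separate requirement: each toolkit release needs a sufficiently recent driver, which is why both were asked for above.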

lawsond · February 13, 2020, 3:22pm

and GPUs are GeForce GTX 1080s

Dave

lawsond · February 13, 2020, 3:26pm

Hi @stephan

I will get CUDA 10.2 installed and try again. Do I just rerun cryosparcm update?

Thanks

Dave

stephan · February 13, 2020, 3:32pm

Hi @lawsond,

Once you’ve installed CUDA 10.2 and its required NVIDIA Driver, all you have to do is recompile one of the dependencies that cryoSPARC uses by running one function. This does not require updating or restarting cryoSPARC. Follow the instructions in this post:

lawsond · February 13, 2020, 3:34pm

Hi @stephan

Great - thanks very much for the advice.

Dave

lawsond · February 19, 2020, 11:18am

Hi @stephan

Following your advice, we have now installed CUDA 10.2 and updated the NVIDIA driver.

I then ran:
cryosparc2_worker/bin/cryosparcw newcuda /usr/local/cuda-10.2

…which completed with the following lines:
Installing collected packages: pycuda
Running setup.py install for pycuda ... done
Successfully installed pycuda-2019.1
You are using pip version 9.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Finished!

I have not upgraded pip - should I?

However, I was unable to connect to the cryoSPARC server through my browser, so I ran “cryosparcm restart” and got the following:
CryoSPARC is running.
Stopping cryosparc.
command_core: stopped
database: stopped
Shut down
Starting cryoSPARC System master process…
CryoSPARC is not already running.
database: started
command_core: started
/home/cryosparc_user/software/cryosparc/cryosparc2_master/bin/cryosparcm: line 424: curl: command not found
/home/cryosparc_user/software/cryosparc/cryosparc2_master/bin/cryosparcm: line 424: curl: command not found
etc…

Lines 424-426 look like this:
while ! curl http://$CRYOSPARC_MASTER_HOSTNAME:$CRYOSPARC_COMMAND_CORE_PORT -m1 -o/dev/null -s; do
    sleep 0.5
done

…which unfortunately means nothing to me.

Any ideas?

Thanks again

Dave

stephan · February 19, 2020, 7:27pm

Hi @lawsond,

Can you report your OS? Can you also try installing curl via yum or apt (whichever package manager your OS uses)?
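
To explain the quoted lines: the loop polls the command_core HTTP port with curl every half second until it answers, so a missing curl binary makes every iteration fail with "command not found" and the loop never ends. The sketch below shows the same idea with a check for curl up front and a bounded number of attempts. `require_cmd` and `wait_for_url` are hypothetical names, not part of cryosparcm.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the wait loop; not cryosparcm's actual code.

# Fail with a clear message if a required program is not on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing required command: $1" >&2
        return 1
    }
}

# Poll a URL until it responds or the attempt limit is reached.
# $1 = URL, $2 = maximum number of attempts (default 20).
wait_for_url() {
    url=$1
    attempts=${2:-20}
    require_cmd curl || return 1
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -m1: give up after 1 second; -s: silent; -o /dev/null: discard the body
        if curl "$url" -m1 -o /dev/null -s; then
            return 0
        fi
        i=$((i + 1))
        sleep 0.5
    done
    return 1
}
```

A call such as `wait_for_url "http://$CRYOSPARC_MASTER_HOSTNAME:$CRYOSPARC_COMMAND_CORE_PORT" 60` would then give up after roughly 30 seconds of refused connections instead of looping forever.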

lawsond · February 20, 2020, 8:41am

Hi @stephan

I am using Ubuntu 16.04.4 LTS
I have just installed curl using apt
I can now “cryosparcm restart” without any error messages

However, I now get the following error when I run jobs. So far I have seen this for 2D classification and for both the legacy and new homogeneous refinement.

…

…and several messages have popped up about database migration

Thanks again

Dave

lawsond · February 26, 2020, 8:53am

Hi All,

I am still stuck at this stage, so if anyone has any insight, that would be much appreciated.

Thanks

Dave

apunjani · February 26, 2020, 7:41pm

Hi @lawsond,

The libcurand.so.8 error means that cryoSPARC is still looking for CUDA 8.0 libraries for some reason. The simplest way to fix this without digging too far into the guts of the system is to reinstall only the cryosparc2_worker package (i.e. not cryosparc2_master, which should be kept exactly as is). You can do this by renaming your current cryosparc2_worker to cryosparc2_worker_old or similar, and then following the worker installation process again, this time specifying /usr/local/cuda-10.2 as your CUDA path right from the start.

The worker installation process is here:

If you install the worker to the same location as it was previously, you will not have to change any configuration and jobs should start to run.
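
The rename step can be sketched as a small helper that refuses to clobber an earlier backup. `backup_dir` is a hypothetical name, and the reinstall itself should follow the official worker instructions rather than anything shown here.

```shell
#!/bin/sh
# Hypothetical helper; the reinstall itself follows the official worker instructions.

# Rename a directory to "<name>_old", refusing to overwrite an existing backup.
# Prints the backup path on success.
backup_dir() {
    dir=${1%/}
    backup="${dir}_old"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && echo "$backup"
}
```

For example, `backup_dir /path/to/cryosparc2_worker` (path is a placeholder) leaves the original location free, so the new worker can be installed there with /usr/local/cuda-10.2 as its CUDA path.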

lawsond · February 27, 2020, 8:26am

Hi @apunjani
Thanks for the advice. When I initially installed cryoSPARC, I think I just did the quick installation for a single workstation rather than separate installations for master and worker. To avoid messing things up, please could you clarify my best course of action: should I repeat the quick installation?
Thanks very much
Dave

stephan · March 4, 2020, 7:11pm

Hey @lawsond,

The standalone installation script installs the master and worker together, so it’s essentially the same as doing the master and worker installs separately. You can still do what @apunjani said in this case!

lawsond · March 5, 2020, 10:37am

Hi @stephan
Thanks for confirming. That nearly worked, but I ended up with the worker and master running different versions.
I fixed this using cryosparcm update.
Thanks again to both @stephan and @apunjani for the help!
Dave
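
A mismatch like this can be spotted before running jobs by comparing the two installs directly. The sketch below assumes each install directory contains a plain-text file named `version`; that layout is an assumption, not something confirmed in this thread, and `same_version` is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: each install directory holds a plain-text file named "version".

# Compare the version files of two install directories.
# Prints both versions; returns 0 if they match, 1 if they differ,
# and 2 if either file cannot be read.
same_version() {
    master_v=$(cat "$1/version" 2>/dev/null) || return 2
    worker_v=$(cat "$2/version" 2>/dev/null) || return 2
    echo "master: $master_v"
    echo "worker: $worker_v"
    [ "$master_v" = "$worker_v" ]
}
```

For example, `same_version /path/to/cryosparc2_master /path/to/cryosparc2_worker || echo "versions differ or could not be read"` (paths are placeholders).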

stephan · March 5, 2020, 3:34pm

Oops! Sorry, I forgot about that. Glad you got it working!