FSC error during refinement on cluster

david.haselbach · December 3, 2018, 1:43pm

Hi,

When I run my refinements on our cluster, they frequently stop with the following error:

cryosparc2_compute/sigproc.py:765: RuntimeWarning: invalid value encountered in divide
fsc_true = (fsc_t - fsc_n) / (1.0 - fsc_n)
0.143 at 78.593 radwn. 0.5 at 56.354 radwn. Took 93.385s.

But when I run the same job on a workstation, it runs through without problems. What could be the issue?

Best,

David

stephan · December 3, 2018, 3:14pm

Hey @david.haselbach,

Is there a dependency mismatch on the cluster node versus the workstation (i.e. CUDA version)?

david.haselbach · December 3, 2018, 4:33pm

I am not sure. Can I find this out somehow?

stephan · December 3, 2018, 4:48pm

Hey @david.haselbach,

Log onto the workstation and (assuming it is UNIX based) run the commands:

eval $(cryosparcm env) #loads cryoSPARC environment variables into your path
nvcc --version

You should get something like:

```
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
```

EDIT: alternatively, `cat /usr/local/cuda/version.txt`
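If you end up collecting this output from several machines, a small script can do the comparison for you. The following is only a sketch: the function names `parse_cuda_release` and `group_hosts_by_release` and the host labels are made up for illustration, and the cluster line is a hypothetical, abbreviated example rather than real output. It assumes the `release X.Y` wording shown in the sample output.

```python
import re

# Sample `nvcc --version` output (workstation).
WORKSTATION_OUTPUT = """Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61"""

# Hypothetical, abbreviated line standing in for a cluster node's output.
# Only the release number matters for this sketch.
CLUSTER_OUTPUT = "Cuda compilation tools, release 9.1"


def parse_cuda_release(nvcc_output):
    """Return the release string (e.g. '8.0') found in nvcc output, or None."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", nvcc_output)
    return match.group(1) if match else None


def group_hosts_by_release(outputs_by_host):
    """Map each parsed release to the sorted list of hosts reporting it."""
    groups = {}
    for host, text in outputs_by_host.items():
        groups.setdefault(parse_cuda_release(text), []).append(host)
    return {release: sorted(hosts) for release, hosts in groups.items()}


if __name__ == "__main__":
    groups = group_hosts_by_release(
        {"workstation": WORKSTATION_OUTPUT, "cluster-node": CLUSTER_OUTPUT}
    )
    for release, hosts in groups.items():
        print(release, hosts)
    if len(groups) > 1:
        print("CUDA release mismatch between hosts")
```

In practice you would paste in (or capture) the real output from each machine instead of the placeholder strings.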

david.haselbach · December 4, 2018, 6:57am

Yes, there is a mismatch. It's 8.0 on the workstation and 9.1 on the cluster.

stephan · December 4, 2018, 3:12pm

Hey @david.haselbach,

A mismatch in CUDA versions should not be a problem unless the cryosparc2_worker directory was installed on a volume shared by the workstation and the cluster nodes. To quote the cluster installation documentation:

Installation of the worker is done only once, and the same worker installation is used by any cluster nodes that run jobs. Thus, all cluster nodes must have the same CUDA version, CUDA path and SSD path (if any).

The cluster worker installation needs to be run on a node that either is a cluster worker, or has the same configuration as cluster workers, to ensure that CUDA compilation will be successful at install time.

I believe the easiest way to resolve this problem is to log onto each of the cluster nodes, download and install CUDA 8.0, and ensure it is symlinked to /usr/local/cuda (or to whatever path CRYOSPARC_CUDA_PATH is set to in the file cryosparc2_worker/config.sh).
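To double-check which CUDA installation a node would actually pick up, you could resolve the configured path on that node. This is a rough sketch, not part of cryoSPARC: the helper names `read_cuda_path` and `resolve_cuda_install` are hypothetical, and it assumes config.sh sets the variable with a line of the form `export CRYOSPARC_CUDA_PATH="/usr/local/cuda"`.

```python
import os
import re

# Assumed line format in config.sh: export CRYOSPARC_CUDA_PATH="/some/path"
_CUDA_PATH_PATTERN = re.compile(
    r"""^\s*(?:export\s+)?CRYOSPARC_CUDA_PATH=["']?([^"'\s]+)""",
    re.MULTILINE,
)


def read_cuda_path(config_text):
    """Return the CRYOSPARC_CUDA_PATH value found in config.sh text, or None."""
    match = _CUDA_PATH_PATTERN.search(config_text)
    return match.group(1) if match else None


def resolve_cuda_install(config_text):
    """Follow symlinks from the configured CUDA path to the real directory."""
    path = read_cuda_path(config_text)
    if path is None:
        return None
    return os.path.realpath(path)


if __name__ == "__main__":
    # Example config text; on a real node, read cryosparc2_worker/config.sh instead.
    example = 'export CRYOSPARC_CUDA_PATH="/usr/local/cuda"\n'
    print("configured:", read_cuda_path(example))
    print("resolves to:", resolve_cuda_install(example))
```

If the resolved directory differs between nodes, or points at a CUDA release other than the one the worker was installed against, that node is a likely candidate for the failure.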

Hope this helps.

david.haselbach · December 4, 2018, 5:43pm

But if this is not the problem, what could it be?