Local Decomposition Issue in Non-Uniform Refinement

I was trying to run non-uniform refinement with dynamic masking (I didn’t input my own mask). The initial iteration worked fine, but I ran into an issue at the end of iteration 002 in the local decomposition:

```
Cross-validation…

Using Filter Radius 50.803 (5.442A) | Previous: 40.714 (6.791A)

Local decomposition…

Processing 1 of 4…

Processing 2 of 4…

Processing 3 of 4…

Processing 4 of 4…
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/nonuniform_refine/run.py", line 703, in cryosparc2_compute.jobs.nonuniform_refine.run.run_non_uni_refine
  File "cryosparc2_worker/cryosparc2_compute/jobs/local_filter/run.py", line 292, in cryosparc2_compute.jobs.local_filter.run.standalone_locfilter
  File "cryosparc2_worker/cryosparc2_compute/jobs/local_filter/run.py", line 335, in cryosparc2_compute.jobs.local_filter.run.standalone_locfilter
  File "/home/groups/kobilka/programs/CryoSparc/2.3/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 253, in fft
    return _fft(x_gpu, y_gpu, plan, cufft.CUFFT_FORWARD)
  File "/home/groups/kobilka/programs/CryoSparc/2.3/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 198, in _fft
    int(y_gpu.gpudata))
  File "/home/groups/kobilka/programs/CryoSparc/2.3/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 285, in cufftExecR2C
    cufftCheckStatus(status)
  File "/home/groups/kobilka/programs/CryoSparc/2.3/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cufft.py", line 110, in cufftCheckStatus
    raise cufftExceptions[status]
cufftExecFailed
```

I am running the latest version (v2.4.0) on Titan Xp GPUs. CryoSPARC is installed on a cluster running CentOS 7.

Output of cryosparcm status:

```
CryoSPARC System master node installed at
/home/groups/kobilka/programs/CryoSparc/2.3/cryosparc2_master
Current cryoSPARC version: v2.4.0

cryosparcm process status:

command_core    RUNNING   pid 18026, uptime 1:15:41
command_proxy   RUNNING   pid 18130, uptime 1:15:25
command_vis     RUNNING   pid 18122, uptime 1:15:27
database        RUNNING   pid 17943, uptime 1:15:47
watchdog_dev    STOPPED   Not started
webapp          RUNNING   pid 18136, uptime 1:15:22
webapp_dev      STOPPED   Not started

global config variables:

export CRYOSPARC_LICENSE_ID="*" (Edited out)
export CRYOSPARC_MASTER_HOSTNAME="" (Edited out)
export CRYOSPARC_DB_PATH="/home/groups/kobilka/programs/CryoSparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39008
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
```

The contents of the worker config file:

```
export CRYOSPARC_LICENSE_ID="******" (Edited out)
export CRYOSPARC_USE_GPU=true
export CRYOSPARC_CUDA_PATH="/share/software/user/open/cuda/8.0.61" (CUDA version 8.0.61)
export CRYOSPARC_DEVELOP=false
```

Best,
Antoine


The error is very reproducible, always at local decomposition 4 of 4.
Increasing the amount of RAM doesn’t help, nor does decreasing the micro batch size to 1000. Switching to CPU execution raises a different error in the local decomposition routine.
I’m using a box size of 256, which is the full size of the extracted particles. When I try a smaller box, say 240, it still hits the same cufftExecFailed error.
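For what it’s worth, cuFFT is generally happiest with transform sizes that factor entirely into small primes (2, 3, 5, 7). Both 256 (2^8) and 240 (2^4 · 3 · 5) are "smooth" in that sense, which is consistent with the smaller box making no difference; the failure is unlikely to be a pathological FFT size. A quick standalone check (generic Python sketch, not part of CryoSPARC):

```python
def is_cufft_friendly(n):
    """Return True if n factors entirely into 2, 3, 5 and 7,
    the radices cuFFT handles with its optimized code paths."""
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1

# Both box sizes tried in this thread are smooth:
print(is_cufft_friendly(256))  # -> True (2**8)
print(is_cufft_friendly(240))  # -> True (2**4 * 3 * 5)
```

So the box size itself looks fine, which points more toward the CUDA/driver stack than the FFT plan dimensions.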

Update:

Upgrading to v2.4.4 and switching to CUDA 9.1.85 fixed my issue!