2D classification fails (Warnings: cryosparc_compute.gpu.gpuarray.GPUArray)

Hi everyone,

I’m trying to run a 2D classification job on ~1.5M of 400-px particles and it fails at random points (no heartbeat). I use the most recent v4.5.1 but had similar issues on 4.4. Increasing the number of classes to 200 and the Batchsize per class to 200 causes the job to fail earlier. Nevertheless, it always fails at online-EM iterations, never getting to full iterations.
I checked the job logfile and it is full of the following warnings:

cryosparc_worker/cryosparc_compute/jobs/runcommon.py:2294: UserWarning: Kernel function check_error_for_nans called with very small array <cryosparc_compute.gpu.gpuarray.GPUArray

Could this be related?

I’m running a single workstation with an RTX 4090 (24 GB), 512 GB of RAM and 64-thread CPU

Here is the output ov the evaluation script:

CRYOSPARC_PATH=/home/milosz/software/cryosparc/cryosparc_worker/bin
WINDOWPATH=2
PYTHONPATH=/home/milosz/software/cryosparc/cryosparc_worker
NUMBA_CUDA_INCLUDE_PATH=/home/milosz/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include
LIBTBX_OPATH=
LD_LIBRARY_PATH=
PATH=/home/milosz/software/cryosparc/cryosparc_worker/bin:/home/milosz/software/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/milosz/software/cryosparc/cryosparc_worker/deps/anaconda/condabin:/home/milosz/software/miniconda3/condabin:/home/milosz/software/cryosparc/cryosparc_master/bin:/home/milosz/software/amber/amber22/bin:/home/milosz/software/ccp4-8.0/etc:/home/milosz/software/ccp4-8.0/bin:/home/milosz/software/ccpem-1.6.0/bin:/home/milosz/software/phenix/phenix-1.21rc1-5058/build/bin:/home/milosz/software/MGLTools-1.5.7/bin:/home/milosz/software/vina:/home/milosz/software/xds:/home/milosz/.local/bin:/home/milosz/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/thinlinc/bin:/opt/thinlinc/sbin
	libpcsamplingutil.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libpcsamplingutil.so
	libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc-builtins.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.8
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
	libnvperf_target.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_target.so
	libnvperf_host.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_host.so
	libnvjpeg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so.11
	libnvjpeg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so
	libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so.11
	libnvblas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so
	libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so.1
	libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so
	libnpps.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so.11
	libnpps.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so
	libnppitc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so.11
	libnppitc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so
	libnppisu.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so.11
	libnppisu.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so
	libnppist.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so.11
	libnppist.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so
	libnppim.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so.11
	libnppim.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so
	libnppig.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.11
	libnppig.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so
	libnppif.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.11
	libnppif.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so
	libnppidei.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.11
	libnppidei.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so
	libnppicc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.11
	libnppicc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so
	libnppial.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so.11
	libnppial.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so
	libnppc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.11
	libnppc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so
	libicudata.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.70
	libicudata.so.70 (ELF) => /lib/i386-linux-gnu/libicudata.so.70
	libcusparse.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.11
	libcusparse.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so
	libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so.11
	libcusolverMg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so
	libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
	libcusolver.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so
	libcurand.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10
	libcurand.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so
	libcupti.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.11.8
	libcupti.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so
	libcuinj64.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so.11.8
	libcuinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so
	libcufile_rdma.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so.1
	libcufile_rdma.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so
	libcufile.so.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so.0
	libcufile.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so
	libcufftw.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.10
	libcufftw.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so
	libcufft.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
	libcufft.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so
	libcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
	libcudadebugger.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
	libcuda.so.1 (libc6) => /lib/i386-linux-gnu/libcuda.so.1
	libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
	libcuda.so (libc6) => /lib/i386-linux-gnu/libcuda.so
	libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11
	libcublasLt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so
	libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11
	libcublas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so
	libcheckpoint.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcheckpoint.so
	libaccinj64.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so.11.8
	libaccinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so
	libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1
	libOpenCL.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so
Linux proline 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May  7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             503           7          13           0         483         492
Swap:              0           0           0
Sat May 25 10:23:49 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   49C    P8              28W / 450W |    776MiB / 24564MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4007      G   /usr/lib/xorg/Xorg                          430MiB |
|    0   N/A  N/A      4165      G   /usr/bin/gnome-shell                         65MiB |
|    0   N/A  N/A      4270      G   /opt/teamviewer/tv_bin/TeamViewer            19MiB |
|    0   N/A  N/A     34321      G   ...irefox/4259/usr/lib/firefox/firefox      236MiB |
+---------------------------------------------------------------------------------------+

Many thanks in advance for any help.
Milosz

Dear all,

For future reference, I managed to fix the issue by removing all paths for Amber in bashrc. Apparently it was messing with some of the libraries CS was supposed to take from its own directories. The ```
cryosparc_worker/cryosparc_compute/jobs/runcommon.py:2294: UserWarning: Kernel function check_error_for_nans called with very small array <cryosparc_compute.gpu.gpuarray.GPUArray


Good luck,
M.
2 Likes