Local motion correction job crash

We are getting the following error in a local motion correction job at movie 2946 of 3111. Movies up to this one process fine, and there have been no other issues with other jobs using this set of particles and movies. The movies were previously patch motion corrected by CryoSPARC Live, and the particles have been through global and local CTF refinement and 3D refinement.

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_local.py", line 394, in cryosparc_compute.jobs.motioncorrection.run_local.run_local_motion_correction_multi
  File "/home/bio21em2/Software/Cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 219, in append_many datasets = tuple(d for d in datasets if len(d) > 0)  # skip empty datasets
  File "/home/bio21em2/Software/Cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 219, in <genexpr> datasets = tuple(d for d in datasets if len(d) > 0)  # skip empty datasets
TypeError: object of type 'NoneType' has no len()
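
For reference, the TypeError itself is straightforward to reproduce in plain Python, independent of CryoSPARC: the generator expression on line 219 calls len() on every element it filters, so a single None in the sequence is enough.

# Minimal reproduction of the error in the traceback above.
datasets = [None]
tuple(d for d in datasets if len(d) > 0)
# TypeError: object of type 'NoneType' has no len()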

CryoSPARC instance information

Master-worker, v4.2.1+230427
Linux Freyja 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:            187          18           8           0         160         166
Swap:             1           1           0

CryoSPARC worker environment

(base) bio21em2@Freyja:~/Software/Cryosparc$ env | grep PATH
CRYOSPARC_PATH=/home/bio21em2/Software/Cryosparc/cryosparc_worker/bin
MANPATH=/usr/local/IMOD/man:/usr/local/Particle/man:/usr/local/IMOD/man:/usr/local/Particle/man:/usr/local/Particle/man:/usr/local/IMOD/man:/usr/local/man:/usr/local/share/man:/usr/share/man
PYTHONPATH=/home/bio21em2/Software/Cryosparc/cryosparc_worker
CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.2
LIBTBX_OPATH=
LD_LIBRARY_PATH=/home/bio21em2/Software/Cryosparc/cryosparc_worker/deps/external/cudnn/lib:/public/em/RELION/relion/lib
PATH=/home/bio21em2/Software/Cryosparc/cryosparc_worker/bin:/home/bio21em2/Software/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/bio21em2/Software/Cryosparc/cryosparc_worker/deps/anaconda/condabin:/home/bio21em1/Scripts:/home/bio21em2/Software/Cryosparc/cryosparc_master/bin:/home/bio21em1/.aspera/connect/bin:/home/bio21em1/Software/mediaflux-data-mover/bin:/usr/local/IMOD/bin:/home/bio21em1/anaconda3/bin:/home/bio21em1/anaconda3/condabin:/home/bio21em1/Software/relion/build/bin:/usr/local/ccpem-1.5.0/bin:/usr/local/phenix-1.19.2-4158/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
(base) bio21em2@Freyja:~/Software/Cryosparc$ which nvcc
/home/bio21em2/Software/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/nvcc
(base) bio21em2@Freyja:~/Software/Cryosparc$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
(base) bio21em2@Freyja:~/Software/Cryosparc$  python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 7, 0)
(base) bio21em2@Freyja:~/Software/Cryosparc$ /sbin/ldconfig -p | grep -i cuda
	libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc-builtins.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.2
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
	libnvperf_target.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_target.so
	libnvperf_host.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_host.so
	libnvjpeg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so.11
	libnvjpeg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so
	libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so.11
	libnvblas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so
	libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so.1
	libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so
	libnpps.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so.11
	libnpps.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so
	libnppitc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so.11
	libnppitc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so
	libnppisu.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so.11
	libnppisu.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so
	libnppist.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so.11
	libnppist.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so
	libnppim.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so.11
	libnppim.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so
	libnppig.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.11
	libnppig.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so
	libnppif.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.11
	libnppif.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so
	libnppidei.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.11
	libnppidei.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so
	libnppicc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.11
	libnppicc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so
	libnppial.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so.11
	libnppial.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so
	libnppc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.11
	libnppc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so
	libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
	libcusparse.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.11
	libcusparse.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so
	libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so.11
	libcusolverMg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so
	libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
	libcusolver.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so
	libcurand.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10
	libcurand.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so
	libcupti.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.11.2
	libcupti.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so
	libcuinj64.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so.11.2
	libcuinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so
	libcufftw.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.10
	libcufftw.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so
	libcufft.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
	libcufft.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so
	libcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
	libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
	libcuda.so.1 (libc6) => /lib/i386-linux-gnu/libcuda.so.1
	libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
	libcuda.so (libc6) => /lib/i386-linux-gnu/libcuda.so
	libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11
	libcublasLt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so
	libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11
	libcublas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so
	libaccinj64.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so.11.2
	libaccinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so
	libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1
	libOpenCL.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so
(base) bio21em2@Freyja:~/Software/Cryosparc$ uname -a
Linux Freyja 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(base) bio21em2@Freyja:~/Software/Cryosparc$ free -g 
              total        used        free      shared  buff/cache   available
Mem:            187          20           6           0         161         165
Swap:             1           1           0
(base) bio21em2@Freyja:~/Software/Cryosparc$ nvidia-smi
Thu Jun  1 10:01:13 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:18:00.0 Off |                  N/A |
| 55%   61C    P2   222W / 350W |  10221MiB / 24268MiB |     81%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    On   | 00000000:3B:00.0 Off |                  N/A |
| 30%   38C    P8    20W / 350W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 3090    On   | 00000000:86:00.0 Off |                  N/A |
| 30%   33C    P8    22W / 350W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 3090    On   | 00000000:AF:00.0 Off |                  N/A |
| 30%   31C    P8    15W / 350W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2381      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A   2744298      C   python                          10213MiB |
|    1   N/A  N/A      2381      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      2381      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      2381      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

I’d suggest checking that micrograph for broken frames. I’ve seen the same thing once before with a dataset I downloaded from EMPIAR where the network hiccuped (at least, redownloading it a few days later it was fine): CryoSPARC Live just soldiered on, but RELION, regular CryoSPARC motion correction, and cisTEM would all complain and crash.

Adding to Hamish’s comments: it seems to crash only with certain box sizes. Box sizes of 416 and 480 work; 512 and 600 do not. I haven’t tried more box sizes yet.

Larger box sizes failing sounds like a memory issue.

How many particles are on that micrograph compared to others? I’m not sure how CryoSPARC local motion correction handles it, but Bayesian polishing in RELION uses memory based on micrograph dimensions, frame count, box size, and number of particles. All else being equal, a micrograph with double the particles uses approximately double the RAM. This caused a polishing run of mine to crash recently when it hit three micrographs simultaneously that had high particle counts, each wanting ~50% of system RAM rather than the 25-30% earlier micrographs needed.
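
By way of a back-of-envelope sketch (purely illustrative numbers and scaling, not RELION’s actual memory model), treating per-particle frame stacks as the dominant cost:

# Hypothetical estimator: RAM grows linearly with particle and frame count,
# and quadratically with box size. Not RELION's actual allocator.
def polish_ram_estimate_gb(n_particles, box_px, n_frames, bytes_per_px=4):
    per_particle_bytes = box_px * box_px * n_frames * bytes_per_px
    return n_particles * per_particle_bytes / 1024**3

# Doubling the particle count doubles the estimate:
print(polish_ram_estimate_gb(500, 512, 40))   # ~19.5 GB
print(polish_ram_estimate_gb(1000, 512, 40))  # ~39.1 GB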

There are only 300,000 particles in total, so not massive. Plus, this is just the motion correction.

@HamishGBrown Is there any additional, relevant information in the job log (Metadata|Log) of the job that gave the object of type 'NoneType' has no len() error?

Hi @wtempel, unfortunately no. I checked the job log, but it gave exactly the same error message as the web app, so I didn’t post it.

One of the CryoSPARC team will correct me if I’m wrong, but local motion correction (unlike patch motion correction) operates on a per-particle basis, which is why I was curious about the number of particles on that specific micrograph. If it fails with larger box sizes but not smaller ones, it certainly sounds like it’s running out of memory. If that micrograph has more particles on it than the others, that would support the hypothesis.

I don’t know which specific micrograph it fails on, since the job only prints this information once it completes a micrograph. What I’ll try is a Curate Exposures job to remove micrographs in the top quartile (or maybe top 5%) of particle counts; if the job then runs successfully with the same box size, we might be onto the reason it has been crashing.

Thanks for reporting this. I suspect that the issue here is actually a bug… If particles are too close to the edge of a micrograph (such that their box would extend past the edge), they get discarded. If this causes the number of particles on a micrograph to reach zero, that micrograph should simply be skipped, but this report suggests that what actually happens is this error.
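
A minimal sketch of the suspected failure mode (the function and field names here are illustrative, not CryoSPARC’s actual code; only the len(d) > 0 filter is taken from the traceback above):

# Illustrative only: process_micrograph stands in for whatever per-micrograph
# routine produces a particle dataset for later aggregation.
def process_micrograph(particles, box_size, mic_height, mic_width):
    half = box_size // 2
    # Discard particles whose box would extend past the micrograph edge.
    kept = [p for p in particles
            if half <= p["x"] <= mic_width - half
            and half <= p["y"] <= mic_height - half]
    if not kept:
        return None  # suspected bug: None instead of an empty dataset
    return kept

# Aggregation mirroring dataset.py line 219: len(None) raises the TypeError.
def append_many(datasets):
    return tuple(d for d in datasets if len(d) > 0)

With a larger box size, half grows, more particles fail the edge test, and a micrograph is more likely to end up with zero survivors, which would match the observation that 512 and 600 crash while 416 and 480 do not.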

As a workaround, you might be able to run an Extract From Micrographs job (at your intended box size) and then connect the output particles to local motion correction. This should work because Extract From Micrographs also discards particles that are too close to the edge. Make sure you use the same box size that you intend to use for local motion, since the box size may determine whether a particle intersects the edge.

Thanks @hsnyder, this makes sense: the machine has GPUs and RAM that I thought were well and truly up to the task, yet it failed on seemingly random micrographs and only at larger box sizes (i.e. an increased chance of a collision with the micrograph boundary). I’m trying your workaround now.

Hi @hsnyder, just following up on this one after doing a bit more digging. It seems that movies with 0 particles are causing this issue; the particle input to this job comes after a few rounds of 2D and 3D classification, so it’s not surprising that some bad movies contribute 0 particles to the final rounds. Is the 0-particle case handled gracefully? I ask because I’ve been rerunning smaller jobs with just the movies that seem to cause the failure, and I can’t reproduce the error in the smaller jobs. For the time being we will use a Curate Exposures job to remove movies with no particles.
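
For anyone wanting to identify zero-particle movies programmatically before filtering with Curate Exposures, something like the following cryosparc-tools sketch may work; the project and job UIDs and output names are placeholders, and the location/micrograph_uid field is assumed from typical CryoSPARC particle datasets.

from cryosparc.tools import CryoSPARC

# Connection details are placeholders for a real instance.
cs = CryoSPARC(license="xxxx-xxxx", host="localhost", base_port=39000,
               email="user@example.com", password="password")
project = cs.find_project("P1")                               # hypothetical UID
particles = project.find_job("J42").load_output("particles")  # output name assumed
movies = project.find_job("J10").load_output("exposures")     # output name assumed

# Movies whose UID never appears among the particles' source micrographs
# contributed zero particles to the final stack.
populated = set(particles["location/micrograph_uid"])
empty = [uid for uid in movies["uid"] if uid not in populated]
print(f"{len(empty)} of {len(movies)} movies contribute no particles")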

Hi Hamish,

Very interesting, thank you for mentioning this. Just to be absolutely clear: movies that have zero particles in them, regardless of proximity to the edge, cause the problem? That definitely sounds like a bug. I’m glad you’ve found a workaround for now, but I will look into this and provide an update soon.

–Harris

Hi @hsnyder, to be honest I can’t reliably reproduce the error: the movies that cause the issue in one job do not cause the same error when isolated using a Curate Exposures job, but I noticed that a run of consecutive 0-particle movies seemed to trigger the error.

Hi Hamish,

Sorry for the delay! That’s very odd… you may have already tried this, but a ‘curate exposures’ job would allow you to filter out movies with zero picks in them, which might serve as a workaround, or at least tell us whether that really is the problem or not. We will also investigate on our end.

– Harris