Error in NU-Refinement using a large box size

kacper · August 20, 2019, 7:16am

Hello cryoSPARC Developers!

I have been using your non-uniform refinement for a couple of weeks now, and I have to say that it’s genius. Great work, guys! I’m a big fan.

It has been working very well for me until I re-extracted the particles unbinned, which bumped my box size to 704 px. See the error traceback below. I saw this error discussed on the board before, but my case seems slightly different: mine appears to be caused by the large box size, and whereas the people who reported it saw the crash right away in the first iteration, mine only happens later on (iteration 2), during “Local decomposition”.

I made sure that the pixel and box sizes are identical for the starting volume and the particle images. Any ideas on how to get around this problem would be greatly appreciated!

Many thanks,

Kacper

Job history

Launching job on lane default target vr2 ...
License is valid.
Running job on master node hostname vr2
Project P2 Job J226 Started
Master running v2.9.0, worker running v2.9.0
Running on lane default
Resources allocated:
Worker:  vr2
CPU   :  [0, 1, 2, 3]
GPU   :  [0]
RAM   :  [0, 1, 2]
SSD   :  True

--------------------------------------------------------------
Importing job module for job type nonuniform_refine...
Job ready to run
***************************************************************
Using random seed of 1159693947
Loading a ParticleStack with 112037 items...
SSD cache : cache successfuly synced in_use
SSD cache : cache successfuly synced, found 264774.94MB of files on SSD.
SSD cache : cache successfuly requested to check 1 files.
SSD cache : cache requires 0.00MB more on the SSD for files to be downloaded.
SSD cache : cache has enough available space.
SSD cache : cache starting transfers to SSD.
SSD cache : complete, all requested files are available on SSD.
Done.
Windowing particles
Done.
====== Refinement ======
Refining Structure with volume size 704.
Starting at initial resolution 30.000A (radwn 12.423). 
Estimating scale of initial reference. 
Rescaling initial reference by a factor of 1.428 
Estimating scale of initial reference. 
Rescaling initial reference by a factor of 1.001 
Estimating scale of initial reference. 
Rescaling initial reference by a factor of 1.006

-- Iteration 0
Auto batchsize 6727 (each split)
Using Max Alignment Radius 12.423 (30.000A)
-- DEV 0 THR 1 NUM 3292 TOTAL 22.768670 ELAPSED 70.177330 --
Processed 13454.000 images in 87.192s.
Computing Global FSCs... 
Done in 782.429s
Using Filter Radius 46.104 (8.084A) | Previous: 12.423 (30.000A)
Plotting..
    Done in 139.693s.
  Outputting files..
    Done in 111.746s.
Done iteration 0 in 1541.422s. Total time so far 1541.422s

-- Iteration 1
  Auto batchsize 25340 (each split)
  Using Max Alignment Radius 46.104 (8.084A)
  Using dynamic mask.
-- DEV 0 THR 1 NUM 12745 TOTAL 86.434904 ELAPSED 196.30526 --
  Processed 50680.000 images in 214.668s.
  Computing Global FSCs... 
    Done in 735.841s
  Using Filter Radius 84.896 (4.390A) | Previous: 46.104 (8.084A)
  Estimated Bfactor: -77.5
 Plotting..
  Done in 158.568s.
  Outputting files..
    Done in 104.307s.
Done iteration 1 in 1940.119s. Total time so far 3482.297s

-- Iteration 2
  Auto batchsize 46777 (each split)
  Using Max Alignment Radius 84.896 (4.390A)
  Using dynamic mask.
 Start local processing...
Expand dynamic mask A and B by 6 voxels
-- DEV 0 THR 1 NUM 11710 TOTAL 66.167929 ELAPSED 695.97869 --
 Newly generated random seed: 1081662500
  Processed 93554.000 images in 713.996s.
  Computing Global FSCs... 
    Done in 796.311s
Cross-validation...
  Using Filter Radius 97.978 (3.804A) | Previous: 84.896 (4.390A)
Local decomposition...
  Processing 1 of 4...

Traceback

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/run.c:3954)
  File "cryosparc2_worker/cryosparc2_compute/jobs/nonuniform_refine/run.py", line 624, in cryosparc2_compute.jobs.nonuniform_refine.run.run_non_uni_refine (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/jobs/nonuniform_refine/run.c:18872)
  File "cryosparc2_worker/cryosparc2_compute/jobs/local_resolution/run.py", line 999, in cryosparc2_compute.jobs.local_resolution.run.standalone_locres (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/jobs/local_resolution/run.c:21997)
ValueError: could not broadcast input array from shape (88,88,88) into shape (40,40,40)
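The last line of the traceback is a NumPy error: it is raised when an array of one shape is assigned into an array (or slice) whose shape it cannot be broadcast to. The sketch below reproduces the same kind of message with plain NumPy. The helper `try_assign` is hypothetical and only illustrative; it is not cryoSPARC's local-resolution code, whose internals are not visible in this thread.

```python
from typing import Optional

import numpy as np


def try_assign(dest_shape, src_shape) -> Optional[str]:
    """Assign an array of src_shape into the whole of an array of dest_shape.

    Returns NumPy's error message if the shapes are not broadcast-compatible,
    or None if the assignment succeeds. (Hypothetical illustration helper.)
    """
    dest = np.zeros(dest_shape, dtype=np.float32)
    src = np.ones(src_shape, dtype=np.float32)
    try:
        # Slice assignment broadcasts src to dest's shape: each trailing
        # dimension must either match or be 1, otherwise NumPy raises.
        dest[...] = src
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Shapes taken from the traceback: an 88^3 block into a 40^3 slot fails.
    print(try_assign((40, 40, 40), (88, 88, 88)))
    # A (1, 1, 40) block broadcasts along the first two axes, so this works.
    print(try_assign((40, 40, 40), (1, 1, 40)))
```

In the traceback the failing assignment happens inside `standalone_locres` during “Local decomposition”, so it points to a window/patch size mismatch in that step rather than an out-of-memory condition; which arrays are involved cannot be told from the log alone.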

kacper · August 31, 2019, 2:46pm

Hi Guys,

I would appreciate any thoughts on how to deal with this. It is still unresolved!

Ali @apunjani, maybe you can help?

Cheers,

Kacper

apunjani · October 1, 2019, 3:46pm

Hi @kacper,
Sorry for the long delay in responding here. Unfortunately, we have not been able to reproduce this issue on our side. The memory errors that others have reported (as you mentioned) are different and a bit more obvious to diagnose. Have you already tried setting the “refinement box size” parameter in non-uniform refinement to something slightly smaller (say 640) instead of 704? This will cause the job to Fourier-crop (downsample) the images on the fly and perform the refinement in a slightly smaller box.
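As a rough illustration of what Fourier cropping does, the sketch below keeps only the central low-frequency block of a particle image's 2D Fourier transform, which shrinks the box and enlarges the pixel size by box/new_box. It is a minimal NumPy sketch assuming square, even-sized images; it is not cryoSPARC's implementation, and the function names are hypothetical. It also assumes the `radwn` values in the log are Fourier radii in pixels, so that resolution = box × pixel size / radwn; the log pairs (12.423 ↔ 30.000 Å, 46.104 ↔ 8.084 Å, 84.896 ↔ 4.390 Å) are all consistent with that and with a pixel size of about 0.529 Å.

```python
import numpy as np


def fourier_crop_2d(image: np.ndarray, new_box: int) -> np.ndarray:
    """Downsample a square image by keeping the central new_box x new_box
    block of its centred 2D Fourier transform (low frequencies only)."""
    box = image.shape[0]
    if image.shape != (box, box):
        raise ValueError("expected a square image")
    if box % 2 or new_box % 2 or not 0 < new_box <= box:
        raise ValueError("box sizes must be even and new_box <= box")
    # fftshift moves the zero frequency to the centre, so cropping the
    # centre keeps everything below the new (lower) Nyquist frequency.
    ft = np.fft.fftshift(np.fft.fft2(image))
    start = (box - new_box) // 2
    cropped = ft[start:start + new_box, start:start + new_box]
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    # ifft2 divides by new_box**2 rather than box**2; rescale so the mean
    # pixel value of the image is preserved.
    return out * (new_box / box) ** 2


def cropped_pixel_size(apix: float, box: int, new_box: int) -> float:
    """Pixel size (A) after Fourier-cropping from box to new_box."""
    return apix * box / new_box


def radius_to_resolution(radwn: float, box: int, apix: float) -> float:
    """Resolution (A) of a Fourier radius given in pixels (assumed meaning of radwn)."""
    return box * apix / radwn


if __name__ == "__main__":
    # Pixel size inferred from the log: radwn 12.423 <-> 30.000 A at box 704.
    apix = 30.000 * 12.423 / 704
    new_apix = cropped_pixel_size(apix, 704, 640)
    print(f"box 704: pixel ~{apix:.3f} A, Nyquist ~{2 * apix:.2f} A")
    print(f"box 640: pixel ~{new_apix:.3f} A, Nyquist ~{2 * new_apix:.2f} A")
    particle = np.random.default_rng(0).normal(size=(704, 704))
    print(fourier_crop_2d(particle, 640).shape)
```

With the numbers from the log, going from 704 to 640 changes the pixel size from about 0.529 Å to about 0.582 Å, so the Nyquist limit moves from about 1.06 Å to about 1.16 Å, still much finer than the 3.8 Å filter radius reached in iteration 2.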