Slowdown with heterogeneous refinement in 4.x


In the latest version, heterogeneous refinement frequently seems to run anomalously slowly, to the point where the whole system seems to choke up. This happens when running a single heterogeneous refinement job, without an excessive box size or an unusually large number of particles.

Has anyone else noticed this? It is slow to the point that 10 sub-iterations are taking 1 hr (!). We have seen this on two systems with different OS and GPU configurations, so it is not a system-specific issue.

Other job types run fine; it seems to be tied to heterogeneous refinement specifically.

The slowdown seems to be tied to the number of classes: a job with 8 classes takes 5-10 min per iteration, but only 10 s (!) per iteration if I reduce to 5 classes.


It may not be relevant at all, but the next time it happens, try clearing the page cache on the worker node (as root) to see if it helps, e.g.,

echo 3 > /proc/sys/vm/drop_caches
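For anyone who prefers to script this, here is a minimal Python sketch of the same operation. The helper name drop_caches and its path parameter are hypothetical (the parameter exists only so the write target can be swapped out for a dry run). It calls os.sync() first because the kernel only drops clean cache entries, so flushing dirty pages beforehand lets more be freed. Writing to the real /proc/sys/vm/drop_caches requires root.

```python
import os

# Standard Linux procfs location; overridable for dry runs.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(level=1, path=DROP_CACHES_PATH):
    """Flush dirty pages, then ask the kernel to drop clean caches.

    level 1: page cache only
    level 2: dentries and inodes
    level 3: both
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Dirty pages are never dropped, so write them to disk first.
    os.sync()
    with open(path, "w") as fh:
        fh.write(f"{level}\n")
    return level
```

Calling drop_caches(3) as root would be the equivalent of the echo line above; drop_caches(1) limits the drop to the page cache.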



@olibclarke There’s a chance this could be related to an issue in CUDA 11.8 that we worked around in v4.2.0 (just released), so it may be worth updating to try.

It’s unclear why this slowdown would be happening now and not in previous versions, but the dependence on the size of the job (number of classes) suggests that it is related to system RAM. As @olaf suggested, you can try running

echo 1 > /proc/sys/vm/drop_caches

(NB: 3 drops the page cache as well as dentries and inodes; 1 drops only the page cache, i.e. the filesystem cache, which should be all that’s needed.)
On some of our systems we run this echo line from cron every minute. What happens is that as system RAM fills up, the OS keeps using any free RAM for filesystem cache, and then takes a long time to evict the cache when a job requests more RAM.
You can watch whether system RAM is full of filesystem cache using e.g. htop.
Please let us know what you find.
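As an alternative to eyeballing htop, the cache share of RAM can be read from /proc/meminfo. The following is a rough sketch only: the function names parse_meminfo, cache_summary and read_summary are hypothetical, and "cache" is approximated here as the Cached plus Buffers fields, which is an assumption and not necessarily the same accounting htop uses.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Sizes in /proc/meminfo are reported in kB; counters such as
    HugePages_Total carry no unit and are kept as plain integers.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Summarise how much RAM is held by filesystem cache (assumed: Cached + Buffers)."""
    total = fields["MemTotal"]
    cached = fields.get("Cached", 0) + fields.get("Buffers", 0)
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    kb_per_gib = 2 ** 20
    return {
        "total_gib": total / kb_per_gib,
        "cache_gib": cached / kb_per_gib,
        "available_gib": available / kb_per_gib,
        "cache_fraction": cached / total,
    }


def read_summary(path="/proc/meminfo"):
    """Read and summarise the live meminfo file on a Linux host."""
    with open(path) as fh:
        return cache_summary(parse_meminfo(fh.read()))
```

Printing read_summary() before and after dropping caches, or logging it periodically while a heterogeneous refinement job runs, should show whether cache_fraction is close to 1 when the slowdown starts.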

Hi @apunjani

Thanks! I’ve tried this, but it doesn’t seem to help (and output of htop doesn’t seem to change before/after).

Here is the output of htop:

We will try to reproduce the performance problems you are experiencing with heterogeneous refinement. What were the box size, particle count and applied symmetry for the affected jobs?
For the two cryosparc_worker installations for which you have observed the problem (or only one installation if that is shared between the two GPU hosts), please can you post

  • the cryosparc version
  • outputs of
    cryosparcw call which nvcc
    cryosparcw call nvcc --version
    cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"

Please could you also email us the job reports for affected jobs.

We would also be interested in the file produced by
cryosparcm snaplogs, as we spotted unexpectedly heavy memory use by the command_core process

Did you observe this previously/regularly?

Hi @wtempel, for this particular affected job:

box size: 200 (but raw particles are 600px)
Particle count: 270k
Applied symmetry: C1
Batch size: 5000

But we have seen it in a variety of contexts since upgrading to 4.1.

The cryosparc version is 4.1.3-privatebeta.1 (but we saw the same behavior with earlier 4.1.x releases).

cryosparcw call which nvcc:

cryosparcw call nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 7, 0)

Will send job reports & snaplogs via DM. Yes, I have seen this heavy memory usage from command_core at previous times when cryosparc was running slowly.
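The nvcc and pycuda outputs above report different CUDA versions (release 11.2 versus (11, 7, 0)). A minimal sketch for comparing the two mechanically follows; the function names nvcc_release and toolkit_mismatch are hypothetical, and the sketch assumes the nvcc output contains a "release X.Y" string as in the output posted here. Note that pycuda.driver.get_version() reports the CUDA version pycuda was compiled against.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_mismatch(nvcc_output, pycuda_version):
    """Return True if nvcc and pycuda disagree on CUDA major.minor.

    pycuda_version is the 3-tuple returned by pycuda.driver.get_version().
    """
    return nvcc_release(nvcc_output) != tuple(pycuda_version[:2])
```

With the values from this post, toolkit_mismatch("Cuda compilation tools, release 11.2, V11.2.152", (11, 7, 0)) evaluates to True.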


The inconsistency between the CUDA versions (nvcc reports release 11.2, pycuda reports 11.7) is not expected. Question (not suggestion): Did you run
cryosparcw install-3dflex for this cryosparc_worker installation?
What are the outputs of

cryosparcw call conda list
cryosparcw call python -c "import torch; print(torch.cuda.is_available())"


And the output of this one is “True”

re install-3dflex I believe so but not sure… @kookjookeem?


Yes, 3dflex dependencies were installed via cryosparcw install-3dflex.


