Hi,
In the latest version, heterogeneous refinement frequently seems to run anomalously slowly, to the point where the whole system seems to choke up. This happens when running a single heterogeneous refinement job, without an excessive box size or an unusually large number of particles.
Has anyone else noticed this? It is slow to the point that 10 sub-iterations take 1 hr (!). We have seen this on two systems with different OS and GPU configurations, so it does not appear to be a system-specific issue.
Other job types run fine; it seems to be tied to heterogeneous refinement specifically.
The slowdown seems to be tied to the number of classes: a job with 8 classes takes 5-10 min per iteration, but only 10 s (!) per iteration if I reduce to 5 classes.
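To put the class-count dependence in numbers, here is a small sketch using only the timings quoted above. It assumes, purely for comparison, that per-iteration cost would otherwise grow linearly with the number of classes; that baseline is an assumption, not something CryoSPARC documents, and `slowdown_vs_linear` is a hypothetical helper name.

```python
# Timings quoted in the post (seconds per iteration).
t_5_classes = 10.0                 # 5 classes: about 10 s per iteration
t_8_classes = (5 * 60, 10 * 60)    # 8 classes: 5-10 min per iteration

# ASSUMPTION: cost grows linearly with class count, so 8 classes
# would be expected to cost 8/5 of the 5-class time.
expected_linear = t_5_classes * 8 / 5


def slowdown_vs_linear(observed_s, expected_s):
    """Return how many times slower the observed time is than expected."""
    return observed_s / expected_s


low = slowdown_vs_linear(t_8_classes[0], expected_linear)
high = slowdown_vs_linear(t_8_classes[1], expected_linear)

print(f"expected at 8 classes (linear baseline): {expected_linear:.0f} s")
print(f"observed: {low:.2f}x to {high:.2f}x slower than that baseline")
```

A jump this far beyond the linear baseline looks more like crossing a threshold than like ordinary scaling with the number of classes.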
Cheers
Oli
olaf
February 26, 2023, 10:18pm
#2
It may not be relevant at all, but the next time it happens, try clearing the page cache on the worker node (as root) to see if it helps, e.g.,
echo 3 > /proc/sys/vm/drop_caches
Cheers
@olibclarke There’s a chance this could be related to an issue in CUDA 11.8 that we worked around in v4.2.0 (just released); it may be worth updating to see whether that helps.
It’s unclear why this slowdown would be happening now and not in previous versions, but the dependence on the size of the job (number of classes) suggests that it is related to system RAM. As @olaf suggested, you can try running
echo 1 > /proc/sys/vm/drop_caches
(NB: writing 3 drops the page cache plus dentries and inodes; writing 1 drops only the page cache, i.e. the filesystem cache, which should be all that’s needed.)
On some of our systems we have this echo line in cron, running every minute. What happens is that as system RAM fills up, the OS keeps using any free RAM for filesystem cache, and then takes a long time to evict the cache when a job requests more RAM.
You can watch whether system RAM is full of filesystem cache using e.g. htop.
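As an alternative to eyeballing htop, the same information can be read from /proc/meminfo. The following is a minimal sketch, assuming a Linux host; the function names `parse_meminfo`, `cache_fraction` and `report` are hypothetical, and the `Cached` field is used here as a stand-in for "filesystem cache".

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_fraction(fields):
    """Fraction of total RAM currently reported as page cache ('Cached')."""
    return fields["Cached"] / fields["MemTotal"]


def report(path="/proc/meminfo"):
    """Print a one-line summary; do nothing if the file is not available."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    frac = cache_fraction(info)
    print(f"page cache: {frac:.1%} of RAM")
    return frac


report()
```

Running this before and after the echo line should show whether the cache is actually being released.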
Please let us know what you find
Hi @apunjani
Thanks! I’ve tried this, but it doesn’t seem to help (and output of htop doesn’t seem to change before/after).
Here is the output of htop:
We will try to reproduce the performance problems you experience with heterogeneous refinement. What were box size, particle count and applied symmetry for affected jobs?
For the two cryosparc_worker installations for which you have observed the problem (or only one installation if that is shared between the two GPU hosts), please can you post
Please could you also email us the job reports for affected jobs.
We would also be interested in the file produced by cryosparcm snaplogs, as we spotted unexpectedly heavy memory use by the command_core process.
Did you observe this previously/regularly?
Hi @wtempel , for this particular affected job:
box size: 200 (but raw particles are 600px)
Particle count: 270k
Applied symmetry: C1
Batch size: 5000
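For a rough sense of the data volumes these numbers imply, here is a back-of-the-envelope sketch. It assumes square images stored at 4 bytes per pixel (32-bit floats), which is not stated in the thread; `stack_bytes` is a hypothetical helper.

```python
def stack_bytes(n_particles, box_px, bytes_per_pixel=4):
    """Size in bytes of n square images of box_px x box_px pixels.

    ASSUMPTION: 4 bytes per pixel (float32); adjust if the data are
    stored differently.
    """
    return n_particles * box_px * box_px * bytes_per_pixel


n_particles = 270_000
raw_bytes = stack_bytes(n_particles, 600)          # raw 600 px particles
refine_bytes = stack_bytes(n_particles, 200)       # 200 px refinement box

print(f"raw particles (600 px):   {raw_bytes / 1e9:.1f} GB")
print(f"refinement box (200 px):  {refine_bytes / 1e9:.1f} GB")
```

Under these assumptions the raw stack comes to about 388.8 GB and the 200 px version to about 43.2 GB, which may be worth comparing against the RAM installed on the affected hosts.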
But we have seen it in a variety of contexts since upgrading to 4.1.
The cryosparc version is 4.1.3-privatebeta.1 (but we saw the same behavior with earlier 4.1x releases).
cryosparcw call which nvcc:
/usr/local/cuda-11.2/bin/nvcc
cryosparcw call nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 7, 0)
Will send job reports & snaplogs via DM. I have seen this heavy memory usage from command_core during previous times that cryosparc is running slowly, yes.
Cheers
Oli
olibclarke:
cryosparcw call nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 7, 0)
The inconsistency between the CUDA versions (nvcc reports 11.2, pycuda reports 11.7) is not expected. Question (not suggestion): Did you run cryosparcw install-3dflex for this cryosparc_worker installation?
What are the outputs of
cryosparcw call conda list
cryosparcw call python -c "import torch; print(torch.cuda.is_available())"
?
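The mismatch between the two reported CUDA versions can also be checked programmatically. This is only a sketch: it parses the text that nvcc --version printed in the post above and compares it with the tuple from pycuda.driver.get_version(), which reports the CUDA version PyCUDA was compiled against. The names `nvcc_release` and `versions_agree` are hypothetical, and the nvcc output is pasted in as a string rather than captured live.

```python
import re


def nvcc_release(nvcc_version_output):
    """Extract (major, minor) from the 'release X.Y' text of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def versions_agree(nvcc_version_output, pycuda_version):
    """True if nvcc and PyCUDA report the same CUDA major.minor version."""
    return nvcc_release(nvcc_version_output) == tuple(pycuda_version[:2])


# Values copied from the outputs posted in this thread.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
"""
PYCUDA_VERSION = (11, 7, 0)

print("nvcc release:", nvcc_release(NVCC_OUTPUT))
print("pycuda built against:", PYCUDA_VERSION[:2])
print("versions agree:", versions_agree(NVCC_OUTPUT, PYCUDA_VERSION))
```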
cryosparcw call conda list:
# packages in environment at /home/user/software/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 0.15.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.3 py38h0a891b7_1 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
aom 3.5.0 h27087fc_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
astor 0.8.1 pyh9f0ad1d_0 conda-forge
astunparse 1.6.3 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
attrs 22.1.0 pyh71513ae_1 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 pyhd8ed1ab_3 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
bcrypt 3.2.2 py38h0a891b7_1 conda-forge
blinker 1.5 pyhd8ed1ab_0 conda-forge
blosc 1.21.3 hafa529b_0 conda-forge
brotli 1.0.9 h166bdaf_8 conda-forge
brotli-bin 1.0.9 h166bdaf_8 conda-forge
brotlipy 0.7.0 py38h0a891b7_1005 conda-forge
brunsli 0.1 h9c3ff4c_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
c-blosc2 2.6.1 hf91038e_0 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
cachetools 5.2.0 pyhd8ed1ab_0 conda-forge
certifi 2022.12.7 py38h06a4308_0
cffi 1.15.1 py38h4a40e3a_2 conda-forge
cfitsio 4.1.0 hd9d235c_0 conda-forge
charls 2.3.4 h9c3ff4c_0 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cloudpickle 2.2.0 pyhd8ed1ab_0 conda-forge
cryptography 38.0.4 py38h2b5fc30_0 conda-forge
cuda-cccl 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-command-line-tools 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-compiler 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-cudart 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-cudart-dev 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-cuobjdump 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-cupti 11.7.101 0 nvidia/label/cuda-11.7.1
cuda-cuxxfilt 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-documentation 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-driver-dev 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-gdb 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-libraries 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-libraries-dev 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-memcheck 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nsight 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nsight-compute 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-nvcc 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-nvdisasm 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nvml-dev 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nvprof 11.7.101 0 nvidia/label/cuda-11.7.1
cuda-nvprune 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nvrtc 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-nvrtc-dev 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-nvtx 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-nvvp 11.7.101 0 nvidia/label/cuda-11.7.1
cuda-sanitizer-api 11.7.91 0 nvidia/label/cuda-11.7.1
cuda-toolkit 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-tools 11.7.1 0 nvidia/label/cuda-11.7.1
cuda-visual-tools 11.7.1 0 nvidia/label/cuda-11.7.1
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.0 py38h0a891b7_1 conda-forge
dask-core 2022.12.0 pyhd8ed1ab_0 conda-forge
dav1d 1.0.0 h166bdaf_1 conda-forge
decorator 4.4.2 py_0 conda-forge
fftw 3.3.10 nompi_hf0379b8_106 conda-forge
flask 1.1.4 pyhd8ed1ab_0 conda-forge
flask-jsonrpc 0.3.1 pypi_0 pypi
flask-pymongo 2.3.0 pypi_0 pypi
flatbuffers 1.12 pypi_0 pypi
fonttools 4.38.0 py38h0a891b7_1 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
frozenlist 1.3.3 py38h0a891b7_0 conda-forge
fsspec 2022.11.0 pyhd8ed1ab_0 conda-forge
future 0.18.2 pyhd8ed1ab_6 conda-forge
gast 0.3.3 py_0 conda-forge
gds-tools 1.3.1.18 0 nvidia/label/cuda-11.7.1
giflib 5.2.1 h36c2ea0_2 conda-forge
google-auth 2.15.0 pyh1a96a4e_0 conda-forge
google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
grpcio 1.32.0 py38heead2fc_0 conda-forge
h5py 2.10.0 nompi_py38h9915d05_106 conda-forge
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
imagecodecs 2022.8.8 py38hf09e3b1_5 conda-forge
imageio 2.22.4 pyhfa7a67d_1 conda-forge
importlib-metadata 5.1.0 pyha770c72_0 conda-forge
ipython 7.33.0 py38h578d9bd_0 conda-forge
itsdangerous 1.1.0 py_0 conda-forge
jedi 0.18.2 pyhd8ed1ab_0 conda-forge
jinja2 2.11.3 pyhd8ed1ab_2 conda-forge
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
jxrlib 1.1 h7f98852_2 conda-forge
keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py38h43d8883_1 conda-forge
krb5 1.20.1 hf9c8cef_0 conda-forge
lcms2 2.14 h6ed2654_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libaec 1.0.6 h9c3ff4c_0 conda-forge
libavif 0.10.1 h5cdd6b5_2 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libcublas 11.10.3.66 0 nvidia/label/cuda-11.7.1
libcublas-dev 11.10.3.66 0 nvidia/label/cuda-11.7.1
libcufft 10.7.2.91 0 nvidia/label/cuda-11.7.1
libcufft-dev 10.7.2.91 0 nvidia/label/cuda-11.7.1
libcufile 1.3.1.18 0 nvidia/label/cuda-11.7.1
libcufile-dev 1.3.1.18 0 nvidia/label/cuda-11.7.1
libcurand 10.2.10.91 0 nvidia/label/cuda-11.7.1
libcurand-dev 10.2.10.91 0 nvidia/label/cuda-11.7.1
libcurl 7.86.0 h6312ad2_2 conda-forge
libcusolver 11.4.0.1 0 nvidia/label/cuda-11.7.1
libcusolver-dev 11.4.0.1 0 nvidia/label/cuda-11.7.1
libcusparse 11.7.4.91 0 nvidia/label/cuda-11.7.1
libcusparse-dev 11.7.4.91 0 nvidia/label/cuda-11.7.1
libdeflate 1.14 h166bdaf_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libllvm10 10.0.1 he513fc3_3 conda-forge
libnghttp2 1.47.0 hdcd2b5c_1 conda-forge
libnpp 11.7.4.75 0 nvidia/label/cuda-11.7.1
libnpp-dev 11.7.4.75 0 nvidia/label/cuda-11.7.1
libnsl 2.0.0 h7f98852_0 conda-forge
libnvjpeg 11.8.0.2 0 nvidia/label/cuda-11.7.1
libnvjpeg-dev 11.8.0.2 0 nvidia/label/cuda-11.7.1
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.21.11 h3eb15da_0 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libtiff 4.4.0 h55922b4_4 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
libzopfli 1.0.3 h9c3ff4c_0 conda-forge
llvmlite 0.34.0 py38h4f45e52_2 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
mako 1.2.4 pyhd8ed1ab_0 conda-forge
markdown 3.4.1 pyhd8ed1ab_0 conda-forge
markupsafe 2.0.1 py38h497a2fe_1 conda-forge
matplotlib-base 3.5.3 py38h38b5ce0_2 conda-forge
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
multidict 6.0.2 py38h0a891b7_2 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
networkx 2.8.8 pyhd8ed1ab_0 conda-forge
nsight-compute 2022.2.1.3 0 nvidia/label/cuda-11.7.1
numba 0.51.2 py38hc5bc63f_0 conda-forge
numpy 1.19.5 py38h8246c76_3 conda-forge
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 1.1.1s h0b41bf4_1 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 22.0 pyhd8ed1ab_0 conda-forge
pandas 1.4.4 py38h47df419_0 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.3.0 pyhd8ed1ab_0 conda-forge
pbzip2 1.1.13 0 conda-forge
pexpect 4.8.0 pyh1a96a4e_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 9.2.0 py38h9eb91d8_3 conda-forge
pip 22.3.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.36 pyha770c72_0 conda-forge
protobuf 4.21.11 py38h8dc9893_0 conda-forge
psutil 5.9.4 py38h0a891b7_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pybind11 2.10.1 py38h43d8883_0 conda-forge
pybind11-global 2.10.1 py38h43d8883_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pycrypto 2.6.1 py38h497a2fe_1006 conda-forge
pycuda 2020.1 pypi_0 pypi
pyfftw 0.12.0 py38h9e8fb0f_3 conda-forge
pygments 2.13.0 pyhd8ed1ab_0 conda-forge
pyjwt 2.6.0 pyhd8ed1ab_0 conda-forge
pylibtiff 0.4.2 py38hd5759d1_7 conda-forge
pymongo 3.13.0 py38hfa26641_0 conda-forge
pyopenssl 22.1.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.8.15 h257c98d_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-slugify 5.0.2 pyhd8ed1ab_0 conda-forge
python-snappy 0.6.1 py38h1ddbb56_0 conda-forge
python_abi 3.8 3_cp38 conda-forge
pytools 2020.4.4 pyhd3deb0d_0 conda-forge
pytz 2022.6 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pywavelets 1.3.0 py38h71d37f0_1 conda-forge
pyyaml 6.0 py38h0a891b7_5 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
requests 2.28.1 pyhd8ed1ab_1 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
requests-toolbelt 0.10.1 pyhd8ed1ab_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
scikit-image 0.17.2 py38h51da96c_4 conda-forge
scikit-learn 0.23.2 py38h5d63f67_3 conda-forge
scipy 1.9.1 py38hea3f02b_0 conda-forge
semver 2.13.0 pyh9f0ad1d_0 conda-forge
setuptools 65.5.1 pyhd8ed1ab_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sleef 3.5.1 h9b69904_2 conda-forge
snappy 1.1.9 hbd366e4_2 conda-forge
tabulate 0.9.0 pyhd8ed1ab_1 conda-forge
tensorboard 2.8.0 pyhd8ed1ab_1 conda-forge
tensorboard-data-server 0.6.1 py38h2b5fc30_4 conda-forge
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
tensorflow 2.4.4 pypi_0 pypi
tensorflow-estimator 2.4.0 pyh9656e83_0 conda-forge
termcolor 1.1.0 pyhd8ed1ab_3 conda-forge
text-unidecode 1.3 py_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tifffile 2022.10.10 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
toolz 0.12.0 pyhd8ed1ab_0 conda-forge
torch 1.13.1 pypi_0 pypi
traitlets 5.7.1 pyhd8ed1ab_0 conda-forge
typing-extensions 3.7.4.3 0 conda-forge
typing_extensions 3.7.4.3 py_0 conda-forge
unicodedata2 15.0.0 py38h0a891b7_0 conda-forge
unidecode 1.3.6 pyhd8ed1ab_0 conda-forge
urllib3 1.26.13 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
werkzeug 1.0.1 pyh9f0ad1d_0 conda-forge
wheel 0.38.4 pyhd8ed1ab_0 conda-forge
wrapt 1.12.1 py38h497a2fe_3 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.8.1 py38h0a891b7_0 conda-forge
zfp 1.0.0 h27087fc_3 conda-forge
zipp 3.11.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zlib-ng 2.0.6 h166bdaf_0 conda-forge
zstd 1.5.2 h6239696_4 conda-forge
And the output of the torch.cuda.is_available() check is “True”.
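Since the listing is long, a small filter helps to pick out the CUDA-related entries. This is a sketch, assuming the whitespace-separated column layout shown above (name, version, build, channel); `cuda_packages` is a hypothetical helper and the sample rows are copied from the listing.

```python
def cuda_packages(conda_list_text):
    """Return (name, version, channel) for CUDA-related rows of `conda list`.

    A row is kept if its name contains 'cuda' or its channel starts
    with 'nvidia'. Comment lines (starting with '#') are skipped.
    """
    rows = []
    for line in conda_list_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        cols = line.split()
        if len(cols) < 2:
            continue
        name, version = cols[0], cols[1]
        channel = cols[3] if len(cols) > 3 else ""
        if "cuda" in name.lower() or channel.startswith("nvidia"):
            rows.append((name, version, channel))
    return rows


# A few rows copied from the listing above.
SAMPLE = """# Name Version Build Channel
cuda-nvcc 11.7.99 0 nvidia/label/cuda-11.7.1
cuda-toolkit 11.7.1 0 nvidia/label/cuda-11.7.1
numpy 1.19.5 py38h8246c76_3 conda-forge
pycuda 2020.1 pypi_0 pypi
torch 1.13.1 pypi_0 pypi
"""

for name, version, channel in cuda_packages(SAMPLE):
    print(f"{name:<14} {version:<10} {channel}")
```

On the full listing this would surface the CUDA 11.7.1 packages from the nvidia channel alongside pycuda, which matches the (11, 7, 0) reported by pycuda earlier in the thread.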
Re install-3dflex: I believe so, but I’m not sure… @kookjookeem?
Hi,
Yes, 3dflex dependencies were installed via cryosparcw install-3dflex.
Best,
Kookjoo