CryoSPARC crashing - .sock file

Hi,
I’ve recently been having more frequent problems with CryoSPARC crashing, and the problem seems to be related to the .sock file.

It’s similar to a problem others have had, and in the past the solutions from those discussions (posted below) have fixed it. Usually deleting the .sock file works and CryoSPARC then runs fine. The issue now is that the fix only lasts 1-2 hours before the same thing happens again.

Any ideas or tips on how to fix this?

It’s a single workstation, recently upgraded to CryoSPARC v4.1.2.

Thanks
Sebastian

Welcome to the forum @svalen.
Please can you paste your terminal output as text.
Before deleting the sock file, one should always confirm that no processes related to the given CryoSPARC instance are running, keeping in mind that a computer can run multiple CryoSPARC instances (if certain requirements are met). For this purpose, under the Linux account that runs the CryoSPARC processes, run
cryosparcm stop
ps xww | grep -e cryosparc -e mongo
and kill (not kill -9) processes related to the given CryoSPARC instance, but not processes related to other CryoSPARC instances that may also be running on the computer.
To confirm that no relevant processes remain, run again:
ps xww | grep -e cryosparc -e mongo
Only then should sock files belonging to the given CryoSPARC instance be removed.
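For example, on a single-instance workstation that might look like the following (the hash in the filename is instance-specific, so use the exact path from the error message rather than a wildcard if more than one instance could be present):
ls -l /tmp/cryosparc-supervisor-*.sock
rm /tmp/cryosparc-supervisor-<instance hash>.sock
cryosparcm start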
Does this help?

Hello, and thank you for helping.
I’ve posted the text below after following your instructions. I can’t see any processes to kill (right?), or am I doing something wrong?

(base) mflab@nextron-Super-Server:~$ cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC 
unix:///tmp/cryosparc-supervisor-0aa561e418771a6396296a50bedb6c18.sock refused connection
(base) mflab@nextron-Super-Server:~$ ps xww | grep -e cryosparc -e mongo
  13032 pts/0    S+     0:00 grep --color=auto -e cryosparc -e mongo
(base) mflab@nextron-Super-Server:~$ kill 13032
bash: kill: (13032) - No such process
(base) mflab@nextron-Super-Server:~$ 


Hi again,
I’ve noticed that it seems to happen when I’m running two NU-refinements at the same time. This has not been a problem before, and I usually run two jobs in parallel. At the moment, running jobs one at a time works but it’s of course slower and not ideal.

Please can you describe the specific symptoms of those crashes
and post these details about your CryoSPARC worker environment:
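That is, in a shell on the worker (replace /path/to/cryosparc_worker with the actual path of your worker installation):
eval $(/path/to/cryosparc_worker/bin/cryosparcw env)
env | grep PATH
which nvcc
nvcc --version
python -c "import pycuda.driver; print(pycuda.driver.get_version())"
uname -a
free -g
nvidia-smi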

Hi,

Yes, it’s like it disconnects and starts “buffering”, but cannot connect again. If I refresh the page it says “Unable to connect”.


I’ve pasted the output below. (Sorry, I’m a bit inexperienced, so I’m not sure I did this properly.)

(base) mflab@nextron-Super-Server:~$ eval $(/media/datastore/cryosparc/cryosparc_worker/bin/cryosparcw env)
env | grep PATH
which nvcc
nvcc --version
python -c "import pycuda.driver; print(pycuda.driver.get_version())"
uname -a
free -g
nvidia-smi
CRYOSPARC_PATH=/media/datastore/cryosparc/cryosparc_worker/bin
WINDOWPATH=2
PYTHONPATH=/media/datastore/cryosparc/cryosparc_worker
CRYOSPARC_CUDA_PATH=/usr/local/cuda
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/media/datastore/cryosparc/cryosparc_worker/deps/external/cudnn/lib
PATH=/usr/local/cuda/bin:/media/datastore/cryosparc/cryosparc_worker/bin:/media/datastore/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/media/datastore/cryosparc/cryosparc_worker/deps/anaconda/condabin:/media/datastore/cryosparc/cryosparc_master/bin:/home/mflab/miniconda3/bin:/home/mflab/miniconda3/condabin:/home/mflab/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
(11, 7, 0)
Linux nextron-Super-Server 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             125          11          91           0          21         111
Swap:              1           0           1
Fri Feb 10 09:26:42 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:01:00.0  On |                  Off |
| 30%   34C    P8    22W / 230W |    468MiB / 24564MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000    On   | 00000000:02:00.0 Off |                  Off |
| 30%   31C    P8    13W / 230W |      6MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2128      G   /usr/lib/xorg/Xorg                210MiB |
|    0   N/A  N/A      2261      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A     10360      G   ...7/usr/lib/firefox/firefox      151MiB |
|    0   N/A  N/A     13376      G   ...mviewer/tv_bin/TeamViewer       23MiB |
|    1   N/A  N/A      2128      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
(base) mflab@nextron-Super-Server:~$ 

I was able to run a Patch motion correction job with both GPUs without a problem, but if I run two different NU-refinement jobs simultaneously, it disconnects.

Thank you

For an NU-refinement job that completed when run on its own, but that you expect would have failed had it run concurrently with another job, what is the output of the following commands (run inside the icli, with your actual project and job identifiers):

# substitute your actual project and job UIDs
project, job = 'P147', 'J96'
# largest system-memory value (in MB) recorded in the job's event log
max([e.get('cpumem_mb', 0) for e in db.events.find({'project_uid':project, 'job_uid':job})])
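(The interactive CLI can be opened on the CryoSPARC master with
cryosparcm icli
and closed again with exit or Ctrl-D.)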

Sure, here’s the output:

(base) mflab@nextron-Super-Server:~$ cryosparcm icli
Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:35) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.33.0 -- An enhanced Interactive Python. Type '?' for help.

 connecting to nextron-Super-Server:39002 ...
 cli, rtp, db, gfs and tools ready to use

In [1]: project, job = 'P8', 'J101'
   ...: max([e.get('cpumem_mb', 0) for e in db.events.find({'project_uid':projec
   ...: t, 'job_uid':job})])
Out[1]: 42558.15625

Non-uniform refinement jobs use a lot of system RAM; the job above peaked at roughly 42 GB (42558 cpumem_mb). Two concurrent, memory-intensive jobs could cause available system RAM to be exhausted and overall system performance to deteriorate.
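If you would like to confirm that, you can watch system memory while two NU-refinements are running, for example:
free -g -s 10
If the “available” column approaches 0 while both jobs are active, the two jobs together are exhausting the workstation’s 125 GiB of RAM, and running them one at a time, as you are already doing, avoids the problem.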