Volume tools hangs

Just updated to v4.1.1 (complete nightmare). In the end I had to completely wipe and reinstall cryoSPARC, then manually create the user account because, despite all the information being in the install.sh file, no users were generated. Then I had to deal with command_core not launching, which was eventually fixed. I don’t know if these issues are related to the Volume Tools issue described below, but it wouldn’t surprise me.

When I try to run the Volume Tools utility, the job simply hangs upon queuing. The event log shows only the following:

License is valid.
Launching job on lane default target spgpu …
Running job on master node hostname spgpu

And then it sits doing nothing. The metadata shows more going on before it also hangs. These are the last several lines:

"type": "volume_tools",
"version": "v4.1.1",
"ui_layouts": {
    "P24": {
        "show": true,
        "floater": false,
        "top": 232,
        "left": 1788,
        "width": 152,
        "height": 192,
        "groups": []
    },
    "P24W3": {
        "show": true,
        "floater": false,
        "top": 232,
        "left": 1344,
        "width": 152,
        "height": 192,
        "groups": []
    }
},
"no_check_inputs_ready": false,
"queued_to_gpu": false,
"queued_to_hostname": null,
"num_tokens": 0
}

Earlier in the metadata I found this error message:
"queue_message": "[Errno 2] No such file or directory: '/home/spuser/software/cryosparc/cryosparc2_worker/bin/cryosparcw'",

But the job runs a bit further, so I assume it somehow works around the error of looking for the cryosparcw executable in the wrong (legacy) directory. Compared to running a job such as “Import Volume”, there is no “Job J### Started” line in the event log. In the metadata for “Import Volume” there is no call to cryosparcw.
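
A quick way to see which path the scheduler actually has on record for cryosparcw, assuming the master processes are running (the grep pattern below is only an illustration):

# print the registered worker targets and pull out the recorded cryosparcw path
cryosparcm cli "get_scheduler_targets()" | grep -o "'worker_bin_path': '[^']*'"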

Install is a single workstation
Version 4.1.1 (confirmed by cryosparcm status)
All appropriate processes are running

Template picker has the same problem. This seems to be something fundamental, which I have been unable to correct. I completely wiped and reinstalled cryoSPARC, including a reboot, and the issue remains.

Welcome to the forum @Endo-Streeter.

I agree that this looks like something fundamental. To help us find out what is wrong, could you please initially post the following (also collected as a copy-paste sketch after this list):

  • output of cryosparcm cli "get_scheduler_targets()"
  • the full path: /path/to/cryosparc_worker/bin/cryosparcw (of the new CryoSPARC installation)
  • output of ls -l /path/to/cryosparc_worker/
  • output of cat /path/to/cryosparc_worker/version
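
For convenience, these can be gathered in one go, assuming the new worker tree lives under /path/to/cryosparc_worker (substitute the actual directory):

cryosparcm cli "get_scheduler_targets()"
ls -l /path/to/cryosparc_worker/bin/cryosparcw   # full path of the new cryosparcw
ls -l /path/to/cryosparc_worker/
cat /path/to/cryosparc_worker/version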

cryosparcm cli "get_scheduler_targets()" results in:

[{'cache_path': '/ssd/cryosparc_scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11554324480, 'name': 'GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}], 'hostname': 'spgpu', 'lane': 'default', 'monitor_port': None, 'name': 'spgpu', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'spuser@spgpu', 'title': 'Worker node spgpu', 'type': 'node', 'worker_bin_path': '/home/spuser/software/cryosparc/cryosparc2_worker/bin/cryosparcw'}]

full path
/home/spuser/software/cryosparc/cryosparc_worker/bin/cryosparcw

[spuser@spgpu cryosparc]$ ls -l /home/spuser/software/cryosparc/cryosparc_worker/
total 36
drwxrwxr-x. 2 spuser spuser 61 Dec 20 10:58 bin
-rwxrwxr-x. 1 spuser spuser 5437 Dec 20 10:58 check_install_deps.sh
drwxrwxr-x. 11 spuser spuser 4096 Dec 20 10:58 cryosparc_compute
drwxrwxr-x. 4 spuser spuser 195 Dec 20 10:58 cryosparc_tools
drwxrwxr-x. 4 spuser spuser 36 Dec 20 10:58 deps_bundle
drwxrwxr-x. 4 spuser spuser 36 Dec 20 10:58 deps_bundle_hashes
-rw-rw-r--. 1 spuser spuser 7122 Dec 20 10:58 environment.yml
-rwxrwxr-x. 1 spuser spuser 9659 Dec 20 10:58 install.sh
-rw-rw-r--. 1 spuser spuser 7 Dec 20 10:58 version

[spuser@spgpu cryosparc]$ cat /home/spuser/software/cryosparc/cryosparc_worker/version
v4.1.1

I downgraded back to v4.0.2 and got the following message:

===================================================
Now updating worker nodes.

All workers:
spgpu spuser@spgpu

Updating worker spgpu
Direct update
\cp -f ./cryosparc_worker.tar.gz /home/spuser/software/cryosparc/cryosparc2_worker
bash /home/spuser/software/cryosparc/cryosparc2_worker/bin/cryosparcw update
bash: /home/spuser/software/cryosparc/cryosparc2_worker/bin/cryosparcw: No such file or directory
Failed to update spgpu! Skipping…


Done updating all worker nodes.
If any nodes failed to update, you can manually update them.
Cluster worker installations must be manually updated.

To update manually, copy the cryosparc_worker.tar.gz file into the
cryosparc worker installation directory, and then run
$ bin/cryosparcw update
from inside the worker installation directory.

The same message was repeated when I ran the upgrade back to v4.1.1.

Attempting to update manually failed with a file-not-found message. cryoSPARC keeps looking for the old cryosparc2 directories and files, which have been deprecated.
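
For reference, the manual update described in the updater output, but pointed at the new (non-legacy) worker directory, would look roughly like this; the source location of cryosparc_worker.tar.gz is whatever the master update left behind, so the path below is only a placeholder:

# copy the tarball into the NEW worker directory rather than cryosparc2_worker
cp -f /path/to/cryosparc_worker.tar.gz /home/spuser/software/cryosparc/cryosparc_worker/
cd /home/spuser/software/cryosparc/cryosparc_worker
bin/cryosparcw update

Even if that succeeds, it would presumably not correct the scheduler's stale cryosparc2_worker entry.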

[In reply to Volume tools hangs - #4 by Endo-Streeter, not taking the two intervening messages into account]
Thanks. I think there are two (related) problems.

  1. The database still refers to the cryosparc2_worker directory, as you suspected.
  2. /home/spuser/software/cryosparc/cryosparc_worker/install.sh should have been called in the background during a --standalone installation, but may have failed or was not run at all.

You may want to try:

cd /home/spuser/software/cryosparc/cryosparc_worker
./install.sh --license "your-license-id" --cudapath /path/to/cuda 2>&1 | tee worker_install.log
./bin/cryosparcw connect --worker spgpu --master spgpu --ssdpath /ssd/cryosparc_scratch --port 39000 --update 

after ensuring:

  1. CryoSPARC master processes are running for the cryosparcw connect command
  2. /path/to/cuda is the parent of a bin subdirectory and
    /path/to/cuda/bin/nvcc --version shows a version of at least 10.0 and at most 11.8
  3. the --port parameter in the cryosparcw connect command specifies the same port number as the URL of the CryoSPARC web interface
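
Once those commands have run, a rough sanity check (with /path/to/cuda substituted for the actual CUDA installation) could be:

/path/to/cuda/bin/nvcc --version   # should report a CUDA release between 10.0 and 11.8
cryosparcm cli "get_scheduler_targets()"   # worker_bin_path should now end in cryosparc_worker/bin/cryosparcw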

I will take a look at this next week; I am busy with experiments until then.

I successfully performed a clean install of v4.1.2 today, and so far it is working fine: no missing user account, no failure to find the database, no failure to find cryosparc_worker.
