CryoSPARC won't let me run any jobs after the update to v4.5.3

Hello world,

I updated our CryoSPARC to v4.5.3 with the 240807 patch applied. CryoSPARC would not run any jobs after that (new, old, cloned, etc.). I tried several things, including stopping CryoSPARC and then separately starting it again.

It always returns an error similar to the example below (from an ab initio job). Any suggestions?

Traceback (most recent call last):
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 568, in load
    dset = cls(indata)
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 769, in __init__
    self.add_fields([entry[0] for entry in populate])
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 1044, in add_fields
    self._data.addcol_array(name, TYPE_TO_DSET_MAP[dt.base.type], dt.shape)
  File "cryosparc/core.pyx", line 112, in cryosparc.core.Data.addcol_array
TypeError: addcol_array() takes exactly 5 positional arguments (3 given)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 44, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 678, in load_input_group
    dsets = [load_input_connection_slots(input_group_name, keep_slot_names, idx, allow_passthrough=allow_passthrough, memoize=memoize) for idx in range(num_connections)]
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 678, in <listcomp>
    dsets = [load_input_connection_slots(input_group_name, keep_slot_names, idx, allow_passthrough=allow_passthrough, memoize=memoize) for idx in range(num_connections)]
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 642, in load_input_connection_slots
    dsets = [load_input_connection_single_slot(input_group_name, slot_name, connection_idx, allow_passthrough=allow_passthrough, memoize=memoize) for slot_name in slot_names]
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 642, in <listcomp>
    dsets = [load_input_connection_single_slot(input_group_name, slot_name, connection_idx, allow_passthrough=allow_passthrough, memoize=memoize) for slot_name in slot_names]
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 634, in load_input_connection_single_slot
    d = load_output_result_dset(_project_uid, output_result, slotconnection['version'], slot_name, memoize=memoize)
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 589, in load_output_result_dset
    d = dataset.Dataset.load(abspath)
  File "/spshared/apps/cryosparc24/cryosparc_worker/cryosparc_tools/cryosparc/dataset.py", line 606, in load
    raise DatasetLoadError(f"Could not load dataset from file {file}") from err
cryosparc_tools.cryosparc.errors.DatasetLoadError: Could not load dataset from file /data/RR/RR-104-sdW/CS-rr/J34/J34_020_particles.cs

I also checked whether any orphaned instances might be causing the problem (output given below). I can't seem to kill the processes found via grep supervisord…

[spuser@spgpu run]$ ps -ax | grep "supervisord"
1775911 ? Ss 0:06 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /spshared/apps/cryosparc24/cryosparc_master/supervisord.conf
1818774 pts/1 S+ 0:00 grep --color=auto supervisord

@rakroy, please post the outputs of these commands:

ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongod
stat /spshared/apps/cryosparc24/cryosparc_master/patch
cat /spshared/apps/cryosparc24/cryosparc_master/patch
ls -l /tmp/cryosparc*sock

Hey @wtempel,

Thank you for helping me out. Here are the outputs:

[spuser@spgpu ~]$ ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongod
1775911 1021686   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /spshared/apps/cryosparc24/cryosparc_master/supervisord.conf
1776022 1775911   Aug 27 mongod --auth --dbpath /spshared/apps/cryosparc24/cryosparc_database --port 39001 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1776138 1775911   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776139 1776138   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776168 1775911   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39003 -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776201 1776168   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39003 -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776214 1775911   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776215 1776214   Aug 27 python /spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /spshared/apps/cryosparc24/cryosparc_master/gunicorn.conf.py
1776245 1775911   Aug 27 /spshared/apps/cryosparc24/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
1819677  610768 15:42:18 grep --color=auto -e cryosparc_ -e mongod

[spuser@spgpu ~]$ stat /spshared/apps/cryosparc24/cryosparc_master/patch
  File: /spshared/apps/cryosparc24/cryosparc_master/patch
  Size: 7         	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 844724934   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  spuser)   Gid: ( 1000/  spuser)
Context: unconfined_u:object_r:default_t:s0
Access: 2024-08-28 14:59:34.156900091 -0500
Modify: 2024-08-06 13:34:43.000000000 -0500
Change: 2024-08-26 14:44:08.088818487 -0500
 Birth: 2024-08-26 14:44:08.088818487 -0500

[spuser@spgpu ~]$ cat /spshared/apps/cryosparc24/cryosparc_master/patch
240807

[spuser@spgpu ~]$ ls -l /tmp/cryosparc*sock
srwx------. 1 spuser spuser 0 Aug 27 14:49 /tmp/cryosparc-supervisor-5d2f448f7681bf0b5e8e3ab5b60ac9ce.sock

I really appreciate the help.

I recently came across this error after upgrading a lab CryoSPARC instance from v4.4.1 to v4.5.3+240807. I had let the master update process take care of running the worker updates on the worker nodes. Something appears to have gone wrong with the worker updates, though: although each worker reported v4.5.3+240807 after the master update finished, any job run on the workers would fail with this same error:
TypeError: addcol_array() takes exactly 5 positional arguments (3 given)

To resolve the error, I had to download the v4.5.3 cryosparc_worker.tar.gz release, then do a force update plus patch on each of the worker nodes via cryosparcw update --override followed by cryosparcw patch, roughly as sketched below. After doing so, jobs once again ran without issue on the worker nodes.
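For anyone hitting the same thing, the procedure looked roughly like this (a sketch, not a definitive recipe: the download URL pattern and cryosparcw subcommands come from this thread and the guide, the install path is just an example, and on some setups the guide expects the tarball to sit inside cryosparc_worker/ before running the override):

# on each worker node, as the cryosparc user
LICENSE_ID="your-license-id"     # replace with the license ID issued to you
cd /path/to/cryosparc_worker     # example path; use your actual worker install
curl -L https://get.cryosparc.com/download/worker-v4.5.3/$LICENSE_ID -o cryosparc_worker.tar.gz
./bin/cryosparcw update --override   # force re-install of v4.5.3 from the tarball
./bin/cryosparcw patch               # apply the 240807 patch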

Hi @bmdennis,

Thank you for the suggestion. I tried the following: I downloaded the cryosparc_worker.tar.gz file from CryoSPARC with curl, but it wouldn't let me extract and update. I feel like it might be a double-instance issue. Is there any way to stop all active instances under this license?

Here is the output. Maybe once I can stop all instances and start a single one back up with the patch, it will work?
Please let me know if you have solutions for multiple instances and how to shut them all down. I checked that my license is correct.

[spuser@spgpu cryosparc24]$ tar -xvzf cryosparc_worker.tar.gz cryosparc_worker

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
[spuser@spgpu cryosparc24]$ cryosparcm status

CryoSPARC System master node installed at
/spshared/apps/cryosparc24/cryosparc_master
Current cryoSPARC version: v4.5.3+240807

CryoSPARC is not running.


global config variables:
export CRYOSPARC_LICENSE_ID=""
export CRYOSPARC_MASTER_HOSTNAME="spgpu"
export CRYOSPARC_DB_PATH="/spshared/apps/cryosparc24/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true

[spuser@spgpu cryosparc24]$ curl https://get.cryosparc.com/checklicenseexists/$LICENSE_ID
{"message":"Missing Authentication Token"}[spuser@spgpu cryosparc24]$ ls

@wtempel, do you have any suggestions? Any help is highly appreciated. Thank you.

Please carefully track where and when a CryoSPARC license ID issued to you is being used for a CryoSPARC instance. Each active CryoSPARC instance (equivalent to a cryosparc_master/ installation) requires a unique license ID. In case of a license ID conflict, please shut down the affected CryoSPARC instances individually.
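As a sketch of one way to confirm that nothing is left running after a shutdown (reusing the commands that appear elsewhere in this thread):

cryosparcm stop
ps -eo pid,ppid,command | grep -e cryosparc_ -e mongod -e supervisord | grep -v grep
# if any orphaned processes remain, terminate them by PID,
# then remove the stale supervisor socket before restarting:
rm -f /tmp/cryosparc-supervisor-*.sock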

What are the outputs of these commands:

file /spshared/apps/cryosparc24/cryosparc_worker.tar.gz
ls -l /spshared/apps/cryosparc24/cryosparc_worker.tar.gz

?
The second argument to the tar command, cryosparc_worker, may cause the tar command to fail. If there is not already a directory cryosparc_worker/, you may want to try instead

tar xvf cryosparc_worker.tar.gz

Hi @wtempel,

Here are the outputs you asked for.

[spuser@spgpu cryosparc24]$ file /spshared/apps/cryosparc24/cryosparc_worker.tar.gz
/spshared/apps/cryosparc24/cryosparc_worker.tar.gz: ASCII text, with no line terminators
[spuser@spgpu cryosparc24]$ ls -l /spshared/apps/cryosparc24/cryosparc_worker.tar.gz
-rw-rw-r--. 1 spuser spuser 42 Sep  9 16:39 /spshared/apps/cryosparc24/cryosparc_worker.tar.gz

Meanwhile, we have been very careful about running a single instance and where we use the license ID. Our computer restarted, and a new member started a separate CryoSPARC session on the same standalone workstation without our knowledge. I was hoping to find a way to stop all CryoSPARC processes on this standalone machine and then start up only a single instance, so that we don't get that missing-token error. Please advise.

Thank you.

It is possible that $LICENSE_ID was specified in the download command, but LICENSE_ID was not properly defined in the environment.
You may want to try the download again, this time ensuring that LICENSE_ID is defined:

LICENSE_ID="your-license-id"
curl -L https://get.cryosparc.com/download/worker-v4.5.3/$LICENSE_ID -o cryosparc_worker.tar.gz

I specified the version explicitly to match the version of your cryosparc_master/ installation. Replacing v4.5.3 with latest would download a newer release.
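Before unpacking, it is also worth a quick sanity check that the download is a real archive rather than an error message (a sketch, assuming the file is in the current directory):

ls -lh cryosparc_worker.tar.gz   # should be several gigabytes, not a few bytes
file cryosparc_worker.tar.gz     # should report "gzip compressed data", not "ASCII text"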

A comprehensive shutdown procedure is described in the guide.

Hey @wtempel,

Thank you for the advice, but here are the issues. I checked that the license ID and the environment are correctly defined. I downloaded the correct version, made sure I went through the comprehensive shutdown checklist (I just didn't do systemctl stop), and made sure there are no mongod or supervisord zombie processes running.

I still have the same issues:

  1. No jobs will run. They just sit idle in the launched status.
  2. Tar extraction fails with the exact same error as before (I tried various arguments to the tar command):

gzip: stdin: not in gzip format

tar: Child returned status 1
tar: Error is not recoverable: exiting now

Please let me know what you think might be going on. Thank you so much for the help.

Very likely, the download did not work correctly. The file cryosparc_worker.tar.gz for version v4.5.3 should be several gigabytes in size. What are the outputs of these commands:

file /spshared/apps/cryosparc24/cryosparc_worker.tar.gz
ls -l /spshared/apps/cryosparc24/cryosparc_worker.tar.gz
# Do not execute the following command if the file is larger than a megabyte
cat /spshared/apps/cryosparc24/cryosparc_worker.tar.gz

?

@wtempel

You are absolutely right; it was just a tiny file. (I did try to download it to a different location; it still doesn't download properly.)
Here are the outputs:
here are the outputs

$ ls -l cryosparc_worker.tar.gz

-rw-rw-r--. 1 spuser spuser 42 Sep 10 16:07 cryosparc_worker.tar.gz

$ cat cryosparc_worker.tar.gz
{"message":"Missing Authentication Token"}[spuser@spgpu cryosparc24]$

Please let me know how to proceed. Thank you

This message suggests that LICENSE_ID was not defined in the shell where you ran

curl -L https://get.cryosparc.com/download/worker-v4.5.3/$LICENSE_ID -o cryosparc_worker.tar.gz

You may want to retry running the curl command immediately after defining the LICENSE_ID variable:

LICENSE_ID="your-unique-license-id" # replace with actual license id issued to you
curl -L https://get.cryosparc.com/download/worker-v4.5.3/$LICENSE_ID -o cryosparc_worker.tar.gz

and confirming that the cryosparc_worker.tar.gz file has a size of several gigabytes before trying to unpack it.
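A small guard one could script around the unpack step (a sketch; the size threshold is an arbitrary assumption):

# refuse to unpack a file that is too small to be the real archive
if [ "$(stat -c %s cryosparc_worker.tar.gz)" -lt 1000000000 ]; then
  echo "archive suspiciously small; contents are probably an error message:"
  cat cryosparc_worker.tar.gz
else
  tar xf cryosparc_worker.tar.gz
fi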

@wtempel

Thank you for that help. Now it's unpacked. Please let me know how to proceed from here. If there is another thread that describes what to do next, please feel free to redirect me to it. Thank you again for all the help.

@wtempel I tried to reconnect everything and run CryoSPARC, but it still gets stuck, and I am unable to run the override or patch. If you have any other suggestions, that would be great. Thank you for understanding.

Does anyone have any ideas?

What is the output of the command

tail -n 40 /spshared/apps/cryosparc24/cryosparc_master/run/update.log

?

Hey @wtempel,

Thank you again for your help with everything. Here is the output you asked for.

[spuser@spgpu cryosparc24]$ tail -n 40 /spshared/apps/cryosparc24
tail: error reading '/spshared/apps/cryosparc24': Is a directory
[spuser@spgpu cryosparc24]$ tail -n 40 /spshared/apps/cryosparc24/cryosparc_master/run/update.log
Linking tqdm-4.66.2-pyhd8ed1ab_0
Linking urllib3-2.2.1-pyhd8ed1ab_0
Linking requests-2.31.0-pyhd8ed1ab_0
Linking zstandard-0.22.0-py310h1275a96_0
Linking conda-package-streaming-0.9.0-pyhd8ed1ab_0
Linking conda-package-handling-2.2.0-pyh38be061_0
Linking conda-24.1.2-py310hff52083_0
Linking conda-libmamba-solver-24.1.0-pyhd8ed1ab_0
Linking mamba-1.5.7-py310h51d5547_0
Transaction finished
installation finished.
  ------------------------------------------------------------------------
    Done.
    anaconda python installation successful.
  ------------------------------------------------------------------------
  Extracting all conda packages...
  ------------------------------------------------------------------------
..................................................................................................
  ------------------------------------------------------------------------
    Done.
    conda packages installation successful.
  ------------------------------------------------------------------------
  Main dependency installation completed. Continuing...
  ------------------------------------------------------------------------
  Completed.
  Currently checking hash for mongodb
  Dependencies for mongodb have not changed.
  Completed dependency check.
 
===================================================
Successfully updated master to version v4.5.3.
===================================================
 
Starting CryoSPARC System master process...
CryoSPARC is not already running.
configuring database...
    configuration complete
database: started
database OK
command_core: started

There is no indication of cryosparc_worker/ being updated. If you post the output of the command

cryosparcm cli "get_scheduler_targets()"

we can let you know whether this is or is not expected.
Certain configurations require manual update and patching of cryosparc_worker/ (see the Cluster tab of the corresponding guide section).
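Note that cryosparcm cli calls only work while the CryoSPARC master is running; a minimal pre-check (sketch):

cryosparcm status                        # should report that CryoSPARC is running
cryosparcm start                         # if it is not running, start it first
cryosparcm cli "get_scheduler_targets()"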

Hey @wtempel

Here is the output you asked for. Meanwhile, I'll start looking at the section on manual updates.

[spuser@spgpu cryosparc24]$ cryosparcm cli "get_scheduler_targets()"
/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py:135: UserWarning: *** CommandClient: (http://spgpu:39002/api) URL Error [Errno 111] Connection refused, attempt 1 of 3. Retrying in 30 seconds
  system = self._get_callable("system.describe")()
/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py:135: UserWarning: *** CommandClient: (http://spgpu:39002/api) URL Error [Errno 111] Connection refused, attempt 2 of 3. Retrying in 30 seconds
  system = self._get_callable("system.describe")()
Traceback (most recent call last):
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 105, in func
    with make_json_request(self, "/api", data=data, _stacklevel=4) as request:
  File "/spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://spgpu:39002/api, code 500) URL Error [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/spshared/apps/cryosparc24/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_compute/client.py", line 57, in <module>
    cli = CommandClient(host=host, port=int(port))
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_compute/client.py", line 38, in __init__
    super().__init__(service, host, port, url, timeout, headers, cls=NumpyEncoder)
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 97, in __init__
    self._reload()  # attempt connection immediately to gather methods
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 135, in _reload
    system = self._get_callable("system.describe")()
  File "/spshared/apps/cryosparc24/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://spgpu:39002, code 500) Encounted error from JSONRPC function "system.describe" with params ()