Update to version 5.0

Hi,

The update to version 5.0 doesn’t seem to work right.

I was updating from a working version 4.7.1 on a Rocky 9.7 workstation.

It started complaining that “port range 39000 to 39010 overlap with ephemeral port range”, and then, when it got to the database upgrade, it threw the error:

“ConnectionError: Error 111 connecting to hostname.xxx.xxx:39004. Connection refused.” and terminated the update process (hostname redacted).
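The overlap warning can be reproduced with a quick check against the kernel's ephemeral range. A minimal sketch, assuming Linux (the 39000–39010 range is the one from the message above; the 32768–60999 fallback is the common Linux default, used only if /proc is unavailable):

```shell
# Check whether a fixed port range overlaps the kernel's ephemeral port range.
# On Linux the ephemeral range is in /proc/sys/net/ipv4/ip_local_port_range;
# fall back to the common default 32768-60999 if it cannot be read.
read eph_lo eph_hi < /proc/sys/net/ipv4/ip_local_port_range 2>/dev/null \
  || { eph_lo=32768; eph_hi=60999; }
lo=39000; hi=39010   # range from the CryoSPARC warning
if [ "$lo" -le "$eph_hi" ] && [ "$eph_lo" -le "$hi" ]; then
  echo "overlap: $lo-$hi vs ephemeral $eph_lo-$eph_hi"
fi
```

With the default ephemeral range, 39000–39010 falls entirely inside it, which is why the installer warns: an OS-assigned outgoing connection could grab one of CryoSPARC's ports.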

cryosparcm status says that the master node is at version v5.0.0, and the cryosparc process status shows:

error <class 'xmlrpc.client.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 404 NOT FOUND>: file: /xxx/cryosparc2/cryosparc_master/.pixi/envs/lib/python3.12/site-packages/supervisor/xmlrpc.py line: 539

Istvan

@istv01 Please can you post the outputs of these commands

ps -eo user:12,pid,ppid,start,cmd | grep -e cryosparc_ -e mongo
ls -l /tmp/cryosparc*.sock /tmp/mongo*.sock

ps -eo user:12,pid,ppid,start,cmd | grep -e cryosparc_ -e mongo

root 1898847 1 Jan 15 python /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /x/cryosparc2/cryosparc_master/supervisord.conf
root 1898957 1898847 Jan 15 mongod --auth --dbpath /data/cryosparc_database --port 39001 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
root 1899062 1898847 Jan 15 python /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /x/cryosparc2/cryosparc_master/gunicorn.conf.py
root 1899063 1899062 Jan 15 python /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39002 cryosparc_command.command_core:start() -c /x/cryosparc2/cryosparc_master/gunicorn.conf.py
root 1899104 1898847 Jan 15 python3.10 /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/flask --app cryosparc_command.command_vis run -h 0.0.0.0 -p 39003 --with-threads
root 1899138 1898847 Jan 15 python /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /x/cryosparc2/cryosparc_master/gunicorn.conf.py
root 1899139 1899138 Jan 15 python /x/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39005 -c /x/cryosparc2/cryosparc_master/gunicorn.conf.py
root 1899169 1898847 Jan 15 /x/cryosparc2/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
root 2015717 2015193 15:23:10 grep --color=auto -e cryosparc_ -e mongo

ls -l /tmp/cryosparc*.sock /tmp/mongo*.sock

ls: cannot access ‘/tmp/cryosparc*.sock’: No such file or directory
srwx------. 1 root root 0 Jan 15 09:44 /tmp/mongodb-39001.sock

Thanks @istv01. Please can you try

  1. a thorough shutdown of CryoSPARC (instructions)
  2. then cryosparcm start

and let us know

  1. whether the startup succeeds
  2. the output of the command
    cat /x/cryosparc2/cryosparc_master/version
    

Additional recovery steps will likely be required thereafter, and depend on the outcome of the previous recovery steps.

Thanks for the suggestions. Shutting down all processes helped somewhat, but startup still complained about a process hogging port 39000, so I killed that manually too. After that, CryoSPARC started without errors and shows PIDs for the processes in the status.
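For reference, one way to find the stale listener is shown below. This is a sketch, assuming iproute2's `ss` is available and that you run it as the user owning the CryoSPARC processes; the port number is the one from my setup:

```shell
# Extract the pid=N field from a line of `ss -tlnp` output, e.g.:
#   LISTEN 0 128 0.0.0.0:39000 0.0.0.0:* users:(("mongod",pid=1234,fd=11))
pid_from_ss() {
  sed -n 's/.*pid=\([0-9][0-9]*\).*/\1/p'
}
# Usage (inspect before killing!):
#   ss -tlnp 'sport = :39000' | pid_from_ss | xargs -r kill
```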

However, in the interface I could not re-run an Extensive Validation: the job gets queued and then nothing happens. A tile is created for the job that says “launched, running job on remote node”.

But nothing is actually happening: nvidia-smi shows no CUDA processes and there is no progress.

You may have to manually update the cryosparc_worker installation(s). Does this help?
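On a single workstation, a manual worker update typically looks like the following sketch. The paths are hypothetical and must match your installation; consult the CryoSPARC guide for your version for the authoritative steps:

```shell
# Hypothetical install locations; adjust to your own setup.
master_dir=/x/cryosparc2/cryosparc_master
worker_dir=/x/cryosparc2/cryosparc_worker
# After `cryosparcm update` completes on the master, the matching worker
# package is left in the master directory. Copy it over and update:
#   cp "$master_dir/cryosparc_worker.tar.gz" "$worker_dir/"
#   cd "$worker_dir" && bin/cryosparcw update
echo "would update worker in: $worker_dir"
```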

Indeed, that was it. I thought this issue had been sorted out; updates were derailing quite a bit in versions >4.2.

It would be nice if the “cryosparcm status” command also showed the worker node’s status (or version number), at least for single-workstation installations.

Benchmark works now. Thanks for the suggestion.

By the way, is there a way to force multi-GPU use for multi-GPU job types? For example, patch motion multi-GPU always runs on a single GPU (on a 450-movie dataset from EMPIAR); splitting the data in two and running the halves in parallel finishes in two-thirds of the time of the single-GPU job. Patch CTF also always runs on a single GPU for this small dataset, despite my specifying 2; running it in split mode finishes in half the time. The only related thing I could find on the forum is that when disk speed is limiting, CryoSPARC decides to use a single GPU. But it also doesn’t use the SSD cache, despite my selecting it, so that doesn’t make much sense to me.

Thanks for the confirmation.

Am I assuming correctly that this question concerns jobs that run outside the Extensive Validation workflow?
Please can you post information for the specific job

project_uid="P99" # specify actual project ID
job_uid="J199" # specify actual job ID for motion correction of 450 movies
cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').params"
cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').instance_information"

This was for dataset EMPIAR-10096, ~450 movies.

For the Patch CTF multi-GPU job:

cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').params"

{"do_plots": true, "num_plots": 10, "classic_mode": false, "amp_contrast": 0.1, "res_min_align": 25, "res_max_align": 4.0, "df_search_min": 1000, "df_search_max": 40000, "phase_shift_min": 0, "phase_shift_max": 3.141592653589793, "do_phase_shift_refine_only": false, "compute_num_gpus": 1}

cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').instance_information"

{"platform_node": "x.x.x.x", "platform_release": "5.14.0-570.23.1.el9_6.x86_64", "platform_version": "#1 SMP PREEMPT_DYNAMIC Thu Jun 26 19:29:53 UTC 2025", "platform_architecture": "x86_64", "cpu_model": "Intel(R) Xeon(R) w5-2465X", "physical_cores": 16, "max_cpu_freq": 0.0, "total_memory": "250.84GB", "available_memory": "244.59GB", "used_memory": "4.01GB", "ofd_soft_limit": 1024, "ofd_hard_limit": 524288, "driver_version": "12.8", "CUDA_version": "11.8", "gpu_info": [{"id": 0, "name": "NVIDIA RTX 5000 Ada Generation", "mem": 33796980736, "bus_id": "", "compute_mode": "Default", "persistence_mode": "Disabled", "power_limit": 0.0, "sw_power_limit": "Not Active", "hw_power_limit": "Not Active", "max_pcie_link_gen": 0, "current_pcie_link_gen": 0, "temperature": 0, "gpu_utilization": 0, "memory_utilization": 0, "driver_version": ""}, {"id": 1, "name": "NVIDIA RTX 5000 Ada Generation", "mem": 33805828096, "bus_id": "", "compute_mode": "Default", "persistence_mode": "Disabled", "power_limit": 0.0, "sw_power_limit": "Not Active", "hw_power_limit": "Not Active", "max_pcie_link_gen": 0, "current_pcie_link_gen": 0, "temperature": 0, "gpu_utilization": 0, "memory_utilization": 0, "driver_version": ""}], "version": ""}

I just tried to re-run this same job on v5.0 on the same machine and it’s working on 2 GPUs, which is great!

The Extensive Validation EMPIAR-10025 patch motion and patch CTF jobs now also run on 2 GPUs on v5.0, but still run on a single GPU on v4.7.1 (just tried it). Also, cryosparc v4.7.1-cuda12 doesn’t accept the commands:

cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').params"
cryosparcm cli "api.jobs.find_one('$project_uid', '$job_uid').instance_information"

They give the error:

AttributeError: 'CommandClient' object has no attribute 'api'

It appears that the patch CTF job wasn’t specified to run with Number of GPUs to parallelize: 2, and instead ran with the default of 1 GPU.

These commands are new in v5 and not expected to work with v4.7.1.

In v4.7.1, you may query job parameters with the command

cryosparcm cli "get_job('P99', 'J199', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

Please replace P99 with the applicable project ID and run the command twice, replacing J199

  1. first, with the job ID of the v4.7.1 Extensive Validation job
  2. second, with the job ID of the v4.7.1 Patch CTF job

I have now updated most machines to v5.0.1, but this output is from a machine that still runs v4.7.1, for the Extensive Validation job.

cryosparcm cli "get_job('P1', 'J3', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

{'_id': '69304cf0f0519f7d918d20b1', 'instance_information': {'CUDA_version': '12.8', 'available_memory': '491.48GB', 'cpu_model': 'AMD Ryzen Threadripper PRO 7985WX 64-Cores', 'driver_version': '13.0', 'gpu_info': [{'id': 0, 'mem': 101963333632, 'name': 'NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition', 'pcie': '0000:01:00'}, {'id': 1, 'mem': 101971722240, 'name': 'NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition', 'pcie': '0000:c2:00'}], 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 64, 'platform_architecture': 'x86_64', 'platform_node': 'x.x.x.x', 'platform_release': '5.14.0-611.9.1.el9_7.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025', 'total_memory': '502.30GB', 'used_memory': '7.52GB'}, 'job_type': 'patch_motion_correction_multi', 'params_spec': {}, 'parents': ['J2'], 'project_uid': 'P1', 'uid': 'J3', 'version': 'v4.7.1-cuda12'}

cryosparcm cli "get_job('P1', 'J4', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

{'_id': '69304d65f0519f7d918d2e9e', 'instance_information': {'CUDA_version': '12.8', 'available_memory': '491.20GB', 'cpu_model': 'AMD Ryzen Threadripper PRO 7985WX 64-Cores', 'driver_version': '13.0', 'gpu_info': [{'id': 0, 'mem': 101963333632, 'name': 'NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition', 'pcie': '0000:01:00'}, {'id': 1, 'mem': 101971722240, 'name': 'NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition', 'pcie': '0000:c2:00'}], 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 64, 'platform_architecture': 'x86_64', 'platform_node': 'x.x.x.x', 'platform_release': '5.14.0-611.9.1.el9_7.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025', 'total_memory': '502.30GB', 'used_memory': '7.80GB'}, 'job_type': 'patch_ctf_estimation_multi', 'params_spec': {}, 'parents': ['J3'], 'project_uid': 'P1', 'uid': 'J4', 'version': 'v4.7.1-cuda12'}

Thanks @istv01. Please can you also post the outputs of the commands

cryosparcm cli "get_job('P1', 'J1', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"
cryosparcm cli "get_job('P1', 'J2', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

cryosparcm cli "get_job('P1', 'J1', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

{'_id': '69304bddf0519f7d918d103b', 'instance_information': {'available_memory': '491.95GB', 'cpu_model': 'AMD Ryzen Threadripper PRO 7985WX 64-Cores', 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 64, 'platform_architecture': 'x86_64', 'platform_node': 'x.x.x.x', 'platform_release': '5.14.0-611.9.1.el9_7.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025', 'total_memory': '502.30GB', 'used_memory': '7.05GB'}, 'job_type': 'extensive_workflow_bench', 'params_spec': {'testing': {'value': 'Benchmark'}}, 'parents': [], 'project_uid': 'P1', 'uid': 'J1', 'version': 'v4.7.1-cuda12'}

cryosparcm cli "get_job('P1', 'J2', 'job_type', 'version', 'instance_information', 'params_spec', 'parents')"

{'_id': '69304ce1f0519f7d918d1df4', 'instance_information': {'available_memory': '491.49GB', 'cpu_model': 'AMD Ryzen Threadripper PRO 7985WX 64-Cores', 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 64, 'platform_architecture': 'x86_64', 'platform_node': 'x.x.x.x', 'platform_release': '5.14.0-611.9.1.el9_7.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025', 'total_memory': '502.30GB', 'used_memory': '7.51GB'}, 'job_type': 'import_movies', 'params_spec': {'accel_kv': {'value': 300}, 'blob_paths': {'value': '/x/CS-benchmark/empiar_10025_subset/*.tif'}, 'cs_mm': {'value': 2.7}, 'gainref_path': {'value': '/x/CS-benchmark/empiar_10025_subset/norm-amibox05-0.mrc'}, 'psize_A': {'value': 0.6575}, 'total_dose_e_per_A2': {'value': 53}}, 'parents': [], 'project_uid': 'P1', 'uid': 'J2', 'version': 'v4.7.1-cuda12'}

Thanks for posting the additional job information. It seems that no value was specified for the Number of GPUs to use parameter of the Extensive Validation job. Under those circumstances, it is expected that jobs would run on a single GPU.

Yes, in this case it was not specified and defaulted to 1 GPU. In the case of another EMPIAR dataset (not part of the Extensive validation) I did specify 2 and still defaulted to 1. This was a reproducible problem on v4.7.1 but seems to work OK on v5 (if you specify 2 GPUs, runs on 2 GPUs).