Topaz train errors while running

S_12_Daou · November 14, 2025, 5:14pm

Hello everyone,

I have recently been getting errors while running a Topaz Train job.

It seems inconsistent: sometimes it works, and other times it just crashes.

Please see the error below:

This is my GPU workstation configuration:

My CryoSPARC version is v4.5.3.

I would greatly appreciate your help and insights on how to fix the issue. Do I need to upgrade CryoSPARC, or is the CUDA version incompatible?

Many thanks,

Salima

wtempel · November 14, 2025, 8:33pm

Welcome to the forum, Salima @S_12_Daou. Please can you:

1. post the outputs of these commands on the worker node where Topaz training failed:
```
uname -a
free -h
```
2. let us know whether the node also runs significant workloads outside CryoSPARC;

3. post the outputs of these commands on the CryoSPARC master host. Please replace `/path/to` with the actual path of the directory that contains the `cryosparc_master/` directory.
```
cd /path/to/cryosparc_master/ # replace with actual path
csprojectid='P19'
csjobid='J220'
./bin/cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
./bin/cryosparcm eventlog $csprojectid $csjobid | tail -n 40
./bin/cryosparcm joblog $csprojectid $csjobid | tail -n 20
cd - # back to previous directory
```

[edited 2025-11-24 with corrected instructions]
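For the question about other workloads, one quick way to see what else is loading the node is to look at the CPU load averages and at the processes currently holding GPU memory. This is a minimal sketch that assumes `nvidia-smi` (shipped with the NVIDIA driver) and `uptime` are on the PATH; `node_load_snapshot` is a hypothetical helper name, not a CryoSPARC command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quick snapshot of what is loading this worker node.
# Assumes nvidia-smi (NVIDIA driver) and uptime are available on the PATH.
node_load_snapshot() {
  echo "== CPU load averages (1, 5, 15 min) =="
  uptime
  echo "== Processes currently holding GPU memory =="
  nvidia-smi --query-compute-apps=gpu_bus_id,pid,process_name,used_memory \
             --format=csv,noheader
}

# Usage on the worker node:
#   node_load_snapshot
```

Processes listed there that do not belong to a running CryoSPARC job would count as workloads outside CryoSPARC.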

S_12_Daou · November 17, 2025, 3:59pm

Hello @wtempel

Thanks a lot for your reply,

1- Please see below the outputs of the commands:

```
csprojectid='P19'
csjobid='J220'
cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
cryosparcm eventlog $csprojectid $csjobid | tail -n 40
cryosparcm joblog $csprojectid $csjobid | tail -n 20
uname -a
free -h
bash: cryosparcm: command not found…
Linux quad3.XXX 5.14.0-570.28.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 24 10:32:22 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:           125Gi        10Gi        12Gi        21Mi       104Gi       115Gi
Swap:           31Gi       2.4Gi        29Gi
```

2- For the second question, I am not sure I understand it. Would you please give me further insights?

Many thanks for your help, I greatly appreciate it.

Salima

wtempel · November 24, 2025, 4:46pm

Apologies for the incorrect instructions. Please can you post the outputs of these commands on the CryoSPARC master host. Please replace `/path/to` with the actual path of the directory that contains the `cryosparc_master/` directory.

```
cd /path/to/cryosparc_master/ # replace with actual path
csprojectid='P19'
csjobid='J220'
./bin/cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
./bin/cryosparcm eventlog $csprojectid $csjobid | tail -n 40
./bin/cryosparcm joblog $csprojectid $csjobid | tail -n 20
cd - # back to previous directory
```

S_12_Daou · November 24, 2025, 9:36pm

Hello @wtempel,

Thanks a lot for reaching out again; I appreciate it.

Please find below the output of the commands:

```
[sdaou@quad3 cryoDB]$ cd /cryoDB/cryosparc_master
[sdaou@quad3 cryosparc_master]$ csprojectid='P19'
csjobid='J220'
./bin/cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
./bin/cryosparcm eventlog $csprojectid $csjobid | tail -n 40
./bin/cryosparcm joblog $csprojectid $csjobid | tail -n 20
```

```
{'_id': '690633ee1b25f930335eeec0', 'errors_run': [{'message': 'Subprocess exited with status 1 (/programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train --train-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_train.txt --train-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_parti…)', 'warning': False}], 'input_slot_groups': [{'connections': [{'group_name': 'exposures_accepted', 'job_uid': 'J219', 'slots': [{'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'micrograph_blob', 'result_type': 'exposure.micrograph_blob', 'slot_name': 'micrograph_blob', 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'mscope_params', 'result_type': 'exposure.mscope_params', 'slot_name': 'mscope_params', 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'movie_blob', 'result_type': 'exposure.movie_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'background_blob', 'result_type': 'exposure.stat_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'micrograph_thumbnail_blob_1x', 'result_type': 'exposure.thumbnail_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'micrograph_thumbnail_blob_2x', 'result_type': 'exposure.thumbnail_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'ctf', 'result_type': 'exposure.ctf', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'ctf_stats', 'result_type': 'exposure.ctf_stats', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'rigid_motion', 'result_type': 'exposure.motion', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'spline_motion', 'result_type': 'exposure.motion', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'micrograph_blob_non_dw', 'result_type': 'exposure.micrograph_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'micrograph_blob_non_dw_AB', 'result_type': 'exposure.micrograph_blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'exposures_accepted', 'job_uid': 'J219', 'result_name': 'gain_ref_blob', 'result_type': 'exposure.gain_ref_blob', 'slot_name': None, 'version': 'F'}]}], 'count_max': inf, 'count_min': 1, 'description': 'Micrographs for training Topaz', 'name': 'micrographs', 'repeat_allowed': False, 'slots': [{'description': '', 'name': 'micrograph_blob', 'optional': False, 'title': 'Raw micrograph data', 'type': 'exposure.micrograph_blob'}, {'description': '', 'name': 'micrograph_blob_denoised', 'optional': True, 'title': 'Denoised micrograph data', 'type': 'exposure.micrograph_blob'}, {'description': '', 'name': 'mscope_params', 'optional': True, 'title': 'Microscope parameters for identifying negatively stained data', 'type': 'exposure.mscope_params'}], 'title': 'Micrographs', 'type': 'exposure'}, {'connections': [{'group_name': 'particles_accepted', 'job_uid': 'J219', 'slots': [{'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'location', 'result_type': 'particle.location', 'slot_name': 'location', 'version': 'F'}, {'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'pick_stats', 'result_type': 'particle.pick_stats', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'alignments3D', 'result_type': 'particle.alignments3D', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'ctf', 'result_type': 'particle.ctf', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'blob', 'result_type': 'particle.blob', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_accepted', 'job_uid': 'J219', 'result_name': 'alignments2D', 'result_type': 'particle.alignments2D', 'slot_name': None, 'version': 'F'}]}], 'count_max': inf, 'count_min': 1, 'description': 'Particle locations for training Topaz', 'name': 'particles', 'repeat_allowed': False, 'slots': [{'description': '', 'name': 'location', 'optional': False, 'title': 'Particle locations', 'type': 'particle.location'}], 'title': 'Particles', 'type': 'particle'}], 'instance_information': {'CUDA_version': '11.8', 'available_memory': '117.76GB', 'cpu_model': 'AMD Ryzen Threadripper PRO 5965WX 24-Cores', 'driver_version': '12.9', 'gpu_info': [{'id': 0, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000', 'pcie': '0000:01:00'}, {'id': 1, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000', 'pcie': '0000:21:00'}, {'id': 2, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000', 'pcie': '0000:41:00'}, {'id': 3, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000', 'pcie': '0000:42:00'}], 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'quad3.mshri.on.ca', 'platform_release': '5.14.0-570.28.1.el9_6.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Thu Jul 24 10:32:22 UTC 2025', 'total_memory': '125.13GB', 'used_memory': '6.17GB'}, 'job_type': 'topaz_train', 'params_spec': {'compute_num_workers': {'value': 1}, 'exec_path': {'value': '/programs/x86_64-linux/topaz/0.2.5-2/bin/topaz'}, 'num_distribute': {'value': 1}, 'num_particles': {'value': 2000}, 'par_diam': {'value': 130}, 'pretrained': {'value': True}}, 'project_uid': 'P19', 'started_at': 'Sun, 02 Nov 2025 21:14:30 GMT', 'status': 'failed', 'uid': 'J220', 'version': 'v4.5.3'}
```

```
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] args.func(args)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 695, in main
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 577, in fit_epochs
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] , use_cuda=use_cuda, output=output)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 557, in fit_epoch
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] metrics = step_method.step(X, Y)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/methods.py", line 103, in step
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] score = self.model(X).view(-1)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] result = self.forward(*input, **kwargs)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/model/classifier.py", line 28, in forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] z = self.features(x)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] result = self.forward(*input, **kwargs)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/model/features/resnet.py", line 54, in forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] z = self.features(x)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] result = self.forward(*input, **kwargs)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] input = module(input)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] result = self.forward(*input, **kwargs)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/topaz/model/features/resnet.py", line 270, in forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] y = self.conv(x)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] result = self.forward(*input, **kwargs)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] return self.conv2d_forward(input, self.weight)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] File "/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] self.padding, self.dilation, self.groups)
[Sun, 02 Nov 2025 21:58:26 GMT] [CPU RAM used: 326 MB] RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
[Sun, 02 Nov 2025 21:58:27 GMT] [CPU RAM used: 326 MB] Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "/cryoDB/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 384, in run_topaz_wrapper_train
    utils.run_process(train_command)
  File "/cryoDB/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 99, in run_process
    assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train --train-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_train.txt --train-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_parti…)
```

```
========= sending heartbeat at 2025-11-02 16:57:36.636152
========= sending heartbeat at 2025-11-02 16:57:46.650376
========= sending heartbeat at 2025-11-02 16:57:56.664358
========= sending heartbeat at 2025-11-02 16:58:06.678569
========= sending heartbeat at 2025-11-02 16:58:16.692996
========= sending heartbeat at 2025-11-02 16:58:26.709436
***************************************************************
Running job on hostname %s quad3.mshri.on.ca
Allocated Resources :  {'fixed': {'SSD': False}, 'hostname': 'quad3.mshri.on.ca', 'lane': 'default', 'lane_type': 'node', 'license': False, 'licenses_acquired': 0, 'slots': {'CPU': [0], 'GPU': [0], 'RAM': [0]}, 'target': {'cache_path': '/ssd_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000'}, {'id': 1, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000'}, {'id': 2, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000'}, {'id': 3, 'mem': 50897289216, 'name': 'NVIDIA RTX A6000'}], 'hostname': 'quad3.mshri.on.ca', 'lane': 'default', 'monitor_port': None, 'name': 'quad3.mshri.on.ca', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryspc@quad3.mshri.on.ca', 'title': 'Worker node quad3.mshri.on.ca', 'type': 'node', 'worker_bin_path': '/cryoDB/cryosparc_worker/bin/cryosparcw'}}
**** handle exception rc
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "/cryoDB/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 384, in run_topaz_wrapper_train
    utils.run_process(train_command)
  File "/cryoDB/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 99, in run_process
    assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train --train-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_train.txt --train-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_parti…)
set status to failed
========= main process now complete at 2025-11-02 16:58:36.724043.
========= monitor process now complete at 2025-11-02 16:58:36.728125.
```

Thanks a lot,

Salima

wtempel · November 24, 2025, 10:57pm

Thanks @S_12_Daou. Please can you also post the output of the command:

```
/cryoDB/cryosparc_master/bin/cryosparcm eventlog P19 J220 | grep -m1 "topaz train"
```

S_12_Daou · November 25, 2025, 4:31pm

Hello @wtempel,

Thank you,

Please find below the output for the command:

```
[sdaou@quad3 bin]$ /cryoDB/cryosparc_master/bin/cryosparcm eventlog P19 J220 | grep -m1 "topaz train"
[Sun, 02 Nov 2025 21:50:57 GMT] [CPU RAM used: 326 MB] Starting dataset splitting by running command /programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train_test_split --number 90 --seed 278009619 --image-dir /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/preprocessed /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed.txt
[sdaou@quad3 bin]$
```

wtempel · November 25, 2025, 4:53pm

Thanks @S_12_Daou. It may be time to try running the `topaz train` command outside CryoSPARC, as suggested in Topaz: CUDNN error: CUDNN STATUS MAPPING ERROR - #2 by alexjamesnoble. What is the output of the following command (note the extra space after `train` inside the quotes)?

```
/cryoDB/cryosparc_master/bin/cryosparcm eventlog P19 J220 | grep -m1 "topaz train "
```
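Why the extra space matters: `grep -m1` stops after the first matching line, and the pattern `topaz train` is also a substring of the earlier `topaz train_test_split` command, which is exactly the line the first grep returned. The two lines below are shortened versions of the J220 event-log entries, used only to illustrate the effect.

```shell
#!/usr/bin/env bash
# Two J220 event-log entries, shortened, in the order they were logged.
log_lines='Starting dataset splitting by running command topaz train_test_split --number 90
Starting training by running command topaz train --train-images image_list_train.txt'

# "topaz train" also matches "topaz train_test_split", so -m1 stops at line 1.
printf '%s\n' "$log_lines" | grep -m1 "topaz train"

# With the trailing space, only the actual training command matches.
printf '%s\n' "$log_lines" | grep -m1 "topaz train "
```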

(For the run outside CryoSPARC, you may want to modify some parameters to avoid writing into the CryoSPARC project directory.)
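Topaz runs from its own conda environment, with its own PyTorch build, rather than from the CryoSPARC worker environment, so it may also help to confirm which PyTorch, CUDA and cuDNN versions that environment uses and whether a single small convolution succeeds on a given GPU. This is a minimal sketch under assumptions: the interpreter path is inferred from the `site-packages` directory in the traceback (standard conda layout) and may differ on this system, and `check_topaz_cudnn` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the Topaz conda env keeps its interpreter at envs/topaz/bin/python,
# next to the lib/python3.6/site-packages directory seen in the traceback.
TOPAZ_PY=${TOPAZ_PY:-/programs/x86_64-linux/topaz/0.2.5-2/topaz_extlib/miniconda3-4.8.2-b5qb/envs/topaz/bin/python}

# Hypothetical helper: print the PyTorch/CUDA/cuDNN versions visible to Topaz,
# then run one small 2D convolution (cuDNN is used when enabled, the PyTorch
# default) on CUDA device $1 (default 0).
check_topaz_cudnn() {
  local device=${1:-0}
  "$TOPAZ_PY" - "$device" <<'EOF'
import sys
import torch

dev = torch.device("cuda:%d" % int(sys.argv[1]))
print("torch", torch.__version__, "| CUDA", torch.version.cuda,
      "| cuDNN", torch.backends.cudnn.version())
x = torch.randn(1, 1, 64, 64, device=dev)
w = torch.randn(8, 1, 3, 3, device=dev)
y = torch.nn.functional.conv2d(x, w)
torch.cuda.synchronize()
print("conv2d OK on", dev, "output shape", tuple(y.shape))
EOF
}

# Usage on the worker node, e.g. for the device J220 was given (--device 0):
#   check_topaz_cudnn 0
```

A single successful convolution does not rule out an intermittent failure, but a failure here would reproduce the problem without CryoSPARC involved.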

S_12_Daou · November 25, 2025, 10:42pm

Hello @wtempel,

Thank you,

Please find below the output for the command:

```
[sdaou@quad3 bin]$ eventlog P19 J220 | grep -m1 "topaz train"
bash: eventlog: command not found…
[sdaou@quad3 bin]$ /cryoDB/cryosparc_master/bin/cryosparcm eventlog P19 J220 | grep -m1 "topaz train"
[Sun, 02 Nov 2025 21:50:57 GMT] [CPU RAM used: 326 MB] Starting dataset splitting by running command /programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train_test_split --number 90 --seed 278009619 --image-dir /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/preprocessed /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed.txt
[sdaou@quad3 bin]$ /cryoDB/cryosparc_master/bin/cryosparcm eventlog P19 J220 | grep -m1 "topaz train "
[Sun, 02 Nov 2025 21:51:01 GMT] [CPU RAM used: 326 MB] Starting training by running command /programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train --train-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_train.txt --train-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed_train.txt --test-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_test.txt --test-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed_test.txt --num-particles 2000 --learning-rate 0.0002 --minibatch-size 128 --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 --l2 0.0 --minibatch-balance 0.0625 --epoch-size 5000 --model resnet8 --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 1 --cross-validation-seed 278009619 --radius 3 --num-particles 2000 --device 0 --save-prefix=/Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/models/model -o /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/train_test_curve.txt
[sdaou@quad3 bin]$
```

If I want to run it outside CryoSPARC, where should I run it? In which directory on the computer?

Would you please let me know how to modify the parameters?

Many thanks,

Salima

wtempel · November 26, 2025, 8:00pm

Create and change into an empty directory.

You will want to modify the `--save-prefix=` and `-o` parameters so that outputs are created inside the new, empty directory, for example:

```
cd /where/to/create/new/dir/ # replace with actual, suitable path
mkdir topaz_train_1
cd topaz_train_1/
/usr/bin/nohup /programs/x86_64-linux/topaz/0.2.5-2/bin/topaz train \
  --train-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_train.txt \
  --train-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed_train.txt \
  --test-images /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/image_list_test.txt \
  --test-targets /Stx2Data3/Salima/CS-mark3-crbn-ddb1-protac-glacios-october-2025/J220/topaz_particles_processed_test.txt \
  --num-particles 2000 --learning-rate 0.0002 --minibatch-size 128 \
  --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 \
  --l2 0.0 --minibatch-balance 0.0625 --epoch-size 5000 --model resnet8 \
  --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 1 \
  --cross-validation-seed 278009619 --radius 3 --num-particles 2000 --device 0 \
  --save-prefix=model -o train_test_curve.txt &
```
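Because the command runs in the background under `nohup` and was started from a terminal, its messages, including any Python traceback, end up in `nohup.out` inside the new directory. A minimal sketch for checking that file; `check_topaz_log` is a hypothetical helper, and the two patterns it searches for are simply strings taken from the traceback posted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify how a background Topaz run went by scanning
# its nohup log (default ./nohup.out). Returns 1 if an error was found.
check_topaz_log() {
  local log=${1:-nohup.out}
  if [[ ! -f $log ]]; then
    echo "log not found: $log"
    return 2
  fi
  if grep -q 'CUDNN_STATUS_MAPPING_ERROR' "$log"; then
    echo "cuDNN mapping error reproduced outside CryoSPARC"
    return 1
  elif grep -q 'Traceback' "$log"; then
    echo "run failed with another error; see $log"
    return 1
  fi
  echo "no error found in $log so far"
}

# Usage, from inside topaz_train_1/:
#   tail -f nohup.out    # follow progress; Ctrl-C stops tail, not the background run
#   check_topaz_log      # short summary
```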

When you run this job outside CryoSPARC, the CryoSPARC scheduler will not be “aware” that Topaz is using a GPU and may start another job on the same GPU. Depending on how busy the workstation and its GPUs are, you may also want to modify the `--device` parameter. Instead of `--device 0`, you could specify, for example, `--device 3`, where the number, confusingly, may not match the GPU number shown by `nvidia-smi` but instead corresponds to the id reported by the command:

```
/cryoDB/cryosparc_worker/bin/cryosparcw gpulist
```
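To pick a quieter GPU, one option is to rank the GPUs by memory in use and identify the winner by its PCI bus address rather than by `nvidia-smi`'s index, since that index may not be the number `--device` expects. A minimal sketch; `least_used_gpu_bus` is a hypothetical helper, and mapping the bus address to an id via the `pcie` entries of `gpu_info` in the `get_job` output assumes those ids follow the same numbering as `cryosparcw gpulist`, which is worth confirming.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines "pci.bus_id, memory.used" (MiB, no header)
# and print the PCI bus id of the GPU with the least memory in use.
least_used_gpu_bus() {
  sort -t, -k2,2n | head -n1 | cut -d, -f1
}

# Usage on the worker node:
#   nvidia-smi --query-gpu=pci.bus_id,memory.used --format=csv,noheader,nounits | least_used_gpu_bus
# A result such as 00000000:21:00.0 can then be matched by bus number against
# the 'pcie' entries in CryoSPARC's gpu_info (e.g. 'pcie': '0000:21:00' is id 1).
```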