We just upgraded to cryoSPARC 4, and there appears to be a problem when running jobs on the GPU. For example, GPU jobs such as heterogeneous refinement and non-uniform refinement do not run. We also tested a CPU-based job, selecting 2D classes, which finishes successfully and uploads to the database.
In the event log in the web interface, the only information displayed is:
[2022-10-16 21:41:58.05]. License is valid.
[2022-10-16 21:41:58.05]. Launching job on lane default target infinity.salk.edu ...
[2022-10-16 21:41:58.08]. Running job on master node hostname infinity.salk.edu
Nothing else is logged after that.
cryosparc_master/run/command_core.log indicates that the job has finished. The tail of the log file is pasted below:
2022-10-16 21:44:28,008 COMMAND.JOBS set_job_status INFO | Status changed for P28.J135 from waiting to running
2022-10-16 21:44:41,084 COMMAND.DATA dump_job_database INFO | Request to export P28 J135
2022-10-16 21:44:41,086 COMMAND.DATA dump_job_database INFO | Exporting job to /log-l/netapp/data5/zshan/cryoSPARC/19jan04c/P28/J135
2022-10-16 21:44:41,088 COMMAND.DATA dump_job_database INFO | Exporting all of job's images in the database to /log-l/netapp/data5/zshan/cryoSPARC/19jan04c/P28/J135/gridfs_data...
2022-10-16 21:44:41,152 COMMAND.DATA dump_job_database INFO | Writing 59 database images to /log-l/netapp/data5/zshan/cryoSPARC/19jan04c/P28/J135/gridfs_data/gridfsdata_0
2022-10-16 21:44:41,153 COMMAND.DATA dump_job_database INFO | Done. Exported 59 images in 0.06s
2022-10-16 21:44:41,153 COMMAND.DATA dump_job_database INFO | Exporting all job's streamlog events...
2022-10-16 21:44:41,156 COMMAND.DATA dump_job_database INFO | Done. Exported 1 files in 0.00s
2022-10-16 21:44:41,156 COMMAND.DATA dump_job_database INFO | Exporting job metafile...
2022-10-16 21:44:41,158 COMMAND.DATA dump_job_database INFO | Creating .csg file for particles_selected
2022-10-16 21:44:41,169 COMMAND.DATA dump_job_database INFO | Creating .csg file for templates_selected
2022-10-16 21:44:41,179 COMMAND.DATA dump_job_database INFO | Creating .csg file for particles_excluded
2022-10-16 21:44:41,188 COMMAND.DATA dump_job_database INFO | Creating .csg file for templates_excluded
2022-10-16 21:44:41,210 COMMAND.DATA dump_job_database INFO | Done. Exported in 0.05s
2022-10-16 21:44:41,210 COMMAND.DATA dump_job_database INFO | Updating job manifest...
2022-10-16 21:44:41,214 COMMAND.DATA dump_job_database INFO | Done. Updated in 0.00s
2022-10-16 21:44:41,214 COMMAND.DATA dump_job_database INFO | Exported P28 J135 in 0.13s
2022-10-16 21:44:41,231 COMMAND.JOBS set_job_status INFO | Status changed for P28.J135 from running to completed
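To make the sequence above easier to read, here is a minimal Python sketch that pulls only the status transitions for one job out of command_core.log. The regular expression is an assumption based on the line format shown above, and `status_changes` is a hypothetical helper, not part of cryoSPARC.

```python
import re

# Pattern assumed from the "set_job_status" lines shown above;
# it is not an official cryoSPARC log specification.
STATUS_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"Status changed for (?P<job>P\d+\.J\d+) from (?P<old>\w+) to (?P<new>\w+)"
)

def status_changes(lines, job):
    """Return (time, old_status, new_status) tuples for one job, in log order."""
    changes = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group("job") == job:
            changes.append((match.group("time"), match.group("old"), match.group("new")))
    return changes

# Three lines copied from the log excerpt above.
sample = [
    "2022-10-16 21:44:28,008 COMMAND.JOBS set_job_status INFO | Status changed for P28.J135 from waiting to running",
    "2022-10-16 21:44:41,084 COMMAND.DATA dump_job_database INFO | Request to export P28 J135",
    "2022-10-16 21:44:41,231 COMMAND.JOBS set_job_status INFO | Status changed for P28.J135 from running to completed",
]

for time, old, new in status_changes(sample, "P28.J135"):
    print(f"{time}  {old} -> {new}")
```

Since the function accepts any iterable of lines, an open file handle on cryosparc_master/run/command_core.log can be passed in place of `sample`.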
cryosparc_master/run/command_vis.log shows a "list index out of range" error. The tail of the log file is pasted below:
2022-10-16 18:35:11,733 VIS.MAIN recreate_mesh INFO | Loading mesh for P28 J133.volume.map_sharp
[2022-10-16 18:35:11,742] ERROR in app: Exception on /P28/J133.volume.map_sharp [GET]
Traceback (most recent call last):
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/cryospuser/cryosparc2/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/cryospuser/cryosparc2/cryosparc_master/cryosparc_command/command_vis/__init__.py", line 142, in recreate_mesh
    result = cli.get_job_result(project_uid, src_result) # only gives one version and metafiles
  File "/home/cryospuser/cryosparc2/cryosparc_master/cryosparc_compute/client.py", line 66, in func
    + self._format_server_error(res['error'])
AssertionError: Encountered error for method "get_job_result" with params ('P28', 'J133.volume.map_sharp'):
ServerError: list index out of range
Traceback (most recent call last):
  File "/home/cryospuser/cryosparc2/cryosparc_master/cryosparc_command/commandcommon.py", line 194, in wrapper
    res = func(*args, **kwargs)
  File "/home/cryospuser/cryosparc2/cryosparc_master/cryosparc_command/command_core/__init__.py", line 6316, in get_job_result
    output_result['version'] = output_result['versions'][idx]
IndexError: list index out of range
The CUDA version information (from nvidia-smi) is:
[dlyumkis@infinity cryosparc_master] nvidia-smi
Sun Oct 16 21:49:14 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    26W / 250W |      4MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
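For comparing driver and CUDA versions across machines, the header line above can be parsed with a short Python sketch. `parse_smi_header` is a hypothetical helper, and the regex assumes the header layout shown here; other driver releases may format it differently.

```python
import re

# Layout assumed from the nvidia-smi header pasted above.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)

def parse_smi_header(text):
    """Return a dict with 'smi', 'driver', and 'cuda' versions, or None if no header is found."""
    match = HEADER_RE.search(text)
    return match.groupdict() if match else None

# Header line copied from the output above.
header = "| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |"
print(parse_smi_header(header))
```

Note that the CUDA version nvidia-smi reports is the highest version the installed driver supports, which is not necessarily the toolkit version the cryoSPARC worker was configured with.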
Any help would be appreciated.
Dmitry