cuFFT exception in 3DVA job log

Hi,

I found many lines reading “exception in cufft.Plan.del:” in the log file of a running 3DVA job. The log also contains one line reading “exception in force_free_cufft_plan: ‘NoneType’ object has no attribute ‘handle’”. What kind of problem do these messages indicate?

There are no other warnings or error-like lines in the log file. (I haven’t attached it because it now has 87890 lines, 83339 of which are “exception in cufft.Plan.del:”.)
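For anyone who wants to check a similarly large log, here is a minimal Python sketch of how the repeated message could be counted and the remaining lines listed, so that only the non-repeating part needs to be shared. The function names and the path `job.log` are my own placeholders, not part of CryoSPARC, and the marker text is copied from the message quoted above.

```python
from collections import Counter

# Marker text copied from the repeated log message quoted in this post.
MARKER = "exception in cufft.Plan.del"


def summarize_log(lines, marker=MARKER):
    """Count lines containing `marker` and tally every other non-empty line.

    `lines` is any iterable of strings (for example an open file).
    Returns (total_lines, marked_lines, Counter_of_other_lines).
    """
    total = 0
    marked = 0
    others = Counter()
    for line in lines:
        total += 1
        text = line.strip()
        if marker in text:
            marked += 1
        elif text:
            others[text] += 1
    return total, marked, others


def print_summary(path, top=20):
    """Print the counts for the log at `path` (a hypothetical file name)."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as fh:
        total, marked, others = summarize_log(fh)
    print(f"{total} lines, {marked} contain {MARKER!r}")
    for text, count in others.most_common(top):
        print(f"{count:8d}  {text}")
```

Calling `print_summary("job.log")` would print the two totals followed by the most frequent remaining lines. This only summarizes the log; it says nothing about the cause of the exceptions.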

I can’t check the progress of the job because of a separate problem: the web app has stopped updating (refreshing). The job appears to still be running, because it is using CPU and GPU resources.

Here is the cryosparcm status output:

----------------------------------------------------------------------------
CryoSPARC System master node installed at
/data2/cryosparcuser/cryosparc_master
Current cryoSPARC version: v3.2.0+211012
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 24563, uptime 18:02:46
app_dev                          STOPPED   Not started
command_core                     RUNNING   pid 24447, uptime 18:02:55
command_rtp                      RUNNING   pid 24507, uptime 18:02:51
command_vis                      RUNNING   pid 24477, uptime 18:02:52
database                         RUNNING   pid 24329, uptime 18:02:57
liveapp                          RUNNING   pid 24585, uptime 18:02:45
liveapp_dev                      STOPPED   Not started
webapp                           RUNNING   pid 24546, uptime 18:02:47
webapp_dev                       STOPPED   Not started

Thanks,
Kotaro

We see the same repeated error in a log file, but during patch motion correction. The job also failed after processing a third of the movies, and I can’t figure out why.

The OS is CentOS 7.5.1804, and the error message is pasted below.

Alan

[CPU: 955.8 MB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 402, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 322657 has terminated unexpectedly!

Update: the job completes fine when run on 1 GPU instead of 2. I still get lots of “exception in cufft.Plan.del :” lines in the log.