Streaming 2D Classification Error occurred (Input/output error)

Hi,

I’m running CryoSPARC v4.5.1, and I get the following error when I run 2D Classification on a 4-GPU lane during a Live session:

Traceback (most recent call last):
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2192, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 134, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 135, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1069, in cryosparc_master.cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 129, in cryosparc_master.cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 34, in get_original_real_data
    data = self.blob.view()
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 145, in view
    return self.get()
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 140, in get
    _, data, total_time = prefetch.synchronous_native_read(self.fname, idx_start = self.page, idx_limit = self.page+1)
  File "cryosparc_master/cryosparc_compute/blobio/prefetch.py", line 70, in cryosparc_master.cryosparc_compute.blobio.prefetch.synchronous_native_read
OSError: 

IO request details:
Error ocurred (Input/output error) at line 716 in fread 

The file is probably corrupt. If this is a movie, try deleting it and re-importing the movie set. If this is a particle stack, try the 'check for corrupt particles' job (if corrupt particles are found, they will be excluded from the job's output).

filename:    /mnt/Scratch/cryosparc_cache/instance_10.250.102.76:45001/links/P80-J35-1715619504/e28d14c9aa3f081f25548c070bad8961c01c969f.mrc
filetype:    0
header_only: 0
idx_start:   340
idx_limit:   341
eer_upsampfactor: 2
eer_numfractions: 40
num_threads: 6
buffer:      0x7ff91856a6c0
buffer_sz:   230400
nx, ny, nz:  240 240 1
dtype:       2
total_time:  -1.000000
io_time:     0.000000

I’ve tried restarting the Live session and restarting cryosparcm, but I still get the error. When I restart the failed job outside of the Live session (while the Live session is still running), it goes further in the process, but still fails with the same error.
The only information that cryosparcm log command_core provides is “Status changed for P80.J38 from running to failed”.

Any ideas on how to troubleshoot this further?

Thanks

P.S. There is a typo in the error message

IO request details:
Error ocurred (Input/output error) at line 716 in fread

A “c” is missing: “ocurred” should be “occurred”.

Try clearing your CryoSPARC cache? The file it’s erroring on is on your scratch disk. You could also try a 2D classification and disable caching to see if the original file is corrupt or not.

I deleted the cache, and restarted the job. It redownloaded the files to the scratch disk, but I got the same error.

I suppose this means that one of the original files is corrupted. I can try to wait until the acquisition is finished to run a “Check For Corrupt Particles” job.

You can export the current particle stack and check that. Also, try running without the cache: if the cache disk is going bad, the original stack might still be OK. I had a bad SATA cable in a system a few years ago that caused all sorts of corruption-related issues on that disk. I’m not saying it’s that, just that there are a lot of potential causes.
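If you want to rule out the disk independently of CryoSPARC, a read-through check is one option. The sketch below is only an illustration in plain Python 3: `find_unreadable_files` is a hypothetical helper name, not part of CryoSPARC. It reads every file under a directory from start to finish and collects the ones where the operating system reports an error, such as the "Input/output error" in your traceback.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every file under `root` end to end.

    Returns a list of (path, error message) pairs for files that
    could not be opened or fully read.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Read in chunks until EOF; the data itself is discarded.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                # Covers I/O errors, permission problems and broken symlinks.
                failures.append((path, str(exc)))
    return failures
```

You could call it as `find_unreadable_files("/mnt/Scratch/cryosparc_cache")` and then again on the directory holding the original particle stacks. Keep in mind that recently written files may be served from the page cache in RAM rather than from the disk, so a clean result does not prove the disk is healthy. It also only checks that the bytes can be read, not that the image data are valid.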

I looked for a corrupted file, but the job didn’t find any… It doesn’t seem to be a hardware problem either; this is the first time we’ve had this error.

The offline 2D Classification worked without errors once all the micrographs were collected, so I don’t know if this is a bug in Streaming 2D Classification/CryoSPARC Live. I wouldn’t rule out a misconfiguration of one node on my part, but I can’t find anything wrong.

Anyway, I’ll follow up if I encounter the error again. It might have been a random thing.