Our cryoSPARC instance (v4.3.0+230816) runs locally on our workstation.
I have run the same job before and it completed smoothly; this time it is not progressing and has been stuck at the same status ever since.
Storage is local on the same workstation. The job has 77,496 particles after Ab-initio reconstruction. Last week I ran 216,000 particles from the same data set and it worked.
Running 2D classification again with SSD cache enabled does not progress either, so I suspect there is some issue with copying the particles to the cache: none of the jobs that use the SSD cache seem to progress. I can send you the log files.
========= monitor process now starting main process at 2023-10-10 12:51:25.250774
MAINPROCESS PID 26534
========= monitor process now waiting for main process
MAIN PID 26534
class2D.run cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-10-10 12:51:40.497384
========= sending heartbeat at 2023-10-10 12:51:50.517348
========= sending heartbeat at 2023-10-10 12:52:00.644494
========= sending heartbeat at 2023-10-10 12:52:10.854909
========= sending heartbeat at 2023-10-10 12:52:20.909381
========= sending heartbeat at 2023-10-10 12:52:30.929378
========= sending heartbeat at 2023-10-10 12:52:41.788302
Then there are many more of these heartbeat lines; the last part of the log looks like this:
========= sending heartbeat at 2023-10-10 16:14:56.075678
========= sending heartbeat at 2023-10-10 16:15:06.095005
========= sending heartbeat at 2023-10-10 16:15:16.116277
========= sending heartbeat at 2023-10-10 16:15:26.133141
========= sending heartbeat at 2023-10-10 16:15:36.150743
========= sending heartbeat at 2023-10-10 16:15:46.167556
========= sending heartbeat at 2023-10-10 16:15:56.184891
========= sending heartbeat at 2023-10-10 16:16:06.203710
========= sending heartbeat at 2023-10-10 16:16:16.221898
========= sending heartbeat at 2023-10-10 16:16:26.239514
========= sending heartbeat at 2023-10-10 16:16:36.258213
========= sending heartbeat at 2023-10-10 16:16:46.276202
========= sending heartbeat at 2023-10-10 16:16:56.293377
========= sending heartbeat at 2023-10-10 16:17:06.313209
========= sending heartbeat at 2023-10-10 16:17:16.331930
========= sending heartbeat at 2023-10-10 16:17:26.349241
========= sending heartbeat at 2023-10-10 16:17:36.366142
I just had the same problem (stalled on "SSD cache : cache successfully synced in_use").
I then noticed that my scratch disk was somehow no longer responding (I could still see it with `df` and could `cd` into it, but not even `ls` worked).
After I rebooted the system, the scratch disk was accessible again and cryoSPARC works properly again as well.
Check your logs for why it wasn't readable - normally if a drive has issues it will be remounted read-only automatically; if even that failed I'd be concerned about a failing or corrupt disk. That said, SSDs tend to fail hard and without warning, so it might be something else…
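If it happens again before a reboot, one quick way to test the "remounted read-only" theory is to inspect the mount options of the cache filesystem. This is a sketch; `/scratch` is a placeholder for your actual SSD cache mount point:

```shell
# Show the current mount options for the scratch filesystem.
# If the kernel hit I/O errors and remounted it, "ro" will appear here.
findmnt -no OPTIONS /scratch

# Equivalent check straight from the kernel's mount table:
awk '$2 == "/scratch" {print $4}' /proc/mounts
```

If the options show `rw` but `ls` still hangs, that points more toward the drive or controller dropping out than a clean read-only remount.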
@rbs_sci
thanks for the advice. Using
sudo journalctl -k | grep -i name_of_ssd_disk
or
sudo grep -i name_of_ssd_disk /var/log/messages
I could not find any errors (and in "/var/log/messages" in particular, only entries from after the reboot are visible).
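If the journal on your system is persistent, `journalctl` can usually still show kernel messages from before the reboot, which `/var/log/messages` may have lost. A couple of variants worth trying; `sda` here is just an example device name:

```shell
# Kernel messages from the previous boot (-b -1), not just the current one
sudo journalctl -k -b -1

# Same, filtered for a specific device ("sda" is a placeholder)
sudo journalctl -k -b -1 | grep -i sda

# List the boots the journal knows about, to see how far back it goes
journalctl --list-boots
```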
Is there any other way to "check logs for why it wasn't readable"?
I appreciate your help.
/var/log/syslog might have something. In the past a failing drive has always spewed into dmesg for me, so when I'm diagnosing a system I will set up a cron job to dump dmesg to a text file every 24 hours.
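A cron job along those lines could look like this (a minimal sketch: the schedule and the `/var/log/dmesg-dumps` output directory are my own choices, not anything standard):

```shell
# /etc/cron.d/dmesg-dump -- dump the kernel ring buffer once a day so
# messages survive even if logging to disk later fails.
# "-T" prints human-readable timestamps; "%" must be escaped in crontabs.
0 3 * * * root mkdir -p /var/log/dmesg-dumps && dmesg -T > /var/log/dmesg-dumps/dmesg-$(date +\%F).txt
```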
Might also be worth checking for SATA errors (controller related; dmesg again), as I had a bad SATA cable once (it came with the motherboard…!) where the disk was detected but writing to it would make the SATA controller drop off the PCI-E bus. Turned out it was the cable.
Out of curiosity, what model is the SSD? I have a vague memory of a buggy SSD firmware years ago (OCZ or Corsair, IIRC) where the drives wouldn't come out of sleep properly. For the record, I don't think this issue is that, as this was pretty early in the days of SSDs being available, but I mention it for completeness.
edit: Also check SMART, with smartmontools (`smartctl`) or similar (some UEFI BIOSes have a SMART check), or Gnome Disks, and see if anything is suspicious there…?
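For the command-line route, the usual `smartctl` invocations look like this (replace `/dev/sda` with your SSD's actual device node):

```shell
# Overall health self-assessment (prints PASSED or FAILED)
sudo smartctl -H /dev/sda

# Full SMART attributes and error log; look out for reallocated or
# pending sectors, CRC errors, and media-wearout indicators
sudo smartctl -a /dev/sda

# Start a short self-test, then re-run "smartctl -a" after a few minutes
sudo smartctl -t short /dev/sda
```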