Has anyone ever faced a similar problem, where the hard drive that cryoSPARC is writing job info and outputs to suddenly turns into a ‘read-only file system’ and then (obviously) the job fails?
I have never seen such a situation reported in the forum, but this is already the second time we have faced this problem, now on two of our workstations.
When we first saw this happen on our other workstation, we thought it was a hard drive problem. We sent it back to service, and they couldn’t detect any problem with the hard drive. In fact, we have had it running for several weeks without cryoSPARC started and nothing goes wrong. From the moment we turn cryoSPARC on and run something, it is a matter of a few hours until it happens. On our most recent workstation, this happened during a Motion Correction job (after it had processed more than 1500 micrographs; all default values) and it is reproducible every time I restart or reconfigure a new Motion Correction job. On the other workstation, I saw it happen during different 3D refinement jobs.
Can anyone offer a clue? =)
Thank you,
André
Regarding the Motion Correction job on the recent workstation.
Traceback:
It took me longer, since yesterday the workstation was busy running other software.
About 260 lines are written to dmesg from the moment something unusual starts to happen until cryoSPARC marks the job as ‘Failed’.
Some of the last lines of dmesg -T are posted below. Since you only asked for the last 20, I didn’t dare to share all 260 lines, but they might be useful. I await your feedback. Thank you!
That looks exactly like some sort of hardware issue. ‘sda’ is a hard disk that suddenly disappeared from the Linux system, and as such Linux forces it into read-only mode to protect your data. Is this an external drive of some sort?
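For anyone landing here with the same symptom, one quick way to confirm that a file system has been flipped to read-only is to look for the `ro` option in /proc/mounts. Below is a minimal Python sketch of that check; the function name `read_only_mounts` is made up for illustration, and it only reports the mount options, not why the change happened.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    mounts_text is expected to be the content of /proc/mounts, where each
    line has the form: device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options, so 'errors=remount-ro' does not count as 'ro'.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result
```

On the affected machine it could be called as `read_only_mounts(open("/proc/mounts").read())` right after a job fails. Pseudo file systems can also show up as `ro`, so only the entries under /dev/ are interesting here.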
Yes, without any doubt ‘sda’ is the hard drive that cryoSPARC is supposed to write to; it gets disconnected and makes the job fail. It is an internal hard drive (Seagate IronWolf 10 TB 3.5" NAS HDD). The other system has an internal 8 TB HDD; I don’t know the brand and model right now.
Do you think it is a matter of a defective hard drive? I find that strange, not only because the HDD is newly installed, but also because it only happens while using cryoSPARC.
Those were the first things that we suspected and they were checked several times. Cables have been replaced and it is all the same… the only thing that was not really changed was the hard drive. One of the workstations even went back to service and they decided not to change the drive because they didn’t detect any sign of being broken. Nevertheless, we have already ordered a new hard drive to see if that would solve the problem for that workstation. Still I think it is very strange.
What happens when you try to read that file (/mnt/storagesda/cryosparc_projects/P1/J1/imported/FoilHole_3621484...) manually on the system? Are you able to read it properly?
Also, what version of cryoSPARC are you running? What OS?
How many other SATA drives do you have connected to the motherboard? Was this a custom built computer, or a prebuilt from an assembler?
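If it helps, the manual read check can also be done without an image viewer by reading the file end to end and watching for an I/O error. This is only a rough Python sketch; `try_read` is a hypothetical helper name, and a clean read does not prove the disk is healthy, since data already cached in RAM may be served without touching the drive.

```python
import sys


def try_read(path, chunk_size=1024 * 1024):
    """Read the whole file in chunks.

    Returns (bytes_read, error), where error is None on success or the
    OSError raised while opening or reading the file.
    """
    total = 0
    try:
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break  # end of file reached
                total += len(chunk)
    except OSError as exc:
        return total, exc
    return total, None


if __name__ == "__main__":
    # Usage: python try_read.py FILE [FILE ...]
    for path in sys.argv[1:]:
        n, err = try_read(path)
        status = "OK" if err is None else f"FAILED after {n} bytes: {err}"
        print(f"{path}: {status}")
```

Running it over a handful of the imported movies while the drive is under load would show whether reads start failing at the same moment the dmesg errors appear.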
Hi @stephan! Thank you for checking out this topic
Both workstations are custom built, so we don’t have the ‘privilege’ of a warranty with a replacement unit that would work for sure. They are pretty different workstations anyway. One is already 2 or 3 years old, and the other was assembled just 2 weeks ago.
The newest workstation is running the latest version of cryoSPARC (v2.15) and the oldest machine is running version 2.13, but if I remember correctly it was upgraded more than once since the trouble started (months ago).
Both of the workstations are running Ubuntu 16.04 LTS.
The newest machine has:
two M.2 PCIe Gen 3.0 x4 NVMe 1.3 drives: one for the OS, where the cryoSPARC master and worker nodes are installed; the other for scratch
two 10 TB SATA 6 Gb/s HDDs: one of them is the one that disconnects (cryoSPARC outputs are being written there); the other one continues fine, but nothing has been written to it so far.
I am tempted to redirect the cryosparc_projects directory to the other hard drive and see if the behaviour is similar… so far I have not tried that, but it would be good to rule out some options.
I believe the other workstation only has one SSD and one HDD.
The screenshots and output that I have posted here are from the newest machine. There I was also trying to work with a dataset that never ran on the other machine, so somehow it doesn’t seem to be project/data related.
By manually read, do you mean displaying the tif file with an image viewer? If yes, I can report an interesting fact: for any movie (.tif file) I only see a multipage image file with all-black pages (using Ubuntu’s integrated ‘image viewer’, ‘document viewer’ or ‘tiffgt’), but the cryoSPARC web interface correctly shows some examples of the imported micrographs of J1 without any problem.
I hope that some of this info helps! We are a bit desperate, eheh
Hi, just an update from our side now that we have figured out what the problems were, after struggling with this for months and being reluctant to change the hard drive.
Newest workstation: for our most recent machine we thought the new hard drive was unlikely to be the problem, so after poking around I realised that (although the manufacturer barely mentions it) half of the motherboard’s SATA ports are controlled by the main chipset and the other half by an ASMedia controller. Apparently this controller does not handle sustained, heavy data transfer very well.
The older workstation got fixed by changing the hard drive.
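In case someone wants to check which SATA controller a given disk hangs off without opening the case: on Linux, /sys/block/<disk> resolves to a device path that passes through the PCI address of the controller. The Python sketch below pulls out the last PCI address in that path. The helper names are hypothetical, and it assumes the usual sysfs layout, so treat it as a starting point rather than a definitive tool.

```python
import os
import re

# PCI addresses look like 0000:05:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(sysfs_path):
    """Return the last PCI address found in a resolved sysfs path, or None.

    The last address in the path is the device closest to the disk,
    which for a SATA drive should be its storage controller.
    """
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def controller_for_disk(disk):
    """Resolve /sys/block/<disk> (e.g. 'sda') and extract the controller address."""
    return controller_pci_address(os.path.realpath(f"/sys/block/{disk}"))
```

The address returned by `controller_for_disk("sda")` can then be passed to `lspci -s` to print the controller's name, which should show whether the disk sits on the chipset ports or on the add-on controller.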