Detected file change due to modification time

Hello, I am processing data after reading files into a scratch directory. I’ve noticed that with some job types, including 2D Classification and Ab-Initio Reconstruction, the cached particles are always deleted and read back into the scratch directory even if they have already been read in. The message “Detected file change due to modification time.” is posted and the cached particles are deleted. This means that if one job begins after another using the same particle set, the set in use will be deleted and the job using it will fail. I’m currently running version 4.2.1 on my local workstation.
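For context, here is a minimal sketch (bash, with placeholder paths) of the kind of staleness check the message suggests: the cache compares a source file’s current modification time against the value recorded when the file was cached, and evicts the copy on any mismatch. This is an illustration only; CryoSPARC’s actual implementation may record the expected mtime in its database rather than on the cached copy.

# Hypothetical mtime-based staleness check; paths are placeholders.
src="/path/to/original/project/J76/subtracted_particles_B_99.mrcs"
cached="/path/to/scratch/cache/J76/subtracted_particles_B_99.mrcs"

# stat -c %Y prints the file's modification time as a Unix epoch (GNU stat).
if [ "$(stat -c %Y "$src")" != "$(stat -c %Y "$cached")" ]; then
    echo "Detected file change due to modification time."
    # ...the cache would delete the copy and re-download the file here...
fi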

Here is the output of the job run first.
[CPU: 196.8 MB Avail: 110.58 GB]
Importing job module for job type homo_abinit…

[CPU: 294.5 MB Avail: 109.87 GB]
Job ready to run

[CPU: 294.5 MB Avail: 109.87 GB]


[CPU: 297.7 MB Avail: 109.92 GB]
Using random seed for sgd of 777916736

[CPU: 297.7 MB Avail: 109.92 GB]
Loading a ParticleStack with 3014 items…

[CPU: 297.7 MB Avail: 110.35 GB]
SSD cache : cache successfully synced in_use

[CPU: 301.5 MB Avail: 109.65 GB]
SSD cache : cache successfully synced, found 1583331.42MB of files on SSD.

[CPU: 301.5 MB Avail: 109.65 GB]
SSD cache : cache successfully requested to check 222 files.

[CPU: 301.5 MB Avail: 109.65 GB]
Detected file change due to modification time.

[CPU: 301.5 MB Avail: 111.13 GB]
SSD cache : cache requires 111000.22MB more on the SSD for files to be downloaded.

[CPU: 301.5 MB Avail: 111.13 GB]
SSD cache : cache has enough available space.

[CPU: 301.5 MB Avail: 111.13 GB]
Transferring J76/subtracted_particles_B_99.mrcs (500 MB) (222/222)
Complete : 111000 MB (100.00%)
Total : 111000 MB
Current Speed : 88.16 MB/s
Average Speed : 98.86 MB/s
ETA : 0h 0m 0s

[CPU: 268.4 MB Avail: 110.48 GB]
SSD cache : complete, all requested files are available on SSD.

And here is the output of the job run second, which reads in the same particle dataset after deleting it immediately beforehand.

[CPU: 196.8 MB Avail: 110.58 GB]
Importing job module for job type homo_abinit…

[CPU: 294.5 MB Avail: 109.87 GB]
Job ready to run

[CPU: 294.5 MB Avail: 109.87 GB]


[CPU: 297.7 MB Avail: 109.92 GB]
Using random seed for sgd of 777916736

[CPU: 297.7 MB Avail: 109.92 GB]
Loading a ParticleStack with 3014 items…

[CPU: 297.7 MB Avail: 110.35 GB]
SSD cache : cache successfully synced in_use

[CPU: 301.5 MB Avail: 109.65 GB]
SSD cache : cache successfully synced, found 1583331.42MB of files on SSD.

[CPU: 301.5 MB Avail: 109.65 GB]
SSD cache : cache successfully requested to check 222 files.

[CPU: 301.5 MB Avail: 109.65 GB]
Detected file change due to modification time.

[CPU: 301.5 MB Avail: 111.13 GB]
SSD cache : cache requires 111000.22MB more on the SSD for files to be downloaded.

[CPU: 301.5 MB Avail: 111.13 GB]
SSD cache : cache has enough available space.

[CPU: 301.5 MB Avail: 111.13 GB]
Transferring J76/subtracted_particles_B_99.mrcs (500 MB) (222/222)
Complete : 111000 MB (100.00%)
Total : 111000 MB
Current Speed : 88.16 MB/s
Average Speed : 98.86 MB/s
ETA : 0h 0m 0s

[CPU: 268.4 MB Avail: 110.48 GB]
SSD cache : complete, all requested files are available on SSD.

Is there any remedy for this issue?

Hi @awsteven, to clarify, are the original files on a local hard disk or on a remote network drive?

Could you send me the workstation’s reported date and time, the time reported from the internet, and a partial directory listing of the original and cached files? You can get this information from a terminal with these commands (making the noted substitutions); a quick one-step skew check is also sketched after the list below:

date
date +%s
curl -s http://worldtimeapi.org/api/timezone/Etc/UTC
ls -l /path/to/original/project/J76 | head -n 20
ls -l /path/to/scratch/instance_hostname:port/projects/PX/J76 | head -n 20

Where

  • /path/to/original/project/J76 is the path to the job directory for the noted job, which contains the .mrcs files to cache (particle stacks may also be in a subfolder such as J76/extract)
  • /path/to/scratch/instance_hostname:port/projects/PX/J76 is the cached path on the SSD. /path/to/scratch is the cache folder configured for the CryoSPARC workstation; hostname and port are your computer’s hostname and CryoSPARC’s base port number + 1; PX is the project ID
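If it helps, the skew can also be computed in one step; a minimal sketch, assuming worldtimeapi.org is reachable (the grep/cut parsing is just a convenience for this specific JSON response):

# Compare the local clock against internet time (bash sketch).
local_epoch=$(date +%s)
remote_epoch=$(curl -s http://worldtimeapi.org/api/timezone/Etc/UTC | grep -o '"unixtime":[0-9]*' | cut -d: -f2)
echo "Clock skew: $(( local_epoch - remote_epoch )) seconds"

A result near zero means the clocks agree; a large value points to a misconfigured clock on one side.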

These files are stored on an external NAS, so they are read in each time over a wired network connection.

$date
Wed Apr 5 18:31:53 PDT 2023

$date +%s
1680744719

$curl -s http://worldtimeapi.org/api/timezone/Etc/UTC
{"abbreviation":"UTC","client_ip":"164.XX.XX.XXX","datetime":"2023-04-06T17:58:04.113695+00:00","day_of_week":4,"day_of_year":96,"dst":false,"dst_from":null,"dst_offset":0,"dst_until":null,"raw_offset":0,"timezone":"Etc/UTC","unixtime":1680803884,"utc_datetime":"2023-04-06T17:58:04.113695+00:00","utc_offset":"+00:00","week_number":14}
(base) [awsteven@XXXXXXX ~]

$ls -l /mnt/KRABBY-11/20220810_XXX/CS-20220810-XXX/J76 | head -n 20
total 113816696

$ls -l /scratch/cryosparc_cache/instance_XXXXXX:61001/projects/P7/J76/ | head -n 20
total 113718348

I’ve changed characters in the domain names and IP address for security reasons.

Best

@awsteven it looks like your workstation’s clock is about 16 hours behind (assuming the commands ran back-to-back, the epoch values you posted differ by 1680803884 - 1680744719 = 59,165 seconds, about 16.4 hours). If the NAS clock is accurate, this is the likely culprit. I recommend you sync your workstation’s clock to network time. How you do this depends on your specific Linux distribution; for example, see the documentation for Ubuntu. Once you do this, the particles may be re-cached once more, but they should stay on the SSD for future runs.
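On systemd-based distributions this is often a one-liner, assuming systemd-timesyncd is available (a sketch; your distro’s documentation is authoritative):

# Enable NTP synchronization via systemd-timesyncd.
sudo timedatectl set-ntp true
# Verify: the output should show "System clock synchronized: yes".
timedatectl status

After syncing, re-running date +%s and the curl command above should show the two epoch values agreeing to within a few seconds.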