I have some suggestions regarding a few small aspects of the cryoSPARC system:
Periodically output meta files for long-running tasks.
These include, but are not limited to:
a) “exposures.bson” for live sessions (not too important, because they are in the database too)
b) “particles.cs”, “exposures.cs”, etc., for jobs dealing with movies (patch, local, ref-mo, etc.). This is important. If everything else takes too long, please at least implement this.
For example, with the local motion job, it seems to me that the meta info of the extracted particles is held only in the worker’s memory. If the job dies (due to a server restart, running out of storage or memory, etc.), we can’t use the particles that have already been extracted. (The number of extracted particles often differs from the input stack because of rejection, so trying to relink with the input particle.cs is troublesome.)
These files could also be used with “mark job as completed”. I realize that implementing “Continue from a certain point” might be too much work for many job types, but the ability to rescue what has already been finished would be quite nice.
(Or maybe I just haven’t found where that info is saved while the job is running?)
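A minimal sketch of the periodic-save idea, in Python. The class name `CheckpointWriter`, the JSON file, and the flush interval are all hypothetical; this is not the cryoSPARC `.cs` format or any cryoSPARC API. It shows the two properties that matter for rescuing a crashed job: results are flushed every `every` records, and each flush writes a temporary file and atomically swaps it in with `os.replace`, so a crash mid-write never leaves a half-written file.

```python
import json
import os
import tempfile


class CheckpointWriter:
    """Hypothetical helper: accumulate per-particle metadata and flush it to
    disk periodically, so a crash loses at most `every` records.
    JSON is used for illustration only; this is NOT the cryoSPARC .cs format."""

    def __init__(self, path, every=1000):
        self.path = path
        self.every = every
        self.records = []
        self._since_flush = 0

    def add(self, record):
        self.records.append(record)
        self._since_flush += 1
        if self._since_flush >= self.every:
            self.flush()

    def flush(self):
        # Write to a temp file in the same directory, then atomically replace
        # the old checkpoint, so readers only ever see a complete file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.records, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        self._since_flush = 0


def load_checkpoint(path):
    """Return whatever records were saved before the job stopped."""
    with open(path) as f:
        return json.load(f)
```

Rewriting the whole list on each flush keeps the sketch short; a real job with millions of particles would more likely append fixed-size chunks instead.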
Local log on the worker computer. Jobs sometimes die due to unexpected data, so I suggest that jobs save some sort of text log locally on the worker computer, maybe in a “cryosparc_run” dir in /tmp. Saving on the worker computer avoids losing the info to network or master problems, and a “last millisecond” message can be saved this way just before the job fails. This should help with troubleshooting.
(Not quite related and not a request, just a wild thought: can we keep the job running on the worker during a master restart? My impression is that jobs sometimes die or get marked as failed simply because heartbeats are missed.)
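A rough sketch of a worker-local log in Python, using only the standard `logging` module. The directory name follows the suggestion (`cryosparc_run` under the system temp directory, usually /tmp on Linux), while the function names and file naming scheme are hypothetical. `logging.FileHandler` flushes after every record, and the optional `sys.excepthook` wrapper writes an uncaught exception to the local file before the process exits.

```python
import logging
import os
import socket
import sys
import tempfile


def make_local_logger(job_id, base_dir=None):
    """Hypothetical sketch: a logger that writes to the worker's local disk.
    base_dir defaults to <system temp dir>/cryosparc_run, the location suggested
    above; it is not an existing cryoSPARC path."""
    base_dir = base_dir or os.path.join(tempfile.gettempdir(), "cryosparc_run")
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{job_id}_{socket.gethostname()}.log")
    logger = logging.getLogger(f"local.{job_id}")
    logger.setLevel(logging.DEBUG)
    # FileHandler (a StreamHandler) flushes after each record it emits.
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, path


def install_crash_hook(logger):
    """Record uncaught exceptions in the local log, then defer to the old hook."""
    previous = sys.excepthook

    def hook(exc_type, exc, tb):
        logger.critical("uncaught exception", exc_info=(exc_type, exc, tb))
        previous(exc_type, exc, tb)

    sys.excepthook = hook
```

One limit of the idea: a hard kill, such as the kernel OOM killer, ends the process without running any Python hook, so only messages logged before that point survive.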
An extra option in the cryoSPARC Live setup: “Stop Session if no exposure is discovered for ____ minutes.” The behaviour would be: if no new exposure is discovered within this length of time, the session stops once all current exposures are processed.
An extra option on workspaces that are also live sessions: “Remove temporary display images.” This would remove all display images generated during the live session. A live session commonly generates GBs of preview images and saves them all in the database, and if the session is never deleted, those GBs stay in the database. Deleting these files won’t affect processing; it only makes the Live interface slightly uglier.
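The proposed stop rule fits in a few lines. This is a hypothetical Python sketch, not an existing cryoSPARC Live setting: the session stops only when both conditions hold, so exposures already in the queue still finish processing, and `idle_minutes=None` stands for the option being left blank.

```python
from datetime import datetime, timedelta


def should_stop_session(last_discovery, now, idle_minutes, pending_exposures):
    """Hypothetical rule: stop once no new exposure has been found for
    `idle_minutes` AND every exposure already discovered has been processed."""
    if idle_minutes is None:
        return False  # option disabled
    idle = now - last_discovery >= timedelta(minutes=idle_minutes)
    return idle and pending_exposures == 0
```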
Scan the database for zombie files. This would search fs.files for files belonging to deleted projects or deleted jobs. Somehow they do accumulate over time.
Suppress some unnecessary information, for example the TIFF format error message in the job.log of jobs that process EER files.
Do not generate a PDF version of the plots in the 3D Var job. The PDF version of the particle distribution plot in the 3D Var job is very large; it seems to contain up to millions of individual dots. No PDF viewer can display them, though they quite effectively increase the size of the database.
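A sketch of the zombie scan as a pure Python function. GridFS does keep file metadata in the `fs.files` collection, but the `project_uid` and `job_uid` fields used here are assumptions about how cryoSPARC tags its files and may not match the real schema. With pymongo, a cursor from `db["fs.files"].find()` could be passed in as `file_docs`.

```python
def find_zombie_files(file_docs, live_projects, live_jobs):
    """Return the _ids of file documents whose owner no longer exists.

    Hypothetical field names: `project_uid`, `job_uid`.
    live_projects: set of project uids that still exist.
    live_jobs: set of (project_uid, job_uid) pairs that still exist.
    """
    zombies = []
    for doc in file_docs:
        project = doc.get("project_uid")
        job = doc.get("job_uid")
        if project is None:
            continue  # cannot attribute this file to a project; leave it alone
        if project not in live_projects:
            zombies.append(doc["_id"])  # owning project was deleted
        elif job is not None and (project, job) not in live_jobs:
            zombies.append(doc["_id"])  # project exists, but the job was deleted
    return zombies
```

The function only reports candidates; deleting them (together with their `fs.chunks` entries) would be a separate, deliberate step.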
Another four suggestions:
Make “job history” easier to access, and remove its effects on the subsequent workspace display.
Can we have “job history” as an icon on the left panel?
It seems that the job history view is currently achieved by combining the table view with a filter/sorter. The problem is that after viewing the job history, the filter and display settings are not deactivated, so it takes the user a few clicks to get back to the normal workspace view.
Make jobs easier to locate in the project view. Currently, in the V4 interface, it seems that the only way to directly locate a job is to figure out the URL syntax and type it into the address bar. There is no intuitive way to know in which workspaces a specific job can be found. I think a button in the project view leading to the equivalent of the following view might solve the problem:
Several extra columns in the table view. Can we also have columns showing the workspaces, parents, and children of each job in the table view?
Allow more than 2 categories to be selected in the “Select 2D” job.
For example, a line of radio buttons “group 1, group 2, group 3…group 10” can be inserted on top of the 2D class image panel. Then when each of them is activated, clicking on the 2D class average assigns that 2D class to that group. This way from one 2D class job we can have more than just “accepted” and “excluded” classes.
It would be even nicer if we could do the selection with the keyboard only, for example with shortcuts such as 1, 2, 3, 4 for assigning groups and arrow keys for moving a highlighted focus among the classes.
Currently, in order to divide classes into multiple groups, one needs to chain several “Select 2D” jobs.
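A hypothetical keyboard model for the multi-group selection, in Python. The key names, the ten-group limit, and the grid width are illustrative assumptions rather than an existing cryoSPARC interface; the point is that an active group plus a focus cursor is enough to sort every class in a single pass.

```python
class ClassGrouper:
    """Hypothetical model: "1".."9" and "0" (group 10) pick the active group,
    arrow keys move the highlighted class on a grid, "Enter" assigns it."""

    def __init__(self, n_classes, per_row=10, n_groups=10):
        self.n_classes = n_classes
        self.per_row = per_row
        self.n_groups = n_groups
        self.active_group = 1
        self.focus = 0
        self.assignment = {}  # class index -> group number

    def press(self, key):
        if key.isdigit():
            group = 10 if key == "0" else int(key)
            if group <= self.n_groups:
                self.active_group = group
        elif key == "Right":
            self.focus = min(self.focus + 1, self.n_classes - 1)
        elif key == "Left":
            self.focus = max(self.focus - 1, 0)
        elif key == "Down":
            self.focus = min(self.focus + self.per_row, self.n_classes - 1)
        elif key == "Up":
            self.focus = max(self.focus - self.per_row, 0)
        elif key == "Enter":
            # Re-assigning overwrites, so each class is in exactly one group.
            self.assignment[self.focus] = self.active_group

    def groups(self):
        """Return {group number: sorted list of class indices}."""
        out = {}
        for cls, group in sorted(self.assignment.items()):
            out.setdefault(group, []).append(cls)
        return out
```

Each per-group list could then become a separate particle output, replacing the chain of “Select 2D” jobs.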
EER upscaling options in the local motion correction (or ref-mo) job
With EER movies, users may start processing in non-super-res mode and later find that they need to process in super-res.
If the local/reference-based motion correction jobs (movie → particle jobs) let users specify an up-scaling factor of 2 or 4 and a Fourier-crop-to size, users could continue directly from the previous non-super-res patch motion job and get super-res particles with a pixel size suitable for their target resolution.
This may also help those who want to avoid dealing directly with super-res micrographs, which are at least 4x the size of non-super-res ones.
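The pixel-size bookkeeping behind this request, as a small Python helper. Upsampling by a factor f divides the pixel size by f (and multiplies the pixel count by f²), and Fourier cropping a box from N to M pixels multiplies the pixel size by N/M. The function name and example numbers are hypothetical.

```python
def superres_particle_geometry(physical_pixel_A, upsample, box_upsampled, box_cropped):
    """Pixel size and Nyquist limit after EER upsampling plus Fourier cropping.

    physical_pixel_A: non-super-res pixel size in Angstrom.
    upsample: up-scaling factor (1, 2, or 4, as in the request).
    box_upsampled: particle box size in pixels at the upsampled pixel size.
    box_cropped: box size after Fourier cropping (<= box_upsampled).
    """
    if upsample not in (1, 2, 4):
        raise ValueError("upsample must be 1, 2, or 4")
    if not 0 < box_cropped <= box_upsampled:
        raise ValueError("box_cropped must be in (0, box_upsampled]")
    pixel_up = physical_pixel_A / upsample
    pixel_out = pixel_up * box_upsampled / box_cropped  # cropping enlarges pixels
    return {
        "upsampled_pixel_A": pixel_up,
        "output_pixel_A": pixel_out,
        "nyquist_A": 2 * pixel_out,
        "micrograph_pixel_factor": upsample ** 2,  # 2x per axis -> 4x pixels
    }
```

For example, a 1.0 Å physical pixel upsampled 2x gives 0.5 Å; Fourier cropping a 400-pixel box to 320 pixels yields 0.625 Å, i.e. a Nyquist limit of 1.25 Å, while the upsampled micrographs would hold 4x as many pixels.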
Thank you for these requests. We will aim to process these internally, and distribute them to our relevant teams for triaging.
I have a few questions about the large PDF plots in 3D Variability:
- Are these very large plots (scatter plots, I assume) produced in the 3D Variability Display job, or only in 3D Variability?
- Does this occur in
- How many variability components, and how many particles, are connected to a job that produces a very large file?
These files are found in the 3D_var jobs, not the display jobs. They are the pdf version of the “Iteration xx components”.
With 400k particles the corresponding PDF version will be nearly 24MB each. Multiplied by the 20 rounds, they will take nearly 500MB of database space with each job. I think the only factor that determines how big the files are is the total number of input particles.
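The database cost quoted above works out as follows, assuming 1 MB = 10^6 bytes and one 24 MB PDF per iteration. Dividing by the particle count gives a rough per-particle cost of tens of bytes, consistent with the file size growing linearly with the number of input particles.

$$24\ \text{MB} \times 20 = 480\ \text{MB} \approx 0.5\ \text{GB per job}$$

$$\frac{24 \times 10^{6}\ \text{bytes}}{4 \times 10^{5}\ \text{particles}} = 60\ \text{bytes per particle}$$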
When the file is downloaded and opened in Acrobat on a slower computer, one can see the dots slowly accumulate in the plots, which is interesting to watch but not very useful, because the PDF program eventually dies.
Define “slower computer”? If a 24MB PDF is that slow, something is wrong. It certainly shouldn’t make Acrobat fall over on any system with >8GB RAM.
I’ve got a 1.78GB PDF output from a RELION polishing job which opens with no problems on any system I’ve tried it on, from the Linux system it was generated on to a scrappy little ultraportable (although the laptop has had a RAM upgrade).
Alternatively, try opening it in something which isn’t Adobe Acrobat.
Thanks @ZhijieLi for these thorough suggestions.
I would like to follow-up on some specific items:
Re: Local log on the worker computer. Do you happen to have a concrete example from an instance where the project directory was local to the worker (standalone with local project directory or where worker was the nfs server) that demonstrate the potential utility of this feature?
Re: Make jobs easier to locate in project view. In the Projects or Home view of the GUI, you can begin typing a spotlight query like P200 J and should see a flat, scrollable list of the project’s jobs. Another way to access a “flat” view of a project’s jobs is to scroll down the project’s right-hand side Details panel and, in the Statistics section, push “→View” next to Total Jobs. In this context, you may find this discussion of “flat” navigation in CryoSPARC interesting.
Re: Several extra columns in the table view.
Similarly, if you click the left-column check box of a job in the table view, details like workspaces, parents, and children are shown under Details in the right-hand side panel.
Re: Local log on the worker computer. Do you happen to have a concrete example from an instance where the project directory was local to the worker (standalone with local project directory or where worker was the nfs server) that demonstrate the potential utility of this feature?
Sorry, I do not have an example where the project dir is local to the worker. What I meant was that the running job might save some log files on the worker’s local file system, for example at worker3:/tmp/cs_logs. This is independent of where the project files are.
This could be helpful if the running job can’t send error messages back to the master or save job.log in the job dir. But considering it might only be useful when the worker loses its network connection, it is not that valuable. Please disregard this suggestion.
Thanks for the tips regarding the interface!