Help with importing CS-live exposures

Hi everyone,

I’m trying to import exposures from a CS-live session into a new cryoSPARC instance. After transferring the “all_exposures” output from CS-Live to my project directory, I get the following error when I run the “Import Result Group” job:

License is valid.

Launching job on lane default target localhost ...

Running job on remote worker node hostname localhost
[CPU:  212.8 MB  Avail: 119.62 GB]
Job J660 Started

[CPU:  212.8 MB  Avail: 119.62 GB]
Master running v4.4.1, worker running v4.4.1

[CPU:  213.1 MB  Avail: 119.62 GB]
Working in directory: /data/user/experiments/CS-project/J660

[CPU:  213.1 MB  Avail: 119.62 GB]
Running on lane default

[CPU:  213.1 MB  Avail: 119.62 GB]
Resources allocated: 

[CPU:  213.1 MB  Avail: 119.62 GB]
  Worker:  localhost

[CPU:  213.1 MB  Avail: 119.62 GB]
  CPU   :  [0]

[CPU:  213.1 MB  Avail: 119.62 GB]
  GPU   :  []

[CPU:  213.1 MB  Avail: 119.62 GB]
  RAM   :  [0, 1, 2, 3]

[CPU:  213.1 MB  Avail: 119.62 GB]
  SSD   :  False

[CPU:  213.1 MB  Avail: 119.62 GB]
--------------------------------------------------------------

[CPU:  213.1 MB  Avail: 119.62 GB]
Importing job module for job type import_result_group...

[CPU:  296.2 MB  Avail: 119.55 GB]
Job ready to run

[CPU:  296.2 MB  Avail: 119.55 GB]
***************************************************************

[CPU:  296.2 MB  Avail: 119.55 GB]
Importing result group from /data/user/experiments/CS-project/exports/groups/2024.04.26_live-export-exposures/J12_all_exposures_exported.csg

[CPU:  366.0 MB  Avail: 119.57 GB]
Unable to find /data/user/experiments/CS-project/ (from J12_all_exposures_exported.cs > gain_ref_blob/path)

[CPU:  366.1 MB  Avail: 119.57 GB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/jobs/imports/run.py", line 1340, in run_import_result_group
    assert not missing_paths, (
AssertionError: Unable to find some files referred to in dataset J12_all_exposures_exported.cs, field gain_ref_blob/path. Affected files are listed above.

I’m a little confused about what’s going wrong. It looks like when the job tries to prepare the “gain_ref_blob” output group, it fails to find the path to my project directory in the .cs file, which doesn’t seem to make sense. I’ve confirmed that the project directory path in the error message is correct, and that the .cs file correctly specifies “S1/import_movies/GainReference.gain” as the path under “gain_ref_blob/path” for each exposure. Lastly, I’ve confirmed that the S1/import_movies directory contains the gain reference. I think I must be making some silly mistake, but I can’t figure out where I’m going wrong, and I’d greatly appreciate any advice anyone can offer.

Best regards,
cbeck

@cbeck Please can you describe how you

  1. exported the result group, including details of the source job type
  2. dealt with symbolic link dereferencing, if needed
  3. transferred the result group between the instances, including details like commands and command options

Thank you so much for your response! Please find the requested information below.

  1. exported the result group, including details of the source job type

In the CS-live workspace, I used the Live Exposure Exports job to export the exposures. After the job finished, I went into the Outputs tab and clicked “Export” in the left panel for the “All Exposures” output. This created a new directory in exports/groups called /J12_all_exposures.

  3. transferred the result group between the instances, including details like commands and command options

I used “rsync -aP” to transfer all of the raw data to an external drive on my personal workstation (the EPU session containing the movies and .xml files, and the CS-live directories ctfestimated and motioncorrected). I also used rsync to transfer the J12_all_exposures directory directly into my cryoSPARC project’s exports/groups directory.

  2. dealt with symbolic link dereferencing, if needed

After transferring the J12_all_exposures directory, I found that the symbolic links to the raw data (ctfestimated, motioncorrected, import_movies) had broken, so I removed the broken links and repopulated the directories with new symlinks using the following commands:

  • find /path-to-raw-data/ctfestimated -type f -name '*' -exec ln -s {} /path-to-export-directory/S1/ctfestimated \;
  • find /path-to-raw-data/motioncorrected -type f -name '*' -exec ln -s {} /path-to-export-directory/S1/motioncorrected \;
  • find /path-to-raw-data/EPU-session -type f -name '*.eer' -exec ln -s {} /path-to-export-directory/S1/import_movies \;
  • ln -s /path-to-raw-data/EPU-session/20240425_101132_EER_GainReference.gain /path-to-export-directory/S1/import_movies
  • I confirmed that each directory (ctfestimated, motioncorrected, import_movies) had the expected number of working symlinks
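
For anyone repeating this, here is a minimal Python sketch of the same relink-and-verify step. It is not the exact procedure used above: the function names `relink_files` and `count_symlinks` are made up for illustration, and the sketch assumes that file names are unique within each raw-data directory, since all links land in one flat directory.

```python
from pathlib import Path


def relink_files(raw_dir, link_dir, pattern="*"):
    """Symlink every regular file under raw_dir matching pattern into link_dir.

    Roughly mirrors `find raw_dir -type f -name pattern -exec ln -s {} link_dir \\;`.
    Returns the number of links created.
    """
    raw_dir, link_dir = Path(raw_dir).resolve(), Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    created = 0
    for src in sorted(raw_dir.rglob(pattern)):
        if not src.is_file():
            continue  # skip directories, similar to `-type f`
        dest = link_dir / src.name
        if dest.is_symlink() or dest.exists():
            continue  # never overwrite an existing entry
        dest.symlink_to(src)  # src is absolute because raw_dir was resolved
        created += 1
    return created


def count_symlinks(link_dir):
    """Return (working, broken) symlink counts for the entries in link_dir."""
    working = broken = 0
    for entry in Path(link_dir).iterdir():
        if entry.is_symlink():
            if entry.exists():  # exists() follows the link to its target
                working += 1
            else:
                broken += 1
    return working, broken
```

Calling `count_symlinks` on each of ctfestimated, motioncorrected and import_movies after relinking gives a quick check that no link is broken and that the totals match the number of files expected.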

Please let me know if you need any other information.

Edit: In the last section, I added “S1” to the paths that I specified in the four commands

After conducting several tests, I discovered that exporting only a subset of 10 exposures from the CS-live session using the Exposure Sets Tool allowed me to import them successfully into my project on a separate cryoSPARC instance following the outlined protocol.

However, the same procedure with all 31,300 exposures produced the same error again.

This discrepancy prompted me to investigate why a subset of exposures could be imported successfully while the entire dataset could not. I speculated that the issue with exporting the entire exposure set might stem from the exposures being divided between two exposure groups. Conversely, when selecting a subset of 10 exposures, they were automatically drawn from the first exposure group. However, even when I randomized this subset so that they came from both exposure groups, I still managed to import it successfully, leaving me just as confused as before.

On an unrelated note, I noticed that in the Live Exposure Export job, there are outputs for All Exposures, All Exposures in Exposure Group 1, and All Exposures in Exposure Group 2. However, I’m only able to export the All Exposures output. When I attempt to export either Exposure Group 1 or Exposure Group 2, I get the warning:

Output result group ‘all_exposures_group_1’ is empty. Skipping…

I hope this information is helpful. I’ll continue to try and troubleshoot what’s going on.


Fixed! Upon carefully inspecting the .cs file, I noticed that out of 31,300 exposures, 10 did not have a path for the gain reference. It was helpful to first convert the .cs file to a pandas dataframe and call the .unique() method on the gain_ref_blob/path column to list every distinct value of the gain reference path. After editing the .cs file via cryosparc-tools and adding the gain reference path to each of these 10 exposures, I could finally import the entire result group into my cryoSPARC project.
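
A minimal sketch of that kind of check, written against a plain sequence of path values so it runs without cryoSPARC installed. The helper names are hypothetical. The `resolve` function encodes an assumption, not something confirmed from the cryoSPARC source: if the import job joins each stored path onto the project directory, an empty path resolves to the project directory itself, which would match the odd "Unable to find /data/user/experiments/CS-project/" message.

```python
import posixpath
from collections import Counter


def as_text(value):
    """Decode bytes (as stored in raw NumPy string fields) to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def summarize_paths(paths):
    """Count each distinct path value, similar to pandas .unique() plus counts."""
    return Counter(as_text(p) for p in paths)


def missing_indices(paths):
    """Return the row indices whose path is empty or whitespace-only."""
    return [i for i, p in enumerate(paths) if not as_text(p).strip()]


def resolve(project_dir, rel_path):
    """Join a stored path onto the project directory (ASSUMED import behaviour)."""
    return posixpath.join(project_dir, as_text(rel_path))
```

With cryosparc-tools, the column could probably be obtained with something like `Dataset.load("J12_all_exposures_exported.cs")["gain_ref_blob/path"]` (from `cryosparc.dataset`) and passed straight to `summarize_paths` and `missing_indices`; treat that call as an assumption and check it against the cryosparc-tools documentation for your version.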

The exposures that were missing the gain reference path were the first 10 exposures from the second exposure group in cryoSPARC live. I think what happened is that when I configured the second exposure group in CS-live, I resumed the live session before linking the gain reference, which is why it was missing from only the first few exposures.
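
The repair itself can be sketched as below, again on a plain sequence of path values. The function name `fill_missing_paths` is hypothetical, and the sketch assumes that a single gain reference applies to every exposure with an empty path, as was the case here.

```python
def fill_missing_paths(paths, gain_ref_path):
    """Return (fixed_paths, n_filled) with empty entries replaced by gain_ref_path.

    Entries may be str or bytes; the result always contains str values.
    """
    fixed = []
    n_filled = 0
    for value in paths:
        text = value.decode() if isinstance(value, bytes) else str(value)
        if text.strip():
            fixed.append(text)  # keep existing paths unchanged
        else:
            fixed.append(gain_ref_path)  # fill in the missing gain reference
            n_filled += 1
    return fixed, n_filled
```

The fixed column would then need to be written back into the dataset and saved under the name the .csg file refers to. With cryosparc-tools this is presumably an assignment to the `gain_ref_blob/path` field followed by a save, but verify the exact calls in the cryosparc-tools documentation, and keep a backup of the original .cs file before overwriting it.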

@wtempel, is there a simpler method for exporting exposures to a different cryoSPARC instance? If so, would it be possible to add a tutorial page explaining how to do so? While it isn’t too difficult to just remake the symbolic links to the raw data, it took me a while to figure out which files needed to be linked and how they needed to be organized in directories.