Value Error with Extract from Micrographs

Hi nfrasser,

I had a similar error when doing the Template Picker job:


What I did before this Job:

  1. Trash all the bad micrographs in the motioncorrected folder
  2. Subtract the gold beads from the *_aligned.mrc and *_aligned_doesweighted.mrc files
  3. Run Patch CTF job on these gold subtracted micrographs
  4. Do the Template Picker.

The weird thing is I tested the step 1 to 4 in a dataset of ~1.5k micrographs, it goes well. But for the dataset of ~3k micrographs, it has this error when doing the step 4. What can I do next?

This again looks like one or some of the micrographs ended up corrupted, but this time it’s the full dose-weighed micrograph (some *_aligned_doseweighted.mrc file).

Can you again try navigating to the motioncorrected directory and run ls -lh to see if any micrographs have file size 0, or differing significantly from the others? You would then have to re-create these files to get the template picker to work (e.g., re-run the Motion Correction job)

Can I just trash these size 0 files, as I don’t know how to re-create those files?

Unfortunately right now the Template Picker isn’t set up to skip missing files, so the whole job will fail if any exposure file is missing. You may be able to take them out if you first run the exposures through a “Manually Curate Exposures” job and reject the exposures with missing previews. Try that out and let me know how that goes, if it doesn’t work we’ll try to find an alternate solution.

Hi nfrasser
One naive question: how to quickly find the individual exposure without preview? I’ve created a Manually Curate Exposures.

FYI. I’ve tried to manually remove all the zero size .mrc files, then do Template Picker, yes, doesn’t work.

@CleoShen unfortunately there’s no quick way to pick out exposures without preview from the Curation job. You have to manually click through the list in interactive mode and see if the previews load, then manually accept or reject those. Alternatively, pick out the zero-size files from your file system and search for those in the interactive list.

I tired the way you said, but as I have about 16K micrographs in total and there is no quick way to locate the zero-size files in the interactive list instead of open each micrograph one by one. This solution is so time consuming. If I have other choices, like delete all the zero-size .mrc files and correlated .npy files, then do CTF again?

Okay, I’ve put together a script you can run to modify the results of the motion correction job and filter-out zero-sized files, here’s how to run it:

  1. Paste the following code into a text editor and make the noted subsitutions:
    project_uid = '<PROJECT UID HERE>'
    datafile = '<OUTPUT DATA FILEPATH HERE>'
    passthrough_datafile = '<PASSTHROUGH OUTPUT DATAFILE HERE>'
    import os
    from shutil import copyfile
    from cryosparc_compute.dataset import Dataset
    project_dir = cli.get_project_dir_abs(project_uid)
    copyfile(datafile, f'{datafile}.bak')
    d1 = Dataset().from_file(datafile)
    keep_uids = set()
    for uid, path in zip(d1.data['uid'], d1.data['micrograph_blob/path']):
        full_path = os.path.join(project_dir, path)
        if os.path.exists(full_path) and os.stat(full_path).st_size > 0:
            keep_uids.add(uid)
    print(f'Found {len(keep_uids)}/{len(d1)} exposures to keep ({len(d1) - len(keep_uids)} empty files)')
    newd1 = d1.subset_query(lambda item: item['uid'] in keep_uids)
    newd1.to_file(datafile)
    copyfile(passthrough_datafile, f'{passthrough_datafile}.bak')
    d2 = Dataset().from_file(passthrough_datafile)
    newd2 = d2.subset_query(lambda item: item['uid'] in keep_uids)
    newd2.to_file(passthrough_datafile)
    print('Done filtering.')
    
    • Replace <PROJECT UID HERE> with the ID of the project you are in, e.g., P3
    • Replace <OUTPUT DATA FILEPATH HERE> with the path to the micrograph_blob output of your motion correction job. Copy this value from the job’s “Outputs” tab, as indicated:
    • Similarly replace <PASSTHROUGH OUTPUT DATAFILE HERE> with the path from one of the outputs marked as “passthrough”, e.g., the mscope_params output
  2. Enter cryoSPARC’s interactive CLI with this command:
    cryosparcm icli
    
  3. Paste and enter the modified code from your text editor into the prompt
  4. Re-run the Patch CTF job
  5. Re-run the Template Picker job

Let me know if you have any trouble with that.

3 Likes

Hi nfrasser,
Sorry for late reply. I’ve tried your method, but doesn’t work, shown as belwo


Hmm, maybe some of micrographs aren’t zero sized? Can you check the the motioncorrected folder again and see if there are any files with a very small size? e.g., less than one 100 bytes?

If you find any try re-running the same script but replace this line:

    if os.path.exists(full_path) and os.stat(full_path).st_size > 0:

with this:

    if os.path.exists(full_path) and os.stat(full_path).st_size > 100:

Oh, I might make a mistake. If I should trash all the 0 size files before running the script you shared with me? I left the 0 size files for the last error try shown above.

Hi nfrasser,


I’m wondering if this error happens because of the code you sent me?

Thanks,

This wouldn’t be related, the code I sent does not affect the contents of the database. This error can happen if cryoSPARC didn’t restart properly and there are some artefacts left over from the previous processes. Can you send me the output of the following commands?

cryosparcm restart
cryosparcm log database | tail -n 200

I’d also suggest restarting the machine on which cryoSPARC is running.

Hi nfrasser,

Thank you for your reply, here pasted the log below. BTW, I’ve tired to restart my iMac and terminal, but can’t restart the computer cluster, FYI.


It looks like there’s something else running at the network port that cryoSPARC should be running on and this is preventing the database from starting up.

Can you confirm what your base port is? You should see it inside cryosparc_master/config.sh - look for the line that starts with export CRYOSPARC_BASE_PORT= - the proceeding number will be the base port.

Then, only if the base port is 39006, follow the instructions I posted here: Regarding installation of version 3.2

But instead of starting the lsof steps at 39000, start at 39006. e.g.,:

lsof -i :39006
lsof -i :39007
lsof -i :39008
lsof -i :39009
lsof -i :39010
lsof -i :39011
lsof -i :39012
lsof -i :39013

You also don’t have to run any of the commands that begin with cd and you should use cryosparcm instead wherever you see ./bin/cryosparcm.

After you’ve killed the processes bound to any port, run those lsof commands to ensure the processes are not running.

Then start cryoSPARC with cryosparcm start

If the base port is config.sh something other than 39006, let me know here what it is and I’ll follow up with additional instructions.

Hi nfrasser,

My export CRYOSPARC_BASE_PORT is 39006, and I used 16711 as the local port to connect to cryosparc. I’ve tried all ports below, but nothing is runing :sweat_smile: any other solution I can try?

Plus:
cryosparcm status----------------------------------------------------------------------------

CryoSPARC System master node installed at

/home/groups/brunger/software/cryosparc/cryosparc_master

Current cryoSPARC version: v3.2.0


CryoSPARC is not running.


global config variables:

export CRYOSPARC_LICENSE_ID="**************"

export CRYOSPARC_MASTER_HOSTNAME=“sh02-04n16.int”

export CRYOSPARC_DB_PATH="/*/software/cryosparc/cryosparc_database"

export CRYOSPARC_BASE_PORT=39006

export CRYOSPARC_DEVELOP=false

export CRYOSPARC_INSECURE=false

export CRYOSPARC_CLICK_WRAP=true

#export CRYOSPARC_FORCE_HOSTNAME=true

Plus:



Hi nfrasser,

The database error fixed, let’s come back to the Value Error :sweat_smile: