Problem with K3 data gain reference

Hello,

I have a K3 dataset (TIFF files) which was not gain-corrected during acquisition. The provided gain reference file is also in TIFF format. I understand that CryoSPARC requires the gain reference in MRC format, so I converted the TIFF file to MRC in Bsoft using the following command:

"bimg -datatype b gain.tiff gain.mrc"

This produces an unsigned 8-bit MRC file. When I run "full-frame motion correction" I get the following error:

numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'

If I import the same movies omitting the gain reference file, the job runs fine.

Any suggestions on how to convert the gain reference so that CryoSPARC can read it? This is CryoSPARC v4.1.1.

Hi @Juha,

Unless there’s a specific reason for using full-frame motion correction, I’d recommend trying with patch motion correction. That being said, this still looks like it could be a bug. Could you paste the full error message that you’re seeing? A screenshot of the stream log might be best.

Thanks,
Harris

Hi @Harris,

Here’s a screenshot of the error message.

Patch motion correction gives the same error.

Thanks!
Juha

With a K3, I’d generate a gain reference fresh from the data anyway. Use RELION or cisTEM to do so. I’ve often had to make multiple gain references across a dataset if it was collected over more than 12 or so hours, because the K3 can demonstrate pretty severe gain drift.
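For what it's worth, the estimate these tools produce essentially amounts to averaging a large number of frames and taking the normalized reciprocal. A rough Python sketch of the idea (the movie location, the tifffile dependency, and the assumption that each file is a multi-frame stack are all mine; the dedicated tools handle defects and outliers far more robustly):

import glob
import numpy as np
import tifffile  # assumed installed: pip install tifffile

# Sum every frame of every movie (assumes each TIFF is a frames x H x W stack).
total = None
nframes = 0
for path in sorted(glob.glob("movies/*.tif")):  # example location
    stack = tifffile.imread(path).astype(np.float64)
    total = stack.sum(axis=0) if total is None else total + stack.sum(axis=0)
    nframes += stack.shape[0]

avg = total / nframes
# One common convention for a multiplicative gain reference: the reciprocal
# of the per-pixel average, normalized so the mean correction factor is 1.
gain = np.where(avg > 0, avg.mean() / avg, 1.0).astype(np.float32)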

Using CDS mode, we see no significant gain drift on our K3. We also usually let GMS gain-correct, as it has given better results than applying binned gain files from the GMS reference directory or RELION/cisTEM-derived gain files in some (admittedly not super-well controlled) tests. The disadvantage is, of course, slightly larger movies.

Hi @Juha,

Thanks for posting the full error message. I'm not familiar with Bsoft or the bimg command, but does -datatype b mean "byte" as in uint8? If so, that could be the problem. You could verify this by taking a look at the MRC file header, for example by running xxd -e -g 4 /path/to/gain-reference.mrc | head and looking at the first line it outputs. The first four 32-bit words are nx, ny, nz, and the mode field that encodes the datatype; if the last group of numbers is 00000000 then the mode is 0 and the file is stored as 8-bit integers.
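If you'd rather not decode xxd output by hand, the same check can be done with a few lines of Python using only the standard library (a sketch; the path is an example):

import struct

# Read the first four 32-bit little-endian words of the MRC header:
# nx, ny, nz, and the mode field that encodes the pixel datatype.
with open("gain.mrc", "rb") as f:  # example path
    nx, ny, nz, mode = struct.unpack("<4i", f.read(16))

MODES = {0: "8-bit int", 1: "int16", 2: "float32", 4: "complex64", 6: "uint16"}
print(f"{nx} x {ny} x {nz}, mode {mode} ({MODES.get(mode, 'other')})")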

In any event, this is a bug in the sense that CryoSPARC should certainly provide a more helpful error message when this happens, but if the gain reference is indeed uint8, that is definitely not going to work. Try re-running your bimg command and choosing float32 as your output datatype.
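Alternatively, the conversion can be done in Python. A sketch assuming the tifffile and mrcfile packages are installed (filenames are examples):

import numpy as np
import tifffile  # assumed installed: pip install tifffile mrcfile
import mrcfile

# Read the TIFF gain reference and rewrite it as a float32 MRC file.
gain = tifffile.imread("gain.tiff").astype(np.float32)
with mrcfile.new("gain_f32.mrc", overwrite=True) as mrc:
    mrc.set_data(gain)  # mrcfile stores float32 data as MRC mode 2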

–Harris

Hi @hsnyder,

I verified that the gain ref in MRC format I had created in Bsoft is uint8. The command printed:
00000000: 00002d00 00001ff8 00000001 00000000

If I convert the TIFF to MRC with the option "-datatype float" I get a float32 file. The command now prints:
00000000: 00002d00 00001ff8 00000001 00000002

Is this what you would expect for float32? If I try to import movies with this float32 gain ref, I get the error:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 89.9 MiB for an array with shape (8184, 11520) and data type uint8

This error earlier led me to think that CryoSPARC expects the gain reference as uint8 MRC. Do you have a suggestion for how to convert the TIFF-format K3 gain reference to a compatible MRC format? Thanks!

In the meantime I have also tried making a new gain reference by averaging 1500 movies in cisTEM (using the command "sum_all_tif_files"), as suggested by @rbs_sci. This gain reference is float32. When I try to import the movies with it, I get the same error as above with the Bsoft-converted float32 gain reference:

"numpy.core._exceptions._ArrayMemoryError: Unable to allocate 89.9 MiB for an array with shape (8184, 11520) and data type uint8"

I can import these movies fine if I don't provide any gain reference. Obviously this is not the solution, but it isolates the problem to the gain ref import. Many thanks for your help!

Hi @Juha,

Yes, that xxd output is what I would expect for float32 (the first three words are the dimensions, 11520 × 8184 × 1, and the fourth is the mode, where 2 means float32), but the error you're getting now is quite puzzling. What are the specifications of the machine you're running CryoSPARC on? If you know them you can just list them; otherwise you can run lscpu && free -g && uname -a && nvidia-smi, which will output details about the CPU, memory, Linux version, and GPUs.

Another thought: is this a fresh install, or was it an upgrade from an earlier version? It might also be worth checking that your Python and numpy versions were installed correctly. From the command line you can run /path/to/cryosparc_worker/bin/cryosparcw ipython, which should print the Python version and start an interactive Python prompt. Run import numpy; numpy.__version__, which should print the numpy version; you can then exit with Ctrl-D.

–Harris

Hi @hsnyder,

lscpu && free -g && uname -a && nvidia-smi

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 19
On-line CPU(s) list: 0-18
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 19
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
Stepping: 7
CPU MHz: 2099.998
BogoMIPS: 4199.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-18
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni
total used free shared buff/cache available
Mem: 176 96 56 1 23 74
Swap: 0 0 0
Linux puhti-login12.bullx 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Sat Aug 27 02:44:20 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
-bash: nvidia-smi: command not found

I am running the command on the front node of an HPC cluster. On this cluster, GPUs (4 × V100 per node) are available on the GPU nodes only.

This is a fresh install.

Python 3.7.12
IPython 7.33.0
Numpy 1.19.5

Thanks
Juha

Hi @Juha,

Interesting… that Python version is not what I expected to see for v4.1.1. It’s possible that something went wrong during the installation process.

I’d recommend that you try the following:

  • Shut down cryosparc (cryosparcm stop)
  • Reinstall v4.1.1 and dependencies (cryosparcm update --override)
  • Confirm version numbers again: cryosparcm call python --version and cd cryosparc_worker && ./bin/cryosparcw call python --version

Please let me know how that goes.

–Harris

I may be misunderstanding, but I read this as you converting the cisTEM output (float32) to uint8 with Bsoft? Just use the float32 gain.mrc; I always have, and it works fine for me (whether counting, CDS, or super-res mode).

For mismatched Python versions, make sure conda is deactivated. That gets cryoSPARC really confused.

Hi @rbs_sci

What I meant was that I get the same error with the float32 gain ref made in cisTEM (sum_all_tif_files) and with a float32 gain ref made in Bsoft (converted from TIFF to MRC). In both cases, the error is

"numpy.core._exceptions._ArrayMemoryError: Unable to allocate 89.9 MiB for an array with shape (8184, 11520) and data type uint8"

This issue with the gain ref at the Import Movies step is the only thing that fails with our installation; everything else seems OK. We are looking into the Python version mismatch.

Ah, I understand now. Thanks for clarifying!

Hi @hsnyder

Sorry, the information I gave above was for a different installation than the one that gave the error. I have now checked the actual installation (the one giving the error). There, the Python versions of both master and worker are 3.8.15. This was a fresh 4.0.1 install which was upgraded to 4.1.1.

Hi @Juha,

You mentioned that this is on a cluster system? Since this data is K3 super-resolution, which is quite large, I'm thinking the job is actually running out of memory: not exhausting the compute node's RAM, but hitting the limit imposed by SLURM.

It's possible to adjust CryoSPARC to ask SLURM for more memory, but it's not super straightforward. The easiest way to do this is as follows:

  1. Modify the file cryosparc_master/cryosparc_compute/jobs/imports/build.py, replacing line 251 with the line below (an annotated version follows this list):

job.set_resources_needed(params['compute_num_cpus'], 0, 32768, False)

  2. Make the same change in cryosparc_worker/cryosparc_compute/jobs/imports/build.py

  3. Run cryosparcm cli "refresh_job_types()"

  4. Run the import job again
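The argument meanings are not spelled out above; judging by the values, they appear to be CPU count, GPU count, RAM in MB, and an SSD flag, so treat the comments in this annotated version as an assumption:

# Replacement for line 251 of imports/build.py, as above. Comments are an
# inference from the values, not from CryoSPARC documentation:
job.set_resources_needed(
    params['compute_num_cpus'],  # CPUs to request from the scheduler
    0,                           # GPUs (an import job needs none)
    32768,                       # RAM in MB (32 GB, up from the default)
    False,                       # presumably an SSD-cache requirement flag
)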

Let me know how that goes. If that fixes the issue, we will consider this a bug and change the default memory allocation.

– Harris

Hi @hsnyder,

Yes, our installation is on an HPC system. Thanks for the instructions, but are these applicable in the case of the Import Movies job? After fixing the gain reference datatype (to float32), the error occurs already at the Import Movies step. As this step runs on the front node of the HPC cluster (not on a compute node via SLURM), and memory allocation for interactive jobs is limited on the front node, it is possible that it runs out of memory there. If there were a way to send the Import Movies job to a compute node via SLURM, that might solve the problem.

Hi @Juha,

I believe CRYOSPARC_DISABLE_IMPORT_ON_MASTER is what you’re looking for.

Hi @stephan,

Thanks, this solved the problem! In summary: import of super-resolution movies together with a gain reference failed on the HPC cluster front node, where memory for interactive jobs is limited.

Adding "export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true" to the config.sh file, so that the import runs on a worker node instead, fixed it.

Thank you all for the help.
