Error "Given data must a have a uid field"

Hi,

I’m trying to import particle .mrcs files using the “Import Result Group” job, but I get the error “Given data must a have a uid field”. Here are the details.

I have several particles to import, and each particle is in its own .mrcs file. I use Python and the numpy (np) package to create a .cs file with the required fields. For example, the code looks like
newcs = np.zeros(n, dtype=([ ('uid', '<u8'), ('blob/path', 'S200'), ...(other fields) ]))

Then I loop over the particles and set the values of the corresponding newcs fields. For the i-th particle, I set “uid” to i, that is: newcs[i]['uid'] = i.

After that, I make a .csg file and import it with the “Import Result Group” job. However, I get the error “Given data must a have a uid field”. When I check the .cs file with Python and numpy, the ‘uid’ field is there; for example, cs[0]['uid'] returns 0. So I’m confused. Does anyone know why this happens or how to solve it?

I also tried another way: I read a .cs file exported from another job and copied its uid values into my newly created .cs file. With this approach, the “Import Result Group” job works well. I don’t know why this way works while the first one fails. Does the “uid” value have some requirements?

Thanks

Hi @CirenSangzhu, older versions of CryoSPARC require that each uid value is > 0. In your loop, try setting newcs[i]['uid'] = i + 1
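As a minimal sketch of that change: the array below has only the two fields named in the question (a real .cs file needs the other fields too), and the particle count n is a made-up value. The uids can be assigned in one step instead of inside the loop:

```python
import numpy as np

n = 5  # hypothetical number of particles

# Only the two fields shown in the question; the remaining fields are omitted here.
newcs = np.zeros(n, dtype=[("uid", "<u8"), ("blob/path", "S200")])

# Number the particles 1..n so that no uid is 0.
newcs["uid"] = np.arange(1, n + 1, dtype=np.uint64)
```

This is equivalent to setting newcs[i]['uid'] = i + 1 for every i.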

Alternatively, you can update to the latest CryoSPARC and create datasets with cryosparc-tools, where UIDs may be generated for you. Here’s an example of how to do that:
https://tools.cryosparc.com/examples/recenter-particles.html


Thanks for your quick reply!

I’m using CryoSPARC v4.0.2 now. As you suggested, I tried setting newcs[i]['uid'] = i+1, and the “Import Result Group” job worked well, importing all particles into the workspace. Your information helps a lot! Thanks very much!

Further on, I have one more question.

I’m wondering whether uid values matter for subsequent image processing or 3D reconstruction. In the future I may need to import many other particles, and I would probably still set the uids from 1 to n (the number of particles in those stacks). Are uid values actually used in any job, and how? Will errors occur in downstream CryoSPARC jobs if different particles have the same uid value? (I have a feeling that uid is probably not really used in CryoSPARC, but I’m not sure.)

Thanks!

@CirenSangzhu uids are used to uniquely identify exposures, particles, volumes, etc. when combining multiple datasets and calculating passthrough outputs. They’re also used to link different kinds of data together: e.g., particles have a location/micrograph_uid field to identify the exposure they were extracted from.

CryoSPARC ensures that all imported or generated data has unique uids. If two datasets have duplicate uid values, you will have errors and incorrect behaviour when connecting the two datasets to the same job.
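If you want to check for this before importing, a rough sketch with plain numpy could look like the following. The helper name shared_uids and the two toy arrays are made up for illustration; this is not a CryoSPARC function.

```python
import numpy as np

def shared_uids(a, b):
    """Return the uid values that appear in both structured arrays (hypothetical helper)."""
    return np.intersect1d(a["uid"], b["uid"])

# Two toy datasets that each contain only a uid field.
dtype = [("uid", "<u8")]
first = np.zeros(3, dtype=dtype)
second = np.zeros(3, dtype=dtype)
first["uid"] = [1, 2, 3]
second["uid"] = [3, 4, 5]

overlap = shared_uids(first, second)
if overlap.size > 0:
    print("duplicate uids across datasets:", overlap.tolist())
```

An empty result means the two arrays do not share any uid.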

To ensure your uids are unique, use the same function that CryoSPARC uses to generate them:

import numpy as np
...
newcs["uid"] = np.random.default_rng().integers(low=0, high=2**64, size=len(newcs), dtype=np.uint64)
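A quick sanity check on the generated values can be sketched like this. Two things here are my own assumptions rather than anything CryoSPARC does: the helper name random_uids is hypothetical, and it draws from low=1 instead of low=0 so that a 0 can never appear (older versions reject a uid of 0, as mentioned earlier in the thread).

```python
import numpy as np

def random_uids(n, rng=None):
    """Draw n distinct, non-zero 64-bit uids (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    uids = rng.integers(low=1, high=2**64, size=n, dtype=np.uint64)
    # A collision among 64-bit random values is extremely unlikely,
    # but redraw if one ever happens.
    while np.unique(uids).size != n:
        uids = rng.integers(low=1, high=2**64, size=n, dtype=np.uint64)
    return uids

uids = random_uids(1000)
```

The result can then be assigned with newcs["uid"] = random_uids(len(newcs)).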

Thanks for your patient explanation! It’s very clear and I think I get it.

Also, thanks for providing the uid generating function. Can’t wait to create new uid! :grinning:
