Local refinement (new) error

Hi @mmclean,

That was indeed the issue - after setting masking to dynamic, it now runs correctly. There was nothing particularly exotic about the input mask, though, so I am a little puzzled as to what was causing the problem.

Cheers
Oli

Perhaps this is it - the mask I was using was a binary mask, generated with relion_mask_create and imported, because I knew the previous version used dynamic masking. Perhaps providing a hard-edged binary mask as a static mask is what causes the problem?

Cheers
Oli

Hi @olibclarke,
Ah yes, you're right: static masking always expects a soft mask, and I believe it has to be in floating-point format rather than unsigned int or otherwise. If you want to use static masking, you can take the binary input mask that previously gave an error, pass it through a Volume Tools job, and re-do the thresholding/padding (e.g. set the threshold to 1 and the dilate parameter to 10 for a 10-pixel soft edge). This regenerates the mask in the same way dynamic masking does, and the resulting mask should work with the new local refinement job. Glad that it is working now; please let us know if you run into any other errors!
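
For reference, here is a minimal NumPy/SciPy sketch of the same threshold → dilate → soft-pad idea. This is not the Volume Tools implementation, just an illustration; file names and parameter values are hypothetical, and it assumes the mrcfile package for I/O.

```python
# Turn a hard binary mask into a dilated, soft-edged float32 mask
# (illustrative only; not CryoSPARC's Volume Tools code).
import numpy as np
import mrcfile
from scipy.ndimage import distance_transform_edt

def soften_mask(binary_mask, dilate_px=3, soft_edge_px=10):
    """Dilate a binary mask and add a cosine-shaped soft edge of soft_edge_px voxels."""
    hard = binary_mask >= 0.5                      # threshold to a strict binary mask
    dist_out = distance_transform_edt(~hard)       # distance of each voxel to the mask (0 inside)
    dilated = dist_out <= dilate_px                # expand the mask by dilate_px voxels
    edge = np.clip(dist_out - dilate_px, 0.0, soft_edge_px)
    soft = 0.5 * (1.0 + np.cos(np.pi * edge / soft_edge_px))  # cosine falloff from 1 to 0
    soft[dilated] = 1.0
    return soft.astype(np.float32)

with mrcfile.open("binary_mask.mrc") as f:         # hypothetical input file
    soft = soften_mask(f.data)
with mrcfile.new("soft_mask.mrc", overwrite=True) as f:
    f.set_data(soft)
```

The dilate_px and soft_edge_px arguments roughly mirror the dilation and soft-padding parameters you would set in Volume Tools.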

Best,
Michael

Thanks @mmclean, it is running, but it seems to be extremely slow - my jobs have been sitting at the “local cross validation” step for the last 8 hours. Granted this is a large (512 px) volume, but I don’t think it should take this long… any suggestions to diagnose? Output of htop and nvidia-smi attached.

Hi @olibclarke, have you run non-uniform refinement (new) on the same/similar data? Did you see any issues there? It’s the same code that is now running in local refinement.

Yes, I have run the same data through NU-refine (new) - it was slow, but not this slow.

The FSC seems to go bananas after the first iteration (at least with a 40 Å initial lowpass). This is with the new local refinement, with NU-refine and marginalization on.

Is it possible this is caused by using a binary mask to initialize dynamic masking? Perhaps it is not using the correct mask? Has anything like this been seen during testing?

Hmm, certainly strange. How many particles did you use, and how big would you estimate the masked area to be? Did the FSCs for the run with an initial lowpass of 8 Å look any different? Also, is this with dynamic masking on? With dynamic masking and a very low starting resolution (40 Å), the mask can change quite a lot from iteration to iteration, which is generally undesirable.

In our testing, we’ve found results with static masking to work better, at least in terms of avoiding overfitting, which is why the default mask parameter was set to “static”; perhaps trying the job with a static mask could alleviate some of these issues? We’ve also seen better results starting from resolutions in the range of 8 - 20 Å or so, so we decided to increase the default initial lowpass to 12 Å.

Edit: Yeah, perhaps the dynamic mask could cause issues here, especially if starting from such a low resolution (then, inside the masked area, the density might vary slowly over space, meaning there’s basically nothing to align to except the mask itself; this is part of the reason why, for the other refinements, the dynamic mask start resolution is set to 12 Å by default).

Edit 2: Actually yes, I think the error is with initializing the dynamic mask with a binary mask. The initial mask is used for the tight-mask FSC computation, so the tight FSC curve is very high because the volume is being sharply cut off: basically all of the high-resolution information is just ringing in Fourier space from the density being clipped by the binary mask (which is the same for both half-maps, explaining the high correlation). For best results, either static masking should be used, or dynamic masking starting from a relatively high resolution of 12 Å (and in either case, soft masks are recommended).
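
To make that mechanism concrete, here is a minimal NumPy sketch of a masked FSC calculation (not the code CryoSPARC uses). Because the same hard mask multiplies both half-maps before the Fourier transform, the ringing it introduces is identical in the two maps and correlates strongly in the high-resolution shells:

```python
# Masked Fourier shell correlation of two half-maps (illustrative sketch only).
import numpy as np

def masked_fsc(half1, half2, mask, n_shells=64):
    """FSC of two half-maps after multiplication by the same real-space mask."""
    f1 = np.fft.fftshift(np.fft.fftn(half1 * mask))
    f2 = np.fft.fftshift(np.fft.fftn(half2 * mask))
    n = half1.shape[0]                                  # assumes a cubic box
    grid = np.indices(half1.shape) - n // 2
    radius = np.sqrt((grid ** 2).sum(axis=0))           # spatial frequency in voxels
    shells = np.clip((radius / (n // 2) * n_shells).astype(int), 0, n_shells - 1)
    num = np.zeros(n_shells, dtype=complex)
    den1 = np.zeros(n_shells)
    den2 = np.zeros(n_shells)
    np.add.at(num, shells, f1 * np.conj(f2))            # cross term per shell
    np.add.at(den1, shells, np.abs(f1) ** 2)
    np.add.at(den2, shells, np.abs(f2) ** 2)
    return np.real(num) / np.sqrt(den1 * den2 + 1e-12)
```

With a hard binary mask, the clipped edges dominate the cross term at high frequency and the curve stays near 1; with a soft mask the edge contribution is attenuated and the FSC falls off as expected.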

Best,
Michael

Hi Michael,

It looks like this by iteration 1 whether the lowpass is 40 or 8 Å - I have used the same dataset with the original local refinement procedure, with the same mask and parameters, and have not seen anything like this. The dataset contains 360k particles.

I am trying now with a static mask. Yes, I generally use initial lowpass resolutions in that range also.

Cheers
Oli

Hi Michael,

Replacing it with a static (soft) mask does the trick, thanks! I would note, though, that this binary mask worked just fine in the previous version of local refinement, starting from either initial lowpass value, whereas in the new version it works with neither - I think something must have changed in the dynamic masking parameters. I still see this behaviour when starting with an initial lowpass of 8 Å, so it does not all stem from filtering too aggressively to begin with.

Cheers
Oli

Hi @olibclarke,
Did you continue to find that the local refinement jobs are running very slowly?

No - after I restarted the system they got considerably faster - there must have been some other process consuming resources. All good, thanks Ali!

Oli

Hello, I ran into the same problem with the local refinement job (empty alignment map and FSC-filtered half-map at iteration 0).

The input particles are from a 3D classification job using the same mask. The local refinement runs normally when the target resolution of the upstream 3D classification job is set to 15 Å, but breaks when it is set to 12 Å, 10 Å, or 5 Å.

I used a mask from a Volume Tools job with both dilation and padding set to 15, and have tried both static and dynamic masking in the local refine parameters. I’ve also tried setting the initial lowpass resolution of the local refinement to 8 Å and 4 Å, and using initial volumes with different resolutions.

Another observation is that the job uses a filter radius of 2.1 Å (Nyquist) almost immediately in iteration 0, which leads to a nonsensical straight line for the FSC curve.
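
(For context, the Nyquist limit is just twice the pixel size, so a 2.1 Å filter radius implies roughly a 1.05 Å pixel; rough arithmetic below, with the pixel size as an assumption about this dataset.)

```python
# Nyquist limit = 2 x pixel size; the pixel size here is illustrative.
pixel_size_A = 1.05
nyquist_A = 2 * pixel_size_A   # 2.1 A: the job is filtering out to the last Fourier shell from iteration 0
print(nyquist_A)
```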

Any ideas why the target resolution of the upstream job where the input particles come from can affect the behavior of the local refine job?

Thanks!
Kate

Hi Kate - hmm, that is weird. The one other thing I can think of that might cause this is the scale factors. I ran into this recently: some scale factors refined to NaN values and I saw the same thing, and scale factors are refined by default during 3D classification. Maybe try resetting the input scale factors at the start of local refinement (there is an option for this) and see if that helps?
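
If it helps to check outside the UI, here is a rough sketch for inspecting an exported particle .cs file for NaN scales. The file path and the field name "alignments3D/alpha" are assumptions on my part; adjust them to whatever your outputs actually contain.

```python
# Count NaN per-particle scale factors in an exported CryoSPARC .cs file
# (a NumPy structured array). The field name and path below are assumptions.
import numpy as np

particles = np.load("particles.cs")            # hypothetical path to the exported particles
scale_field = "alignments3D/alpha"             # assumed location of the per-particle scale

if particles.dtype.names and scale_field in particles.dtype.names:
    scales = particles[scale_field]
    n_nan = np.count_nonzero(np.isnan(scales))
    print(f"{n_nan} / {len(scales)} particles have NaN scale factors")
else:
    print("Scale field not found; available fields:", particles.dtype.names)
```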

Cheers
Oli

Hi Oli,

Resetting input scale factors solved the problem! Many thanks!

Kate

Ah very good, glad to hear it!

Glad this solved the problem! @olibclarke, could you elaborate on how you noticed the NaN scales?

They were listed in the log… unfortunately I have deleted the relevant job (cleared and re-run), but I will take note if I see it again!

I wonder whether scales that refine to NaN values should be reset to 1; maybe that would improve stability?

Ok great, thanks! Just to confirm, was the mean in the scale histogram showing as NaN?

We’ll definitely (at least) output a warning and set scales to 1 as you suggest.

I think so, but I can’t recall exactly - I just remember noting NaN values listed in the log - will keep an eye out and let you know if I see it again! In our case the issue was scales being refined during local refinement, so it is maybe not surprising that they went a bit bananas; I wouldn’t expect the same thing to happen during 3D classification, but it seems like it can…