Weirdness in exposure group utilities regex (bug?)

olibclarke · June 17, 2020, 3:07pm

Hi,

I am trying to split a leginon-collected data set into temporal segments (in this case 10 square chunks).

Leginon have filenames with a substring of the format:

0004Gr_00015Sq_v01_00002Hln_00002Enn

Where 00015Sq indicates the square number.

I figured that Gr_000. should separate into 10 square chunks, but this gives an error.

Gr_00. or Gr_00[0123456789] however, which I wouldn’t expect to match anything useful, both work, and gives the attached output in the log which I find puzzling. Am I missing something here about how regex operates in cryosparc? The tokens don’t seem to match any substring of the filename.

Cheers
Oli

stephan · June 17, 2020, 3:48pm

Hey @olibclarke,

For this job, we use re.search(<regular_expression>) on the basename of the full file path, then select the group number specified in the parameters.

re.search(reg_exp, os.path.basename(filepath)).group(group_index_number)

https://docs.python.org/2.7/library/re.html

Another way you can get 00015Sq to be the token is to use the “string split” method and specify “_” as the separator. Then you can specify ‘1’ for the group index number.

olibclarke · June 17, 2020, 3:51pm

Yes I understand that @stephan - but given the regex token that I entered, I think the tokens should be Gr_0001, Gr_0002, etc - not Gr_001. And the Gr_001 token shouldn’t match anything, but it does - it behaves as though it is actually Gr_0001. I’m probably being dim…

(I’m not trying to get 00015Sq as the token - that is why I am doing it this way, because I want one token every 10 squares. It is working, but I have to enter what seems to be the incorrect regex for it to work)