Hi,
I am trying to split a leginon-collected data set into temporal segments (in this case 10 square chunks).
Leginon have filenames with a substring of the format:
0004Gr_00015Sq_v01_00002Hln_00002Enn
Where 00015Sq
indicates the square number.
I figured that Gr_000.
should separate into 10 square chunks, but this gives an error.
Gr_00.
or Gr_00[0123456789]
however, which I wouldn’t expect to match anything useful, both work, and gives the attached output in the log which I find puzzling. Am I missing something here about how regex operates in cryosparc? The tokens don’t seem to match any substring of the filename.
Cheers
Oli
Hey @olibclarke,
For this job, we use re.search(<regular_expression>)
on the basename of the full file path, then select the group number specified in the parameters.
re.search(reg_exp, os.path.basename(filepath)).group(group_index_number)
https://docs.python.org/2.7/library/re.html
Another way you can get 00015Sq
to be the token is to use the “string split” method and specify “_
” as the separator. Then you can specify ‘1’ for the group index number.
Yes I understand that @stephan - but given the regex token that I entered, I think the tokens should be Gr_0001
, Gr_0002
, etc - not Gr_001
. And the Gr_001
token shouldn’t match anything, but it does - it behaves as though it is actually Gr_0001
. I’m probably being dim…
(I’m not trying to get 00015Sq
as the token - that is why I am doing it this way, because I want one token every 10 squares. It is working, but I have to enter what seems to be the incorrect regex for it to work)