Reducing RAM usage of restacking

KiSchnelle · January 2, 2024, 4:44pm

I really like the new restacking job but with the current implementation it takes a lot of RAM when creating bigger stacks with multiple threads. We generally like to have few big stacks, cause our filesystem handles it better.

The current implementation will create in each thread an array for the respective stack, which leads to huge RAM (well not huge seen in cluster perspective but enough to let it fail with cryosparc calculations ~50 GB for 14 threads 100k stacks) usage when using a lot of threads with for example 100k stacks. Of course i could just increase the ram multiplicator but i would rather not:)

I tried around a little bit and would maybe suggest moving the threading to reading the data for one stack, rather then creating multiple stack files at once.
For example i changed the code quick and dirty to this:

total_batches = len(batches)

def load_particle(idx, particle, data):
        data[idx] = particle.rawdata.view()

for batch_idx, batch in enumerate(batches):
        tic = time.time()
        N_D = len(batch)

        filename = f"batch_{batch_idx}_restacked.mrc"
        outpath_rel = os.path.join(outdir_rel, filename)

        data = n.ndarray((N_D, N, N), n.float32)
        pool = ThreadPool(params['num_threads'])

        for idx, particle in enumerate(batch):
            pool.apply_async(load_particle, args=(idx, particle, data))

        for idx, particle in enumerate(batch):
            assert particles_out['uid'][particle.idx] == particle.get('uid'), "Output particle UID doesn't match input!"
            particles_out['blob/path'][particle.idx] = outpath_rel
            particles_out['blob/idx'][particle.idx] = idx
            particles_out['blob/shape'][particle.idx] = n.array([N,N])

        pool.close()
        pool.join()

        mrc.write_mrc(os.path.join(outdir_abs, filename), data, psize=psize, output_f16=params.get("output_f16",False))

        rc.log(f"Done batch {batch_idx + 1} of {total_batches} in {time.time()-tic:.2f}s")

Its cost a tiny bit of speed in my tests, so maybe a combination of both would be more preferable but it cost a lot less RAM which i would definitly prefer over the job running 15 instead of 10 minutes.

Maybe it could also be done with just streaming the writing process or writing out after a certain amount so the ram usage stays constant. Couldnt find the uncompiled code for that parts ;/

cheers
Kilian

wtempel · January 2, 2024, 5:06pm

Did you consider a job-specific increase via cluster custom variables?

KiSchnelle · January 2, 2024, 5:35pm

Yes but different users will use different sizes and and thread counts depending on how big the stack is and how patient they are. And I would prefer if not every user is just setting up some amount that they then reserve in the cluster but at the end not really need or take not enough and end up at my door with “my job terminated abnormally do you know why”

I mean this is definitely not an urgent problem we have enough resources but I think it would maybe better not to have the RAM scale like this in the job:) or increase the calculation by number of threads and stack size or smth

Cheers
Kilian

nwong · January 3, 2024, 8:21pm

Hi @KiSchnelle,

Thanks for bringing this to our attention. We’ve noted it down as a future improvement.