Is there a way to restart a failed experiment? I was running 2D classification on around 500,000 particles. The job ran for around 2 hr 30 min and then failed after the 19th iteration with the error “MemoryError: cuMemHostAlloc failed: out of memory”.
Is there a way to restart the job from where it failed?
Hi @sdhindwal, you cannot restart a failed experiment from where it failed. Can you please provide the number of 2D classes and the box size you were working with?
Thanks for the reply. I asked for 200 classes, the dataset image size was 150 and there were around 800,000 particles.
It finished 19 iterations and then failed with the memory error. I guess I should have divided the particles into smaller subsets to avoid running out of memory.
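For anyone who wants to try the subset idea, here is a minimal Python sketch of how the particle indices could be split into random, non-overlapping subsets. It does not use any API of the processing software: `split_into_subsets` and `raw_stack_gib` are hypothetical helper names, and the 4 bytes per pixel (float32) in the size estimate is an assumption. The estimate covers only the raw image data, not what the job actually allocates.

```python
import random


def split_into_subsets(n_particles, n_subsets, seed=0):
    """Return n_subsets disjoint lists of particle indices (hypothetical helper)."""
    indices = list(range(n_particles))
    # Shuffle so each subset is a random sample rather than a contiguous block.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_subsets lists.
    return [indices[i::n_subsets] for i in range(n_subsets)]


def raw_stack_gib(n_particles, box_size, bytes_per_pixel=4):
    """Raw size of a particle stack in GiB, assuming float32 pixels by default."""
    return n_particles * box_size * box_size * bytes_per_pixel / 2**30


# Numbers from this thread: about 800,000 particles with a box size of 150.
subsets = split_into_subsets(800_000, 4)
print([len(s) for s in subsets])  # [200000, 200000, 200000, 200000]
print(f"full stack: {raw_stack_gib(800_000, 150):.1f} GiB")
print(f"one subset: {raw_stack_gib(len(subsets[0]), 150):.1f} GiB")
```

Each index list could then be used to pick out the matching particles and run a separate classification job per subset. How many subsets are needed depends on the memory available on the machine.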