Topaz 0.2.5, v3.2.0 - NotImplementedError: can't find current frequency file

heejongkim · July 6, 2021, 12:35am

Hi,

After upgrading to 3.2.0, I can no longer run the Topaz Extract job. It fails early in the preprocessing stage with the following reproducible error.

========= monitor process now starting main process
MAINPROCESS PID 42032
MAIN PID 42032
topaz.run_topaz cryosparc_compute.jobs.jobregister
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cryosparc_worker/cryosparc_compute/run.py", line 168, in cryosparc_compute.run.run
  File "/home/cryosparcuser/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 1886, in get_instance_information
    cpufreq = psutil.cpu_freq()
  File "/home/cryosparcuser/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/psutil/__init__.py", line 1857, in cpu_freq
    ret = _psplatform.cpu_freq()
  File "/home/cryosparcuser/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/psutil/_pslinux.py", line 702, in cpu_freq
    "can't find current frequency file")
NotImplementedError: can't find current frequency file
slurmstepd: error: Step 45161.0 exceeded memory limit (8322468 > 8192000), being killed
slurmstepd: error: *** STEP 45161.0 ON gpu026 CANCELLED AT 2021-07-05T20:28:27 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: gpu026: task 0: Killed

If anyone has any suggestions to look into, it would be very much appreciated.

Thanks.

best,
hee jong kim
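
Editorial note: the traceback above ends in psutil.cpu_freq() raising NotImplementedError, which psutil does on Linux hosts where it cannot find a current-frequency file. As a minimal sketch of how a caller could tolerate that, assuming the CPU frequency is only informational, the hypothetical helper below (safe_cpu_freq is an illustrative name, not part of CryoSPARC or psutil) treats the missing value as None. It is an illustration, not a patch for runcommon.py.

```python
def safe_cpu_freq(cpu_freq_fn):
    """Call a cpu_freq-style function; return None if the platform cannot report it.

    cpu_freq_fn is passed in (for example psutil.cpu_freq) so that the sketch
    itself does not require psutil to be installed.
    """
    try:
        return cpu_freq_fn()
    except NotImplementedError:
        # Same exception type as in the traceback above.
        return None


# Stand-in that mimics the failure reported in this thread (hypothetical).
def broken_cpu_freq():
    raise NotImplementedError("can't find current frequency file")


# Stand-in that mimics a host where the frequency is available (hypothetical value).
def working_cpu_freq():
    return 2400.0


missing = safe_cpu_freq(broken_cpu_freq)
present = safe_cpu_freq(working_cpu_freq)
print(missing, present)

# With psutil installed, the real call would be: safe_cpu_freq(psutil.cpu_freq)
```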

stephan · July 6, 2021, 2:06pm

Hi @heejongkim,

Can you report your OS and kernel version (the output of uname -a)?

heejongkim · July 6, 2021, 5:18pm

Hi,

It’s CentOS 7, and here’s the output of “uname -a”:
Linux kingcobra 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

thanks.

stephan · August 26, 2021, 3:09pm

Hi @heejongkim,

Looking at this error message again, it looks like your cluster scheduler killed this job because it used too much RAM.
If you’re still having this error, edit the last line in the file cryosparc_master/cryosparc_compute/jobs/topaz/build_topaz.py, changing
job.set_resources_needed(cpu_count, gpu_count, 8000, False)
to
job.set_resources_needed(cpu_count, gpu_count, 16000, False)
Then run cryosparcm cli "refresh_job_types()", create a new Topaz Extract job, and run it.
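
Editorial note: the slurmstepd line in the original log can be read as a memory overshoot. The sketch below parses it and converts the two numbers to MB, assuming Slurm reports them in KiB. That unit is an assumption, although 8192000 / 1024 = 8000 matches the 8000 passed to job.set_resources_needed. The function name parse_memory_limit is hypothetical.

```python
import re

# Log line copied from the first post in this thread.
LOG_LINE = (
    "slurmstepd: error: Step 45161.0 exceeded memory limit "
    "(8322468 > 8192000), being killed"
)


def parse_memory_limit(line):
    """Return (used_mb, limit_mb) from a slurmstepd memory-limit message, or None.

    Assumes both numbers are in KiB (an assumption, see the note above).
    """
    match = re.search(r"exceeded memory limit \((\d+) > (\d+)\)", line)
    if match is None:
        return None
    used_kib = int(match.group(1))
    limit_kib = int(match.group(2))
    return used_kib / 1024, limit_kib / 1024


used_mb, limit_mb = parse_memory_limit(LOG_LINE)
print(f"used {used_mb:.0f} MB, limit {limit_mb:.0f} MB, "
      f"over by {used_mb - limit_mb:.0f} MB")
```

Under that assumption the job went only slightly over an 8000 MB limit, which is consistent with the suggestion to raise the request to 16000.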