Kill/delete a job that is no longer in the PBS/Slurm scheduler job list

I would like to delete/kill some jobs that had issues with the scheduler. The jobs are no longer in the scheduler's job list and therefore cannot be killed/deleted.

I get

Command ['qdel', '4406.ims'] returned non-zero exit status 153 (as the job no longer exists on the scheduler side)

Can I delete those jobs directly from MongoDB? Can I delete all resources associated with a user as well?

Thank you for any info.

Best,

JC

Hi @jcducom,

This seems to be a bug: the kill command shouldn’t fail if the cluster command fails. We’ve recorded this issue and we’ll update you when it’s fixed. For the time being, you can manually “kill” the job by running the command:
cryosparcm cli "set_job_status('<project_uid>', '<job_uid>', 'killed')"


Thank you so much for your quick reply, and sorry for my late reply! It did fix the issue.
Thanks again
JC

@stephan I am having a very similar issue on an LSF cluster. I tried to run the command you gave above and received the following error:

[stae8w@bmi-r740-05 bin]$ /usr/local/cryosparc/cryosparc_master/bin/cryosparcm cli "set_job_status('', '', 'killed')"
Traceback (most recent call last):
File "/usr/local/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 89, in <module>
print(eval("cli."+command))
File "<string>", line 1, in <module>
File "/usr/local/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 62, in func
assert False, res['error']
AssertionError: {'code': 500, 'data': None, 'message': "OtherError: argument of type 'NoneType' is not iterable", 'name': 'OtherError'}

Here is a screenshot of what happens when I try to kill the job.

It looks like you forgot to enter the Project ID and Job ID in the command:
cryosparcm cli "set_job_status('<project_uid>', '<job_uid>', 'killed')"

Yes, I saw that after posting and forgot to come back here. Total noob mistake!

Hey @stephan, our instance has 3 jobs that are always showing: one as Launched with 4 tokens and the other 2 as queued. When I use the command suggested above, I get the following error:

cryosparcm cli "set_job_status('P13', 'J3', 'killed')"
Error for "set_job_status" with params ('P13', 'J3', 'killed'):
ServerError: validation error: lock file for P13 at '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock' absent or otherwise inaccessible.
Traceback (most recent call last):
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 195, in wrapper
res = func(*args, **kwargs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 246, in wrapper
assert os.path.isfile(
AssertionError: validation error: lock file for P13 at '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock' absent or otherwise inaccessible.

This file was deleted by my lab member. Please advise how to rectify this.

Also, when I run the command ps -ax | grep cryosparc, I get the following output:
ps -ax | grep cryosparc
8488 pts/26 S+ 0:00 grep --color=auto cryosparc
28237 ? Ss 0:00 python /opt/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /opt/cryosparc2/cryosparc2_master/supervisord.conf
28719 ? Sl 0:12 mongod --auth --dbpath /data/cryosparc2_database --port 38001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
29098 ? Sl 8:38 python -c import cryosparc_command.command_core as serv; serv.start(port=38002)
29532 ? Sl 0:09 python -c import cryosparc_command.command_vis as serv; serv.start(port=38003)
29578 ? Sl 0:09 python -c import cryosparc_command.command_rtp as serv; serv.start(port=38005)
31150 ? Sl 0:08 /opt/cryosparc2/cryosparc2_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js

Logged on to the CryoSPARC master host, using the Linux account that owns and runs the CryoSPARC software, you may try
cryosparcm cli "take_over_project('P13')"
(details).
You may also want to ensure that all users with access to project directories understand the purpose of cs.lock files (guide).
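For reference, CryoSPARC expects the cs.lock file at the top level of the project directory. A quick way to confirm whether it is present (using the path from the error message above as an example; adjust to your own project directory) is:
ls -l /data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock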

Hi
I tried as you suggested; however, the entire file was deleted, and I get the following output:

cryosparcm cli "take_over_project('P13')"
Error for "take_over_project" with params ('P13',):
ServerError: [Errno 2] No such file or directory: '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock'
Traceback (most recent call last):
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 195, in wrapper
res = func(*args, **kwargs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 8479, in take_over_project
write_lockfile(lockfile_path_abs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 8517, in write_lockfile
with open(lockfile_path_abs, 'w') as lockfile:
FileNotFoundError: [Errno 2] No such file or directory: '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock'

Also, at present none of the GPUs are available for the next job.

When I run the command ps -ax | grep cryosparc, this is the output:

ps -ax | grep cryosparc
16753 pts/26 S+ 0:00 grep --color=auto cryosparc
22176 ? Ss 0:00 python /opt/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /opt/cryosparc2/cryosparc2_master/supervisord.conf
22521 ? Sl 0:09 mongod --auth --dbpath /data/cryosparc2_database --port 38001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
22934 ? Sl 4:49 python -c import cryosparc_command.command_core as serv; serv.start(port=38002)
23222 ? Sl 0:09 python -c import cryosparc_command.command_vis as serv; serv.start(port=38003)
23229 ? Sl 0:07 python -c import cryosparc_command.command_rtp as serv; serv.start(port=38005)
24591 ? Sl 0:14 /opt/cryosparc2/cryosparc2_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js

What is the output of the command

ls -l /data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/

?

ls: cannot access ‘/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/’: No such file or directory

This indicates that the entire project directory has been moved or deleted. Can you please confirm whether the directory was deleted on purpose?

The project directory was deleted to clean up space, and now we are having trouble. Please advise how to troubleshoot this.

Deletion of CryoSPARC project directories that are attached to a CryoSPARC instance disrupts CryoSPARC function (details).
Please see Killing or removing a project for a similar problem and its resolution.

Hey, I was able to delete those jobs!

Do I need to run this as well before exiting the cli?
db.projects.update_one({'uid': 'P2'}, {'$set': {'deleted': True}})

I would recommend running this command for projects whose directories have been deleted. Please ensure you are specifying the correct project uid for the command.
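For context, a minimal sketch of how that update might be run, assuming it is issued from the interactive shell opened with cryosparcm icli on the master node (which exposes a db handle to the CryoSPARC MongoDB), and using P2 only as a placeholder project uid:

cryosparcm icli
# inside the shell: mark the project whose directory was deleted as deleted
db.projects.update_one({'uid': 'P2'}, {'$set': {'deleted': True}})
exit()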

Yeah, I checked and those jobs are now deleted. Thank you for your prompt support. Appreciate the same.