Kill/delete a job that is no longer in the PBS/Slurm scheduler job list

I would like to delete/kill some jobs that had issues with the scheduler. The jobs are no longer in the scheduler's job list and therefore cannot be killed/deleted.

I get

Command ['qdel', '4406.ims'] returned non-zero exit status 153 (as the job no longer exists on the scheduler side)

Can I delete those jobs directly from MongoDB? Can I delete all resources associated with a user as well?

Thank you for any info.

Best,

JC

Hi @jcducom,

This seems to be a bug: the kill command shouldn’t fail if the cluster command fails. We’ve recorded this issue and we’ll update you when it’s fixed. For the time being, you can manually “kill” the job by running the command:
cryosparcm cli "set_job_status('<project_uid>', '<job_uid>', 'killed')"


Thank you so much for your quick reply, and sorry for my late reply! It did fix the issue.
Thanks again
JC

@stephan I am having a very similar issue on an LSF cluster. I tried to run the command you gave above and received the following error:

[stae8w@bmi-r740-05 bin]$ /usr/local/cryosparc/cryosparc_master/bin/cryosparcm cli "set_job_status('', '', 'killed')"
Traceback (most recent call last):
File "/usr/local/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 89, in <module>
print(eval("cli."+command))
File "<string>", line 1, in <module>
File "/usr/local/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 62, in func
assert False, res['error']
AssertionError: {'code': 500, 'data': None, 'message': "OtherError: argument of type 'NoneType' is not iterable", 'name': 'OtherError'}

Here is a screenshot of what happens when I try to kill the job.

It looks like you forgot to enter the Project ID and Job ID in the command:
cryosparcm cli "set_job_status('<project_uid>', '<job_uid>', 'killed')"

Yes, I saw that after posting and forgot to come back here. Total noob mistake!

Hey @stephan, our instance has 3 jobs that are always showing: one as Launched with 4 tokens and the other 2 as queued. When I use the command suggested above, I get the following error:

cryosparcm cli "set_job_status('P13', 'J3', 'killed')"
Error for "set_job_status" with params ('P13', 'J3', 'killed'):
ServerError: validation error: lock file for P13 at '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock' absent or otherwise inaccessible.
Traceback (most recent call last):
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 195, in wrapper
res = func(*args, **kwargs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 246, in wrapper
assert os.path.isfile(
AssertionError: validation error: lock file for P13 at '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock' absent or otherwise inaccessible.

This file was deleted by my lab member. Please advise how to rectify this.

Also, when I run the command ps -ax | grep cryosparc, I get the following output:
ps -ax | grep cryosparc
8488 pts/26 S+ 0:00 grep --color=auto cryosparc
28237 ? Ss 0:00 python /opt/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /opt/cryosparc2/cryosparc2_master/supervisord.conf
28719 ? Sl 0:12 mongod --auth --dbpath /data/cryosparc2_database --port 38001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
29098 ? Sl 8:38 python -c import cryosparc_command.command_core as serv; serv.start(port=38002)
29532 ? Sl 0:09 python -c import cryosparc_command.command_vis as serv; serv.start(port=38003)
29578 ? Sl 0:09 python -c import cryosparc_command.command_rtp as serv; serv.start(port=38005)
31150 ? Sl 0:08 /opt/cryosparc2/cryosparc2_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js

Logged on to the CryoSPARC master host, using the Linux account that owns and runs the CryoSPARC software, you may try
cryosparcm cli "take_over_project('P13')"
(details).
You may also want to ensure that all users with access to project directories understand the purpose of cs.lock files (guide).
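For reference, CryoSPARC expects the cs.lock file at the top level of the project directory. A quick way to confirm whether it is present (using the path from the error message above as an example; adjust to your own project directory) is:
ls -l /data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock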

Hi
I tried as you suggested; however, the entire file was deleted, and I get the following output:

cryosparcm cli "take_over_project('P13')"
Error for "take_over_project" with params ('P13',):
ServerError: [Errno 2] No such file or directory: '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock'
Traceback (most recent call last):
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 195, in wrapper
res = func(*args, **kwargs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 8479, in take_over_project
write_lockfile(lockfile_path_abs)
File "/opt/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 8517, in write_lockfile
with open(lockfile_path_abs, 'w') as lockfile:
FileNotFoundError: [Errno 2] No such file or directory: '/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/cs.lock'

Also, at present none of the GPUs are available for the next job.

When I run the command ps -ax | grep cryosparc, this is the output:

ps -ax | grep cryosparc
16753 pts/26 S+ 0:00 grep --color=auto cryosparc
22176 ? Ss 0:00 python /opt/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /opt/cryosparc2/cryosparc2_master/supervisord.conf
22521 ? Sl 0:09 mongod --auth --dbpath /data/cryosparc2_database --port 38001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
22934 ? Sl 4:49 python -c import cryosparc_command.command_core as serv; serv.start(port=38002)
23222 ? Sl 0:09 python -c import cryosparc_command.command_vis as serv; serv.start(port=38003)
23229 ? Sl 0:07 python -c import cryosparc_command.command_rtp as serv; serv.start(port=38005)
24591 ? Sl 0:14 /opt/cryosparc2/cryosparc2_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js

What is the output of the command

ls -l /data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/

?

ls: cannot access ‘/data/arbit/20221108_yyyy-ccc-xxx/CS-yyyy-xxx-sp26/’: No such file or directory

This indicates that the entire project directory has been moved or deleted. Can you please confirm whether the directory was deleted on purpose?

The project directory was deleted to clean up space, and now we are having trouble. Please advise how to troubleshoot this.

Deletion of CryoSPARC project directories that are attached to a CryoSPARC instance disrupts CryoSPARC function (details).
Please see Killing or removing a project for a similar problem and its resolution.

Hey, I was able to delete those jobs!

Do I need to run this as well before exiting the cli?
db.projects.update_one({'uid': 'P2'}, {'$set': {'deleted': True}})

I would recommend running this command for projects whose directories have been deleted. Please ensure you are specifying the correct project uid for the command.
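For context, a minimal sketch of how that update might be run, assuming it is issued from the interactive shell opened with cryosparcm icli on the master node (which exposes a db handle to the CryoSPARC MongoDB), and using P2 only as a placeholder project uid:

cryosparcm icli
# inside the shell: mark the project whose directory was deleted as deleted
db.projects.update_one({'uid': 'P2'}, {'$set': {'deleted': True}})
exit()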

Yeah, I checked and those jobs are now deleted. Thank you for your prompt support. Appreciate the same.