"Python for cryo-EM" (continued discussion, breakout from subparticle defocus)

Continuing the discussion from Adjust per-particle defocus of subparticles in volume alignment tools?:

Yes, I know Python’s a sticking point for sure, and I totally understand, having been there myself! I think the idea with cs-tools is to let “power users” work with features the minute they think of them, but unfortunately the Venn diagram of advanced cryo-EM practitioners and Python programmers is not a circle!

On my list is a guide page along the lines of “Python for cryo-EM”. If you have collected specific sticking points from trainees (i.e., beyond “the command line is frightening”, which I don’t in any way mean to downplay), I’d be very interested in hearing about them (although perhaps in their own forum topic :slightly_smiling_face:)!

Cheers, @rposert! :smiley:

Some manner of GUI integration would be ideal; I baulk at the idea of a browser interface for writing the code itself, which would be both a massive security risk and an epic disaster waiting to happen. But a UI where scripts could be run (with the usual drag-n-drop assignment for Project/Job to be worked on) would make access a lot easier.

Although cs-tools would probably have to be shipped directly with CryoSPARC to prevent version mismatches?

Otherwise, in a more general fashion: feedback from trainees used to focus on how complicated the RELION UI was, but since we experimented with a dual-suite training course for academics (both RELION and CryoSPARC, on different datasets), I’ve heard several times that the CryoSPARC interface is overwhelming because of the sheer array of options.

Disabling “Advanced” mode hides most things, but as a result seriously limits the available options. It might be nice to have a more granular choice (at a facility level) of what is considered “Advanced” and what is considered “Basic”, as there are many options in Advanced mode which rarely, if ever, need adjusting (or which make jobs fail if you do adjust them*). A management page where you could tick a box or select a radio button for whether each parameter of a job should be considered “Basic” or “Advanced”, maybe?

*A good example here is the “GPU/CPU” option in Local Filtering - it says there are two options, but you should leave it at GPU… yet running on CPU would be nice when the job runs out of GPU memory (Local Filtering appears to have no “low memory mode” like NU Refine/Local Refine)… but if you manually set CPU, the job ignores the setting, runs on the GPU anyway, and crashes:

Traceback (most recent call last):
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 851, in _attempt_allocation
    return allocator()
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 243, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.run_locfilter
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 292, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.standalone_locfilter
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 333, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.standalone_locfilter
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 276, in zeros
    arr = empty(shape, dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 270, in empty
    return device_array(shape, dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 226, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 21, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1372, in memalloc
    return self.memory_manager.memalloc(bytesize)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1056, in memalloc
    ptr = self._attempt_allocation(allocator)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 863, in _attempt_allocation
    return allocator()
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

I’ll post the last bit as a separate thread as well; here I just use it as an example.



Thank you very much for your feedback and example use case.

One thing that I would find handy for record-keeping purposes would be the ability to attach a copy of the script, as executed, to any External Results job it creates.

That way, if I am running many different variants of a script, I can keep track of exactly which bit of code created which external job - this is currently hard to do.


Hi @olibclarke! Did you know that, for Python scripts (not notebooks, unfortunately), sys.argv[0] stores the path to the executed Python script? So you can add

import sys

# sys.argv[0] holds the path of the script being executed, so reading
# that file back and logging it puts the full source in the event log
with open(sys.argv[0]) as f:
    job.log("".join(f))

to the end of your script to put the whole thing in the log :slight_smile:


Hi @rposert,

Handy tip! But that assumes that the file remains unchanged at that path - whereas often I will be editing & rerunning the same script. For record keeping purposes I think it would be better to archive a copy of the actual script - it doesn’t take up a lot of space, & provides an immutable record of what was done to generate the external result :blush:

Oh wait - I just read that again - so this basically prints the whole script to the log; that is very useful! Sorry, I thought it just printed the path. Any way to print it with syntax coloring for readability? Does the log take markdown formatting?

Oli

Sorry, I think I may not have been clear: the block above prints the entire script to the log! Here’s a screenshot from the cs log showing the first few lines:

[screenshot of the Event Log not reproduced here]


Yes, got it - sorry, I realized that while I was writing. Thanks!! :pray:


I tried this but couldn’t get it to work - I put it in my script and the script ran fine, but the contents were not appended to the log.

I still think it would be useful for record keeping purposes if external jobs were to automatically document the contents of the script used to generate them, or at least automatically append the file name of the script - currently the default log file is pretty sparse.

Curious! Looking back, I’m not sure why I recommended sys.argv[0] instead of __file__. Could you try this minimal example and see if it works?

#!/usr/bin/env python

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path

def main():

    with open(Path('~/instance-info.json').expanduser(), 'r') as f:
        instance_info = json.load(f)

    cs = CryoSPARC(**instance_info)
    assert cs.test_connection()

    project_uid = "P337"
    workspace_uid = "W15"

    project = cs.find_project(project_uid)

    ext_job = project.create_external_job(workspace_uid, title = "script_test")
    with ext_job.run():
        ext_job.log("Trying to log script now...")
        with open(__file__, "r") as script:
            ext_job.log("".join(script))

if __name__ == "__main__":
    main()

When I run this script (either with ./add_script_text.py or python add_script_text.py) it produces an external job with the following Event Log:

License is valid.

Trying to log script now...

#!/usr/bin/env python

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path

def main():

    with open(Path('~/instance-info.json').expanduser(), 'r') as f:
        instance_info = json.load(f)

    cs = CryoSPARC(**instance_info)
    assert cs.test_connection()

    project_uid = "P337"
    workspace_uid = "W15"

    project = cs.find_project(project_uid)

    ext_job = project.create_external_job(workspace_uid, title = "script_test")
    with ext_job.run():
        ext_job.log("Trying to log script now...")
        with open(__file__, "r") as script:
            ext_job.log("".join(script))

if __name__ == "__main__":
    main()

If that works as expected, maybe try something like my previous recommendation, but with __file__ instead of sys.argv[0].
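For reference, that would be the same three-line snippet with __file__ swapped in (a minimal sketch; as before, job is assumed to be your external job object):

# __file__ is the path of the current script (undefined in notebooks)
with open(__file__) as f:
    job.log("".join(f))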

As for your request, unfortunately I’m not sure we could implement such a feature. There are many ways of interfacing with CryoSPARC Tools which don’t have an obvious means of logging themselves (most notably Jupyter Notebooks).
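That said, one rough workaround for notebooks (a sketch, not an official feature): IPython keeps the source of every executed cell in the built-in In list, so a notebook can at least log its own history, in execution order rather than as a single file:

# inside Jupyter/IPython, where __file__ is undefined; In[0] is an
# empty string and later entries hold each executed cell's source
ext_job.log("\n\n".join(In))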


That worked, thanks @rposert! Understood re difficulty of implementation.

Is there any way to add syntax coloring to code blocks in the log file?


A slight modification to also print the full path (so I can find the script later):

import os
# script contents
# goes here
full_path = os.path.abspath(__file__)
out_job.log(f"Job created using: {full_path}")
with out_job.run():
    out_job.log("Trying to log script now...")
    with open(__file__, "r") as script:
        out_job.log("".join(script))
out_job.stop()

Unfortunately, no way to add syntax highlighting right now as the log does not support rich text. A highly motivated user might consider writing a browser extension to add styling to user-selected text, but I am not such a user :wink:.

As an aside, when a job is run within a context manager (i.e., when you use with job.run(): ), you should not need to stop() it yourself :slight_smile:. Relevant docs here.
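In other words, the snippet above can drop the final call (a minimal sketch of the same logic):

import os

with out_job.run():
    out_job.log(f"Job created using: {os.path.abspath(__file__)}")
    with open(__file__, "r") as script:
        out_job.log("".join(script))
# no out_job.stop() needed: exiting the with-block marks the job
# completed (or failed, if an exception was raised)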


One thing I considered when I was worried we wouldn’t be able to log the script was creating a copy of script in the job’s directory (example below). You could then open this script with your editor of choice to get syntax highlighting.

#!/usr/bin/env python

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path
import shutil

def main():

    with open(Path('~/instance-info.json').expanduser(), 'r') as f:
        instance_info = json.load(f)

    cs = CryoSPARC(**instance_info)
    assert cs.test_connection()

    project_uid = "P337"
    workspace_uid = "W15"

    project = cs.find_project(project_uid)

    ext_job = project.create_external_job(workspace_uid, title = "script_test")
    with ext_job.run():
        job_dir = ext_job.dir()
        script_path = Path(__file__)
        copied_script_path = job_dir / script_path.name
        shutil.copy(script_path, copied_script_path)
        ext_job.log(f"Copied script to {copied_script_path}")

if __name__ == "__main__":
    main()


On this general topic - there are some great scripts in your GitHub, Rich (GitHub - cryoem-uoft/cryosparc-examples: Example scripts, notebooks, and code snippets that are helpful for CryoSPARC users!), as well as lots of others you have posted on the forum - I wonder if it would be worth linking this from the dashboard, and perhaps expanding it, to make an easily accessible, updated/curated repository of cryosparc-tools scripts for users to reuse/modify? Perhaps with the option for user-submitted scripts as well?

Thanks for your kind words @olibclarke! We’ve noted your feature request for adding a link to the dashboard.


On the topic of streamlining cs-tools workflows to make them more accessible, would it be possible to auto-generate a Jupyter notebook from the GUI that loads the metadata for a job? It would be really nice to have a one-click option (like the one for job export) to automatically generate and launch a notebook with the following code:

# look up the project and job by UID, then load the job's output
project = cs.find_project("P251")
job = cs.find_job("P251", "J16")
particles = job.load_output("particles_selected")

It would also be helpful if the notebook came pre-loaded with the license information so that new users don’t have to hunt for it or copy-paste it from a previous notebook.
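For reference, the connection boilerplate such a notebook would need to generate looks roughly like this (a sketch assuming the credentials live in ~/instance-info.json, as in the examples earlier in this thread):

import json
from pathlib import Path
from cryosparc.tools import CryoSPARC

# read host/port/license/email/password from a JSON file so they never
# have to be pasted into individual notebooks
with open(Path("~/instance-info.json").expanduser()) as f:
    cs = CryoSPARC(**json.load(f))
assert cs.test_connection()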

I currently keep my cs-tools scripts in the same directory as the cryoSPARC project in a sub-folder called cs-tools, so I wonder if it would also be appropriate to save these auto-generated notebooks in a similar folder.
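For anyone scripting that convention in the meantime, a rough sketch, assuming a cryosparc-tools project object as in the earlier examples:

from pathlib import Path
import shutil

# copy the running script into a "cs-tools" sub-folder of the project
# directory; project.dir() returns the project directory's path
tools_dir = Path(project.dir()) / "cs-tools"
tools_dir.mkdir(exist_ok=True)
shutil.copy(__file__, tools_dir / Path(__file__).name)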

Cheers,
cbeck


Thanks for the suggestion @cbeck , we’ve made note of it for a future release!

In terms of making cs-tools more accessible - would it be at all possible (in the future) to make some kind of GUI template for running cs-tools scripts? I’m thinking something in the format of running a job, where one could provide values for key parameters and then run the script from the regular CryoSPARC GUI. If possible, this would significantly reduce the friction of reusing commonly used scripts.

Alternatively, perhaps a set of vetted scripts could be provided as “custom” jobs, similar to blueprints, where parameters could be provided in the GUI? Perhaps with a repository of scripts, similar to the way plugins work for ChimeraX?

Cheers
Oli
