# Quick Start
This guide will help you get started with srunx quickly.
## Basic Job Submission
Submit a simple Python script:
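A minimal submission, using the `sbatch --wrap` form shown elsewhere in this guide:

```shell
srunx sbatch --wrap "python script.py"
```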
Submit with specific resources:
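The flag names below mirror the resource keys used in the workflow YAML (`nodes`, `gpus_per_node`, `memory_per_node`, `time_limit`) and are illustrative; check `srunx sbatch --help` for the exact spelling:

```shell
srunx sbatch --wrap "python train.py" \
  --nodes 1 \
  --gpus-per-node 1 \
  --memory-per-node 32GB \
  --time-limit 4:00:00
```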
Submit with conda environment:
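Using the `--conda` flag shown later in this guide:

```shell
srunx sbatch --wrap "python train.py" --conda ml_env
```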
## Job Management
Check a job's current status (active queue only):
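A sketch, assuming a `status` subcommand; the job ID is illustrative:

```shell
srunx status 12345
```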
Check historical status (finished jobs too, via the srunx state DB):
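The flag name here is an assumption; consult `srunx status --help` for the exact option that queries the state DB:

```shell
# Also works for jobs that have already left the Slurm queue
srunx status 12345 --history
```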
List active jobs (all users by default, like native squeue):
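Assuming a `list` subcommand (name illustrative):

```shell
srunx list
```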
Cancel a job:
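A sketch with an illustrative job ID (subcommand name assumed):

```shell
srunx cancel 12345
```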
## Workflow Example
Create a workflow YAML file (workflow.yaml):
```yaml
name: ml_pipeline
jobs:
  - name: preprocess
    command: ["python", "preprocess.py"]
    resources:
      nodes: 1
  - name: train
    command: ["python", "train.py"]
    depends_on: [preprocess]
    resources:
      gpus_per_node: 1
      memory_per_node: "32GB"
      time_limit: "4:00:00"
    environment:
      conda: ml_env
  - name: evaluate
    command: ["python", "evaluate.py"]
    depends_on: [train]
```
Run the workflow:
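Assuming a `flow run` subcommand (check `srunx --help` for the exact name):

```shell
srunx flow run workflow.yaml
```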
Validate a workflow:
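Assuming a matching `flow validate` subcommand (name illustrative), which checks the YAML without submitting anything:

```shell
srunx flow validate workflow.yaml
```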
## Environment Setup
srunx supports multiple environment types:
### Conda Environment
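Activate a named conda environment with the `--conda` flag shown in the container example below:

```shell
srunx sbatch --wrap "python script.py" --conda ml_env
```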
### Python Virtual Environment
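Assuming a `--venv` flag symmetrical with `--conda` (name illustrative), pointing at an existing virtual environment:

```shell
srunx sbatch --wrap "python script.py" --venv /path/to/.venv
```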
### Container (Pyxis)
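A sketch assuming Pyxis is the default container runtime, so `--container` can take an image directly; the squashfs image path is a placeholder:

```shell
srunx sbatch --wrap "python script.py" \
  --container /path/to/image.sqsh
```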
### Apptainer / Singularity Container
```shell
srunx sbatch --wrap "python script.py" \
  --container "runtime=apptainer,image=/path/to/image.sif,nv=true"
```
Or specify the runtime separately:
```shell
srunx sbatch --wrap "python script.py" \
  --container /path/to/image.sif \
  --container-runtime apptainer
```
### Conda Inside a Container
Containers can be combined with conda or venv:
```shell
srunx sbatch --wrap "python script.py" \
  --container "runtime=apptainer,image=pytorch.sif,nv=true,bind=/data:/data" \
  --conda ml_env
```
## Parameter Sweep
So far you have run a single workflow. In this last part of the tutorial you will run the same workflow several times with different parameters -- a parameter sweep -- without copying YAML files.
You will:
- Write a tiny workflow that just echoes the parameters it received.
- Launch a sweep over three values from the command line.
- Watch the three cells run and inspect the aggregated result.
The example uses only echo, so no GPU, conda, or cluster-specific setup
is required.
### 1. Write the workflow
Save the following as sweep_demo.yaml:
```yaml
name: sweep_demo
args:
  seed: 1
jobs:
  - name: echo
    command: ["bash", "-lc", "echo 'seed={{ seed }}'"]
```
Notice the {{ seed }} placeholder. The workflow already runs on its own
(it will just use seed=1, the default), but it is ready to be swept.
### 2. Launch the sweep
Ask srunx to run the workflow three times, once per seed, with at most two cells running at the same time:
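One plausible invocation; the sweep and concurrency flag names are assumptions, and the Parameter Sweeps how-to guide documents the exact syntax:

```shell
srunx flow run sweep_demo.yaml \
  --sweep seed=1,2,3 \
  --max-concurrency 2
```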
srunx expands the matrix at load time into three independent cells,
each with its own seed value. The command prints a sweep ID and the IDs
of the three child workflow runs, then streams their progress.
### 3. Observe the cells
While the sweep is running, list your jobs in another terminal:
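For example (subcommand name assumed, as above):

```shell
srunx list
```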
You should see up to two echo jobs in RUNNING state at a time, with
the third one queued until a slot frees up. When everything is done, the
sweep converges to completed and each cell reports its own result.
Because fail_fast defaults to false, one misbehaving cell would not
cancel the others -- the sweep would simply end with a mix of
completed and failed cells.
### 4. You made it work
You just ran the same workflow three times under a single sweep parent, with automatic concurrency control. From here:
- To re-run only the cells that failed, or to learn the full sweep surface (ad-hoc overrides, dry-run previews, Web UI / MCP sweeps), read the how-to guide: Parameter Sweeps.
- To drive sweeps from an AI agent via Claude Code, see the MCP tools reference.
## Next Steps
- Read the User Guide for detailed usage instructions
- Check the API Reference for programmatic usage
- Explore Workflows for complex job orchestration