srunx documentation
srunx is a Python library for managing SLURM jobs and workflows. It provides a command-line interface and a Python API for submitting, monitoring, and orchestrating computational jobs on HPC clusters.
Features
- Simple Job Submission: Submit jobs with an intuitive command-line interface
- Resource Management: Fine-grained control over compute resources
- Environment Support: Conda, virtual environments, and containers (Apptainer/Singularity, Pyxis)
- Workflow Orchestration: YAML-based workflow definitions with dependency management
- Monitoring and Callbacks: Real-time job monitoring with notification support
- Project Sync: rsync-based synchronization of project directories to remote SLURM servers
- Template System: Flexible SLURM script generation with Jinja2 templates
- Web UI: Browser-based dashboard for job management, resource monitoring, and workflow DAG visualization
Quick Example
Submit a simple job:
srunx submit python train.py --gpus-per-node 2 --conda ml_env
Submit with an Apptainer container:
srunx submit python train.py --container "runtime=apptainer,image=pytorch.sif,nv=true"
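The `--container` value above is written as a comma-separated list of `key=value` pairs. As a rough illustration of that format only, the sketch below splits such a string into a dictionary. The helper name `parse_key_value_options` is hypothetical and this is not srunx's own parser, which may handle types, quoting, or validation differently.

```python
def parse_key_value_options(spec: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'} (illustrative helper)."""
    options: dict[str, str] = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        # Reject entries such as "nv" or "=true" that are not key=value pairs.
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        options[key.strip()] = value.strip()
    return options


print(parse_key_value_options("runtime=apptainer,image=pytorch.sif,nv=true"))
# {'runtime': 'apptainer', 'image': 'pytorch.sif', 'nv': 'true'}
```

All values stay strings in this sketch, so `nv` is the text `"true"` rather than a boolean.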
Define a workflow:
name: ml_pipeline
jobs:
  - name: preprocess
    command: ["python", "preprocess.py"]
    resources:
      nodes: 1
  - name: train
    command: ["python", "train.py"]
    depends_on: [preprocess]
    resources:
      gpus_per_node: 1
      memory_per_node: "32GB"
      time_limit: "8:00:00"
    environment:
      conda: ml_env
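The `depends_on` field turns the job list into a dependency graph: `train` cannot start until `preprocess` has finished. As a rough illustration of that ordering (not srunx's own scheduler), the sketch below mirrors the workflow as a plain Python dictionary and derives a valid run order with the standard library's `graphlib`. The `workflow` dictionary and the `execution_order` helper are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory mirror of the ml_pipeline workflow definition.
workflow = {
    "name": "ml_pipeline",
    "jobs": [
        {"name": "preprocess", "command": ["python", "preprocess.py"]},
        {
            "name": "train",
            "command": ["python", "train.py"],
            "depends_on": ["preprocess"],
        },
    ],
}


def execution_order(workflow: dict) -> list[str]:
    """Return job names in an order that respects every depends_on entry."""
    # Map each job to the set of jobs that must finish before it starts.
    graph = {job["name"]: set(job.get("depends_on", [])) for job in workflow["jobs"]}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(workflow))
# ['preprocess', 'train']
```

A topological sort is one common way to schedule a DAG of jobs; it also surfaces circular dependencies early, before anything is submitted to the cluster.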
Tutorials
- Installation
- Quick Start
- Web UI Setup
  - Prerequisites
  - Step 1: Install Web Dependencies
  - Step 2: Configure SSH Connection
  - Step 3: Start the Server
  - Step 4: Explore the Dashboard
  - Step 5: View Jobs
  - Step 6: Upload a Workflow
  - Step 7: Build a Workflow Visually
  - Step 8: Configure Settings
  - Step 9: Browse Files with the Explorer
  - Step 10: Set Up Mount Points
  - Step 11: Run a Workflow
  - Next Steps
How-to Guides
- User Guide
- Workflows
- Job and Resource Monitoring
- Project Synchronization
- Web UI How-to Guide
  - Connect to a Different Cluster
  - Change the Server Port
  - Monitor GPU Resources
  - Cancel a Running Job
  - View Job Logs
  - Upload and Visualize a Workflow
  - Build a Workflow with the DAG Builder
  - Edit Job Properties
  - Use the File Browser for Remote Paths
  - Manage Mount Points
  - Sync Files Before Running
  - Change Dependency Types
  - Run a Workflow
  - Cancel a Running Workflow
  - Edit an Existing Workflow
  - Delete a Workflow
  - Manage Mounts from the Web UI
  - View Job Logs from a Workflow Run
  - Run Without SSH (Frontend Only)
  - Develop the Frontend
  - Run Tests
- Settings
- File Explorer
Reference