Smoke-testing the notification pipeline against real SLURM¶
Manual procedure for verifying ActiveWatchPoller → DeliveryPoller →
SlackWebhookDeliveryAdapter end-to-end on a real SLURM cluster. Run
this before cutting a release that touches the srunx.observability.monitoring.pollers /
srunx.observability.notifications modules.
Estimated wall time: ~15 minutes.
Prerequisites¶
- An SSH profile configured for a reachable SLURM cluster (`srunx ssh profile list` shows it as current).
- A Slack Incoming Webhook URL you own. The adapter POSTs to it during the test, so use a test channel.
- Local Python env with the latest changes installed: `uv sync && uv run srunx --version`.
- This procedure assumes PR #84 (`srunx sbatch --endpoint`) and PR #85 (poller payload enrichment including `job_id` / `job_name`) are both merged. If they are not yet on `main`, skip to the "Pre-#84 fallback" note below before running step 4.
Procedure¶
1. Isolate the state DB. The pollers will write to `~/.config/srunx/srunx.observability.storage`. To keep the smoke clean, point `XDG_CONFIG_HOME` at a scratch directory before starting the server; step 6 reads the DB from `$XDG_CONFIG_HOME/srunx/`.
2. Register an endpoint, either via the Web UI (Settings → Notifications) or the API. Step 4 passes `--endpoint smoke`, so register it under that name.
3. Start the Web UI with pollers enabled. Never use `--reload` during this test: the reload guard will skip poller startup. The banner should say `● connected`.
4. Submit a quick job via the CLI with an endpoint subscription. Use `--preset all`: the `job.submitted` event is only delivered under `preset='all'` (see `srunx/notifications/presets.py`; `terminal` and `running_and_terminal` will skip the first Slack message and only deliver once the job reaches RUNNING / COMPLETED):

       uv run srunx sbatch --wrap "bash -c 'echo hi && sleep 10'" \
         --name smoke --endpoint smoke --preset all

   Verify in the server logs that `Starting 2 background poller(s)` appeared at startup.
Pre-#84 fallback. If PR #84 hasn't merged yet, create the
watch + subscription manually instead of relying on `--endpoint`:
# 4a. Submit with the legacy --slack flag (uses SLACK_WEBHOOK_URL env)
SLACK_WEBHOOK_URL='<YOUR_URL>' uv run srunx sbatch \
  --wrap "bash -c 'echo hi && sleep 10'" --name smoke --slack
# 4b. Grab the returned SLURM job id and create a watch+sub:
curl -s http://127.0.0.1:8000/api/watches \
-H 'content-type: application/json' \
-d '{"kind":"job","target_ref":"job:<JOB_ID>"}'
# Then POST /api/subscriptions with the watch_id + endpoint_id + "all".
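The last comment in step 4b leaves the subscription request unspecified. The following is a minimal sketch of both requests using only the Python standard library. The watch body is copied from the `curl` call above; the subscription field names (`watch_id`, `endpoint_id`, `preset`) are assumptions inferred from the comment, not a confirmed schema, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # same host/port as the curl call in step 4b


def json_post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the local server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def watch_request(job_id: int) -> urllib.request.Request:
    # Body taken verbatim from step 4b.
    return json_post("/api/watches", {"kind": "job", "target_ref": f"job:{job_id}"})


def subscription_request(watch_id: int, endpoint_id: int) -> urllib.request.Request:
    # ASSUMPTION: these field names are guesses based on the step 4b comment;
    # check the API schema before relying on them.
    return json_post(
        "/api/subscriptions",
        {"watch_id": watch_id, "endpoint_id": endpoint_id, "preset": "all"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server from step 3 running, `send(watch_request(<JOB_ID>))` would create the watch. The shape of the response, including where the new watch id appears, is not documented here, so inspect it before building the subscription request.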
5. Expected Slack messages (chronological, typically within ~30 s each) are listed in the table below. The `job_id` / `job_name` keys are only present once PR #85 has landed; before that, the Slack adapter falls back to parsing the id out of `source_ref` and renders `Name: ?`.
| Event | Slack block | Payload keys (post-#85) |
|---|---|---|
| `job.submitted` | `*Job submitted*` | `job_id`, `job_name` |
| `job.status_changed` | `PENDING → *RUNNING*` | `job_id`, `job_name`, `from_status`, `to_status`, `started_at`, `completed_at` |
| `job.status_changed` | `RUNNING → *COMPLETED*` | same |
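If you capture an event payload during the test, the key sets in the table above can be checked mechanically. This is a small sketch; `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and how the payload is captured is left to you.

```python
# Key sets copied from the "Payload keys (post-#85)" column.
STATUS_KEYS = {
    "job_id", "job_name", "from_status", "to_status", "started_at", "completed_at",
}
EXPECTED_KEYS = {
    "job.submitted": {"job_id", "job_name"},
    "job.status_changed": STATUS_KEYS,
}


def missing_keys(event: str, payload: dict) -> set:
    """Return the expected keys that are absent from a captured payload."""
    return EXPECTED_KEYS[event] - set(payload)
```

On a build without PR #85, `job_id` and `job_name` would be reported as missing, which matches the fallback behaviour described above.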
The Notifications Center (at http://127.0.0.1:8000/notifications)
walks through three states during this test — it should look like:
=== "Empty"

=== "Populated"

=== "Filter: delivered"

6. Verify persistence. Before inspecting, stop the web server (Ctrl+C) and reopen the DB directly:

       sqlite3 "$XDG_CONFIG_HOME/srunx/srunx.observability.storage"
       sqlite> SELECT id, status, attempt_count, delivered_at FROM deliveries;
       sqlite> SELECT from_status, to_status, source FROM job_state_transitions;

   All deliveries should be `delivered`, and the transition log should contain exactly one `(None, PENDING, source='webhook')` row followed by the poller-sourced transitions.
7. Reload guard check. Restart the server with `--reload`. The server log should say `Background pollers disabled (reload mode or SRUNX_DISABLE_POLLER=1).`
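The two checks in step 6 (every delivery is `delivered`, and exactly one webhook-sourced seed row exists) can be scripted instead of read by eye. This sketch uses only the table and column names from the step 6 queries. It assumes the `None` shown there is stored as SQL `NULL`, and that the DB is the isolated one from step 1 with a single smoke job in it. `persistence_problems` is a hypothetical helper name.

```python
import sqlite3


def persistence_problems(conn: sqlite3.Connection) -> list:
    """Return human-readable problems; an empty list means step 6 passes."""
    problems = []

    # Every delivery row should have reached the 'delivered' status.
    rows = conn.execute(
        "SELECT id, status, attempt_count FROM deliveries "
        "WHERE status != 'delivered'"
    ).fetchall()
    for delivery_id, status, attempts in rows:
        problems.append(
            f"delivery {delivery_id} is {status!r} after {attempts} attempt(s)"
        )

    # Exactly one seed transition is expected.
    # ASSUMPTION: the "None" in the doc is stored as SQL NULL in from_status.
    (seed_count,) = conn.execute(
        "SELECT COUNT(*) FROM job_state_transitions "
        "WHERE from_status IS NULL AND to_status = 'PENDING' "
        "AND source = 'webhook'"
    ).fetchone()
    if seed_count != 1:
        problems.append(f"expected 1 seed transition, found {seed_count}")

    return problems
```

After stopping the server, open the file with `sqlite3.connect(os.path.expandvars("$XDG_CONFIG_HOME/srunx/srunx.observability.storage"))` and pass the connection in. The ordering of the poller-sourced transitions is not checked here.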
Troubleshooting¶
- Slack says "no_active_hooks": the webhook URL is malformed or the app was removed from the workspace.
- `job.status_changed` event never fires: check that the seed `job_state_transitions` row is present for the job (step 6 query). Without it, the poller treats its first observation as "unknown prior state" and skips the event on purpose.
- Stuck pending deliveries: `SELECT * FROM deliveries WHERE status='pending' AND next_attempt_at <= strftime('%Y-%m-%dT%H:%M:%fZ','now');` should return 0 rows after a minute. If not, the DeliveryPoller may be disabled; check `SRUNX_DISABLE_DELIVERY_POLLER`.
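The "should return 0 rows after a minute" check can be turned into a small wait loop. The sketch below reuses the stuck-deliveries condition from the entry above (selecting only `id`) and polls until no overdue pending rows remain or a timeout passes. The function names, the 60 s timeout and the 5 s interval are illustrative choices, not values taken from srunx.

```python
import sqlite3
import time

# Same WHERE clause as the troubleshooting query above.
STUCK_QUERY = """
SELECT id FROM deliveries
WHERE status = 'pending'
  AND next_attempt_at <= strftime('%Y-%m-%dT%H:%M:%fZ', 'now')
"""


def stuck_delivery_ids(conn: sqlite3.Connection) -> list:
    """Return ids of pending deliveries whose next attempt is already due."""
    return [row[0] for row in conn.execute(STUCK_QUERY)]


def wait_until_drained(conn: sqlite3.Connection,
                       timeout_s: float = 60.0,
                       interval_s: float = 5.0) -> bool:
    """Poll until no overdue pending deliveries remain; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not stuck_delivery_ids(conn):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Because the server has to keep running for deliveries to drain, one cautious option is to open the DB read-only, for example `sqlite3.connect(f"file:{path}?mode=ro", uri=True)`, so the check cannot interfere with the pollers' writes.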
