Scheduler

A cron-like rule engine embedded in the crew dashboard for periodic tasks and agent lifecycle automation. Rules are declarative, one per line, stored in a plain text file per crew.

When To Use It

  • Keep a specific agent alive automatically (re-summon on crash or release)
  • Send periodic messages (daily narrative prompts, status checks)
  • Release idle agents after a timeout
  • Run shell commands on a schedule (test suites, maintenance scripts)
  • Replace system crontab entries for crew-level operations

Quick Start

Create a rules file at .crew/scheduler.rules in the crew directory:

[keep-forge] every 1m if not alive(forge) => summon forge
[daily-report] at 0 0 * * * => msg beacon "Write daily narrative for today"

The dashboard picks up the file automatically on the next tick (within 10 seconds). No restart required.

Verify with a dry run:

$ metateam crew scheduler test

Rule Format

[<name>] <trigger> [<condition>] => <action>
  • Name (optional): enclosed in [brackets], must be unique. If omitted, an auto-generated name is derived from the trigger and action.
  • Trigger: when the rule fires (interval or cron).
  • Condition (optional): a predicate that must be true for the action to execute.
  • Action: what to do when the rule fires.
  • The trigger (with its optional condition) and the action are separated by => with a space on each side.
  • Empty lines and # comments are ignored.
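
The rule format above can be illustrated with a small parser. This is only a sketch: the regular expression, the Rule type, and parse_rule are hypothetical names, and the real parser may accept or reject inputs differently. It assumes conditions are introduced by the keyword if, as in the examples in this document.

```python
import re
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    name: Optional[str]       # None when the [name] prefix is omitted
    trigger: str              # "every ..." or "at ..."
    condition: Optional[str]  # text after "if", or None
    action: str               # everything after " => "

# Hypothetical pattern, written only to mirror the documented format.
RULE_RE = re.compile(
    r"^(?:\[(?P<name>[^\]]+)\]\s+)?"    # optional [name]
    r"(?P<trigger>(?:every|at)\s+.+?)"  # trigger, matched lazily
    r"(?:\s+if\s+(?P<condition>.+?))?"  # optional "if <condition>"
    r"\s+=>\s+(?P<action>.+)$"          # " => " followed by the action
)

def parse_rule(line: str) -> Optional[Rule]:
    """Parse one line; return None for empty lines and # comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    m = RULE_RE.match(text)
    if m is None:
        raise ValueError(f"cannot parse rule: {line!r}")
    return Rule(m["name"], m["trigger"], m["condition"], m["action"])
```

Calling parse_rule on the keep-forge line from Quick Start would yield the name keep-forge, the trigger every 1m, the condition not alive(forge), and the action summon forge.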

Triggers

Interval

every <duration>

Fires repeatedly at fixed intervals. Duration format: <number><unit> where unit is s (seconds), m (minutes), h (hours), or d (days).

On startup or rule creation, the first fire waits one full interval period. This prevents a storm of actions on dashboard restart.

every 30s
every 5m
every 1h
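
As a rough illustration of the duration format and the first-fire delay, the sketch below converts a duration string to seconds and computes when a freshly created interval rule would first fire. The function names are hypothetical and not part of the scheduler.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> int:
    """Convert '<number><unit>' (for example '30s' or '5m') to seconds."""
    m = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if m is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]

def first_fire(created_at: float, duration: str) -> float:
    """First fire time: one full interval after startup or rule creation."""
    return created_at + parse_duration(duration)
```

For example, a rule with every 5m created at time 1000.0 would first fire at 1300.0 under this model.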

Cron

Standard 5-field crontab expression, evaluated in the server's local timezone:

at <minute> <hour> <day-of-month> <month> <day-of-week>

Fields support *, */N, N,M, and N-M ranges.

at 0 0 * * *       # midnight daily
at */30 * * * *     # every 30 minutes
at 0 9,17 * * 1-5   # 9am and 5pm on weekdays
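
The field syntax above can be sketched as a small matcher. This is an assumption-laden illustration, not the scheduler's implementation: field_matches and cron_matches are hypothetical names, all five fields are combined with AND, and day-of-week uses 0 for Sunday. The datetime passed in is taken to be in the server's local timezone.

```python
from datetime import datetime

def field_matches(expr: str, value: int, lo: int) -> bool:
    """Check one cron field supporting *, */N, N,M lists, and N-M ranges."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # */N: every N-th value counted from the field's minimum
            if (value - lo) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            start, end = part.split("-")
            if int(start) <= value <= int(end):
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the 5-field expression matches the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(dom, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(dow, cron_dow, 0)
    )
```

With this sketch, 0 9,17 * * 1-5 matches 09:00 on a Monday and does not match 09:00 on a Saturday.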

Cron Shorthands

at 14:30            # daily at 14:30
at Sun 00:00        # weekly on Sunday at midnight
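
One way to read the shorthands is as sugar for 5-field expressions. The sketch below shows that mapping under assumptions: expand_shorthand is a hypothetical helper, and only Sun appears in this document, so the other day abbreviations are guesses.

```python
# Day abbreviations other than "Sun" are assumed, not documented.
DAYS = {"Sun": 0, "Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5, "Sat": 6}

def expand_shorthand(spec: str) -> str:
    """Expand 'HH:MM' or '<Day> HH:MM' into a 5-field cron expression."""
    parts = spec.split()
    if len(parts) == 1:
        day, hhmm = "*", parts[0]
    elif len(parts) == 2:
        day, hhmm = str(DAYS[parts[0]]), parts[1]
    else:
        raise ValueError(f"invalid shorthand: {spec!r}")
    hour, minute = hhmm.split(":")
    return f"{int(minute)} {int(hour)} * * {day}"
```

Under this reading, at 14:30 is equivalent to at 30 14 * * * and at Sun 00:00 is equivalent to at 0 0 * * 0.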

Conditions

Optional. If omitted, the action fires unconditionally on trigger. One condition per rule.

alive / not alive

if alive(<agent>)
if not alive(<agent>)

alive(<agent>) is true when the agent is present in the communicator agent registry; not alive(<agent>) is true when it is absent.

Multiple agents use AND semantics:

Expression               Meaning
alive(forge)             Forge is registered
alive(forge,beacon)      Forge AND Beacon are both registered
not alive(forge,beacon)  Forge OR Beacon is not registered
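
The AND semantics and their negation can be checked with a few lines of code. This is a minimal sketch in which the registry is modeled as a plain set of agent names; the function names are illustrative only.

```python
from typing import Iterable, Set

def alive(registry: Set[str], agents: Iterable[str]) -> bool:
    """True only if every listed agent is registered (AND semantics)."""
    return all(agent in registry for agent in agents)

def not_alive(registry: Set[str], agents: Iterable[str]) -> bool:
    """Negation of alive: true if at least one listed agent is missing."""
    return not alive(registry, agents)
```

With a registry containing only forge, alive(forge,beacon) is false and not alive(forge,beacon) is true, because Beacon is missing.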

idle

if idle(<target>) > <duration>

True when the targeted agents have had no real terminal activity for longer than the given duration.

Target           Meaning
all              Every registered agent is idle
any              At least one registered agent is idle
<name>           Specific agent is idle
<name1>,<name2>  All listed agents are idle

When an idle condition matches, two template variables become available to the action (for example in a msg text or a release target):

  • {idle} -- comma-separated names of idle agents
  • {idle_duration} -- formatted duration of the longest idle agent
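
The target semantics and the two template variables can be sketched as follows. Several details here are assumptions rather than documented behavior: the function names, the output format of format_duration, treating an empty registry as not matching all, and treating an unregistered named agent as not idle.

```python
from typing import Dict, Tuple

def format_duration(seconds: float) -> str:
    """Assumed format, e.g. '12m' or '4h10m'; the real format may differ."""
    hours, rem = divmod(int(seconds), 3600)
    minutes = rem // 60
    return f"{hours}h{minutes:02d}m" if hours else f"{minutes}m"

def evaluate_idle(target: str, threshold_s: float,
                  idle_seconds: Dict[str, float]) -> Tuple[bool, Dict[str, str]]:
    """Evaluate idle(<target>) > threshold; return (matched, template vars)."""
    over = [name for name, secs in idle_seconds.items() if secs > threshold_s]
    if target == "all":
        # Assumption: an empty registry does not count as "all idle".
        matched = bool(idle_seconds) and len(over) == len(idle_seconds)
    elif target == "any":
        matched = bool(over)
    else:
        names = target.split(",")
        matched = all(name in over for name in names)
        over = [name for name in names if name in over]
    if not matched:
        return False, {}
    longest = max(idle_seconds[name] for name in over)
    return True, {"idle": ",".join(over),
                  "idle_duration": format_duration(longest)}
```

For instance, with forge idle for 15000 seconds and beacon idle for 60 seconds, idle(any) > 4h matches with {idle} set to forge, while idle(all) > 4h does not match.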

Actions

msg

msg <target> "<message>"

Send a message via the communicator. Target: all, <name>, or <name1>,<name2>.

Messages support template variables: {timestamp} (ISO timestamp), {crew} (crew name), {idle} and {idle_duration} (when an idle condition is present).

Messages are suppressed when dashboard silence mode is active.
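
Template substitution and silence-mode suppression could look roughly like this. The helper names are hypothetical, and plain string replacement is an assumption chosen so that unrelated braces in a message are left untouched.

```python
from datetime import datetime
from typing import Dict, Optional

def render_message(template: str, crew: str, now: datetime,
                   idle_vars: Optional[Dict[str, str]] = None) -> str:
    """Substitute {timestamp}, {crew}, and any idle variables."""
    values = {"timestamp": now.isoformat(), "crew": crew}
    values.update(idle_vars or {})
    text = template
    for key, value in values.items():
        text = text.replace("{" + key + "}", value)
    return text

def dispatch_msg(template: str, crew: str, now: datetime, silence_mode: bool,
                 idle_vars: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the rendered message, or None when silence mode suppresses it."""
    if silence_mode:
        return None
    return render_message(template, crew, now, idle_vars)
```

In this sketch, a message that uses {idle_duration} without an idle condition simply keeps the literal placeholder.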

summon

summon <name> ["<message>"]

Summon a persona. Uses the same path as metateam crew summon: it reads the KB persona entry for client, model, and effort defaults. The message is optional.

Summon dedup prevents overlapping summons for the same persona. If a summon is already pending (within a 60-second window), subsequent summon actions for that persona are skipped.
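
The dedup window can be modeled as a per-persona timestamp. The class below is a hypothetical sketch: it assumes the window starts when the summon is issued, and the real scheduler may also clear the pending state as soon as the agent registers.

```python
from typing import Dict

class SummonDedup:
    """Skip summons for a persona while an earlier one is still pending."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._pending: Dict[str, float] = {}  # persona -> time of last summon

    def should_summon(self, persona: str, now: float) -> bool:
        started = self._pending.get(persona)
        if started is not None and now - started < self.window_s:
            return False  # a summon is already pending inside the window
        self._pending[persona] = now
        return True
```

A keep-alive rule firing every 30 seconds would therefore issue at most one summon per 60-second window for the same persona, while summons for other personas are unaffected.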

release

release <target>

Release an agent. Target: <name>, <name1>,<name2>, or all.

Not suppressed by silence mode.

run

run "<shell command>"

Execute a shell command asynchronously. Output is logged, not sent to agents.

  • Working directory: crew root (repository root)
  • Default timeout: 5 minutes (SIGTERM, then SIGKILL after 10 seconds)
  • Single-instance per rule: skipped if a previous run is still active
  • Last 20 lines of stdout/stderr are captured and logged with the exit code
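
The timeout and output-capture behavior listed above can be approximated with the standard subprocess module. This sketch is not the scheduler's code: run_action is a hypothetical name, stdout and stderr are merged into one stream as an assumption, it targets POSIX systems, and it blocks the caller, so an asynchronous scheduler would run it on a worker thread. The single-instance guard is omitted.

```python
import subprocess
from typing import List, Tuple

def run_action(command: str, cwd: str = ".", timeout_s: float = 300.0,
               kill_grace_s: float = 10.0,
               tail_lines: int = 20) -> Tuple[int, List[str]]:
    """Run a shell command; return (exit code, last lines of output)."""
    proc = subprocess.Popen(
        command, shell=True, cwd=cwd, text=True,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM first
        try:
            out, _ = proc.communicate(timeout=kill_grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()   # SIGKILL after the grace period
            out, _ = proc.communicate()
    return proc.returncode, out.splitlines()[-tail_lines:]
```

Note that with shell=True the signals go to the shell process; a production implementation would likely signal the whole process group instead.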

Complete Examples

# === Agent Lifecycle ===

# Keep an agent alive -- re-summon if it disappears
[keep-forge] every 1m if not alive(forge) => summon forge

# Release agents idle for 4 hours
[idle-release] every 5m if idle(any) > 4h => release {idle}

# === Scheduled Tasks ===

# Daily narrative prompt at midnight
[daily-narrative] at 0 0 * * * => msg beacon "Write daily narrative for today"

# Weekly summary on Sunday at 1am
[weekly-summary] at 0 1 * * 0 => summon beacon "Build weekly narrative"

# === Health Checks ===

# Run tests every hour during work hours on weekdays
[api-tests] at 0 9-17 * * 1-5 => run "dotnet test api/Tests -v minimal"

# === Notifications ===

# Nudge crew lead when everyone is idle
[idle-nudge] every 30m if idle(all) > 10m => msg drift "All agents idle for {idle_duration}"

Dry Run

Parse rules and evaluate triggers and conditions against the current crew state without executing any actions:

$ metateam crew scheduler test

Use --json for structured output.

Dashboard Integration

Status Bar

The scheduler shows a compact indicator in the dashboard status bar:

Indicator  Meaning
SCH 3      3 rules loaded, all healthy
SCH 3 !1   3 rules, 1 in failure backoff (yellow)
SCH X      Scheduler disabled after repeated crashes (red)
SCH --     No rules file or empty

Notifications

Events that need attention surface as dashboard notification toasts:

  • Rule enters backoff after consecutive failures
  • Scheduler disabled (crash threshold exceeded)
  • Rules file parse errors on reload
  • run action timeout or non-zero exit

Normal successful fires are silent.

/scheduler Overlay

The /scheduler slash command opens a full state table:

Name             Trigger      Condition          Last     Status
keep-forge       every 1m     not alive(forge)   30s ago  ok
daily-narrative  at 0 0 * * * --                 8h ago   ok
idle-nudge       every 30m    idle(all) > 10m    2m ago   skip
api-tests        at 0 9-17    --                 1h ago   !3 bk

Status values: ok (last success), skip (condition was false), !N bk (in backoff with N failures), run (currently executing), abandoned (from unclean restart).

Runtime Behavior

  • The scheduler runs inside the dashboard communicator server, ticking every 10 seconds.
  • Actions are dispatched asynchronously and never block the tick loop.
  • All conditions in a single tick evaluate against the same immutable snapshot of agent state.
  • On consecutive failures for the same rule, backoff increases exponentially: 1m, 2m, 4m, and so on, capped at 1 hour. Backoff resets when the action succeeds or the relevant agent's state changes.
  • Execution state is persisted to .crew/scheduler.state.json and survives dashboard restarts.
  • If the dashboard was down, missed intervals are not replayed. The scheduler resumes from the next future trigger.
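
The backoff schedule described above doubles from one minute and is capped at one hour. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def backoff_seconds(consecutive_failures: int,
                    base_s: int = 60, cap_s: int = 3600) -> int:
    """Delay before the next attempt: 1m, 2m, 4m, ... capped at 1 hour."""
    if consecutive_failures <= 0:
        return 0  # no failures, no backoff
    return min(base_s * 2 ** (consecutive_failures - 1), cap_s)
```

Under this model the cap is reached at the seventh consecutive failure, since 60 * 2^6 = 3840 seconds exceeds 3600.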

Hot Reload

The scheduler checks .crew/scheduler.rules for changes every tick. On change:

  • The file is re-parsed entirely.
  • Rules matched by name carry forward their execution state.
  • New or changed rules start fresh (next interval fires after one full period).
  • Parse errors are logged with line numbers; valid rules continue to run.
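
The carry-forward step can be pictured as a merge keyed by rule name. The sketch below is one possible reading of the list above: a rule keeps its state only if both its name and its text are unchanged. The function name and the shape of the state dictionaries are assumptions.

```python
from typing import Any, Dict

def carry_forward(old_rules: Dict[str, str], old_state: Dict[str, Dict[str, Any]],
                  new_rules: Dict[str, str], now: float) -> Dict[str, Dict[str, Any]]:
    """Build state after a reload; rules map name -> rule text."""
    new_state: Dict[str, Dict[str, Any]] = {}
    for name, text in new_rules.items():
        if old_rules.get(name) == text and name in old_state:
            new_state[name] = old_state[name]      # unchanged: keep state
        else:
            new_state[name] = {"created_at": now}  # new or changed: fresh
    return new_state  # rules removed from the file are dropped
```

A fresh entry records only its creation time here, which matches the idea that the next interval fires one full period after the reload.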

Troubleshooting

Symptom                 Cause                                     Fix
Rule never fires        Condition always false                    Run metateam crew scheduler test to see current evaluation
Rule fires repeatedly   Condition flaps (agent keeps dying)       Check agent health; backoff will engage automatically
summon action skipped   Another summon for that agent is pending  Wait for the 60-second dedup window to expire
Status shows !N bk      N consecutive failures triggered backoff  Fix the underlying issue; backoff resets on success
Status shows abandoned  Dashboard crashed mid-execution           State resets on next successful fire
Parse error on reload   Syntax error in rules file                Check logs for the line number and raw text; valid rules still run

Guarantees vs Best-Effort

Aspect                            Level
Rule evaluation order             Guaranteed: file order, top to bottom
At-most-once per interval/minute  Guaranteed: dedup prevents double-fire
Exact fire time                   Best-effort: evaluated every 10 seconds, so up to 10s late
Missed interval catch-up          No: missed windows are not replayed after downtime
Action completion                 Best-effort: actions can fail; failures trigger backoff
State persistence across restart  Guaranteed: .crew/scheduler.state.json survives restarts
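
Because the tick runs every 10 seconds, a cron rule matches several ticks within the same minute, so some dedup is needed to honor the at-most-once guarantee. One way to picture it, using a hypothetical MinuteDedup class that remembers the last minute in which the rule fired:

```python
from datetime import datetime
from typing import Optional

class MinuteDedup:
    """Allow a cron rule to fire at most once per matching minute."""

    def __init__(self) -> None:
        self.last_fired_minute: Optional[datetime] = None

    def should_fire(self, matches_now: bool, now: datetime) -> bool:
        minute = now.replace(second=0, microsecond=0)
        if not matches_now or minute == self.last_fired_minute:
            return False  # no match, or already fired during this minute
        self.last_fired_minute = minute
        return True
```

With ticks at seconds 0, 10, 20, 30, 40, and 50 of a matching minute, only the first tick fires in this sketch; the rule fires again when a later minute matches.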

See Also