Runbook: Migrate the EntraClaw Provisioner from secret-auth to cert-auth¶
Status: Active (one-time migration per developer machine) Risk: Low — reversible within ~15 minutes. Affects only the Provisioner app's credential type. Duration: ~2 minutes end-to-end if nothing goes wrong.
Why this exists¶
scripts/entra_provisioning.py used to create a long-lived client secret on the EntraClaw Agent ID Provisioner app and write it to .entraclaw-state.json on disk. That violates the CLAUDE.md non-negotiable:
Never write secrets to logs or memory files
It also broke the symmetry with the Blueprint cert, whose private key already lives in macOS Keychain. This runbook walks you through the one-shot migration that:
- Replaces the Provisioner's client_secret with a self-signed X.509 cert (private key in Keychain).
- Deletes all password credentials on the Provisioner app.
- Purges
PROVISIONER_CLIENT_SECRETfrom.entraclaw-state.json.
See docs/SECURITY-DEBT-PROVISIONER-SECRET.md for the full rationale.
What gets changed¶
| Component | Before | After |
|---|---|---|
| Provisioner credential | client_secret (365 days) on app | RSA 2048 self-signed cert (365 days) on app |
| Private key storage | .entraclaw-state.json (plaintext on disk) |
macOS Keychain (service entraclaw-provisioner-cert) |
| Library path | azure.identity.ClientSecretCredential |
azure.identity.CertificateCredential |
| State file fields | PROVISIONER_CLIENT_SECRET |
PROVISIONER_CERT_THUMBPRINT |
| Shell token helper | n/a (inline curl with secret) | scripts/provisioner-token.py |
The Provisioner app identity (display name, object id, app id, permission set) is unchanged. This is a credential-type migration, not a rename or re-permission.
Pre-flight checklist¶
- [ ]
az loginas the user that already owns the Provisioner app. Verify with: - [ ] Python venv is active and has the provisioning deps:
- [ ] Back up the current state file so rollback is cheap:
- [ ] Note your Provisioner app ID (you'll need it for verification):
Migration command¶
One shot. Idempotent — safe to re-run.
What this does, in order:
- Reads
PROVISIONER_CLIENT_SECRETfrom.entraclaw-state.json(if present) and immediately clears it from the state file. - Finds the existing
EntraClaw Agent ID Provisionerapp (or creates it if missing). - Ensures the 25 Graph application permissions are registered and admin-consent is granted.
- Deletes all password credentials on the app via
az ad app credential delete. Prints a count. - Generates a fresh RSA 2048 cert (CN=
entraclaw-provisioner, 365 days). - Uploads the public cert to the app via
az ad app credential reset --cert @... --append. - Stores the cert+key PEM bundle in macOS Keychain (service
entraclaw-provisioner-cert, account = tenant id). - Records the SHA-1 thumbprint in
.entraclaw-state.jsonasPROVISIONER_CERT_THUMBPRINT.
Expected runtime: ~45 seconds. Most of it is the 30-second Graph permission propagation wait.
On success you'll see (trimmed):
WARNING: legacy PROVISIONER_CLIENT_SECRET found in state file. Cert-auth supersedes secret-auth; purging the secret from disk.
Ensuring 25 Graph application permissions on provisioner app...
Provisioner app already has the required Graph permissions
Admin consent granted
Removed N legacy password credential(s) from Provisioner app (cert-auth only from here on).
Generating cert for Provisioner (RSA 2048, 365 days)...
Uploading public cert to app (SHA-1 thumb: ...)...
Cert private key stored in macOS Keychain (service='entraclaw-provisioner-cert', account=<tenant>).
Verification¶
Run each of these from the repo root. All four must pass.
TENANT=$(jq -r .TENANT_ID .entraclaw-state.json)
PROV=$(jq -r .PROVISIONER_CLIENT_ID .entraclaw-state.json)
# 1. Secret purged from state file
grep PROVISIONER_CLIENT_SECRET .entraclaw-state.json && echo "FAIL" || echo "PASS: not found"
# 2. Zero password credentials on the app
az ad app show --id "$PROV" --query "passwordCredentials" -o json
# Expected: []
# 3. Keychain entry present (metadata only — do NOT pass -w unless you really need the PEM bundle)
security find-generic-password -a "$TENANT" -s "entraclaw-provisioner-cert" -g 2>&1 | head -3
# Expected: entry with service "entraclaw-provisioner-cert" and account <tenant>
# 4. Token helper emits a clean JWT for graph.microsoft.com
TOKEN=$(python3 scripts/provisioner-token.py)
echo "$TOKEN" | python3 -c "
import sys, json, base64
tok = sys.stdin.read().strip()
parts = tok.split('.')
assert len(parts) == 3, f'not a JWT ({len(parts)} parts)'
payload = json.loads(base64.urlsafe_b64decode(parts[1] + '=' * (-len(parts[1]) % 4)))
print(f'PASS: aud={payload[\"aud\"]}, appid={payload[\"appid\"]}')
"
If any of the four fails, STOP and investigate. Do not ship a half-migrated state — the most dangerous state is one where the Provisioner has both a cert AND a lingering password credential (backdoor).
Rollback (emergency only — 15-minute window max)¶
Rollback leaves you exposed — it re-creates the client_secret on disk that the fix was meant to eliminate. Use it only if the migration broke something you need running right now, and only long enough to diagnose. Then re-run the migration.
# 1. Restore the backup state file
cp /tmp/entraclaw-state.backup-<timestamp>.json .entraclaw-state.json
# 2. Re-create a client_secret on the Provisioner app so old code paths keep working
PROV=$(jq -r .PROVISIONER_CLIENT_ID .entraclaw-state.json)
SECRET_JSON=$(az ad app credential reset --id "$PROV" --append --years 1 -o json)
SECRET=$(echo "$SECRET_JSON" | jq -r .password)
# 3. Re-write the secret into the state file (what the old code expected)
python3 -c "
import json, pathlib
sf = pathlib.Path('.entraclaw-state.json')
data = json.loads(sf.read_text())
data['PROVISIONER_CLIENT_SECRET'] = '$SECRET'
sf.write_text(json.dumps(data, indent=2) + '\n')
"
# 4. Temporarily revert the code change (replay the commit locally)
git revert --no-commit HEAD # adjust as needed to get back to the secret-auth version
# 5. (Optional) Purge the cert from Keychain
TENANT=$(jq -r .TENANT_ID .entraclaw-state.json)
security delete-generic-password -a "$TENANT" -s "entraclaw-provisioner-cert"
After rollback, the secret is back on disk. Don't leave it there. Diagnose the issue, re-run the migration, verify all four checks, then force-delete the rolled-back secret:
Post-migration actions¶
- [ ]
git statusshows no modified.entraclaw-state.json, no Keychain export, no token output committed. - [ ] Delete the backup state file from
/tmp/once you're confident the migration held (it contains the now-invalidated secret, but it's still worth cleaning up): - [ ] Schedule a cert-rotation reminder for ~11 months from now (certs expire at 365 days). Rotation = re-run
python3 scripts/entra_provisioning.py— it detects expired certs and re-mints.
Known gotchas¶
- "keyring is required for cert-auth" — you forgot to install the provisioning deps. Run
pip install -e ".[dev,provisioning]". - "failed to upload cert to Provisioner app" — your
az loginisn't the Provisioner app owner, or the account lacks Application.ReadWrite.All consent. Have an admin re-run the migration from their machine, or grant you the role. - Migration succeeds but
provisioner-token.py401s — Graph permission propagation can lag. Wait 60 seconds and retry.get_graph_tokenalready sleeps 30s by default; passwait_for_propagation=Trueif re-running programmatically. passwordCredentialsis not[]after migration — a concurrent process (CI job, another dev machine) re-created a secret. Re-run the migration; the_remove_legacy_password_credentialsstep is idempotent.