Design: Teams Bot Gateway for EntraClaw¶

Generated by /office-hours on 2026-04-10 Branch: feature/multi-tenant-lightweight-chat Repo: example/entraclaw-identity-research Status: APPROVED Mode: Builder

Problem Statement¶

EntraClaw's delegated mode sends Teams messages via the human's Graph API token. Messages show as coming FROM the human with an [EntraClaw] prefix. This is confusing — recipients can't tell if the human or the agent sent the message. The Agent User flow solves this (agent gets its own identity) but requires per-user provisioning (Blueprint + Agent Identity + Agent User + M365 license + 15-minute wait).

Goal: Use the Teams Bot Framework so the agent has its own identity in Teams by default, with zero provisioning delay. Messages show as the bot. The bot runs locally alongside the MCP server via Dev Tunnel. Developers launch Claude Code, sign in once, and the bot handles Teams communication.

What Makes This Cool¶

Identity separation is solved by design. Bot Framework messages always show as the bot in Teams. No prefix hacks, no identity confusion. The bot has its own icon, display name, presence.
Minimal-config agent identity. No Agent User provisioning, no M365 license ($25/mo saved per agent), no 15-minute wait. One-time bot registration, then developers just sign in and go.
Event-driven, not polling. Bot Framework pushes activities to the bot when humans send messages. No 5-second polling loop. Instant delivery.
Adaptive Cards for free. Rich status cards with action buttons ("Open PR", "Run Tests", "Approve") are built into the framework. ~20 lines of JSON per card.
Speed to demo. "Launch Claude Code → sign in → say 'build this and message me when done' → it just works." No setup wizard, no identity provisioning, no Visual Studio subscription.

Constraints¶

Must run locally on the developer's Mac (no cloud deployment for demo)
Dev Tunnel (or ngrok) required for public HTTPS endpoint
One-time admin setup: Azure Bot resource + multi-tenant Entra app registration
Python 3.12+ (match existing codebase)
M365 Agents SDK (not deprecated Bot Framework SDK)
Must coexist with existing Graph API mode (config switch)
Must integrate with existing MCP tool surface (send/read/watch)

Premises¶

M365 Agents SDK (successor to Bot Framework) messages always show as the bot in Teams UI — this solves the identity confusion problem by design. (Note: "Bot Framework" is used colloquially throughout this doc to refer to the M365 Agents SDK and its predecessor's architectural patterns.)
A Dev Tunnel is acceptable for the demo. The bot needs a public HTTPS endpoint. Fine for research, not production.
The bot registration is a ONE-TIME admin setup (Azure Bot resource + multi-tenant Entra app). After that, developers just sideload the Teams app and run the MCP server. Same admin barrier as the current delegated flow.
OBO is NOT needed for the core demo — the bot sends messages as itself and receives messages from humans. OBO would only matter for accessing the human's mailbox/calendar.
The M365 Agents SDK (Python) is the right SDK choice over the deprecated Bot Framework SDK.

Approaches Considered¶

Approach A: "Bot Gateway" — Thin Bot + Existing MCP Server (CHOSEN)¶

Add a thin Bot Framework bot (M365 Agents SDK, Python) that runs alongside the MCP server. Bot handles Teams UI (send/receive), MCP server handles Claude Code integration. Communication via shared state or local IPC. - Effort: M (CC: ~30 min) - Risk: Low — additive, doesn't touch existing code - Pros: Clean separation, existing MCP tools unchanged, bot shows as distinct identity - Cons: Two processes to manage, Dev Tunnel dependency

Approach B: "Unified Bot-MCP" — Bot IS the MCP Server¶

Replace the MCP server's Graph API polling with Bot Framework event-driven messaging. Single process. - Effort: L (CC: ~1-2 hours) - Risk: Medium — deeper refactor - Pros: Single process, event-driven, cleaner long-term - Cons: Bigger change, harder to keep Graph API mode

Approach C: "Bot Relay Service" — Cloud Bot, Local Agent¶

Deploy a lightweight bot to Azure App Service. Relays to local MCP server via WebSocket. - Effort: L (CC: ~1 hour + Azure deploy) - Risk: High — adds cloud dependency - Cons: Not "zero-config" anymore

Recommended Approach¶

Approach A: Bot Gateway. Additive, ships fastest, proves the concept. Two processes is fine for a demo. Can evolve to Approach B later if we want event-driven messaging in a single process.

Architecture¶

┌─────────────────────────────────────────────────────────────┐
│ Developer's Mac                                             │
│                                                             │
│  ┌──────────────┐   shared state    ┌──────────────────┐   │
│  │ MCP Server   │◄────────────────►│ Bot Server        │   │
│  │ (stdio)      │   (file/queue)    │ (aiohttp :3978)   │   │
│  │              │                   │                    │   │
│  │ Claude Code  │                   │ M365 Agents SDK    │   │
│  │ ↕ tools      │                   │ ActivityHandler    │   │
│  └──────────────┘                   └────────┬───────────┘   │
│                                              │               │
│                                     ┌────────▼───────────┐   │
│                                     │ Dev Tunnel          │   │
│                                     │ (public HTTPS)      │   │
│                                     └────────┬───────────┘   │
└──────────────────────────────────────────────┼───────────────┘
                                               │
                                     ┌─────────▼──────────┐
                                     │ Azure Bot Service   │
                                     │ (cloud relay)       │
                                     └─────────┬──────────┘
                                               │
                                     ┌─────────▼──────────┐
                                     │ Microsoft Teams     │
                                     │ (user's client)     │
                                     └────────────────────┘

Relationship to Existing Spec¶

This design is an alternative to the Graph API delegated mode described in NEXT-WhatsApp-lightweight-teams-chat.md. Both modes coexist via the ENTRACLAW_MODE config switch: - agent_user — Three-hop flow, Graph API, agent sends as Agent User (existing, production path) - delegated — MSAL token, Graph API, agent sends as human with [EntraClaw] prefix (existing) - bot — M365 Agents SDK, Bot Framework, agent sends as bot identity (this design)

The Agent User mode remains the recommended production path. Bot mode is a faster-to-demo alternative that avoids per-user provisioning.

Communication: MCP Server ↔ Bot Server¶

Phase 1 uses shared JSONL files at ~/.entraclaw/bot/outbound.jsonl and inbound.jsonl. Append-only writes with advisory file locking (fcntl.flock). Phase 2 may upgrade to local HTTP.

This is debuggable (cat the file), zero-dependency, and avoids port conflicts.

New Files¶

src/entraclaw/
  bot/
    __init__.py
    server.py          — aiohttp server + M365 Agents SDK ActivityHandler
    handler.py         — Message routing (inbound from Teams → shared state)
    cards.py           — Adaptive Card templates (Phase 2 — plain text first)
    tunnel.py          — Dev Tunnel management (start/stop/URL)
scripts/
  setup_bot.sh         — One-time: create Azure Bot + Entra app + sideload manifest
  start_bot.sh         — Launch bot server + dev tunnel
manifests/
  teams-app/
    manifest.json      — Teams app manifest
    color.png          — Bot icon (color)
    outline.png        — Bot icon (outline)

Modified Files¶

src/entraclaw/
  config.py            — ENTRACLAW_MODE=bot|delegated|agent_user, bot app ID/cert thumbprint
  mcp_server.py        — When mode=bot, read inbound from shared state instead of polling

Config¶

# Bot mode (certificate auth per ADR-003 — no client secrets)
ENTRACLAW_MODE=bot
ENTRACLAW_BOT_APP_ID=<azure-bot-app-id>
ENTRACLAW_BOT_CERT_THUMBPRINT=<bot-cert-thumbprint>
ENTRACLAW_BOT_TUNNEL_PORT=3978

Note: Per ADR-003, the bot uses certificate auth (private key in OS keystore), not client secrets. The M365 Agents SDK supports CertificateCredential for bot authentication.

Conversation Reference Persistence¶

Stored as JSON at ~/.entraclaw/bot/conversation_refs.json, keyed by chat ID. Loaded on bot startup, updated on every conversationUpdate and message activity. This follows the existing data_dir pattern used for chat_id persistence.

Failure Modes¶

Failure	Detection	Recovery
Dev Tunnel disconnect	Bot server gets no inbound activities	Bot logs warning, auto-reconnects tunnel. MCP server falls back to "bot unavailable" error on send.
Bot server crash	MCP server's outbound file grows without being consumed	MCP server logs warning after 30s of unconsumed messages. User restarts bot.
Shared file corruption	JSON parse error on read	Truncate corrupted file, log warning, continue. Messages in flight are lost (acceptable for demo).
Bot credential failure (401/403)	Bot startup health check fails against Azure Bot Service	Log specific error (expired cert, wrong app ID, missing consent). Exit with actionable message.

Known limitation: JSONL files at ~/.entraclaw/bot/ contain message content in plaintext. For production, use local HTTP IPC or encrypt at rest.

Testing Strategy¶

Per AGENTS.md: TDD — tests first, then implementation.

Unit tests: Mock M365 Agents SDK adapter, test handler.py message routing, test cards.py JSON output, test conversation reference persistence (load/save/corruption).
Integration tests: Bot server + fake tunnel (localhost-to-localhost), verify inbound/outbound JSONL round-trip.
No live Teams tests in CI — manual verification against real Teams for demo.
Key test scenarios:
Bot receives activity → writes correctly formatted entry to inbound.jsonl
MCP server writes to outbound.jsonl → bot reads and calls continue_conversation with correct conversation reference
Conversation ref persistence: save → simulate restart → load → proactive send succeeds
Corrupted JSONL → bot and MCP server recover without crash (truncate and continue)

SDK Maturity¶

The M365 Agents SDK for Python is in preview (as of early 2026). Pin package versions in pyproject.toml. API may change. If preview packages prove unstable, fall back to the deprecated-but-functional botbuilder-* packages (supported through Dec 2025, still installable).

Key SDK Packages¶

pip install microsoft-agents-hosting-core microsoft-agents-activity \
  microsoft-agents-hosting-aiohttp microsoft-agents-authentication-msal

Proactive Messaging Flow¶

Claude Code finishes a task
MCP server calls send_teams_message("Build complete! ✅")
MCP server writes message to shared outbound file
Bot server reads outbound file, sends via continue_conversation(ref, ...)
Message appears in Teams as the bot with an Adaptive Card

Inbound Flow¶

Human sends message in Teams to the bot
Azure Bot Service relays to Dev Tunnel → localhost:3978
Bot server's on_message_activity fires
Bot writes message to shared inbound file
MCP server reads inbound file (polls inbound.jsonl every 2 seconds via asyncio.create_task in the existing background poll loop)
MCP server pushes to Claude Code via notifications/claude/channel

Eng Review Notes (2026-04-10)¶

Outside voice findings (accepted): - JSONL fcntl.flock is POSIX-only. Acceptable for Mac/Linux demo. Windows would need portalocker. - M365 Agents SDK CertificateCredential support is unverified in preview. Risk accepted — will pivot to botbuilder-* or client secret if cert auth fails. - Effort estimate revised from 30 min to 2-4 hours (preview SDK + tunnel integration). - botbuilder-* fallback is not a drop-in — different class hierarchy. If Agents SDK fails, it's a partial rewrite, not a config change.

Outside voice findings (rejected): - "Two processes unnecessary" — Approach A is intentionally additive to protect existing MCP server. Approach B (embedded) remains the Phase 2 upgrade path. - "Solves already-solved problem" — Agent User requires E5 license + 15-min provisioning. Bot mode delivers the instant experience PM leadership requested. Different value prop.

Test coverage: 19 codepaths mapped, 13 new test scenarios added (see eng review test plan artifact).

Open Questions¶

Dev Tunnel reliability — Is devtunnel host -p 3978 --allow-anonymous stable for multi-hour sessions? Or should we use a reserved persistent tunnel URL?
Bot secret storage — The bot needs a client secret (or certificate). Store in OS keystore like the current cert? Or use a managed identity somehow?
Sideloading policy — Does the MSIT tenant allow sideloading? If not, the demo only works on werner.ac.
Conversation reference persistence — Where to store the conversation reference for proactive messaging? Keyring? File? The bot needs this to survive restarts.
Group chat support — Can a bot be added to an existing group chat, or does it need to create its own? For the "notify all of us" scenario.

Success Criteria¶

Developer launches Claude Code + bot server on their Mac
Bot appears in Teams as "EntraClaw Agent" with its own icon
Developer says "build X and message me when done" to Claude Code
Claude Code does the work, sends a message via the bot
Message appears in Teams as the bot, with an Adaptive Card showing the result
Developer replies in Teams, bot relays to Claude Code, Claude Code acts on it
Works in 1:1 chat (group chat support deferred to Phase 2 pending investigation of bot-in-group-chat mechanics)
Total setup time < 5 minutes (after one-time admin bot registration)

Next Steps¶

Update platform-learnings knowledge base — Add M365 Agents SDK Python details, Dev Tunnel setup, bot registration walkthrough
Create Azure Bot resource on contoso.com tenant
Scaffold bot server (src/entraclaw/bot/server.py) with M365 Agents SDK
Add shared state IPC between MCP server and bot
Build Adaptive Card templates for status, PR, build results
Wire up proactive messaging — bot sends on behalf of MCP server
Test end-to-end — Claude Code → MCP → bot → Teams → human → bot → MCP → Claude Code

What I noticed about how you think¶

You immediately spotted the admin consent problem: "Can an end user set up a bot resource?" That's the right question. Most people wouldn't push on premise #3 until they hit it in production.
"SPEED" as the core value prop — not features, not architecture, not security. You care about the developer feeling "whoa, that was fast." That's taste.
"Add it and make a switch" — you protected the existing work (Agent User mode via Graph API) while exploring a new path. That's the instinct of someone who's shipped things before. You don't burn bridges.
"Yes" to the entire 10x vision. You're not afraid of scope — you just want to ship the core first. That's founder energy.