PLAN: SharePoint / Files / Excel MCP Tools (V1)¶
Status: PR1 IMPLEMENTED 2026-04-30 on
feat/files-graph-api. 4 tools (resolve_file_url, list_recent_files, read_file, add_file_comment) + eval scaffold + 50 unit tests + 5 eval traces all passing onpytest -m eval. Tenant consent grant extended to includeFiles.Read.All Sites.Read.All Sites.ReadWrite.All(idempotent PATCH path covers existing tenants on next setup.sh run). PR2 (author/upload/share) and PR3 (Excel reads) deferred to follow-up sessions. Branch:feat/files-graph-apiAuthor: office-hours session 2026-04-30, locked by CEO review 2026-04-30, eng review 2026-04-30, PR1 shipped 2026-04-30 Companion (research):docs/platform-learnings/files-graph-api.mdCompanion (deferred Office authoring):docs/architecture/PLAN-files-llm-authoring-v2.mdApproach: narrow MCP tools, two scenarios, markdown-only V1 authoring, sponsor-only sharing, configurable site denylist
Why¶
The Agent User can talk in Teams. It cannot read a spec, comment on it, or author + share a new one. Closing this gap is the single biggest unlock for "agent participates in M365 work."
Two scenarios drive V1:
-
Read + comment. A user in Teams says "read the spec at this SharePoint URL — what do you think?" The agent fetches the document, ingests it, asks a clarifying question, and surfaces that question both as a Teams chat reply (always) and as a real document comment (Word / Excel only — see D1).
-
Author + share. A user says "draft a spec for X, upload it, and share with me." The agent generates the document as Markdown (
.md), uploads to SharePoint or its own OneDrive, andinvites the sponsor — only the sponsor — with edit rights.
V2 (PLAN-files-llm-authoring-v2.md) adds Office-format (.docx / .xlsx / .pptx) authoring via a template + IR + renderer pipeline — explicitly deferred from V1 per CEO-review D2.
Locked Decisions (CEO review)¶
| # | Decision | Locked value |
|---|---|---|
| D1 | Scenario 1 primitive | Both: Teams chat reply (primary) + file-comment (opportunistic, Word/Excel only) |
| D2 | Authoring format in V1 | Markdown only. Office authoring deferred to V2 plan stub |
| D3 | Tool API shape | Narrow tools (one tool per verb — matches existing tools/teams.py convention) |
| D4 | Scope | Both scenarios in V1 + V2 plan stub written as separate deliverable |
| D5 | Mode | HOLD SCOPE — review locked the plan, no scope expansion |
| D6 | Graph 5xx policy | Read tools retry-with-jitter on 5xx; mutation tools fail-fast |
| D7 | Upload conflictResolution default |
rename (no overwrite) |
| D8 | LLM eval gating | Include LLM eval; gate the PR1 merge |
| D10 | URL → ID resolution | resolve_file_url is a first-class PR1 tool |
| D11 | Site scope governance | Configurable site denylist via ENTRACLAW_FILES_DENIED_SITES (default empty) |
D9 was not asked (premise 4 was committed without question).
Premises¶
- The Agent User is a real Entra user. Existing third hop (
acquire_agent_user_token, scopehttps://graph.microsoft.com/.default) returns a Graph token covering whatever scopes the Agent User was provisioned for. No new auth hop. - Sharing is sponsor-only by Entraclaw policy.
share_fileenforces it viaentraclaw.identity.sponsors.list_sponsors(); Graph itself does not. - Every mutation enforces the sponsor allowlist — not just
share_file.upload_fileandwrite_text_fileaudit the recipient/site context against the allowlist before the Graph call. (Premise 4 from CEO review, committed.) - PowerPoint slide commenting is not possible via Graph.
comment_on_filerejects.pptx/.ppt/.odpand falls back to chat reply with a deep link. - File comments require beta.
/betahost is isolated to one helper. Comment endpoints are the only beta calls. - No anonymous links.
createLinkwithscope="anonymous"is never exposed.inviteonly. - Site denylist gates every read AND write.
_check_site_allowed(site_id)runs before every Graph call that touches a site/library. Denylist defaults empty (no restriction); operators setENTRACLAW_FILES_DENIED_SITESto opt in.
Scope¶
In scope (V1)¶
- File discovery: shared-with-me (
list_recent_files), URL → ID resolution (resolve_file_url) - File read: download + text extraction for prose formats only (
.md,.txt,.docx,.pdf) - File comments: list, add, reply (Word + Excel only) plus chat-reply fallback for everything else
- File authoring: Markdown only (
.md) - File upload: small-PUT + chunked,
conflictResolution=renamedefault (D7) - File sharing:
inviteto sponsor with role enforcement - Workbook range reads (Excel only — write deferred to V1.1)
- Site denylist enforcement on every read/write (D11)
- Reuse of existing
RetryOn429Transportfrom day 1; extended withallow_5xx_retryper D6
NOT in scope (deferred — see "Deferred / TODOs" section below)¶
- Office-format authoring (
.docx,.xlsx,.pptx) → V2 plan stub (PLAN-files-llm-authoring-v2.md) search_files(KQL site search) → P2 TODO; needsSites.Read.Allconsent — cut from V1 to keep PR1 permission scope coherentlist_sites(site enumeration) → P2 TODO; sameSites.Read.Allreason- PowerPoint slide authoring (
python-pptx) → V2 - Anchored / cell-level comments — Graph doesn't support them
- Co-authoring real-time collaboration sessions
- Webhook subscriptions for comment notifications — V1.1 will use
_background_poll_comments()(same shape as_background_poll) - Site/library creation
- Excel range writes / table appends → V1.1 (only
read_workbook_rangeandlist_workbook_tablesship in PR3)
Auth changes¶
Code — none¶
acquire_agent_user_token already returns a Graph-scoped token. The .default scope means "all consented scopes for this user." Adding scopes is a provisioning task, not a code task.
Provisioning — scripts/create_entra_agent_ids.py + setup.sh¶
Add admin consent for one delegated scope on the Agent Identity's app registration in PR1:
| Scope | Why | Sensitivity |
|---|---|---|
Files.ReadWrite.All |
Read sponsor-shared files; add comments; write to shared docs; resolve URLs to IDs | Tenant-scoped, signed-in user only |
PR2 adds:
| Sites.ReadWrite.All | Upload to SharePoint document libraries (vs. just OneDrive) | Same |
Sites.Read.All is not consented in V1 (because search_files and list_sites are deferred). If a P2 PR adds them, it adds the consent at the same time.
A new helper in create_entra_agent_ids.py (mirror of grant_agent_user_storage_consent) names this grant_agent_user_files_consent. Setup adds a --with-files flag; running without it leaves files capability disabled. Default off until a tenant admin opts in — matches --use-cloud-memory.
Sponsor-only enforcement (Premise 3)¶
entraclaw.identity.sponsors.list_sponsors() is the single authority. Reused by every mutation tool: share_file (recipient must be in list), upload_file (target site must NOT be in deny list), write_text_file (same).
Site denylist (D11)¶
# src/entraclaw/tools/files.py
def _check_site_allowed(site_id: str) -> None:
denied = os.getenv("ENTRACLAW_FILES_DENIED_SITES", "").split(",")
denied = [s.strip() for s in denied if s.strip()]
if site_id in denied:
raise SiteNotAllowedError(
f"Site {site_id} is in ENTRACLAW_FILES_DENIED_SITES; "
f"the agent is not permitted to read from or write to this site."
)
Called by read_file, resolve_file_url, list_recent_files, comment_on_file, upload_file, write_text_file, share_file, read_workbook_range, list_workbook_tables. Single audit point — easy to test, easy to verify.
Tool surface (9 tools)¶
Each tool is @mcp.tool()-decorated in mcp_server.py (wrapped with _with_token_retry), implemented in src/entraclaw/tools/files.py, and audit-logged via the shared _audit_graph_call async context manager (one helper, used 9 times — see Cross-cutting concerns).
All tool functions are async with the contract *, token: str, transport: httpx.AsyncBaseTransport | None = None (mirrors tools/email.py and tools/teams.py). The MCP wrapper supplies token from acquire_agent_user_token and leaves transport defaulted; tests inject a respx-driven transport.
PR 1 — Read & Comment (Scenario 1) — 4 tools¶
async def resolve_file_url(url: str, *, token: str, transport=None) -> FileRef: ...
async def list_recent_files(limit: int = 25, *, token: str, transport=None) -> RecentFilesPage: ...
async def read_file(file_ref: FileRef, *, as_format: Literal["raw", "auto"] = "auto",
token: str, transport=None) -> FileContent: ...
async def add_file_comment(file_ref: FileRef, content: str,
*, token: str, transport=None) -> CommentResult: ...
A1 (eng review): module boundary preserved. tools/files.py does NOT import tools/teams.py. The chat-reply leg from D1 is the model's job — it calls add_file_comment for the document side and send_teams_message for the Teams side as two separate tool calls. This keeps comment-side errors isolated from Teams-side errors and lets each surface its own audit event.
A2 (eng review): FileRef carries site_id. The resolver does the site lookup once; downstream tools never re-resolve. _check_site_allowed(site_id) is a pure local function that reads ENTRACLAW_FILES_DENIED_SITES.
@dataclass(frozen=True)
class FileRef:
drive_id: str
item_id: str
name: str
mime_type: str
kind: Literal["onedrive_personal", "onedrive_business", "sharepoint"]
site_id: str | None # populated for sharepoint kind, None for OneDrive
A3 (eng review): resolver uses GET /shares/{base64url(url)}/driveItem. This single endpoint covers SharePoint URLs (https://contoso.sharepoint.com/sites/X/...), OneDrive personal URLs, OneDrive business URLs, and shared-link URLs in one call — the response includes parentReference.siteId and parentReference.driveId for free. Errors: FileNotFoundError (404), SiteNotAllowedError (resolved site in denylist), UrlNotResolvableError (malformed/unrecognized URL).
RecentFilesPage = {files: list[FileSummary], denied_count: int} from /me/drive/sharedWithMe. denied_count is the number of files filtered out by the site denylist — surfaced so the model can tell the user "I see N more files but I can't access those sites."
FileSummary = {drive_id, item_id, name, web_url, mime_type, size_bytes, modified_at, shared_by}.
read_file(file_ref, as_format="auto") policy:
- .md / .txt / .html → fetch raw, decode, return text
- .docx → GET /content?format=pdf, extract via pypdf, return text
- .pdf → fetch raw, extract via pypdf (after size check)
- .xlsx / .xls → reject with UnsupportedReadFormatError(message="Use read_workbook_range for Excel data") (PR3)
- .pptx / .ppt → reject with UnsupportedReadFormatError(message="PowerPoint reading is not supported in V1; user can paste slide content into chat")
- everything else → UnsupportedReadFormatError
P1 (eng review): size cap before download. Before fetching .pdf content, read_file issues a GET /items/{id} and rejects with FileTooLargeError if size > ENTRACLAW_FILES_MAX_PDF_BYTES (default 52_428_800 = 50 MiB). For .docx-via-PDF the same check applies after the format=pdf conversion's Content-Length. Avoids a 200 MB download to extract one paragraph of text.
FileContent = {drive_id, item_id, name, mime_type, text: str, page_count: int | None, truncated: bool}. truncated=True if extracted text exceeds ENTRACLAW_FILES_MAX_TEXT_BYTES (default 200_000 — a 50-page spec).
A1+A4+A5 (eng review): add_file_comment is Files-only and tightly scoped.
- Endpoint:
POST /beta/drives/{drive-id}/items/{item-id}/comments(the only beta endpoint in this PR — see "Beta surface isolation"). The/workbook/commentsand/document/commentsshapes from earlier drafts are wrong; Microsoft's beta surface uses one path for both Word and Excel. - A5 reject conditions (raise
UnsupportedCommentFormatError): - File extension not in
{.docx, .xlsx}(rejects.pptx,.pdf,.md, etc.) file_ref.kind == "onedrive_personal"(Microsoft does not GA personal-OneDrive comments)file_ref.mime_typeindicates a folder (folderfacet on the driveItem)- No chat-reply leg. Returns
CommentResult = {comment_id: str, content: str, web_url: str}. The model orchestrates the Teams reply withsend_teams_messageif it wants to.
PR 2 — Author, Upload, Share (Scenario 2) — 3 tools¶
async def write_text_file(content: str, filename: str,
target: UploadTarget = OneDriveTarget("/"),
*, conflict: Literal["rename", "replace", "fail"] = "rename",
token: str, transport=None) -> FileSummary: ...
async def upload_file(content: bytes, filename: str,
target: UploadTarget = OneDriveTarget("/"),
*, conflict: Literal["rename", "replace", "fail"] = "rename",
token: str, transport=None) -> FileSummary: ...
async def share_file(file_ref: FileRef, recipient_email: str,
role: Literal["read", "write"] = "write",
message: str | None = None,
*, token: str, transport=None) -> SharePermission: ...
C1 (eng review): write_text_file accepts conflict symmetric with upload_file. Default "rename" per D7. Markdown / plaintext only — Office formats are V2 (PLAN-files-llm-authoring-v2.md).
UploadTarget is a tagged union: OneDriveTarget(folder_path: str = "/") (agent's own OneDrive — no SharePoint write needed), or SharePointTarget(site_id: str, library_id: str | None, folder_path: str = "/") (named site library). Both target types are dataclasses; _check_site_allowed(target.site_id) runs whenever target is a SharePointTarget.
upload_file picks small-PUT vs. chunked-upload by len(content):
- < 4 MiB → single PUT to /items/{parent}:/{filename}:/content
- ≥ 4 MiB → createUploadSession + 5 MiB chunks
T2 (eng review): chunked-upload uses protocol-native nextExpectedRanges resume. On a 5xx mid-stream, re-issue GET {uploadUrl} to read nextExpectedRanges, then resume from the first byte the server still wants. Up to 3 retries per chunk; cap on total wall-time. This is the only mutation that retries on 5xx, and only for an in-progress upload session — D6 fail-fast still applies to fresh mutation calls.
share_file takes a FileRef (A2 — no re-resolve). Pre-call check: recipient_email must be in entraclaw.identity.sponsors.list_sponsors(). Otherwise raise NotASponsorError with the sponsor list so the model can correct itself. requireSignIn=true always. sendInvitation=true always. Audit records the Graph permission ID for clean future revocation (V1.1 unshare_file).
PR 3 — Workbook depth (read-only) — 2 tools¶
async def read_workbook_range(file_ref: FileRef, sheet: str, address: str,
*, token: str, transport=None) -> ExcelRange: ...
async def list_workbook_tables(file_ref: FileRef,
*, token: str, transport=None) -> list[ExcelTable]: ...
ExcelRange = {address, values: list[list], formulas: list[list], num_rows, num_cols}. Opens and closes a workbook session per call. Excel writes (excel_write_range, excel_append_table_rows) deferred to V1.1.
Phasing¶
Three PRs, each independently shippable. Each PR reuses RetryOn429Transport from day 1 — no new throttle helper.
PR 1 — Read & Comment (Scenario 1)¶
- 4 tools above (
resolve_file_url,list_recent_files,read_file,add_file_comment) +_check_site_allowed,_audit_graph_callhelper,RetryOn429Transportextended withallow_5xx_retry=Truefor read calls (D6) - Provisioning:
Files.ReadWrite.Allconsent grant,--with-filesflag,ENTRACLAW_FILES_DENIED_SITESenv doc - Eval scaffolding (T1, eng review):
tests/evals/__init__.py,tests/evals/conftest.py,tests/evals/rubric.py,tests/evals/test_files_scenario_1.py(5 traces) — full directory ships in PR1, not a follow-up - LLM eval suite (gates merge per D8): five Scenario-1 traces — agent reads spec, asks clarifying question, posts
add_file_comment+send_teams_message(model orchestrates both legs). Eval asserts: comment lands on doc, chat reply lands in chat, deep link works, denied site rejects with correct error, truncated doc surfaces truncation honestly. - Tests: ~50 new unit tests (T4, eng review), mirroring
tests/tools/test_email.py+tests/tools/test_teams.py(respx mocks, real Graph response shapes, both 5xx-retry paths exercised, T3 regression-safe default for existing email/teams)
PR 2 — Author, Upload, Share (Scenario 2)¶
- 3 tools above + sponsor-allowlist enforcement on every share mutation
- Provisioning:
Sites.ReadWrite.Allconsent grant - Tests: ~30 new tests (T4, eng review), including chunked-upload happy-path + mid-stream 503 →
nextExpectedRangesresume (T2), denylist rejection, non-sponsor share rejection,write_text_fileconflictparameter symmetry (C1) - LLM eval extension: three Scenario-2 traces — agent drafts markdown, uploads, shares with sponsor; one with denied target site; one with non-sponsor recipient
PR 3 — Workbook reads¶
- 2 tools above + workbook session lifecycle helper
- Tests: ~15 new tests
- No new permissions
Cross-cutting concerns¶
Throttling & retry (extends existing RetryOn429Transport)¶
src/entraclaw/tools/rate_limit.py already implements RetryOn429Transport (async httpx transport with Retry-After backoff). Files PR1 extends it with one new kwarg:
class RetryOn429Transport(httpx.AsyncBaseTransport):
def __init__(self, *, max_attempts: int = 3, allow_5xx_retry: bool = False) -> None: ...
Per D6: read_file, resolve_file_url, list_recent_files, read_workbook_range, list_workbook_tables use allow_5xx_retry=True with exponential jitter (200ms / 800ms / 2400ms). comment_on_file, write_text_file, upload_file, share_file use allow_5xx_retry=False — fail fast on 5xx so the model can decide whether to re-issue. No new throttle helper invented.
Audit logging — single helper (C3, eng review)¶
Every file-touching tool emits two audit events: pre-call (outcome="pending") and post-call (outcome="success"/"failure"). Resource format: {drive_id}:{item_id} for file-scoped events, {site_id} for site-scoped events. Premise 3 means the sponsor allowlist check itself emits an audit event when it rejects.
DRY: one helper, used 9 times.
@asynccontextmanager
async def _audit_graph_call(verb: str, resource: str,
*, allow_5xx_retry: bool = False) -> AsyncIterator[None]:
"""Pre-call pending audit, run body, post-call success/failure audit.
`allow_5xx_retry` is a passthrough flag the caller can use when constructing
its `httpx.AsyncClient(transport=RetryOn429Transport(allow_5xx_retry=True))`.
Kept on the helper signature so reads vs. mutations are visibly different at
every call site.
"""
This replaces nine ad-hoc audit_log(...) / try / except / audit_log(failure) blocks with a single concern. Tests assert exactly one pre-event and one post-event per tool call.
Beta surface isolation¶
Comments are the only beta calls. Define GRAPH_BETA_HOST = "https://graph.microsoft.com/beta" once in tools/files.py and gate all comment calls through a single helper. If Microsoft GAs comments to v1.0, that's a one-line change.
Channel notifications for comment replies¶
Out of scope for V1 (V1.1 follow-up): when a sponsor replies to an agent's file comment, the agent should see it the same way it sees a Teams DM (notifications/claude/channel). Implementation: _background_poll_comments() task at 60s interval (files change less often than Teams chats).
Testing strategy¶
Mirror tests/tools/test_email.py and tests/tools/test_teams.py. Target counts after eng-review T4: PR1 ~50 unit tests, PR2 ~30, PR3 ~15.
- respx for Graph HTTP mocking — never hit real Graph in unit tests
- Real Graph response shapes — copy from Graph Explorer for fidelity, never invent
- Each tool gets: happy path, missing-permission (403), file-not-found (404), throttled (429 with
Retry-After), token-expired-during-call (401), denylist-rejected, for read tools: 503 retry-success and 503 retry-exhausted - Chunked upload (T2): multi-chunk happy path; mid-stream 503 →
nextExpectedRangesresume; per-chunk retry cap exhausted; protocol-aborted upload-session 410 surfaces cleanly add_file_commentreject tests (A5):.pptx,.pdf,.md, personal-OneDrive.docx, folder driveItem, denylisted siteshare_filereject test for non-sponsor recipientread_filereject tests for.xlsx,.pptx, denylisted site, PDF overENTRACLAW_FILES_MAX_PDF_BYTES(P1)resolve_file_urltests for SharePoint URL, OneDrive personal URL, OneDrive business URL, malformed URL, denylisted site, 404, 429 backoff- T3 regression test for
RetryOn429Transportdefault behavior —allow_5xx_retry=False(default) MUST NOT retry on 5xx (proves PR1 didn't change existing email/teams retry semantics) - T5 fixture factories in
tests/tools/conftest.py:make_file_ref(),make_file_summary(),make_file_content(),make_excel_range()— keep test bodies focused on the path under test, not Pydantic boilerplate
Target: ≥ 80% line coverage per pytest --cov-fail-under=80.
LLM eval suite (D8 — gates PR1 merge per T1, eng review)¶
Lives in tests/evals/. Scaffolding ships in PR1, not a follow-up:
tests/evals/__init__.pytests/evals/conftest.py— replay harness: respx-driven Graph fixture loader, model-call snapshot/replay, scoring rubric injectortests/evals/rubric.py—EvalRubricdataclass:must_call_tools,must_call_in_order,must_not_call_tools,assertions(callable list over the post-trace state)tests/evals/test_files_scenario_1.py— 5 traces:- Happy path — agent reads a
.docxspec, asks clarifying question, postsadd_file_commentAND asend_teams_messagechat reply (model orchestrates both legs) .pptxreject — agent gracefully reports it can't read PowerPoint, suggests user paste content- Denied site — agent hits
SiteNotAllowedError, surfaces the error and tells user the operator denied that site - Truncated long doc — agent reads a 60-page PDF that gets truncated, mentions truncation in its summary, doesn't pretend it read everything
- 5xx retry success — Graph returns 503 once, then 200; the model never sees the failure (transparent retry)
Marker: eval (registered in pyproject.toml). Run via pytest tests/evals -m eval. The unit test suite excludes the eval marker by default to keep pytest -v fast. PR1 cannot merge until pytest tests/evals -m eval passes.
P2 (eng review): live/snapshot/replay modes for the eval LLM are deferred — V1.1 follow-up. PR1 ships with snapshot mode only (model responses are pre-recorded JSON in tests/evals/snapshots/).
Failure-mode registry (per D6)¶
| Failure | Tool | Retry policy | User experience |
|---|---|---|---|
| 429 throttle | all | RetryOn429Transport (Retry-After, max 3) |
transparent retry; logged |
| 503 transient | reads | retry with jitter (3 attempts, 200/800/2400ms) | transparent retry; logged |
| 503 transient | mutations (non-chunked) | fail fast, surface to model | model can decide to re-issue |
| 503 transient | chunked upload (in-flight) | nextExpectedRanges resume (T2; max 3/chunk) |
transparent within upload |
| 401 token expired | all | _with_token_retry (existing) |
refresh + retry once |
| 404 file not found | all | no retry | FileNotFoundError → model surfaces to user |
| 403 missing permission | all | no retry | MissingPermissionError → operator must re-consent |
| Denied site | all | no retry | SiteNotAllowedError → operator updates ENTRACLAW_FILES_DENIED_SITES |
| Non-sponsor share | share_file |
no retry | NotASponsorError with sponsor list → model corrects |
| Unsupported read format | read_file |
no retry | UnsupportedReadFormatError with hint to read_workbook_range |
| Unsupported comment target (A5) | add_file_comment |
no retry | UnsupportedCommentFormatError (rejects .pptx, personal OneDrive, folder, non-Office formats) |
| File too large (P1) | read_file |
no retry | FileTooLargeError when size > ENTRACLAW_FILES_MAX_PDF_BYTES (default 50 MiB) |
| Upload conflict | upload_file / write_text_file |
no retry (default rename resolves it) |
renamed file returned in FileSummary |
Deferred / TODOs (P2 work)¶
search_files+list_sites(needSites.Read.All; cut from PR1 for permission scope coherence)- Excel writes (
excel_write_range,excel_append_table_rows) — V1.1 - Workbook session context manager for batched writes — V1.1
- Webhook subscriptions for comment replies (currently V1.1 polls)
- Site/library creation
unshare_filefor clean revocation ofshare_filepermissions- Office-format authoring (
.docx,.xlsx,.pptx) — seePLAN-files-llm-authoring-v2.md - PowerPoint slide authoring + reading
What I noticed about how you think¶
- You named both scenarios concretely. That's the kind of specificity that makes a plan implementable instead of aspirational.
- You picked C (full surface) when offered the smaller wedges in office-hours, then accepted CEO-review's recommendation to trim to the locked surface above — that's the right kind of evolution from brainstorm to ship plan.
- The sponsor-only sharing constraint came in early and unprompted. You're already thinking like a security architect about this surface, and the CEO review extended that into Premise 3 + D11.
The Assignment (now de-risked, not deferred)¶
Before PR1 lands: open this plan in a Teams chat with your sponsor and ask one question — "Are there any sites or document libraries the Agent User must NOT touch?" Capture the answer in ENTRACLAW_FILES_DENIED_SITES in .env.example and document it in PR1's setup notes. The denylist is configurable, so the answer doesn't need to be perfect — but capturing it now beats finding out after a tool accidentally reads from a Legal-restricted SharePoint site.
GSTACK REVIEW REPORT¶
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | /plan-ceo-review |
Scope & strategy | 1 | CLEAR (HOLD_SCOPE) | 7 issues surfaced, 1 critical gap (5xx policy), 0 unresolved; 10 decisions locked (D1-D11, D9 skipped) |
| Codex Review | /codex review |
Independent 2nd opinion | 1 (Claude fallback — Codex hung 11+ min) | ISSUES_FOUND | 7 outside-voice findings: 5 committed, 2 became D10 + D11 |
| Eng Review | /plan-eng-review |
Architecture & tests (required) | 1 (this run, post-rewrite) | CLEAR | 12 issues across 4 sections; 6 decisions taken via AskUserQuestion (A1/A2/A3/C1/T1/T2); 6 mechanical commits (A4/A5/C2/C3/C4/T3-T6/P1/P2); 0 unresolved; 0 critical failure-mode gaps |
| Design Review | /plan-design-review |
UI/UX gaps | 0 | — (no UI scope) | n/a |
| DX Review | /plan-devex-review |
Developer experience gaps | 0 | — | n/a |
Eng review decisions (this run):
- A1: comment_on_file split into add_file_comment (Files-only); model orchestrates the chat-reply leg via existing send_teams_message. tools/files.py does not import from tools/teams.py.
- A2: _check_site_allowed API mismatch resolved — resolve_file_url returns FileRef.site_id; downstream tools (read_file, add_file_comment, etc.) accept FileRef directly. list_recent_files post-filters denied items + surfaces denied_count.
- A3: resolve_file_url uses GET /shares/{base64url(url)}/driveItem — the canonical Graph sharing-URL resolver; returns parentReference.siteId for free.
- A4: Endpoint corrected — POST /beta/drives/{drive-id}/items/{item-id}/comments (NOT /workbook|document/comments); same for Word and Excel.
- A5: add_file_comment rejects folder driveItems (per platform-learnings line 206) in addition to .pptx and personal OneDrive.
- C1: write_text_file accepts conflict: Literal["rename","replace","fail"] = "rename", mirroring upload_file.
- C2: All tools/files.py functions are async def, take *, token: str, transport: httpx.AsyncBaseTransport | None = None. mcp_server.py wraps each with _with_token_retry.
- C3: Single _audit_graph_call(verb, resource, *, allow_5xx_retry=False) async context manager (or @graph_call decorator) — used 9 times; not a framework, just DRY.
- C4: §"Tool surface" needs in-doc rewrite to reflect post-decision shapes (FileRef param, async signatures, dropped chat-reply plumbing on add_file_comment).
- T1: PR1 includes eval scaffolding — tests/evals/conftest.py (replay harness), pyproject.toml eval marker registration, tests/evals/rubric.py (dataclass schema), tests/evals/test_files_scenario_1.py (5 traces). PR1 cannot merge until pytest tests/evals -m eval passes.
- T2: Chunked upload uses protocol-native nextExpectedRanges resume per chunk (max 3 attempts/chunk); session-create failures fail-fast per D6.
- T3-T6: Test plan revisions — regression test for RetryOn429Transport.allow_5xx_retry=False default; PR1 ~50 tests / PR2 ~30 / PR3 ~15 (plan undercounted ~30%); fixture factories in tests/tools/conftest.py; denylist tests assert respx_mock.calls.call_count == 0.
- P1: ENTRACLAW_FILES_MAX_PDF_BYTES env (default 50 MB) + FileTooLargeError; rejection happens BEFORE PDF download.
- P2: Plan needs to specify whether eval-mode LLM is live, snapshotted, or replayed (affects CI cost + gate reliability).
Outside voice (this run): SKIPPED — plan was outside-voice'd 6h ago in CEO review; calling again on same content is noise. CEO-stage outside voice surfaced 7 findings (5 committed, 2 became D10/D11). This eng review caught the implementation-detail layer (architectural module boundaries, API mismatches, eval scaffolding, chunked-upload semantics) that the strategy-level outside voice could not.
Cross-model: No tension in this run (outside voice skipped).
TODOS.md updates: 6 entries added under P2 — search_files+list_sites, Excel writes + workbook session manager, comment-reply webhook subs, unshare_file, V2 Office authoring (back-ref to V2 plan), site/library creation. All carry back-references to this plan's §"Deferred / TODOs".
Test plan artifact: ~/.gstack/projects/entraclaw-identity-research/user-main-eng-review-test-plan-20260430-110538.md.
Worktree parallelization: Sequential — PR1 → PR2 → PR3 all touch src/entraclaw/tools/files.py. Within PR1, the 4 tools share the file + the _audit_graph_call helper, so splitting tools across worktrees would create merge conflicts. Single PR per phase is correct.
Failure-mode critical gaps: 0. Every codepath in the test diagram has either a planned test, an error handler, or both. After P1's FileTooLargeError addition, the failure registry is complete.
UNRESOLVED: 0
VERDICT: CEO + ENG CLEARED — ready for /ship once the in-doc rewrites land (A1/A2/A3/C1/C4/T1/T2 + P1 in failure registry). Eng-review-required gate satisfied.