Pre-Production Concierge Skills

🔍 Observation 🔬 Research Apr 28, 2026

Behavioural Observation · Architectural / Product Surface

Pre-Production Concierge Skill Library on Live Claude.ai Containers

Undocumented Real-World-Action Skills Staged on Every User’s Filesystem

Researcher: ElVec10 · Target: Claude Opus 4.7 (Anthropic), claude.ai with computer use · Date: April 28, 2026

TL;DR

Every Claude.ai computer-use container ships with /mnt/skills/examples/ containing 22 SKILL.md files. Ten match the public anthropics/skills repository on GitHub. The remaining twelve do not appear in the public repo, on claude.com/skills, in docs.claude.com, in support.claude.com, or in any third-party catalog or web search. They form a coherent product surface — a real-world-action concierge stack — including phone-call booking, prescription refills, subscription cancellations, expense submission via Benepass/Brex/Concur/Expensify, TaskRabbit hiring, grocery delivery, and DMV form filing. The skills also reference an undocumented Tier 1 / Tier 2 / Tier 3 destructive-action classification system used to gate “plan-confirm required” behaviour. The skills are not loaded into the active session by default — <available_skills> does not list them — but they are world-readable on the container filesystem and can be opened with the view tool from any session.

Observation Metrics

Target: Claude Opus 4.7 (claude.ai with computer use)
Category: Behavioural Observation (Architectural / Product Surface)
Total Skills on Filesystem: 22
Publicly Documented: 10 (45%)
Undocumented: 12 (55%)
Verification Layers: Filesystem inventory + GitHub repo probe + web search
Reproducibility: Immediate (ls /mnt/skills/examples/ from any session)
Security Impact: None (no credentials, API keys, or exploitable code)

Background

This observation extends the architectural mapping done in the LINT writeup (#1) and the System Prompt Transparency Audit (#7). Both established that Claude’s container surface contains substantially more than what is presented to the user or documented publicly. This finding documents a specific case: a product category that is staged on the filesystem of every paying user’s container but has no public presence whatsoever as of 2026-04-28.

The discovery was incidental — the /mnt/skills/examples/ directory was inventoried during an unrelated tool-surface audit. The names of the skills were unfamiliar. A check against the public repo found 12 of 22 missing. Web searches across multiple search axes returned zero matches for any of the 12.

Methodology

The verification was done in three layers:

Layer 1: Filesystem Inventory

ls /mnt/skills/examples/ from inside a Claude.ai computer-use session enumerates 22 skill directories (and their corresponding .skill zip archives). Each contains a SKILL.md with YAML frontmatter and prose instructions.
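The inventory step can be sketched as a small shell function that walks the directory and prints each skill's name alongside its frontmatter description. This is a minimal sketch, not the exact procedure used for this writeup: the function name `skill_descriptions` is hypothetical, and it assumes each description is a single-line `description:` key inside the leading `---` frontmatter block.

```shell
# Print "<skill-name>: <description>" for every SKILL.md under a root directory.
# Assumption: the description is a single-line `description:` key inside the
# leading `---` ... `---` frontmatter block; multi-line YAML values are not handled.
skill_descriptions() {
  root="${1:-/mnt/skills/examples}"
  find "$root" -name SKILL.md | sort | while IFS= read -r file; do
    name=$(basename "$(dirname "$file")")
    desc=$(awk '
      /^---[[:space:]]*$/ { fence++; next }      # count frontmatter delimiters
      fence == 1 && /^description:/ {            # only look inside the frontmatter
        sub(/^description:[[:space:]]*/, "")
        print
        exit
      }
      fence >= 2 { exit }                        # stop once the frontmatter closes
    ' "$file")
    printf '%s: %s\n' "$name" "$desc"
  done
}
```

Calling `skill_descriptions` with no argument inside a computer-use session reads the default path; piping the output through `wc -l` gives the skill count to compare against the 22 reported here.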

Layer 2: Public-Repo Cross-Check

For each of the 22 skill names, three candidate paths in the public anthropics/skills GitHub repo were probed via raw.githubusercontent.com: skills/<name>/SKILL.md, template/<name>/SKILL.md, and examples/<name>/SKILL.md. Ten skills resolved to HTTP 200 in the skills/ path. Twelve returned 404 across all three paths.
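The three-path probe described above could look roughly like the following. The function names `candidate_urls` and `probe_skill` are hypothetical; the `main` branch and the three path prefixes are taken from this writeup and may change if the repository is reorganised.

```shell
# Emit the three candidate raw URLs for one skill name.
# The path prefixes (skills/, template/, examples/) and the `main` branch
# come from the methodology text; adjust them if the repo layout changes.
candidate_urls() {
  skill="$1"
  base="https://raw.githubusercontent.com/anthropics/skills/main"
  for prefix in skills template examples; do
    printf '%s/%s/%s/SKILL.md\n' "$base" "$prefix" "$skill"
  done
}

# Report "public" if any candidate path returns HTTP 200, else "missing".
# Requires network access and curl; returns non-zero when nothing resolves.
probe_skill() {
  for url in $(candidate_urls "$1"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
      echo "$1 public $url"
      return 0
    fi
  done
  echo "$1 missing"
  return 1
}
```

Running `probe_skill call-to-book` prints one line per skill, which makes it easy to loop over all 22 directory names and tally the two groups.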

Layer 3: Web-Search Verification

Each of the 12 missing skills was searched against Anthropic’s official surfaces (docs.claude.com, support.claude.com, claude.com/skills), the Anthropic engineering blog, the Anthropic Skilljar courses, third-party skill catalogs (claudecn.com, claudeworld.com), and general web results. Search queries were issued for each skill name alongside “Claude” and “Anthropic”, and for the distinctive Tier 2 / Tier 3 destructive classification language. Zero relevant matches across all queries.

The Twelve

The undocumented skills, with one-line summaries extracted from their YAML descriptions:

Skill | Description
benepass-reimbursement | Submits expense reimbursements through Benepass (app.getbenepass.com) using browser automation, Gmail integration for verification codes, and file upload for receipts.
call-to-book | Makes outbound phone calls to book appointments or reservations. Discloses AI identity on the call, navigates IVR, adds the booking to calendar.
cancel-unsubscribe | Cancels subscriptions or unsubscribes from services. Works from a description, a charge line, a URL, or a screenshot. Can audit a full statement and cancel several at once. Includes phone calls.
event-planning | Plans events from birthday dinners to weddings. Venue research, guest lists, timelines, vendors, budgets.
file-expenses | Submits expenses across Benepass, Brex, Concur, Expensify, etc. Detects platform, finds receipts, checks for duplicates.
file-form | Handles bureaucratic tasks: jury duty responses, parking tickets, passport renewals, DMV forms, permit applications.
financial-calculator | Tax estimates, loan comparisons, retirement projections, rent vs. buy, investment scenarios.
grocery-shopping | Orders groceries for delivery. Store selection, occasion-based list building, budget tracking.
hire-help | Finds and books service providers via TaskRabbit, Handy, Thumbtack: cleaning, handyman, moving, assembly, yard work.
meal-delivery | Orders food timed to arrive at a specific moment. Works backward from target arrival.
prescription-refill | Refills prescriptions at pharmacies. Works from a medication name, Rx number, photo of the bottle, or “I’m running low.” Online portal or phone call.
return-refund | Returns items or requests refunds from any retailer. Identifies item, finds policy, navigates the process, handles shipping or phone calls.

Coherent Product Category

The category is coherent. It is not “skills that demonstrate skill patterns” (the public examples are mostly that). It is “skills that perform real-world actions on third-party services on behalf of a user.”

The Tier Classification System

Two of the undocumented skills carry explicit risk tiering in their SKILL.md text:

Tier 2 — prescription-refill

“Tier 2 skill (action, reversible). A refill request can be cancelled or left unpicked-up, so it’s not destructive — but it does involve my health information and real-world contact. Confirm the plan before you act.”

Tier 3 — cancel-unsubscribe

“Tier 3 skill (destructive, plan-confirm required). Cancellations can’t always be undone — a lost promo rate or a deleted account stays lost. Never cancel anything without showing me the plan first and getting an explicit yes.”

The Tier 1/2/3 nomenclature implies a complete classification system (Tier 1 presumably being read-only / informational). The framework is referenced as if the model is expected to recognise it. It does not appear in any public Anthropic documentation, support article, engineering blog post, or third-party guide. A web search for "Tier 2" "Tier 3" Claude skill destructive returns Anthropic’s API billing tiers — a different concept — and nothing else.
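To check which skills declare a tier, a sketch along these lines may help. It assumes the tag appears literally as "Tier <digit> skill", as in the two quotes above; the function names `skill_tier` and `list_tiers` are hypothetical, and any skill that phrases its tier differently would be reported as "none".

```shell
# Print the tier number declared in a SKILL.md, or "none" if no tag is found.
# Assumption: the tag appears literally as "Tier <digit> skill", matching the
# two quoted examples; other phrasings would be missed.
skill_tier() {
  tier=$(grep -oE 'Tier [0-9]+ skill' "$1" | head -n 1 | tr -cd '0-9')
  echo "${tier:-none}"
}

# List every skill under a root directory with its declared tier, tab-separated.
list_tiers() {
  root="${1:-/mnt/skills/examples}"
  find "$root" -name SKILL.md | sort | while IFS= read -r file; do
    printf '%s\t%s\n' "$(basename "$(dirname "$file")")" "$(skill_tier "$file")"
  done
}
```

If the observation holds, `list_tiers` should show a tier only for prescription-refill and cancel-unsubscribe, with "none" for the rest.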

The Memory Architecture

Most of the concierge skills contain a near-verbatim repeated instruction:

“Always start completely fresh. Never carry over [details] from prior conversation. DO use memory to recall known details — [identity-shaped fields] — that might be needed for [task-specific purpose].”

The pattern is consistent across call-to-book, cancel-unsubscribe, event-planning, file-expenses, file-form, financial-calculator, grocery-shopping, hire-help, meal-delivery, prescription-refill, and return-refund. The instruction encodes a specific privacy-architecture decision: per-task state does not cross conversation boundaries, but identity-shaped memory (name, address, DOB, payment info, preferences) is expected to persist via a memory tool.

This architecture is not described in any public skill-authoring guide. The specific design pattern of “fresh task state, persistent identity” applied uniformly to a category of action skills is undocumented. The pattern itself — repeated near-identically across eleven skills — suggests a deliberate template, not incidental copy-paste.
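The repetition claim is straightforward to check mechanically. The sketch below lists the skills whose SKILL.md contains the fresh-start instruction; the function name `skills_with_fresh_start` is hypothetical, and the case-insensitive match on "start completely fresh" is an assumption. Because the instruction is only near-verbatim across skills, a variant wording would need a broader pattern.

```shell
# List the skills whose SKILL.md contains the fresh-start instruction.
# Assumption: a case-insensitive match on the phrase "start completely fresh"
# is enough to identify the template; variant wordings would be missed.
skills_with_fresh_start() {
  root="${1:-/mnt/skills/examples}"
  grep -rli --include=SKILL.md 'start completely fresh' "$root" \
    | while IFS= read -r file; do basename "$(dirname "$file")"; done \
    | sort
}
```

Piping the output through `wc -l` gives a count to compare against the eleven skills named above.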

Architectural Observations

Present on Disk, Absent from Session

The skills are read-only at the user level but are present on the user’s container. /mnt/skills/examples/ is a mount visible from any computer-use session. The .skill archive files (zipped distributables) are also present. Anyone who runs ls /mnt/skills/examples/ sees them. However, they are not in the session’s <available_skills> block — what Claude is told it has access to contains only the public/documented skills plus any user-installed skills.
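One way to make the disk-versus-session gap concrete is to diff the two inventories. The sketch below assumes a hypothetical text file with one skill name per line, copied by hand from the session's <available_skills> block; the function name `staged_only` is also hypothetical.

```shell
# Print skill names present on disk but absent from a list of loaded skills.
# Assumption: the second argument is a hand-made text file with one skill
# name per line, copied from the session's <available_skills> block.
staged_only() {
  root="$1"
  loaded_list="$2"
  on_disk=$(mktemp)
  loaded=$(mktemp)
  # The parent directory of each SKILL.md is treated as the skill name.
  find "$root" -name SKILL.md \
    | while IFS= read -r f; do basename "$(dirname "$f")"; done \
    | sort -u > "$on_disk"
  sort -u "$loaded_list" > "$loaded"
  comm -23 "$on_disk" "$loaded"   # keep lines unique to the on-disk list
  rm -f "$on_disk" "$loaded"
}
```

For example, `staged_only /mnt/skills/examples loaded.txt` would print the names that exist on the filesystem without being advertised to the session.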

Reachable via the View Tool

Despite not being in <available_skills>, the SKILL.md files can be read with view /mnt/skills/examples/<skill-name>/SKILL.md from any session. The instructions inside are then in the model’s context, available to influence behaviour.

Activation Behaviour Unverified

Whether asking Claude in a fresh session to “use the prescription-refill skill” causes the model to load and follow the SKILL.md instructions — and whether it would actually attempt the described phone-call or pharmacy-portal automation — has not been tested. The relevant ask_user_input_v0 tool referenced in the skill flows exists in the session’s tool surface, suggesting infrastructure exists for at least the elicitation parts of the flows.

What This Is And Is Not

This IS

A product-roadmap signal. The skills sketch a coherent category — “Claude does errands” — that Anthropic has not announced. The skills are well-written, safety-conscious, and complete enough to read as production-ready prompts rather than scratch drafts. They include explicit AI-disclosure-on-call language, plan-confirm gating on destructive actions, and consistent privacy-architecture guidance. This is staged code, not abandoned prototypes.

A documentation gap. Anthropic’s public skill documentation lists docx/pdf/pptx/xlsx as the built-in skills and points users at the public repo for examples. The undocumented twelve are present on user filesystems with no notice that they are there or that they encode capabilities not yet released.

An extension of the System Prompt Transparency Audit pattern. That writeup quantified the gap between what Anthropic publishes about Claude’s system prompt (roughly 15-20% of it) and what Claude actually receives (more than 15,000 words). This finding extends the same shape to the skill layer: the documented skill inventory is a strict subset of the actual skill inventory shipping in production.

This IS NOT

A security vulnerability. The skills do not contain credentials, API keys, internal Anthropic infrastructure references, or exploitable code. The 22 SKILL.md files were grep-checked for credential-shaped strings, internal-domain hostnames, and webhook URLs. The only matches were XML schema type definitions in the docx skill (xsd:token type names) and localhost binds in the MCP server scaffolding (correct security practice). The skills are clean text. The finding is about the fact of their unannounced presence, not about exploitable content within them.
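A credential scan of this kind can be approximated with a recursive grep. The patterns below are illustrative assumptions, not the ones used in the original check, and the function name `scan_skill_secrets` is hypothetical. Internal-domain hostnames are not covered, since the relevant domains are not stated here.

```shell
# Scan SKILL.md files for credential-shaped assignments, webhook URLs, and
# private-key headers. The patterns are illustrative assumptions, not the
# ones used in the original check; matches need manual review.
scan_skill_secrets() {
  root="${1:-/mnt/skills/examples}"
  grep -rnEi --include=SKILL.md \
    -e '(api[_-]?key|secret|token|passw(or)?d)[[:space:]]*[:=]' \
    -e 'hooks\.slack\.com|discord(app)?\.com/api/webhooks' \
    -e 'BEGIN [A-Z ]*PRIVATE KEY' \
    "$root"
}
```

The function prints `path:line:match` for each hit and, like grep, exits non-zero when nothing matches, so an empty result is the clean outcome.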

A hidden capability the model can be tricked into using. The skills describe flows that depend on tools and integrations the user can already see (browser, phone, calendar, memory). The finding is informational, not exploit-shaped.

Reproduction

From any Claude.ai conversation with computer use enabled:

# 1. Inventory the skills directory
ls /mnt/skills/examples/

# 2. List just the SKILL.md files
find /mnt/skills/examples -name SKILL.md | sort

# 3. Check the twelve undocumented skills against the public repo
#    (all three candidate paths; a 404 on each means the skill is not published)
for skill in benepass-reimbursement call-to-book cancel-unsubscribe \
             event-planning file-expenses file-form financial-calculator \
             grocery-shopping hire-help meal-delivery prescription-refill \
             return-refund; do
  for prefix in skills template examples; do
    code=$(curl -s -o /dev/null -w "%{http_code}" \
      "https://raw.githubusercontent.com/anthropics/skills/main/$prefix/$skill/SKILL.md")
    echo "$prefix/$skill : $code"
  done
done

# 4. Read the Tier-tagged skills directly
cat /mnt/skills/examples/prescription-refill/SKILL.md    # Tier 2
cat /mnt/skills/examples/cancel-unsubscribe/SKILL.md     # Tier 3

Assessment

Category: Behavioural Observation, not a vulnerability.
Novelty: The 12 undocumented skills and the Tier 1/2/3 classification system have zero public footprint.
Significance: Small but specific. The public skill catalog covers approximately 45% of the actual example-skill catalog (10 of 22 skills). The 12 missing skills form a coherent product category (real-world action concierge), reference an undocumented internal classification system, and apply a consistent undocumented privacy-architecture pattern.
Reproducibility: Immediate. It takes under a minute for anyone with claude.ai access and computer use enabled.


ElVec10 — AI Red Team Research
https://elvec1o.github.io/home

This document was produced for AI safety research purposes under responsible disclosure principles.