31 Agent Skills, 12 Products, One Engineer
We run a 12-product software company with a single developer. Not by working harder. By giving AI agents recurring responsibilities and holding them to the same standards as employees.
Most companies hire people to do repetitive work. We write a text file instead.
At Voxos, every recurring task that an AI agent can handle is encoded as a skill: a plain Markdown file that describes what to do, when to do it, and how to verify the result. These skills are version-controlled, code-reviewed, and deployed the same way we deploy software. They are not prompts. They are job descriptions.
We currently run 31 skills, 4 automatic hooks, and 6 recurring maintenance audits across 12 live products spanning two AWS accounts. One person manages all of it. This post is a complete inventory of every skill we use, what it does, and how often we use it.
What We Learned Building This
- Skills compound. Each new skill unlocks the next. A research pipeline makes blog posts possible. A bench tool makes the pipeline measurable. A deploy skill makes blog posts publishable. The system is more valuable than the sum of its parts.
- Recurring beats one-shot. The highest-value skills are the ones that run daily without being asked: engagement audits, session tracking, cost monitoring. One-time tasks are solved once. Recurring tasks are solved forever.
- Measurement makes trust possible. We don't trust our agent because it's smart. We trust it because every change must be proven with numbers. Before and after. Delta shown. No vibes.
- Hooks are the nervous system. The 4 automatic hooks that fire on session start and end are invisible but critical. They track every session, log every task, and catch orphaned processes. Without them, the skills are just documents.
- Security is a skill, not an afterthought. A monthly SOC 2 audit skill scans all three AWS accounts for encryption gaps, public S3 buckets, exposed secrets, and overprivileged IAM roles. A penetration testing skill probes every endpoint from the outside. These run on the same infrastructure as every other skill.
- The hardest skills to write are the simplest. /commit is three words: "stage, describe, commit." Getting the agent to produce the right commit message every time took more iterations than building the research pipeline.
Research & Intelligence
These are the skills that go find things out. They launch multi-agent pipelines, scrape the web, and synthesize findings into structured output.
- /research (several times a week): Our most-used intelligence skill. Launches the Scholar multi-agent research pipeline: a planner agent breaks a topic into shards, multiple search agents run in parallel, and a reducer agent synthesizes everything into a sourced report. Supports depth levels (quick, standard, thorough), output profiles (scholarly vs. technical), and one-command blog deployment.
- /venture-research (weekly): Deep parallel research on investment domains. Launches 8-12 agents analyzing technology readiness levels, market sizing, competitive landscapes, and unit economics. Designed for scientific rigor, not speed.
- /prospect-research (weekly): Finds potential users and contributors via web search. Maintains an exclusion list to avoid re-contacting the same people. Runs the semantic-mapreduce pipeline with three focused agents.
- /hn-pain-points (weekly): Scrapes Hacker News for trending frustrations and unsolved problems. Deduplicates across runs and flags which pain points are recurring versus newly emerging. We use this to find product ideas that real engineers are already asking for.
- /idea-mining (monthly): Generates 100+ creative ideas on a theme, then filters each through web search for novelty. Caches every evaluation to avoid redundant lookups. We use this for brainstorming product directions and blog topics.
Code & Development
The daily drivers. These skills handle the mechanical parts of writing, committing, testing, and auditing code.
- /commit (multiple times daily): Our single most-used skill. Stages all changes, reads the diff, generates a conventional commit message, and commits. Follows the repository's existing message style. Sounds trivial. Saves ten minutes per cycle.
- /commit-project (multiple times daily): Same as /commit, but scoped to a single project in the monorepo. In a 12-product codebase, you don't want to accidentally commit pulse changes with a scholar message.
- /finished (end of every session): Auto-detects which project was modified, commits remaining changes with a session summary, and produces a one-line sign-off. It's the last thing the agent does before a session ends.
- /code-audit (monthly): Traces every user journey through a project: frontend routes, API endpoints, data pipelines. Launches parallel auditing agents that detect dead ends, missing auth, broken flows, and inconsistencies. Produces a prioritized fix list as a dependency graph. Our last audit found 122 issues across 8 user journeys.
- /bench (several times a week): Runs A/B tests on the Scholar research pipeline. Executes the pipeline on a fixed thesis, scores the output on quality and cost, and compares labeled runs side-by-side. Every pipeline change must show a positive delta before it ships.
- /done (end of every session): Reviews the full conversation for loose ends: uncommitted changes, background processes still running, TODOs created but not addressed, deployments started but not verified. The agent's exit checklist.
Infrastructure & DevOps
The skills that keep the lights on. These handle deployment, security, and the AWS plumbing that a 12-product company requires.
- /ip (several times a week): Gets the current public IP, updates every staging project's WAF whitelist across both AWS accounts, and runs Terraform plan+apply. One command replaces 12 manual console visits. We run this every time the laptop changes networks.
- /reset (a few times a week): The nuclear option for a frontend deployment. Clears the S3 bucket, invalidates CloudFront, rebuilds from source, and redeploys. When a staging frontend is in a bad state, this is faster than debugging it.
- /reset-dynamodb (as needed): Deletes all items from specified DynamoDB tables. Hard-blocked from running against production. Useful for resetting staging data when testing schema migrations or seeding fresh demo data.
- /pentest (monthly): Automated external reconnaissance against our own infrastructure. Runs DNS enumeration, HTTP header analysis, endpoint discovery, API enumeration, rate-limit probing, and auth testing. Generates a security findings report ranked by severity.
- /soc-audit (monthly): Comprehensive SOC 2 compliance scan across all three AWS accounts. Checks DynamoDB encryption, S3 public access settings, Lambda environment variables for leaked secrets, CloudFront TLS configuration, and IAM policy scope. Produces scored findings with delta tracking against the previous month's results.
Content & Frontend
The skills that produce user-facing output: blog posts, favicons, translations, and frontend quality checks.
- /brands (as needed): Retrofits a frontend with multi-brand templating. One backend, N branded frontends on subdomains. Creates brand config files, build scripts, deploy scripts, and CloudFront routing functions. Designed for white-labeling a product overnight.
- /favicon (as needed): AI-powered favicon generation using Google Gemini. Generates an image from a prompt, launches an interactive circular crop GUI, and exports a full favicon set: an ICO file plus six PNG sizes plus a web manifest.
- /locales (as needed): Regenerates i18n translations for a project. Uses hash-based change detection so only modified strings get re-translated. Outputs to the standard public/locales/ directory structure.
- /paper (as needed): Generates PDF documents from LaTeX source. Compiles twice for table-of-contents resolution. Used when we need to produce a formal document, PDF, or report that doesn't belong on the web.
- /preview (a few times a week): Launches a local server that live-renders a Markdown file with GitHub-style CSS. Auto-reloads on each browser refresh. Useful for reviewing blog drafts and documentation before deployment.
- /attention (hourly, automated): Scores a frontend project's "Golden Path Clarity" on a 1-5 scale. Checks whether the UI funnels users into a single monetizable journey. Audits CTA placement, dead ends, competing navigation paths, and attention ratio. Compares the declared score against the actual one and flags regressions.
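The hash-based change detection behind /locales reduces to a few lines: fingerprint every source string, compare against the previous run's fingerprints, and re-translate only the mismatches. A minimal sketch of the idea; the file layout, key names, and helper functions are illustrative, not our actual implementation:

```python
import hashlib

def string_hash(text: str) -> str:
    """Stable fingerprint of a source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

def changed_keys(source: dict, cache: dict) -> list:
    """Keys whose source text no longer matches the cached hash (or is new)."""
    return [k for k, v in source.items() if cache.get(k) != string_hash(v)]

# Hypothetical source strings and the hash cache left by the previous run.
source = {"cta.start": "Start researching", "nav.home": "Home"}
cache = {
    "cta.start": string_hash("Start researching"),  # unchanged since last run
    "nav.home": string_hash("Homepage"),            # source text was edited
}

to_translate = changed_keys(source, cache)  # only "nav.home" needs re-translation
```

Only the keys in `to_translate` go to the translation model; everything else keeps its cached output, which is what makes regeneration cheap enough to run casually.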
Utility & Workflow
Small skills that remove friction from the daily workflow. Individually minor. Collectively, they eliminate hours of context-switching.
-
/queue Several times a weekParks a task for later. Writes a timestamped entry to a queue file with full context and requirements. When the current session ends before the task is done, this ensures it doesn't get lost.
-
/tabs A few times a weekOpens N new Windows Terminal tabs, each running a Claude agent session. When a task is parallelizable, we split it across multiple agents working simultaneously in separate terminals.
-
/vip As neededAdds or removes VIP and admin accounts on the Scholar platform. VIPs get 25 research credits. Admins get unlimited credits and platform administration access.
-
/nu As neededScaffolds a new skill. Creates the directory structure and a SKILL.md template with the standard frontmatter. When we identify a new recurring task, this is how it gets formalized.
The Invisible Layer: Hooks
Skills are invoked by name. Hooks fire automatically. We run four of them, and they provide the telemetry that makes everything else accountable.
Session register fires when any Claude session starts. It creates a JSON tracking file with the session ID, process ID, working directory, model name, and start timestamp. Session end fires when the session closes, marking it as ended and parsing the transcript for token and turn counts.
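Concretely, a session tracking file in this scheme might look like the following; the field names and values are illustrative, not our actual schema:

```json
{
  "session_id": "a1b2c3d4",
  "pid": 48213,
  "cwd": "/home/dev/voxos/scholar",
  "model": "claude-sonnet-4-5",
  "started_at": "2025-01-14T09:12:31Z",
  "status": "active"
}
```

On session end, the status flips to "ended" and the token and turn counts parsed from the transcript are appended to the same file.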
Task check fires before exit and nudges if no tasks were logged during the session. If you did meaningful work but didn't track it, the agent won't let you leave without acknowledgment. Artifact push handles file uploads to the Scholar platform via presigned S3 URLs.
Together, these hooks create an audit trail for every session: when it started, what model ran, how many tokens were consumed, and what tasks were completed. We can reconstruct any session's cost and output from this data alone.
When a terminal tab crashes, the session hook never fires. The tracking file stays marked as "active" indefinitely. Our cc-sessions.sh utility detects these orphaned sessions by checking if the recorded process ID is still alive. At the start of every new session, the agent checks for orphans and offers to resume them. Dead sessions get cleaned. Active work never gets lost.
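The liveness check itself is cheap: sending signal 0 to a PID delivers nothing but fails if the process is gone. A minimal sketch of the idea behind cc-sessions.sh; the tracking-file layout and field names are assumed, not taken from the actual script:

```python
import json
import os
from pathlib import Path

def pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True

def find_orphans(tracking_dir: Path) -> list:
    """Sessions still marked active whose recorded process is no longer running."""
    orphans = []
    for f in tracking_dir.glob("*.json"):
        session = json.loads(f.read_text())
        if session.get("status") == "active" and not pid_alive(session["pid"]):
            orphans.append(f)
    return orphans
```

The shell equivalent of `pid_alive` is `kill -0 "$pid"`, which is presumably all a script like cc-sessions.sh needs.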
Recurring Maintenance
Six tasks run on a cadence, from session-start to monthly, tracked in a single manifest file:
- Session-start: triage the reminders file, verify check dates, remove resolved items, flag anything overdue.
- Hourly: verify that memory files and project documentation match the current system state. Run golden path audits on all customer-facing frontends. Collapse completed milestones into summaries.
- Daily: check CloudWatch for per-project API spend. Flag any project exceeding $10/month. Run the full engagement audit across DynamoDB, Stripe, CloudFront, and Lambda metrics.
- Monthly: run the SOC 2 compliance scan across all three AWS accounts. Compare findings against the previous month's baseline. Surface regressions.
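The daily spend check is a projection plus a threshold: extrapolate each project's month-to-date figure to a full month and flag anything that would land over $10. A sketch with hypothetical per-project numbers; the CloudWatch fetch itself is omitted:

```python
import calendar
from datetime import date

MONTHLY_BUDGET_USD = 10.0

def projected_monthly(mtd_usd: float, today: date) -> float:
    """Extrapolate month-to-date spend to a full-month figure."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return mtd_usd / today.day * days_in_month

def flag_overspend(mtd_by_project: dict, today: date,
                   budget: float = MONTHLY_BUDGET_USD) -> list:
    """Names of projects whose projected monthly spend exceeds the budget."""
    return sorted(p for p, usd in mtd_by_project.items()
                  if projected_monthly(usd, today) > budget)

# Hypothetical month-to-date figures pulled from CloudWatch billing metrics.
spend = {"scholar": 7.10, "pulse": 1.90, "voxos-site": 0.20}
over = flag_overspend(spend, date(2025, 1, 15))  # only "scholar" projects over $10
```

Projecting rather than comparing raw month-to-date spend matters early in the month, when even a runaway project has only accumulated a few dollars.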
Token Budgeting
Every non-trivial task follows a lifecycle: estimate (predict token cost before starting), start (record the timestamp), complete (log actual tokens, files changed, and commit hash). This gives us estimation accuracy data over time.
Why tokens instead of hours? Because an AI agent cannot predict wall-clock time. It can reason about output volume. A config change is 2-5k tokens. A multi-file feature is 10-30k. A new pipeline stage is 50-100k. We calibrate against these benchmarks and track how often the estimate matches reality.
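Tracking estimation accuracy needs nothing more than the ratio of actual to estimated tokens per task. A minimal sketch; the task records and field names are invented for illustration:

```python
def accuracy_ratio(estimated: int, actual: int) -> float:
    """actual / estimated: 1.0 is a perfect estimate, 2.0 means we ran 2x over."""
    return actual / estimated

# Hypothetical completed-task log, in the estimate/start/complete lifecycle.
tasks = [
    {"name": "config change", "estimated": 3_000, "actual": 2_600},
    {"name": "multi-file feature", "estimated": 20_000, "actual": 31_000},
]

ratios = [accuracy_ratio(t["estimated"], t["actual"]) for t in tasks]
within_budget = sum(r <= 1.0 for r in ratios)  # how many tasks came in at or under estimate
```

Over enough tasks, the distribution of these ratios tells you whether the benchmarks (2-5k for a config change, 10-30k for a feature, 50-100k for a pipeline stage) still hold, or need recalibrating.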
The Pattern
Every skill is a directory containing a single file: SKILL.md. The file has YAML frontmatter for metadata and a Markdown body for instructions. That's it. No framework. No SDK. No runtime dependency.
The instructions describe what the agent should do, how to verify the result, and what guardrails to respect. The agent reads the file when the skill is invoked and follows the instructions using whatever tools are available: file reads, shell commands, API calls, web searches.
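For illustration, a skill in this shape might read as follows; the skill and its frontmatter fields are invented for this example, not one of our 31:

```markdown
---
name: changelog
description: Summarize commits since the last tag into CHANGELOG.md
---

1. Run `git log` from the last tag to HEAD.
2. Group commits by type (feat, fix, chore).
3. Prepend the grouped summary to CHANGELOG.md under today's date.
4. Verify: the file still parses as Markdown and every commit hash appears exactly once.
```

The verification step is the part that makes a skill a job description rather than a prompt: the agent has to prove the result, not just produce it.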
The power of this approach is not in any single skill. It is in the accumulation. Each skill we write makes the next one cheaper to build, because the agent already knows the codebase, the deployment patterns, and the verification standards. Skill #31 took fifteen minutes to write. Skill #1 took an afternoon.
We started with /commit. Then we needed /commit-project for the monorepo. Then /finished to automate end-of-session cleanup. Then session hooks to track what happened. Then /engagement to measure whether any of it mattered. Each skill existed because the previous one created a gap. The system designed itself.
If you're building something similar, start with the task you do most often. For us, that was committing code. For you, it might be running tests, deploying a service, or triaging bug reports. Write the instructions in a Markdown file. Run it. Fix what breaks. Do it again tomorrow. The skill gets better every time because you refine the instructions, and the accumulation of skills creates an infrastructure that no individual prompt could replicate.