Report Date: February 2026 | Repository: Private source repository (sanitized for public release) | Audience: Copilot developers and technical practitioners

Executive Summary¶

If you want the short version first: - Start here: launch/2026-02/start-here.md - Case study: launch/2026-02/case-study.md - Timeline: launch/2026-02/timeline.md

In February 2026, a manually curated Copilot newsletter workflow was rebuilt into a Copilot-assisted pipeline that can draft a full issue from a one-line prompt, then be brought to “ship ready” with a light editorial pass.

The core design is a loop: generate a draft, run checks, review what’s off, turn those notes into durable rules (skills, prompts, reference docs, validators), then rerun. The goal isn’t “no humans.” The goal is that repeated fixes don’t need to be repeated.

A big unlock was treating the historical corpus as more than finished newsletters. The incremental git edits and publication-time polish capture how “good” was reached (what got removed, reordered, reframed, and tightened). Those patterns were turned into reusable instructions and checks.

The rest of this document is the deep technical write-up: architecture, data sources, and the decisions that made the workflow more reliable.

Part 1: How It Was Built¶

The Problem¶

GitHub ships 30-50 changelog entries per month, VS Code now ships weekly releases, JetBrains pushes 3-4 plugin updates, and strategic announcements appear on github.blog/news-insights/ separately from the changelog. Enterprise customers cannot keep up -- updates are scattered across 9+ sources with no single enterprise-focused view.

The curator's monthly newsletter filled this gap, but it had become unsustainable. Each issue required 4-6 hours of manual assembly. The curator had tried using Copilot with ad-hoc prompt files, but the prompts were harder to maintain than the time they saved -- a messy collection that became another thing to debug rather than a solution.

The Old System: 2,456 Lines of Monolithic Prompts¶

The initial commit (f6340a7) preserved the old approach: 2,456 lines across 8 files, each encoding an entire pipeline phase in one monolithic document. The largest -- phase_1b_content_retrieval.prompt.md at 678 lines -- tried to handle per-source extraction strategies for 5 different source types in a single prompt. The customer_newsletter.agent.md was 454 lines of pipeline logic, audience rules, formatting, and orchestration crammed into one file.

When output had problems (wrong dates, hallucinated links, version misattributions), the curator edited the generated markdown directly. There was no feedback loop. The same errors recurred every month. Fixes never propagated back to the prompts.

The New System: Conversational Chat Turns → Permanent Intelligence¶

The current system inverts this entirely. The curator interacts through short, natural chat turns in VS Code:

"the Copilot at Scale section doesn't really make sense. GHEC and GHES updates in there are not copilot related at all."

"combine the two secret scanning entries"

"put the GHAS trials first"

Each note triggers: RCA → update skill files → regenerate from updated rules → validate. The human never touches the newsletter markdown directly. In the old system, fixes were applied to the output. In the new system, fixes are applied to the intelligence.

A 10-word note in Copilot Chat produces a systemic fix in under 2 minutes, including regeneration and validation.

Build Sprint: Feb 9-12 (56 commits)¶

The 454-line monolithic agent shrank to 81 lines (a pure orchestrator). Domain logic moved into 16 SKILL.md files following the Agent Skills specification. Intelligence was mined from 14 newsletters, 132 edit diffs, and 72 SOURCES.yaml entries.

Copilot features used: VS Code Agent Mode (primary), Copilot CLI (copilot -p), custom agents (.agent.md), custom instructions (AGENTS.md), prompt files (.prompt.md), Agent Skills (SKILL.md), GitHub Actions CI, /fleet for parallel sub-agent research.

Key decisions: Skills-first architecture (L42), file-driven configuration, disk-based phase gates (L2), deletion discipline, feed-forward learnings (L1-L69).

Part 2: Self-Learning Evolution¶

The Self-Healing Loop¶

Every bug becomes a permanent fix through a mechanical loop:

Bug found → RCA → Distill learning → Update skills → Regenerate → Verify → Record

69 learnings (L1-L69) and 56 hypotheses (43 confirmed, 1 partially confirmed, 1 falsified, 11 untested) across the sprint. The 10-step Operating Protocol in AGENTS.md governs every change.

How Intelligence Propagates¶

A single human correction follows this path: learning recorded (LEARNINGS.md) → format spec updated (content-format-spec.md) → quality check added (quality-checklist.md) → validation script updated (validate_newsletter.sh) → skill instructions updated (SKILL.md) → scoring catches violations → next generation never regresses.

The check_intelligence_sync.sh script verifies propagation across 7 surfaces. This is infrastructure, not convention.

Key Self-Healing Examples¶

L64 -- Agent fabricated release dates. VS Code v1.108 ("December 2025") released Jan 8 was excluded. The agent fabricated "released Nov 2025" and confirmed its own wrong answer 3 times. Fix: mechanical fetch of the VS Code updates index page at Phase 0, 1A, and 1B. Gate: ≥2 versions for 30+ day ranges.

L66 -- Weekly releases break version-centric output. 2 monthly versions produced a "latest + afterthought" pattern. With weekly cadence (4-5 versions), this would produce 5 afterthought bullets. Fix: diff-based extraction (read newest first, diff earlier for status changes). 3-layer rewrite across 12 files.

L68 -- Eight corrections become eight systemic rules. One editorial review produced: link labels never include status text, surface attribution enforced at assembly, every bullet gets docs links, revenue-positive items sort first, enterprise items in Enterprise & Security not Copilot at Scale, same-domain consolidation mandatory, commercial context on cost-relevant features, section naming reflects content scope. Pre: 44/50. Post: 49/50.

L69 -- Microsoft Reactor as systematic source. 28 GitHub-tagged Reactor events existed but the pipeline never scanned them. Added filtering rules (English, GitHub-integrated, enterprise-relevant). Events section grew from 5 to 13 virtual events.

Evolution Metrics¶

Metric	Value
Learnings	69 (L1-L69)
Hypotheses confirmed	43/44 tested (97.7%)
Score progression	31→50/50 (first run), 44→49/50 (Feb 2026)
Polishing patterns mined	13 from 132 diffs (34% eliminable)
Skills	8 initial → 16 final

Part 3: New Features vs Previous Newsletters¶

Feature	Previous Newsletters	Feb 2026 Generated
Video links with duration	None	Labels include duration (e.g., `Video (60m)`).
MS Learn courses	None	4 courses with episodes, durations, levels
Docs/implementation links	Changelog only	Changelog + VS Code Setup + Docs
Inline feature linking (7+ links)	Sub-bullet lists	Feature names ARE the links in prose
Per-model GA/PREVIEW labels	Prose grouping	Every model individually labeled
Cross-IDE alignment matrix	Sparse, Eclipse missing	All 4 IDEs, every feature labeled
Microsoft Reactor events	0-2 occasional	8+ filtered from 28 candidates
Resources section	None	7 items: GH-AW, Orchestra, prompt engineering, etc.
Commercial context	Missing	"no additional cost, existing terms apply"
Revenue-first ordering	Arbitrary	Revenue-positive items first per section
Same-domain consolidation	Duplicate bullets	Single bullets per topic area

Section structure evolution: Copilot → Scale → Security → Events became Lead → Copilot → IDE Parity → Copilot at Scale (links only) → Enterprise & Security → Resources → Events (with MS Learn) → Migration Notices.

Content density: 29 GA/PREVIEW labels (vs 18 in target), 67 links (vs 49), 34 table rows (vs 21), 20 unique domains (vs 11).

Part 4: Session Metrics¶

Metric	Value
Commits	56 across 4 days
VS Code tool calls	278 (34.5% terminal, 22.7% read_file, 18.3% todo, 6.5% edit)
Strategy	Action-dominant (0.53 action ratio)
Hypotheses requiring 0 rework	40/43 (93%)
Score improvement per rework	+4 points average
Learnings per commit	1.23
Fleet research report	4 parallel sub-agents, 7m 26s wall time (13m 30s API), ~3 PRUs

Appendix: System Inventory¶

16 Skills (plus session-log-manager): url-manifest, content-retrieval, content-consolidation, curator-notes, events-extraction, content-curation, newsletter-assembly, newsletter-polishing, video-matching, newsletter-validation, editorial-review, scope-contract, kb-maintenance, testing-prompt-changes, building-skill, session-log-manager

7 Scoring Tools: score-structural (30pt), score-heuristic (41pt), score-selection (25pt), score-v2-rubric (50pt), score-sync, score-automation, score-sprint

Key Reference Files: editorial-intelligence.md (7 sections), polishing-intelligence.md (13 patterns), curator-notes-intelligence.md, source-intelligence/ (5 per-source files + meta-analysis), fleet_best_practices.md

Key System Files: AGENTS.md (operating protocol), LEARNINGS.md (69 entries), HYPOTHESES.md (56 hypotheses), kb/SOURCES.yaml (72 sources), Makefile (33 targets)

Building a Self-Learning Newsletter System with GitHub Copilot¶