Insight Extractor Agent
Streaming analysis engine for pasted text and URLs, delivering thesis-led insights with direct quoted evidence.
OVERVIEW
A stateless, serverless insight extractor for fast analysis of long-form text and URL-backed sources. It returns a structured brief with a title, thesis, five named insights, grounded evidence quotes, and source attribution that can be reviewed directly from the output card.
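The structured brief described above can be modelled roughly as follows. This is a sketch only; the field names and the guard function are illustrative assumptions, not the project's actual schema:

```typescript
// Illustrative shape of the extracted brief; field names are assumptions.
interface Insight {
  name: string;     // short insight label
  summary: string;  // brief explanation of the insight
  evidence: string; // direct quote from the source text
}

interface InsightBrief {
  title: string;
  thesis: string;
  insights: Insight[]; // expected to contain exactly five entries
  closing: string;
  sources: string[];   // attribution for pasted text or fetched URLs
}

// Minimal structural check of the kind normalisation might apply.
function isCompleteBrief(b: InsightBrief): boolean {
  return (
    b.title.length > 0 &&
    b.thesis.length > 0 &&
    b.insights.length === 5 &&
    b.insights.every((i) => i.evidence.trim().length > 0)
  );
}
```

A typed schema like this is what lets the React cards render each section without defensive checks scattered through the UI.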
ARCHITECTURE
The pipeline runs through a Next.js API route that validates payload size, enforces per-IP throttling, performs SSRF-safe URL fetching, applies a server-only extraction prompt, and streams model output via SSE. Final output is normalised into a strict schema, including boundary-safe evidence span handling, then rendered in typed React cards with copy/export support and source attribution controls.
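The status/token/done streaming protocol mentioned above can be sketched as a small client-side consumer. The event payload shapes here are assumptions for illustration, not the project's actual wire format:

```typescript
// Assumed event shapes for the status/token/done SSE protocol.
type StreamEvent =
  | { type: "status"; message: string }
  | { type: "token"; text: string }
  | { type: "done"; brief: unknown };

// Accumulates streamed tokens from raw SSE lines and reports completion.
function consumeSse(lines: string[]): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blank keep-alives and comments
    const event = JSON.parse(line.slice(6)) as StreamEvent;
    if (event.type === "token") text += event.text;
    if (event.type === "done") done = true;
  }
  return { text, done };
}
```

Separating status events from token events is what lets the UI show progress ("fetching source", "extracting") before any model output arrives.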
FUNCTIONALITY
- Dual input paths: paste text directly or provide a source URL
- Client-side validation with min/max length guardrails
- Server-side structured extraction with title, thesis, five insights with supporting evidence quotes, and closing
- Streaming status/token/done SSE protocol for responsive UX
- Structured output cards for scannable reading plus one-click markdown copy
- Source attribution summary with inline source-links popover and per-source outbound links
- Recent extraction history with quick recall and delete actions
- Per-IP throttling at 10 requests/minute with retry metadata
- Model fallback chain for improved availability under provider failures
- SSRF-safe URL fetching with allowlist validation and post-fetch content truncation
- Evidence-span normalisation with fallback quote injection and boundary-safe clipping to avoid abrupt cut-off quotes
- Automatic title synthesis from content when model omits or truncates the title field
- Hydration-safe form controls resilient to attribute mismatches injected by browsers and extensions
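The per-IP throttling listed above (10 requests/minute with retry metadata) can be sketched as a sliding-window counter. The in-memory Map is an assumption for illustration; a multi-instance serverless deployment would need shared storage instead:

```typescript
// Sketch of per-IP throttling: 10 requests per rolling 60-second window.
// The in-memory Map is an illustrative assumption, not production-ready state.
const WINDOW_MS = 60_000;
const LIMIT = 10;
const hits = new Map<string, number[]>();

function checkRateLimit(
  ip: string,
  now: number = Date.now()
): { allowed: boolean; retryAfterMs: number } {
  // Keep only timestamps still inside the rolling window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    // Retry metadata: time until the oldest hit leaves the window.
    return { allowed: false, retryAfterMs: WINDOW_MS - (now - recent[0]) };
  }
  recent.push(now);
  hits.set(ip, recent);
  return { allowed: true, retryAfterMs: 0 };
}
```

Returning `retryAfterMs` alongside the denial is what lets the route surface a Retry-After hint to the client instead of a bare 429.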
HOW IT WORKS
After submission, the API validates the text or URL input, fetches and sanitises remote content when needed, then issues extraction prompts to OpenRouter with SSE streaming for progressive rendering. On completion, normalisation validates structure and evidence grounding, resolves evidence-chain metadata, and trims overlong quote spans at sentence or word boundaries so rendered evidence does not cut off mid-token. The client then renders the finalised brief with source attribution and copy-ready markdown.
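The boundary-safe trimming step described above can be sketched as follows. This is a minimal illustration of the sentence-then-word fallback, not the project's actual clipping logic:

```typescript
// Sketch: trim an overlong quote at a sentence boundary, falling back to a
// word boundary, so rendered evidence never ends mid-token.
function clipQuote(quote: string, maxLen: number): string {
  if (quote.length <= maxLen) return quote;
  const slice = quote.slice(0, maxLen);
  // Prefer the last sentence-ending punctuation inside the length budget.
  const sentenceEnd = Math.max(
    slice.lastIndexOf(". "),
    slice.lastIndexOf("! "),
    slice.lastIndexOf("? ")
  );
  if (sentenceEnd > 0) return slice.slice(0, sentenceEnd + 1);
  // Otherwise fall back to the last full word and mark the truncation.
  const wordEnd = slice.lastIndexOf(" ");
  return (wordEnd > 0 ? slice.slice(0, wordEnd) : slice) + "…";
}
```

Preferring sentence boundaries keeps clipped evidence reading as a complete thought; the word-boundary fallback covers quotes with no internal punctuation.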
OUTCOMES
- Converts unstructured long-form text into a concise, thesis-driven insight brief
- Improves perceived responsiveness with progressive token streaming
- Improves trust/readability with cleaner evidence quotes and direct source-link visibility
- Keeps spend predictable with request caps, input limits, and model fallback routing
- Maintains strict grounding by requiring direct textual evidence for all five extracted insights