Portfolio Chatbot
OVERVIEW
A portfolio-native chatbot that answers questions about projects, skills, services, and blog content, grounding every response in context retrieved from a local site index via Pinecone-first semantic search in hybrid mode. Responses stream progressively in the UI and include source links back to the relevant pages.
ARCHITECTURE
The feature pairs a client chat widget with a server-side /api/chat route. Retrieval runs in hybrid mode: Pinecone is the primary vector backend, and local retrieval (MiniSearch lexical ranking plus in-memory semantic reranking) takes over automatically on timeout, low confidence, or provider failure. From the retrieved chunks the route builds constrained context and streams model output through OpenRouter with primary and fallback models; rate limiting, origin checks, and low-confidence/no-result fallbacks keep it from giving speculative answers. A separate sync script upserts site-index chunks to Pinecone and can be triggered automatically on push.
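A minimal sketch of that routing decision, assuming hypothetical searchPinecone and searchLocal helpers and placeholder timeout/threshold values (the real module shapes and tuning are not documented here):

```ts
// hybrid-retrieval.ts — illustrative sketch, not the production module.
// searchPinecone/searchLocal are hypothetical helpers that both return
// scored chunks; names, signatures, and the threshold are assumptions.

export interface RetrievedChunk {
  id: string;
  text: string;
  url: string;   // source page, used later for citation links
  score: number; // similarity/confidence in [0, 1]
}

export interface RetrievalResult {
  chunks: RetrievedChunk[];
  backend: "pinecone" | "local";
  fallbackReason?: "timeout" | "low_confidence" | "provider_error";
}

const CONFIDENCE_THRESHOLD = 0.55; // assumed value
const PINECONE_TIMEOUT_MS = 2_000; // assumed value

export async function retrieve(
  query: string,
  searchPinecone: (q: string) => Promise<RetrievedChunk[]>,
  searchLocal: (q: string) => Promise<RetrievedChunk[]>,
): Promise<RetrievalResult> {
  try {
    // Race Pinecone against a timeout so a slow provider never blocks the chat.
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), PINECONE_TIMEOUT_MS),
    );
    const chunks = await Promise.race([searchPinecone(query), timeout]);

    // Fall back when the best match is too weak to ground an answer.
    const topScore = chunks[0]?.score ?? 0;
    if (topScore < CONFIDENCE_THRESHOLD) {
      return {
        chunks: await searchLocal(query),
        backend: "local",
        fallbackReason: "low_confidence",
      };
    }
    return { chunks, backend: "pinecone" };
  } catch (err) {
    // Timeout, auth error, or network failure: route to the in-app stack.
    const reason =
      err instanceof Error && err.message === "timeout"
        ? ("timeout" as const)
        : ("provider_error" as const);
    return { chunks: await searchLocal(query), backend: "local", fallbackReason: reason };
  }
}
```

Returning the backend and fallback reason alongside the chunks is what makes the telemetry listed below possible: each chat log can record which path served the answer and why.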
FUNCTIONALITY
- Streaming chat UI with incremental token rendering
- Input guardrails with a default 500-character message cap (configurable via env)
- Conversation cap with a default 30-message session limit (configurable via env)
- Hybrid retrieval backend routing (Pinecone primary, local fallback)
- Local retrieval stack with MiniSearch lexical scoring and in-memory semantic reranking
- Retrieval grounding across project, blog, and core site sections
- Source citation links attached to responses
- Rate limiting on chat API requests per IP (Upstash Redis with in-memory fallback; sketched after this list)
- Low-confidence and no-results fallbacks instead of speculative answers
- Primary plus fallback model routing through OpenRouter
- Retrieval backend and fallback-reason telemetry in chat logs
- Automated Pinecone index sync support via script and GitHub Actions
- Retryable stream error envelope for resilient client UX
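A minimal sketch of the per-IP limiter, using @upstash/ratelimit when Redis credentials are present and a plain in-memory sliding window otherwise; the 10-requests-per-minute figures are placeholders, not the deployed limits:

```ts
// rate-limit.ts — illustrative sketch; the limits are assumptions.
// @upstash/redis reads UPSTASH_REDIS_REST_URL/TOKEN from env by default.
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const LIMIT = 10;         // assumed: requests per window
const WINDOW_MS = 60_000; // assumed: 60 s window

// Prefer Upstash when configured; otherwise keep a per-IP hit log in memory.
const upstash = process.env.UPSTASH_REDIS_REST_URL
  ? new Ratelimit({
      redis: Redis.fromEnv(),
      limiter: Ratelimit.slidingWindow(LIMIT, "60 s"),
    })
  : null;

const memoryHits = new Map<string, number[]>();

export async function allowRequest(ip: string): Promise<boolean> {
  if (upstash) {
    const { success } = await upstash.limit(ip);
    return success;
  }
  // In-memory sliding window: drop timestamps older than the window.
  const now = Date.now();
  const hits = (memoryHits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (hits.length >= LIMIT) return false;
  hits.push(now);
  memoryHits.set(ip, hits);
  return true;
}
```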
HOW IT WORKS
User prompts are sent to /api/chat with recent conversation history. The route classifies topic scope, attempts retrieval through Pinecone in hybrid mode, and automatically falls back to local retrieval if Pinecone is unavailable, misconfigured, or returns low-confidence results. Relevant chunks are injected into a constrained system prompt and sent to OpenRouter for streaming completion. Tokens are relayed to the UI in real time, then citation URLs and completion metadata are appended so visitors can verify answers. Separately, site-index chunks are vectorized and upserted to Pinecone through a sync command that can run on push.
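The streaming leg might look like the sketch below, calling OpenRouter's OpenAI-compatible chat completions endpoint. The model IDs are placeholders, and the models array leans on OpenRouter's fallback routing, which may differ from how this project actually sequences its primary and fallback models:

```ts
// stream-completion.ts — illustrative sketch of the OpenRouter streaming call.
// Model IDs are placeholders; error handling is reduced to the essentials.

export async function* streamCompletion(
  systemPrompt: string, // constrained context built from retrieved chunks
  history: { role: "user" | "assistant"; content: string }[],
): AsyncGenerator<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      models: ["primary/model-id", "fallback/model-id"], // placeholders
      messages: [{ role: "system", content: systemPrompt }, ...history],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`OpenRouter error: ${res.status}`);

  // Parse the SSE stream: each "data:" line carries a JSON chunk with a delta.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
      const token = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      if (token) yield token; // relay to the client as it arrives
    }
  }
}
```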
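And the sync leg can be sketched as a batched upsert with the Pinecone client; the index name, embed helper, and metadata shape are all assumptions:

```ts
// sync-pinecone.ts — illustrative sketch of the upsert step.
// Index name, the embed() helper, and metadata fields are assumptions.
import { Pinecone } from "@pinecone-database/pinecone";

interface SiteChunk {
  id: string;
  text: string;
  url: string;
}

// Hypothetical embedding helper; the real pipeline's vectorizer isn't specified.
declare function embed(texts: string[]): Promise<number[][]>;

export async function syncChunks(chunks: SiteChunk[]): Promise<void> {
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const index = pc.index("portfolio-site-index"); // placeholder index name

  const vectors = await embed(chunks.map((c) => c.text));
  // Upsert in small batches to stay under Pinecone's request-size limits.
  const BATCH = 100;
  for (let i = 0; i < chunks.length; i += BATCH) {
    await index.upsert(
      chunks.slice(i, i + BATCH).map((c, j) => ({
        id: c.id,
        values: vectors[i + j],
        metadata: { url: c.url, text: c.text }, // used for citation links at query time
      })),
    );
  }
}
```

Running this from a GitHub Actions workflow on push is what keeps the Pinecone index aligned with the deployed site content.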
OUTCOMES
- Gives visitors immediate, grounded answers without leaving the page.
- Reduces hallucination risk through retrieval-first context, guardrails, and fallback policy.
- Improves production resilience with dual retrieval backends, model fallback, rate limiting, and retry-aware client behavior.
- Keeps infrastructure lean while supporting growth through Pinecone-backed retrieval with automatic fallback to in-app retrieval.