Portfolio Chatbot
OVERVIEW
A portfolio-native chatbot that answers questions about projects, skills, services, and blog content, grounding every response in context retrieved from a local site index via Pinecone-first semantic search in hybrid mode. Responses stream progressively in the UI and include source links back to the relevant pages.
ARCHITECTURE
The feature pairs a client chat widget with a server-side /api/chat route. Retrieval runs in hybrid mode: Pinecone is the primary vector backend, and local retrieval (MiniSearch lexical ranking plus in-memory semantic reranking) takes over automatically on timeout, low confidence, or provider failure. From the retrieved chunks the route builds constrained context and streams model output through OpenRouter with primary and fallback models; rate limiting, origin checks, and low-confidence/no-result fallbacks keep it from giving speculative answers. A separate sync script upserts site-index chunks to Pinecone and can be triggered automatically on push.
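A minimal sketch of that routing decision, assuming hypothetical searchPinecone and searchLocal helpers and placeholder timeout/threshold values (the real module shapes and tuning are not documented here):

```ts
// hybrid-retrieval.ts — illustrative sketch, not the production module.
// searchPinecone/searchLocal are hypothetical helpers that both return
// scored chunks; names, signatures, and the threshold are assumptions.

export interface RetrievedChunk {
  id: string;
  text: string;
  url: string;   // source page, used later for citation links
  score: number; // similarity/confidence in [0, 1]
}

export interface RetrievalResult {
  chunks: RetrievedChunk[];
  backend: "pinecone" | "local";
  fallbackReason?: "timeout" | "low_confidence" | "provider_error";
}

const CONFIDENCE_THRESHOLD = 0.55; // assumed value
const PINECONE_TIMEOUT_MS = 2_000; // assumed value

export async function retrieve(
  query: string,
  searchPinecone: (q: string) => Promise<RetrievedChunk[]>,
  searchLocal: (q: string) => Promise<RetrievedChunk[]>,
): Promise<RetrievalResult> {
  try {
    // Race Pinecone against a timeout so a slow provider never blocks the chat.
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), PINECONE_TIMEOUT_MS),
    );
    const chunks = await Promise.race([searchPinecone(query), timeout]);

    // Fall back when the best match is too weak to ground an answer.
    const topScore = chunks[0]?.score ?? 0;
    if (topScore < CONFIDENCE_THRESHOLD) {
      return {
        chunks: await searchLocal(query),
        backend: "local",
        fallbackReason: "low_confidence",
      };
    }
    return { chunks, backend: "pinecone" };
  } catch (err) {
    // Timeout, auth error, or network failure: route to the in-app stack.
    const reason =
      err instanceof Error && err.message === "timeout"
        ? ("timeout" as const)
        : ("provider_error" as const);
    return { chunks: await searchLocal(query), backend: "local", fallbackReason: reason };
  }
}
```

Returning the backend and fallback reason alongside the chunks is what makes the telemetry listed below possible: each chat log can record which path served the answer and why.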
FUNCTIONALITY
- Streaming chat UI with incremental token rendering
- Input guardrails with a default 500-character message cap (configurable via env)
- Conversation cap with a default 30-message session limit (configurable via env)
- Hybrid retrieval backend routing (Pinecone primary, local fallback)
- Local retrieval stack with MiniSearch lexical scoring and in-memory semantic reranking
- Retrieval grounding across project, blog, and core site sections
- Source citation links attached to responses
- Rate limiting on chat API requests per IP (Upstash Redis with in-memory fallback; sketched after this list)
- Low-confidence and no-results fallbacks instead of speculative answers
- Primary plus fallback model routing through OpenRouter
- Retrieval backend and fallback-reason telemetry in chat logs
- Automated Pinecone index sync support via script and GitHub Actions
- Retryable stream error envelope for resilient client UX
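A minimal sketch of the per-IP limiter, using @upstash/ratelimit when Redis credentials are present and a plain in-memory sliding window otherwise; the 10-requests-per-minute figures are placeholders, not the deployed limits:

```ts
// rate-limit.ts — illustrative sketch; the limits are assumptions.
// @upstash/redis reads UPSTASH_REDIS_REST_URL/TOKEN from env by default.
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const LIMIT = 10;         // assumed: requests per window
const WINDOW_MS = 60_000; // assumed: 60 s window

// Prefer Upstash when configured; otherwise keep a per-IP hit log in memory.
const upstash = process.env.UPSTASH_REDIS_REST_URL
  ? new Ratelimit({
      redis: Redis.fromEnv(),
      limiter: Ratelimit.slidingWindow(LIMIT, "60 s"),
    })
  : null;

const memoryHits = new Map<string, number[]>();

export async function allowRequest(ip: string): Promise<boolean> {
  if (upstash) {
    const { success } = await upstash.limit(ip);
    return success;
  }
  // In-memory sliding window: drop timestamps older than the window.
  const now = Date.now();
  const hits = (memoryHits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (hits.length >= LIMIT) return false;
  hits.push(now);
  memoryHits.set(ip, hits);
  return true;
}
```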
HOW IT WORKS
User prompts are sent to /api/chat with recent conversation history. The route classifies topic scope, attempts retrieval through Pinecone in hybrid mode, and automatically falls back to local retrieval if Pinecone is unavailable, misconfigured, or returns low-confidence results. Relevant chunks are injected into a constrained system prompt and sent to OpenRouter for streaming completion. Tokens are relayed to the UI in real time, then citation URLs and completion metadata are appended so visitors can verify answers. Separately, site-index chunks are vectorized and upserted to Pinecone through a sync command that can run on push.
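The streaming leg might look like the sketch below, calling OpenRouter's OpenAI-compatible chat completions endpoint. The model IDs are placeholders, and the models array leans on OpenRouter's fallback routing, which may differ from how this project actually sequences its primary and fallback models:

```ts
// stream-completion.ts — illustrative sketch of the OpenRouter streaming call.
// Model IDs are placeholders; error handling is reduced to the essentials.

export async function* streamCompletion(
  systemPrompt: string, // constrained context built from retrieved chunks
  history: { role: "user" | "assistant"; content: string }[],
): AsyncGenerator<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      models: ["primary/model-id", "fallback/model-id"], // placeholders
      messages: [{ role: "system", content: systemPrompt }, ...history],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`OpenRouter error: ${res.status}`);

  // Parse the SSE stream: each "data:" line carries a JSON chunk with a delta.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
      const token = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      if (token) yield token; // relay to the client as it arrives
    }
  }
}
```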
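And the sync leg can be sketched as a batched upsert with the Pinecone client; the index name, embed helper, and metadata shape are all assumptions:

```ts
// sync-pinecone.ts — illustrative sketch of the upsert step.
// Index name, the embed() helper, and metadata fields are assumptions.
import { Pinecone } from "@pinecone-database/pinecone";

interface SiteChunk {
  id: string;
  text: string;
  url: string;
}

// Hypothetical embedding helper; the real pipeline's vectorizer isn't specified.
declare function embed(texts: string[]): Promise<number[][]>;

export async function syncChunks(chunks: SiteChunk[]): Promise<void> {
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const index = pc.index("portfolio-site-index"); // placeholder index name

  const vectors = await embed(chunks.map((c) => c.text));
  // Upsert in small batches to stay under Pinecone's request-size limits.
  const BATCH = 100;
  for (let i = 0; i < chunks.length; i += BATCH) {
    await index.upsert(
      chunks.slice(i, i + BATCH).map((c, j) => ({
        id: c.id,
        values: vectors[i + j],
        metadata: { url: c.url, text: c.text }, // used for citation links at query time
      })),
    );
  }
}
```

Running this from a GitHub Actions workflow on push is what keeps the Pinecone index aligned with the deployed site content.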
OUTCOMES
- Gives visitors immediate, grounded answers without leaving the page.
- Reduces hallucination risk through retrieval-first context, guardrails, and fallback policy.
- Improves production resilience with dual retrieval backends, model fallback, rate limiting, and retry-aware client behavior.
- Keeps infrastructure lean while supporting growth through Pinecone-backed retrieval with automatic fallback to in-app retrieval.