Vector search makes semantic search and AI-powered features possible in your SaaS — letting users find results by meaning rather than exact keywords, powering retrieval-augmented generation (RAG), and enabling "more like this" recommendations. Here's how to set up the full pipeline: ingest text, generate embeddings with OpenAI, store them in Pinecone, and query them from a Next.js API route — using the same pattern deployed in GetLaunchpad.
What vector search is and when you need it
Traditional full-text search matches documents that contain the query's exact words (or stemmed variants). Vector search converts both the query and your documents into high-dimensional numeric vectors — embeddings — and finds documents whose vectors are closest to the query vector by cosine similarity. A query for "how do I cancel" can match a document titled "subscription termination" even though they share no words.
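The "closeness" measure can be made concrete with a few lines of code. This is a hypothetical helper, not part of the pipeline below (Pinecone computes similarity server-side), but it shows exactly what cosine similarity does with two embedding vectors:

```typescript
// Cosine similarity: dot product of two vectors divided by the
// product of their magnitudes. Returns 1 for identical directions,
// 0 for orthogonal (unrelated) vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two documents about the same topic end up with vectors pointing in similar directions, so their cosine similarity is high even when they share no vocabulary.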
Use vector search when you need:
- Semantic search. Let users search a knowledge base, docs, or product catalog by intent rather than exact phrase.
- RAG (Retrieval-Augmented Generation). Fetch the most relevant chunks of your content before sending them to an LLM, so the model answers from your data rather than hallucinating.
- Recommendations. Given a piece of content the user is viewing, find the most semantically similar items to surface as "related" suggestions.
Install the packages
npm install @pinecone-database/pinecone openai
Add the required environment variables:
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_index_name
OPENAI_API_KEY=your_openai_api_key
Create the Pinecone client and index
// lib/pinecone.ts
import "server-only";
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const INDEX_NAME = process.env.PINECONE_INDEX!;
// text-embedding-3-small produces 1536-dimensional vectors
const EMBEDDING_MODEL = "text-embedding-3-small";
Create the index in the Pinecone console before running any code. Set the dimension to 1536 (to match text-embedding-3-small) and the metric to cosine. The index name must match PINECONE_INDEX exactly.
The ingest pipeline
Ingestion has three steps: chunk your text, generate an embedding for each chunk, then upsert the vectors into Pinecone with metadata.
// lib/pinecone.ts (continued)
export async function ingest(
  id: string,
  text: string,
  metadata: Record<string, string>,
) {
  // 1. Generate embedding
  const embeddingResponse = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
  });
  const vector = embeddingResponse.data[0].embedding;

  // 2. Upsert into Pinecone
  const index = pinecone.index(INDEX_NAME);
  await index.upsert([
    {
      id,
      values: vector,
      metadata: { text, ...metadata },
    },
  ]);
}
Store the original text in the metadata so you can return it alongside search results without a second database lookup. Pinecone metadata values must be strings, numbers, booleans, or arrays of strings.
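The chunking step deserves its own helper: embedding a whole document as one vector dilutes its meaning, so long text should be split into overlapping windows before calling ingest(). This is a minimal sketch (the function name chunkText and the word-based window sizes are illustrative choices, not part of any library):

```typescript
// Split text into overlapping word-window chunks. Overlap keeps
// sentences that straddle a boundary retrievable from both chunks.
function chunkText(text: string, chunkSize = 200, overlap = 40): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a chunk reaches the end of the text
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Each chunk then gets its own id (for example `${docId}-${i}`) and its own ingest() call, with the parent document id stored in metadata so results can link back to the source.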
The query pipeline
Querying mirrors ingestion: embed the user's query, then ask Pinecone for the top-k nearest vectors.
// lib/pinecone.ts (continued)
export async function query(text: string, topK = 5) {
  // 1. Embed the query
  const embeddingResponse = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
  });
  const vector = embeddingResponse.data[0].embedding;

  // 2. Query Pinecone
  const index = pinecone.index(INDEX_NAME);
  const results = await index.query({
    vector,
    topK,
    includeMetadata: true,
  });
  return results.matches;
}
A complete Next.js API route example
// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { query } from "@/lib/pinecone";
export async function POST(request: NextRequest) {
  const { userId } = await auth();
  if (!userId) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const { q } = await request.json();
  if (!q || typeof q !== "string") {
    return NextResponse.json({ error: "Missing query" }, { status: 400 });
  }

  const matches = await query(q, 5);
  const results = matches.map((m) => ({
    id: m.id,
    score: m.score,
    text: m.metadata?.text,
    // spread any other metadata fields you stored at ingest time
  }));
  return NextResponse.json({ results });
}
Gate the route behind Clerk auth so arbitrary callers cannot burn through your OpenAI and Pinecone quotas. Each search query costs one OpenAI embedding call — small, but worth protecting.
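For the RAG use case, the matches returned by query() are typically assembled into a context block that gets prepended to the LLM prompt. A minimal sketch, assuming the match shape used throughout this article (buildContext and its character budget are illustrative, not from any SDK):

```typescript
// Subset of the Pinecone match shape used in this pipeline
interface Match {
  id: string;
  score?: number;
  metadata?: { text?: string };
}

// Assemble retrieved chunks into a numbered context block for an
// LLM prompt, stopping before a rough character budget is exceeded.
function buildContext(matches: Match[], maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const [i, m] of matches.entries()) {
    const text = m.metadata?.text;
    if (!text) continue;
    const entry = `[${i + 1}] ${text}`;
    if (used + entry.length > maxChars) break;
    parts.push(entry);
    used += entry.length;
  }
  return parts.join("\n\n");
}
```

Storing the original text in metadata at ingest time is what makes this possible without a second database round-trip.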
Metadata filtering
Pinecone supports filtering results by metadata at query time, which lets you scope a search to a specific user's data or a specific content type without fetching and discarding irrelevant results:
const results = await index.query({
  vector,
  topK: 5,
  includeMetadata: true,
  filter: {
    userId: { $eq: userId }, // only this user's documents
    contentType: { $eq: "note" }, // only notes, not other content types
  },
});
By default, Pinecone indexes every metadata field you store, so any field can be used in a filter with no extra configuration. Pod-based indexes additionally support selective metadata indexing — excluding high-cardinality fields from the filter index to save memory — but that is set at index creation and cannot be changed afterward, so fields excluded there only become filterable by re-ingesting into a new index.
Vector search vs full-text search
The two approaches solve different problems:
- Full-text search (Postgres tsvector, Elasticsearch, Typesense) is fast, exact, and cheap. Use it for searching by product name, SKU, username, or any field where the user expects literal matching. Supabase has built-in full-text search with to_tsvector and to_tsquery that requires zero additional infrastructure.
- Vector search shines for natural language queries, long-form content, and any scenario where two different strings can mean the same thing. The tradeoff is cost (every ingest and query call hits OpenAI) and infrastructure (Pinecone is a separate managed service).
Many production SaaS products use both: full-text search for structured fields and vector search for body content. Run full-text first as a fast pre-filter, then re-rank with semantic similarity when result quality matters more than latency.
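The re-ranking half of that hybrid pattern can be sketched as a pure function. This assumes the full-text candidates have already been embedded (either at ingest time or on the fly); the Candidate shape and rerank name are illustrative:

```typescript
interface Candidate {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Re-rank full-text candidates by semantic similarity to the query
// embedding, most similar first. The fast full-text pass has already
// shrunk the candidate set, so this loop stays cheap.
function rerank(candidates: Candidate[], queryEmbedding: number[]): Candidate[] {
  return [...candidates].sort(
    (a, b) =>
      cosine(b.embedding, queryEmbedding) - cosine(a.embedding, queryEmbedding),
  );
}
```

Full-text narrows thousands of documents to a few dozen candidates cheaply; the semantic pass then orders that short list by meaning.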
The complete Pinecone pipeline — client singleton, ingest(), query(), and a protected Next.js API route — is pre-wired in GetLaunchpad, a Next.js 16 SaaS boilerplate. Add semantic search to your product without building the plumbing from scratch.