
Adding vector search to Next.js with Pinecone and OpenAI

Vector search makes semantic search and AI-powered features possible in your SaaS — letting users find results by meaning rather than exact keywords, powering retrieval-augmented generation (RAG), and enabling "more like this" recommendations. Here's how to set up the full pipeline: ingest text, generate embeddings with OpenAI, store them in Pinecone, and query them from a Next.js API route — using the same pattern deployed in GetLaunchpad.

What vector search is and when you need it

Traditional full-text search matches documents that contain the query's exact words (or stemmed variants). Vector search converts both the query and your documents into high-dimensional numeric vectors — embeddings — and finds documents whose vectors are closest to the query vector by cosine similarity. A query for "how do I cancel" can match a document titled "subscription termination" even though they share no words.
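To make "closest" concrete: cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch (the 3-dimensional vectors here are toy examples, not real 1536-dimensional embeddings):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1;
// higher means the vectors point in more similar directions.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // ≈ 1 (same direction)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0 (orthogonal)
```

Pinecone computes this comparison across millions of vectors with an approximate-nearest-neighbor index, which is what makes it fast at scale.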

Use vector search when you need:

- Search that matches by meaning rather than exact keywords — "how do I cancel" finding "subscription termination"
- Retrieval-augmented generation (RAG), where semantically relevant documents ground an LLM's answers
- "More like this" recommendations driven by content similarity

Install the packages

npm install @pinecone-database/pinecone openai

Add the required environment variables:

PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_index_name
OPENAI_API_KEY=your_openai_api_key

Create the Pinecone client and index

// lib/pinecone.ts
import "server-only";
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

const INDEX_NAME = process.env.PINECONE_INDEX!;
// text-embedding-3-small produces 1536-dimensional vectors
const EMBEDDING_MODEL = "text-embedding-3-small";

Create the index in the Pinecone console before running any code. Set the dimension to 1536 (to match text-embedding-3-small) and the metric to cosine. The index name must match PINECONE_INDEX exactly.

The ingest pipeline

Ingestion has three steps: chunk your text, generate an embedding for each chunk, then upsert the vectors into Pinecone with metadata.
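The ingest() helper below embeds one chunk at a time, so splitting long documents happens before you call it. A minimal sketch of the chunking step (chunkText is a hypothetical helper, not part of any SDK — paragraphs longer than the cap pass through whole):

```typescript
// Split text on blank lines, then pack paragraphs into chunks of at
// most maxChars characters so each chunk fits one embedding call.
function chunkText(text: string, maxChars = 2000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    // +2 accounts for the blank line re-inserted between paragraphs
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

At ingest time, loop over chunkText(document) and give each chunk its own id (for example `${docId}-${i}`) so results can be traced back to their source document.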

// lib/pinecone.ts (continued)
export async function ingest(
  id: string,
  text: string,
  metadata: Record<string, string>,
) {
  // 1. Generate embedding
  const embeddingResponse = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
  });
  const vector = embeddingResponse.data[0].embedding;

  // 2. Upsert into Pinecone
  const index = pinecone.index(INDEX_NAME);
  await index.upsert([
    {
      id,
      values: vector,
      metadata: { text, ...metadata },
    },
  ]);
}

Store the original text in the metadata so you can return it alongside search results without a second database lookup. Pinecone metadata values must be strings, numbers, booleans, or arrays of strings.
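Because Pinecone rejects values outside those types, it can be worth sanitizing metadata before upserting rather than letting one bad field fail the call. A minimal sketch (toMetadata is a hypothetical helper):

```typescript
type MetadataValue = string | number | boolean | string[];

// Keep only values Pinecone accepts; silently drop nulls, nested
// objects, and mixed-type arrays instead of failing the upsert.
function toMetadata(input: Record<string, unknown>): Record<string, MetadataValue> {
  const out: Record<string, MetadataValue> = {};
  for (const [key, value] of Object.entries(input)) {
    if (
      typeof value === "string" ||
      typeof value === "number" ||
      typeof value === "boolean"
    ) {
      out[key] = value;
    } else if (Array.isArray(value) && value.every((v) => typeof v === "string")) {
      out[key] = value as string[];
    }
  }
  return out;
}
```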

The query pipeline

Querying mirrors ingestion: embed the user's query, then ask Pinecone for the top-k nearest vectors.

// lib/pinecone.ts (continued)
export async function query(text: string, topK = 5) {
  // 1. Embed the query
  const embeddingResponse = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: text,
  });
  const vector = embeddingResponse.data[0].embedding;

  // 2. Query Pinecone
  const index = pinecone.index(INDEX_NAME);
  const results = await index.query({
    vector,
    topK,
    includeMetadata: true,
  });

  return results.matches;
}

A complete Next.js API route example

// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { query } from "@/lib/pinecone";

export async function POST(request: NextRequest) {
  const { userId } = await auth();
  if (!userId) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const { q } = await request.json();
  if (!q || typeof q !== "string") {
    return NextResponse.json({ error: "Missing query" }, { status: 400 });
  }

  const matches = await query(q, 5);

  const results = matches.map((m) => ({
    id: m.id,
    score: m.score,
    text: m.metadata?.text,
    // spread any other metadata fields you stored at ingest time
  }));

  return NextResponse.json({ results });
}

Gate the route behind Clerk auth so arbitrary callers cannot burn through your OpenAI and Pinecone quotas. Each search query costs one OpenAI embedding call — small, but worth protecting.
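Beyond auth, a per-user rate limit caps what a compromised session can cost you. A minimal in-memory sketch (the rateLimit helper is hypothetical; this works for a single server process, but multi-instance deployments need shared state such as Redis):

```typescript
// Sliding-window rate limiter: allow `limit` calls per `windowMs` per key.
const hits = new Map<string, number[]>();

function rateLimit(key: string, limit = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  // Keep only timestamps still inside the window
  const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    hits.set(key, recent);
    return false; // over the limit
  }
  recent.push(now);
  hits.set(key, recent);
  return true; // allowed
}
```

In the route, call rateLimit(userId) after the auth check and return a 429 response when it comes back false.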

Metadata filtering

Pinecone supports filtering results by metadata at query time, which lets you scope a search to a specific user's data or a specific content type without fetching and discarding irrelevant results:

const results = await index.query({
  vector,
  topK: 5,
  includeMetadata: true,
  filter: {
    userId: { $eq: userId },       // only this user's documents
    contentType: { $eq: "note" },  // only notes, not other content types
  },
});

Filters use a MongoDB-style operator syntax ($eq, $ne, $in, $gt, and so on). On serverless indexes, Pinecone indexes all metadata fields by default, so any field you store at ingest time can be used in a filter; pod-based indexes additionally support selective metadata indexing if you want to limit which fields are indexed.
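When a query combines several optional conditions, a small builder keeps the filter object tidy — a sketch (buildFilter is a hypothetical helper; $eq and $in are Pinecone's filter operators):

```typescript
type FilterValue = { $eq: string } | { $in: string[] };

// Build a Pinecone metadata filter, skipping conditions that are unset
// so an absent condition means "don't filter on that field".
function buildFilter(conditions: {
  userId?: string;
  contentTypes?: string[];
}): Record<string, FilterValue> {
  const filter: Record<string, FilterValue> = {};
  if (conditions.userId) {
    filter.userId = { $eq: conditions.userId };
  }
  if (conditions.contentTypes?.length) {
    filter.contentType = { $in: conditions.contentTypes };
  }
  return filter;
}
```

Pass the result as the filter property of index.query(); an empty object applies no filtering.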

Vector search vs full-text search

The two approaches solve different problems: full-text search is fast and cheap and excels at exact matches — identifiers, names, known keywords — while vector search surfaces semantically related content regardless of wording, at the cost of an embedding call per query.

Many production SaaS products use both: full-text search for structured fields and vector search for body content. Run full-text first as a fast pre-filter, then re-rank with semantic similarity when result quality matters more than latency.
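The re-ranking step in that hybrid pattern can be as simple as sorting the full-text candidates by their vector-search scores — a sketch (both inputs are assumed to come from your own full-text and Pinecone queries; rerank is a hypothetical helper):

```typescript
// Re-rank full-text candidate ids by vector-search similarity score.
// Candidates absent from the vector results fall to the end.
function rerank(
  fullTextIds: string[],
  vectorMatches: { id: string; score: number }[],
): string[] {
  const scores = new Map(
    vectorMatches.map((m) => [m.id, m.score] as const),
  );
  return [...fullTextIds].sort((a, b) => {
    const sa = scores.get(a) ?? -Infinity;
    const sb = scores.get(b) ?? -Infinity;
    return sa === sb ? 0 : sb - sa;
  });
}
```

Because the vector query only needs to score the candidate set, you can also pass the full-text ids as a metadata filter to keep the Pinecone query small.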


The complete Pinecone pipeline — client singleton, ingest(), query(), and a protected Next.js API route — is pre-wired in GetLaunchpad, a Next.js 16 SaaS boilerplate. Add semantic search to your product without building the plumbing from scratch.
