Adding AI features to a Next.js SaaS doesn't require a separate backend or a complex architecture. The OpenAI API is a REST endpoint — you can call it from a Next.js API route, stream the response to the client, and have a working AI feature in under an hour.
This guide covers the full setup: installing the SDK, creating a streaming API route, building a chat UI that updates token-by-token, and the patterns you need to avoid exposing your API key or running up a surprise bill.
Install the OpenAI SDK
npm install openai
Set your API key in .env.local. Never prefix it with NEXT_PUBLIC_ — it must stay server-side only.
OPENAI_API_KEY=sk-...
Create a singleton client
Avoid re-initializing the client on every request. Create it once in a shared file:
// lib/openai.ts
import OpenAI from "openai";

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
Non-streaming API route
For short responses (classification, summaries under ~200 words) a regular await call is fine:
// app/api/ai/summarize/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { openai } from "@/lib/openai";

export async function POST(req: NextRequest) {
  const { userId } = await auth();
  if (!userId) return NextResponse.json({ error: "Unauthorized" }, { status: 401 });

  const { text } = await req.json();
  if (!text || typeof text !== "string") {
    return NextResponse.json({ error: "text required" }, { status: 400 });
  }

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize the following text in 2–3 sentences." },
      { role: "user", content: text },
    ],
    max_tokens: 200,
  });

  const summary = completion.choices[0].message.content ?? "";
  return NextResponse.json({ summary });
}
Streaming API route
For chat interfaces, streaming makes the experience feel instant. The OpenAI SDK supports streaming via stream: true, and Next.js can return a ReadableStream directly:
// app/api/ai/chat/route.ts
import { NextRequest } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { openai } from "@/lib/openai";

export async function POST(req: NextRequest) {
  const { userId } = await auth();
  if (!userId) return new Response("Unauthorized", { status: 401 });

  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    stream: true,
  });

  // Convert the OpenAI stream to a ReadableStream
  const readableStream = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content ?? "";
        if (delta) {
          controller.enqueue(new TextEncoder().encode(delta));
        }
      }
      controller.close();
    },
  });

  return new Response(readableStream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
Streaming chat UI
On the client, read the stream with a ReadableStreamDefaultReader:
"use client";
import { useState, useRef } from "react";
export default function ChatPage() {
const [messages, setMessages] = useState<{ role: string; content: string }[]>([]);
const [input, setInput] = useState("");
const [streaming, setStreaming] = useState(false);
async function sendMessage() {
if (!input.trim() || streaming) return;
const userMessage = { role: "user", content: input };
const history = [...messages, userMessage];
setMessages([...history, { role: "assistant", content: "" }]);
setInput("");
setStreaming(true);
const res = await fetch("/api/ai/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: history }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let assistantText = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
assistantText += decoder.decode(value, { stream: true });
setMessages((prev) => {
const updated = [...prev];
updated[updated.length - 1] = { role: "assistant", content: assistantText };
return updated;
});
}
setStreaming(false);
}
return (
<div className="flex flex-col gap-4 p-4">
{messages.map((m, i) => (
<div key={i} className={m.role === "user" ? "text-right" : "text-left"}>
<span className="inline-block rounded-lg bg-zinc-100 px-3 py-2 text-sm dark:bg-zinc-800">
{m.content}
</span>
</div>
))}
<div className="flex gap-2">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === "Enter" && sendMessage()}
className="flex-1 rounded border px-3 py-2 text-sm"
placeholder="Ask something…"
/>
<button
onClick={sendMessage}
disabled={streaming}
className="rounded bg-zinc-900 px-4 py-2 text-sm text-white disabled:opacity-50"
>
Send
</button>
</div>
</div>
);
}
Rate limiting AI endpoints
OpenAI charges per token, so rate limiting AI routes is critical. Use Upstash with a tighter limit than your other endpoints:
import { checkRateLimit } from "@/lib/upstash";
const { success } = await checkRateLimit(`ai:${userId}`, 5, "1 m");
if (!success) return NextResponse.json({ error: "Too many requests" }, { status: 429 });
5 requests per minute per user is generous for a chat feature and cheap enough to protect against accidental loops or abuse.
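The checkRateLimit helper above wraps Upstash's sliding-window limiter. To see what that algorithm actually does, here is a self-contained in-memory sketch of the same idea (illustrative only: serverless instances don't share memory, so an in-process Map resets on cold starts and can't protect you in production the way Upstash does):

```typescript
// Illustrative in-memory sliding-window limiter. Keep using Upstash in
// production; this version exists only to show the algorithm.
const hits = new Map<string, number[]>();

export function checkRateLimitInMemory(
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now()
): { success: boolean; remaining: number } {
  const windowStart = now - windowMs;
  // Keep only the timestamps that still fall inside the window.
  const recent = (hits.get(key) ?? []).filter((t) => t > windowStart);
  if (recent.length >= limit) {
    hits.set(key, recent);
    return { success: false, remaining: 0 };
  }
  recent.push(now);
  hits.set(key, recent);
  return { success: true, remaining: limit - recent.length };
}
```

Each request records a timestamp; a request is rejected only when the number of timestamps inside the trailing window has reached the limit, which is exactly the behavior Upstash implements in Redis.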
Choosing the right model
For most SaaS features, gpt-4o-mini is the right default. It costs roughly 15× less than gpt-4o, handles a 128k-token context window, and is fast enough to feel instant when streaming. Use gpt-4o only when accuracy genuinely matters more than latency or cost.
For embeddings (vector search), use text-embedding-3-small. It produces 1536-dimensional vectors and costs a tiny fraction of what the legacy ada-002 model does.
Storing conversation history
OpenAI is stateless — you must pass the full conversation on every request. For short sessions (under ~20 messages), keep history in React state. For persistent history, store messages in Supabase and load them server-side before rendering the chat page.
-- supabase: messages table
create table messages (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references users(id) on delete cascade,
  role text not null, -- 'user' | 'assistant'
  content text not null,
  created_at timestamptz default now()
);

alter table messages enable row level security;
Error handling and fallbacks
OpenAI has occasional downtime and rate limits at the API level. Always wrap calls in try/catch and return a user-friendly message rather than letting the error bubble up:
try {
  const completion = await openai.chat.completions.create({ ... });
  // Build and return the success response here, while `completion`
  // is still in scope.
} catch (err) {
  console.error("[ai/chat] OpenAI error:", err);
  return NextResponse.json(
    { error: "AI unavailable. Please try again in a moment." },
    { status: 503 }
  );
}
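Transient failures (429s, brief outages) often succeed on retry. The official SDK already retries some errors itself (see its maxRetries option), but a generic wrapper is useful for anything it doesn't cover. A sketch, with arbitrary delay choices:

```typescript
// Exponential backoff: 500ms, 1000ms, 2000ms with these defaults.
export function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Retry an async call a few times before giving up; rethrows the last
// error so an outer catch block can still produce the 503 response.
export async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw lastErr;
}
```

Usage: const completion = await withRetry(() => openai.chat.completions.create({ ... }));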
Key patterns to remember
- Never expose your API key client-side. All OpenAI calls must go through a server-side API route.
- Set max_tokens. Without a limit, a misbehaving prompt can generate thousands of tokens and cost you real money.
- Always rate-limit AI routes. OpenAI charges per token — without limits, a single user can run up a large bill in minutes.
- Use gpt-4o-mini for most things. It's 15× cheaper than gpt-4o and fast enough for real-time streaming.
- Stream long responses. A 500-token response takes ~5 seconds non-streamed. With streaming, the user sees content immediately.
What's already set up in GetLaunchpad
GetLaunchpad includes the OpenAI client singleton, Pinecone vector search with the text-embedding-3-small pipeline, and Upstash rate limiting — all pre-configured and production-ready. You can add a chat feature to your SaaS by wiring your API route to the existing lib/openai.ts client and applying the rate limiter from lib/upstash.ts.