Adding AI features to a Next.js SaaS doesn't require a separate backend or a complex architecture. The OpenAI API is a REST endpoint — you can call it from a Next.js API route, stream the response to the client, and have a working AI feature in under an hour.
This guide covers the full setup: installing the SDK, creating a streaming API route, building a chat UI that updates token-by-token, and the patterns you need to avoid exposing your API key or running up a surprise bill.
Install the OpenAI SDK
npm install openai
Set your API key in .env.local. Never prefix it with NEXT_PUBLIC_ — it must stay server-side only.
OPENAI_API_KEY=sk-...
Create a singleton client
Avoid re-initializing the client on every request. Create it once in a shared file:
// lib/openai.ts
import OpenAI from "openai";

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
Non-streaming API route
For short responses (classification, summaries under ~200 words) a regular await call is fine:
// app/api/ai/summarize/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { openai } from "@/lib/openai";

export async function POST(req: NextRequest) {
  const { userId } = await auth();
  if (!userId) return NextResponse.json({ error: "Unauthorized" }, { status: 401 });

  const { text } = await req.json();
  if (!text || typeof text !== "string") {
    return NextResponse.json({ error: "text required" }, { status: 400 });
  }

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize the following text in 2–3 sentences." },
      { role: "user", content: text },
    ],
    max_tokens: 200,
  });

  const summary = completion.choices[0].message.content ?? "";
  return NextResponse.json({ summary });
}
Streaming API route
For chat interfaces, streaming makes the experience feel instant. The OpenAI SDK supports streaming via stream: true, and Next.js can return a ReadableStream directly:
// app/api/ai/chat/route.ts
import { NextRequest } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { openai } from "@/lib/openai";

export async function POST(req: NextRequest) {
  const { userId } = await auth();
  if (!userId) return new Response("Unauthorized", { status: 401 });

  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    stream: true,
  });

  // Convert the OpenAI stream to a ReadableStream
  const readableStream = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content ?? "";
        if (delta) {
          controller.enqueue(new TextEncoder().encode(delta));
        }
      }
      controller.close();
    },
  });

  return new Response(readableStream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
Streaming chat UI
On the client, read the stream with a ReadableStreamDefaultReader:
"use client";
import { useState, useRef } from "react";
export default function ChatPage() {
const [messages, setMessages] = useState<{ role: string; content: string }[]>([]);
const [input, setInput] = useState("");
const [streaming, setStreaming] = useState(false);
async function sendMessage() {
if (!input.trim() || streaming) return;
const userMessage = { role: "user", content: input };
const history = [...messages, userMessage];
setMessages([...history, { role: "assistant", content: "" }]);
setInput("");
setStreaming(true);
const res = await fetch("/api/ai/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: history }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let assistantText = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
assistantText += decoder.decode(value, { stream: true });
setMessages((prev) => {
const updated = [...prev];
updated[updated.length - 1] = { role: "assistant", content: assistantText };
return updated;
});
}
setStreaming(false);
}
return (
<div className="flex flex-col gap-4 p-4">
{messages.map((m, i) => (
<div key={i} className={m.role === "user" ? "text-right" : "text-left"}>
<span className="inline-block rounded-lg bg-zinc-100 px-3 py-2 text-sm dark:bg-zinc-800">
{m.content}
</span>
</div>
))}
<div className="flex gap-2">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === "Enter" && sendMessage()}
className="flex-1 rounded border px-3 py-2 text-sm"
placeholder="Ask something…"
/>
<button
onClick={sendMessage}
disabled={streaming}
className="rounded bg-zinc-900 px-4 py-2 text-sm text-white disabled:opacity-50"
>
Send
</button>
</div>
</div>
);
}
Rate limiting AI endpoints
OpenAI charges per token, so rate limiting AI routes is critical. Use Upstash with a tighter limit than your other endpoints:
import { checkRateLimit } from "@/lib/upstash";
const { success } = await checkRateLimit(`ai:${userId}`, 5, "1 m");
if (!success) return NextResponse.json({ error: "Too many requests" }, { status: 429 });
5 requests per minute per user is generous for a chat feature and cheap enough to protect against accidental loops or abuse.
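The checkRateLimit helper above wraps Upstash's sliding-window limiter. To see what that algorithm actually does, here is a self-contained in-memory sketch of the same idea (illustrative only: serverless instances don't share memory, so an in-process Map resets on cold starts and can't protect you in production the way Upstash does):

```typescript
// Illustrative in-memory sliding-window limiter. Keep using Upstash in
// production; this version exists only to show the algorithm.
const hits = new Map<string, number[]>();

export function checkRateLimitInMemory(
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now()
): { success: boolean; remaining: number } {
  const windowStart = now - windowMs;
  // Keep only the timestamps that still fall inside the window.
  const recent = (hits.get(key) ?? []).filter((t) => t > windowStart);
  if (recent.length >= limit) {
    hits.set(key, recent);
    return { success: false, remaining: 0 };
  }
  recent.push(now);
  hits.set(key, recent);
  return { success: true, remaining: limit - recent.length };
}
```

Each request records a timestamp; a request is rejected only when the number of timestamps inside the trailing window has reached the limit, which is exactly the behavior Upstash implements in Redis.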
Choosing the right model
For most SaaS features, gpt-4o-mini is the right default. It costs roughly 15× less than gpt-4o, handles a 128k-token context window, and is fast enough to feel instant when streaming. Use gpt-4o only when accuracy genuinely matters more than latency or cost.
For embeddings (vector search), use text-embedding-3-small. It produces 1536-dimensional vectors and costs a tiny fraction of what the legacy ada-002 model does.
Storing conversation history
OpenAI is stateless — you must pass the full conversation on every request. For short sessions (under ~20 messages), keep history in React state. For persistent history, store messages in Supabase and load them server-side before rendering the chat page.
-- supabase: messages table
create table messages (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references users(id) on delete cascade,
  role text not null, -- 'user' | 'assistant'
  content text not null,
  created_at timestamptz default now()
);

alter table messages enable row level security;
Error handling and fallbacks
OpenAI has occasional downtime and rate limits at the API level. Always wrap calls in try/catch and return a user-friendly message rather than letting the error bubble up:
try {
  const completion = await openai.chat.completions.create({ ... });
  // Build and return the success response here, while `completion`
  // is still in scope.
} catch (err) {
  console.error("[ai/chat] OpenAI error:", err);
  return NextResponse.json(
    { error: "AI unavailable. Please try again in a moment." },
    { status: 503 }
  );
}
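Transient failures (429s, brief outages) often succeed on retry. The official SDK already retries some errors itself (see its maxRetries option), but a generic wrapper is useful for anything it doesn't cover. A sketch, with arbitrary delay choices:

```typescript
// Exponential backoff: 500ms, 1000ms, 2000ms with these defaults.
export function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Retry an async call a few times before giving up; rethrows the last
// error so an outer catch block can still produce the 503 response.
export async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw lastErr;
}
```

Usage: const completion = await withRetry(() => openai.chat.completions.create({ ... }));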
Key patterns to remember
- Never expose your API key client-side. All OpenAI calls must go through a server-side API route.
- Set max_tokens. Without a limit, a misbehaving prompt can generate thousands of tokens and cost you real money.
- Always rate-limit AI routes. OpenAI charges per token — without limits, a single user can run up a large bill in minutes.
- Use gpt-4o-mini for most things. It's 15× cheaper than gpt-4o and fast enough for real-time streaming.
- Stream long responses. A 500-token response takes ~5 seconds non-streamed. With streaming, the user sees content immediately.
What's already set up in GetLaunchpad
GetLaunchpad includes the OpenAI client singleton, Pinecone vector search with the text-embedding-3-small pipeline, and Upstash rate limiting — all pre-configured and production-ready. You can add a chat feature to your SaaS by wiring your API route to the existing lib/openai.ts client and applying the rate limiter from lib/upstash.ts.