GPT API Unofficial Docs

This is a work in progress. If you want to contribute, here is the GitHub repo.

This site offers an alternative take on OpenAI's Chat Completion API reference and the official GPT guide.

In its most basic form, the Chat Completion API receives a system message (the prompt) and a user message, and then returns an assistant message responding to the user.


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
})
// result:
console.log(completion.data.choices[0], {
  message: {
    role: "assistant",
    content: "Hello! How may I assist you today?",
  },
  finish_reason: "stop",
})

This guide uses the OpenAI Node library, but the focus is the request and response format, which is the same for the Python library and the HTTP API.

Request and Response

The goal of this guide is to help you understand all the fields in the request and response. There are a lot of fields. Most are optional, some are required. Some only make sense in combination with others.


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
  functions: undefined,
  function_call: undefined,
  user: undefined,
  stream: false,
  temperature: 1,
  top_p: 1,
  n: 1,
  stop: undefined,
  max_tokens: Infinity,
  presence_penalty: 0,
  frequency_penalty: 0,
  logit_bias: undefined,
})
// result:
console.log(completion.data, {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1677652288,
  model: "gpt-3.5-turbo-0613",
  choices: [
    {
      index: 0,
      message: {
        role: "assistant",
        content: "Hello! How may I assist you today?",
      },
      finish_reason: "stop",
    },
  ],
})

From the request, the model and messages fields are the most important (and also the ones that are required).

From the response, the choices field is the most important, that's where you'll find the assistant's answer to the last user message.

If you are interested in a particular field, you can click on it to jump to the section that describes it.

API Key


apiKey: process.env.OPENAI_API_KEY,

First you'll need the OPENAI_API_KEY. That means you need to sign up if you haven't already, and then visit your API keys page to get a key.

Remember that your API key is a secret! Do not share it with others or expose it in any client-side code (browsers, apps). Production requests must be routed through your own backend server where your API key can be securely loaded from an environment variable or key management service.
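
As a minimal sketch, routing requests through your own backend could look like this (assuming an Express server; the /api/chat route is just an example):

const express = require("express")
const { Configuration, OpenAIApi } = require("openai")

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
)
const app = express()
app.use(express.json())

// the browser sends only the messages, the key never leaves the server
app.post("/api/chat", async (req, res) => {
  const completion = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    messages: req.body.messages,
  })
  res.json(completion.data.choices[0].message)
})

app.listen(3000)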

Model


const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
})
// result:
console.log(completion.data.model) // "gpt-3.5-turbo-0613"

The first field from the request is the model. It's a string with the name of the model to use. If you don't care about the model right now, start with gpt-3.5-turbo-0613.

To understand the differences between models, let's deconstruct a model name:

  • gpt-3.5-turbo-16k-0613

The first part is the model architecture. It can be gpt-3.5-turbo or gpt-4. gpt-3.5-turbo is cheaper and faster. gpt-4 is more powerful, but it's not available by default; you need to request access to use it.

  • gpt-3.5-turbo-16k-0613

The next part is the context length in tokens. Tokens can be thought of as pieces of words (learn more). A rule of thumb is that 100 tokens is about 75 words in English. 16k is the total number of tokens the model supports per request: not only the input (messages and functions) but also the output (message).

The four options are:

model             | context
gpt-3.5-turbo     | 4,096 tokens
gpt-3.5-turbo-16k | 16,384 tokens
gpt-4             | 8,192 tokens
gpt-4-32k         | 32,768 tokens

Models with larger context are more expensive.

  • gpt-3.5-turbo-16k-0613

The last part of the model name is the snapshot date.

For example, we have:

  • gpt-3.5-turbo
  • gpt-3.5-turbo-0301
  • gpt-3.5-turbo-0613

Until June 27th, gpt-3.5-turbo is the same as gpt-3.5-turbo-0301. After June 27th, gpt-3.5-turbo will point to gpt-3.5-turbo-0613. On September 13th, gpt-3.5-turbo-0301 will be deprecated.

The same upgrade and deprecation cycle will happen to the gpt-3.5-turbo-0613 snapshot eventually. You can read more about continuous model upgrades and model deprecation.

In my opinion, it's better to use the snapshot date in the model name, so you can control when to upgrade to a new snapshot.

These are the latest snapshots available today:

  • gpt-3.5-turbo-0613
  • gpt-3.5-turbo-16k-0613
  • gpt-4-0613
  • gpt-4-32k-0613

Messages


const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
})

This is the real body of your request.

To understand messages, let's see the possible role values:

  • "system" is the role for the first message, the prompt. It's the only message with this role. It's used by the developer to set the style, tone, format, etc. of the response.
  • "user" these are the messages from the end user. Usually, after the system message, the rest of the messages alternate between user and assistant, ending with a user message.
  • "assistant" these are the messages from the model. But you can also use them to give the model examples of desired behavior.
  • "function" we'll talk about this later.

Assistant

The model doesn't have memory, so after you receive the response, you should append the assistant message to the messages array before sending the next user message.


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
    { role: "assistant", content: "How may I assist you today?" },
    { role: "user", content: "Give me a fact about Messi" },
  ],
})
// result:
console.log(completion.data, {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1677652288,
  // ...
})
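
As a rough sketch, keeping that state could look like this (the ask helper and the in-memory messages array are just one way to do it):

const messages = [{ role: "system", content: "You are a helpful assistant" }]

async function ask(userContent) {
  messages.push({ role: "user", content: userContent })
  const completion = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    messages,
  })
  const assistantMessage = completion.data.choices[0].message
  // remember the answer so the next request carries the whole conversation
  messages.push(assistantMessage)
  return assistantMessage.content
}

await ask("Hello world")
await ask("Give me a fact about Messi")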

The other usage of the assistant message is to give the model examples of the output you want. You provide a couple of user messages and the corresponding assistant messages, before sending the real user message.


messages: [
  { role: "system", content: "You are a comedian" },
  { role: "user", content: "A Forrest Gump pun" },
  {
    role: "assistant",
    content: "What's Forrest Gump's password? 1forrest1",
  },
  { role: "user", content: "A calendar pun" },
  {
    role: "assistant",
    content: "Can February March? No, but April May",
  },
  { role: "user", content: "A pun about AI" },
]

Functions

GPT wasn't very good at generating data in a predefined format, so the latest models come with a new (optional) input called functions.

You can pass a list of functions, each with a name, description, and parameters (as a JSON Schema object).


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
  ],
  functions: [
    {
      name: "getCityWeather",
      description: "Get the weather in a given city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "The city" },
          unit: { type: "string", enum: ["C", "F"] },
        },
        required: ["city"],
      },
    },
  ],
})

In the example, we are passing the schema of a getCityWeather function. The TypeScript signature of the function would be something like:


function getCityWeather(params: {
  city: string,
  unit?: "C" | "F",
});

Function Call

The model is trained to detect when to reply with content as usual and when to reply with a function call.

When it chooses to reply with a function call, the response will have a message from the assistant with the function_call property instead of the usual content. (Also, the finish_reason should be "function_call", but in my tests it sometimes comes as "stop".)


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
  ],
  functions: [
    {
      name: "getCityWeather",
      description: "Get the weather in a given city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
})
// result:
console.log(completion.data.choices[0].message, {
  role: "assistant",
  content: null,
  function_call: {
    name: "getCityWeather",
    arguments: '{ "city": "Valencia" }',
  },
})

The function_call in the response message is an object with two properties:

  • the name of the function that the model decided to call
  • the arguments for that function, which come as stringified JSON, so you need to JSON.parse them before use (be aware that the JSON could be invalid or may not adhere to the schema)
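
As a minimal sketch, handling that could look like this (getCityWeather is assumed to be a function you implemented yourself):

const message = completion.data.choices[0].message
if (message.function_call) {
  let args
  try {
    args = JSON.parse(message.function_call.arguments)
  } catch (error) {
    // the model produced invalid JSON: retry, or ask the model to fix it
  }
  if (args?.city) {
    console.log(await getCityWeather(args)) // your own code runs here
  }
}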

If you want to force the model to call a particular function, you can pass the name of the function to the function_call field in the request (not to be confused with the function_call field in the response message). You need to pass it inside an object like this:


function_call: { name: "getCityWeather" }

The other two options for the function_call field are:

  • function_call: "auto" which is the default behavior, the model will decide if it should call a function or not
  • function_call: "none" which forces the model to reply with content as usual

Function Role

Once you have the function_call provided by the model, you can run the function (be careful here, this may be dangerous depending on the function you are calling) and append the result to the messages array (after the assistant message).

You need to set the role to function, the name to the function name, and the content to the result of the function.


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
    {
      role: "assistant",
      content: null,
      function_call: {
        name: "getCityWeather",
        arguments: '{ "city": "Valencia" }',
      },
    },
    {
      role: "function",
      name: "getCityWeather",
      content: "33 C",
    },
  ],
})
// result:
console.log(completion.data.choices[0].message, {
  role: "assistant",
  content: "It's 33°C in Valencia. Stay hydrated!",
})

Now that the model has seen the result of the function, it will be able to use it to generate the next response.
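
Putting the pieces together, a rough sketch of the whole round trip (messages, functions, and getCityWeather are assumed to be defined as in the snippets above):

const first = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages,
  functions,
})
const assistantMessage = first.data.choices[0].message

if (assistantMessage.function_call) {
  const { name, arguments: args } = assistantMessage.function_call
  const result = await getCityWeather(JSON.parse(args)) // e.g. "33 C"
  const second = await openai.createChatCompletion({
    model: "gpt-3.5-turbo-0613",
    messages: [
      ...messages,
      assistantMessage, // the assistant message with the function_call
      { role: "function", name, content: result },
    ],
    functions,
  })
  console.log(second.data.choices[0].message.content)
}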

Structured Data

functions don't have to be real functions; you can use them to turn any text into structured data.


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Messi leaving PSG" },
  ],
  functions: [
    {
      name: "showNamedEntities",
      description: "Show the named entities",
      parameters: {
        type: "object",
        properties: {
          entities: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                type: { type: "string" },
              },
            },
          },
        },
      },
    },
  ],
})

For example, if you want to extract the named entities from a text, you can make up a showNamedEntities function:


function showNamedEntities(params: {
  entities: { name: string; type: string }[]
});

Then if you pass a message like { role: "user", content: "Messi leaving PSG" }, the model will reply with a call to the showNamedEntities function:


{
  role: "assistant",
  content: null,
  function_call: {
    name: "showNamedEntities",
    arguments: `{
      "entities": [
        { "name": "Messi", "type": "PERSON" },
        { "name": "PSG", "type": "ORG" }
      ]
    }`,
  },
}

And now you can parse and use that data in your app without telling the model that there is no real function called showNamedEntities.

User


const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "user", content: "Hello world" },
  ],
  user: undefined, // pass your end-user's ID here
})

You can use the user field to pass an ID of the end-user that authored the message.

If OpenAI detects any abuse in your requests, they'll include the problematic user as part of the notice. As far as I know, that's the only use for this field.

Stream

With stream: true we can start showing the response while it's still being generated, without having to wait for the whole response.

OpenAI uses server-sent events for the streaming. How you process the stream depends on your tech stack (this example uses Node.js). But the idea is the same: you receive a stream of chunks.


const { Configuration, OpenAIApi } = require("openai")
const { Readable } = require("stream")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion(
  {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant" },
      { role: "user", content: "Hello world" },
    ],
    stream: true,
  },
  // the second argument is needed with the Node library so that
  // completion.data is a readable stream instead of a parsed body
  { responseType: "stream" }
)
const stream = Readable.from(completion.data)

Chunks are strings that start with data: followed by an object. The first chunk looks like this: 'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'

Here you can see all the chunks from the example formatted for readability:


data: {
"id": "chatcmpl-xxxx",
"object": "chat.completion.chunk",
"created": 1688198627,
"model": "gpt-3.5-turbo-0613",
"choices":[{
"index": 0,
"delta": {"role": "assistant", "content": ""},
"finish_reason": null
}]
}


After that you'll receive one last chunk with the string "data: [DONE]".
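
A rough sketch of consuming those chunks (it reuses the stream from the request above and ignores the edge case of a JSON object being split across two chunks):

let answer = ""
for await (const chunk of stream) {
  const lines = chunk
    .toString()
    .split("\n")
    .filter((line) => line.startsWith("data: "))
  for (const line of lines) {
    const data = line.slice("data: ".length)
    if (data === "[DONE]") continue
    const parsed = JSON.parse(data)
    // the first chunk has no content, and the last one has an empty delta
    answer += parsed.choices[0].delta.content ?? ""
  }
}
console.log(answer)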

One thing we lose with streaming is the usage field. So if you need to know how many tokens the request used you'll need to count them yourself. It's a shame that OpenAI doesn't include it in the last chunk.

Sampling


temperature: 1, // between 0 and 2, default 1
top_p: 1, // between 0 and 1, default 1

To control the randomness of the model's output you can use either the temperature or top_p parameters.

temperature accepts values between 0 and 2. The default is 1; higher values mean more random answers, lower values mean more determinism. Use 0 if you want the same answer for the same input every time, but be aware that there's still a chance of some randomness.

top_p accepts values between 0 and 1. Same as before: lower values mean less variation. The default is 1.

OpenAI says it's not recommended to use both parameters at the same time. No idea why.

Choices


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "Give me one name" },
    { role: "user", content: "Futbol GOAT" },
  ],
  temperature: 1.2,
  n: 2,
})
// result: completion.data.choices has two items, e.g.
// choices[0].message: { role: "assistant", content: "Messi" }
// choices[1].message: another assistant message (it may be identical)

You can use the n parameter to get more than one response.

Each response will be a different object inside the choices array.

Note that the content of each choice may be the same, especially for short answers or if you're using a low temperature.

The index field may seem useless, but it's actually useful when using n together with streaming. Each chunk will include content for one of the choices, and the index field will tell you which one.
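
For example, a sketch of accumulating the streamed content of each choice by index (handleChunk is a made-up helper; parsed has the shape shown in the Stream section):

const answers = []
function handleChunk(parsed) {
  for (const choice of parsed.choices) {
    answers[choice.index] =
      (answers[choice.index] ?? "") + (choice.delta.content ?? "")
  }
}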

Stop


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "List names" },
    { role: "user", content: "Best football players" },
  ],
  stop: ["5."],
})
// result: the generated list stops right before "5.",
// and the stop sequence itself is not included
console.log(completion.data.choices[0].message.content)

stop is a list of strings (case-sensitive) that will tell the model to stop generating text when it finds one of them. You can provide up to 4 stop sequences.

The stop sequences aren't included in the response.

Providing a stop list won't make the answer longer, the model will stop naturally if it doesn't find any of the stop sequences.

Max Tokens


const { Configuration, OpenAIApi } = require("openai")
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "List names" },
    { role: "user", content: "Best football players" },
  ],
  max_tokens: 5,
})
// result:
console.log(completion.data.choices[0], {
  index: 0,
  message: {
    role: "assistant",
    content: "1. Lionel Messi\n",
  },
  finish_reason: "length",
})

Another way to limit the length of the response is to use the max_tokens parameter.

The model will start generating text as usual, but it will count the output tokens and stop when it reaches the given limit. Only output tokens are counted towards the limit.

In the example, the five tokens are:


["1", ".", " Lionel", " Messi", "\n"]

You can use the finish_reason field to know if the model stopped naturally (finish_reason: "stop") or because of the token limit (finish_reason: "length").

Penalties

To keep the model from repeating itself too much, you can use the presence_penalty and frequency_penalty parameters.

Both are numbers between -2.0 and 2.0 and default to 0.


presence_penalty: 0, // between -2.0 and 2.0, default 0
frequency_penalty: 0, // between -2.0 and 2.0, default 0

A positive presence_penalty makes it less likely for the model to repeat a token that has already been generated. The higher the penalty, the less likely a token will be repeated.

frequency_penalty is similar, but the penalty increases every time the token is generated.

If I had to choose, I'd pick presence_penalty. frequency_penalty might penalize tokens that naturally repeat often, such as punctuation and articles, too much.

Logit Bias

You can penalize or encourage specific tokens by using the logit_bias parameter.


logit_bias: undefined,

logit_bias is an object where the keys are token ids and values are the bias. The bias is a number between -100 and 100. Like this:


logit_bias: {
  65515: -5,
  71105: 10
}

A bias closer to -100 will make the model avoid the token, while a value closer to 100 will make the model more likely to use the token (well, if you use 100 the model will probably repeat that token until the token limit is reached).

Since the keys are ids, you'll need a tool to get the token id from a string.
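
For instance, a sketch of effectively banning one token (the id is the example id from above; the real one depends on the text you want to discourage):

const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hello world" }],
  logit_bias: { 65515: -100 }, // -100 effectively bans this token
})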

Response


// result:
console.log(completion.data, {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1677652288,
  model: "gpt-3.5-turbo-0613",
  // choices: [...], usage: {...}
})

In the response, in addition to the choices array, you'll get some extra fields:

  • id: The unique id OpenAI gives to the response.
  • object: Always "chat.completion".
  • created: The timestamp when the response was created. In case of streaming, this will stay the same for all chunks.
  • model: The model used to generate the response. This is the full name of the model including the snapshot. For example if you used "gpt-3.5-turbo" in your request, the response will be model: "gpt-3.5-turbo-0613".

Finish Reason


// result:
console.log(completion.data.choices[0].finish_reason) // "stop"

Every response will include a finish_reason. This field will tell you why the model stopped generating text. The possible reasons are:

  • "stop": The model returned a complete response, or it was stopped by a stop sequence.
  • "length": Reached the token limit set by max_tokens or by the context limit of the model.
  • "function_call": The model decided to call a function.
  • null: The response is still in progress.

Usage


// result:
console.log(completion.data.usage)
// e.g. { prompt_tokens: ..., completion_tokens: 9, total_tokens: ... }

The usage field has information about the number of tokens used by the request (prompt_tokens) and the response (completion_tokens). This is important because those are the two numbers that will be used to calculate the cost of the request.

prompt_tokens includes the tokens from all the content passed in the messages and functions fields.

completion_tokens includes the tokens from all the choices in the response.

Current prices per 1,000 tokens:

model             | prompt  | completion
gpt-3.5-turbo     | $0.0015 | $0.002
gpt-3.5-turbo-16k | $0.003  | $0.004
gpt-4             | $0.03   | $0.06
gpt-4-32k         | $0.06   | $0.12

Let's say we have this usage in a GPT-4 request:


{
  "model": "gpt-4-0613",
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 150
  }
}

The cost of the request is $0.012:


// prompt cost:
100 * 0.03 / 1000 +
// completion cost:
150 * 0.06 / 1000
// total cost:
= 0.012
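
As a small sketch, the same calculation as a helper (requestCost and the PRICES table are made up here, using the prices listed above):

const PRICES = {
  "gpt-3.5-turbo": { prompt: 0.0015, completion: 0.002 },
  "gpt-3.5-turbo-16k": { prompt: 0.003, completion: 0.004 },
  "gpt-4": { prompt: 0.03, completion: 0.06 },
  "gpt-4-32k": { prompt: 0.06, completion: 0.12 },
}

function requestCost(model, usage) {
  // drop the snapshot date, e.g. "gpt-4-0613" -> "gpt-4"
  const family = model.replace(/-\d{4}$/, "")
  const { prompt, completion } = PRICES[family]
  return (
    (usage.prompt_tokens * prompt + usage.completion_tokens * completion) / 1000
  )
}

requestCost("gpt-4-0613", { prompt_tokens: 100, completion_tokens: 150 })
// 0.012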

Errors

// TODO

https://platform.openai.com/docs/guides/error-codes/api-errors
