GPT API Unofficial Docs
This is a work in progress. If you want to contribute, here is the GitHub repo.
This site offers an alternative take on OpenAI's Chat Completion API reference and the official GPT guide.
In its most basic form, the Chat Completion API receives a system message (the prompt) and a user message, and then returns an assistant message responding to the user.
const { Configuration, OpenAIApi } = require("openai")

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)

const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
})

// result:
console.log(completion.data.choices[0], {
  message: {
    role: "assistant",
    content: "Hello! How may I assist you today?",
  },
  finish_reason: "stop",
})
For this guide, we are using the OpenAI node library, but the focus of this guide will be the request and response format, which is the same for the python library and the HTTP API.
Request and Response
The goal of this guide is to help you understand all the fields in the request and response. There are a lot of fields. Most are optional, some are required. Some only make sense in combination with others.
const { Configuration, OpenAIApi } = require("openai")const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY,})const openai = new OpenAIApi(configuration)const completion = await openai.createChatCompletion({ model: "gpt-3.5-turbo", messages: [ { role: "system", content: "You are a helpful assistant" }, { role: "user", content: "Hello world" }, ], functions: undefined, function_call: undefined, user: undefined, stream: false, temperature: 1, top_p: 1, n: 1, stop: undefined, max_tokens: Math.infinity, presence_penalty: 0, frequency_penalty: 0, logit_bias: undefined,})// result:console.log(completion.data, { id: "chatcmpl-123", object: "chat.completion", created: 1677652288, model: "gpt-3.5-turbo-0613", choices: [ { index: 0, message: { role: "assistant", content: "Hello! How may I assist you today?", }, finish_reason: "stop", }, ],
From the request, the model and messages fields are the most important (and also the ones that are required).
From the response, the choices field is the most important; that's where you'll find the assistant's answer to the last user message.
If you are interested in a particular field, you can click on it to jump to the section that describes it.
API Key
apiKey: process.env.OPENAI_API_KEY,
First you'll need the OPENAI_API_KEY. That means you need to sign up if you haven't already, and then visit your API keys page to get a key.
Remember that your API key is a secret! Do not share it with others or expose it in any client-side code (browsers, apps). Production requests must be routed through your own backend server where your API key can be securely loaded from an environment variable or key management service.
Model
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  // ...
})

// result:
console.log(completion.data, {
  // ...
  model: "gpt-3.5-turbo-0613",
})
The first field from the request is the model. It's a string with the name of the model to use. If you don't care about the model right now, start with gpt-3.5-turbo-0613.
To understand the differences between models, let's deconstruct a model name:
- gpt-3.5-turbo-16k-0613
The first part is the model architecture. It can be gpt-3.5-turbo or gpt-4. gpt-3.5-turbo is cheaper and faster. gpt-4 is more powerful, but it's not available by default; you need to request access to use it.
- gpt-3.5-turbo-16k-0613
The next part is the context length in tokens. Tokens can be thought of as pieces of words (learn more). A rule of thumb is that 100 tokens is about 75 words in English. 16k is the total number of tokens the model supports per request, which includes not only the input (messages and functions) but also the output (message).
The four options are:
model | context |
---|---|
gpt-3.5-turbo | 4,096 tokens |
gpt-3.5-turbo-16k | 16,384 tokens |
gpt-4 | 8,192 tokens |
gpt-4-32k | 32,768 tokens |
Models with larger context are more expensive.
- gpt-3.5-turbo-16k-0613
The last part of the model name is the snapshot date.
For example, we have:
gpt-3.5-turbo
gpt-3.5-turbo-0301
gpt-3.5-turbo-0613
Until June 27th, gpt-3.5-turbo will be the same as gpt-3.5-turbo-0301. After June 27th, gpt-3.5-turbo will point to gpt-3.5-turbo-0613.
On September 13th, gpt-3.5-turbo-0301 will be deprecated.
The same upgrade and deprecation cycle will happen to the gpt-3.5-turbo-0613 snapshot eventually. You can read more about continuous model upgrades and model deprecation.
In my opinion, it's better to use the snapshot date in the model name, so you can control when to upgrade to a new snapshot.
These are the latest snapshots available today:
gpt-3.5-turbo-0613
gpt-3.5-turbo-16k-0613
gpt-4-0613
gpt-4-32k-0613
Messages
messages: [
  { role: "system", content: "You are a helpful assistant" },
  { role: "user", content: "Hello world" },
],
This is the real body of your request.
To understand messages, let's see the possible role values:
- "system": the role for the first message, the prompt. It's the only message with this role. It's used by the developer to set the style, tone, format, etc. of the response.
- "user": these are the messages from the end user. Usually, after the system message, the rest of the messages alternate between user and assistant, ending with a user message.
- "assistant": these are the messages from the model. But you can also use them to give the model examples of desired behavior.
- "function": we'll talk about this later.
Assistant
The model doesn't have memory, so after you receive the response, you should append the assistant message to the messages array before sending the next user message.
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
    { role: "assistant", content: "How may I assist you today?" },
    { role: "user", content: "Give me a fact about Messi" },
  ],
})
// result: the assistant's answer to "Give me a fact about Messi"
The other usage of the assistant
message is to give the model examples of the output you want. You provide a couple of user
messages and the corresponding assistant
messages, before sending the real user
message.
messages: [
  { role: "system", content: "You are a comedian" },
  { role: "user", content: "A Forrest Gump pun" },
  {
    role: "assistant",
    content: "What's Forrest Gump's password? 1forrest1",
  },
  { role: "user", content: "A calendar pun" },
  {
    role: "assistant",
    content: "Can February March? No, but April May",
  },
  { role: "user", content: "A pun about AI" },
]
Functions
GPT wasn't very good at generating data in a predefined format, so the latest models come with a new (optional) input called functions.
You can pass a list of functions, each with a name, description, and parameters (as a JSON Schema object).
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
  ],
  functions: [
    {
      name: "getCityWeather",
      description: "Get the weather in a given city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "The city" },
          unit: { type: "string", enum: ["C", "F"] },
        },
        required: ["city"],
      },
    },
  ],
})
In the example, we are passing the schema of a getCityWeather function. The TypeScript signature of the function would be something like:
function getCityWeather(params: {
  city: string,
  unit?: "C" | "F",
});
Function Call
The model is trained to detect when to reply with content as usual and when to reply with a function call.
When it chooses to reply with a function call, the response will have a message from the assistant with the function_call property instead of the usual content. (Also, the finish_reason should be "function_call", but in my tests it sometimes comes as "stop".)
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
  ],
  functions: [
    {
      name: "getCityWeather",
      description: "Get the weather in a given city",
      // ...parameters as above
    },
  ],
})

// result:
console.log(completion.data.choices[0], {
  index: 0,
  message: {
    role: "assistant",
    content: null,
    function_call: {
      name: "getCityWeather",
      arguments: '{ "city": "Valencia" }',
    },
  },
  finish_reason: "function_call",
})
The function_call in the response message is an object with two properties:
- the name of the function that the model decided to call
- the arguments for that function, which is stringified JSON, so you need to JSON.parse it before using it (be aware that the JSON could be invalid or may not adhere to the schema; see the sketch below)
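For example, a defensive way to handle the arguments (a minimal sketch, reusing the completion from the example above):

// Parse function_call.arguments defensively: the JSON may be invalid
// or missing fields, so validate before calling the real function.
const message = completion.data.choices[0].message

if (message.function_call) {
  let args = null
  try {
    args = JSON.parse(message.function_call.arguments)
  } catch (err) {
    // the model produced invalid JSON; ask again or fall back
  }
  if (args && typeof args.city === "string") {
    // safe to call getCityWeather(args)
  }
}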
If you want to force the model to call a particular function, you can pass the name of the function to the function_call field in the request (not to be confused with the function_call field in the response message). You need to pass it inside an object like this:
function_call: { name: "getCityWeather" }
The other two options for the function_call field are:
- function_call: "auto", which is the default behavior; the model will decide if it should call a function or not
- function_call: "none", which forces the model to reply with content as usual
Function Role
Once you have the function_call provided by the model, you can run the function (be careful here, this may be dangerous depending on the function you are calling) and append the result to the messages array (after the assistant message).
You need to set the role to function, the name to the function name, and the content to the result of the function.
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "It's hot in Valencia" },
    {
      role: "assistant",
      content: null,
      function_call: {
        name: "getCityWeather",
        arguments: '{ "city": "Valencia" }',
      },
    },
    {
      role: "function",
      name: "getCityWeather",
      content: "33 C",
    },
  ],
})

// result:
console.log(completion.data.choices[0].message, {
  role: "assistant",
  content: "It's 33°C in Valencia. Stay hydrated!",
})
Now that the model has seen the result of the function, it will be able to use it to generate the next response.
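Putting the whole flow together, a sketch of the round trip might look like this (getCityWeather here stands for your own implementation of the hypothetical function, and functions is the schema array from the Functions section):

// Sketch: full function-calling round trip.
const messages = [
  { role: "system", content: "You are a helpful assistant" },
  { role: "user", content: "It's hot in Valencia" },
]

const first = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages,
  functions,
})

const reply = first.data.choices[0].message
if (reply.function_call) {
  const args = JSON.parse(reply.function_call.arguments)
  const result = await getCityWeather(args) // e.g. "33 C"

  // append the assistant message and the function result
  messages.push(reply)
  messages.push({
    role: "function",
    name: reply.function_call.name,
    content: String(result),
  })

  const second = await openai.createChatCompletion({
    model: "gpt-3.5-turbo-0613",
    messages,
    functions,
  })
  console.log(second.data.choices[0].message.content)
}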
Structured Data
functions don't have to be real functions; you can use them to turn any text into structured data.
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Messi leaving PSG" },
  ],
  functions: [
    {
      name: "showNamedEntities",
      description: "Show the named entities",
      parameters: {
        type: "object",
        properties: {
          entities: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                type: { type: "string" },
              },
            },
          },
        },
      },
    },
  ],
})
For example, if you want to extract the named entities from a text, you can make up a showNamedEntities function:
function showNamedEntities(params: { entities: { name: string; type: string }[]});
Then if you pass a message like { role: "user", content: "Messi leaving PSG" }, the model will reply with a call to the showNamedEntities function:
{
  role: "assistant",
  content: null,
  function_call: {
    name: "showNamedEntities",
    arguments: `{
      "entities": [
        { "name": "Messi", "type": "PERSON" },
        { "name": "PSG", "type": "ORG" }
      ]
    }`,
  },
}
And now you can parse and use that data in your app without telling the model that there is no real function called showNamedEntities.
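For example (a sketch, assuming the response above is stored in completion):

// Extract the structured data from the function_call.
const call = completion.data.choices[0].message.function_call

if (call && call.name === "showNamedEntities") {
  const { entities } = JSON.parse(call.arguments)
  // entities: [{ name: "Messi", type: "PERSON" }, { name: "PSG", type: "ORG" }]
}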
User
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Hello world" },
  ],
  user: "user-1234", // hypothetical ID of your end user
})
You can use the user field to pass an ID of the end user that authored the message.
If OpenAI detects any abuse in your requests, they'll send you the problematic user as part of the notice. As far as I know, that's the only use for this field.
Stream
With stream: true we can start showing the response while it's still being generated, without having to wait for the whole response.
OpenAI uses server-sent events for the streaming. How you process the stream depends on your tech stack (this example is using Node.js). But the idea is the same, you receive a stream of chunks.
const { Configuration, OpenAIApi } = require("openai")
const { Readable } = require("stream")

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
})
const openai = new OpenAIApi(configuration)

const completion = await openai.createChatCompletion(
  {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant" },
      { role: "user", content: "Hello world" },
    ],
    stream: true,
  },
  { responseType: "stream" } // so completion.data is a stream
)

const stream = Readable.from(completion.data)
Chunks are strings that start with data: followed by an object. The first chunk looks like this:
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
Here you can see all the chunks from the example formatted for readability:
After that you'll receive one last chunk with the string "data: [DONE]".
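A minimal sketch of processing those chunks (assuming the stream from the example above):

// Accumulate the streamed answer from the server-sent events.
let buffer = ""
let answer = ""

for await (const chunk of stream) {
  buffer += chunk.toString()
  const events = buffer.split("\n\n")
  buffer = events.pop() // keep any incomplete event for the next chunk

  for (const event of events) {
    const data = event.replace(/^data: /, "")
    if (data === "[DONE]") continue
    const parsed = JSON.parse(data)
    answer += parsed.choices[0].delta.content ?? ""
  }
}

console.log(answer)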
One thing we lose with streaming is the usage field, so if you need to know how many tokens the request used you'll need to count them yourself. It's a shame that OpenAI doesn't include it in the last chunk.
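If you do want to count them, a rough sketch using the js-tiktoken package (an assumption, it's not part of the openai library) could look like this. Note that the prompt side also includes a few tokens of per-message overhead, so a plain count of the message contents only approximates what the API would report.

// Rough token count for the streamed answer (completion side only).
const { encodingForModel } = require("js-tiktoken")

const enc = encodingForModel("gpt-3.5-turbo")
const completionTokens = enc.encode(answer).length
console.log({ completionTokens })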
Sampling
temperature: 1,
top_p: 1,
To control the randomness of the model's output you can use either the temperature or top_p parameters.
temperature accepts values between 0 and 2. The default is 1; higher values mean more random answers, lower values mean more determinism. Use 0 if you always want the same answer for the same input, but be aware that there's still a chance of some randomness.
top_p accepts values between 0 and 1. Same as before, lower values mean less variation. The default is 1.
OpenAI says it's not recommended to use both parameters at the same time. No idea why.
Choices
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "Give me one name" },
    { role: "user", content: "Futbol GOAT" },
  ],
  temperature: 1.2,
  n: 2,
})

// result (example output):
console.log(completion.data.choices, [
  {
    index: 0,
    message: { role: "assistant", content: "Messi" },
    finish_reason: "stop",
  },
  {
    index: 1,
    message: { role: "assistant", content: "Pelé" },
    finish_reason: "stop",
  },
])
You can use the n parameter to get more than one response.
Each response will be a different object inside the choices array.
Note that the content of each choice may be the same, especially for short answers or if you're using a low temperature.
The index field may seem useless, but it's actually useful when using n together with streaming. Each chunk will include content for one of the choices, and the index field will tell you which one.
Stop
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "List names" },
    { role: "user", content: "Best football players" },
  ],
  stop: ["5."],
})

// result:
console.log(completion.data, {
  choices: [
    {
      index: 0,
      message: {
        role: "assistant",
        // example output, cut right before the fifth item:
        content: "1. Lionel Messi\n2. Cristiano Ronaldo\n3. Pelé\n4. Diego Maradona\n",
      },
      finish_reason: "stop",
    },
  ],
})
stop is a list of strings (case-sensitive) that will tell the model to stop generating text when it finds one of them. You can provide up to 4 stop sequences.
The stop sequences aren't included in the response.
Providing a stop list won't make the answer longer; the model will stop naturally if it doesn't find any of the stop sequences.
Max Tokens
const completion = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "List names" },
    { role: "user", content: "Best football players" },
  ],
  max_tokens: 5,
})

// result:
console.log(completion.data, {
  choices: [
    {
      index: 0,
      message: {
        role: "assistant",
        content: "1. Lionel Messi\n",
      },
      finish_reason: "length",
    },
  ],
})
Another way to limit the length of the response is to use the max_tokens parameter.
The model will start generating text as usual, but it will count the output tokens and stop when it reaches the given limit. Only output tokens are counted towards the limit.
In the example, the five tokens are:
["1", ".", " Lionel", " Messi", "\n"]
You can use the finish_reason field to know if the model stopped naturally (finish_reason: "stop") or because of the token limit (finish_reason: "length").
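For example:

// Sketch: reacting to a truncated answer.
const choice = completion.data.choices[0]
if (choice.finish_reason === "length") {
  // the answer was cut off by max_tokens (or by the context limit)
}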
Penalties
To keep the model from repeating itself too much you can use the presence_penalty and frequency_penalty parameters.
Both are numbers between -2.0 and 2.0 and default to 0.
presence_penalty: 0,
frequency_penalty: 0,
A positive presence_penalty makes it less likely for the model to repeat a token that has already been generated. The higher the penalty, the less likely a token will be repeated.
frequency_penalty is similar, but the penalty increases every time the token is generated.
If I had to choose, I'd pick presence_penalty. frequency_penalty might over-penalize tokens that repeat often, such as punctuation and articles.
Logit Bias
You can penalize or encourage specific tokens by using the logit_bias parameter.
logit_bias: undefined,
logit_bias is an object where the keys are token ids and the values are the bias. The bias is a number between -100 and 100. Like this:
logit_bias: {
  65515: -5,
  71105: 10,
}
A bias closer to -100 will make the model avoid the token, while a value closer to 100 will make the model more likely to use the token (well, if you use 100 the model will probably repeat that token until the token limit is reached).
Since the keys are ids, you'll need a tool to get the token id from a string.
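For example, a hypothetical sketch using the js-tiktoken package as the tokenizer (an assumption; any tokenizer that matches the model's encoding works):

// Find the token ids for a word and bias the model against them.
// Note: the leading space matters, and other words sharing these tokens
// will be affected too.
const { encodingForModel } = require("js-tiktoken")

const enc = encodingForModel("gpt-3.5-turbo")
const ids = enc.encode(" football")

const logit_bias = {}
for (const id of ids) logit_bias[id] = -100 // never produce these tokens

// then pass logit_bias in the request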
Response
// result:
console.log(completion.data, {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1677652288,
  model: "gpt-3.5-turbo-0613",
  // ...
})
In the response, in addition to the choices array, you'll get some extra fields:
- id: The unique id OpenAI gives to the response.
- object: Always "chat.completion".
- created: The timestamp when the response was created. In case of streaming, this will stay the same for all chunks.
- model: The model used to generate the response. This is the full name of the model including the snapshot. For example, if you used "gpt-3.5-turbo" in your request, the response will have model: "gpt-3.5-turbo-0613".
Finish Reason
choices: [
  {
    // ...
    finish_reason: "stop",
  },
],
Every response will include a finish_reason. This field will tell you why the model stopped generating text. The possible reasons are:
- "stop": The model returned a complete response, or it was stopped by a stop sequence.
- "length": Reached the token limit set by max_tokens or by the context limit of the model.
- "function_call": The model decided to call a function.
- null: The response is still in progress.
Usage
usage: {
  prompt_tokens: 19,
  completion_tokens: 9,
  total_tokens: 28,
},
The usage field has information about the number of tokens used by the request (prompt_tokens) and the response (completion_tokens). This is important because those are the two numbers that will be used to calculate the cost of the request.
prompt_tokens includes the tokens from all the content passed in the messages and functions fields.
completion_tokens includes the tokens from all the choices in the response.
Current prices per 1,000 tokens:
Model | Prompt | Completion |
---|---|---|
gpt-3.5-turbo | $0.0015 | $0.002 |
gpt-3.5-turbo-16k | $0.003 | $0.004 |
gpt-4 | $0.03 | $0.06 |
gpt-4-32k | $0.06 | $0.12 |
Let's say we have this usage in a GPT-4 request:
{
  "model": "gpt-4-0613",
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 150
  }
}
The cost of the request is $0.012:
// prompt cost:
100 * 0.03 / 1000 +
// completion cost:
150 * 0.06 / 1000
// total cost:
// = 0.012
Errors
// TODO
https://platform.openai.com/docs/guides/error-codes/api-errors