What is GPT?

Every day, we humans talk, write, and share ideas. When someone asks us a question, we don’t just repeat sentences we memorised. We think about what we already know and create a new answer on the spot.

AI models can do something similar. One of the most famous AI models is called GPT.
GPT stands for Generative Pretrained Transformer.
Let’s understand each word step by step.

What is Generative?

“Generative” means it can make (generate) something new.
Just like we can make new sentences from what we know, GPT can also create new text.

What is Pretrained?

“Pretrained” means the model has already learned from text before it ever talks to us.

For example, before a teacher teaches a class, the teacher has already studied many books. GPT is similar: it is trained on a very large amount of text before we use it, and that is how it learns to answer.
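To make "learning before use" concrete, here is a toy sketch in Python. It is not how GPT is actually trained (GPT is a neural network trained on vastly more text); it only counts which word tends to follow which in a tiny made-up corpus. The function name `pretrain` and the example sentences are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def pretrain(corpus):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it
        for current, following in zip(words, words[1:]):
            follow_counts[current][following] += 1
    return follow_counts

# Tiny made-up training text (a real model reads far more than this)
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

# "Pretraining" happens once, before anyone asks a question
model = pretrain(corpus)

# Later, the learned counts can be used without reading the corpus again
print(model["sat"].most_common(1))  # [('on', 2)]
```

The counting step runs once, up front. Afterwards the model only looks things up, which mirrors the idea that the learning is finished before we start chatting.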

What is a Transformer?

A Transformer is the type of neural network (the “computer brain”) that makes GPT work.
Fun fact: It was introduced by researchers at Google in 2017, and now almost all modern AI language models (like ChatGPT, Gemini, and Claude) are built on it.

How can we understand this?

Think about a teacher.

The teacher has already learned from many books.
When the teacher explains something in class, they don’t just repeat the book word for word.
If a student suddenly asks a new question, the teacher thinks about what they already know and gives a fresh answer.

That’s what a Transformer does. It doesn’t just copy—it creates new answers on the go using what it has already learned.

Input→output

In simple words, a Transformer takes an input and gives an output.

For example:

You type: “Hi”
GPT transforms this input and may answer: “Hey, how are you today?”

It decides this by predicting what word (or part of a word) should come next.

But wait—does it predict letters or words?

Here’s where tokens come in.

A token is like a piece of text.
Different models cut text into tokens in different ways:

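Character-level example:
“Hi my name is Aman” → cut into single characters:
H, i, m, y, n, a, m, e, i, s, A, m, a, n (plus 4 spaces).
That makes 18 tokens.
Word-level example:
“Hi my name is Aman” → cut into words:
“Hi”, “my”, “name”, “is”, “Aman”.
That makes 5 tokens.
In practice, models such as Gemini and OpenAI’s GPT models use subword tokenizers that fall between these two extremes: common words usually stay whole, while rarer words are split into smaller pieces.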
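The two token counts above can be checked with a few lines of Python. This is only a sketch of the two simplest ways to cut text. The helper names `char_tokenize` and `word_tokenize` are made up for this example and are not the tokenizer of any real model.

```python
def char_tokenize(text):
    """Cut text into single characters, spaces included."""
    return list(text)

def word_tokenize(text):
    """Cut text into words by splitting on whitespace."""
    return text.split()

text = "Hi my name is Aman"

char_tokens = char_tokenize(text)
word_tokens = word_tokenize(text)

print(len(char_tokens))  # 18 (14 letters + 4 spaces)
print(word_tokens)       # ['Hi', 'my', 'name', 'is', 'Aman']
print(len(word_tokens))  # 5
```

The same sentence gives 18 tokens one way and 5 the other, so the size of a "token" depends entirely on how the text is cut.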

So, a Transformer doesn’t always predict a whole word or just one letter.
It predicts the next token, and how big that token is depends on the model.

In simple words:

A Transformer is a system that:

Looks at your input (tokens).
Understands the context.
Predicts the next token.
Keeps going until a full sentence (or answer) is formed.
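The four steps above can be sketched as a small loop. This is a toy under stated assumptions: the lookup table `NEXT_TOKEN` is hand-written and stands in for a trained model, and `predict_next` only looks at the last token, whereas a real Transformer uses the whole context. All names here are hypothetical.

```python
END = "<end>"  # special marker meaning "the answer is finished"

# Hand-written stand-in for a trained model (an assumption for illustration):
# for each token, the token we pretend is most likely to come next.
NEXT_TOKEN = {
    "Hi": "Hey,",
    "Hey,": "how",
    "how": "are",
    "are": "you",
    "you": "today?",
    "today?": END,
}

def predict_next(tokens):
    """Predict the next token. This toy only looks at the last token."""
    return NEXT_TOKEN.get(tokens[-1], END)

def generate(prompt_tokens, max_new_tokens=20):
    """Keep predicting the next token until END or the length limit."""
    tokens = list(prompt_tokens)           # 1. look at the input tokens
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # 2-3. use context, predict next token
        if next_token == END:
            break                          # 4. stop once the answer is complete
        tokens.append(next_token)
    return tokens[len(prompt_tokens):]     # return only the newly generated part

reply = generate(["Hi"])
print(" ".join(reply))  # Hey, how are you today?
```

Each new token is added to the list before the next prediction, so every step sees what has already been written. The `max_new_tokens` limit is a safety stop so the loop always ends.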

That’s why we say a Transformer is like a smart teacher—it doesn’t just repeat, it listens, thinks, and responds.

The Transformer model by Google

This diagram is the original Transformer architecture from the famous paper “Attention Is All You Need” (Vaswani et al., 2017).

You can read the original research paper, “Attention Is All You Need”, published at NeurIPS 2017.

Attention Is All You Need (Google, 2017)

It explains Transformers in full detail—the foundation of GPT.
