
The small set of math that makes AI articles readable


Reading AI articles, equations show up out of nowhere and that’s where a lot of people close the tab.
Honestly though, you don’t need to be able to solve them all.

The goal of this article isn’t to derive AI rigorously with math.
It’s to get to “oh, this equation is just shorthand for that kind of processing.”

No calculus will be solved here.
For the training section, if the phrase "it's nudging things slightly in the direction that reduces the mistake" makes sense to you, that's enough.

Math isn’t a magic spell — it’s shorthand for processing

If you describe what AI is doing really loosely, the flow is this.

flowchart LR
    A[Input<br/>words images audio] --> B[Turn into numbers]
    B --> C[Weight and sum]
    C --> D[Bend the shape]
    D --> E[Make probabilities]
    E --> F[Pick output]

"Turn into numbers", "weight and sum", "bend the shape", "turn into probabilities."
Embarrassingly simple, but fundamentally this is the combination.

The LLM, encoder, image generation, and 3D model articles on this blog all boil down to roughly this flow if you look at the underlying computation loosely.

First, handle things as “a row of numbers”

AI isn’t touching characters or images directly.
It first turns them into rows of numbers.

For example,

  • For words: “what meaning is this word close to”
  • For images: “is there an edge or color change at this part”
  • For audio: “how strong is each frequency band”

— that kind of information is held as a long row of numbers.

These rows of numbers are usually called a vector in articles and papers.
It looks intimidating, but at first it’s enough to think “just numbers lined up horizontally.”
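To make "just numbers lined up horizontally" concrete, here is a toy example. The values are invented purely for illustration; they don't come from any real model.

```python
# A toy "row of numbers" (vector) for one word.
# Each position loosely encodes some feature of the word's meaning.
# These values are made up for illustration, not from a real model.
word_vector = [0.8, -0.1, 0.3, 0.0, 0.5]

print(len(word_vector))  # how many numbers describe this word
```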

This stage is the same in experiments like I tried whether a local Vision LLM can pull RPG parameters out of a character image, where we fed an image and asked for JSON, and in stories like Running TRELLIS.2 on M1 Max 64GB — a hands-on verification log, where images are turned into 3D locally.

AI first does “weighted addition with priorities”

The first equation to look at can be this.

y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b

Read the symbols like this.

| Symbol | Rough meaning |
| --- | --- |
| x | Each input element |
| w | How heavily that element is weighted |
| b | A small shift to adjust the overall position |
| y | The final output value |

What’s happening is simple — each input is multiplied by an importance and then summed.

For a sentence, for example,

  • Some words pull strongly
  • Some words barely pull
  • Combinations change how they pull

— that’s the kind of effect.

Same for images: information about edges, colors, and positions is weighted and mixed in.
AI doesn't know "it's a cat" from the start; it's more accurate to picture it scoring features and then totaling them.
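The weighted sum above can be traced directly in a few lines. The inputs and weights here are made-up numbers, just to see the mechanism:

```python
# y = w1*x1 + w2*x2 + ... + wn*xn + b, with invented numbers
xs = [0.8, 0.3, 0.5]   # input features
ws = [1.2, -0.5, 0.7]  # how heavily each feature counts
b = 0.1                # overall shift

# multiply each input by its weight, then total everything
y = sum(w * x for w, x in zip(ws, xs)) + b
print(y)
```

Change one weight and the same input produces a different output, which is the whole point of "same input, different weights, different result."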

Addition alone can’t draw boundaries

With addition alone, AI can only draw straight-line boundaries.
So a “shape-bending step” is inserted in between.

The easiest example for beginners is sigmoid.

[Figure: a sketch of the sigmoid curve, an S-shaped curve that sits near 0 for small inputs and approaches 1 for large inputs]
Small inputs sit near 0, large inputs sit near 1. The middle is where it changes sharply.

Written out, it’s this.

\sigma(x) = \frac{1}{1 + e^{-x}}

But the shape of the graph matters more than the formula itself.
What this basically does is,

  • push very small values close to 0
  • push very large values close to 1
  • respond sharply only around the middle

— that’s the curve.

This lets you build boundaries like “is it dog-ish” / “cat-ish” / “is this feature strong or weak” smoothly instead of with a sudden jump.

That said, real recent models often use different functions.
Sigmoid is used here just to convey the feeling of "addition alone isn't enough, so the shape is bent in between."
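A minimal sigmoid in code, to see the squashing behavior on a few values:

```python
import math

def sigmoid(x):
    # pushes very negative inputs toward 0, very positive toward 1,
    # and changes sharply only around the middle
    return 1 / (1 + math.exp(-x))

for x in [-5, 0, 5]:
    print(x, round(sigmoid(x), 3))
```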

"The ChatGPT feel" comes from "lining up probabilities"

This is the most important part of LLM output.

P(y_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}

This is called Softmax — it takes raw scores for each candidate and converts them into probability-like ratios.

Read it like this.

  • z_i is the raw score of each candidate
  • P(y_i) is how likely that candidate is to be picked

Say there are three candidates for the next word.

| Candidate | Raw score |
| --- | --- |
| "is" | high |
| "was" | moderately high |
| "refrigerator" | low |

Softmax is what properly turns “high / moderately high / low” into actual ratios.

That’s why ChatGPT isn’t pulling answers off a shelf.
It’s computing “given this context, which is the most natural next token” every single time.
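The "is / was / refrigerator" example can be run with placeholder scores. The numbers are invented just to make "high / moderately high / low" concrete:

```python
import math

# Invented raw scores: "is" high, "was" moderately high, "refrigerator" low
scores = {"is": 3.0, "was": 2.0, "refrigerator": -1.0}

# Softmax: exponentiate each score, then divide by the total
exps = {word: math.exp(z) for word, z in scores.items()}
total = sum(exps.values())
probs = {word: e / total for word, e in exps.items()}

for word, p in probs.items():
    print(word, round(p, 3))
```

However the raw scores shift, the outputs always sum to 1, which is what makes them usable as "how likely is this candidate."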

The same idea shows up in experiments like detecting and correcting OCR typos with an encoder model + local LLM, where BERT is used for OCR.
Each candidate character’s score is converted into a probability, and “how suspicious is this character” is read off from that.

Articles like I looked up the source of “ChatGPT lies 27% of the time” that I wrote earlier are connected at the root to this same “line up plausibilities and output the next one” mechanism.

What “training data isn’t baked in directly” actually means

From the discussion so far, what AI holds isn’t the text itself but rather,

  • what kind of weights are placed on what features
  • which candidate gets a high score under which circumstances

— that kind of numerical form.

Basically, it's not that the model stores training text like a warehouse and pulls it out when needed.
The mental picture is closer to connections and tendencies soaked into the weights.

That said, this isn’t fully black-and-white.
Specific fragments can get memorized strongly, and long proper names or templated phrases can come out verbatim.
Even so, for the basic structure, seeing it as “the pattern is reflected in the weights” over “whole-text storage” makes AI behavior easier to follow.

Training is “when it misses, nudge a little” on repeat

Training, very roughly, can be read from these two lines.

L = -\log p

w_{\text{new}} = w_{\text{old}} - \eta g

L in the first is "by how much it missed."
p here can be thought of as the probability assigned to the correct answer.
If the correct answer was only given a low probability, the loss is large.
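To see why a low probability on the correct answer means a large loss, compare two cases:

```python
import math

# Loss L = -log p, where p is the probability given to the correct answer
confident = -math.log(0.9)  # the model was nearly sure of the right answer
unsure = -math.log(0.1)     # the model barely considered the right answer

print(round(confident, 3))  # small loss
print(round(unsure, 3))     # much larger loss
```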

The second is the weight update.
Read it as “new weight = current weight - adjustment amount.”

| Symbol | Rough meaning |
| --- | --- |
| w_old | Current weight |
| w_new | Updated weight |
| η | How far to move per step |
| g | Direction that reduces the mistake |

In other words,

  • first, measure how much it missed
  • next, move a little in the direction that reduces the miss
  • repeat that a huge number of times

— that’s it.

Differentiation really does show up here.
But as an entry point, “looking at the gap from the correct answer and slowly correcting” is enough, I think.
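A one-weight toy version of "measure the miss, nudge the weight." Everything here is invented for illustration; real models do the same thing across billions of weights:

```python
# Toy training: fit y = w * x to one example (x=2.0, target=6.0).
# The weight that would be exactly right is 3.0; we start wrong
# and nudge a little at a time.
x, target = 2.0, 6.0
w = 0.0       # current weight, deliberately wrong at the start
eta = 0.05    # learning rate: how far to move per step

for _ in range(200):
    y = w * x                 # prediction with the current weight
    g = 2 * (y - target) * x  # direction that reduces the squared miss
    w = w - eta * g           # nudge slightly against the miss

print(round(w, 3))  # close to 3.0 after many small nudges
```

Each pass is exactly the two lines above: measure the miss, then move the weight a little in the direction that shrinks it.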

Once you can read this level of math, it’s much easier to follow what AI articles are saying.

| Type | Rough role |
| --- | --- |
| Encoder | Pulls features from input and turns them into manageable numbers |
| LLM | Chains probabilities of the next word or token |
| VLM | Handles images and text together |
| Image generation | Gradually nudges noise toward "something image-like" |
| 3D model | Takes features of images or video as numbers, then turns them back into shape |

For example, working articles like I investigated whether Z-Image (Zaoxiang) runs on RunPod — hoping for stable character shapes still treat text and image features as numbers underneath.
Similarly, multimodal experiments like I tried whether a local Vision LLM can pull RPG parameters out of a character image are an extension of the same idea: “handle images and text in the same number space.”


Glossary for the curious

This section is fine to skip.
Just the minimum meaning of the words used above.

AI in general

| Term | Meaning |
| --- | --- |
| vector | Numbers lined up horizontally. In AI, word and image features are often held as such a row |
| weight | A coefficient that decides which information is looked at strongly. Same input, different weights, different result |
| activation function | A step that adds bends you can't make with addition alone. Sigmoid is one example |
| Softmax | A step that turns a row of scores into a ratio or probability-like form. Used not just for "what to output next" in LLMs, but also for "how much to look at where" in other situations |
| loss function | A number for how much it missed. The larger it is, the more "correction is still needed" |
| learning rate | How far to move per single correction. Too large and it gets unstable, too small and it's slow |
| encoder | The side of the model that converts input into manageable features. Think of it as a role that maps meaning and shape into numbers |

Words you see in image generation

| Term | Meaning |
| --- | --- |
| Text encoder | The part that turns a prompt into a row of numbers the image model can read. In ComfyUI you see it as CLIP Text Encode type nodes |
| VAE | The part that compresses an image into a manageable form and, at the end, reconstructs the image. VAE Encode and VAE Decode are these |
| latent | Not the image itself, but an intermediate representation used inside the VAE. If you see latent in ComfyUI, think "numbers before they become an image" |
| CFG | How strongly the prompt is applied. Too high and it becomes unnatural, too low and the instruction doesn't land |
| step | The number of iterations turning noise back into an image. More means more careful, but slower |
| sampler | The procedure for how noise is reduced. If you're touching the KSampler in ComfyUI, that's enough |

Can you actually make this yourself?

Making ChatGPT itself or a full image generator in Excel is out of the question.
But a mini version of the basic computation shown in this article is doable in a spreadsheet.

For example, weighting and summing two inputs looks like this.

  • x_1 = 0.8
  • x_2 = 0.3
  • w_1 = 1.2
  • w_2 = -0.5
  • b = 0.1

y = w_1 x_1 + w_2 x_2 + b

This one you can just compute in Excel or Google Sheets.

=1.2*0.8 + (-0.5)*0.3 + 0.1

If you want the output to be in the 0–1 range, you can pipe it through sigmoid too.

=1/(1+EXP(-A1))

Here A1 is the cell holding the y you just computed.

Even this much is enough to get a feel for,

  • positive weights push the output up
  • negative weights push the opposite way
  • instead of outputting the raw value, sometimes the shape is bent at the end

A half-baked “will it rain tomorrow” AI

To make it a bit more AI-ish, you can think of a tiny version that outputs “is it rainy tomorrow.”

For explanation, the training data is shown in a table below.
But when making a prediction, you don’t look at this table directly.
What gets adjusted during training is “how heavily to weight each factor.”

Say the training data is like this.

| Rain yesterday | High humidity | Pressure dropping | Rain tomorrow |
| --- | --- | --- | --- |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 |

After training, say the weights come out like this.

| Element | Value |
| --- | --- |
| Weight of "rain yesterday" | 0.6 |
| Weight of "high humidity" | 1.0 |
| Weight of "pressure dropping" | 1.3 |
| Bias b | -1.4 |

Now feed in a new input not in the training data.

| Rain yesterday | High humidity | Pressure dropping |
| --- | --- | --- |
| 1 | 0 | 1 |

The computation is this.

y = 0.6 \times 1 + 1.0 \times 0 + 1.3 \times 1 - 1.4 = 0.5

Pipe it through sigmoid.

\sigma(0.5) \approx 0.62

So it reads as “rainy-ish, about 62%.”

In Excel or Sheets, you can write this, for example.

=0.6*1 + 1.0*0 + 1.3*1 - 1.4
=1/(1+EXP(-0.5))
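The same rain example runs end to end in a few lines of Python, with the weights copied from the table above:

```python
import math

# Weights after training, copied from the table above
w = [0.6, 1.0, 1.3]  # rain yesterday, high humidity, pressure dropping
b = -1.4

def predict(features):
    # weighted sum, then sigmoid to land in the 0-1 range
    y = sum(wi * xi for wi, xi in zip(w, features)) + b
    return 1 / (1 + math.exp(-y))

# New input not in the training table:
# rain yesterday = 1, high humidity = 0, pressure dropping = 1
p = predict([1, 0, 1])
print(round(p, 2))  # about 0.62, "rainy-ish"
```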

Obviously, real weather prediction isn’t this simple.
But,

  • there’s a training table
  • weights remain after training
  • at prediction time, the weights are used on new input

— to see that flow, this is enough, I think.

The important point is that it isn’t looking up rows in the training data.
Tendencies like “high humidity leans rainy” and “falling pressure leans further rainy” are reflected in the weights — that’s closer to the picture.

In short, even though the giant models are out of reach, “the basic computation behind AI” is traceable by hand.
The equations suddenly stop looking like a string of symbols and start looking like ordinary arithmetic.

A half-baked “horse-race prediction AI” or something similar has basically the same structure.
If you’re curious, play around with building one — you’ll quickly get a feel for turning inputs into numbers and giving them weights.
That said, since real horse racing has a lot of factors to look at, if you’re going to try something yourself, boat racing feels like it’s easier to keep tidy.

Within this blog, the ones where the math underneath is easiest to see are around these.

If you want a side-read