Level 3 · Builder

From user to builder.

The APIs, infrastructure, and economics behind the chat box. This is where you stop renting AI and start wielding it.

9 min read · Intermediate+ · Some code exposure helps
01 — Three doors

API vs Web Interface vs CLI

There are three ways to interact with AI. Most people only know one. Builders use all three.

🌐 Web Interface

chatgpt.com, claude.ai, gemini.google.com. Lowest friction. Best for quick questions, exploration, learning. No automation, no integration, little control over settings. You're a user on someone else's platform.

🔌 API

Send requests directly to the model via code. Full control: model, temperature, system prompt, tools, output format. Can be automated, integrated, scaled. Pay per token. You're a builder using raw materials.

⌨️ CLI

Terminal-based tools that wrap an API. Best for developers who live in the terminal. Often more powerful than web UI (scripting, piping). Examples: Claude Code, Hermes Agent, custom scripts.

When to use what:

Situation | Use
Quick one-off question | Web UI
Learning, exploring | Web UI
Building an app | API
Automating workflows | API or CLI
Daily driving AI for work | CLI with tools
Complex multi-step projects | CLI with agent capabilities
The progression most builders follow: start with web UI → discover API → build tools → adopt CLI agents. Each step gives you more power and more responsibility.
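
To make "full control" concrete, here is a minimal sketch of the request body an API call carries. It assumes the OpenAI-style chat completions shape that the curl example later in this level uses; the function name build_request, the default system prompt, and the default temperature are illustrative assumptions, not anything a provider prescribes.

```python
import json


def build_request(user_input, model="gpt-4o",
                  system_prompt="You are a concise assistant.",
                  temperature=0.2):
    """Assemble a chat request body. Every field here is something the web UI hides."""
    return {
        "model": model,                # which model answers
        "temperature": temperature,    # lower = more predictable output
        "messages": [
            {"role": "system", "content": system_prompt},  # standing instructions
            {"role": "user", "content": user_input},       # the actual question
        ],
    }


payload = build_request("Summarize this ticket in one sentence.")
print(json.dumps(payload, indent=2))
```

Nothing is sent here. The point is that model, temperature, and system prompt are plain fields you set yourself.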
02 — Under the hood

Inference — what happens when you hit "send"

Inference is the technical term for running the model: taking your input and generating an output. Understanding it explains why AI is sometimes fast, sometimes slow, sometimes smart, sometimes dumb.

When you send a prompt, a server somewhere:

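1. Loads the model

Billions of parameters sit in GPU memory. Providers usually keep the model loaded between requests rather than reloading it for each one.

2. Processes input

Your input tokens pass through the model's layers.

3. Generates output

Output tokens are produced one at a time, each one conditioned on everything before it.

4. Returns the result

The response comes back to you, often streamed as it is generated.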
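
The "one token at a time" step can be sketched with a toy loop. This is not how a real model chooses tokens: a real model computes a probability distribution over its whole vocabulary from the full context, while the lookup table NEXT_TOKEN below is a hypothetical stand-in so the loop runs anywhere.

```python
# Toy next-token table standing in for a real model (hypothetical data).
NEXT_TOKEN = {
    "<start>": "the",
    "the": "model",
    "model": "generates",
    "generates": "tokens",
    "tokens": "<end>",
}


def generate(prompt_tokens, max_new_tokens=10):
    """Produce output one token at a time, feeding each new token back in."""
    tokens = list(prompt_tokens)
    output = []
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":   # the model signals it is done
            break
        tokens.append(next_token)   # the new token becomes part of the context
        output.append(next_token)
    return output


print(generate(["<start>"]))  # ['the', 'model', 'generates', 'tokens']
```

Because every output token needs its own pass through the loop, long answers take longer than short ones.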

Why inference speed varies:

  • Model size: Larger models (400B+ params) are slower than smaller ones (7B). More parameters = more computation per token.
  • Hardware: NVIDIA H100 GPUs are faster than A100s, which are faster than consumer GPUs.
  • Quantization: Compressed models run faster but with lower quality (more below).
  • Load: When millions use ChatGPT at once, everyone gets slower responses.
The dirty secret of cloud AI: you're not guaranteed the same model quality on every request. Providers can serve quantized versions during peak hours, route to a less powerful fallback, or reduce active parameters, and they rarely tell you when they do. Same API, same price, possibly different quality.
03 — The map

Provider landscape

Closed-source providers (API access only):

Provider | Models | Strengths | Weaknesses
OpenAI | GPT-4o, o1, o3 | Largest ecosystem, strong general | Expensive at frontier, closed
Anthropic | Claude (Opus, Sonnet, Haiku) | Safety, long context, careful reasoning | More cautious, smaller ecosystem
Google | Gemini | Multimodal, huge context, Google integration | Inconsistent, sometimes generic

Open-source providers (you can self-host):

Provider | Notable Models | Notes
Meta | Llama 4 | Best open-source foundation models
Mistral | Mistral, Mixtral | Strong European alternative
DeepSeek | DeepSeek V3 | Chinese, competitive quality, very cheap
Qwen | Qwen 3 | Alibaba, strong multilingual

Inference providers (host models for you):

Provider | What they do | Why use them
OpenRouter | Route to many models | One API, many models, price comparison
Together AI | Fast open-source inference | Cheap, fast, good selection
Fireworks AI | Fast inference | Speed-optimized
Groq | Ultra-fast inference (custom chip) | Fastest available, limited models

The economics: frontier model pricing (per 1M tokens, June 2026).

  • Claude Opus: ~$15 input / $75 output
  • GPT-4o: ~$2.50 input / $10 output
  • Claude Sonnet: ~$3 input / $15 output
  • DeepSeek V3: ~$0.27 input / $1.10 output
  • Llama 4 (via Together): ~$0.90 input / $0.90 output
Output tokens usually cost more than input tokens, often four to five times more, though some hosted open models charge the same for both. Asking for concise output literally saves money.
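
To see what those prices mean per request, here is a small cost sketch. The prices are copied from the list above and are approximate; the dictionary keys are made-up labels for this example, not real API model IDs, and the 2,000-in / 500-out request size is an assumption.

```python
# USD per 1M tokens as (input, output), copied from the list above (approximate).
# Keys are illustrative labels, not real API model identifiers.
PRICES = {
    "claude-opus": (15.00, 75.00),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "deepseek-v3": (0.27, 1.10),
    "llama-4-together": (0.90, 0.90),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 2,000 input tokens, 500 output tokens.
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 2_000, 500):.4f}")
```

Multiply the per-request figure by your daily request count before you pick a model. Fractions of a cent add up quickly at volume.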
⚡ Try this now
Open openrouter.ai and look at the leaderboard. What's the cheapest model right now? The most expensive? The most popular? Five minutes here and you'll understand the landscape better than most people who use AI daily.
04 — Compression

Quantization — compressed models

Quantization reduces the precision of a model's parameters to make it smaller, faster, and cheaper to run, at the cost of some quality.

The analogy is a photograph. Original: 4000×3000 pixels, 12MB, full quality. Compressed: 1000×750 pixels, 2MB, smaller and faster to load but you lose fine detail. Quantization does the same thing to model weights.
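
The same idea in numbers: a minimal sketch of symmetric 8-bit quantization on a handful of made-up weights. Production schemes are more elaborate (per-channel or per-group scales, calibration data), so treat this as an illustration of the precision loss, not as any library's method.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    scale = (max(abs(w) for w in weights) / 127) or 1.0  # avoid dividing by zero
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.8213, -0.4127, 0.0561, -1.2700, 0.3339]  # made-up example values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Each weight now fits in one byte instead of two, and each restored value is off by at most half a quantization step. That small per-weight error is the "lost fine detail".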

Common quantization levels:

Level | Precision | Size Reduction | Quality Impact
FP16 (original) | 16-bit | 1x (baseline) | Full quality
FP8 | 8-bit | ~2x smaller | Minimal loss
INT8 | 8-bit | ~2x smaller | Small loss
INT4 | 4-bit | ~4x smaller | Noticeable loss
INT2 | 2-bit | ~8x smaller | Significant loss
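
The size column follows from simple arithmetic: parameters times bits per parameter, divided by 8 to get bytes. The sketch below applies it to two example model sizes (7B and 70B, chosen for illustration). It counts weights only and ignores activations, the KV cache, and the small metadata overhead quantized formats add.

```python
BITS = {"FP16": 16, "FP8": 8, "INT8": 8, "INT4": 4, "INT2": 2}


def weight_size_gb(num_params, bits):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for level, bits in BITS.items():
    small = weight_size_gb(7e9, bits)
    large = weight_size_gb(70e9, bits)
    print(f"{level}: 7B -> {small:.1f} GB, 70B -> {large:.1f} GB")
```

This is why a 4-bit 7B model can fit on a consumer GPU while the 16-bit version of a 70B model cannot.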
✓ When quantization is fine
  • Simple tasks (summarization, formatting, classification)
  • Bulk processing where speed > peak quality
  • Running on limited hardware (consumer GPUs, laptops)
✗ When to avoid it
  • Complex reasoning tasks
  • Nuanced creative work
  • Tasks where accuracy is critical
05 — The fork

Open-source vs closed-source

One of the most important decisions in AI: do you use a proprietary model via API, or an open-source model you can host yourself?

Closed-source

Pros: Best quality, zero maintenance, always up to date.

Cons: Data goes to third party, can't customize, vendor lock-in, costs can scale unpredictably.

Open-source

Pros: Full control, data stays private, can fine-tune, no vendor lock-in, cheaper at scale.

Cons: Need infrastructure, quality gap with frontier models, maintenance burden.

The trend: the gap is closing fast. DeepSeek V3 and Qwen 3 are competitive with GPT-4o on many tasks. In 2026, the choice isn't "open-source is worse." It's "open-source requires more work but gives you more control."
06 — Build it

Build your first AI tool

You don't need to be a senior developer to build something useful with AI. You need to understand the loop.

Input → Format prompt → Send to API → Get response → Use output

Every AI tool, from a simple chatbot to a complex agent, follows this loop. The complexity comes from what you add around it.
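
Here is that loop as a minimal sketch. To keep it runnable offline, fake_send stands in for the real API call and returns a response in the OpenAI-style shape (choices, then message, then content). All function names and the system prompt are assumptions made for this example.

```python
def format_prompt(user_input):
    """Format prompt: wrap raw input in the message structure the API expects."""
    return [
        {"role": "system", "content": "You answer in one short sentence."},
        {"role": "user", "content": user_input.strip()},
    ]


def fake_send(messages):
    """Stand-in for a real API call so the sketch runs without a key or network."""
    return {"choices": [{"message": {"content": f"Echo: {messages[-1]['content']}"}}]}


def run_once(user_input, send=fake_send):
    messages = format_prompt(user_input)                  # Input -> Format prompt
    response = send(messages)                             # Send to API
    text = response["choices"][0]["message"]["content"]   # Get response
    return text                                           # Use output


print(run_once("  hello  "))  # Echo: hello
```

Swapping fake_send for a function that really calls a provider is the only change needed to go live. Everything else in your tool is code around this loop.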

1. Simple wrapper

Takes input, sends it to the API with a system prompt, displays the response. Example: a customer support chatbot.

2. With context

Add: search your database for relevant info, then combine input + context in the prompt. Example: an AI that answers questions about your docs.

3. With tools

Add: the model requests tool calls (search, calculate), your system executes them and feeds the results back. Example: an AI assistant that can actually DO things.

4. With memory

Add: save important facts to persistent storage and load the relevant memory each session. Example: a personal AI that knows your preferences and history.
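
The "with tools" level is the one people find hardest to picture, so here is a rough sketch of its control flow. The message and reply shapes are simplified inventions for this example, not any provider's actual tool-calling format, and fake_model is a scripted stand-in for a real model.

```python
import json

# Hypothetical tools the system is willing to run on the model's behalf.
TOOLS = {
    "calculate": lambda args: args["a"] * args["b"],
    "search": lambda args: f"No results for {args['query']!r} (stub)",
}


def fake_model(messages):
    """Scripted stand-in: asks for a tool once, then answers using its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "answer", "content": f"The result is {last['content']}."}
    return {"type": "tool_call", "name": "calculate", "args": {"a": 6, "b": 7}}


def run_with_tools(user_input, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):                  # cap the loop so it cannot run forever
        reply = model(messages)
        if reply["type"] == "answer":           # the model is done
            return reply["content"]
        result = TOOLS[reply["name"]](reply["args"])   # the system executes the tool
        messages.append({"role": "tool", "content": json.dumps(result)})  # feed it back
    return "Stopped: too many tool calls."


print(run_with_tools("What is 6 times 7?"))  # The result is 42.
```

The model never runs anything itself. It only asks, and your code decides which requests to honor. That is where the extra responsibility of this level comes from.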

You don't need to build the next ChatGPT. You need to build the tool that makes YOUR specific workflow 10x better.
⚡ Try this now
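If you have an OpenAI API key, export it as OPENAI_API_KEY and run this in your terminal. (OpenRouter accepts the same request shape at https://openrouter.ai/api/v1/chat/completions with an OpenRouter key and one of its model names. Anthropic's API uses a different endpoint and request format.)
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"Hello"}]}'
You just called an AI model from the command line. That curl command is every API tool you'll ever build, stripped to its core.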
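
For comparison, here is a sketch of the same request built in Python with only the standard library. It constructs the request without sending it, so it runs without a key. The environment variable name OPENAI_API_KEY and the helper name build_chat_request are assumptions for this example.

```python
import json
import os
import urllib.request


def build_chat_request(prompt, api_key, model="gpt-3.5-turbo"):
    """Build the same POST request as the curl command above."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("Hello", os.environ.get("OPENAI_API_KEY", "missing-key"))
print(request.get_method(), request.full_url)

# Uncomment to actually send (needs a valid key and network access):
# with urllib.request.urlopen(request, timeout=30) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Once the request lives in code, you can loop over inputs, change the model per task, and store the results.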
07 — Watch out

What can go wrong at this level

Once you start building, the mistakes shift from "wrong answer" to "broken system." Here are the four that bite builders most often.

1. Picking a provider on hype

You hear "Claude is better than GPT" and switch immediately. It's not that simple. Every provider trades something: speed against quality, price against context window, privacy against features.

How to avoid: Test it yourself. Send the same task to three providers and compare. What's best for someone else may be wrong for your workload.
2. Not tracking costs

You route everything through GPT-4o, including summarization tasks a tiny model could handle. The bill runs 10x higher than it needs to. Worse, you don't even know how many tokens each request burns.

How to avoid: Monitor usage. Use cheap models for simple work (translate, summarize, format) and reserve expensive models for hard reasoning (code review, analysis).
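
One way to start: route by task type and log token counts per request. The sketch below uses made-up model labels and hand-entered token counts. In practice the counts would come from the usage data your provider returns with each response.

```python
from collections import defaultdict

SIMPLE_TASKS = {"translate", "summarize", "format"}
CHEAP_MODEL = "small-model"          # hypothetical label
EXPENSIVE_MODEL = "frontier-model"   # hypothetical label

usage_log = []


def pick_model(task_type):
    """Send simple work to the cheap model, everything else to the expensive one."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else EXPENSIVE_MODEL


def record_usage(model, input_tokens, output_tokens):
    """Append one request's token counts to the log."""
    usage_log.append((model, input_tokens, output_tokens))


def tokens_by_model():
    """Sum logged input and output tokens per model."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for model, input_tokens, output_tokens in usage_log:
        totals[model]["input"] += input_tokens
        totals[model]["output"] += output_tokens
    return dict(totals)


# Hand-entered example requests.
record_usage(pick_model("summarize"), 1_200, 150)
record_usage(pick_model("code review"), 4_000, 900)
record_usage(pick_model("translate"), 300, 320)
print(tokens_by_model())
```

Even a log this crude shows where the tokens go, which is usually enough to spot the expensive habit.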
3. Not handling errors

You build an app that calls an AI API. You don't handle rate limits, timeouts, malformed responses, or network failures. The first hiccup and your app crashes for the user.

How to avoid: Always handle the failure paths: try/catch, retry with exponential backoff, timeouts, and a fallback response. AI APIs are unreliable by nature. Plan for it.
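
A minimal sketch of retry with exponential backoff and a fallback. TransientError is a made-up exception standing in for whatever your HTTP client raises on rate limits, timeouts, or network failures, and flaky_call simulates an API that fails twice before succeeding.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for rate limits, timeouts, and network failures."""


def call_with_retry(call, max_attempts=4, base_delay=0.5,
                    fallback="Sorry, please try again later.", sleep=time.sleep):
    """Try the call, doubling the wait after each failure, then fall back."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                break  # out of attempts, use the fallback
            # Wait 0.5s, 1s, 2s, ... plus a little jitter to avoid synchronized retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return fallback


attempts = {"count": 0}


def flaky_call():
    """Simulated API: fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"


# The sleep is stubbed out here so the demo finishes instantly.
print(call_with_retry(flaky_call, sleep=lambda seconds: None))  # ok
```

Only retry errors that can plausibly succeed on a second try. A malformed request or an invalid key will fail the same way every time.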
4. Hardcoding API keys

You paste an API key straight into code and push it to GitHub. Within seconds, automated bots scan it. Within minutes, someone is burning your key to generate content. Bills can climb into the thousands before you notice.

How to avoid: Always load keys from environment variables. Add .env to your .gitignore. If a key ever leaks, revoke it immediately from the provider dashboard.
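
A small sketch of the environment-variable habit. The variable name OPENAI_API_KEY and both helper names are assumptions for this example, and the key in the demo is a placeholder, not a real credential.

```python
import os


def load_api_key(name="OPENAI_API_KEY", environ=os.environ):
    """Read the key from the environment and fail loudly if it is missing."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


def mask(key):
    """Show just enough of a key to identify it in logs."""
    return key[:4] + "..." if len(key) > 8 else "***"


# Demo with a fake environment so the sketch runs without a real key.
demo_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
print(mask(load_api_key(environ=demo_env)))  # sk-e...
```

Failing at startup with a clear message beats discovering a missing key on the first user request. Masking keeps the full key out of your logs.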
08 — What you now know

What you should know after Level 3

You now understand the builder's perspective.

You have the knowledge to start building. The tools are accessible. The APIs are well-documented. The models are capable. The only thing left is to actually build something.