<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[FoundationFrontier_]]></title><description><![CDATA[SWE exploring AI × DeFi]]></description><link>https://foundationfrontier.com</link><image><url>https://cdn.hashnode.com/uploads/logos/5fbf127d08c16c34b89c4a2a/ff308a66-180e-460c-973a-9e2e066122c2.jpg</url><title>FoundationFrontier_</title><link>https://foundationfrontier.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 10:46:54 GMT</lastBuildDate><atom:link href="https://foundationfrontier.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[From Language Models to AI Engineering]]></title><description><![CDATA[This is the second post while reading AI Engineering by Chip Huyen. Chapter 1 already touches on very interesting topics: where language models come from, what foundation models actually are, and how ]]></description><link>https://foundationfrontier.com/from-language-models-to-ai-engineering</link><guid isPermaLink="true">https://foundationfrontier.com/from-language-models-to-ai-engineering</guid><category><![CDATA[AI Engineering]]></category><dc:creator><![CDATA[Garo Sanchez]]></dc:creator><pubDate>Wed, 25 Mar 2026 15:56:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/e4670314-9122-47b6-8fab-3892b2f22087.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the second post while reading <em>AI Engineering</em> by Chip Huyen. Chapter 1 already touches on very interesting topics: where language models come from, what foundation models actually are, and how AI engineering emerged as a discipline.</p>
<h2>The Statistical Nature of Language</h2>
<p>The idea that language follows statistical patterns isn't new. We've known for centuries that certain letters appear more frequently than others; in English, for example, the letter E is the most common.</p>
<p>In 1951, Claude Shannon published a landmark paper on the statistical nature of language, and many of his concepts are still used today (e.g. <em>entropy</em>).</p>
<p>This matters because language models are, at their core, statistical machines. They encode patterns about how language works and use those patterns to predict what comes next.</p>
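<p>To make this concrete, here is a toy sketch of a "statistical machine": a bigram counter that predicts the next word purely from observed frequencies. The corpus is made up, and real models condition on far more than the single previous token:</p>

```python
from collections import Counter, defaultdict

# Toy corpus; a real model trains on trillions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" more often than "mat" or "fish"
```

Tiny as it is, this captures the core idea: encode patterns of how language was used, then use those patterns to predict what comes next.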
<h2>Tokens: The Basic Unit</h2>
<p>The fundamental unit of a language model is the <em>token</em>. Tokenization is the process of splitting text into these units. Tokens strike a balance between characters (too fine-grained to carry much meaning) and full words (too rigid to handle rare or novel terms). If you want to see this in action, OpenAI has a <a href="https://platform.openai.com/tokenizer">tokenizer tool</a> where you can input any text and see exactly how it gets split:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/00fb5631-552e-44f4-bd78-1901edb947a1.png" alt="" style="display:block;margin:0 auto" />

<p>I think it's really cool that you can also see how a language model represents text internally:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/91853dd0-d12f-4e4b-860d-cb10c0fb1814.png" alt="" style="display:block;margin:0 auto" />
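<p>The mechanics behind a subword tokenizer can be sketched in a few lines: greedily match the longest known vocabulary entry. Real tokenizers learn their vocabulary from data (e.g. via byte-pair encoding); the hand-picked vocabulary below is purely illustrative:</p>

```python
# A toy subword tokenizer: greedily match the longest vocabulary entry.
# Real tokenizers learn their vocabulary from data; this one is hand-picked.
VOCAB = {"token", "tokens", "ization", "iz", "ation"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first, then fall back to shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            i += 1  # skip characters not covered by the vocabulary
    return tokens

print(tokenize("tokenization"))  # ['token', 'ization']
```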

<h2>Two Kinds of Language Models</h2>
<p>The book distinguishes two types of language models:</p>
<p><strong>Masked language models</strong> are trained to fill in the blank — given a sentence with a missing word, predict what goes there. BERT is the classic example. These are useful for tasks like text classification and <strong>code debugging</strong>.</p>
<p><strong>Autoregressive language models</strong> predict the next token in a sequence. This is the architecture behind GPT and the models we associate with generative AI.</p>
<p>Claude actually created a diagram to explain it to me:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/b53fbaa2-f7ae-489c-846a-118fe0e5d4ce.png" alt="" style="display:block;margin:0 auto" />

<h2>The Breakthrough That Made LLMs Possible</h2>
<p>In traditional machine learning, there are two main approaches: <strong>supervised learning</strong> and <strong>unsupervised learning</strong>. For supervised learning, you need <em>labeled data</em>: inputs paired with known correct outputs. Creating those labels requires humans, and that can be very expensive.</p>
<p>The breakthrough for text was <strong>self-supervision</strong>: models learned to infer the labels directly from the input data. This meant models could train on essentially all the text available on the internet without anyone manually labeling anything. That's how we went from language models to <em>large</em> language models.</p>
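<p>Self-supervision is easy to picture: the "label" for each training example is simply the token that comes next in the raw text itself. A minimal sketch (toy sentence and window size chosen for illustration):</p>

```python
text = "models learn to predict the next token".split()

# Self-supervision: each example's label is just the token that follows
# it in the raw text; no human annotation is needed.
context_window = 3
examples = [
    (text[i : i + context_window], text[i + context_window])
    for i in range(len(text) - context_window)
]

for context, label in examples:
    print(context, "->", label)
# ['models', 'learn', 'to'] -> predict
# ['learn', 'to', 'predict'] -> the
# ...
```

Because the labels come for free, any raw text is training data, which is what made internet-scale training sets possible.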
<p>To get a sense of the scale: GPT-1 (2018) had 117 million parameters. GPT-2 (2019) jumped to 1.5 billion. GPT-3 (2020) reached 175 billion. GPT-4 is estimated at 1.7 trillion, though at this point the big labs have stopped disclosing these numbers.</p>
<h2>Foundation Models and Multi-Modality</h2>
<p>LLMs were trained on text, but the same principles extended to images, video, audio, and other formats. This is called multi-modality, and it gave us <strong>Large Multi-modal Models</strong> (LMMs). Both LLMs and LMMs are foundation models: general-purpose models trained on massive datasets by a handful of well-resourced labs, then adapted by everyone else.</p>
<p>Adaptation happens mostly through three techniques:</p>
<ul>
<li><p>Prompt engineering (giving the model specific instructions)</p>
</li>
<li><p>RAG (connecting the model to external data sources)</p>
</li>
<li><p>Fine-tuning (further training on domain-specific data)</p>
</li>
</ul>
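<p>To make the second technique concrete, here is a bare-bones sketch of the RAG idea: retrieve the document most relevant to the question, then paste it into the prompt. The documents, the scoring (plain word overlap instead of real embeddings), and the prompt template are all illustrative:</p>

```python
# Bare-bones RAG: score documents by word overlap with the question,
# then build a prompt containing the best match. Real systems use
# vector embeddings and proper similarity search instead.
documents = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday through Friday.",
    "Shipping is free for orders over $50.",
]

def retrieve(question: str, docs: list[str]) -> str:
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    context = retrieve(question, documents)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long do refunds take?"))
```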
<h2>From Foundation Models to AI Engineering</h2>
<p>AI engineering is the discipline of building applications on top of these foundation models. Training foundation models is so prohibitively expensive that only a few organizations can do it, which created the <em>"model as a service"</em> paradigm, and with it, a need for engineers who can take these models and adapt them to solve real-world problems.</p>
<p>What kind of applications can be built?</p>
<p>Chip references AWS's categorization of generative AI into three buckets:</p>
<ul>
<li><p>Customer experience</p>
</li>
<li><p>Employee productivity</p>
</li>
<li><p>Process optimization</p>
</li>
</ul>
<p>AI has already proven valuable in marketing, code generation (Chip notes that experts say AI is significantly better at generating frontend code than backend code, uh-oh 😅), information aggregation, and workflow automation: from booking restaurants and planning trips for end users, to lead management and invoicing for enterprises.</p>
<h2>The Moat Question</h2>
<p>One of the most interesting discussions in this chapter is about product defensibility. The low barrier to entry in AI is both a blessing and a curse. If something is easy for you to build, it's easy for your competitors too.</p>
<p>Chip mentions a crushing view from a VC general partner: many startups' entire products could become a feature inside Google Docs or Microsoft Office. If their product takes off, what stops a tech giant from assigning three engineers to replicate it in two weeks?</p>
<p>Chip further argues that in AI, competitive advantages come from three places: <strong>technology, data, and distribution</strong>. Most startups use the same foundation models, so technology is not a moat. Distribution advantages tend to belong to big companies. That leaves data as the real edge for startups and individuals: even if you can't use your data to train a model, the behavioral data you collect — what your users want, how they interact with your product — is invaluable.</p>
<h2>The AI Engineering Stack</h2>
<p>AI engineering differs from ML engineering in a fundamental way: it's <strong>not</strong> about developing models from scratch, but about adapting and evaluating them. The core responsibilities boil down to three things:</p>
<ul>
<li><p>Evaluation</p>
</li>
<li><p>Prompt engineering</p>
</li>
<li><p>Building the AI interface</p>
</li>
</ul>
<p>Evaluation is especially critical because foundation models are open-ended. Unlike traditional ML models with narrow, well-defined outputs, these models can produce almost anything, which makes measuring success much harder.</p>
<p>Prompt engineering is about extracting desirable behaviors from a model without modifying its weights. This includes providing context, connecting tools, and managing memory systems.</p>
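<p>Even a minimal evaluation harness shows why open-endedness makes this hard: exact-match scoring breaks down as soon as there is more than one valid way to say the answer. A toy sketch (the <code>fake_model</code> stub and the questions are hypothetical stand-ins for a real model call):</p>

```python
# A toy evaluation harness. `fake_model` stands in for a real model call.
# Exact-match scoring only works when outputs are narrow and well-defined,
# which is precisely what open-ended foundation models are not.
def fake_model(prompt: str) -> str:
    canned = {"Capital of France?": "Paris", "2 + 2?": "4"}
    return canned.get(prompt, "I'm not sure.")

eval_set = [("Capital of France?", "Paris"), ("2 + 2?", "four")]

def accuracy(model, dataset) -> float:
    correct = sum(model(q) == expected for q, expected in dataset)
    return correct / len(dataset)

print(accuracy(fake_model, eval_set))  # 0.5 — "4" vs "four" fails exact match
```

The second answer is arguably correct, yet exact match scores it wrong, which is why evaluating foundation models needs more sophisticated approaches.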
<h2>Why Full-Stack Engineers Have an Edge</h2>
<p>Finally, Chip describes how AI engineering has flipped the traditional ML workflow. Before, you started with data and models, and the product came last. Now, as an AI engineer you can start with the product, validate it with users, and only then invest in data and model optimization. This rewards fast iterators.</p>
<p>In terms of programming languages, Python still dominates, but JavaScript is growing (LangChain.js, Vercel AI SDK, etc.), making this space increasingly accessible to full-stack engineers. Chip puts it clearly: full-stack engineers have an advantage over traditional ML engineers in their ability to quickly turn ideas into demos, get feedback, and iterate.</p>
]]></content:encoded></item><item><title><![CDATA[AI Engineering book: First Notes]]></title><description><![CDATA[I started reading AI Engineering by Chip Huyen. These are my notes on the concepts I found interesting. This is the first entry in an ongoing series where I document what I learn, chapter by chapter.
]]></description><link>https://foundationfrontier.com/ai-engineering-book-first-notes</link><guid isPermaLink="true">https://foundationfrontier.com/ai-engineering-book-first-notes</guid><category><![CDATA[AI Engineering]]></category><dc:creator><![CDATA[Garo Sanchez]]></dc:creator><pubDate>Mon, 23 Mar 2026 19:15:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/797e97f2-37c5-4c76-b54b-a6b37f795340.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I started reading <em>AI Engineering</em> by Chip Huyen. These are my notes on the concepts I found interesting. This is the first entry in an ongoing series where I document what I learn, chapter by chapter.</p>
<p>Chip opens with what surprised her about ChatGPT: a relatively small improvement in model quality led to an explosion of possibilities — new applications, new use cases, a whole ecosystem seemingly overnight. She reminds us that the seed of these technologies has been around for a while; the papers powering them were published as early as the 1950s.</p>
<p>The core goal of this book is to train us in <strong>how foundation models work</strong> so we can adapt them to solve real-world problems.</p>
<p>Regardless of how fast the tooling evolves, the best practices for working with AI models stay the same:</p>
<ul>
<li><p>Systematic experimentation</p>
</li>
<li><p>Rigorous evaluation</p>
</li>
<li><p>Optimization toward cheaper and faster models</p>
</li>
</ul>
<p>The book provides a framework for adapting foundation models — both language models (LLMs) and multi-modal models (LMMs) — and helps answer a set of practical questions that I found very interesting to think about:</p>
<ul>
<li><p>Should I build this AI application?</p>
</li>
<li><p>How can I evaluate and measure my app?</p>
</li>
<li><p>Why do models hallucinate, and how can I prevent it?</p>
</li>
<li><p>How can I get the most out of prompt engineering?</p>
</li>
<li><p>What is RAG and how do I use it?</p>
</li>
<li><p>What is an agent? How do I build and evaluate one?</p>
</li>
<li><p>When should I fine-tune a model?</p>
</li>
<li><p>How do I make my model faster, cheaper, and safer?</p>
</li>
<li><p>How do I create a feedback loop for continuous improvement?</p>
</li>
</ul>
<p>Beyond these questions, the book also covers model types, evaluation benchmarks, use cases, and AI application design patterns.</p>
<p>Chip also wrote <em>Designing Machine Learning Systems</em> (DMLS) and considers both books complementary. Some topics are more relevant to ML Engineering and get deeper coverage in DMLS.</p>
<p><em>AI Engineering</em> is not a tutorial. It's about understanding the fundamentals of this role — the practical concepts needed to build AI applications that solve real-world problems.</p>
<p>Chip says you don't need deep Machine Learning knowledge to build AI applications, but it helps to be familiar with a few core concepts:</p>
<ul>
<li><p><strong>Probability:</strong> sampling, determinism, and probability distributions</p>
</li>
<li><p><strong>Machine Learning:</strong> supervision, self-supervision, log-likelihood, gradient descent, backpropagation, loss functions, and hyperparameter tuning</p>
</li>
<li><p><strong>Neural network architectures:</strong> feedforward, recurrent, and transformer</p>
</li>
<li><p><strong>Metrics:</strong> accuracy, F1, precision, recall, cosine similarity, and cross entropy</p>
</li>
</ul>
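<p>As a quick refresher on the metrics in that last bullet, here are hand-rolled versions in plain Python (toy data; in practice libraries like scikit-learn provide battle-tested implementations):</p>

```python
import math

# Toy binary classification results.
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))        # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)                       # of predicted 1s, how many were right
recall = tp / (tp + fn)                          # of actual 1s, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

def cosine_similarity(a, b):
    """Angle-based similarity between two vectors, used to compare embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(precision, recall, f1)                      # all 2/3 on this toy data
print(cosine_similarity([1, 0], [1, 1]))          # ~0.707 (45-degree angle)
```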
<p>All of these are explained throughout the book as they become relevant.</p>
<p>The book lists several reasons to read it. The ones that really resonate with me:</p>
<ul>
<li><p>Identify underserved areas in AI engineering and better understand real use cases</p>
</li>
<li><p>Deeply understand what the role involves and what a career in AI engineering looks like</p>
</li>
<li><p>Understand how this technology works out of pure curiosity</p>
</li>
</ul>
<p>Finally, here's an overview of what the book will cover chapter by chapter:</p>
<ul>
<li><p>Use cases and the current state of the industry</p>
</li>
<li><p>How foundation models work under the hood</p>
</li>
<li><p>Evaluation techniques — measuring the behavior of AI applications</p>
</li>
<li><p>Prompt engineering and security</p>
</li>
<li><p>RAG (Retrieval-Augmented Generation)</p>
</li>
<li><p>Fine-tuning</p>
</li>
<li><p>Data — how to generate the best possible data for your application</p>
</li>
<li><p>Inference optimization</p>
</li>
</ul>
<p>I'm really excited to start this new adventure. Onwards.</p>
]]></content:encoded></item><item><title><![CDATA[Hello, world]]></title><description><![CDATA[A couple months ago I suddenly found myself with a lot of free time on my hands. After taking care of some pending stuff and touching some grass, I decided to stop ignoring what we all know and unders]]></description><link>https://foundationfrontier.com/hello-world</link><guid isPermaLink="true">https://foundationfrontier.com/hello-world</guid><dc:creator><![CDATA[Garo Sanchez]]></dc:creator><pubDate>Sat, 21 Mar 2026 20:59:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5fbf127d08c16c34b89c4a2a/51ef037f-48f7-417b-9ca8-77b5af794301.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A couple months ago I suddenly found myself with a lot of free time on my hands. After taking care of some pending stuff and touching some grass, I decided to stop ignoring what we all know and understand: <em>AI is eating software</em>. It's revolutionizing industries, changing the shape of the world — everything. The interesting stuff, the funding, the innovations, the business cases, the practical day-to-day use cases — it's all happening here. So my curiosity naturally led me to decide to learn more about AI.</p>
<p>Eight years ago, when I started my tech career at Accenture, there was an AI lab that anyone could participate in. From that time I remember the difference between supervised and unsupervised learning, and deep learning terms like CNNs and GANs that I always thought sounded sexy but never truly understood.</p>
<p>Eight years later, the name of the game is different. It's not about learning linear algebra or calculus (although I definitely want to do that too), or about training models from scratch. It's about understanding how generative AI works — the agents, the LLMs — about being able to work with these models, optimize them, deploy them, monitor them; in short, being the person capable of steering them so they give us the results we want.</p>
<p>This is definitely an adventure too interesting to ignore. But where do you even start? Simple. Last week I spent around 40 hours watching and summarizing "AI Engineer Roadmap" videos, reading blogs, podcasts, interviews, and book summaries. Fortunately, the path is straightforward — there's one book that's basically the bible of AI Engineering. Unsurprisingly, that book is called <em>AI Engineering</em> by Chip Huyen. So I subscribed to O'Reilly and started reading it online. That will be my first step, and I will use this blog to update anyone interested about my learning journey.</p>
<p>We will lay our foundation and then, the frontier awaits.</p>
]]></content:encoded></item></channel></rss>