Module 4: Generative AI and Large Language Models#

Artificial Intelligence has entered a new era — one where machines not only analyze information but also create it. Generative AI (GenAI) represents the most visible and transformative branch of modern AI, capable of producing text, images, music, videos, and even software code. At the heart of this revolution are Large Language Models (LLMs) — systems that learn patterns of human language and use them to generate coherent, contextually appropriate responses.

Learning Objectives#

After completing this module, you will be able to:

  • Explain what Generative AI is and how it differs from traditional AI.

  • Describe the core architecture and functioning of Large Language Models (LLMs).

  • Identify key applications, benefits, and risks associated with generative models.

  • Understand how prompts, tokens, and context shape AI-generated content.

  • Reflect on the ethical, social, and creative implications of AI that can generate new knowledge and artifacts.

Part I — From Traditional AI to Generative AI#

1.1 Traditional AI: Learning to Solve Specific Tasks#

Traditional AI systems — such as those used in classification, recommendation, or forecasting — are mostly analytical in nature. They analyze data, detect patterns, make predictions, cluster, recommend, etc., based on what they have learned. They are also sometimes called narrow AI, because they are design to perform specific tasks.

Examples include:

  • Credit risk scoring

  • Medical diagnosis models

  • Recommendation systems (e.g., Netflix, Amazon)

These systems excel at solving a problem but not at creation.

1.2 Generative AI: Learning to Create#

Generative AI (GenAI) goes beyond solving a problem. It learns the underlying structure of data and uses it to produce new, original content that did not exist before.

Generative AI does not just answer questions — it creates possibilities.

Examples of Generative AI:

  • Text generation: ChatGPT, Claude, Gemini, LLaMA, Grok

  • Image generation: DALL·E, Midjourney, Stable Diffusion

  • Audio and music: Suno, MusicLM

  • Video: Runway, Pika, Sora, Google Veo

  • Code generation: GitHub Copilot, Amazon CodeWhisperer, Cursor

1.3 How Generative Models Work#

Generative AI models are trained to estimate the probability distribution of data. Once trained, they can sample from that distribution to generate plausible new instances — words, pixels, or notes for example.

Generative AI learns how data is structured, not just what it contains.

Core generative model families:

  • VAEs (Variational Autoencoders) — encode and decode data to generate similar but novel samples.

  • GANs (Generative Adversarial Networks) — use a generator and a discriminator in competition to produce realistic outputs.

  • Diffusion Models — iteratively remove noise from data to create high-quality images or signals.

  • Autoregressive Models — generate data by predicting the next element in a sequence based on the preceding elements.

  • Flow Based Models — learn a transformation (a flow) that maps a simple distribution (like a Gaussian) to the complex data distribution.

  • Transformers — predict the next token in a sequence, forming the foundation of LLMs.

Part II — Large Language Models (LLMs)#

2.1 What Is a Large Language Model?#

A Large Language Model (LLM) is a neural network trained on vasts amounts of text to predict the next word (token) in a sequence. Through this simple objective, it learns grammar, facts, style, reasoning patterns, and even creative expression.

LLMs do not “know” language, they model it.

Key characteristics:

  • Trained on vasts amounts of text data (e.g. books, articles, web content, code).

  • Contain billions to trillions of parameters (learned weights).

  • Use transformer architectures with self-attention mechanisms to capture context.

2.2 The Transformer Revolution#

Introduced in 2017 (Vaswani et al., “Attention Is All You Need”, Advances in neural information processing systems, 30), the transformer architecture replaced recurrence with attention.
It allowed models to:

  • Understand relationships across long text sequences.

  • Train efficiently in parallel on large datasets.

  • Scale to massive sizes, enabling GPT, BERT, Claude, Gemini, and others.

Key components:

  • Self-Attention/Multi-Head Attention: The central mechanism that weighs the relevance of every token to all others to compute contextual understanding.

  • Encoder and Decoder Stacks: The overall structure, where the Encoder processes the input sequence and the Decoder generates the output sequence.

  • Input Embeddings (with Positional Encoding): Converts tokens into vectors and adds crucial information about the sequence order, as the model processes words simultaneously.

  • Feed-Forward Network (FFN): A layered neural network applied after attention to further refine and transform the representation of each token.

  • Residual Connections & Layer Normalization: Structural elements used around every sub-layer to stabilize the training of very deep models.

Part III — Capabilities and Applications#

LLMs are general-purpose language engines — capable of performing multiple tasks without explicit reprogramming.

3.1 Core Capabilities#

  • Text generation and summarization

  • Question answering and tutoring

  • Translation and language understanding

  • Coding and data analysis

  • Creative writing and ideation

  • Dialogue and conversation (chatbots, assistants)

3.2 Cross-Modal Generative AI#

The boundaries between modalities are dissolving — many modern models are multimodal. They can process text, images, audio, and video in the same framework.

Examples:

  • GPT-4o, Gemini, Claude 3.5 — integrate text, image, and speech.

  • Runway Gen-2, Pika — generate video from text prompts.

  • ImageBind — learns representations across six sensory modalities.

Multimodal AI represents a step toward synthetic general perception, a system that sees, hears, and speaks like humans do.

Part IV — Limitations and Beyond#

Despite their power, LLMs are not intelligent in a human sense, they rely entirely on learned statistical associations.

4.1 Key Limitations#

  • Hallucination — producing plausible but false or unverifiable information.

  • Data bias — reflecting stereotypes or systemic biases in training data.

  • Opacity — lack of transparency in model reasoning.

  • Context sensitivity — difficulty in maintaining consistency over long dialogues.

  • Resource intensity — high energy, data, and compute demands.

An LLM’s “creativity” is a form of statistical generalization, not conscious invention.

4.2 Human-AI Collaboration and the Role of Prompts#

Generative AI becomes truly powerful when guided by human intention. A well-crafted prompt provides context, constraints, and purpose.

  • Prompts are instructions or cues that shape how the model responds.

  • Effective prompting involves specifying:

    • Instruction: The core “what to do” command (e.g., “Summarize,” “Translate,” or “Write”).

    • Context (Optional): The “who or why” framing information (e.g., specifying a persona like “You are a legal expert” or a setting).

    • Input Data: The “what to process” material (e.g., a long article, a block of code, or a specific question).

    • Output Format (Optional): The “how to structure it” requirement (e.g., “in bullet points,” “as a YAML file”, or “using a friendly tone”).

  • This is the basics of Prompt Engineering, covered in the next module.

The intelligence of GenAI emerges not only from the model but from the interaction between human and machine.

4.3 The Generative AI Landscape#

Model Type

Input → Output

Examples

Typical Applications

Text-to-Text (LLMs)

Text → Text

GPT, Claude, Gemini

Writing, tutoring, summarizing

Text-to-Image

Text → Image

DALL·E, Midjourney, Stable Diffusion

Art, design, visualization

Text-to-Audio

Text → Sound/Music

Suno, MusicLM

Sound design, storytelling

Text-to-Video

Text → Video

Runway, Pika, Sora

Animation, film previsualization

Multimodal Models

Text + Image + Audio → Mixed Output

GPT-4o, Gemini, ImageBind

Assistants, education, perception AI

4.4 The Human-Centered Future of Generative AI#

Generative AI challenges us to redefine what it means to create, learn, and communicate. Its value depends not only on what it can generate, but on how responsibly and creatively we use it. The future belongs to those who understand both the capabilities and the limits of AI, and who can combine machine efficiency with human purpose.

Reflection#

What distinguishes human creativity from machine generation?
How can we ensure that generative AI enhances, rather than replaces, human imagination?

Reflect on how GenAI changes your relationship with knowledge, authorship, and originality — and how prompt design can help maintain control, integrity, and ethics in AI-assisted creation.

📘 Further Reading#

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning., The MIT Press.

  • Vaswani, A. et al. (2017). Attention Is All You Need., Advances in neural information processing systems, 30.

  • Bommasani, R. et al. (2022). On the Opportunities and Risks of Foundation Models., arXiv preprint arXiv:2108.07258 https://arxiv.org/abs/2108.07258.

  • Mitchell, M. (2020). Artificial Intelligence: A Guide for Thinking Humans., Picador Paper.

  • Dendritic Institute (2025). AI Literacy Series – Module 4: Generative AI and Large Language Models. (Slides & video lecture)