Anatomy of an LLM

📖 Glossary

Key terms, briefly explained.

References

Foundational papers

Attention Is All You Need — Vaswani et al. (2017)
Language Models are Unsupervised Multitask Learners (GPT-2) — Radford et al. (2019)
Scaling Laws for Neural Language Models — Kaplan et al. (2020)
Training Compute-Optimal Large Language Models (Chinchilla) — Hoffmann et al. (2022)

Open models

LLaMA: Open and Efficient Foundation Language Models — Meta (2023)
Llama 2: Open Foundation and Fine-Tuned Chat Models — Meta (2023)
The Llama 3 Herd of Models — Meta (2024)
DeepSeek-V2: A Strong, Economical, and Efficient MoE — DeepSeek (2024)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL — DeepSeek (2025)

Hardware and inference

FlashAttention: Fast and Memory-Efficient Exact Attention — Dao et al. (2022)
Efficient Memory Management for LLM Serving (vLLM) — Kwon et al. (2023)
llama.cpp — Gerganov et al. (2023-2026)
NVIDIA Ada GPU Architecture Whitepaper — NVIDIA (2022)
H100 Tensor Core GPU Architecture — NVIDIA (2022)

Alignment and hallucinations

Training Language Models to Follow Instructions with Human Feedback (InstructGPT) — Ouyang et al. (2022)
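Constitutional AI: Harmlessness from AI Feedback — Anthropic (2022)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (introduces GRPO, Group Relative Policy Optimization) — DeepSeek (2024)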

Document v2 — Hermes — June 2026