Over the last century, the IQ test has gone from a classroom screening tool in Paris to a family of standardized assessments used in schools, clinics, and research labs. This guide gives an overview of how IQ testing began with Alfred Binet, how it evolved through major American revisions, why the Wechsler scales changed the scoring model, and what modern psychometrics and ethics mean for using these tests today.
In the late 1800s, psychologists experimented with reaction times, sensory thresholds, and other basic tasks to capture “mental ability.” These efforts advanced measurement and statistics, but they did not directly assess higher-order thinking such as reasoning, working memory, and problem-solving, which are the skills most relevant to learning and day-to-day cognition (Spearman, 1904).
In 1904, the French Ministry of Education asked Alfred Binet and Théodore Simon to identify children who needed extra academic support. They built a scale of tasks that targeted attention, memory, and problem-solving abilities not tied to specific classroom content. The result, the Binet–Simon Scale (1905), is widely regarded as the starting point of modern intelligence testing (Binet & Simon, 1905/1916). Binet introduced the idea of mental age and emphasized that test scores are practical estimates, not permanent labels, and that intelligence can change with experience.
Binet & Simon – fathers of the first IQ test
In the United States, Lewis Terman adapted and standardized Binet’s work, publishing the Stanford–Binet in 1916 and popularizing the ratio formula IQ = (Mental Age ÷ Chronological Age) × 100 (Terman, 1916). With large-scale norms, scores became comparable across examinees of the same age, and the test quickly became a leading tool for educational decisions and developmental research.
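To make the ratio arithmetic concrete, here is a minimal, purely illustrative sketch; the function name and example ages are hypothetical, not taken from any test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Classic ratio IQ: (mental age / chronological age) * 100."""
    return (mental_age / chronological_age) * 100

# A child who performs like a typical 10-year-old but is 8 years old
# scores 125; performing exactly at age level yields 100.
print(ratio_iq(10, 8))  # 125.0
print(ratio_iq(8, 8))   # 100.0
```

The ratio works reasonably for children, but adult raw performance stops climbing with chronological age, which is the limitation the Deviation IQ described below was designed to address.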
During World War I, the U.S. Army had to classify huge numbers of recruits. Psychologists developed two large-scale tests: Army Alpha (written, verbal) and Army Beta (nonverbal pictorial tasks for low-literacy or non-English speakers). The program proved standardized testing could be deployed at scale, but also foreshadowed misuses and simplistic interpretations of results in policy contexts (Yerkes, 1921).
David Wechsler addressed the ratio method’s limitation for adults by introducing the Deviation IQ, locating an individual’s performance relative to age-peer norms (mean 100, SD 15). His family of tests—WAIS (adults), WISC (children), WPPSI (preschool)—shifted attention from a single number to a profile of index scores (e.g., Verbal Comprehension, Perceptual/Fluid Reasoning, Working Memory, Processing Speed) plus a Full-Scale IQ summary (Wechsler, 2008).
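To illustrate the shift in scoring, a deviation IQ expresses how far an examinee stands from the mean of same-age peers and rescales that standing to a distribution with mean 100 and SD 15 (IQ = 100 + 15z under a normal approximation). The following is a minimal sketch with illustrative helper names and made-up values, not the scoring procedure of any specific Wechsler test.

```python
from statistics import NormalDist

def deviation_iq(raw_score: float, peer_mean: float, peer_sd: float) -> float:
    """Deviation IQ: place a raw score relative to age-peer norms,
    then rescale to a mean of 100 and an SD of 15."""
    z = (raw_score - peer_mean) / peer_sd  # standing among age peers
    return 100 + 15 * z

def percentile_rank(iq: float) -> float:
    """Approximate percentile rank under a normal model (mean 100, SD 15)."""
    return NormalDist(mu=100, sigma=15).cdf(iq) * 100

# A raw score one SD above the age-peer mean maps to 115,
# roughly the 84th percentile, regardless of the examinee's age.
iq = deviation_iq(raw_score=60, peer_mean=50, peer_sd=10)
print(round(iq), round(percentile_rank(iq)))  # 115 84
```

Because the score is always anchored to same-age norms, the scale keeps the same meaning at every age, which is what made it workable for adults.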
Behind the test designs, theories of intelligence evolved from Spearman’s g (a general factor explaining positive correlations among diverse tasks) to Thurstone’s Primary Mental Abilities, then the Cattell–Horn split between fluid (Gf) and crystallized (Gc) abilities, and finally Carroll’s comprehensive Three-Stratum model (Spearman, 1904; Thurstone, 1938; Cattell, 1963; Carroll, 1993). These strands merged into the widely used CHC framework, which guides modern batteries and helps ensure that subtests map to well-specified cognitive abilities (McGrew, 2005).
The second half of the 20th century transformed how IQ tests are built and scored: deviation-based scoring replaced the ratio formula, standardization samples grew larger and more representative, and statistical methods for checking reliability and fairness became routine parts of test development.
An IQ score is a standardized estimate with a standard error of measurement and confidence intervals. Scores are influenced by test conditions, health, effort, and familiarity with testing. IQ tests estimate general reasoning and related abilities; they do not directly measure creativity, values, motivation, character, or life potential. Used properly and alongside other information, they remain among psychology’s best-validated predictors of certain academic and training outcomes (AERA et al., 2014; Schmidt & Hunter, 1998).
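As an example of how that uncertainty is quantified, classical test theory puts the standard error of measurement at SEM = SD × √(1 − reliability), and a roughly 95% confidence interval is the observed score ± 1.96 × SEM. The sketch below uses an illustrative reliability of .95; real values come from a test's own manual.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Classical-test-theory standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed: float, sd: float = 15,
                        reliability: float = 0.95, z: float = 1.96):
    """Approximate 95% confidence interval around an observed score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# With SD = 15 and an illustrative reliability of .95, SEM is about 3.4,
# so an observed score of 110 is better reported as roughly 103-117.
print(confidence_interval(110))  # (~103.4, ~116.6)
```

Reporting the interval rather than a single number is one concrete way to communicate that an IQ score is an estimate.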
The history of IQ testing includes clear misuses, from eugenic arguments to sweeping claims about groups. Modern practice responds with clearly stated purposes for testing, standardized procedures and accommodations, fairness analyses such as differential item functioning (DIF), and responsible interpretation that reports uncertainty and considers educational and medical history (AERA et al., 2014; Zumbo, 1999).
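To show what a fairness analysis can look like in practice, here is a minimal logistic-regression DIF screen in the spirit of Zumbo (1999): after conditioning on overall ability (total score), group membership should not predict success on an individual item. The data and variable names below are synthetic and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif_p(item_correct, total_score, group):
    """Screen one item for uniform DIF: after controlling for total score,
    does group membership still predict getting the item right?
    A small p-value for the group term flags the item for review."""
    X = sm.add_constant(np.column_stack([total_score, group]))
    model = sm.Logit(item_correct, X).fit(disp=0)
    return model.pvalues[2]  # columns: constant, total_score, group

# Synthetic example: 200 examinees in two groups, with no true DIF built in.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=200)
total_score = rng.normal(50, 10, size=200)
prob_correct = 1 / (1 + np.exp(-(total_score - 50) / 10))
item_correct = rng.binomial(1, prob_correct)
print(logistic_dif_p(item_correct, total_score, group))
```

A flagged item is not automatically discarded; it is reviewed for content that might disadvantage one group for reasons unrelated to the ability being measured.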
Despite debate, IQ assessments play defined roles across contexts: identifying students who need extra support in schools, informing diagnostic and neuropsychological evaluations in clinics, and providing standardized measures of cognitive ability in research.
From Binet’s classroom tool to today’s adaptive batteries, the history of IQ testing is a story of practical goals, theoretical refinement, statistical innovation, and ethical course correction. When developed and used under modern standards, IQ tests provide a reliable window into core cognitive abilities, one piece of a larger picture that also includes personality, motivation, health, and opportunity (AERA et al., 2014; Carroll, 1993).
References