Scoping Review • 2015-2025

Theories and Mechanisms for AI-Powered ESL Speaking Design

A synthesis of 17 empirical studies to guide the design of next-generation language learning systems, moving beyond unstructured practice toward structured, adaptive, and multimodal development of speaking competence.

The Core Challenges

Lack of Practice

Typical classrooms leave too little time for meaningful interaction, and large class sizes limit individualized practice.

Speaking Anxiety

Fear of negative evaluation makes learners reluctant to speak. Foreign language anxiety lowers their willingness to communicate (WTC).

Inadequate Feedback

Teachers cannot provide immediate, consistent, and individualized feedback to every student due to time constraints.

Evidence-Based Foundation

A rigorous scoping review following PRISMA guidelines, filtering 2,877 records down to 17 mechanism-rich empirical studies.

  • 2,877 initial records
  • 17 included studies
  • 47% published in 2024-25
  • 63% set in East Asian contexts

Geographic Focus: East Asia vs. other regions

Selection Process

  1. Identification
  2. AI + human screening
  3. Final inclusion

Theoretical Foundations

Six key pillars explaining the "why" behind effective system design.

Skill Acquisition Theory

Learners progress from declarative knowledge (rules) through procedural knowledge (practice) to automaticity, a progression that requires a high volume of practice.

Ref: Li & DeKeyser (2019)

Desirable Difficulties

Effortful retrieval strengthens memory. Well-chosen inter-session intervals (ISIs) introduce difficulty that enhances long-term retention.

Ref: Suzuki (2017)

Noticing Hypothesis

Learners must consciously notice the "gap" between their output and the target. Feedback makes this gap salient.

Ref: Schmidt (1990)

Transfer-Appropriate Processing

Practice should resemble target cognitive processes. Interleaved practice simulates real-world conversational flexibility.

Ref: Zhang et al. (2023)

Constructionist Learning

Language is acquired by entrenching form-meaning pairings ("constructions") through repeated use.

Ref: Suzuki et al. (2022)

Sociocultural Theory

Learning occurs within the zone of proximal development (ZPD) with support from a "more knowledgeable other" (MKO). AI tools and peers can act as non-threatening MKOs.

Ref: Vygotsky (1978)
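The Noticing Hypothesis implies that feedback should make the gap between a learner's output and the target salient. A minimal sketch, assuming a corrected target sentence is already available (for example, a teacher or LLM reformulation), uses Python's standard `difflib` to mark that gap at the word level. The function name `highlight_gap` and the `[-...-]` / `{+...+}` markup are illustrative choices, not part of the reviewed studies.

```python
from difflib import SequenceMatcher


def highlight_gap(learner: str, target: str) -> str:
    """Mark word-level differences between a learner utterance and a target.

    Learner words absent from the target are wrapped in [-...-];
    target words the learner did not produce are wrapped in {+...+}.
    """
    a, b = learner.split(), target.split()
    marked = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.extend(a[i1:i2])
            continue
        # "replace", "delete", and "insert" opcodes each expose part of the gap.
        if i2 > i1:
            marked.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:
            marked.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(marked)


print(highlight_gap("Yesterday I go to the park", "Yesterday I went to the park"))
# Yesterday I [-go-] {+went+} to the park
```

In an interface, the same opcodes could drive color highlighting instead of text markup.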
Imperative 1

Architected Practice

The "double-edged sword": massed practice builds speed but harms flexibility, so the system must transition between schedule types.

Phase 1: Blocked (AAA)

Goal: Within-task fluency & proceduralization

Pattern: A → A → A
Articulation rate: +20%

Phase 2: Interleaved (ABC)

Goal: Transfer & Flexible Retrieval

Pattern: A → B → C
Cognitive load: High

Phase 3: Spaced Repetition

Goal: Long-term Retention

Pattern: A → (interval) → A
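The three-phase progression can be sketched as a schedule generator. This is a minimal illustration under stated assumptions: the function `build_schedule`, the use of day indices as sessions, three repetitions per item, and the expanding review gaps of 1, 3, and 7 days are hypothetical values chosen for the example, not parameters reported by the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    day: int   # session day (hypothetical unit)
    item: str  # task or construction label, e.g. "A"


def build_schedule(items, reps_per_item=3, review_gaps=(1, 3, 7)):
    """Hypothetical blocked -> interleaved -> spaced practice schedule."""
    trials = []
    # Phase 1, blocked (AAA BBB CCC): repeat each item consecutively on day 0.
    for item in items:
        trials.extend(Trial(0, item) for _ in range(reps_per_item))
    # Phase 2, interleaved (ABC ABC ABC): cycle through all items on day 1.
    for _ in range(reps_per_item):
        trials.extend(Trial(1, item) for item in items)
    # Phase 3, spaced: one retrieval per item after each expanding interval.
    day = 1
    for gap in review_gaps:
        day += gap
        trials.extend(Trial(day, item) for item in items)
    return trials


schedule = build_schedule(["A", "B", "C"])
for day in sorted({t.day for t in schedule}):
    print(day, "".join(t.item for t in schedule if t.day == day))
# 0 AAABBBCCC
# 1 ABCABCABC
# 2 ABC
# 5 ABC
# 12 ABC
```

In a real system, the switch between phases would more plausibly depend on learner performance than on fixed days; the fixed calendar here only keeps the example short.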
Imperative 2

Tri-Modal Feedback Loop

No single AI modality is sufficient. A robust system combines three distinct layers of feedback.

1. Elicited Feedback (Self-Repair)

"Can you improve that sentence?" Prompting self-correction before showing answers boosts metacognition.

2. ASR Feedback (Form)

Explicit, color-coded feedback on segmental (phoneme-level) pronunciation errors. Effect size: g = 0.69.

3. LLM Feedback (Discourse)

Assessment of naturalness, coherence, and sociolinguistic appropriateness. This layer also helps reduce speaking anxiety.
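The ordering of the three layers can be expressed as a small pipeline. In this minimal sketch, `tri_modal_feedback`, `FeedbackReport`, and the three callables are hypothetical stand-ins: `ask_learner` would wrap the self-repair prompt, `asr_check` a pronunciation-assessment service (which in practice would take audio rather than text), and `llm_check` a language-model review. Scoring the learner's revised attempt rather than the first draft is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FeedbackReport:
    first_attempt: str
    self_repair: str
    form_notes: List[str] = field(default_factory=list)       # layer 2: ASR (form)
    discourse_notes: List[str] = field(default_factory=list)  # layer 3: LLM (discourse)


def tri_modal_feedback(
    utterance: str,
    ask_learner: Callable[[str], str],
    asr_check: Callable[[str], List[str]],
    llm_check: Callable[[str], List[str]],
) -> FeedbackReport:
    # Layer 1: elicit self-repair before any corrective feedback is shown.
    revised = ask_learner("Can you improve that sentence?")
    # Layers 2 and 3 then assess the learner's best attempt.
    return FeedbackReport(
        first_attempt=utterance,
        self_repair=revised,
        form_notes=asr_check(revised),
        discourse_notes=llm_check(revised),
    )


# Stub callables standing in for real learner, ASR, and LLM components.
report = tri_modal_feedback(
    "Yesterday I go to market",
    ask_learner=lambda prompt: "Yesterday I went to the market",
    asr_check=lambda text: [],
    llm_check=lambda text: ["Consider adding why you went to extend the narrative."],
)
print(report.self_repair)  # Yesterday I went to the market
```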

Implementation

Structured Pedagogical Frameworks

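The BOPPPS Cycle (Bridge-in, Objective, Pre-assessment, Participatory learning, Post-assessment, Summary)

B: Bridge-in & Objective

Connect to the learner's context and set a clear goal.

P: Participatory Learning (Core)

Task execution plus the tri-modal feedback loop.

S: Summary & Reflection

A metacognitive prompt such as "What is one pattern you noticed?"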

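One way to embed the AI components in the BOPPPS cycle is to treat each stage as a step in a session runner. This is a minimal sketch: the `Stage` enum follows the six BOPPPS stages, while `run_session`, the handler functions, and the example lesson content are hypothetical. Stages without a handler are logged as skipped rather than silently dropped, so gaps in a lesson design stay visible.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Stage(Enum):
    BRIDGE_IN = "Bridge-in"
    OBJECTIVE = "Objective"
    PRE_ASSESSMENT = "Pre-assessment"
    PARTICIPATORY = "Participatory learning"
    POST_ASSESSMENT = "Post-assessment"
    SUMMARY = "Summary"


def run_session(handlers: Dict[Stage, Callable[[], str]]) -> List[Tuple[str, str]]:
    """Run the stages in BOPPPS order and log each stage's output."""
    log = []
    for stage in Stage:  # Enum iteration follows definition order
        handler = handlers.get(stage)
        log.append((stage.value, handler() if handler else "skipped"))
    return log


# Hypothetical lesson content; the participatory stage is where the
# tri-modal feedback loop would run.
lesson = {
    Stage.BRIDGE_IN: lambda: "Context: ordering food at a cafe",
    Stage.OBJECTIVE: lambda: "Goal: make polite requests",
    Stage.PARTICIPATORY: lambda: "Role-play with tri-modal feedback",
    Stage.SUMMARY: lambda: "What is one pattern you noticed?",
}
for name, output in run_session(lesson):
    print(f"{name}: {output}")
```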
Construction Reuse Dashboard

Example panel: MY_SPEAKING_PATTERNS

  • Signature phrases (high frequency): "I think that...", "In my opinion..."
  • Growth constructions (+12%): "Despite the fact..."
  • Practice targets (action required): past narrative tenses
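Construction reuse counts like those on the dashboard could be computed from session transcripts. This minimal sketch assumes each construction can be detected with a surface regular expression, which is a strong simplification (real construction detection would likely need parsing). The `CONSTRUCTIONS` inventory, `count_constructions`, and `reuse_growth` are hypothetical names, and the example transcripts are invented for illustration.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical construction inventory: display label -> surface pattern.
CONSTRUCTIONS = {
    "I think that...": r"\bi think that\b",
    "In my opinion...": r"\bin my opinion\b",
    "Despite the fact...": r"\bdespite the fact\b",
}


def count_constructions(transcript: str) -> Counter:
    """Count how often each tracked construction occurs in one transcript."""
    text = transcript.lower()
    return Counter({label: len(re.findall(p, text)) for label, p in CONSTRUCTIONS.items()})


def reuse_growth(before: Counter, after: Counter) -> Dict[str, Optional[float]]:
    """Percentage change in uses per construction; None when there is no baseline."""
    growth = {}
    for label in CONSTRUCTIONS:
        old, new = before[label], after[label]
        growth[label] = None if old == 0 else round(100 * (new - old) / old, 1)
    return growth


week1 = count_constructions("I think that it is useful. I think that we should start.")
week2 = count_constructions(
    "I think that, despite the fact it is hard, we can finish. "
    "In my opinion it helps. I think that it works. I think that is fair."
)
print(reuse_growth(week1, week2))
# {'I think that...': 50.0, 'In my opinion...': None, 'Despite the fact...': None}
```

Raw counts ignore transcript length; normalizing per 100 words would make sessions of different lengths comparable.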

Key Findings Summary

1. Scheduling is Primary

Practice architecture matters more than volume. Blocked practice builds the foundation; interleaved practice supports transfer.

2. Multi-Modal Necessity

Combining ASR feedback (pronunciation), LLM feedback (discourse), and elicited self-repair supports holistic speaking competence.

3. Structure Amplifies AI

AI tools are substantially more effective when embedded in structured frameworks such as BOPPPS (Lai, 2025).

4. Construction Tracking

Tracking the reuse of grammatical constructions predicts fluency better than simple speech rate (words per minute, WPM).

Limitations & Future Research

Limitations

  • Narrow demographics (56% East Asian learners).
  • Short intervention durations (under 12 weeks).
  • Methodological variability in control groups.

Future Directions

  • Validation in underrepresented regions (Africa, Latin America).
  • Longitudinal studies (6-12 months) to measure retention.
  • Causal investigation of construction reuse mechanisms.