
Abstract

In 2025, the Suzhou Senior High School Entrance Exam (Zhongkao) English speaking test removed model answer prompts. This reform places new demands on students’ capacity for independent expression and logical organization, and it exposes both the pain points of traditional oral instruction and the limitations of current training systems. Traditional settings face three difficulties: personalized pronunciation correction is hard to deliver, students speak too rarely, and teachers carry a heavy grading burden. Existing commercial systems, for their part, offer only standardized, “one-size-fits-all” training. To address these problems, this study uses Long-Term Memory AI technology to build the SuzhouSpeak AI Personalized English Oral Training System, based on an AI project from Xi’an Jiaotong-Liverpool University.

Guided by the core principle of “Teacher-Led, Tech-Adapted,” the system builds dynamic learning profiles through in-depth analysis of students’ historical training data. This enables precise diagnosis of weak points and the recommendation of personalized learning paths, ultimately forming a precision teaching closed-loop of “Teacher Guidance — AI Assistance — Intensive Student Training — Feedback Optimization.” Short-term teaching practice demonstrates that this system not only significantly enhances the focus and effectiveness of student oral training but also greatly reduces the administrative workload for teachers. It provides a practical and replicable model for the digital transformation of English speaking instruction in secondary schools.

Keywords: Long-Term Memory AI; Computer-Mediated English Speaking Exam; Precision Teaching; Personalized Training; Digital Transformation of Education

I. Research Background: The Practical Dilemmas of Oral Instruction under Exam Reform

In 2025, the Suzhou Senior High School Entrance Exam (Zhongkao) English speaking test underwent a pivotal change: model answer prompts were removed, requiring students to express themselves independently on specific topics. The exam thus no longer assesses a student’s “imitation skills” but has shifted toward a comprehensive evaluation of “language output capacity, logical thinking skills, and situational adaptability”. While this reform underscores the importance of fostering students’ core competencies, it also presents a major challenge to frontline English instruction: the long-standing problems of traditional oral teaching and of existing training systems are magnified under the new exam format.

1.1 The Triple Core Pain Points of Traditional Oral Instruction

Against the backdrop of large-scale classroom instruction, traditional English oral teaching has long faced a triple dilemma that is difficult to overcome, directly restricting the improvement of students’ speaking abilities.

First, the difficulty of achieving personalized pronunciation correction. Junior high school English classes typically consist of around 40 students, with some schools even exceeding 50. Within a 45-minute oral lesson, a teacher must explain key knowledge points and organize classroom activities, leaving less than one minute of one-on-one guidance time for each student. Consequently, teachers struggle to identify and correct individual pronunciation issues such as “improper tongue placement for the ‘th’ sound,” “confusion in the pronunciation of past tense ‘-ed’ endings,” or “unnatural linking and weakening of sounds”. In most cases, teachers can only provide collective explanations for common class errors, causing the individual differences of students to be entirely overlooked and leaving some students’ stubborn pronunciation errors unresolved for long periods.

Second, a severe lack of student speaking frequency. In an exam-oriented environment, classroom time allocation tends to lean heavily toward written content such as vocabulary, grammar, reading, and writing. Oral training is often treated as an “ancillary component,” where classroom practice consists mainly of “choral reading” or “demonstrations by a few students,” leaving very few opportunities for the majority to engage in genuine independent expression. After-school training mostly becomes a mere formality due to the lack of effective supervision and guidance mechanisms. This leaves students in a long-term state of “high input, low output,” making it difficult to improve their proficiency and confidence in independent expression; when faced with the new Zhongkao requirements, they often find themselves “at a loss for words” or “logically disorganized”.

Third, the heavy burden and low efficiency of teacher grading and feedback. Traditional methods of grading oral assignments are primarily manual; teachers must listen to each student’s audio recording, award scores based on grading rubrics, and record error points. This process is time-consuming and labor-intensive, often taking several hours for a teacher to grade a single class’s assignments. More critically, the feedback cycle for manual grading is long, and students usually have to wait a significant amount of time to receive their results. By then, their original thought process for the answer has become blurred, greatly diminishing the timeliness and effectiveness of the feedback. Furthermore, manual grading makes it difficult to perform systematic statistical analysis of errors, preventing teachers from accurately tracking the changing trends in each student’s weak areas and leaving subsequent instructional guidance without a clear focus.

1.2 Application Limitations of Existing Training Systems

To alleviate instructional pressure, many schools have introduced commercialized English computer-mediated oral training systems, such as the iFlytek training system. While these systems play a foundational role—providing standardized content aligned with Zhongkao exam types, automating voice recording and scoring, and saving teachers time in organizing practice—they exhibit significant limitations when measured against the personalized teaching requirements of the Suzhou Zhongkao reform. They struggle to meet the pedagogical goal of “teaching according to aptitude”.

First, evaluative feedback is too generalized, making it impossible to locate personalized issues. Scoring dimensions in existing systems are typically limited to a few categories like “Pronunciation,” “Fluency,” and “Content,” providing only a final score without detailed error analysis. For example, if a student mispronounces past-tense “-ed” endings during a reading task, the system merely deducts points under “Pronunciation” without specifying the error type. Similarly, if a student demonstrates “logical incoherence” in a topic summary, the system fails to provide specific suggestions for improvement. Students are left knowing they “scored low” without understanding “where they went wrong” or “how to improve,” often falling into a cycle of “repeated practice without progress”.

Second, training content is homogenized, lacking differentiated design. Current systems employ a “one-size-fits-all” delivery model where all students use the same materials regardless of their specific weaknesses. For instance, a student weak in “linking sounds” is still pushed generic reading materials rather than targeted linking exercises. Conversely, a student with strong grammar but thin content expression receives no specialized guidance on expanding topical ideas. This homogenized model ignores individual differences, leading to low training efficiency.

Finally, there is no long-term memory mechanism, making it impossible to track learning trajectories. Each session in existing systems is treated as an independent event: the systems do not record historical training data or analyze trends in a student’s ability over time. For example, if a student omits an article in Week 1 and repeats the same mistake in Week 3, the system cannot identify this as a “stubborn error” or push reinforced training. Consequently, teachers must track historical errors manually, which is labor-intensive and makes it hard to keep the records accurate and complete.

In summary, neither traditional oral instruction nor existing training systems can effectively address the challenges posed by the Suzhou Zhongkao reform. There is an urgent need in junior high school English teaching to construct an intelligent training system capable of accurately capturing individual differences, dynamically tracking learning trajectories, and providing personalized guidance.

II. System Design: A Precision Training Scheme Driven by Long-Term Memory AI

To resolve the pain points of traditional instruction and existing systems, this research established the core principle of “Teacher-Led, Tech-Adapted”. Core members of the Junior High School English Teaching and Research Group at the Affiliated School of JTLU participated in the entire design and construction process. Starting from actual pedagogical needs, they refined the Suzhou Zhongkao oral scoring standards and identified typical instructional issues to ensure the system’s functions are deeply integrated with teaching scenarios. Ultimately, based on a collaborative AI project with Xi’an Jiaotong-Liverpool University, the SuzhouSpeak AI Personalized English Oral Training System was developed using Long-Term Memory AI.

2.1 Core Technical Architecture

The SuzhouSpeak AI system utilizes a decoupled frontend and backend architecture to ensure stable operation, ease of use, and fulfillment of personalized teaching requirements.

Frontend Architecture: The frontend is developed using React + TypeScript, with a clean and intuitive interface that caters to the usage habits of both secondary students and teachers. The student and teacher interfaces are differentiated: the student end emphasizes “training and feedback,” while the teacher end prioritizes “learning analytics and instructional management”.

Backend Architecture: The system integrates the DeepSeek multi-modal large model, which possesses powerful capabilities in speech recognition, natural language processing, and data analysis. By invoking the DeepSeek API, the system achieves core functions such as precise assessment of student speech, automatic identification of error points, and intelligent recommendation of learning paths.

Localization of Scoring Standards: To ensure the system’s scoring aligns with official exam standards, the research group conducted an in-depth study of the 2025 Suzhou Zhongkao English Computer-Mediated Oral Examination Guidelines. Four core competency indicators were extracted: Pronunciation, Fluency, Grammar, and Content, with detailed rubrics developed for each. For example, “Pronunciation” is subdivided into accuracy, stress/intonation, and linking/weakening; “Content” is subdivided into topical relevance, logical coherence, and lexical richness. The system’s scoring logic is entirely based on these detailed rules to ensure the relevance of training.
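The rubric structure described above can be sketched as a small data structure. This is only an illustrative sketch, not the system's actual scoring code: the sub-criteria for Pronunciation and Content come from the text, while the names `RUBRIC` and `indicator_scores`, the 0-100 scale, and the equal weighting are assumptions. The sub-criteria for Fluency and Grammar are not listed in this paper, so each is kept as a single undivided entry.

```python
from statistics import mean

# Indicators and sub-criteria named in the text. Fluency and Grammar
# sub-criteria are not specified, so each is one entry (assumption).
RUBRIC = {
    "Pronunciation": ["accuracy", "stress/intonation", "linking/weakening"],
    "Fluency": ["fluency"],
    "Grammar": ["grammar"],
    "Content": ["topical relevance", "logical coherence", "lexical richness"],
}


def indicator_scores(sub_scores):
    """Average sub-criterion scores (assumed 0-100 scale) per indicator.

    Equal weighting is an assumption; the official guidelines may
    weight sub-criteria differently.
    """
    result = {}
    for indicator, criteria in RUBRIC.items():
        missing = [c for c in criteria if c not in sub_scores]
        if missing:
            raise ValueError(f"missing sub-scores for {indicator}: {missing}")
        result[indicator] = mean(sub_scores[c] for c in criteria)
    return result


# Hypothetical sub-scores for one recording.
example_sub_scores = {
    "accuracy": 90, "stress/intonation": 80, "linking/weakening": 70,
    "fluency": 85, "grammar": 75,
    "topical relevance": 80, "logical coherence": 60, "lexical richness": 70,
}
print(indicator_scores(example_sub_scores))
```

Keeping the rubric as data rather than hard-coding it in the scoring logic would make it easier to adjust if the examination guidelines change.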

2.2 Core Application of the Long-Term Memory Mechanism

The Long-Term Memory (LTM) mechanism is the core innovation that distinguishes the SuzhouSpeak AI system from traditional training platforms. The system breaks through the limitations of “single-session training and evaluation” by deeply tracking students’ historical training data to construct dynamic and comprehensive learning profiles, thereby achieving precision training that truly “teaches according to aptitude”.

2.2.1 Bidirectional Memory Data Tracking

While traditional training systems focus only on student errors, SuzhouSpeak AI employs a bidirectional memory tracking mode that records both typical errors and strengths in each performance.

Error Tracking: The system automatically records pronunciation errors, grammatical mistakes, and logical issues occurring in every training session, providing statistical categorization by error type. For example, it might record that “past tense -ed ending pronunciation errors occurred 5 times” or “article omission occurred 3 times,” marking these as “high-frequency errors”.

Strength Tracking: The system identifies “bright spots” such as “natural and fluent speech,” “appropriate pacing,” “correct stress placement,” or “rich topical content,” and integrates these strengths into the learning profile. This design helps students clarify their direction for improvement while reinforcing their confidence, preventing the academic burnout that can arise from an over-focus on mistakes.
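The bidirectional tracking described above can be sketched as a per-student record holding two counters. This is a minimal illustration under stated assumptions: the class name `LearningProfile`, its methods, and the cut-off of 3 occurrences for a “high-frequency error” are hypothetical, since the paper does not state the threshold the system actually uses.

```python
from collections import Counter
from dataclasses import dataclass, field

HIGH_FREQUENCY_THRESHOLD = 3  # assumed cut-off; not given in the paper


@dataclass
class LearningProfile:
    """Hypothetical per-student memory of both errors and strengths."""
    errors: Counter = field(default_factory=Counter)
    strengths: Counter = field(default_factory=Counter)

    def record_session(self, errors, strengths):
        """Add one session's observed error types and strengths."""
        self.errors.update(errors)
        self.strengths.update(strengths)

    def high_frequency_errors(self):
        """Error types at or above the threshold, most frequent first."""
        return [
            error for error, count in self.errors.most_common()
            if count >= HIGH_FREQUENCY_THRESHOLD
        ]


# Three hypothetical sessions reproducing the counts used in the text.
profile = LearningProfile()
profile.record_session(["-ed ending", "-ed ending", "article omission"],
                       ["natural intonation"])
profile.record_session(["-ed ending", "-ed ending", "article omission"],
                       ["natural intonation", "appropriate pacing"])
profile.record_session(["-ed ending", "article omission"], [])

print(profile.errors["-ed ending"])        # 5
print(profile.errors["article omission"])  # 3
print(profile.high_frequency_errors())     # ['-ed ending', 'article omission']
```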

2.2.2 Exponential Moving Proficiency Algorithm

To prevent single-session performance fluctuations from skewing the assessment of a student’s true ability, the system applies weighted smoothing with coefficients of 0.7 and 0.3. Specifically, a student’s current proficiency rating is composed of 70% of their historical average and 30% of their current training score.

This algorithm effectively filters out performance dips caused by nerves or carelessness, as well as outliers caused by unusually easy topics, reflecting the student’s actual ability level more authentically. For instance, if a student with an average of 80 points scores only 70 due to anxiety, the system calculates their current proficiency as 77 points. This acknowledges the session’s shortcomings without disproportionately affecting their overall rating, ensuring objective evaluation results.
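The 0.7/0.3 weighting and the worked example above can be written out as a short sketch. The function names are hypothetical. Treating the “historical average” as the previous smoothed rating, and using the first score as the starting rating, are assumptions made so that the update can be applied session after session.

```python
HISTORY_WEIGHT = 0.7  # weight on the historical rating (from the text)
CURRENT_WEIGHT = 0.3  # weight on the current session score (from the text)


def update_proficiency(previous, session_score):
    """Blend the prior rating with the new session score.

    Treating `previous` as the running smoothed rating is an assumption;
    with no history (None), the first score becomes the rating.
    """
    if previous is None:
        return float(session_score)
    return HISTORY_WEIGHT * previous + CURRENT_WEIGHT * session_score


def proficiency_trend(scores):
    """Return the smoothed rating after each session, in order."""
    rating = None
    trend = []
    for score in scores:
        rating = update_proficiency(rating, score)
        trend.append(rating)
    return trend


# Worked example from the text: prior rating 80, anxious session scoring 70.
print(round(update_proficiency(80, 70), 2))  # 77.0
```

Because each update keeps 70% of the prior rating, a single unusually low or high session moves the rating by at most 30% of the gap between that session's score and the prior rating.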

2.3 Covering the Full “Teach-Learn-Assess-Refine” Instructional Cycle

The SuzhouSpeak AI system is designed with two core modules—Student End and Teacher End—covering the entire “teach-learn-assess-refine” process to form a “Teacher-Led, AI-Assisted” precision teaching closed-loop.

2.3.1 Student End: Personalized Autonomous Training

The primary goal of the student end is to ensure students “know their problems, know how to improve, and are willing to train actively”. Its specific functions are as follows:

AI Memory Profile Display: Upon logging in, the homepage displays a personal learning portrait clearly presenting strengths, weaknesses, high-frequency error types, and ability trends. For example, a profile might show “Strength: Natural Intonation; Weakness: Past tense -ed endings; High-frequency error: Article omission”. This allows students to understand their status intuitively and avoid aimless practice.

Dual-Mode Training Path Selection: Students can choose a training mode based on their needs. The first is the “AI Smart-Path for Score Improvement,” where the system automatically recommends targeted content based on the learning profile (e.g., pushing linking-sound drills to students weak in that area). The second is the “Autonomous Specialized Question Bank,” allowing students to select content based on personal interest or specific weaknesses.

Real-time Training and Deep Evaluation Feedback: As students practice, the system records audio in real time and generates a detailed evaluation report immediately upon completion. The report includes four core elements: scores standardized to the official Suzhou Zhongkao rubric; a detailed analysis of “bright spots” and areas for improvement; the LTM decision logic (explaining which historical data informed the result); and a “3-Day Sprint Recommendation” (e.g., Day 1: -ed ending drills; Day 2: Reading passages with -ed endings; Day 3: Topic summary recording). In effect, this report gives each student a dedicated one-on-one oral coach.
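The dual-mode path selection and the “3-Day Sprint Recommendation” described above can be sketched as two small functions. Both are hypothetical illustrations: the function names, the mode labels `"smart"` and `"autonomous"`, and the wording of each task are assumptions, and the day-by-day pattern simply mirrors the example given in the text.

```python
def select_content(mode, ranked_errors, chosen_topic=None):
    """Pick a training focus for one session.

    'smart' follows the learning profile (most frequent error first);
    'autonomous' follows the student's own choice from the question bank.
    """
    if mode == "smart":
        return ranked_errors[0] if ranked_errors else None
    if mode == "autonomous":
        return chosen_topic
    raise ValueError(f"unknown mode: {mode}")


def three_day_sprint(ranked_errors):
    """Build a 3-day plan around the most frequent error.

    Returns an empty list when the profile has no recorded errors.
    """
    if not ranked_errors:
        return []
    target = ranked_errors[0]
    return [
        f"Day 1: {target} drills",
        f"Day 2: Reading passages targeting {target}",
        "Day 3: Topic summary recording",
    ]


# Hypothetical profile ranked by error frequency.
ranked = ["-ed ending", "article omission"]
print(select_content("smart", ranked))
for task in three_day_sprint(ranked):
    print(task)
```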

2.3.2 Teacher End: Precision Instructional Guidance

The core objective of the teacher end is to “reduce teacher burden, accurately locate instructional priorities, and achieve personalized guidance.” The specific functions include:

Class Learning Analytics Dashboard: Upon logging into the system, teachers can view the overall class performance via a dashboard. This includes class averages, achievement rates for various competency indicators, the most concentrated weak points, and a list of students needing breakthrough support. For example, the dashboard might show that “out of 40 students, 30 are struggling with past-tense -ed endings,” allowing the teacher to quickly pivot instructional focus and adjust lesson plans.

Precision Individual Student Profile Queries: Teachers can click to view any individual student’s portrait to understand their historical training data, ability trends, and specific strengths and weaknesses. The system also automatically generates targeted improvement suggestions for the teacher’s reference. This feature eliminates the time teachers previously spent manually tallying data and analyzing student issues, significantly reducing their administrative workload.

One-Click Deployment of Personalized Learning Packs: Teachers can combine the system’s suggestions with their own pedagogical experience to customize training tasks and deploy personalized learning packs with a single click. For instance, a teacher can send a pronunciation-specific pack to students with weak phonetics or a topical expansion pack to those with thin content expression. Once students receive the pack, the system reminds them to complete the training, and teachers can track their progress in real time.

Dual Feedback via AI Assistance + Teacher Coaching: After a student completes a session, the system provides an initial AI assessment. The teacher then combines this AI feedback with the student’s actual performance to provide secondary guidance and emotional encouragement. This “AI Assessment + Teacher Coaching” model leverages the efficiency of AI while retaining the essential human touch of a teacher, ensuring maximum instructional impact.
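The class-level aggregation behind the learning analytics dashboard described above can be sketched as follows. This is an illustration only: the record shape (`name`, `proficiency`, `weak_points`), the function name `class_dashboard`, and the support threshold of 60 points are assumptions rather than details from the system.

```python
from collections import Counter
from statistics import mean


def class_dashboard(students, top_n=3, support_threshold=60):
    """Summarise a non-empty class from per-student records.

    Each record is a dict with 'name', 'proficiency' (assumed 0-100) and
    'weak_points' (list of error types); this shape is an assumption.
    """
    weak_counts = Counter()
    for student in students:
        # Count each student at most once per weak point.
        weak_counts.update(set(student["weak_points"]))
    return {
        "class_size": len(students),
        "class_average": mean(s["proficiency"] for s in students),
        "top_weak_points": weak_counts.most_common(top_n),
        "needs_support": [
            s["name"] for s in students
            if s["proficiency"] < support_threshold
        ],
    }


# Hypothetical class of 40 in which 30 students struggle with -ed endings.
students = [
    {
        "name": f"S{i:02d}",
        "proficiency": 75 if i < 35 else 55,
        "weak_points": ["-ed endings"] if i < 30 else ["linking"],
    }
    for i in range(40)
]
summary = class_dashboard(students)
print(summary["top_weak_points"][0])  # ('-ed endings', 30)
print(summary["class_average"])       # 72.5
print(len(summary["needs_support"]))  # 5
```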

2.4 Explainable AI (XAI) Design

To enhance the credibility and actionability of the system’s recommendations, SuzhouSpeak AI includes a dedicated “AI Decision Explainability” module. When generating improvement tips or recommending training paths, the system simultaneously displays the underlying decision logic.

For example, if the system recommends “Specialized practice for past-tense -ed endings,” the displayed logic would be: “Based on your historical training data, past-tense -ed pronunciation errors have appeared 5 times, qualifying as a high-frequency error; in your last three sessions, this error has not shown significant improvement, therefore specialized intensive training is recommended.” Both students and teachers can clearly understand the basis for the suggestion, avoiding the distrust associated with “black-box” operations and making teachers more willing to adopt the suggestions and students more motivated to follow them.
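The displayed decision logic in the example above can be approximated with a simple template that is filled from the student's history. This sketch is hypothetical: the function name, the threshold of 3 occurrences, and the rule used for “no significant improvement” (the latest session's error count is not lower than the earliest one) are all assumptions, since the paper does not specify how the system derives its explanations.

```python
def explain_recommendation(error_type, total_count, recent_counts, threshold=3):
    """Compose a plain-language rationale for a drill recommendation.

    `recent_counts` holds the error count in each recent session, oldest
    first. 'No significant improvement' is approximated as the latest
    count not being lower than the earliest one (an assumed rule).
    Returns None when the error is not high-frequency or is improving.
    """
    if total_count < threshold or len(recent_counts) < 2:
        return None
    if recent_counts[-1] < recent_counts[0]:
        return None
    return (
        f"Based on your historical training data, {error_type} errors have "
        f"appeared {total_count} times, qualifying as a high-frequency error; "
        f"in your last {len(recent_counts)} sessions, this error has not shown "
        f"significant improvement, therefore specialized intensive training "
        f"is recommended."
    )


# Hypothetical history matching the example in the text.
message = explain_recommendation("past-tense -ed pronunciation", 5, [2, 1, 2])
print(message)
```

Generating the explanation from the same counts that trigger the recommendation keeps the displayed rationale consistent with the data that actually drove the decision.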

III. Practical Outcomes: Advantages and Effects of the Precision Training Model

To verify the actual impact of the SuzhouSpeak AI system, we conducted a short-term teaching practice involving two parallel ninth-grade classes at the Affiliated School of JTLU. The experimental class used the SuzhouSpeak AI system, while the control class used a traditional standardized training system. The trial lasted 8 weeks, evaluating the system’s value through teacher and student feedback and a comparison of training efficiency.

3.1 Core Advantages Compared to Traditional Training Models

3.1.1 More Precise Diagnosis, Eliminating “Aimless Training”

Traditional systems provide only vague score feedback, leaving students unaware of their specific issues. In contrast, SuzhouSpeak AI utilizes the long-term memory mechanism to accurately locate personalized weaknesses and high-frequency errors.

For example, a student in the experimental class repeatedly struggled with “improper tongue placement for the ‘th’ sound.” By tracking historical data, the system marked this as a high-frequency error and pushed specialized training materials. After targeted practice, the frequency of this error decreased significantly. Meanwhile, students in the control class with the same issue only received general training because their system could not identify the individual error, leaving the problem unresolved over the long term.

3.1.2 Higher Training Efficiency and Enhanced “Sense of Achievement”

The personalized learning pack delivery feature of SuzhouSpeak AI avoids ineffective repetitive training, greatly boosting efficiency. One student in the experimental class noted, “Before, practicing speaking felt like reading aimlessly without knowing what to focus on; now, the system recommends specific materials, and I feel my problems are actually improving.”

Practical data showed that the training completion rate for the experimental class reached 95%, far exceeding the 75% observed in the control class. This suggests that personalized training enhances students’ sense of achievement in learning, which in turn increases their motivation to train.

3.1.3 Reduced Teacher Burden: Releasing “Core Instructional Energy”

The SuzhouSpeak AI system’s functions—such as learning analytics, automated scoring, and error statistics—have significantly lightened the administrative workload for teachers. A teacher involved in the practice shared: “Previously, grading oral assignments took several hours and required manual tallying of student errors; now, the system handles this automatically. I only need to check the learning analytics dashboard to identify class-wide issues and provide targeted explanations”.

Preliminary statistics indicate that teachers’ grading time has been reduced by more than 60%. The time saved can now be redirected toward core instructional phases such as lesson design and personalized guidance, thereby enhancing overall teaching quality.

3.2 Practice Feedback from Teachers and Students

3.2.1 Teacher Feedback

Ms. Gu, a ninth-grade English teacher involved in the practice, stated: “SuzhouSpeak AI has truly solved a major hurdle in our teaching. Through the class dashboard, I can identify the weaknesses of the entire class at a glance, making instructional priorities immediately clear. The system-generated personalized learning packs are also very practical, saving us the time usually spent screening materials. More importantly, student motivation has notably increased, and more students are taking the initiative to speak up in class”.

Another teacher noted: “The individual student profile feature is excellent; I can clearly see each student’s progress trajectory and offer targeted praise in class, which significantly boosts their self-confidence. This ‘Teacher-Led + AI-Assisted’ model makes teaching more precise and efficient”.

3.2.2 Student Feedback

Students in the experimental class generally reported that the SuzhouSpeak AI system made oral training “purposeful and effective”. One student remarked: “I used to fear oral practice because I didn’t know where my mistakes were, and I saw no progress despite practicing for a long time. Now, the system tells me what is wrong and teaches me how to fix it; I feel my speaking is getting better and I am becoming more courageous”.

Another student added: “The system records my strengths, like ‘natural intonation,’ which gives me a great sense of achievement. I can see my progress in every session, and that feeling is wonderful”.

IV. Conclusion and Insights

The SuzhouSpeak AI Personalized English Oral Training System, empowered by Long-Term Memory AI, effectively resolves the pain points of traditional oral instruction through core functions such as dynamic profile construction, precise weakness diagnosis, and personalized path recommendations. It serves as an effective response to the challenges of the Suzhou Zhongkao English computer-mediated exam reform. Short-term practice shows that the system enhances the focus and effectiveness of training while reducing teacher workload, achieving the dual goals of “teacher relief and student improvement”.

This research practice offers several core insights for the development of AI in education:

The value of AI in education lies in its alignment with pedagogical principles. Technical sophistication is not the sole measure of a system’s value; the key to success is whether it accurately solves teaching pain points and fits frontline needs. The teacher-led design of SuzhouSpeak AI—starting from needs and returning to practice—is the fundamental reason for its effectiveness.

The Long-Term Memory mechanism is the critical technical support for personalized teaching. Traditional online systems are often “stateless” and cannot track learning trajectories. LTM allows AI to “remember” each student’s history and match individual needs, which is a core advantage for personalized empowerment.

“Teacher-Led, AI-Assisted” is the optimal model for AI in education. The heart of education is interpersonal interaction and emotional encouragement, which AI cannot replace. SuzhouSpeak AI functions as a “Smart Teaching Assistant,” handling data and routine tasks so teachers can focus on instructional design, personalized guidance, and emotional support.

In the future, we will expand the scope of our practice and extend the controlled comparison into quantitative research, further verifying effectiveness by measuring changes in oral ability. We will continue to optimize system functions and deepen the “Teacher-Led, AI-Assisted” integration, ensuring technology truly serves teaching and provides every student with a personalized learning experience.

References

Ministry of Education. Education Informatization 2.0 Action Plan[Z]. 2018.
Jiangsu Provincial Educational Examination Institute. 2025 Suzhou Senior High School Entrance Exam English Computer-Mediated Oral Examination Guidelines[Z]. 2024.
Anderson, J. R. (2002). Spanning seven orders of magnitude: A split-second view of human cognition. American Psychologist.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
DeepSeek API Documentation[EB/OL]. 2025.

Author Profile: Gu Yunfei, English Teacher and Preparation Group Leader at the Affiliated School of JTLU. Research interests include AI-empowered foreign language teaching and personalized learning path design.