Participants
The study involved 42 children with autism (37 males and 5 females; mean age 6.17 years, standard deviation 1.00, range 4.2–7.8), recruited between January and June 2023 from various sources dedicated to supporting individuals with autism in the Spanish regions of Álava and Cantabria, including child psychiatry and pediatric outpatient clinics, family associations, and school counseling professionals. The inclusion criteria were: (1) an ASD diagnosis without other psychiatric comorbidities, as per the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association, 2013); (2) non-verbal IQ (NVIQ) ≥ 70 as measured by the Leiter-3 non-verbal intelligence scale (Koch et al., 2019; Roid & Miller, 2013); (3) age 4–8 years; and (4) comprehension of brief instructions and the ability to point.
Initially, we selected 51 children previously diagnosed with ASD at mental health units. A child psychiatrist reviewed their medical records, confirming ASD diagnoses per DSM-5 guidelines and the absence of comorbidities, based on parent interviews and patient evaluations. Nine children did not meet the inclusion criteria: two had comorbid attention deficit/hyperactivity disorder (ADHD), and seven were excluded because they did not demonstrate comprehension of brief instructions or the ability to point.
After the study had been explained to them, parents or legal guardians gave written consent. The research was approved by the Ethics Committee for Clinical Research of Cantabria (CEIC-C) and the University of the Basque Country’s Ethics Committee for Research Involving Human Beings (CEISH).
Measurement Variables and Instruments
Autism severity was assessed using the chart from Gotham et al. (2009, p. 699 [Table 2]), which covers ADOS-2 (Lord et al., 2012) modules 1, 2, and 3 for ages 2–16. The chart maps the ADOS-2 module, the child’s age, and the score obtained in the ADOS-2 onto a severity score ranging from 1 to 10: 1–3 as NS (“Non-spectrum”), 4–5 as ASD (“Autism Spectrum Disorder”), and 6–10 as AUT (“Autistic”).
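The final step of this mapping, from severity score to classification band, can be sketched as follows. This is an illustration only: the function name `severity_category` is hypothetical, and the earlier step (looking up the severity score from module, age, and ADOS-2 score in the Gotham et al. chart) is not reproduced here.

```python
def severity_category(severity_score: int) -> str:
    """Return the band (NS, ASD, AUT) for a severity score of 1-10.

    Only the banding described in the text is implemented; the lookup
    that produces the severity score itself is not shown.
    """
    if not 1 <= severity_score <= 10:
        raise ValueError("severity score must lie between 1 and 10")
    if severity_score <= 3:
        return "NS"   # Non-spectrum
    if severity_score <= 5:
        return "ASD"  # Autism Spectrum Disorder
    return "AUT"      # Autistic


bands = [severity_category(s) for s in (2, 5, 8)]
```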
Non-verbal intelligence was measured with the Leiter-3 scale (Koch et al., 2019; Roid & Miller, 2013), which is administered entirely without verbal instructions and without requiring verbal responses from participants. It comprises three sets of subtests: Fluid Intelligence subtests, Attention and Memory subtests, and a Social/Emotional scale. The Fluid Intelligence subtests provide a non-verbal intelligence quotient (NVIQ). Within Fluid Intelligence, two groups of subtests can be formed: Visuo-spatial abilities (Figure Ground and Form Completion) and Reasoning (Classification and Analogies, and Sequential Order). Figure Ground consists of finding figures within visual displays of increasing complexity, while Form Completion requires the participant to mentally assemble parts of a geometrical display. Concerning Reasoning, Classification and Analogies taps into reasoning by analogy, while Sequential Order is a task similar to Raven’s matrices, measuring participants’ meaning-making ability. The children’s NVIQ distribution within our sample was typical, with five participants (11.9%) falling within the 70–85 range, compared to an expected 13.5% in a normal distribution with a mean of 100 and a standard deviation of 15 (Roid & Miller, 2013: p. 159). Symmetrically, five other participants had an NVIQ between 115 and 130.
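The expected share of scores between 70 and 85 can be checked with a short calculation. The sketch below assumes the conventional IQ metric (mean 100, standard deviation 15), under which the 70–85 band lies between one and two standard deviations below the mean. It yields roughly 13.6%, in line with the expected figure quoted above.

```python
from statistics import NormalDist

# Assumption: NVIQ follows the conventional IQ metric, N(100, 15).
iq = NormalDist(mu=100, sigma=15)

# Probability mass between -2 SD (70) and -1 SD (85).
expected_70_85 = iq.cdf(85) - iq.cdf(70)

# Share observed in the sample: 5 of 42 children.
observed_70_85 = 5 / 42

print(f"expected: {expected_70_85:.3f}, observed: {observed_70_85:.3f}")
```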
Receptive vocabulary skills were assessed using the PPVT-III (Spanish version by Dunn et al., 2010). This test can be administered to children from two years and six months of age. It is a pointing task in which the experimenter utters a target vocabulary item and the child has to point at the matching picture among four alternatives. There are 16 blocks of 12 items each, ordered by difficulty. Correct and incorrect answers are recorded until eight mistakes are made within the same block, at which point the test ends. The test is calibrated according to the breadth of vocabulary at each chronological age (CA) in typical populations, providing both a direct score and an estimated verbal mental age (VMA). In this study, receptive vocabulary was expressed as the relative difference (VMA − CA)/CA to allow comparison across tests.
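As a minimal sketch of this transformation, the function below computes the relative difference between verbal mental age and chronological age. The function name and the example ages are hypothetical, and both ages are assumed to be expressed in the same unit (years here).

```python
def relative_vocabulary_score(vma: float, ca: float) -> float:
    """Return (VMA - CA) / CA: negative when vocabulary lags behind age."""
    if ca <= 0:
        raise ValueError("chronological age must be positive")
    return (vma - ca) / ca


# Hypothetical child: chronological age 6.25 years, verbal mental age 5.0 years.
example_score = relative_vocabulary_score(vma=5.0, ca=6.25)
```

A value of 0 means the verbal mental age matches the chronological age, and a value of −0.2 means it lags by 20% of the child’s age.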
Grammatical skills were evaluated using the Spanish Test of Comprehension of Grammatical Structures (CEG; Mendoza et al., 2005a), a grammatical scale inspired by Bishop’s (2003) Test for Reception of Grammar (TROG-2; see, for the English language, Mendoza et al., 2005b). CEG is a clinical tool for children from 4 to 11 years old and consists of 20 blocks, each targeting one linguistic structure of Spanish as defined by the authors of the test. Strictly speaking, it is thus not a tool for evaluating grammatical competence, but rather command of specific grammatical structures. Each block is composed of four items exemplifying its structure. The structures include transitive constructions (reversible and non-reversible, depending on whether it is sensible to analyze the first constituent as the subject or object), copular sentences, sentential negation, clefts, clitic left dislocation structures, and relative clauses of different sorts, to name a few examples. In this test, the experimenter reads a target sentence and the child has to point at the matching picture among four alternatives, much like in the PPVT-III. All children complete the entire test, and the number of errors is counted. Unlike in the TROG-2, blocks are not ordered by degree of complexity, even if the first blocks illustrate simpler syntactic structures than the last ones. For our analysis, we recorded the number of correct blocks for each child and computed z-scores based on expected values and standard deviations for different age intervals given by Mendoza et al. (2005b).
Finally, early mathematical abilities were evaluated using the TEMA-3 (Ginsburg et al., 2007; Ginsburg & Baroody, 2003), designed for children aged 3 to 9. The instrument’s internal consistency has been reported at 0.90 for the TD population (Ginsburg & Baroody, 2003), and the test has been employed in studies involving children with intellectual and developmental disabilities (e.g., Vostanis et al., 2021) and autism (Fernández-Cobos & Polo-Blanco, 2024; Polo-Blanco et al., 2024). This performance-based test comprises 72 items that assess both formal (31 items) and informal (41 items) mathematical skills. Within informal skills, the following categories are distinguished: (1) numbering (subitizing, mastery of numerical sequence through tasks involving basic counting and enumeration skills, cardinality principle), (2) comparison (ability to establish relative distances between numbers, order of the numerical sequence), (3) calculation (strategies supported by counting, with and without concrete objects, and non-verbal mental calculation skills), and (4) concepts (numerical constancy, application of advanced counting strategies, basic understanding of object distribution, part-whole relationship). As can be seen, TEMA-3 evaluates several relevant mathematical skills, although these need not constitute an exhaustive inventory of early mathematical skills. Test direct scores (DS) range from 0 to 72, with one point awarded for each correct item; testing ends after five consecutive incorrect responses. To compare mathematical performance with that expected in TD children, we computed z-scores as \((DS - \overline{DS})/\sigma(DS)\), using expected values (\(\overline{DS}\)) and standard deviations (\(\sigma(DS)\)) for each age interval from the Spanish standardization sample in Ginsburg et al. (2007, p. 88). Although TEMA-3 provides a mathematical competence index (MCI), which is a standardized variable, we decided to use z-scores because they are more sensitive at low and high performance (MCI saturates at 55 and 150, corresponding to \(\pm 3\sigma\)). Since the statistical information is not available for separate scores of informal and formal skills, relative differences between the obtained score and the expected score for the participant’s age were considered for informal and formal total scores, and for numbering (which includes a significant proportion of the informal items). The small number of items in the remaining categories at younger ages does not allow a robust analysis.
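The age-normed z-score described above can be sketched as a lookup followed by a standardization. The norm values in `NORMS` are hypothetical placeholders chosen for illustration. They are not the figures from the Spanish standardization sample, and the one-year age intervals are likewise an assumption.

```python
from bisect import bisect_right

# HYPOTHETICAL norms, for illustration only:
# (lower bound of age interval in years, expected DS, standard deviation of DS)
NORMS = [
    (4.0, 10.0, 5.0),
    (5.0, 18.0, 6.0),
    (6.0, 27.0, 7.0),
    (7.0, 36.0, 8.0),
]


def z_score(direct_score: float, age_years: float) -> float:
    """Standardize a direct score against the norms of the child's age interval."""
    lower_bounds = [lower for lower, _, _ in NORMS]
    index = bisect_right(lower_bounds, age_years) - 1
    if index < 0:
        raise ValueError("age is below the youngest normed interval")
    _, expected, sd = NORMS[index]
    return (direct_score - expected) / sd


# Hypothetical child aged 6.17 with a direct score of 13.
z = z_score(direct_score=13, age_years=6.17)
```

Unlike an index that is capped at fixed bounds, the z-score keeps growing in magnitude as a score moves further from the age expectation, which is the property that motivates its use here.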
Analysis
We classified mathematical performance into three levels (low, medium and high) based on TEMA-3 z-scores using \(\pm 1.5\sigma\) thresholds. The z-scores allow us to compare the distribution of mathematical competence in our sample with that expected from TD children without requiring a control group, for instance by examining whether the number of children within the low- or high-performance groups differs significantly from TD expectations. To address the second research question, we analyzed clinical, cognitive and linguistic variables across groups of mathematical performance (hereinafter LMP, MMP and HMP, for low, medium and high mathematical performance). For instruments yielding direct scores expected to increase with age in TD children (TEMA-3, PPVT-III, and CEG), we employed standardized variables (z-scores or relative differences). ANOVA tests were employed to compare the mean of each variable across the mathematical performance groups, with Bonferroni correction applied to control type-I error in multiple comparisons. Statistically significant differences between mathematical performance groups would suggest that mathematical performance may explain the distribution of the corresponding variable in our sample.
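The grouping rule and the comparison with TD expectations can be illustrated as follows. Several points are assumptions made for this sketch: the thresholds are treated as strict inequalities, z-scores in the TD population are taken to be standard normal, and an exact binomial tail probability is used as one possible way to compare a group count with its expectation (the text does not name the test). The z-scores listed are invented.

```python
from math import comb
from statistics import NormalDist

THRESHOLD = 1.5  # cut-off in standard deviations, as stated in the text


def performance_group(z: float) -> str:
    """Assign LMP, MMP or HMP (assumption: strict inequalities at the cut-offs)."""
    if z < -THRESHOLD:
        return "LMP"
    if z > THRESHOLD:
        return "HMP"
    return "MMP"


# Share of TD children expected in each extreme group if z ~ N(0, 1).
expected_tail = NormalDist().cdf(-THRESHOLD)  # about 0.067


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k children in a group."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Invented z-scores, for illustration only.
z_scores = [-2.3, -1.6, -0.4, 0.0, 0.9, 1.7]
groups = [performance_group(z) for z in z_scores]
n_low = groups.count("LMP")
p_low = binomial_upper_tail(n_low, len(z_scores), expected_tail)
```

A small `p_low` would indicate more children in the low-performance group than the TD expectation predicts.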
In addition, Pearson’s correlation test with Bonferroni correction was used to examine which clinical, cognitive and linguistic variables may be related to mathematical performance, and to investigate whether any associations are more closely linked to informal or formal mathematical skills. Guided by this exploration, a predictive model was built using stepwise regression (Harrell, 2015) to identify the variables with the strongest predictive power for mathematical performance. Variables are added to the model sequentially, based on the highest correlation with the dependent variable (mathematical performance). An independent variable is entered into the equation only if it meets the entry criterion (in this case, p < 0.05). Variables already entered in the regression equation can be removed from the model if they meet the removal criterion (p > 0.10). The method terminates when there are no more candidate variables to include or eliminate.
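The entry and removal logic can be sketched in a few lines. This is a simplified illustration, not the software procedure used in the study, and it rests on stated assumptions. Coefficient p-values are approximated with the normal distribution rather than Student’s t. At each step the candidate with the smallest p-value is tried, which for a fixed current model is equivalent to choosing the largest absolute partial correlation with the dependent variable. The toy data and all names (`invert`, `p_values`, `stepwise`, `x1`, `x2`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def p_values(columns, y):
    """Fit y ~ intercept + columns by least squares; return one p-value per column.

    Simplification: two-sided p-values use the normal approximation to Student's t.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    sigma2 = rss / (n - k)  # residual variance
    return [
        2 * (1 - NormalDist().cdf(abs(beta[j]) / sqrt(sigma2 * inv[j][j])))
        for j in range(1, k)
    ]


def stepwise(predictors, y, p_enter=0.05, p_remove=0.10):
    """Add the best candidate if p < p_enter; drop the worst term if p > p_remove."""
    selected = []
    for _ in range(10 * len(predictors) + 1):  # guard against endless cycling
        changed = False
        best_name, best_p = None, p_enter
        for name in predictors:
            if name not in selected:
                cols = [predictors[s] for s in selected] + [predictors[name]]
                p = p_values(cols, y)[-1]
                if p < best_p:
                    best_name, best_p = name, p
        if best_name is not None:
            selected.append(best_name)
            changed = True
        if selected:
            current = p_values([predictors[s] for s in selected], y)
            worst = max(range(len(current)), key=current.__getitem__)
            if current[worst] > p_remove:
                del selected[worst]
                changed = True
        if not changed:
            break
    return selected


# Toy data (invented): y tracks x1 closely, x2 is unrelated to y.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
}
outcome = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 14.1, 15.9]
selected = stepwise(predictors, outcome)
```

With a sample as small as the one in this study, the normal approximation understates p-values somewhat, so a real analysis would rely on the t-based values reported by a statistics package.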