Introduction
In the study of language acquisition, it has been proposed that nouns are learned before verbs and that this is a universal phenomenon (Gentner, 1982). This prevalence of nouns in early vocabulary is called noun bias and has been attributed to cognitive and/or perceptual predispositions in children to learn nouns. One possible explanation of this phenomenon is based on the natural partitions hypothesis (Gentner, 1982). According to this hypothesis, nouns are learned before verbs because there is a preexisting conceptual distinction between concepts about concrete objects and people, which are simpler and hence learned earlier, and predicative concepts about actions and cause and effect, which are more complex and learned later. Another explanation of the noun bias is based on the semantic organization hypothesis, specifically the whole object bias, according to which nouns are easier to acquire both perceptually and cognitively because they represent whole objects that are perceptually easier to recognize, organize and structure based on meaning (Goldfield, 1993; Markman, 1989). Regardless of the theoretical explanation, a universalist claim such as noun bias in early language acquisition necessitates empirical confirmation from multiple and diverse languages. Past research has yielded conflicting findings depending on the language studied and has used different methods to investigate it.
Two main methodological approaches examine the potential noun bias in language acquisition, and they differ in how the child's expressive vocabulary is assessed: speech samples and parent reports. Speech samples are typically collected during a child-adult play interaction and can provide various measures of parts of speech. Previous studies have used, among others, absolute values of noun and verb types, the proportion of nouns out of the total number of words, the ratio of noun types to noun and verb types combined, and the noun type/token ratio (e.g., Choi & Gopnik, 1995; Ogura et al., 2006; Tardif, 1996). Relying on speech samples, no noun bias was found in the speech of Mandarin-speaking toddlers when noun type/token ratios were compared to verb type/token ratios (Tardif, 1996), and findings were mixed for Japanese-speaking toddlers when noun types were analyzed as a share of all nouns and verbs produced during a speech sample (Ogura et al., 2006). Although speech samples are a direct observation of children's spontaneous expressive vocabulary, studies have shown that the communicative context of the sample is associated with specific uses of nouns and verbs. For example, book reading is associated with the use of more nouns than verbs in children's spontaneous speech, while playing with toys is not (Ogura et al., 2006; Tardif et al., 1999). The potential role of activity type in the use of nouns and verbs calls into question how representative a speech sample is of the child's early expressive vocabulary. Furthermore, studies coding corpus data often rely on a limited number of speech samples from small numbers of participants.
In contrast, examining children's early expressive vocabulary with parental report allows for the collection of data from hundreds of children, and the report is based on broader observation of children's spontaneous speech across multiple contexts in the child's daily life. In fact, noun bias has been extensively studied with the use of parent questionnaires/vocabulary checklists, such as the MacArthur-Bates Communicative Development Inventories and the Language Development Survey (MBCDI, Fenson et al., 1994; Bates et al., 1994; Caselli et al., 1995; Rescorla & Safyer, 2013). Using parent report, past studies have reported noun dominance in children's early expressive vocabulary for English (Bates et al., 1994), Italian (Caselli et al., 1995), French (Bassano, 2000) and Spanish (Jackson-Maldonado et al., 1993), among other languages. Examining children's early lexicon with the MBCDI (Fenson et al., 1994), some studies have focused not only on the proportions of nouns and verbs, but also on other categories, and have investigated how the relative proportion of each changes with age and language development. For example, Caselli et al. (1995) conducted a cross-linguistic comparison between English- and Italian-speaking toddlers, examining different semantic categories more closely. In particular, in addition to nouns and verbs, the authors focused on words for games, routines, and sound effects, labeled “social words”, which are frequently found in the everyday activities of young children, and on function words or closed-class words. Their results revealed that the prevalence of each of these four categories varies with children's vocabulary growth: social words are more common in the very early stages of lexical acquisition, then nouns take over, followed by verbs, and closed-class words emerge in the speech of children with richer lexicons. This general pattern is reported for both English-speaking and Italian-speaking toddlers (Caselli et al., 1995).
Investigating the prevalence of nouns, verbs, and other word categories, and how it changes with lexical development in multiple different languages, is a way to test the noun bias universality claim. In addition, cross-linguistic studies on noun bias could help identify the potential reasons behind this phenomenon and whether they depend on specific language characteristics. Furthermore, investigating noun bias in the language acquisition of children with atypical development could contribute to a better understanding of potential additional constraints on the universality of the phenomenon. Examining the lexicon of children with autism spectrum disorder (ASD) presents a unique opportunity to investigate noun bias and its role in language acquisition. Studies have already reported certain differences in the use of specific word categories (defined based on syntactic and semantic characteristics) in autism. For instance, some studies show more limited use of mental state terms (words such as think, feel, know) in ASD (e.g., Losh & Capps, 2003; Tager-Flusberg & Sullivan, 1995) in comparison with typical development (TD). Other research has focused on difficulties in deixis, particularly pronoun reversal, avoidance of personal pronouns and preference for nouns when referring to oneself and others (Lee et al., 1994; Shield et al., 2015; Tager-Flusberg, 1994). Furthermore, past research provides mixed evidence as to the presence of shape bias as an organizing principle in language acquisition in autism (Field et al., 2015; Potrzeba et al., 2015; Tek et al., 2008). These and other previously reported features of language acquisition in ASD make a strong case for studying noun bias in this population as a way to better understand the mechanisms behind it.
Yet, research on noun bias in autism is scarce. In a longitudinal study, Tager-Flusberg et al. (1990) compared the prevalence of nouns, verbs, modifiers and closed-class words in the speech samples of six children with ASD to those of six chronological age- and MLU-matched children with Down syndrome. The children with ASD used significantly more nouns than the children with Down syndrome, and their noun use decreased as their grammar skills increased. Using a parent report measure, Ellis Weismer et al. (2011) compared toddlers with ASD to productive vocabulary-matched late talkers across the 22 CDI-2 word categories. Word use across all word categories was equivalent across the two groups. In another study, Charman et al. (2003) report that the patterns of comprehension and production of words across semantic categories of the CDI-Infant Form in children with ASD do not broadly differ from those of the normative sample, but provide no more detailed comparisons. In yet another study using parent report, Rescorla and Safyer (2013) provided a detailed comparison of ASD and TD lexical development by word category. They found no differences by semantic category in the words acquired by English-speaking children with ASD and a normative TD sample in earlier stages of their lexical development (with vocabulary between 1 and 49 words). There were, however, several word categories, including foods, actions, and people, in which children with ASD with vocabulary between 1 and 310 words used significantly fewer words than the normative sample. Overall, past studies show that in early stages of lexical development, children with ASD do not seem to differ in their relative use of words across more general categories, such as nouns and verbs. However, when more specific word categories are examined, such as words for clothes, people, and places, some differences between ASD and normative samples emerge.
To the best of our knowledge, there are no published studies on noun bias in ASD for children acquiring a language other than English. In the present study we focus on noun bias in typical development and in ASD for Bulgarian. Currently there is only one study, under review, examining vocabulary composition in the early lexicon of TD Bulgarian children. It followed the analytical schema of Caselli et al. (1999), which consists of four general word categories as described above: social words, nouns, predicates, and closed-class words. This allowed for the comparison of Bulgarian findings with the findings of Caselli et al. on English and Italian to identify similarities and differences. Although a detailed report of the findings is beyond the scope of this paper, we can summarize the main conclusions as follows. Overall, the CDI-2 data provide evidence for noun bias in Bulgarian as well, which is in line with the results for Italian and English (Caselli et al., 1999). Further, the trends in vocabulary composition across increasing levels of vocabulary size (1-50; 51-100; etc.) for Bulgarian bear greater similarity to the pattern in Italian than to that in English, in particular with respect to the higher ratio of social words in smaller vocabularies.
As an extension of the described study, we aim to compare the vocabulary composition of Bulgarian children with ASD to that of the Bulgarian normative sample described above. Examining noun bias in ASD for children acquiring Bulgarian presents a unique opportunity. On the one hand, social communication and pragmatic deficits are among the core difficulties across the autism spectrum. On the other hand, social words, such as words for people, routines, and animal sounds, are one of the most prevalent word categories in the early stages of Bulgarian lexical acquisition. To the best of our knowledge, this will be the first study to examine noun bias in ASD in a language where social words make up such a large share of the early lexicon.
To address some of the gaps in the literature on noun bias in language acquisition in ASD, the present study aims to compare word category distribution, following the Caselli et al. (1999) analytical schema, in the ASD sample with the normative sample.
Method
Participants
Normative sample
The data for the normative sample were obtained from parents of 510 children aged 16 to 30 months, with an average age of 22.47 months, including 252 girls and 258 boys. Information for most children (98%) was provided by their mothers.
ASD sample
The data set for the ASD sample consisted of 48 observation points from 28 children. Each data point is considered as a separate participant for the purposes of the analysis. Repeat observations for participants were made one year apart. The ASD diagnosis of all children in this sample was confirmed with the administration of the Autism Diagnostic Observation Schedule - 2nd ed (ADOS-2; Lord et al., 2012) at each data collection point. One child was excluded because of a serious medical condition. Six children were excluded because Bulgarian was not their primary language. The data of three children were excluded from further analyses because they had a CDI-2 total score of 0. Thus the final ASD sample consisted of 38 children aged 45 to 116 months, M = 74.5 (SD = 19.9). Information on two of the children's exact age was missing. Five of the children were girls and thirty-three were boys.
Materials
The data were collected with the Bulgarian version of the CDI-2 (MacArthur-Bates Communicative Development Inventories, CDI). This instrument is among the few tools developed for the study of Bulgarian language acquisition. Past studies using it have found increases in expressive vocabulary with age, and associations between vocabulary and socio-demographic factors, thus replicating findings from English-speaking toddlers (Andonova, 2015, 2022a, 2022b). Parents provided information on the words their children produced in a checklist format along with details about the family environment, health status, and other relevant factors.
The Bulgarian adaptation of CDI-2 (Andonova, 2015) has a 639-word vocabulary checklist arranged into the same twenty-two semantic categories as in the US original CDI-2 (Fenson et al., 1994) which serves as a measure for toddlers' expressive vocabulary. Caregivers are asked to identify each word on the list the child uses spontaneously but are not asked to indicate the frequency of use of the word or its range of reference.
Results
We report the distribution of lexical categories in a comparative analysis between the normative sample and the ASD sample. We start with direct comparisons of vocabulary size and of the ratio scores for the four analytical categories, and then proceed to examine opportunity scores. Raw scores are the number of words parents report for their children, and opportunity scores indicate the percentage of all words on the CDI-2 list that a child produces. For example, a child with a raw score of 64 words on the CDI-2 would have an opportunity score of approximately 10% (64 of the 639 words on the full list). The ratio scores are calculated as the percentage of words of a given category within a child's individual total score on the CDI.
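The three score types can be illustrated with a minimal Python sketch. The function names and the per-category counts for the example child are hypothetical and chosen only so that the raw score equals 64. The sketch also assumes that a category's opportunity score uses the full 639-word checklist as its denominator, which is one reading of the definition above.

```python
CHECKLIST_SIZE = 639  # words on the Bulgarian CDI-2 checklist (from the text)


def opportunity_score(words_produced: int, checklist_size: int = CHECKLIST_SIZE) -> float:
    """Percent of the full checklist that the child produces (assumed denominator)."""
    return 100.0 * words_produced / checklist_size


def ratio_score(category_words: int, raw_score: int) -> float:
    """Percent of the child's own vocabulary that falls in one category."""
    if raw_score == 0:
        # A child with no reported words has no vocabulary to partition.
        raise ValueError("ratio scores are undefined for a raw score of 0")
    return 100.0 * category_words / raw_score


# Hypothetical child: made-up category counts that add up to 64 words.
child = {"nouns": 24, "predicates": 8, "closed_class": 4, "social": 25, "other": 3}
raw_score = sum(child.values())

overall_opportunity = opportunity_score(raw_score)     # share of the whole checklist
noun_ratio = ratio_score(child["nouns"], raw_score)    # share of the child's own words
noun_opportunity = opportunity_score(child["nouns"])   # nouns as a share of the checklist
```

Because ratio scores divide by the child's own total, they cannot be computed for a child with a CDI-2 total score of 0.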
The analytical categories in our study align as closely as possible with those reported in Caselli et al. (1999) and in this way provide a suitable comparative basis for the normative samples across three languages (Bulgarian, Italian, and US versions of the CDIs). Four word categories were defined as follows. The noun category includes the following word groups from CDI-2: animals, vehicles, toys, food and drinks, clothing, body parts, small household objects, furniture, and rooms. In the Bulgarian adaptation, the total number of words in this category is 273 (42.72% of the full word list). Predicates as a category comprise two word groups – 106 verb forms and 43 adjectives – totaling 149 words (23.32% of the full list). Closed-class words include pronouns, prepositions and spatial terms, question words, quantity words, conjunctions, and conjugated verb forms, amounting to 95 words (14.87% of the total inventory list). The final category consists of the so-called social words, identified in the analyses of Caselli et al. (1999) as a combination of sound words and sound effects, names for people, games and routine activities – 68 words in total (10.64% of the full list) for the Bulgarian CDI-2.
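As a quick arithmetic check, the category shares quoted above can be recomputed from the reported word counts. The sketch below uses only the counts given in the text; the dictionary keys and the helper name are illustrative.

```python
CHECKLIST_SIZE = 639  # total words on the Bulgarian CDI-2 checklist

# Category sizes as reported in the text.
CATEGORY_SIZES = {
    "nouns": 273,
    "predicates": 106 + 43,  # verb forms + adjectives
    "closed_class": 95,
    "social": 68,
}


def share_of_checklist(n_words: int, checklist_size: int = CHECKLIST_SIZE) -> float:
    """Category size as a percentage of the full checklist, rounded to 2 decimals."""
    return round(100.0 * n_words / checklist_size, 2)


shares = {name: share_of_checklist(n) for name, n in CATEGORY_SIZES.items()}

# Checklist words that belong to none of the four analytical categories.
uncategorized = CHECKLIST_SIZE - sum(CATEGORY_SIZES.values())
```

Under these counts, the four categories cover 585 of the 639 checklist words, so ratio scores for the four categories need not sum to 100% for a given child.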
Given the objectives of this study, the total number of words from the CDI-2 that children produced was first calculated for each child, along with the corresponding percentage ratio of words from each of the four main categories in the child's vocabulary.
Vocabulary Size
The mean CDI-2 vocabulary score for the thirty-eight children in the ASD sample was 253.61 words (SD = 234.28), which was marginally higher than the normative sample mean score of 177.40 words (SD = 183.54) as shown in a t-test for independent samples, t(546) = 2.42, p = .057. The large SD values indicate that vocabulary scores varied widely for the ASD children in line with the considerable variation found among the typically developing children in the normative sample.
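For readers who wish to recompute a test statistic from the summary values above, the following Python sketch implements two common variants of the independent-samples t-test: the pooled-variance (Student) form and the unequal-variance (Welch) form. The text does not state which variant was used, so both are shown as assumptions. The function names are illustrative, and no p-values are computed.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with pooled variance; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t with Welch-Satterthwaite degrees of freedom; returns (t, df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df


# Summary statistics as reported: ASD sample first, normative sample second.
asd = (253.61, 234.28, 38)
norm = (177.40, 183.54, 510)

t_pooled, df_pooled = pooled_t(*asd, *norm)
t_welch, df_welch = welch_t(*asd, *norm)
```

With group sizes and standard deviations as unequal as these, the two variants can give noticeably different statistics and degrees of freedom, so it is worth checking which one a given software package reports by default.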
Vocabulary Composition
Given that lexical composition in this ASD sample was the primary focus of this investigation, we analyzed it in two different ways. First, we compared the ratios of the four analytical categories of ASD and typically developing children. We then compared the opportunity scores for the same four word categories across the samples. The first analysis allows us to draw a parallel with the analytical approach in Caselli et al. (1999) and the second analysis approximates the procedure adopted specifically for ASD by Rescorla & Safyer (2013).
ASD vs. Typical Development (Ratio Scores)
We first compare the ratio scores of the ASD and typically developing children from the normative sample on the four analytical word categories (nouns, predicates, closed-class words, and social words). The percentages of words from each of these categories within children's individual vocabularies are presented in Table 1. A series of independent samples t-tests on the ratio of words from each of the four analytical categories found no significant differences between the ASD and the normative sample, ts < 1.6, ps > .10.
The ratio (percentage) scores calculated on the basis of children's individual vocabularies reflect their distribution within an individual's lexicon, but they give no indication of their share of words within the full CDI list, i.e., the degree to which given word categories are filled up within the checklist as a whole. Utilizing the measure of opportunity scores allows us to examine these shares within the inventory.
Table 1. Ratio scores (%) for the four word categories in the normative and ASD samples
| Word category | Normative (n = 510) | ASD (n = 38) |
|---|---|---|
| Nouns | 37.53 | 35.7 |
| Predicates | 13.27 | 13.38 |
| CC W | 6.53 | 5.22 |
| Social W | 38.67 | 40.53 |
Note: CC W = closed-class words; Social W = social words
ASD vs. Typical Development (Opportunity Scores)
Here we compare the opportunity scores of the ASD and typically developing children from the normative sample on the four analytical word categories (nouns, predicates, closed-class words, and social words). The opportunity score values are presented in Table 2. A series of independent samples t-tests revealed a significant difference between the two groups on the opportunity scores for Nouns, t(546) = 2.24, p = .031, a marginal difference for Predicates, t(546) = 1.85, p = .072, and no difference for closed-class words or social words (Table 2).
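Table 2. Opportunity scores (%) for the four word categories in the normative and ASD samples
| Word category | Normative (n = 510) | ASD (n = 38) |
|---|---|---|
| Nouns | 13.09 | 19.65 |
| Predicates | 5.62 | 8.38 |
| CC W | 2.44 | 3.53 |
| Social W | 4.81 | 4.73 |
Note: CC W = closed-class words; Social W = social words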
Discussion
We set out to examine vocabulary composition in ASD for a language where both social words and nouns are most prevalent in the early lexicon. We found a trend toward larger vocabularies in the children with ASD than in the normative TD sample. This difference could be attributed to the parent report used, even though such results have not been found in other vocabulary composition comparisons with a normative sample using similar expressive language measures (e.g., Charman et al., 2003; Rescorla & Safyer, 2013). In our study, children with ASD were much older than the normative sample (unlike in both Charman et al. (2003) and Rescorla & Safyer (2013)), and thus their parents have had years more of observing their expressive vocabulary. Because the CDI-2 assesses word types rather than word tokens or frequency, this longer period of observation could explain the marginal advantage for the ASD sample. In addition, the very large standard deviations for both participant samples reflect the considerable heterogeneity in language skills in both ASD and TD (e.g., Tager-Flusberg et al., 2009).
When comparing ratio scores across the four word categories (nouns, predicates, closed-class words and social words), no significant differences were found between the ASD and normative sample. Social words and nouns were most common in the expressive vocabularies of both groups, followed by predicates and closed-class words. This is in line with past studies reporting no differences between ASD and other language-matched participants with Down syndrome, late talkers, and normative TD samples when focusing on broader word categories in early lexical acquisition (Charman et al., 2003; Ellis Weismer et al., 2011; Rescorla & Safyer, 2013; Tager-Flusberg et al., 1990). This similar but delayed general lexical acquisition pattern is accounted for by Naigles and Tek's (2017) proposal that "form is easy, meaning is hard" in ASD. According to the proposal, vocabulary growth is a relative strength of children with ASD, while semantic organization (e.g., shape bias) is an area of difficulty. This perspective is echoed by Arunachalam and Luyster (2015), who go on to add that "While syntactic knowledge can support acquisition of a broad meaning category, it cannot override difficulties children may have with particular concepts." (p. 7). Based on these accounts, no lexical composition differences should be found in broad word categories, as we report here for the production of nouns, predicates and closed-class words by Bulgarian children with ASD. However, differences would be expected for more specific semantic categories, such as mental state verbs, where meaning discernment requires social skills or happens in the context of a social interaction. The social words category that we examine here consists precisely of such words: words for people, games, routines, and sounds that are part of the everyday activities of children as they interact with their caregiver. In that sense, the lack of a difference between our ASD and normative sample in the ratio of social words is unexpected. One possible explanation is that acquiring the social words included in the Bulgarian CDI-2 does not necessarily pose a high social demand.
Beyond the lack of group difference, the relatively high percent of social words in the vocabulary of Bulgarian children with ASD in and of itself is noteworthy. As described above, the pattern of early lexical composition of Bulgarian TD children closely resembles that of Italian TD children from Caselli et al.’s (1999) study, where social words were the most prevalent in the early stages of vocabulary acquisition. We report the same pattern here in ASD, where we do not find evidence to reject the presence of a nominal bias, as more nouns are used than predicates, but social words are just as commonly used as nouns. A more detailed examination of this finding is necessary, as to what specific social words are used, to put it in the context of the characteristic social and pragmatic impairments of children with ASD. In addition, unique cultural, social, and language factors pertaining to language acquisition could play a role in this high prevalence of social words.
Next, we focus on opportunity scores. Opportunity scores reflect the degree to which given word categories are filled up within the checklist as a whole. Based on opportunity scores, nouns show a much stronger presence than both social words and predicates in the expressive vocabulary of children with ASD and of the normative TD sample, thus providing additional evidence in support of noun bias in Bulgarian lexical acquisition. What is surprising here is that opportunity scores for nouns, but not for any other word category, are significantly higher in ASD than in the normative sample. This finding could potentially be accounted for by the much higher chronological age of the ASD children compared to the normative sample, which gave their parents many more opportunities to observe their child's expressive language. Furthermore, the marginally higher vocabulary size of the ASD group could be a contributing factor to this opportunity score difference.
Limitations and Future Research
The present study lays the foundation for future research on lexical composition in ASD in non-English-speaking children. Although informative, it has a number of limitations that can be addressed in future studies. For example, future studies can compare the vocabulary composition of Bulgarian children with ASD to expressive-vocabulary-matched TD controls. It would also be helpful to collect ASD data from children closer to the age of the normative sample to account for potential age effects, although this would be challenging considering the late age of diagnosis in the country (Andonova, 2022). Furthermore, it would be helpful to match samples on nonverbal IQ as well, following similar matching procedures in Ellis Weismer et al. (2011).
Another logical next step would be to follow the example of Rescorla and Safyer (2013) and Ellis Weismer et al. (2011) and conduct more detailed word category comparisons. On the one hand, all 22 CDI-2 word categories can be compared. On the other, some word-level analyses could be conducted to address previously reported potential semantic difficulties associated with deixis and mental state language.
Last but not least, other methodological approaches could be used to investigate noun bias and word categories, more broadly, in Bulgarian. For instance, speech samples can be collected and coded for word categories, and then compared to parent report measures.
Despite its limitations, the present study makes a significant contribution to the study of noun bias in languages different from English and in the expressive vocabulary of atypical populations. Relying on a normative sample comparison as done previously (Charman et al., 2003; Rescorla & Safyer, 2013), our results replicate some published findings on noun bias in ASD. In addition, the high percent of social words in the vocabulary of Bulgarian children with autism can serve as the basis of a more detailed investigation of vocabulary composition across different populations.