The bvq_vocabulary()
function allows to extract vocabulary sizes for individual responses to
any of the questionnaires. It takes the output of the
bvq_responses()
function as an argument, and returns
several measures of vocabulary size base on such dataset.
To compute vocabulary size, we first need to run
bvq_responses()
(although if this argument is not provided,
bvq_responses()
is run under the hood):
library(bvq)
# vocabularies will be computed from these datasets
participants <- bvq_participants()
responses <- bvq_responses(participants = participants)
bvq_vocabulary(participants, responses)
#> # A tibble: 34,465 × 16
#> child_id response_id time version version_list date_birth date_started
#> <chr> <chr> <dbl> <chr> <chr> <date> <date>
#> 1 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 2 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 3 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 4 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 5 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 6 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 7 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 8 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 9 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> 10 58298 BL1879 2 bvq-lockdown C 2020-04-27 2022-10-08
#> # ℹ 34,455 more rows
#> # ℹ 9 more variables: date_finished <date>, item <chr>, response <int>,
#> # sex <chr>, doe_catalan <dbl>, doe_spanish <dbl>, doe_others <dbl>,
#> # edu_parent1 <chr>, edu_parent2 <chr>
The bvq_vocabulary()
computes four measures of
vocabulary size.
total_*
): total number of item
the child was reported to know, summing both languages together.l1_*
): number of word the child
was reported to know in their dominant language (e.g., Catalan words for
a child whose language of most exposure is Catalan).l2_*
): number of word the child
was reported to know in their non-dominant language (e.g., Spanish words
for a child whose language of most exposure is Catalan)concept_*
): number of
concepts the child know at least one word for, regardless of the
language the word belongs to.te_*
): number of translation
equivalents the child knows, i.e., or how many concepts the child know
one word in each language for.Vocabulary sizes are, by default, computed in two different scales:
*_prop
): proportion of the
items the child was reported to known, from the total of items that were
included in the questionnaire, and caregivers answered to.*_count
): sum of the total
number of items the child was reported to know.The scale returned by bvq_vocabulary()
can be modified
with the .scale
argument, which takes "prop"
for proportions (default), and "count"
for counts. Both can
be computed using .scale = c("prop", "count")
. For instance
we can get vocabulary sizes as proportions running:
#> # A tibble: 76 × 9
#> child_id response_id type total_prop l1_prop l2_prop concept_prop te_prop
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 58298 BL1879 underst… 0.518 0.924 0.141 0.881 0.114
#> 2 58298 BL1879 produces 0.430 0.784 0.103 0.743 0.0838
#> 3 58361 BL1863 underst… 0.703 0.805 0.601 0.826 0.574
#> 4 58361 BL1863 produces 0.476 0.573 0.379 0.617 0.331
#> 5 58298 BL1848 underst… 0.370 0.702 0.0623 0.662 0.0486
#> 6 58298 BL1848 produces 0.00985 0.0205 0 0.0189 0
#> 7 58068 BL1833 underst… 0.737 0.850 0.633 0.846 0.563
#> 8 58068 BL1833 produces 0.328 0.528 0.146 0.526 0.102
#> 9 57177 BL1748 underst… 0.837 0.814 0.857 0.887 0.714
#> 10 57177 BL1748 produces 0.568 0.678 0.466 0.757 0.329
#> # ℹ 66 more rows
#> # ℹ 1 more variable: contents <list>
To get vocabulary sizes as counts, we can run this instead:
#> # A tibble: 76 × 9
#> child_id response_id type total_count l1_count l2_count concept_count
#> <chr> <chr> <chr> <int> <int> <int> <int>
#> 1 58298 BL1879 understands 368 316 52 326
#> 2 58298 BL1879 produces 306 268 38 275
#> 3 58361 BL1863 understands 490 281 209 289
#> 4 58361 BL1863 produces 332 200 132 216
#> 5 58298 BL1848 understands 263 240 23 245
#> 6 58298 BL1848 produces 7 7 0 7
#> 7 58068 BL1833 understands 523 288 235 314
#> 8 58068 BL1833 produces 233 179 54 195
#> 9 57177 BL1748 understands 594 276 318 329
#> 10 57177 BL1748 produces 403 230 173 281
#> # ℹ 66 more rows
#> # ℹ 2 more variables: te_count <int>, contents <list>
Finally, two types of vocabulary sizes are computed:
understands
): number of
items the child understands.produces
) number of items
the child says.These two measures are returned in the long format under the
type
column.
In additional to the vocabulary size scores,
bvq_vocabulary()
also returns the column
contents
. This column is a list containing the items marked
as acquired for comprehension or production. For instance:
...
extra
argumentsWe can also compute vocabulary sizes conditional to some variables at
the item or participant level, such as semantic/functional
(semantic_category
) or language profile (lp
),
using the argument ...
argument. Just take a look at the
variables included in the data frame returned by bvq_logs()
or in the pool
dataset. For each participant, vocabulary
sizes are computed for each level or combination of levels of the
variables included in the columns included in ...
. You can
use this argument to preserve participant-level information in the
output data frame. For instance, we can keep information about the
language profile (lp
) of the participant:
We can also can also preserve information about the items, like the
semantic/functional category of the words
(semantic_category
). In this case, the vocabulary sizes
will be computed for each level of the semantic_category
variable:
bvq_vocabulary(participants, responses, semantic_category)
#> # A tibble: 1,970 × 10
#> child_id response_id type semantic_category total_prop l1_prop l2_prop
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 58298 BL1879 understands Adventures 0.75 1 0.5
#> 2 58298 BL1879 produces Adventures 0.7 0.9 0.5
#> 3 58298 BL1879 understands Animals 0.516 0.968 0.0645
#> 4 58298 BL1879 produces Animals 0.452 0.839 0.0645
#> 5 58298 BL1879 understands Parts of animals 0.409 0.818 0
#> 6 58298 BL1879 produces Parts of animals 0.136 0.273 0
#> 7 58298 BL1879 understands Parts of things 0.375 0.75 0
#> 8 58298 BL1879 produces Parts of things 0.125 0.25 0
#> 9 58298 BL1879 understands Question words 0.5 1 0
#> 10 58298 BL1879 produces Question words 0.5 1 0
#> # ℹ 1,960 more rows
#> # ℹ 3 more variables: concept_prop <dbl>, te_prop <dbl>, contents <list>
Finally, we can preserve more than one variable, including
combinations of participant-level and item-level variables, such as
language profile (lp
), age (age
) and
grammatical class (class
):
bvq_vocabulary(participants, responses, age, lp, semantic_category)
#> # A tibble: 1,970 × 12
#> child_id response_id type age lp semantic_category total_prop l1_prop
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 58298 BL1879 unders… 29.4 Mono… Adventures 0.75 1
#> 2 58298 BL1879 produc… 29.4 Mono… Adventures 0.7 0.9
#> 3 58298 BL1879 unders… 29.4 Mono… Animals 0.516 0.968
#> 4 58298 BL1879 produc… 29.4 Mono… Animals 0.452 0.839
#> 5 58298 BL1879 unders… 29.4 Mono… Parts of animals 0.409 0.818
#> 6 58298 BL1879 produc… 29.4 Mono… Parts of animals 0.136 0.273
#> 7 58298 BL1879 unders… 29.4 Mono… Parts of things 0.375 0.75
#> 8 58298 BL1879 produc… 29.4 Mono… Parts of things 0.125 0.25
#> 9 58298 BL1879 unders… 29.4 Mono… Question words 0.5 1
#> 10 58298 BL1879 produc… 29.4 Mono… Question words 0.5 1
#> # ℹ 1,960 more rows
#> # ℹ 4 more variables: l2_prop <dbl>, concept_prop <dbl>, te_prop <dbl>,
#> # contents <list>