--- title: "Retrieving raw data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Retrieving raw data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, message=FALSE, warning=FALSE} library(bvq) library(tidyr) library(dplyr) ``` The function `bvq_responses()` retrieves participants' responses to the Barcelona Vocabulary Questionnaire (BVQ) using the [formr API](https://formr.org/documentation), and returns them along participant- and item-level information. This function returns a tidy data frame in which each row is one participant's response to an individual item. ```{r responses, eval=FALSE, echo=TRUE, message=FALSE, warning=FALSE, paged.print=FALSE} responses <- bvq_responses() # select relevant variables responses %>% select(id, time, item, response, randomisation) %>% drop_na(response) # drop unanswered items ``` ```{r load-responses, echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE} responses <- readRDS(system.file("fixtures/responses.rds", package = "bvq" )) participants <- readRDS(system.file("fixtures/participants.rds", package = "bvq" )) (responses) ``` This dataset is the base for many analyses of interest, like participants' vocabulary size, word prevalence, or modelling item-level probability of acquisition. This package offers several functions to do this (e.g., `bvq_vocabulary()`, `bvq_norms()`), but one could already do this from this dataset. Additional participant-level properties like language profile variables can be extracted using the `bvq_logs()` function. ```{r logs, eval=FALSE, echo=TRUE, message=FALSE, warning=FALSE, paged.print=FALSE} logs <- bvq_logs(responses = responses) logs %>% select( child_id, time, age, lp, starts_with("doe_") ) ``` Item-level properties can be consulted in the `pool` dataset (See ?pool): ```{r pool} data("pool") pool ``` ## Longitudinal responses Several participants have filled the questionnaire more than once. All questionnaire responses included in any dataset returned by any function in BVQ have an associated `time` value. This variable indexes how many times that specific participant has filled the questionnaire (any version), including their last response. This allows to track each participant's responses across time and perform longitudinal analyses. By default, `bvq_responses()` retrieves all responses. Using `get_longitudinal()`, we can filter what cases we want to keep. The argument `longitudinal ` takes one of the following character strings: * `"all"`: all responses are returned * `"no"`: participants with more than one response to the questionnaire (any version) are excluded from the output * `"first"`: only the first response from each participant is returned (including responses of participant that only responded once) * `"last"`: only the most recent response from each participant is returned (including responses of participant that only responded once) * `"only"`: only responses from participants that filled the questionnaire more than once are returned. Setting `longitudinal = "only"` is especially useful to perform repeated measures analyses. For example: ```{r responses-longitudinal, echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE} responses %>% get_longitudinal("only") %>% select( child_id, time, item, response, version_list, starts_with("date_") ) %>% drop_na(response) ``` ### Please note The values of `time` in the outcome of `bvq_participants()` and the outcome of the rest of the functions may *not* be identical. This is because in `bvq_participants()` this value increases in one unit every time a given participant is sent the questionnaire, even if they do not end up filling it. In contrast, the value of `time` in the rest of the functions (e.g., `bvq_responses()`, `bvq_logs()`) only increases when the questionnaire is filled. Since the outcome of `bvq_participants()` is mainly intended for internal use, you don't have to worry about this as long as you don't try to cross the outcomes of `bvq_participants()` and the rest of the functions.