teal.builder.checks is a package for checking the data integrity of ADaM datasets
- [Package website] - (https://github.com/teal.builder.checks/index.html)
You can install the development version from Github with:
# remotes::install_github(
# "https://<github.com>/teal.builder.checks",
# auth_token = Sys.getenv("GITHUB_PAT"),
# upgrade = "never"
# )
teal.builder.checks checks the following ADaM datasets:
ADaM datasets | ADaM datasets contd. |
---|---|
adam | adrs |
adsl | adqs |
adae | adlb |
adtte |
teal.builder.checks can perform the following data integrity checks by column on those datasets:
Column Checks | Column Checks contd. |
---|---|
existence | is date |
is character | is datetime |
is integerish | has values |
is double | lacks values |
is logical | has one record per subject |
is factor | has one record per subject per param |
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(stringr)
library(checkmate)
library(vctrs)
#>
#> Attaching package: 'vctrs'
#> The following object is masked from 'package:dplyr':
#>
#> data_frame
library(random.cdisc.data)
# Load adam datasets for checking
adsl <- random.cdisc.data::radsl(N = 10, seed = 1, na_percentage = .1)
adtte <- random.cdisc.data::radtte(adsl, seed = 1, na_percentage = .1)
This check returns a logical TRUE with an attribute data which contains the original ADaM dataset checked
library(teal.builder.checks)
check_adsl(adsl)
#> [1] TRUE
#> attr(,"data")
#> # A data frame: 10 × 55
#> STUDYID USUBJID SUBJID SITEID AGE AGEU SEX RACE ETHNIC COUNTRY DTHFL
#> <chr> <chr> <chr> <chr> <int> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 AB12345 AB12345-C… id-10 CHN-3 NA YEARS M <NA> NOT H… CHN N
#> 2 AB12345 AB12345-J… id-7 JPN-4 30 YEARS <NA> ASIAN NOT H… JPN N
#> 3 AB12345 AB12345-U… id-3 USA-13 35 YEARS F AMER… NOT H… USA N
#> 4 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> 5 AB12345 AB12345-C… id-2 CHN-11 35 YEARS M BLAC… NOT H… CHN N
#> 6 AB12345 AB12345-C… id-1 CHN-11 35 YEARS F WHITE NOT H… CHN Y
#> 7 AB12345 AB12345-C… id-5 CHN-3 36 YEARS F ASIAN NOT H… CHN N
#> 8 AB12345 AB12345-R… id-4 RUS-1 36 YEARS M BLAC… NOT H… RUS N
#> 9 AB12345 AB12345-R… id-6 RUS-1 36 YEARS F ASIAN NOT H… RUS N
#> 10 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> # ℹ 44 more variables: INVID <chr>, INVNAM <chr>, ARM <fct>, ARMCD <fct>,
#> # ACTARM <fct>, ACTARMCD <fct>, TRT01P <fct>, TRT01A <fct>, TRT02P <fct>,
#> # TRT02A <fct>, REGION1 <fct>, STRATA1 <fct>, STRATA2 <fct>, BMRKR1 <dbl>,
#> # BMRKR2 <fct>, ITTFL <fct>, SAFFL <fct>, BMEASIFL <fct>, BEP01FL <fct>,
#> # AEWITHFL <fct>, RANDDT <date>, TRTSDTM <dttm>, TRTEDTM <dttm>,
#> # TRT01SDTM <dttm>, TRT01EDTM <dttm>, TRT02SDTM <dttm>, TRT02EDTM <dttm>,
#> # AP01SDTM <dttm>, AP01EDTM <dttm>, AP02SDTM <dttm>, AP02EDTM <dttm>, …
This check returns a logical FALSE with two attributes
-
msg contains a character vector describing the reason for failure, if the check failed
-
data which contains the ADaM dataset checked
adsl |>
dplyr::select(-STUDYID) |>
check_adsl()
#> [1] FALSE
#> attr(,"msg")
#> [1] "Dataset must have columns 'STUDYID', 'USUBJID', but is missing 'STUDYID'."
#> [2] "Column 'STUDYID' must have data type 'factor', but is data type 'NULL'."
#> attr(,"data")
#> # A data frame: 10 × 54
#> USUBJID SUBJID SITEID AGE AGEU SEX RACE ETHNIC COUNTRY DTHFL INVID
#> <chr> <chr> <chr> <int> <fct> <fct> <fct> <fct> <fct> <fct> <chr>
#> 1 AB12345-CHN… id-10 CHN-3 NA YEARS M <NA> NOT H… CHN N INV …
#> 2 AB12345-JPN… id-7 JPN-4 30 YEARS <NA> ASIAN NOT H… JPN N INV …
#> 3 AB12345-USA… id-3 USA-13 35 YEARS F AMER… NOT H… USA N INV …
#> 4 AB12345-BRA… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y INV …
#> 5 AB12345-CHN… id-2 CHN-11 35 YEARS M BLAC… NOT H… CHN N INV …
#> 6 AB12345-CHN… id-1 CHN-11 35 YEARS F WHITE NOT H… CHN Y INV …
#> 7 AB12345-CHN… id-5 CHN-3 36 YEARS F ASIAN NOT H… CHN N INV …
#> 8 AB12345-RUS… id-4 RUS-1 36 YEARS M BLAC… NOT H… RUS N INV …
#> 9 AB12345-RUS… id-6 RUS-1 36 YEARS F ASIAN NOT H… RUS N INV …
#> 10 AB12345-BRA… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> # ℹ 43 more variables: INVNAM <chr>, ARM <fct>, ARMCD <fct>, ACTARM <fct>,
#> # ACTARMCD <fct>, TRT01P <fct>, TRT01A <fct>, TRT02P <fct>, TRT02A <fct>,
#> # REGION1 <fct>, STRATA1 <fct>, STRATA2 <fct>, BMRKR1 <dbl>, BMRKR2 <fct>,
#> # ITTFL <fct>, SAFFL <fct>, BMEASIFL <fct>, BEP01FL <fct>, AEWITHFL <fct>,
#> # RANDDT <date>, TRTSDTM <dttm>, TRTEDTM <dttm>, TRT01SDTM <dttm>,
#> # TRT01EDTM <dttm>, TRT02SDTM <dttm>, TRT02EDTM <dttm>, AP01SDTM <dttm>,
#> # AP01EDTM <dttm>, AP02SDTM <dttm>, AP02EDTM <dttm>, EOSSTT <fct>, …
In the following example, the checks are performed:
- the existence of certain columns
- specific values in one column
- one record per subject per parameter
my_adtte_checker <- make_multi_checker(
check_has_cols(
column_names = c("STUDYID", "USUBJID", "PARAMCD")
),
check_has_values(
column_name = "ARM",
values = c("A: Drug X", "B: Placebo", "C: Combination")
),
check_has_one_record_per_subject_per_param
)
my_adtte_checker(adtte)
#> [1] TRUE
#> attr(,"data")
#> # A data frame: 50 × 67
#> STUDYID USUBJID SUBJID SITEID AGE AGEU SEX RACE ETHNIC COUNTRY DTHFL
#> <chr> <chr> <chr> <chr> <int> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> 2 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> 3 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> 4 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> 5 AB12345 AB12345-B… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N
#> 6 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> 7 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> 8 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> 9 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> 10 AB12345 AB12345-B… id-8 BRA-9 31 YEARS F ASIAN NOT H… BRA Y
#> # ℹ 40 more rows
#> # ℹ 56 more variables: INVID <chr>, INVNAM <chr>, ARM <fct>, ARMCD <fct>,
#> # ACTARM <fct>, ACTARMCD <fct>, TRT01P <fct>, TRT01A <fct>, TRT02P <fct>,
#> # TRT02A <fct>, REGION1 <fct>, STRATA1 <fct>, STRATA2 <fct>, BMRKR1 <dbl>,
#> # BMRKR2 <fct>, ITTFL <fct>, SAFFL <fct>, BMEASIFL <fct>, BEP01FL <fct>,
#> # AEWITHFL <fct>, RANDDT <date>, TRTSDTM <dttm>, TRTEDTM <dttm>,
#> # TRT01SDTM <dttm>, TRT01EDTM <dttm>, TRT02SDTM <dttm>, TRT02EDTM <dttm>, …
This check returns a msg attribute with multiple error messages.
adtte |>
dplyr::slice_head(n = 12) |>
dplyr::select(-STUDYID) |>
dplyr::filter(ARM != "A: Drug X") |>
my_adtte_checker()
#> [1] FALSE
#> attr(,"msg")
#> [1] "Dataset must have columns 'STUDYID', 'USUBJID', 'PARAMCD', but is missing 'STUDYID'."
#> [2] "Column 'ARM' must have values 'A: Drug X', 'B: Placebo', 'C: Combination', but is missing values 'A: Drug X', 'B: Placebo'."
#> attr(,"data")
#> # A data frame: 7 × 66
#> USUBJID SUBJID SITEID AGE AGEU SEX RACE ETHNIC COUNTRY DTHFL INVID
#> <chr> <chr> <chr> <int> <fct> <fct> <fct> <fct> <fct> <fct> <chr>
#> 1 AB12345-BRA-… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> 2 AB12345-BRA-… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> 3 AB12345-BRA-… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> 4 AB12345-BRA-… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> 5 AB12345-BRA-… id-9 BRA-1 35 YEARS F BLAC… UNKNO… BRA N INV …
#> 6 AB12345-CHN-… id-1 CHN-11 35 YEARS F WHITE NOT H… CHN Y INV …
#> 7 AB12345-CHN-… id-1 CHN-11 35 YEARS F WHITE NOT H… CHN Y INV …
#> # ℹ 55 more variables: INVNAM <chr>, ARM <fct>, ARMCD <fct>, ACTARM <fct>,
#> # ACTARMCD <fct>, TRT01P <fct>, TRT01A <fct>, TRT02P <fct>, TRT02A <fct>,
#> # REGION1 <fct>, STRATA1 <fct>, STRATA2 <fct>, BMRKR1 <dbl>, BMRKR2 <fct>,
#> # ITTFL <fct>, SAFFL <fct>, BMEASIFL <fct>, BEP01FL <fct>, AEWITHFL <fct>,
#> # RANDDT <date>, TRTSDTM <dttm>, TRTEDTM <dttm>, TRT01SDTM <dttm>,
#> # TRT01EDTM <dttm>, TRT02SDTM <dttm>, TRT02EDTM <dttm>, AP01SDTM <dttm>,
#> # AP01EDTM <dttm>, AP02SDTM <dttm>, AP02EDTM <dttm>, EOSSTT <fct>, …