fabricate() helps you simulate a dataset before you collect it. You can either start with your own data and add simulated variables to it (by passing data to fabricate()) or start from scratch by defining N. Use level() to create hierarchical data with multiple levels, such as citizens within cities within states. You can use any R function to create each variable. We also provide built-in functions, draw_binary and draw_discrete, for drawing binary and count outcomes.

Fabricate a Level of Data for Hierarchical Data

fabricate(data, N, ID_label, ...)

level(N = NULL, ...)

Arguments

data

(optional) user-provided data that forms the basis of the fabrication, i.e. you can add variables to existing data. Provide either N or data; if data is provided, N is the number of rows of the data.

N

(optional) number of units to draw. If provided as fabricate(N = 5), this determines the number of units in the single-level data. If provided in level(), e.g. fabricate(cities = level(N = 5)), N determines the number of units in a specific level of a hierarchical dataset.

ID_label

(optional) variable name for the ID variable, e.g. citizen_ID

...

Variable or level-generating arguments, such as my_var = rnorm(N). For fabricate, you may also pass level() arguments, which define a level of a multi-level dataset. See examples.
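
As a minimal sketch of how data, N, and the variable-generating arguments fit together: the example below passes a small data frame and adds two variables. The data frame survey and the column names age, age_sq, and noise are hypothetical, chosen only for illustration. It assumes that expressions in ... can refer to columns already present in data as well as to N.

```r
library(fabricatr)

set.seed(42)  # only so the sketch is reproducible

# Hypothetical existing data to build on
survey <- data.frame(age = c(23, 41, 35, 58))

# N is not supplied; it is taken from the number of rows of `survey`.
# `age_sq` is computed from an existing column, `noise` is drawn using N.
out <- fabricate(
  data = survey,
  age_sq = age^2,
  noise = rnorm(N)
)

head(out)
```

Because N is derived from the data here, supplying both data and N is unnecessary.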

Value

data.frame

Examples

# Draw a single-level dataset with no covariates
df <- fabricate(N = 100)
head(df)
#>    ID
#> 1 001
#> 2 002
#> 3 003
#> 4 004
#> 5 005
#> 6 006
# Draw a single-level dataset with a covariate
df <- fabricate(
  N = 100,
  height_ft = runif(N, 3.5, 8)
)
head(df)
#>    ID height_ft
#> 1 001  6.945201
#> 2 002  6.963537
#> 3 003  7.958205
#> 4 004  7.867344
#> 5 005  5.251322
#> 6 006  5.575339
# Start with existing data
df <- fabricate(
  data = df,
  new_variable = rnorm(N)
)

# Draw a two-level hierarchical dataset
# containing cities within regions
df <- fabricate(
  regions = level(N = 5),
  cities = level(N = 2, pollution = rnorm(N, mean = 5))
)
head(df)
#>   regions cities pollution
#> 1       1     01  5.239065
#> 2       1     02  5.236321
#> 3       2     03  4.740881
#> 4       2     04  5.649046
#> 5       3     05  3.782359
#> 6       3     06  5.841970
# Start with existing data and add variables to hierarchical data
# note: do not provide N when adding variables to an existing level
df <- fabricate(
  data = df,
  regions = level(watershed = sample(c(0, 1), N, replace = TRUE)),
  cities = level(runoff = rnorm(N))
)
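
The two-level output above shows two cities in each region, which suggests that N inside a nested level() counts units per parent unit rather than the total. The sketch below checks this by counting rows. It reuses the regions and cities levels from the example above and is an illustration under that reading, not a statement of the package's full nesting rules.

```r
library(fabricatr)

# Five regions, each containing two cities
df <- fabricate(
  regions = level(N = 5),
  cities = level(N = 2, pollution = rnorm(N, mean = 5))
)

# Total number of rows: one per city across all regions
nrow(df)

# Number of cities within each region
table(df$regions)
```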