Data are generated so that the correlation between clusters is 0 and the correlation within clusters equals ICC in expectation. The data generating process used in this function is described at the following URL: https://stats.stackexchange.com/questions/263451/create-synthetic-data-with-a-given-intraclass-correlation-coefficient-icc

draw_normal_icc(mean = 0, N = NULL, clusters, sd = NULL,
  sd_between = NULL, ICC = NULL)

Arguments

mean

A number or vector of numbers, one mean per cluster. Defaults to 0 if not provided.

N

(Optional) A number indicating the number of observations to be generated. Must be equal to length(clusters) if provided.

clusters

A factor vector of cluster IDs, or a vector that can be coerced to one; its length determines the length of the generated data.

sd

A number or vector of numbers, indicating the standard deviation of each cluster's error terms, that is, the standard deviation within a cluster (default 1).

sd_between

A number or vector of numbers, indicating the standard deviation between clusters.

ICC

A number indicating the desired ICC.

Value

A vector of numbers corresponding to the observations from the supplied cluster IDs.

Details

The typical use for this function is for a user to provide an ICC and, optionally, a set of within-cluster standard deviations, sd. If the user does not provide sd, the default value is 1. These arguments imply a fixed between-cluster standard deviation.
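As an illustration of how an ICC and a within-cluster standard deviation pin down the between-cluster standard deviation, here is a minimal sketch. It assumes the usual variance-components definition, ICC = sd_between^2 / (sd_between^2 + sd^2); the helper name `implied_sd_between` is hypothetical and not part of the package.

```r
# Hypothetical helper (not a package function): between-cluster SD implied by
# a within-cluster SD and a target ICC, assuming
#   ICC = sd_between^2 / (sd_between^2 + sd^2)
implied_sd_between <- function(sd, ICC) {
  stopifnot(ICC >= 0, ICC < 1)
  sd * sqrt(ICC / (1 - ICC))
}

# With the default sd = 1 and ICC = 0.5, both variance components are equal
implied_sd_between(sd = 1, ICC = 0.5)
#> [1] 1

# Works the same way for other within-cluster SDs
implied_sd_between(sd = 3, ICC = 0.3)
```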

An alternate mode for the function is to provide between-cluster standard deviations, sd_between, and an ICC. These arguments imply a fixed within-cluster standard deviation.
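The alternate mode can be sketched the same way. Assuming the variance-components definition ICC = sd_between^2 / (sd_between^2 + sd^2), solving for the within-cluster standard deviation gives the hypothetical helper below (`implied_sd_within` is an illustrative name, not a package function).

```r
# Hypothetical helper (not a package function): within-cluster SD implied by
# a between-cluster SD and a target ICC, assuming
#   ICC = sd_between^2 / (sd_between^2 + sd^2)
implied_sd_within <- function(sd_between, ICC) {
  stopifnot(ICC > 0, ICC <= 1)
  sd_between * sqrt((1 - ICC) / ICC)
}

# sd_between = 4 with ICC = 0.2 means within-cluster variance is 4x the
# between-cluster variance, so the within-cluster SD is twice as large
implied_sd_within(sd_between = 4, ICC = 0.2)
#> [1] 8
```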

If users provide all three of ICC, sd_between, and sd, the function will warn the user and use the provided standard deviations for generating the data.
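When both standard deviations are supplied, they determine an ICC on their own, which may differ from the ICC argument. A small sketch of that implied value, again assuming ICC = sd_between^2 / (sd_between^2 + sd^2) and using a hypothetical helper name:

```r
# Hypothetical helper (not a package function): ICC implied by a pair of
# within- and between-cluster SDs under the variance-components definition
implied_icc <- function(sd, sd_between) {
  sd_between^2 / (sd_between^2 + sd^2)
}

# 16 / (16 + 9)
implied_icc(sd = 3, sd_between = 4)
#> [1] 0.64
```

Comparing this value with the ICC you intended to pass is a quick way to see whether the three arguments are mutually consistent before calling the function.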

Examples

# Divide observations into clusters
clusters = rep(1:5, 10)

# Default: unit variance within each cluster
draw_normal_icc(clusters = clusters, ICC = 0.5)
#> [1] -1.02554590 0.62443504 -2.31939115 -0.83443824 0.55794551 1.53804683
#> [7] -0.22289235 1.14690447 -0.65298416 2.17395191 0.93017782 -0.99045081
#> [13] -0.76061979 -2.46877160 1.19729849 1.37574171 -0.94850980 0.51731007
#> [19] -1.17555277 1.38167859 -0.47570412 -0.74476059 -0.12639827 -0.26539863
#> [25] 3.19550217 0.30373915 0.67777768 -1.76081700 -1.40004144 -0.93182015
#> [31] 0.45643320 -1.46012658 -1.09803959 0.71703320 1.03944986 -0.03334539
#> [37] -0.15605976 -1.01039834 -0.04367906 0.80156012 0.47585110 0.22116711
#> [43] -0.15391441 0.09899335 1.49281217 1.53738284 0.47059085 -1.67012692
#> [49] 0.43175770 1.57700318

# Alternatively, you can specify characteristics:
draw_normal_icc(mean = 10, clusters = clusters, sd = 3, ICC = 0.3)
#> [1] 9.209539 6.946955 7.280875 2.663388 15.406216 13.109010 9.931105
#> [8] 6.439851 7.340947 11.535676 11.219845 10.889006 9.943836 6.589445
#> [15] 7.329716 11.209727 8.794985 8.253895 10.731750 7.722170 9.411930
#> [22] 6.296899 9.374113 8.337104 6.355048 10.574635 7.147801 10.207142
#> [29] 12.951676 12.653866 10.357931 8.776974 10.735098 7.420525 9.172164
#> [36] 15.434823 4.686745 8.902265 8.857391 12.050750 6.447235 9.464802
#> [43] 4.916680 14.715999 7.217758 11.054769 12.617152 5.155777 4.699359
#> [50] 7.700490

# Can specify between-cluster standard deviation instead:
draw_normal_icc(clusters = clusters, sd_between = 4, ICC = 0.2)
#> [1] 13.6348820 -9.3986100 -10.1894901 -4.3994492 15.6367316 1.9925917
#> [7] -0.3076630 7.6256722 2.1741504 17.2390780 6.8659439 -4.6678638
#> [13] -1.7383786 5.5829177 5.7807091 5.6198047 -12.8067743 12.7308042
#> [19] -9.2940687 12.7807675 9.4914447 -10.0633085 10.5242161 4.0939280
#> [25] -13.9777692 -1.5699654 -11.3426364 9.7798558 -1.1224243 14.1309119
#> [31] 7.6503185 -13.0704663 3.9990677 6.2734686 0.9106016 7.0969074
#> [37] -17.4752527 -6.4680375 -0.3626732 12.3058826 2.3691358 -19.1565832
#> [43] 3.7634140 -0.2841549 -7.4997532 -1.2743522 -8.9896589 -0.1177371
#> [49] -5.2378603 -7.2251413

# Verify that ICC generated is accurate
corr_draw = draw_normal_icc(clusters = clusters, ICC = 0.4)
summary(lm(corr_draw ~ as.factor(clusters)))$r.squared
#> [1] 0.3853618
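
The R-squared check above is based on only 5 clusters, so a single draw can land some distance from the target. The sketch below is a base-R stand-in, not the package's code: it assumes a random-intercept process (a normal cluster effect plus normal within-cluster noise) with ICC = sd_between^2 / (sd_between^2 + sd^2), simulates many clusters, and applies the one-way ANOVA estimator of the ICC for balanced clusters. All variable names are illustrative.

```r
set.seed(42)

# Assumed design: many equally sized clusters so the estimate is stable
n_clusters   <- 2000
cluster_size <- 10
sim_clusters <- rep(seq_len(n_clusters), each = cluster_size)

# Target ICC and the between-cluster SD it implies for sd_within = 1
target_icc <- 0.4
sd_within  <- 1
sd_between <- sd_within * sqrt(target_icc / (1 - target_icc))

# Assumed random-intercept process: cluster effect + individual noise
cluster_effect <- rnorm(n_clusters, mean = 0, sd = sd_between)
y <- cluster_effect[sim_clusters] +
  rnorm(length(sim_clusters), mean = 0, sd = sd_within)

# One-way ANOVA estimator of the ICC (balanced clusters)
ms_between <- cluster_size * var(tapply(y, sim_clusters, mean))
ms_within  <- mean(tapply(y, sim_clusters, var))
icc_hat <- (ms_between - ms_within) /
  (ms_between + (cluster_size - 1) * ms_within)

# Should be close to target_icc
icc_hat
```

With this many clusters the estimate should sit close to the target; with only a handful of clusters, as in the example above, larger deviations are expected.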