More complicated level creation with variable numbers of observations

level() can be used to create more complicated patterns of nesting. For example, when creating lower level data, it is possible to use a different N for each of the values of the higher level data:

variable_data <-
  fabricate(
    cities = level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = level(N = c(2, 4), age = runif(N, 18, 70))
  )
variable_data
cities elevation citizens age
1 1778 1 46
1 1778 2 50
2 1499 3 35
2 1499 4 65
2 1499 5 34
2 1499 6 23

Here, each city has a different number of citizens, and the value of N used to create the age variable updates automatically. The result is a dataset with 6 citizens: 2 in the first city and 4 in the second. As long as N is either a single number or a vector with the same length as the current lowest level of the data, level() will know what to do.
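To see how a vector N maps onto rows, the sketch below rebuilds the city index from the example using only base R. It illustrates the resulting layout, not fabricatr's internal implementation, and the names n_per_city, city_index and total_n are made up for this sketch.

```r
# Hypothetical names, for illustration only.
n_per_city <- c(2, 4)   # one N per city, as in level(N = c(2, 4))

# Repeat each city's index once per citizen in that city.
city_index <- rep(seq_along(n_per_city), times = n_per_city)
city_index
# 1 1 2 2 2 2

# Total number of citizen rows; age needs one value per row.
total_n <- sum(n_per_city)
total_n
# 6
```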

It is also possible to pass a function call to N, enabling a random number of citizens per city:

my_data <-
  fabricate(
    cities = level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = level(N = sample(1:6, size = 2, replace = TRUE), 
                     age = runif(N, 18, 70))
  )
my_data
cities elevation citizens age
1 1850 1 53
2 1128 2 55
2 1128 3 45
2 1128 4 42
2 1128 5 47
2 1128 6 69

Here, each city is given a random number of citizens between 1 and 6. Since the sample() function returns a vector of length 2, this is like specifying 2 separate Ns as in the example above.
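Because sample() draws at random, the number of citizens per city changes from run to run. A minimal base R sketch of the same draw, assuming you want reproducible sizes, sets a seed first. The seed value and the names n_cities and n_per_city are arbitrary choices for this illustration.

```r
set.seed(42)  # arbitrary seed, chosen only for this sketch

n_cities <- 2
# One draw per city; replace = TRUE lets two cities share the same size.
n_per_city <- sample(1:6, size = n_cities, replace = TRUE)

length(n_per_city)  # one N per city
sum(n_per_city)     # total number of citizen rows this vector would produce
```

Calling set.seed() before fabricate() in the same way should make the simulated data repeatable, since the random draws come from R's own generator.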

Finally, it is possible to define N on the basis of higher level variables themselves. Consider the following example:

variable_n_function <- fabricate(
  cities = level(N = 5, population = runif(N, 10, 200)),
  citizens = level(N = round(population * 0.3))
)
head(variable_n_function)
cities population citizens
1 1 90 001
1.1 1 90 002
1.2 1 90 003
1.3 1 90 004
1.4 1 90 005
1.5 1 90 006

Here, each city has a defined population, and the number of citizens in the simulated data is 30% of that population, rounded to the nearest whole number. Because populations differ across cities, each city gets a different number of citizens.
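The arithmetic behind N can be checked in base R on its own. The populations below are made-up values, not the ones drawn in the example, and n_per_city is a hypothetical name.

```r
# Hypothetical city populations, for illustration only.
population <- c(90, 40, 150)

# Same expression as in level(N = round(population * 0.3)).
n_per_city <- round(population * 0.3)
n_per_city
# 27 12 45

# Total number of citizen rows across all cities.
sum(n_per_city)
# 84
```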

Averages within higher levels of hierarchy

You may want to include the mean value of a variable within a group defined by a higher level of the hierarchy, for example the average income of citizens within each city. You can do this with ave():

ave_example <- fabricate(
  cities = level(N = 2),
  citizens = level(N = 1:2,
                   income = rnorm(N),
                   income_mean_city = ave(income, cities))
)
ave_example
cities citizens income income_mean_city
1 1 -1.22 -1.2
2 2 -0.46 0.2
2 3 0.86 0.2
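
ave() is a base R function, so its behaviour can be checked outside fabricate(). The sketch below uses made-up incomes chosen so the group means are easy to verify by hand. It also shows that a different summary can be supplied through the FUN argument, which must be passed by name.

```r
# Made-up values, for illustration only.
cities <- c(1, 2, 2)
income <- c(10, 20, 40)

# Default summary is the group mean, returned once per row.
income_mean_city <- ave(income, cities)
income_mean_city
# 10 30 30

# Any other summary goes in through the named FUN argument.
income_max_city <- ave(income, cities, FUN = max)
income_max_city
# 10 40 40
```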

Tidyverse integration

Because the functions in fabricatr take data and return data, they are easily slotted into a tidyverse workflow:

library(dplyr)

# letting higher levels depend on lower levels

my_data <- 
fabricate(
    cities = level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = level(N = c(2, 3), age = runif(N, 18, 70))
  ) %>%
  group_by(cities) %>%
  mutate(pop = n())

my_data
cities elevation citizens age pop
1 1715 1 45 2
1 1715 2 48 2
2 1573 3 65 3
2 1573 4 41 3
2 1573 5 61 3
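
# piping existing data into fabricate() and nesting a new level beneath it
# (tibble() replaces the deprecated data_frame())

my_data <-
  tibble(Y = sample(1:10, 2)) %>%
  fabricate(lower_level = level(N = 3, Y2 = Y + rnorm(N)))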
my_data
Y lower_level Y2
6 1 6.6
6 2 5.5
6 3 5.3
8 4 7.8
8 5 8.6
8 6 8.0