Simulated and synthetic (simsyn) data

The IRW includes a branch of simulated and synthetic item response datasets, collectively referred to as simsyn. These data follow the same IRW data standard as the core repository and are accessed with the same irw package functions by setting source = "sim".

Accessing the simsyn data

Documentation for the available tables can be found here.

R:
library(irw)

tables <- irw_list_tables(source = "sim")     # List available simsyn tables
df <- irw_fetch(tables$name[1], source = "sim") # Fetch a simsyn table

Python:
import irw

tables = irw.list_tables(source="sim")        # List available simsyn tables
df = irw.fetch(tables["name"].iloc[0], source="sim")  # Fetch a simsyn table

Use cases

Simsyn data are particularly useful when you want to:

  • Benchmark estimation methods against data with known ground-truth parameters
  • Develop or test new psychometric approaches without relying on real participants
  • Reproduce analyses from published work that relied on proprietary data, using a synthetic stand-in
  • Teach IRT concepts using data that behave predictably under a given model
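
The benchmarking use case can be illustrated with a small, self-contained Python sketch that does not touch the irw package at all. It simulates dichotomous responses under a Rasch model with known difficulties, then checks how well a deliberately crude estimator recovers their ordering. The function names (simulate_rasch, crude_difficulty, pearson), the long-format id/item/resp layout, and all parameter values are illustrative assumptions, not part of the IRW or its API.

```python
import math
import random


def simulate_rasch(n_persons, difficulties, seed=0):
    """Simulate 0/1 responses under a Rasch (1PL) model.

    Returns long-format rows with hypothetical keys id/item/resp,
    plus the true abilities used to generate them.
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    rows = []
    for person, theta in enumerate(abilities):
        for item, b in enumerate(difficulties):
            p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
            rows.append({"id": person, "item": item,
                         "resp": int(rng.random() < p_correct)})
    return rows, abilities


def crude_difficulty(rows, n_items):
    """Negative logit of each item's proportion correct.

    A rough proxy for difficulty, not a maximum-likelihood estimator.
    """
    correct = [0] * n_items
    total = [0] * n_items
    for row in rows:
        correct[row["item"]] += row["resp"]
        total[row["item"]] += 1
    estimates = []
    for c, t in zip(correct, total):
        p = (c + 0.5) / (t + 1.0)  # light smoothing avoids log(0)
        estimates.append(-math.log(p / (1.0 - p)))
    return estimates


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Ground truth is known because we generated the data ourselves.
true_b = [-1.5, -0.5, 0.0, 0.5, 1.5]
rows, _ = simulate_rasch(n_persons=2000, difficulties=true_b, seed=42)
est_b = crude_difficulty(rows, len(true_b))
r = pearson(true_b, est_b)
print("correlation between true and estimated difficulties:", round(r, 3))
```

With a simsyn table that ships its generating parameters, the same comparison would apply to a real estimation method in place of crude_difficulty.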

For realistic simulation of item difficulty parameters informed by the empirical IRW distribution, see the Simulating Item Difficulties vignette.