Training Resources
IMPS 2025 Datathon
The Challenge: Predicting Accuracy of Executing Functioning tasks
Your task is to develop predictive models leveraging information about the stimulus, response time, and demographic features to predict accuracy on a holdout sample. The Hearts and Flowers (H&F) task is a cognitive task that measures inhibitory control and cognitive flexibility, two components of executive functions. For more information on this task from the group that collected these data, see here. A respondent is shown either a heart or flower on the left or right of the screen. If the shape is a heart, the respondent needs to press the button on the same side whereas if it is a flower they press the button on the opposite side. Participants respond to visual stimuli following simple (congruent: hearts or flowers) or conflicting (incongruent: hearts and flowers) rules. Key features of the dataset include:
Different task conditions (congruent, incongruent, mixed)
Response time
Demographic variables (such as age)
The data are structured in an Item Response Warehouse (IRW)-consistent format (more information available here).
The data
The H&F data for the competition were collected from a large urban district in the US. In particular, these data were collected in school environments. Students completed the Hearts and Flowers measures on digital devices during the school day.
The data are available here [to be posted June 9 2025].
Note that we are posting two datasets. Best practice is to develop a predictive approach in the training dataset and then report accuracy in the test dataset.
Variables
In the competition, you need to predict whether the response is accurate.
resp
: Response accuracy (1 = correct, 0 = incorrect).
The following information is available for prediction.
id
: Unique identifier for each respondent.cov_grade
: Respondent’s current school grade level.cov_female
: Respondent’s gender (0 = male, 1 = female).cov_age
: Respondent’s age in years (numeric, continuous).trial_num
: The trial number within the respondent’s task session (starts at 1).resp_side
: Side on which the response was made (either “left” or “right”); can be ignored for scoring (seeresp
).rt
: Response time in seconds.block
: Block label indicating task phase (e.g., “practice”, or test condition).time_limit
: Time limit in place for that particular data collection.stim_side
: Side on which the stimulus was shown (“left” or “right”).time
: Season during which data were collected (e.g., “Fall”).stim_shape
: Shape of the stimulus shown on screen (e.g., “heart”, “floewr”).cohort
: Condition or experimental grouping label (e.g., “plus”).
Hints
We make a few observations about this data to help respondents get oriented with the H&F data.
The H&F data involve relatively weakly defined ‘items’. Rather, persons respond to trials where the stimuli are varying in along several dimensions (e.g., side shape is presented on, what shape is presented, whether the trial is in one of the congruent or incongruent blocks, etc).
Response time is potentially a valuable predictor of response accuracy. In particular, note that there were different time limits in place at various points of data collection.
Judging criteria
We note two criteria that will be used in judging this competition. Some attention will be paid to accuracy in the test dataset; in particular, please report the root mean squared error between predicted and observed responses in your report. Specifically, please report the root mean squared error between predicted probabilities (i.e., model-estimated probabilities of a correct response) and the actual binary responses (0 = incorrect, 1 = correct) in the test dataset. Note that there will also be a subjective evaluation of the degree to which the predictive approach is novel or forwards new insights about the nature of the response process.
Submission Instructions
Entrants should submit three components (using the below link).
- A written report should be formatted using Times New Roman 12 point with double spacing and one-inch margins, following the structure below:
Cover page (one page maximum) including:
- Designated Team Lead’s name, email, and affiliation with a graduate program;
- Name of other team members and their affiliation with a graduate program;
- Name of the presenter (can be different from the team lead but must be one of the team members) and email.
Description (five pages maximum) including:
- Research question
- Methodology
- Key findings including the RMSE between predicted and actual accuracy in the test dataset
- Implications
- References, optional (one page maximum)
- Analysis code to ensure reproducibility.
- Visualizations or slides to effectively communicate insights.
Submissions need to be made here.
Suggested Readings
Camerota, M., Willoughby, M. T., Magnus, B. E., & Blair, C. B. (2020). Leveraging item accuracy and reaction time to improve measurement of child executive function ability. Psychological Assessment, 32(12), 1118–1132. https://doi.org/10.1037/pas0000953
Domingue, B.W., Braginsky, M., Caffrey-Maffei, L., Gilbert, J.B., Kanopka, K., Kapoor, R., Liu, Y., Nadela, S., Pan, G., Zhang, L., Zhang, S., & Frank, M. (2025). Solving the problem of data in psychometrics: An introduction to the Item Response Warehouse (IRW). PsyArXiv. https://doi.org/10.31234/osf.io/7bd54
Obradović, J., Sulik, M. J., Finch, J. E., & Tirado-Strayer, N. (2018). Assessing students’ executive functions in the classroom: Validating a scalable group-based procedure. Journal of Applied Developmental Psychology, 55, 4-13. https://psycnet.apa.org/record/2018-17703-003
Stanford Project on Adaptation and Resilience in Kids. (n.d.). Assessment of Motivation, Effort, and Self-Regulation (AMES). Stanford University. Retrievable from https://sparklab.stanford.edu/ames
Wu, T., Weiland, C., McCormick, M., Hsueh, J., Snow, C., & Sachs, J. (2024). One Score to Rule Them All? Comparing the Predictive and Concurrent Validity of 30 Hearts and Flowers Scoring Approaches. Assessment, 31(8), 1702-1720. https://journals.sagepub.com/doi/10.1177/10731911241229566
V International Meeting of Psychometrics and Neuropsychological Evaluation
Slides for this meeting can be found here.
IOMW 2025
Information here pertains to the April 2025 demonstration of the IRW at the IOMW conference held in Boulder, CO. Slides for this training can be found here. Prior to attending the conference, please go through the following checklist:
Install R and the necessary packages. We will be working with R. Make sure you have R installed. We will also use some specific packages that you will need to install. Please follow the instructions here.
Download the necessary data. Please download the zip file here and store is so that you can access it readily during the training. This file will make it possible to follow along without depending on Redivis authentication.
Prepare to interact with IRW data. I will discuss several methods of interacting with IRW data. These methods require a few minutes of preparation. You only need to choose one of them and I recommend you try the second.
You can use a Redivis notebook. To get started, go here. If you have not previously used Redivis you will need to create an account. After doing so, you can attempt to run the code (and should be good to go if you can run the lines showing the first few rows of the dataset). To run the code, you will first need to
fork
the notebook (chooseclone this workflow
) and then you should be able to ‘start notebook’ and execute the code blocks.You can use
irw
(the preferred option). To do so, you will need to follow the instructions shown here. If you plan to work with the data this way, you can test that you are set up to do so by trying to successfully execute the following in your local R environment:df<-irw::irw_fetch("DART_Brysbaert_2020_1") head(df)
We recommend working with irw
given that you can work in your local environment and utilize the filtering techniques we are developing. Please let me know (email ben.domingue at gmail.com) if you have any problems/questions!