Small numbers are an opportunity, not a problem

Aims: outcomes of research in education and training are partly a function of the context in which a study takes place, the questions we ask, and what is feasible. Many questions are about learning, which involves repeated measurements in a particular time window, and the practical context is usually such that offering an intervention to some but not all learners does not make sense or is unethical. For quality assurance and other purposes, education and training centers may have very locally oriented questions that they seek to answer, such as whether an intervention can be considered effective in their context of small numbers of learners. While the rationale behind the design and outcomes of this kind of study may be of interest to a much wider community, for example to study the transferability of findings to other contexts, people are often discouraged from reporting on the outcomes of such studies at conferences or in educational research journals. The aim of this paper is to counter that discouragement and encourage people to see small numbers as an opportunity rather than a problem.


Introduction
In research in education and training, at least where statistical analysis is involved, small numbers of participants are often considered a problem and a reason either for not carrying out a study that might yield important results or for not presenting the outcomes of a completed study to a wider audience. This is unfortunate for several reasons. To start, even though virtually any study on education or training takes place in a particular context, the reasoning behind a given study, as well as its design and outcomes, may hold useful lessons for a much wider audience. In addition, for publication purposes and equally for internal quality assurance and accreditation purposes, we want education and training to be evidence based, and appropriately designed studies with numbers of participants large or small provide the best if not the only way to enable that. Finally, from an ethical and usually also financial and logistic perspective, research ought not to be about getting ever larger numbers of people to participate in our studies; instead, a key principle should be to not use more resources than necessary. Some institutions and centers may have the luxury of involving hundreds of students in some of their research, whereas in other places the numbers are much smaller.
In settings other than learning, an alternative form of SCED could be found in randomized combinations of intervention / no intervention for each of a series of trials, but in learning that is often not an option because learning at one point in time tends to carry over to subsequent measurement occasions.
If variation in the starting point of an intervention is not considered feasible, for example because the seven residents in question take the training at the same time and it is considered important to give every resident five practice (i.e., measurement) occasions prior to and five occasions after the introduction of the intervention, we are dealing with a form of SCD that is sometimes also referred to as an interrupted time series design, but not with a form of SCED. After all, in such a design, although there is still a manipulation in the form of a baseline (i.e., prior to intervention) condition and an intervention condition, the moment when the intervention is introduced no longer varies between participants (i.e., it is after five occasions for all participants) and is not randomized.
Nevertheless, the outcomes of this study can still provide useful insights for decision making in the department as well as for informing similar studies in other settings. Therefore, Table 1 presents a simulated example of what the data could look like for the seven hypothetical residents.

TABLE 1 - The (simulated) performance data of 7 students (ID #1-#7) before and during the intervention, with five measurement occasions within each phase (i.e., time_in_phase), and each occasion resulting in an integer performance score from 0 (min) to 10 (max).

The data matrix in Table 1 presents the performance indicated by an integer score ranging from 0 (min) to 10 (max) for each of the seven residents for each of ten measurement occasions, with the first five measurement occasions being in the baseline phase (i.e., Phase 0) and the last five measurement occasions being in the intervention phase (i.e., Phase 1). Time in phase in Table 1 indicates the number of measurement occasions prior to the end of the phase.
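The layout described for Table 1 can be sketched in long format, with one row per resident per occasion. The Python sketch below builds only the design columns and no scores. The names `design_rows`, `N_RESIDENTS`, and `OCCASIONS_PER_PHASE` are illustrative and not taken from the paper, which used R. It assumes that time_in_phase counts down to 0 at the last occasion of each phase, in line with the description of Table 1.

```python
N_RESIDENTS = 7          # residents #1-#7
OCCASIONS_PER_PHASE = 5  # five baseline and five intervention occasions


def design_rows(n_residents=N_RESIDENTS, per_phase=OCCASIONS_PER_PHASE):
    """Return one dict per resident per occasion (design columns only)."""
    rows = []
    for resident in range(1, n_residents + 1):
        for occasion in range(1, 2 * per_phase + 1):
            # Phase 0 = baseline (first block), Phase 1 = intervention.
            phase = 0 if occasion <= per_phase else 1
            # Position of this occasion within its phase (1..per_phase).
            position = (occasion - 1) % per_phase + 1
            # Occasions remaining until the end of the phase: 4, 3, 2, 1, 0.
            time_in_phase = per_phase - position
            rows.append({
                "id": resident,
                "occasion": occasion,
                "phase": phase,
                "time_in_phase": time_in_phase,
            })
    return rows


rows = design_rows()
for row in rows[:10]:  # the ten occasions of resident #1
    print(row)
```

With this coding, a value of 0 for time_in_phase marks the end of each phase, which is where the two phases are compared.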

A mixed model
To analyze these data, we need a method that can account for the fact that the seven residents times ten measurement occasions are not seventy independent observations but seven sets of correlated observations (such a method is also called a mixed model, e.g., (1), (7)), with the correlation between occasions decreasing as time between occasions increases, and -since an intervention can have

Scientia Medica Porto Alegre, v. 31, p.
are presented in Figure 1 and Table 2 (R version used for this paper: 4.0.5).
In plain language, these four coefficients mean the following:

• B0: the model's score (i.e., red line in Figure 1) at the end of the baseline phase, which is in this case at occasion i = 5;
• B1: the model's difference between the end of the intervention phase and the end of the baseline phase;
• B2: the model's slope in the baseline phase (which given the coding is negative when scores go up in the baseline phase, and vice versa); and
• B3: the difference between the model's slopes for baseline and intervention (in statistical terms, the interaction effect).
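Read together, these four descriptions are consistent with a fixed part of the model of the form score = B0 + B1 × phase + B2 × time_in_phase + B3 × phase × time_in_phase. The Python sketch below evaluates that line at the start and end of each phase. The coefficient values are hypothetical placeholders chosen for illustration, not the estimates reported in Table 2, and the sketch assumes that time_in_phase counts down to 0 at the end of each phase.

```python
def model_score(phase, time_in_phase, b0, b1, b2, b3):
    """Fixed-effects prediction for a given phase (0 or 1) and time_in_phase."""
    return b0 + b1 * phase + b2 * time_in_phase + b3 * phase * time_in_phase


# Hypothetical coefficients (NOT the values from Table 2).
coef = {"b0": 4.0, "b1": 3.0, "b2": -0.5, "b3": 0.25}

end_baseline = model_score(0, 0, **coef)        # equals B0
end_intervention = model_score(1, 0, **coef)    # equals B0 + B1
start_baseline = model_score(0, 4, **coef)      # B0 + 4 * B2
start_intervention = model_score(1, 4, **coef)  # B0 + B1 + 4 * (B2 + B3)

# Because time_in_phase counts down, a negative B2 means scores rise
# over the baseline phase; the intervention-phase slope is B2 + B3.
print("baseline:    ", start_baseline, "->", end_baseline)
print("intervention:", start_intervention, "->", end_intervention)
```

With these placeholder values, the baseline line rises from 2.0 to 4.0 and the intervention line from 6.0 to 7.0, so B1 = 3.0 is the gap between the two phase endpoints.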
Thus, in Table 2

And what if the proposed mixed model does not work?
Apart from its applicability to individual data, an important strength of the presented mixed model is that it accounts for baseline trends (through B2) and different trends between phases (through B3) and can be extended to more than two phases if more than two phases are present, for instance in a study with more than one intervention. However, one requirement for this model to work is that we deal with scale outcome variables such as the integer performance score in the example.
When we deal with dichotomous outcomes (e.g.,
If the outcomes were less clear than in the current example, the outcomes of different residents could be combined into an overall estimate accounting for the time series data structure (for an example, see (12)). Finally, to provide an estimate of the proportion of residents for which this intervention could be effective, we can use the same Binomial procedure: Beta(1,1) + Beta(7,0) = Posterior Beta(8,1).
In this formula, Beta(7,0) comes from the intervention having an effect for all seven residents.
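The update above can be checked with a few lines of arithmetic. The Python sketch below adds the counts (7 residents with an effect, 0 without) to the Beta(1,1) prior and summarizes the resulting Beta(8,1) posterior. The posterior mean and the one-sided 95% lower bound are added here for illustration and are not reported in the text. The lower bound relies on the fact that a Beta(a, 1) distribution has cumulative distribution function x^a, so it applies only when the second posterior parameter equals 1. All function names are illustrative.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: add observed counts to the Beta prior parameters."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def lower_bound_beta_a1(a, tail=0.05):
    """Lower bound x with P(theta <= x) = tail, valid for Beta(a, 1) only.

    For Beta(a, 1) the CDF is x**a, so x = tail ** (1 / a).
    """
    return tail ** (1.0 / a)


# Beta(1,1) prior; the intervention had an effect for 7 of 7 residents.
post_a, post_b = beta_binomial_update(1, 1, successes=7, failures=0)
post_mean = beta_mean(post_a, post_b)
lower_95 = lower_bound_beta_a1(post_a)  # post_b == 1 here

print(f"Posterior: Beta({post_a},{post_b})")
print(f"Posterior mean: {post_mean:.3f}")
print(f"One-sided 95% lower bound: {lower_95:.3f}")
```

Under these assumptions the posterior mean is 8/9, and the lower bound gives a rough sense of how much uncertainty remains with only seven residents.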