Adverse event
A harmful or negative event. Adverse events can affect individuals or groups. In the context of children in foster care, an adverse event refers to any traumatic or distressing event experienced by a child during their stay in a foster home.
Adverse effect (or harm)
Any negative consequence or negative effect that arises from an intervention or program. Adverse effects studied in children in foster care include behavioural and emotional problems, including aggression, defiance, anxiety, and depression.
Analysis
The process of looking for patterns in information to identify cause and effect, or to answer specific questions, such as whether an intervention works and what the risks are. There are two main types of analysis. Quantitative analysis looks for patterns in the form of numbers, such as the most frequent choice of treatment option or the average rating of pain during treatment. Qualitative analysis looks for patterns of meaning, feeling or belief. It can lead to a finding such as ‘most people who support paying more for end-of-life therapy also believe society should give more to those with greater need’.
Base-line data
The data collected from / about study participants before the programme. For example, a programme to improve literacy might test children’s reading beforehand (this is the ‘base-line’ data) and after it (‘end-line’) and compare the two.
Bias
Bias refers to systematic error or distortion in the conduct of research or analysis. Bias can result from various factors, undermining a study’s quality and compromising the reliability and validity of its results. Types of bias include selection bias: an example would be if certain children are excluded from the study based on their age, race, or other characteristics. Another type is measurement bias, in which findings are distorted by inappropriate or inconsistent measurement tools. For example, if a study relies on self-report measures to assess children’s well-being, there may be inherent biases in how the children interpret and respond to the questions.
Child maltreatment
The World Health Organization (WHO) describes child maltreatment as abuse and neglect of children under 18 years of age. Child maltreatment covers all types of physical, emotional, and sexual abuse, neglect, negligence and commercial or other exploitation, which results in actual or potential harm to the child’s health, survival, development, or dignity in the context of a relationship of responsibility, trust, or power.
Controlled before-and-after studies
A study in which observations are made before and after an intervention, both in a group that receives the intervention and in a “control group” that does not. People are allocated to the two groups by factors outside the researchers’ control, such as whether the schools studied signed up to get the programme.
Dependent variable
A dependent variable is something that depends on other factors. For example, in a study of the effect of a programme on children’s scores on standardised tests, the dependent variable would be the children’s scores. Researchers often try to identify what causes changes in the dependent variable.
Difference-in-difference or other econometric designs
A quasi-experimental design (QED) is a prospective study (i.e., people or organisations are recruited into it); the study has groups which get a programme and ‘control groups’ which get either a different programme or no intervention, but those groups are not randomised. QEDs are sometimes used to estimate the effect of a specific intervention (such as the passage of a law, enactment of a policy, or large-scale programme implementation) by comparing the changes in outcomes over time between the intervention group(s) and the control group(s).
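As a sketch of the difference-in-difference logic (not specific to any study in the EGM), the estimated effect is the change over time in the intervention group minus the change over time in the control group:

```latex
% Difference-in-difference estimator: the change over time in the
% intervention (treated) group minus the change in the control group.
\widehat{\delta}_{\mathrm{DiD}}
  = \left(\bar{Y}^{\mathrm{treat}}_{\mathrm{post}} - \bar{Y}^{\mathrm{treat}}_{\mathrm{pre}}\right)
  - \left(\bar{Y}^{\mathrm{control}}_{\mathrm{post}} - \bar{Y}^{\mathrm{control}}_{\mathrm{pre}}\right)
```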
Experimental design
A research design used to establish cause-and-effect relationships between independent and dependent variables through the manipulation of variables, control groups and randomisation. A randomised controlled trial is an example of an experiment. Here, participants are randomly allocated to groups which get the programme or some version of the programme (the ‘intervention groups’) and control groups which do not get the programme, manipulating the independent variable (e.g., providing instruction on reading). Participants are assessed after (or before and after) the manipulation of the independent variable in order to assess its effect on the dependent variable (e.g., the reading skills of children in both the intervention and control groups are assessed). Differences between the groups are assumed to be due to the intervention.
Effect size
The observed association between interventions and outcomes. It allows us to quantify and compare the strength of the relationship between variables. For example, a reading programme might have an ‘effect size’ of 10 percentage points, meaning that it appears to increase children’s reading levels by 10 percentage points. It shows the size of the difference in the dependent variable (here, reading levels) between two or more groups (here, the children who did the reading programme vs. those who did not).
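Effect sizes can also be expressed in standardised units. One common measure is Cohen’s d: the difference between two group means divided by the pooled standard deviation. A sketch of the standard formulas:

```latex
% Cohen's d: standardised difference between two group means.
d = \frac{\bar{X}_{\mathrm{intervention}} - \bar{X}_{\mathrm{control}}}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
```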
Effectiveness
The degree to which an intervention or programme achieves its intended goals or objectives. It is a measure of the programme’s success.
End-line
The data collected from / about study participants after the programme. For example, a programme to improve literacy might test children’s reading beforehand (this is the ‘base-line’ data) and after it (‘end-line’) and compare the two.
Evidence and gap maps
Evidence and gap maps (EGMs) are systematic evidence synthesis products with a visual presentation of the existing evidence relevant to a specific research question. They display where evidence is concentrated and where there are gaps. The EGM on this site is organised with the rows as intervention types and the columns as outcome types, though a map can be arranged in other ways. An EGM by itself does not show what the studies say (hence our Guidebook, which does that).
Generalizability
The extent to which conclusions from one study can be applied to the population as a whole. For instance, suppose a study in Kenya found that treating school children for intestinal worms reduced absenteeism. That result might ‘generalise to’ (i.e., also be true in) another place where intestinal worms are prevalent. But that result would not ‘generalise to’ (i.e., be true in) Scotland, where intestinal worms are not prevalent and do not cause school absences.
Interrupted time series study
A study that uses observations at multiple time points before and after an intervention (the “interruption”). The design attempts to detect whether the intervention has a significantly greater effect than any underlying trend over time.
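A common analysis for this design is segmented regression. The sketch below, with entirely hypothetical monthly data and an assumed interruption at month 12, fits a level and a trend before and after the intervention (using the statsmodels library):

```python
# Minimal sketch of a segmented-regression interrupted time series analysis.
# All data and variable names here are hypothetical illustrations.
import numpy as np
import statsmodels.api as sm

months = np.arange(24)                     # 24 monthly observations
post = (months >= 12).astype(int)          # intervention introduced at month 12
time_since = np.where(post, months - 12, 0)

# Hypothetical outcome: an underlying upward trend plus a level drop
# after the intervention, with random noise.
rng = np.random.default_rng(0)
y = 50 + 0.5 * months - 6 * post + rng.normal(0, 1, 24)

X = sm.add_constant(np.column_stack([months, post, time_since]))
model = sm.OLS(y, X).fit()
# Coefficients: intercept, pre-existing trend, level change at the
# interruption, and change in trend after it.
print(model.params)
```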
Interventions
The programme or policy which is implemented. All studies on the EGM look at the effect of some intervention(s) on some outcome(s). Four intervention categories were included in this EGM and Guidebook: prevention, disclosure, response and treatment.
Institution
“Institution” means any public or private organisation, body, agency, association, club, or other entity (incorporated or unincorporated) that provides, or has at any time provided, activities, facilities, programmes or services of any sort through which adults have contact with children, including through their families. Institutions included in the EGM include: kindergarten/preschool/centre-based early childhood education and care settings; schools/before and after-school care settings; sport and recreation settings; dance, drama and music studios/schools; churches/religious institutions; summer/vacation camps; out-of-home care settings (including foster care, residential care, orphanages); detention centres/juvenile justice settings; rescue centres; primary and secondary health care facilities, etc.
Independent variable
The variable that the researcher expects to be associated with an outcome of interest. For example, if a researcher wants to examine the relationship between parental education and children’s language development, parent education (years of schooling or highest level of education completed) is the independent variable. Sometimes, this variable is referred to as the treatment or causal variable.
Meta-analysis
Meta-analysis is a statistical technique that combines the findings of multiple studies to explore the overall effect or pattern of results. It involves systematically reviewing the primary studies, extracting their effect sizes, and combining them to provide a more comprehensive understanding of the topic. The studies need to assess similar programmes and outcomes in similar populations for the combination to be meaningful. A well-conducted meta-analysis is less subject to bias than any individual primary study, so it gives a more precise view of the evidence as a whole.
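As an illustration of the underlying arithmetic, a simple fixed-effect meta-analysis weights each study’s effect size by the inverse of its variance, so more precise studies count for more. The effect sizes and standard errors below are hypothetical:

```python
# A minimal sketch of fixed-effect meta-analysis using inverse-variance
# weighting. The effect sizes and standard errors are made up.
import numpy as np

effects = np.array([0.30, 0.45, 0.15])   # effect sizes from three studies
ses = np.array([0.10, 0.20, 0.08])       # their standard errors

weights = 1 / ses**2                      # more precise studies get more weight
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```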
Mid-line data
The data collected from / about study participants part-way through the programme. For example, a programme to improve literacy might test children’s reading beforehand (this is the ‘base-line’ data), during the programme (‘mid-line’) and after it (‘end-line’), and compare them.
Non-randomised trial
A study in which people are allocated to different interventions using methods that are not random. The allocation might be based on characteristics or criteria that researchers believe are relevant to the study; or the allocation may be outside the researchers’ control and possibly create bias. For example, if the programme only serves people who score highly on a selection test, then results may be due to the programme, or due to the fact that those people were able to score highly already.
Outcomes
The measured behaviours, attitudes, or other characteristics which research seeks to explain. A study may have one or more outcomes of interest. Outcomes may be measured at various levels, e.g., communities, schools, nurseries, churches, classrooms, families and children.
P-Value
The probability of obtaining results at least as extreme as those observed if there were, in fact, no real effect (i.e., if the null hypothesis were true). The p-value is a statistical measure used to assess the strength and significance of research findings. A p-value greater than .05 is usually interpreted to mean that the results were not statistically significant. Sometimes, researchers use a threshold of .01 or .10 to decide whether a result is statistically significant. The lower the threshold, the more stringent the criterion for concluding significance.
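A minimal sketch of how a p-value is obtained in practice, here with a two-sample t-test on made-up reading scores (using scipy):

```python
# Minimal sketch: computing a p-value with a two-sample t-test.
# The reading scores below are hypothetical illustrations.
from scipy import stats

intervention = [72, 75, 78, 74, 80, 77]
control = [70, 71, 69, 73, 68, 72]

t_stat, p_value = stats.ttest_ind(intervention, control)
# If p_value < 0.05 we would conventionally call the difference
# statistically significant at the 5% level.
print(t_stat, p_value)
```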
Protocol
A document that outlines the background, objectives, location/context, methods and analyses for a planned study. It is the ‘recipe’ for a study, and is normally published before the study starts.
PICOS
PICOS (population, intervention, comparison, outcome and study design) is a structured approach for describing the findings of impact evaluations. For example, a particular reading programme (the intervention) found a 20% increase in learning levels (outcome) in children aged 5-6 in Cambodia (population) compared to standard ways of teaching (comparison), in a study with random allocation (study design).
Propensity score matching and other matching designs
Propensity score matching creates sets of participants for treatment and control groups. A matched set consists of at least one participant in the treatment group and one in the control group with similar propensity scores. The technique attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment.
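The sketch below illustrates the basic steps with hypothetical data: estimate propensity scores with logistic regression, then match each treated participant to the control participant with the nearest score. Real applications typically also apply calipers and check covariate balance:

```python
# A minimal sketch of propensity score matching with hypothetical data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))              # covariates, e.g. age, baseline score
treated = rng.integers(0, 2, size=200)     # 1 = received the intervention

# Propensity score: estimated probability of receiving the intervention.
scores = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

treat_idx = np.where(treated == 1)[0]
control_idx = np.where(treated == 0)[0]
# Nearest-neighbour matching on the propensity score.
matches = {
    i: control_idx[np.argmin(np.abs(scores[control_idx] - scores[i]))]
    for i in treat_idx
}
# Outcomes of the matched pairs can then be compared to estimate the effect.
```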
Publication bias
Publication bias is the selective publication of research findings. For example, sometimes only statistically significant results are published and null or inconclusive results are not; in other cases only negative findings are published and positive results are withheld. In both cases, the published results give an inaccurate idea of how well the intervention works.
Primary research
Collecting and analysing data from people or organisations. Primary research normally involves field work, e.g., conducting observations, interviews, surveys, or gathering other measurements such as people’s mental health, height, test scores, or whether organisations have safeguarding policies or collaborate with other organisations.
Quality of study
The quality of a study refers to the attributes and characteristics that make it reliable, valid and credible. It reflects the rigour and competence of the research process, including study design, data collection methods, data analysis techniques, and ethical considerations. The quality of studies can significantly influence how confident researchers and policymakers can be in the findings and recommendations derived from them. The term ‘quality’ is now often replaced by the term ‘confidence’.
Quasi-experimental study design
A quasi-experimental research design closely resembles a randomised controlled trial (RCT) but lacks random assignment of participants to groups. It is commonly used in the social sciences, particularly when random assignment is not feasible or ethical. In an RCT, the two (or more) groups should be equivalent except for receiving the programme, and hence any differences in outcomes at the end must be due to the programme; in a QED, there may be other differences between the groups, so the findings are less reliable than for an RCT.
Randomisation
Assigning people in a research study to groups without taking account of any similarities or differences between them. For example, randomisation might involve drawing numbers from a hat or using a computer-generated random sequence. Each individual (or each group in the case of cluster randomisation) has the same chance of having the intervention. The point of an RCT is that the two (or more) groups should be equivalent except for receiving the programme, and hence any differences in outcomes at the end must be due to the programme.
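A minimal sketch of simple randomisation with hypothetical participant IDs:

```python
# Minimal sketch of simple randomisation: after shuffling, each participant
# has the same chance of being assigned to the intervention or control group.
import random

participants = [f"child_{i}" for i in range(20)]  # hypothetical IDs
random.shuffle(participants)

intervention_group = participants[:10]
control_group = participants[10:]
```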
Randomised controlled trial
A randomised controlled trial is a study in which people or organisations are randomly assigned to two or more groups to test an intervention. One group (the experimental group) receives the intervention (e.g., gets the new reading programme), whereas the other (the comparison or control group) has an alternative intervention (e.g., teaching as usual) or no intervention at all. The ‘base-line’ measures are taken from all the groups before the intervention (e.g., children’s reading levels). The ‘end-line’ measures are taken after the intervention (e.g., children’s reading levels again). The end-line measures for the intervention and control groups are compared statistically. [There may also be mid-line data from part-way through]. The point of an RCT is that the two (or more) groups should be equivalent at the beginning except for receiving the programme, and hence any differences in outcomes at the end must be due to the programme.
Regression discontinuity designs
A quasi-experimental, pretest-posttest control group design that is characterised by its unique method of assignment to the intervention. Participants are assigned to either the intervention group or the control group solely on the basis of a cut-off score on a pre-test measure. For example, studies of the effect of scoring a First class degree obviously cannot be randomised, so they might compare students who are just above the grade boundary with those who are just below: their academic attainment is very similar, so over their lifetimes, the effect of scoring a First class degree vs. a Second class degree may be detectable.
Reliability
Reliability measures the consistency and reproducibility of a study’s findings. In a ‘reliable’ study, the process would produce similar results if repeated by different researchers. For example, in assessing the impact of foster care placement on academic performance, it is essential to use standardised assessment tools and evaluate the participants multiple times to ensure reliability.
Several measures are commonly used to assess the reliability of social science research:
- Test-Retest reliability: For example, an IQ test is only reliable if you get the same score when you take it on different days. If your score changes – e.g., because you can learn the test, or the results are affected by your mood that day – it is not a reliable test.
- Internal Consistency: Internal consistency measures the extent to which items within an instrument or measure are related. High internal consistency reflects high reliability, indicating that the items measure the same construct.
- External Consistency: External consistency measures the similarity between scores obtained on the same instrument by different researchers or in other settings. For example, a reliable test does not rely on the opinion of the researcher.
- Intraclass Correlation Coefficient (ICC): This measures the agreement between measurements, for example between different raters scoring the same participants or between repeated administrations of the same measure. High ICC values indicate high reliability, meaning that the measurement is consistent across raters or occasions.
- Reliability Indices: Reliability indices, such as Cronbach’s alpha or McDonald’s omega, provide a summary measure of reliability. These indices consider the number of items and the agreement between items to determine the overall reliability of the measure (see the sketch after this list).
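As referenced above, a minimal sketch of computing Cronbach’s alpha for a hypothetical four-item scale:

```python
# Minimal sketch of Cronbach's alpha for a hypothetical 4-item scale.
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
import numpy as np

# Rows are respondents, columns are items (made-up ratings on a 1-5 scale).
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' totals
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```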
Risk of bias
This is the chance that a study’s results are inaccurate or misleading because of how the study was run. For example, in a randomised controlled trial, if only the children with higher reading ages are given the new reading programme, they will probably gain disproportionately from it. The study will overstate the effect of the new programme (i.e., how much it helped those children relative to the children with lower reading ages who were taught as normal). Equally, if a study says that it will measure 10 outcomes but only reports on four of them, we have no idea what happened on the other six: perhaps the researchers have only reported outcomes on which the programme succeeded, which creates bias.
Statistical power
The ability to detect meaningful differences between groups or variables when conducting research. A study has ‘statistical power’ if it is big enough to distinguish between the effect of the programme vs. the effect of other factors. These are sometimes respectively called ‘the signal’ and ‘the noise’. If the programme effect is expected to be small, the sample needs to be large.
Imagine a study examining the effect of placing children in foster care on their academic performance. To study this, the researcher needs to determine the sample size and calculate the statistical power. The sample size is the number of participants needed to provide sufficient data to detect any real differences between the groups. The statistical power represents the probability that the study will be able to detect a true effect, even if it is small. It quantifies the study’s ability to detect meaningful differences in outcomes, allowing for confidence in the findings. The researcher needs to ensure that the sample size is large enough to detect a significant difference in the academic performance of children in foster care compared to their peers.
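A sketch of an a-priori power calculation, assuming a two-sample t-test, a hypothesised effect of Cohen’s d = 0.3, a 5% significance level and 80% power (using statsmodels):

```python
# Minimal sketch of an a-priori power calculation for a two-sample t-test.
# The effect size, alpha and power values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.3,  # hypothesised Cohen's d
    alpha=0.05,       # significance level
    power=0.8,        # desired probability of detecting a true effect
)
print(f"required sample size per group: {n_per_group:.0f}")  # roughly 175
```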
Sample
A group that is selected from a larger group (the population). The researcher tries to draw valid conclusions about the population by studying the sample. For example, researchers may study a class of seven-year-olds and hope to draw conclusions about all seven-year-olds in that country.
Sampling
The process of selecting a sample (i.e., the subgroup of a population) that will be used to represent the entire population.
Sampling bias
Distortions that occur when some members of a population are systematically excluded from the sample selection process.
Sample size
The number of participants in the study.
The sample size is crucial because it directly affects the reliability and extent to which you can generalise the study’s findings to the larger population (i.e., the extent to which the study’s findings would be true of individuals or organisations who were not in this particular study, such as whether the programme would have the same effects on them.)
Statistical analysis
The process of gathering and analysing data using statistical methods. It involves the application of mathematical techniques, such as descriptive statistics, inferential statistics, and statistical modelling, to uncover patterns and relationships in data. Statistical analysis aims to extract meaningful insights from large datasets and make predictions or generalisations about a population based on the data collected.
Statistical significance
Statistical significance refers to the probability that the difference between the observed results and those expected by chance is genuine, rather than due to random variation. It measures how confident we can be in our findings. In social science research, statistical significance is usually assessed with a p-value, and significant findings are often reported alongside measures of effect size. Key concepts include:
- P-value: The p-value is the probability of obtaining the observed results, or more extreme ones, if the null hypothesis were true. A p-value of less than a predetermined significance level (usually 0.05) is generally considered statistically significant.
- Cohen’s d: Cohen’s d is a measure of effect size that quantifies the standardised difference between two group means. By convention, values around 0.2 are considered small, 0.5 medium, and 0.8 large; it describes the magnitude of an effect rather than its statistical significance.
- Effect size: Effect size refers to the magnitude of the observed effect. A result can be statistically significant yet have a small effect size, so the two should be reported together.
- Odds ratio: The odds ratio compares the odds of an event occurring in one group versus another. An odds ratio of 1 indicates no difference between the groups; the further the ratio is from 1, the stronger the association.
Systematic reviews
A systematic review synthesises and summarises all the evidence (which meets stated quality levels) on a particular topic. It follows a comprehensive, structured and rigorous process to identify, select, and evaluate relevant studies and to critically appraise their quality and relevance. Systematic reviews are conducted to answer specific research questions or to inform decision-making in healthcare, social sciences, or any other field where evidence is needed.
To make a systematic review of all the evidence on a particular topic, researchers define their research question, systematically search the literature for relevant studies, and then synthesise them. “Instead of just mooching through the research literature, consciously or unconsciously picking out papers here and there that support [our] pre-existing beliefs, [we] take a scientific, systematic approach to the very process of looking for scientific evidence, ensuring that [our] evidence is as complete and representative as possible of all the research that has ever been done.” – Ben Goldacre, Bad Science (2012)
Target population
The type of people or organisations studied in a piece of research. For example, for the EGM itself, the target population was children under 18 years at the point of baseline measurement and living in and engaging in activities in institutional settings. Although children were the key target population, study participants could also be adults, such as perpetrators of institutional child maltreatment, as well as staff and youth-serving organisations.
Theory
A hypothesised relationship between various phenomena or characteristics, including outcomes. Theories should be specific enough to be testable with a well-designed research study.
Treatment effect
The difference in outcomes or characteristics between individuals or groups exposed to a treatment (intervention) and those not exposed. It measures the effectiveness of the intervention or programme being evaluated.
Unit of analysis
The individuals, groups of people, or organisations that are analysed in a study. For example, if an analysis examines children’s well-being, children are the unit of analysis. If an analysis examines family income, families are the unit of analysis. For some analyses, classrooms (e.g., the mean quality of early childhood classrooms) or organisations (e.g., churches or schools) are the units of analysis.
Validity
This refers to the extent to which a study accurately measures and represents the phenomenon it aims to study. It is a critical aspect of scientific inquiry and plays a significant role in determining the reliability and generalizability of research findings.
- Internal validity: Internal validity measures the extent to which a study’s findings are reliable, i.e., can be attributed to the stated cause or independent variable, rather than to other factors. For example, in a study examining the relationship between foster care placement and academic performance, internal validity would rely on controlling for factors such as prior academic attainment and socioeconomic status, since those might otherwise affect outcomes. If a study is not ‘internally valid’, its conclusions are not supported by its data and it was not well run.
- External validity: External validity refers to the extent to which a study’s findings will be true in the broader population or context. It relates to factors such as sample size, representativeness, and the applicability of the research findings to real-world situations. For example, a study on the impact of foster care placement on academic performance may be limited to the specific foster care system where the study was conducted, and the findings may not generalise. Note that if a study is not ‘externally valid’, that may not be a failing of the study: rather, it may just reflect that reality is different in different places.
Variance
Variance is a statistical measure that quantifies the spread of variation within a dataset. It helps us understand how the scores or values of different variables vary and is a key concept in statistical analysis. It is based on the differences between the observed values and the mean or average value, and is calculated as the sum of squared deviations from the mean divided by the number of observations. The greater the variance, the greater the dispersion or spread of the values in the dataset. It helps us identify patterns, trends, and outliers and allows more meaningful comparisons and inferences to be drawn.
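Written out, the population variance divides the sum of squared deviations by the number of observations n; the sample variance commonly reported in studies divides by n − 1 instead:

```latex
% Population variance (divide by n) and sample variance (divide by n-1).
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```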
Note:
The descriptions included in the glossary are adapted from definitions from organisations such as the Campbell Collaboration, the Cochrane Collaboration, the International Initiative for Impact Evaluation (3ie), the National Institute for Health and Care Excellence (NICE), and the World Health Organization (WHO).