GuideStar Research
Tests for Differences
Rarely are two or more percentages or averages in your data exactly the same. Even when there are no meaningful differences between groups or no real changes over time, the data will fluctuate because of sampling or measurement error alone. Consider flipping a coin. Even though "heads" and "tails" each have a probability of 0.5 (a 50-50 chance), you will not often get exactly five of each in ten tosses. The same is true of any measurement taken from samples drawn from a population: even when there are no real differences, the data you collect across a series of samples (two subgroups, or two points in time, can represent two samples) will rarely match perfectly. One will usually seem "more" and the other "less." Sometimes, by chance alone, one will be a lot more and another a lot less.
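A quick calculation shows how rarely a perfect split occurs. The Python sketch below (the function name `prob_exact_heads` is our own, hypothetical choice) uses the binomial formula for a fair coin to compute the chance of exactly five heads in ten tosses, and the chance of a lopsided result of two or fewer, or eight or more, heads.

```python
from math import comb

def prob_exact_heads(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance of a perfect 5-5 split in ten tosses of a fair coin.
even_split = prob_exact_heads(5, 10)
print(f"P(exactly 5 heads in 10 tosses) = {even_split:.3f}")

# Chance of a lopsided result (2 or fewer heads, or 8 or more) by chance alone.
lopsided = sum(prob_exact_heads(k, 10) for k in (0, 1, 2, 8, 9, 10))
print(f"P(2 or fewer, or 8 or more heads) = {lopsided:.3f}")
```

For a fair coin the first probability is 252/1024, about 0.25, so a perfect 5-5 split turns up only about one time in four; the lopsided outcome still occurs about 11% of the time by chance alone.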

As a result, it is often necessary to apply some statistical criterion (or criteria) to decide which differences are too large to attribute to normal (random) fluctuations in the data. In essence, a statistical test helps us decide how large a difference must be before we treat it as a real difference.

The statistical tests we use depend on the nature of the data we collect. Generally, such tests weigh the differences between groups (or between two or more time periods) against the variability seen within each group (or within each time point). The larger the differences between groups or over time are relative to the variability within them, the less likely they are to be due to chance alone. By convention, when a difference as large as the one observed would occur by chance alone less than once in twenty times (a probability of less than 0.05, expressed as "p < .05"), we accept it as statistically significant. When we make many comparisons, we also consider the overall pattern of findings: with enough comparisons, a few will reach significance by chance alone, so isolated significant differences are judged in light of the findings as a whole. Ideally, individual significant differences will form a consistent pattern that increases our confidence that they are meaningful.
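One way to see what "by chance alone" means is a permutation test, shown here as a hedged Python sketch rather than as the procedure used in any particular analysis. It repeatedly shuffles the pooled scores into two groups of the original sizes and counts how often the shuffled difference in means is at least as large as the observed one. The scores are made-up 1-10 ratings, and the function name `permutation_p_value` and its defaults are illustrative assumptions.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=1):
    """Two-sided permutation test for a difference in means.

    Returns the share of random regroupings whose difference in means is at
    least as large as the observed difference, an estimate of how often
    chance alone would produce a difference this big.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-9:  # small tolerance guards against floating-point ties
            extreme += 1
    return extreme / n_shuffles

# Hypothetical ratings from two subgroups.
group_a = [7, 8, 6, 9, 7, 8, 9, 7]
group_b = [6, 5, 7, 6, 5, 6, 7, 5]
p = permutation_p_value(group_a, group_b)
print(f"Estimated p = {p:.4f}")
print("significant at p < .05" if p < 0.05 else "not significant at p < .05")
```

If the estimated p-value falls below .05, a difference this large turned up in fewer than one in twenty random regroupings, matching the conventional threshold. More shuffles give a more precise estimate.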

Choosing the right test for a given comparison, and interpreting its results, takes some expertise. Unless you have a solid background in measurement and statistics, you will likely need the help of a professional analyst to conduct such tests and interpret their findings. At GuideStar Research, evaluating the significance of group differences and changes over time is a routine part of our strategic consulting and reporting services.

When their assumptions are met, the most powerful statistical tests are parametric tests. The t-test and analysis of variance (ANOVA) are common tests in this category. These tests compare average (mean) scores across groups, or within one or more groups over time. Though more powerful than the alternatives, these procedures are designed for data that are collected with interval-level measurement and that are normally (or at least symmetrically) distributed. If you do not have an interval-level measure, or if your data are not normally distributed, these procedures may not be appropriate for your analyses.
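The core of an independent-samples t-test can be sketched with Python's standard library. This version uses Welch's form of the test, which does not assume equal variances in the two groups (our choice for illustration); the scores and the function name `welch_t` are hypothetical. The t statistic is the difference in means divided by its standard error, which is the weighing of between-group difference against within-group variability described earlier.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each mean, from sample variances (n - 1 denominator).
    se_sq_a = variance(group_a) / n_a
    se_sq_b = variance(group_b) / n_b
    # Difference in means divided by the standard error of that difference.
    t = (mean(group_a) - mean(group_b)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (se_sq_a**2 / (n_a - 1) + se_sq_b**2 / (n_b - 1))
    return t, df

# Hypothetical interval-level scores from two independent groups.
group_a = [7, 8, 6, 9, 7, 8, 9, 7]
group_b = [6, 5, 7, 6, 5, 6, 7, 5]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

For these made-up scores t is about 3.7 with roughly 13 degrees of freedom. The two-sided p-value can then be read from a t table or computed with `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)`, which performs the same Welch test.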

Data that are collected using categorical variables (nominal or ordinal data) are usually analyzed with non-parametric statistical procedures. These procedures take a variety of approaches suited to a range of different comparisons. Some, for example, compare the observed frequencies in a data table with the frequencies we would expect by chance (e.g., the chi-square, or χ², test). Others rank all of the data together and then compare the ranks in each group (e.g., the Mann-Whitney U test), or treat differences in pairs of data taken from the same people simply as "more" or "less" (e.g., the sign test; the related Wilcoxon signed-rank test also takes the size of each difference into account). When data are not normally distributed (e.g., there are unusually low or high values, or the data are skewed toward higher or lower values), these tests, or procedures that transform the data toward normality, may also be indicated. Again, this is where the expert consulting we provide at GuideStar Research plays an important role in the research process.
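Two of the non-parametric tests mentioned above are simple enough to compute by hand. The Python sketch below implements a Pearson chi-square test for a 2x2 table (without a continuity correction) and an exact sign test for paired data. The function names, the table of counts, and the before/after ratings are all hypothetical, chosen only to illustrate the arithmetic.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, without continuity correction.

    `table` is [[a, b], [c, d]] of observed counts. With 1 degree of freedom,
    the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected by chance
            chi2 += (observed - expected) ** 2 / expected
    return chi2, erfc(sqrt(chi2 / 2))

def sign_test(before, after):
    """Exact two-sided sign test for paired scores; tied pairs are dropped."""
    pluses = sum(1 for x, y in zip(before, after) if y > x)
    minuses = sum(1 for x, y in zip(before, after) if y < x)
    n, k = pluses + minuses, min(pluses, minuses)
    # Chance of k or fewer of the rarer sign if "more" and "less" are 50-50, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical 2x2 table: rows are two subgroups, columns are "yes" / "no" answers.
chi2, p_chi2 = chi_square_2x2([[60, 40], [45, 55]])
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.3f}")

# Hypothetical 1-5 ratings from the same ten people at two points in time.
before = [3, 4, 2, 5, 3, 4, 2, 3, 4, 3]
after = [4, 5, 3, 5, 4, 5, 3, 4, 4, 5]
p_sign = sign_test(before, after)
print(f"sign test p = {p_sign:.4f}")
```

For the made-up table the statistic is about 4.51 with p of about .034. `scipy.stats.chi2_contingency` gives the same result when called with `correction=False`; by default it applies Yates' continuity correction to 2x2 tables, which yields a somewhat larger p-value. In the paired ratings, all eight non-tied pairs move up, giving a sign-test p-value of 2/256, about .008.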

As an alternative, you can also judge differences in light of the margin(s) of error or the confidence interval(s). Survey researchers should present the margin of error for estimated percentages along with the percentages themselves. For two independent samples with similar margins of error, the margin of error for the difference between them is about 1.4 times (the square root of 2) the margin of error of either estimate, so differences that exceed roughly 1.4 times the margin of error are usually meaningful and those that do not usually are not. If the President has a 60% approval rating today and had a 55% approval rating last month, each with a margin of error of ±4 percentage points, then the margin of error for the difference (about ±5.7 points) is slightly larger than the 5-point change observed between the two measurements. As a result, we would suggest caution in interpreting this as a meaningful increase in approval. When using this kind of rule of thumb, however, be aware that the margin of error for the sample as a whole is smaller than the margin of error for subgroups within it. When comparing two subgroups, you will need the margins of error for those subgroups to use this approach.
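The margin-of-error arithmetic in the approval example can be checked with a few lines of Python. This sketch assumes simple random sampling and two independent samples (textbook assumptions, not a description of any particular poll); the function names are hypothetical, and the sample size of 600 is chosen only to show that it yields roughly the ±4 points used in the example.

```python
from math import sqrt

Z_95 = 1.96  # normal critical value for 95% confidence

def moe_proportion(p, n, z=Z_95):
    """Margin of error for a sample proportion p (0 to 1) from a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

def moe_difference(moe_1, moe_2):
    """Margin of error for the difference between two independent estimates."""
    return sqrt(moe_1**2 + moe_2**2)

# A sample of about 600 gives roughly +/-4 points for a percentage near 50%.
print(f"+/-{100 * moe_proportion(0.5, 600):.1f} points for n = 600")

# Approval example: 60% vs 55%, each with a margin of error of +/-4 points.
change = 60 - 55
moe_diff = moe_difference(4, 4)
print(f"change = {change} points, margin of error for the change = +/-{moe_diff:.2f}")
print("exceeds margin of error" if change > moe_diff else "within margin of error")
```

The 5-point change falls just inside the margin of error for the difference, which is why caution is warranted. For subgroup comparisons, pass each subgroup's own margin of error to `moe_difference`.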

When averages (mean scores) are computed, we can also compute confidence intervals for those averages. When the confidence intervals for two means do not overlap, the difference between them will generally be statistically significant. When the intervals overlap substantially, the means are unlikely to be significantly different; when they overlap only slightly, the difference may still be significant. Checking for overlap is therefore a conservative test, which is why some researchers prefer it. A more exact alternative is to compute the confidence interval for the difference between the two means: if that interval excludes zero, the difference is significant at the corresponding level.
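The sketch below computes approximate 95% confidence intervals for two means from summary statistics, checks whether they overlap, and computes the interval for the difference between the means. The group means, standard deviations, and sample sizes are hypothetical, the function names are our own, and the 1.96 multiplier assumes samples large enough for the normal approximation. With these numbers the two intervals overlap slightly, yet the interval for the difference stays above zero, which illustrates why the overlap check is a conservative rule.

```python
from math import sqrt

Z_95 = 1.96  # large-sample normal value; small samples call for a t critical value

def mean_ci(m, sd, n, z=Z_95):
    """Approximate 95% confidence interval for a mean from summary statistics."""
    half_width = z * sd / sqrt(n)
    return m - half_width, m + half_width

def diff_ci(m_a, sd_a, n_a, m_b, sd_b, n_b, z=Z_95):
    """Approximate 95% confidence interval for the difference between two independent means."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    d = m_a - m_b
    return d - z * se, d + z * se

def intervals_overlap(ci_1, ci_2):
    """True if two (low, high) intervals share any values."""
    return ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1]

# Hypothetical summary statistics for two groups of 100 respondents each.
ci_a = mean_ci(7.0, 2.0, 100)
ci_b = mean_ci(6.4, 2.0, 100)
ci_d = diff_ci(7.0, 2.0, 100, 6.4, 2.0, 100)

print(f"group A: {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"group B: {ci_b[0]:.2f} to {ci_b[1]:.2f}")
print("intervals overlap:", intervals_overlap(ci_a, ci_b))
print(f"difference: {ci_d[0]:.2f} to {ci_d[1]:.2f}")
print("difference interval excludes zero:", ci_d[0] > 0 or ci_d[1] < 0)
```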
©2007 GuideStar Research 212 426-2333