CSUF Department of Psychology

Statistical Relationships

Overview

It is often interesting to know about people’s scores on individual variables: the percentage of people in the U.S. who are depressed, the average driving speed on Highway 41, the range of sizes of the normal human brain, and so on.  However, most psychological research concerns relationships between variables.  Of course, there are lots of ways in which variables can be related to each other.  For example, the variables “depression” and “anger” both involve negative emotions.  The variables “working memory capacity” and “vocabulary size” are both related to intellectual functioning.  But these are not the kind of relationships that concern us here.  Instead, we are interested in statistical relationships between variables.  In general, there is a statistical relationship between two variables when the average score on one variable differs reliably across the values or levels of the other variable.

So although the percentage of depressed people in the U.S. might be interesting, it might be even more interesting to study the relationship between depression (Variable 1) and where people live, specifically whether they live in urban or rural environments (Variable 2).  If the average depression score differs across locations, then there is a relationship between these variables.  Similarly, although it might be interesting to know how large human brains are in general, it might be more interesting to know whether there is a relationship between brain size (Variable 1) and intelligence (Variable 2).  If the average intelligence score differs across people who have small, medium, large, and extra-large brains, then there is a relationship between these variables.

Two Key Points

One important point about statistical relationships is that they tend to be probabilistic rather than deterministic.  That is, they are general rules that probably, but not necessarily, apply to any individual case.  There will almost always be exceptions.  For example, there is a statistical relationship between pet ownership and stress level if the average stress level is lower for pet owners than for non-owners.  The fact that there are some relatively stressed out pet owners and some relatively calm non-owners would not invalidate the statistical relationship.

The failure to understand this point leads to the following kind of mistake.  Brenda reads in the newspaper that psychological research has shown a statistical relationship between sex (male vs. female) and map reading skill: men are better at reading maps.  But Brenda says, “That can’t be true because my friend Rachael is great at reading maps while my friend Ted is terrible.”  Brenda’s mistake is that she assumes that the general rule—the statistical relationship—must apply to each and every individual person.  But even if men perform better on average, there can still be high-performing women and low-performing men.  The case of Rachael and Ted is not inconsistent with the claim of a statistical relationship between sex and map reading skill. 
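The probabilistic nature of a statistical relationship can be sketched with the pet ownership example. The stress scores below are made up purely for illustration, and the variable names are our own. Note that comparing two sample means like this does not establish that the difference is reliable; that takes inferential statistics, which are not shown here.

```python
from statistics import mean

# Hypothetical stress scores (higher = more stressed); invented for illustration.
pet_owners = [3, 4, 2, 8, 3, 4]
non_owners = [6, 5, 7, 2, 6, 7]

owner_mean = mean(pet_owners)
non_owner_mean = mean(non_owners)

# The general rule: owners are less stressed on average.
owners_less_stressed_on_average = owner_mean < non_owner_mean

# The exceptions: owners more stressed than the average non-owner,
# and non-owners calmer than the average owner.
stressed_owners = [s for s in pet_owners if s > non_owner_mean]
calm_non_owners = [s for s in non_owners if s < owner_mean]

print(owner_mean, non_owner_mean)
print(stressed_owners, calm_non_owners)
```

In this sketch the average stress level is lower for owners, yet one owner is more stressed than the typical non-owner and one non-owner is calmer than the typical owner. The exceptions exist alongside the relationship; they do not contradict it.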

A second important point about statistical relationships is that they do not rule out the possibility that either variable is statistically related to other variables.  For example, to say that sex is statistically related to map reading ability is not to say that sex is the only variable related to map reading ability.  Certainly, amount of practice, general spatial reasoning skills, genetic factors, and so on are also related to map reading ability.  So the mistake associated with this point would be for Brenda to reason that sex cannot be related to map reading ability because practice reading maps must be related to it.  In fact, there is no reason that both variables cannot be related to map reading ability.

Two Kinds of Statistical Relationships

There are two basic kinds of statistical relationships that you will see frequently in psychological research.  We will refer to them as D relationships and R relationships.  D relationships are differences between two group means.  R relationships are relationships between two quantitative variables. 

Although D relationships and R relationships are basically the same thing (i.e., differences in the average level of one variable across levels of a second variable), they are often graphed, analyzed, and discussed in ways that seem very different.  For example, although both D and R relationships can be referred to as “correlations,” R relationships are more likely to be referred to this way.  D relationships are more likely to be called “differences.”

D Relationships

D relationships are differences between two group means.  The D stands for “difference.”  In D relationships, one variable defines two groups.  In a previous example, one variable was whether a person lives in an urban or a rural environment.  Note that this variable defines two groups of people: urban dwellers and rural dwellers.  There is a statistical relationship, then, if the mean level of a second variable (e.g., depression) differs across those two groups.

As another example, consider a psychologist who uses a new technique for treating depression on 20 depressed clients and a standard technique on 20 other depressed clients.  The technique (new vs. standard) is a categorical variable that defines two groups.  If the psychologist also measures each client’s level of depression at the end of therapy, then she can compare the average level of depression across the two groups.  If the average level of depression is lower among the new-technique clients than among the standard-technique clients, then there is a statistical relationship between these variables.  In other words, clients treated with the new technique ended up less depressed, on average, than clients treated with the standard technique.
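A D relationship comes down to one subtraction: the mean of one group minus the mean of the other. The following is a minimal sketch for the therapy example. The function name `group_mean_difference` and the depression scores are hypothetical, and only six clients per group are shown to keep it short.

```python
from statistics import mean


def group_mean_difference(scores_a, scores_b):
    """Return the mean of scores_a minus the mean of scores_b."""
    return mean(scores_a) - mean(scores_b)


# Hypothetical end-of-therapy depression scores (higher = more depressed).
new_technique = [8, 10, 7, 9, 11, 9]
standard_technique = [12, 14, 11, 13, 15, 13]

difference = group_mean_difference(new_technique, standard_technique)
print(difference)
```

A negative difference here means the new-technique group has the lower average depression score. A difference of zero would mean there is no D relationship in these data.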

Sex differences in behavior are also D relationships.  Sex (female vs. male) is a categorical variable that defines two groups.  Whenever researchers compare men and women in terms of their average levels of some variable (e.g., aggression, spatial reasoning, frequency of smiling), they are looking for D relationships.

D relationships are usually represented graphically using bar graphs.  The x-axis of the bar graph represents the two groups and the y-axis represents the second variable.  The means of the two groups, then, are represented as bars.  Here are two examples.  Note that the first shows a relationship between sex and happiness (because there is a difference between the two means) but the second shows no relationship between political party preference and honesty (because there is no difference).

R Relationships

R relationships are relationships between two quantitative variables.  (The R does not really stand for anything, but you will see why we use this terminology very soon.)  In R relationships, the mean score on one quantitative variable differs systematically across levels of the other quantitative variable.

For example, imagine that a health psychologist measures the number of close friends that people have.  This is obviously a quantitative variable.  She also measures people’s systolic blood pressure—another quantitative variable.  The health psychologist can now compute the mean systolic blood pressure for people with 0 close friends, with 1 close friend, with 2 close friends, and so on.  There is a statistical relationship if the mean systolic blood pressure differs systematically across the different numbers of close friends.

This kind of R relationship would normally be presented graphically in the form of a line graph.  A line graph works well when one quantitative variable has a small number of values.  In this case, the number of close friends has a small number of values.  It can be 0, 1, 2, … , and is unlikely to be greater than 8 or 10.  This variable would go on the x-axis.  The second variable would go on the y-axis.  Specifically, you would compute the mean systolic blood pressure for each number of close friends and plot this mean as a point on the graph.  Finally, you would connect these points with lines.  See the example to the side, which shows very clearly that as the number of close friends a person has increases, their systolic blood pressure decreases.  This is a good example of an R relationship.
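The computation behind such a line graph can be sketched as follows: group the cases by number of close friends, then take the mean blood pressure within each group. The records and the helper name `means_by_level` are invented for illustration; they are not data from any study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (number of close friends, systolic blood pressure).
records = [
    (0, 135), (0, 131),
    (1, 130), (1, 126),
    (2, 124), (2, 122),
    (3, 120), (3, 116),
]


def means_by_level(pairs):
    """Map each value of the first variable to the mean of the second."""
    groups = defaultdict(list)
    for level, score in pairs:
        groups[level].append(score)
    return {level: mean(scores) for level, scores in sorted(groups.items())}


bp_means = means_by_level(records)

# Each (level, mean) pair is one point on the line graph.
for friends, bp in bp_means.items():
    print(friends, bp)
```

With these made-up numbers the mean falls at every step from 0 to 3 close friends, which is the pattern the line graph described above would display.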

 

As a second example, imagine that a college professor asks students how long they spent studying for an exam and, of course, records their scores on that exam.  There is a statistical relationship if the mean exam score differs systematically across the different study times.  To be more specific, you might expect that in general students who studied more tended to achieve better scores. 

This kind of R relationship would normally be presented graphically in the form of a scatterplot.  A scatterplot works well when both variables have many different values (say, more than 10).  The important difference between a scatterplot and a line graph is that whereas the points in a line graph represent the average score for a group of cases, the points in a scatterplot represent individual cases.  See the example to the side, which shows the expected relationship.

With R relationships, we are usually looking for certain kinds of patterns.  One pattern is a positive relationship, in which higher scores on one variable are associated with higher scores on the other.  The second example above (the scatterplot) shows a positive relationship.  Another pattern is a negative relationship, in which higher scores on one variable are associated with lower scores on the other.  The first example above (the line graph) shows a negative relationship.  Both of these relationships are roughly linear.  That is, the points fall roughly along a straight line.  Often, however, relationships between quantitative variables are non-linear.  The general relationship between stress and task performance is often conceptualized as non-linear.  As stress increases from none to a moderate amount, task performance increases, but as stress increases from moderate to extreme, task performance decreases.  Plotted as a line graph or scatterplot, this relationship would look like an upside-down U.  For now, however, let us think mainly in terms of linear (positive and negative) relationships.  You will learn more about non-linear relationships if you take more advanced methodology and statistics courses.
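One common way to summarize the direction of a linear relationship in a single number is the Pearson correlation coefficient, which is positive for positive relationships and negative for negative ones. This section does not define that statistic, so treat the following as a sketch under that assumption. The function `pearson_r` is written out by hand, and all three data sets are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for the three patterns described above.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [55, 60, 70, 72, 85]        # rises with study time

close_friends = [0, 1, 2, 3]
blood_pressure = [133, 128, 123, 118]     # falls as friends increase

stress = [1, 2, 3, 4, 5]
performance = [2, 4, 5, 4, 2]             # upside-down U

r_positive = pearson_r(study_hours, exam_scores)
r_negative = pearson_r(close_friends, blood_pressure)
r_inverted_u = pearson_r(stress, performance)

print(r_positive, r_negative, r_inverted_u)
```

The last data set is the instructive one: performance clearly depends on stress, yet the coefficient comes out at essentially zero because the relationship is not linear. This is one reason to look at the graph before relying on a single summary number.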