19 Statistical reasoning: from one to two variables
Alice Bloch
1. Using the data matrix in Table 19.1 draw a frequency distribution for the variable Working. Do the same for Age, recoding it into three broad categories. Draw a bar chart of these distributions. Calculate the mean, median and mode for each variable.
Construct contingency tables that show the relationship between: Sex and Working; Sex and Jobsat; Working and Jobsat. Ensure that each cell contains a count together with its column and row percentages. Describe the character of the relationships you find.
Draw a scattergram, plotting Age against Jobsat. Describe the character of this relationship.
Using the recoded version of Age construct contingency tables showing the relationship between this variable and each of the other three variables. Describe the character of the relationships you find.
If you are learning SPSS or another statistical package, try inputting these data. You will find it easier to get the computer to do the analyses specified above. You can also generate tests of association and significance and consider what these mean. Try using the software to produce graphical output (e.g., pie charts, histograms).
Table 19.1. A data matrix
Variables or questions |
Case | Sex | Age | Working | Jobsat |
Case 1 | Male | 66 | No | Missing |
Case 2 | Female | 34 | Full time | 1 |
Case 3 | Female | 25 | Part time | 2 |
Case 4 | Female | 44 | Full time | 5 |
Case 5 | Male | 78 | No | Missing |
Case 6 | Male | 40 | Full time | 2 |
Case 7 | Male | 33 | Full time | 1 |
Case 8 | Male | 16 | No | Missing |
Case 9 | Female | 35 | Full time | 1 |
Case 10 | Female | 45 | Full time | 2 |
Case 11 | Male | 30 | No | Missing |
Case 12 | Female | 56 | Part time | 4 |
Case 13 | Male | 79 | No | Missing |
Case 14 | Male | 60 | Part time | 4 |
Case 15 | Female | 55 | Part time | 4 |
Case 16 | Female | 54 | Part time | 5 |
Case 17 | Male | 55 | Full time | 1 |
Case 18 | Male | 17 | No | Missing |
Case 19 | Female | 23 | Full time | 3 |
Case 20 | Female | 20 | No | Missing |
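If you are working without SPSS, the same kinds of analysis can be sketched in plain Python. The block below is only an illustration under stated assumptions: it uses the Python standard library rather than a statistical package, the function names (frequency, recode_age, crosstab) are hypothetical, and the three age bands are one arbitrary choice of cut points, not the ones the exercise requires. It enters the data matrix from Table 19.1, builds a frequency distribution, recodes Age, summarises Age with the mean, median and mode, and cross-tabulates Working by Sex with counts, row percentages and column percentages.

```python
from collections import Counter
from statistics import mean, median, multimode

# Table 19.1 as (sex, age, working, jobsat); None stands for "Missing".
CASES = [
    ("Male", 66, "No", None), ("Female", 34, "Full time", 1),
    ("Female", 25, "Part time", 2), ("Female", 44, "Full time", 5),
    ("Male", 78, "No", None), ("Male", 40, "Full time", 2),
    ("Male", 33, "Full time", 1), ("Male", 16, "No", None),
    ("Female", 35, "Full time", 1), ("Female", 45, "Full time", 2),
    ("Male", 30, "No", None), ("Female", 56, "Part time", 4),
    ("Male", 79, "No", None), ("Male", 60, "Part time", 4),
    ("Female", 55, "Part time", 4), ("Female", 54, "Part time", 5),
    ("Male", 55, "Full time", 1), ("Male", 17, "No", None),
    ("Female", 23, "Full time", 3), ("Female", 20, "No", None),
]


def frequency(values):
    """Return {category: (count, percent of all cases)}."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, round(100 * k / n, 1)) for cat, k in counts.items()}


def recode_age(age):
    """Recode age into three bands. The cut points are an assumption."""
    if age < 35:
        return "Under 35"
    if age < 55:
        return "35-54"
    return "55 and over"


def crosstab(rows, cols):
    """Return {(row, col): (count, row percent, column percent)}."""
    cells = Counter(zip(rows, cols))
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    return {
        (r, c): (k,
                 round(100 * k / row_totals[r], 1),
                 round(100 * k / col_totals[c], 1))
        for (r, c), k in cells.items()
    }


sex = [case[0] for case in CASES]
ages = [case[1] for case in CASES]
working = [case[2] for case in CASES]

working_freq = frequency(working)
age_band_freq = frequency([recode_age(a) for a in ages])
age_summary = (mean(ages), median(ages), multimode(ages))
working_by_sex = crosstab(working, sex)  # rows = Working, columns = Sex

for cat, (count, pct) in working_freq.items():
    print(f"{cat:10s} {count:3d} {pct:5.1f}%")
```

Cells with a count of zero are simply absent from the dictionary that crosstab returns, so a fuller version would fill them in before printing a table. Drawing the bar charts and the scattergram is left to a plotting library or to pen and paper.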
2. Table 19.2 consists of four contingency tables demonstrating different types of relationship between the two variables of social class and home ownership. Below each is a p-value and the result of a test of association (Q). For each table, describe the character of the relationship and explain why the p-values and tests of association vary.
Table 19.2. Tables showing different relationships between social class and home ownership (column %)
(a)
Social class |
Home ownership | Lower | Middle | Upper |
Owner | 20 | 30 | 50 |
Private, rented | 30 | 40 | 30 |
Council, rented | 50 | 30 | 20 |
p < 0.01, Q = 0.6
(b)
Social class |
Home ownership | Lower | Middle | Upper |
Owner | 60 | 40 | 3 |
Private, rented | 35 | 35 | 45 |
Council, rented | 5 | 25 | 52 |
p < 0.01, Q = –0.8
(c)
Social class |
Home ownership | Lower | Middle | Upper |
Owner | 33 | 32 | 36 |
Private, rented | 30 | 28 | 33 |
Council, rented | 37 | 40 | 31 |
p < 0.05, Q = 0.04
(d)
Social class |
Home ownership | Lower | Middle | Upper |
Owner | 56 | 10 | 59 |
Private, rented | 23 | 20 | 22 |
Council, rented | 21 | 70 | 19 |
p < 0.01, Q = –0.02
Table 20.1. Demonstration of the elaboration paradigm: the relationship between education and income, as affected by gender
(a)(i) Zero-order table showing an association
Educational achievement |
Income | High | Low |
High | 120 (60%) | 80 (40%) |
Low | 80 (40%) | 120 (60%) |
Total | 200 (100%) | 200 (100%) |
p = 0.00006; phi = 0.20; gamma = 0.38
(a)(ii) Replication
Men
Educational achievement |
Income | High | Low |
High | 40 (61%) | 26 (39%) |
Low | 26 (39%) | 40 (61%) |
Total | 66 (100%) | 66 (100%) |
p = 0.01481; phi = 0.21; gamma = 0.41
Women
Educational achievement |
Income | High | Low |
High | 80 (60%) | 54 (40%) |
Low | 54 (40%) | 80 (60%) |
Total | 134 (100%) | 134 (100%) |
p = 0.00149; phi = 0.19; gamma = 0.37
(a)(iii) Spurious or intervening
Men
Educational achievement |
Income | High | Low |
High | 112 (78%) | 64 (76%) |
Low | 32 (22%) | 20 (24%) |
Total | 144 (100%) | 84 (100%) |
p = 0.78290; phi = 0.02; gamma = 0.04
Women
Educational achievement |
Income | High | Low |
High | 8 (14%) | 16 (14%) |
Low | 48 (86%) | 100 (86%) |
Total | 56 (100%) | 116 (100%) |
p = 0.93038; phi = 0.01; gamma = 0.02
(a)(iv) Specification
Men
Educational achievement |
Income | High | Low |
High | 90 (60%) | 50 (33%) |
Low | 60 (40%) | 100 (67%) |
Total | 150 (100%) | 150 (100%) |
p < 0.00000; phi = 0.27; gamma = 0.50
Women
Educational achievement |
Income | High | Low |
High | 30 (60%) | 30 (60%) |
Low | 20 (40%) | 20 (40%) |
Total | 50 (100%) | 50 (100%) |
p = 1.00000; phi = 0.00; gamma = 0.00
(b)(i) Zero-order table showing no association
Educational achievement |
Income | High | Low |
High | 120 (60%) | 120 (60%) |
Low | 80 (40%) | 80 (40%) |
Total | 200 (100%) | 200 (100%) |
p = 1.00000; phi = 0.00; gamma = 0.00
(b)(ii) Suppressor
Men
Educational achievement |
Income | High | Low |
High | 20 (67%) | 20 (20%) |
Low | 10 (33%) | 80 (80%) |
Total | 30 (100%) | 100 (100%) |
p < 0.00000; phi = 0.43; gamma = 0.78
Women
Educational achievement |
Income | High | Low |
High | 100 (59%) | 100 (100%) |
Low | 70 (41%) | 0 (0%) |
Total | 170 (100%) | 100 (100%) |
p < 0.00000; phi = -0.45; gamma = -1.00
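The statistics printed beneath each panel of Table 20.1 can be checked by hand or with a few lines of code. The sketch below assumes the standard definitions for a 2 × 2 table with cells a, b (top row) and c, d (bottom row): phi = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)), gamma = (ad − bc) / (ad + bc), and a p-value from an uncorrected chi-square test with one degree of freedom, where chi-square = n × phi². The source does not state which formulas or corrections were used, so this is an assumption, and the function name two_by_two_stats is hypothetical.

```python
from math import erfc, sqrt


def two_by_two_stats(a, b, c, d):
    """Return (phi, gamma, p) for the 2x2 table

                 column 1   column 2
        row 1       a          b
        row 2       c          d

    p comes from an uncorrected chi-square test with 1 degree of freedom.
    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    cross = a * d - b * c
    phi = cross / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    gamma = cross / (a * d + b * c)
    chi_square = n * phi ** 2
    # Upper-tail probability of chi-square with 1 df.
    p = erfc(sqrt(chi_square / 2))
    return phi, gamma, p


# (a)(i) zero-order table: rows are income, columns are education.
zero_order = two_by_two_stats(120, 80, 80, 120)
# (a)(iv) specification, men only.
specification_men = two_by_two_stats(90, 50, 60, 100)
# (b)(ii) suppressor, women only.
suppressor_women = two_by_two_stats(100, 100, 70, 0)

for label, (phi, gamma, p) in [
    ("zero-order", zero_order),
    ("specification, men", specification_men),
    ("suppressor, women", suppressor_women),
]:
    print(f"{label}: phi = {phi:.2f}; gamma = {gamma:.2f}; p = {p:.5f}")
```

For a 2 × 2 table gamma is identical to Yule's Q, which is why one empty cell, as in the women's suppressor table, is enough to push it to −1.00 even though phi stays moderate.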
3. This is a structured exercise in reading a statistical table. It aims to give you a general strategy for picking out the main messages of such tables. You could apply this approach to Table 17.1 in the book, or find tables as suggested in the fourth workshop and discussion exercise associated with Chapter 17. You will find that not all of the questions are relevant to every table, but experience has shown that these steps, if followed carefully, enable a deeper understanding of any statistical table.
(a) Read the title before you look at any numbers. What does this reveal about the content of the table?
(b) Look at the source: Who produced the data, with what purpose? Was it a census or a sample?
(c) Look at any notes above or below the table. How will they influence its scope and your interpretation?
(d) Read the column and row titles. They indicate which variables are applied to the data.
(e) How many variables are there and what are they? Can any be considered independent or dependent?
(f) How are the variables measured? Are there any omissions or peculiarities in the measurement scale? How else might such a measure have been constructed?
(g) What units are used – percentages; thousands; millions? If you are dealing with percentages, then which way adds up to 100%?
(h) Look at the 'All' or 'Total' column. These are usually found in the right-hand column and/or the bottom row (the 'margins' of a table). What do variations in the row or column tell you about the variables concerned?
(i) Now look at some rows and/or columns inside the table. What do these tell you about the relationships between variables? What social processes might have generated the trends you find?
(j) Is it possible to make causal statements about the relationship between variables? If so, do any of these involve the interaction of more than two variables?
(k) What are the shortcomings of the data in drawing conclusions about social processes?
(l) What other enquiries could be conducted to take this analysis further?
(m) Finally, consider the issue of whether the table reveals something about social reality, or creates a particular way of thinking about reality.
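Step (g) asks which way the percentages add up to 100%. When a table is available as numbers, this can be checked mechanically. The following is a minimal sketch, using the percentages of Table 19.2(b) as example data; the function name percentage_direction and the rounding tolerance of one percentage point are assumptions made for illustration.

```python
def percentage_direction(table, tolerance=1.0):
    """Report whether the rows and/or the columns of a percentage table
    sum to 100, allowing `tolerance` points for rounding error.

    `table` is a list of rows, each row a list of percentages.
    """
    rows_ok = all(abs(sum(row) - 100) <= tolerance for row in table)
    cols_ok = all(abs(sum(col) - 100) <= tolerance for col in zip(*table))
    return {"rows": rows_ok, "columns": cols_ok}


# Table 19.2(b): rows are Owner, Private rented, Council rented;
# columns are Lower, Middle, Upper social class.
table_b = [
    [60, 40, 3],
    [35, 35, 45],
    [5, 25, 52],
]

direction = percentage_direction(table_b)
print(direction)
```

A table can pass both checks by coincidence, so the result is a prompt for reading the title and notes, not a substitute for it.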