Stat 350A: Chapter 4.8

Part 1: The Binomial Distribution

EXAMPLE 1: Suppose a student takes a quiz that consists of four multiple choice questions. Each question has five options. Unfortunately, the student did not review for the quiz and has to randomly guess on each question (assume the guesses are independent of one another). Let Y = number of questions the student guesses correctly.

(A) Calculate P(Y = 0)

(B) Calculate P(Y = 5)

(C) Calculate P(Y = 2)

• For the above example, what if the number of questions was 20?

The Binomial Distribution

• Properties of the Binomial:

◦1) There are two possible outcomes for each trial: “success” and “failure”

◦2) There is a fixed number (n) of trials or observations

◦3) Outcomes of the trials are independent of each other. If sampling without replacement,

then sample size should be less than 10% of the population size.

◦4) The Probability of Success is π and is the same for each observation. The Probability of

Failure is 1 – π.

• A binomial random variable, Y, is the count of successful outcomes of a binomial experiment.

• Binomial Distribution: The distribution of the count Y of successes in a binomial experiment with

parameters n and π. The distribution of this random variable Y is described as follows:

Y ~ Bin(n, π)

◦NOTE: The Bernoulli distribution is the special case with a single trial, Bin(1, π)

• Binomial Probabilities

◦If Y has a binomial distribution with n independent trials and probability π of success and the

possible values of Y = 0, 1, 2, …, n, then the binomial probability of Y = y successes is

P(Y = y) = nCy • π^y • (1 – π)^(n – y)

◦The number of ways of arranging y successes among n trials is the binomial coefficient:

nCy = n! / (y! (n – y)!)
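
As a rough illustration of the formula above, the following Python sketch evaluates binomial probabilities for the quiz example (n = 4, π = 1/5). The function name binomial_pmf is a hypothetical helper, not something defined in these notes; math.comb supplies nCy.

```python
from math import comb

def binomial_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for Y ~ Bin(n, pi): nCy * pi^y * (1 - pi)^(n - y)."""
    if y < 0 or y > n:
        return 0.0  # values outside 0, 1, ..., n are impossible
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Quiz example: 4 questions, five options each, so pi = 1/5 when guessing.
n, pi = 4, 0.2
for y in (0, 2, 5):
    print(f"P(Y = {y}) = {binomial_pmf(y, n, pi):.4f}")

# The same function handles the 20-question variant without listing outcomes.
print(f"P(Y = 2 | n = 20) = {binomial_pmf(2, 20, pi):.4f}")
```

Because the function returns 0 outside the support, asking for more successes than trials is handled without a special case.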

Part 2: Binomial Examples

EXAMPLE 1: A San Diego interviewer has found that one in five of the people approached will agree to take part in a survey on drug abuse. An interviewer approaches 7 people at random.

(A) Describe the distribution of Y, the number that will agree to take part in the survey.

(B) Find the probability that exactly four will agree to an interview.

(C) Find the probability that at most 2 will agree to an interview.

EXAMPLE 2: A marketing study found that twenty-five percent of households own a gaming console. Suppose we select 5 households at random.

(A) Describe the distribution of Y, the number of households that own a gaming console.

(B) What is the probability that exactly 3 of the 5 households own a gaming console?

(C) What is the probability that at least 2 of the 5 households selected will own a gaming console?
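
"At most" and "at least" questions add up several binomial probabilities. The sketch below shows one way to organize that in Python; binomial_pmf and binomial_cdf are hypothetical helper names introduced only for this illustration.

```python
from math import comb

def binomial_pmf(y, n, pi):
    """P(Y = y) for Y ~ Bin(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

def binomial_cdf(y, n, pi):
    """P(Y <= y): add the probabilities of 0, 1, ..., y successes."""
    return sum(binomial_pmf(k, n, pi) for k in range(y + 1))

# Example 1: Y ~ Bin(7, 0.2)
exactly_four = binomial_pmf(4, 7, 0.2)
at_most_two = binomial_cdf(2, 7, 0.2)

# Example 2: Y ~ Bin(5, 0.25)
exactly_three = binomial_pmf(3, 5, 0.25)
# "At least 2" is the complement of "at most 1".
at_least_two = 1 - binomial_cdf(1, 5, 0.25)

print(round(exactly_four, 4), round(at_most_two, 4))
print(round(exactly_three, 4), round(at_least_two, 4))
```

Using the complement for "at least" questions keeps the sum short when the cutoff is near 0.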

Part 3: Mean and Variance of the Binomial Distribution

• For a random variable Y such that Y ~ Bin(n, π):

‣ Mean = μ_y = nπ

‣ Variance = σ_y^2 = nπ(1 – π)

‣ Standard Deviation = σ_y = √(nπ(1 – π))

EXAMPLE 1: Suppose that 18% of movie-goers say that horror is their favorite genre. If a random

sample of 70 movie-goers is selected, what are the (a) mean, (b) variance, and (c) standard deviation

for the number of movie-goers that prefer horror?

EXAMPLE 2: Compute the mean and standard deviation of the Bernoulli distribution.
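
The three formulas in this part translate directly into code. The following is a minimal sketch, assuming a hypothetical helper named binomial_summary; it is applied to the movie-goer setting (n = 70, π = 0.18) and to a single Bernoulli trial with an arbitrary illustrative π = 0.3.

```python
from math import sqrt

def binomial_summary(n, pi):
    """Return (mean, variance, standard deviation) of Y ~ Bin(n, pi)."""
    mean = n * pi
    variance = n * pi * (1 - pi)
    return mean, variance, sqrt(variance)

# Movie-goer example: n = 70, pi = 0.18
mean, variance, sd = binomial_summary(70, 0.18)
print(f"mean = {mean:.2f}, variance = {variance:.3f}, sd = {sd:.3f}")

# A Bernoulli trial is Bin(1, pi); pi = 0.3 is an illustrative value only.
b_mean, b_var, b_sd = binomial_summary(1, 0.3)
print(f"Bernoulli: mean = {b_mean:.2f}, sd = {b_sd:.4f}")
```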

Stat 350A: Chapters 4.9-4.10

Part 1: Continuous Distributions

• A random variable is said to be continuous if the outcomes within any given interval of this

variable are not countable.

• Examples: Weight, height, income, GPA, speed of a car, time slept per night, etc.

• For continuous variables, we cannot construct a probability distribution like we did for discrete

models as we cannot assign individual probabilities to all possible outcomes.

• Instead, we use the area under a density curve to calculate the probability of a random variable

taking values in a specific interval. The probability density function for a continuous random

variable Y is a curve such that

◦1) The area under the curve for a particular interval, (a, b), is the probability that Y is in that

interval, i.e. P(a < Y < b).
◦2) P(Y = a) = 0 for any a.
◦3) The total area under the curve is one.
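
The three density-curve properties can be checked numerically. The sketch below uses a made-up density, f(y) = 2y on the interval [0, 1], chosen only because its areas are easy to verify by hand; the midpoint-rule helper named area is likewise a hypothetical illustration.

```python
def density(y):
    """Hypothetical density for illustration: f(y) = 2y on [0, 1], else 0."""
    return 2.0 * y if 0.0 <= y <= 1.0 else 0.0

def area(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p_interval = area(density, 0.25, 0.75)  # property 1: P(0.25 < Y < 0.75)
p_point = area(density, 0.5, 0.5)       # property 2: a single point has zero width
total = area(density, 0.0, 1.0)         # property 3: total area under the curve

print(round(p_interval, 4), p_point, round(total, 4))
```

For this density the interval probability can be confirmed by hand as 0.75^2 – 0.25^2 = 0.5.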
Part 2: The Standard Normal Distribution
• The standard normal distribution: A normal distribution whose mean is 0 and standard deviation
is 1, denoted as Z ~ N(0, 1), i.e., Z-scores.
• Bell-shaped and symmetric about 0. Thus, P(Z < -a) = P(Z > a).

• Areas under the standard normal distribution can be found in the Z Table (found at the end of the package).
• Z-scores can take any real value: -∞ < Z < ∞.
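
When a Z Table is not at hand, the left-tail area can be computed from the error function. The sketch below is one common way to do this in Python; the name phi is a hypothetical helper, and the check at the end illustrates the symmetry property P(Z < -a) = P(Z > a).

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

a = 1.0                  # illustrative value of a
left_tail = phi(-a)      # P(Z < -a)
right_tail = 1 - phi(a)  # P(Z > a)

print(f"P(Z < {-a}) = {left_tail:.4f}")
print(f"P(Z > {a}) = {right_tail:.4f}")
```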
Part 3: Finding Areas and Percentiles Under the Standard Normal Curve
• Finding areas given a Z-score:
EXAMPLE: Use the Z Table to find the following. Draw the picture first, shade the region you want,
and look up the Z-score in the Z Table corresponding to that region.
(A) P(Z < -0.89)
(B) P(Z > 1.53)

(C) P(-0.89 < Z < 1.53)
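
The three areas in this example can be cross-checked in software. This sketch assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; its cdf method plays the role of the Z Table lookup.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_a = Z.cdf(-0.89)                # (A) area to the left of -0.89
p_b = 1 - Z.cdf(1.53)             # (B) area to the right of 1.53
p_c = Z.cdf(1.53) - Z.cdf(-0.89)  # (C) area between the two Z-scores

print(f"(A) {p_a:.4f}  (B) {p_b:.4f}  (C) {p_c:.4f}")
```

Software values may differ from table-based answers in the last digit because the table entries are rounded to four places.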
• Finding Percentiles (i.e., finding Z given an area)
◦Definition of z_p: P(Z < z_p) = p
EXAMPLE: In a standard normal model, find the following percentiles.
(A) The 10th percentile.
(B) The top 20th percentile.
(C) The middle 80%.
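
Finding a percentile reverses the table lookup: start from an area and recover the Z-score. The sketch below uses statistics.NormalDist.inv_cdf (Python 3.8+). It assumes that "the top 20th percentile" means the Z-score with 20% of the area above it, i.e. the 80th percentile; that reading is an interpretation, not something stated in the notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

z_10 = Z.inv_cdf(0.10)     # (A) 10% of the area lies to the left
z_top20 = Z.inv_cdf(0.80)  # (B) assumed reading: 20% of the area lies to the right
low = Z.inv_cdf(0.10)      # (C) middle 80% leaves 10% in each tail
high = Z.inv_cdf(0.90)

print(f"(A) {z_10:.2f}  (B) {z_top20:.2f}  (C) {low:.2f} to {high:.2f}")
```

By symmetry the two cutoffs for the middle 80% are negatives of each other.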
Part 4: General Normal Distribution
• A normal random variable is denoted by
Y ~ N(μ, σ^2)
μ = population mean
σ^2 = population variance
• The standard deviation is the most common measure of spread used for normal curves and is a
natural ruler for comparing individual values to the mean.
• To determine how many standard deviations an observation is away from the mean, we can
standardize this value using the Z-score formula.
Z = (y – μ) / σ
• The z-score tells us how many standard deviations an observation is above or below the mean.
• When we standardize into Z-scores,
◦The shape of the distribution does not change.
◦The center/mean becomes 0.
◦The spread/standard deviation becomes 1.
• Finding normal areas (direct normal):
‣ 1) Use the Z-score formula to standardize the given observation.
‣ 2) Use the Z Table to find the area of that Z-score.
• Finding a value given a percentile (inverse normal):
‣ 1) Use the Z Table to find the Z-score corresponding to the given percentile/area (draw a picture).
‣ 2) Use the inverse normal formula, y = zσ + μ
EXAMPLE 1: Suppose the length of a certain fish is normally distributed with a mean of 54 mm and
a variance of 20.25 mm^2.
(A) What percentage of these fish are less than 48 mm long?
(B) What percentage of these fish are more than 60 mm long?
(C) What percentage of these fish are between 51 and 60 mm long?
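
The two direct-normal steps (standardize, then look up the area) can be sketched in Python for the fish example. The helper name z_score is hypothetical, and statistics.NormalDist().cdf stands in for the Z Table. Note that the problem gives the variance, so the standard deviation must be taken as its square root before standardizing.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 54.0, 20.25
sigma = sqrt(variance)  # 4.5 mm; the Z-score formula uses sigma, not sigma^2
Z = NormalDist()        # standard normal, used in place of the Z Table

def z_score(y):
    """Step 1: standardize an observation."""
    return (y - mu) / sigma

# Step 2: look up areas for the standardized values.
p_less_48 = Z.cdf(z_score(48))
p_more_60 = 1 - Z.cdf(z_score(60))
p_between = Z.cdf(z_score(60)) - Z.cdf(z_score(51))

print(f"(A) {p_less_48:.4f}  (B) {p_more_60:.4f}  (C) {p_between:.4f}")
```

Parts (A) and (B) agree because 48 and 60 are the same distance from the mean.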
EXAMPLE 2: An athletic association wants to sponsor a footrace. The time it takes to run the course
is normally distributed with a mean of 58.6 minutes and a standard deviation of 3.9 minutes.
(A) What is the value of the first quartile for this distribution?
(B) The association decides to have a tryout run and eliminate the slowest 30% of the racers. What
should the cutoff time be in the tryout run for elimination?
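
The inverse-normal steps (find z for the area, then apply y = zσ + μ) can be sketched the same way for the footrace example. The helper name value_at is hypothetical. The sketch assumes the slowest runners are those with the longest times, so eliminating the slowest 30% corresponds to the 70th percentile of finishing time.

```python
from statistics import NormalDist

mu, sigma = 58.6, 3.9  # minutes
Z = NormalDist()       # standard normal, used in place of the Z Table

def value_at(p):
    """Return the time y with area p to its left."""
    z = Z.inv_cdf(p)       # step 1: Z-score for the given area
    return z * sigma + mu  # step 2: inverse normal formula

first_quartile = value_at(0.25)
cutoff = value_at(0.70)  # assumed reading: slowest 30% lie above this time

print(f"(A) {first_quartile:.2f} min  (B) {cutoff:.2f} min")
```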
Part 5: The Empirical Rule
• The Empirical Rule states that for any normal or approximately normal distribution, approximate
percentages under the curve can be estimated. Also referred to as the 68-95-99.7% rule, it states:
◦68% of the observations are within one standard deviation of the mean.
◦95% of the observations are within two standard deviations of the mean.
◦99.7% of the observations are within three standard deviations of the mean.
EXAMPLE: The IQ scores of a certain city follow a bell-shaped curve with a mean of 100 and
variance of 225. Using the Empirical Rule, find the following.
(A) What percent of IQs fell between 70 and 115?
(B) What percent of IQs were above 130?
(C) 16% of IQs were below what value?
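
The 68-95-99.7 figures are rounded normal areas, which the sketch below confirms before applying the rule to the IQ example. The helper name within is hypothetical; the IQ arithmetic uses only the rounded rule percentages, as the example asks, and relies on the symmetry of the curve to split each percentage in half.

```python
from statistics import NormalDist

Z = NormalDist()

def within(k):
    """Exact area within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {within(k):.4f}")

# IQ example: mean 100, variance 225, so the standard deviation is 15.
mu, sigma = 100.0, 225 ** 0.5
between_70_115 = 0.95 / 2 + 0.68 / 2  # from 2 SD below to 1 SD above the mean
above_130 = (1 - 0.95) / 2            # upper tail beyond 2 SD
value_16th = mu - sigma               # (1 - 0.68) / 2 = 16% lies below 1 SD under the mean

print(between_70_115, above_130, value_16th)
```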
TABLE A Standard normal probabilities
Table entry for z is the area under the standard normal curve to the left of z.

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
-3.4  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
-3.3  .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
-3.2  .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
-3.1  .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
-3.0  .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
-2.9  .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
-2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
-2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
-2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
-2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
-2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
-2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
-2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
-2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
-2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
-1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
-1.8  .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
-1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
-1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
-1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
-1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
-1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
-1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
-1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
-1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
-0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
-0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
-0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
-0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
-0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
-0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
-0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
-0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
-0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
-0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
TABLE A Standard normal probabilities (continued)

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998
An Introduction to
Statistical Methods
& Data Analysis
Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
An Introduction to
Statistical Methods
& Data Analysis
Seventh Edition
R. Lyman Ott
Michael Longnecker
Texas A&M University
Australia • Brazil • Mexico • Singapore • United Kingdom • United States
This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial
review has deemed that any suppressed content does not materially affect the overall learning experience. The publisher reserves the right to
remove content from this title at any time if subsequent rights restrictions require it. For valuable information on pricing, previous
editions, changes to current editions, and alternate formats, please visit www.cengage.com/highered to search by
ISBN#, author, title, or keyword for materials in your areas of interest.
Important Notice: Media content referenced within the product description or the product text may not be available in the eBook version.
An Introduction to Statistical Methods and Data Analysis, Seventh Edition
R. Lyman Ott, Michael Longnecker
Senior Product Team Manager: Richard Stratton
Content Developer: Andrew Coppola
Associate Content Developer: Spencer Arritt
Product Assistant: Kathryn Schrumpf
Marketing Manager: Julie Schuster
Content Project Manager: Cheryll Linthicum
Art Director: Vernon Boes
Manufacturing Planner: Sandee Milewski
Intellectual Property Analyst: Christina Ciaramella
Intellectual Property Project Manager: Farah Fard
Production Service and Compositor: Cenveo Publishing Services
Photo and Text Researcher: Lumina Datamatics, LTD
Copy Editor:
Illustrator: Macmillan Publishing Services/Cenveo Publishing Services
Text and Cover Designer: C. Miller
Cover Image: polygraphus/Getty Images
© 2016, 2010 Cengage Learning
WCN: 02-200-203
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.
For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be e-mailed to permissionrequest@cengage.com
Library of Congress Control Number: 2015938496
ISBN: 978-1-305-26947-7
Cengage Learning
20 Channel Center Street
Boston, MA 02210
USA
Cengage Learning is a leading provider of customized learning solutions with employees residing in nearly 40 different countries and sales in more than 125 countries around the world. Find your local representative at www.cengage.com
Cengage Learning products are represented in Canada by Nelson Education, Ltd.
To learn more about Cengage Learning Solutions, visit www.cengage.com
Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com
Printed in the United States of America
Print Number: 01 Print Year: 2015
CONTENTS

Preface

PART 1 Introduction

CHAPTER 1 Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises

PART 2 Collecting Data

CHAPTER 2 Using Surveys and Experimental Studies to Gather Data
2.1 Introduction and Abstract of Research Study
2.2 Observational Studies
2.3 Sampling Designs for Surveys
2.4 Experimental Studies
2.5 Designs for Experimental Studies
2.6 Research Study: Exit Polls Versus Election Results
2.7 Summary
2.8 Exercises

PART 3 Summarizing Data

CHAPTER 3 Data Description
3.1 Introduction and Abstract of Research Study
3.2 Calculators, Computers, and Software Systems
3.3 Describing Data on a Single Variable: Graphical Methods
3.4 Describing Data on a Single Variable: Measures of Central Tendency
3.5 Describing Data on a Single Variable: Measures of Variability
3.6 The Boxplot
3.7 Summarizing Data from More Than One Variable: Graphs and Correlation
3.8 Research Study: Controlling for Student Background in the Assessment of Teaching
3.9 R Instructions
3.10 Summary and Key Formulas
3.11 Exercises

CHAPTER 4 Probability and Probability Distributions
4.1 Introduction and Abstract of Research Study
4.2 Finding the Probability of an Event
4.3 Basic Event Relations and Probability Laws
4.4 Conditional Probability and Independence
4.5 Bayes’ Formula
4.6 Variables: Discrete and Continuous
4.7 Probability Distributions for Discrete Random Variables
4.8 Two Discrete Random Variables: The Binomial and the Poisson
4.9 Probability Distributions for Continuous Random Variables
4.10 A Continuous Probability Distribution: The Normal Distribution
4.11 Random Sampling
4.12 Sampling Distributions
4.13 Normal Approximation to the Binomial
4.14 Evaluating Whether or Not a Population Distribution Is Normal
4.15 Research Study: Inferences About Performance-Enhancing Drugs Among Athletes
4.16 R Instructions
4.17 Summary and Key Formulas
4.18 Exercises

PART 4 Analyzing the Data, Interpreting the Analyses, and Communicating the Results

CHAPTER 5 Inferences About Population Central Values
5.1 Introduction and Abstract of Research Study
5.2 Estimation of μ
5.3 Choosing the Sample Size for Estimating μ
5.4 A Statistical Test for μ
5.5 Choosing the Sample Size for Testing μ
5.6 The Level of Significance of a Statistical Test
5.7 Inferences About μ for a Normal Population, σ Unknown
5.8 Inferences About μ When the Population Is Nonnormal and n Is Small: Bootstrap Methods
5.9 Inferences About the Median
5.10 Research Study: Percentage of Calories from Fat
5.11 Summary and Key Formulas
5.12 Exercises

CHAPTER 6 Inferences Comparing Two Population Central Values
6.1 Introduction and Abstract of Research Study
6.2 Inferences About μ1 − μ2: Independent Samples
6.3 A Nonparametric Alternative: The Wilcoxon Rank Sum Test
6.4 Inferences About μ1 − μ2: Paired Data
6.5 A Nonparametric Alternative: The Wilcoxon Signed-Rank Test
6.6 Choosing Sample Sizes for Inferences About μ1 − μ2
6.7 Research Study: Effects of an Oil Spill on Plant Growth
6.8 Summary and Key Formulas
6.9 Exercises

CHAPTER 7 Inferences About Population Variances
7.1 Introduction and Abstract of Research Study
7.2 Estimation and Tests for a Population Variance
7.3 Estimation and Tests for Comparing Two Population Variances
7.4 Tests for Comparing t > 2 Population Variances
7.5 Research Study: Evaluation of Methods for Detecting E. coli
7.6 Summary and Key Formulas
7.7 Exercises

CHAPTER 8 Inferences About More Than Two Population Central Values
8.1 Introduction and Abstract of Research Study
8.2 A Statistical Test About More Than Two Population Means: An Analysis of Variance
8.3 The Model for Observations in a Completely Randomized Design
8.4 Checking on the AOV Conditions
8.5 An Alternative Analysis: Transformations of the Data
8.6 A Nonparametric Alternative: The Kruskal–Wallis Test
8.7 Research Study: Effect of Timing on the Treatment of Port-Wine Stains with Lasers
8.8 Summary and Key Formulas
8.9 Exercises

CHAPTER 9 Multiple Comparisons
9.1 Introduction and Abstract of Research Study
9.2 Linear Contrasts
9.3 Which Error Rate Is Controlled?
9.4 Scheffé’s S Method
9.5 Tukey’s W Procedure
9.6 Dunnett’s Procedure: Comparison of Treatments to a Control
9.7 A Nonparametric Multiple-Comparison Procedure
9.8 Research Study: Are Interviewers’ Decisions Affected by Different Handicap Types?
9.9 Summary and Key Formulas
9.10 Exercises

CHAPTER 10 Categorical Data
10.1 Introduction and Abstract of Research Study
10.2 Inferences About a Population Proportion π
10.3 Inferences About the Difference Between Two Population Proportions, π1 − π2
10.4 Inferences About Several Proportions: Chi-Square Goodness-of-Fit Test
10.5 Contingency Tables: Tests for Independence and Homogeneity
10.6 Measuring Strength of Relation
10.7 Odds and Odds Ratios
10.8 Combining Sets of 2 × 2 Contingency Tables
10.9 Research Study: Does Gender Bias Exist in the Selection of Students for Vocational Education?
10.10 Summary and Key Formulas
10.11 Exercises

CHAPTER 11 Linear Regression and Correlation
11.1 Introduction and Abstract of Research Study
11.2 Estimating Model Parameters
11.3 Inferences About Regression Parameters
11.4 Predicting New y-Values Using Regression
11.5 Examining Lack of Fit in Linear Regression
11.6 Correlation
11.7 Research Study: Two Methods for Detecting E. coli
11.8 Summary and Key Formulas
11.9 Exercises

CHAPTER 12 Multiple Regression and the General Linear Model
12.1 Introduction and Abstract of Research Study
12.2 The General Linear Model
12.3 Estimating Multiple Regression Coefficients
12.4 Inferences in Multiple Regression
12.5 Testing a Subset of Regression Coefficients
12.6 Forecasting Using Multiple Regression
12.7 Comparing the Slopes of Several Regression Lines
12.8 Logistic Regression
12.9 Some Multiple Regression Theory (Optional)
12.10 Research Study: Evaluation of the Performance of an Electric Drill
12.11 Summary and Key Formulas
12.12 Exercises

CHAPTER 13 Further Regression Topics
13.1 Introduction and Abstract of Research Study
13.2 Selecting the Variables (Step 1)
13.3 Formulating the Model (Step 2)
13.4 Checking Model Assumptions (Step 3)
13.5 Research Study: Construction Costs for Nuclear Power Plants
13.6 Summary and Key Formulas
13.7 Exercises

CHAPTER 14 Analysis of Variance for Completely Randomized Designs
14.1 Introduction and Abstract of Research Study
14.2 Completely Randomized Design with a Single Factor
14.3 Factorial Treatment Structure
14.4 Factorial Treatment Structures with an Unequal Number of Replications
14.5 Estimation of Treatment Differences and Comparisons of Treatment Means
14.6 Determining the Number of Replications
14.7 Research Study: Development of a Low-Fat Processed Meat
14.8 Summary and Key Formulas
14.9 Exercises

CHAPTER 15 Analysis of Variance for Blocked Designs
15.1 Introduction and Abstract of Research Study
15.2 Randomized Complete Block Design
15.3 Latin Square Design
15.4 Factorial Treatment Structure in a Randomized Complete Block Design
15.5 A Nonparametric Alternative—Friedman’s Test
15.6 Research Study: Control of Leatherjackets
15.7 Summary and Key Formulas
15.8 Exercises

CHAPTER 16 The Analysis of Covariance
16.1 Introduction and Abstract of Research Study
16.2 A Completely Randomized Design with One Covariate
16.3 The Extrapolation Problem
16.4 Multiple Covariates and More Complicated Designs
16.5 Research Study: Evaluation of Cool-Season Grasses for Putting Greens
16.6 Summary
16.7 Exercises

CHAPTER 17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
17.1 Introduction and Abstract of Research Study
17.2 A One-Factor Experiment with Random Treatment Effects
17.3 Extensions of Random-Effects Models
17.4 Mixed-Effects Models
17.5 Rules for Obtaining Expected Mean Squares

17.6 Nested Factors
17.7 Research Study: Factors Affecting Pressure Drops Across Expansion Joints
17.8 Summary
17.9 Exercises

CHAPTER 18 Split-Plot, Repeated Measures, and Crossover Designs
18.1 Introduction and Abstract of Research Study
18.2 Split-Plot Designed Experiments
18.3 Single-Factor Experiments with Repeated Measures
18.4 Two-Factor Experiments with Repeated Measures on One of the Factors
18.5 Crossover Designs
18.6 Research Study: Effects of an Oil Spill on Plant Growth
18.7 Summary
18.8 Exercises

CHAPTER 19 Analysis of Variance for Some Unbalanced Designs
19.1 Introduction and Abstract of Research Study
19.2 A Randomized Block Design with One or More Missing Observations
19.3 A Latin Square Design with Missing Data
19.4 Balanced Incomplete Block (BIB) Designs
19.5 Research Study: Evaluation of the Consistency of Property Assessors
19.6 Summary and Key Formulas
19.7 Exercises

Appendix: Statistical Tables
Answers to Selected Exercises
References
Index

PREFACE

Intended Audience

An Introduction to Statistical Methods and Data Analysis, Seventh Edition, provides a broad overview of statistical methods for advanced undergraduate and graduate students from a variety of disciplines. This book is intended to prepare students to solve problems encountered in research projects, to make decisions based on data in general settings both within and beyond the university setting, and finally to become critical readers of statistical analyses in research papers and in news reports. The book presumes that the students have a minimal mathematical background (high school algebra) and no prior course work in statistics. The first 11 chapters of the textbook present the material typically covered in an introductory statistics course. However, this book provides research studies and examples that connect the statistical concepts to data analysis problems that are often encountered in undergraduate capstone courses. The remaining chapters of the book cover regression modeling and design of experiments. We develop and illustrate the statistical techniques and thought processes needed to design a research study or experiment and then analyze the data collected using an intuitive and proven four-step approach. This should be especially helpful to graduate students conducting their MS thesis and PhD dissertation research.

Major Features of Textbook

Learning from Data

In this text, we approach the study of statistics by considering a four-step process by which we can learn from data:

1. Defining the Problem

2. Collecting the Data

3. Summarizing the Data

4. Analyzing the Data, Interpreting the Analyses, and Communicating the Results

Case Studies

In order to demonstrate the relevance and critical nature of statistics in solving real-world problems, we introduce the major topic of each chapter using a case study.

The case studies were selected from many sources to illustrate the broad applicability of statistical methodology. The four-step learning from data process is illustrated through the case studies. This approach will hopefully assist in overcoming

the natural initial perception held by many people that statistics is just another

“math course.” The introduction of major topics through the use of case studies

provides a focus on the central nature of applied statistics in a wide variety of

research and business-related studies. These case studies will hopefully provide the

reader with an enthusiasm for the broad applicability of statistics and the statistical

thought process that the authors have found and used through their many years

of teaching, consulting, and R & D management. The following research studies

illustrate the types of studies we have used throughout the text.

●● Exit Polls Versus Election Results:

A study of why the exit polls

from 9 of 11 states in the 2004 presidential election predicted John

Kerry as the winner when in fact President Bush won 6 of the 11

states.

●● Evaluation of the Consistency of Property Assessors: A study to

determine if county property assessors differ systematically in their

determination of property values.

●● Effect of Timing of the Treatment of Port-Wine Stains with Lasers:

A prospective study that investigated whether treatment at a younger

age would yield better results than treatment at an older age.

●● Controlling for Student Background in the Assessment of Teaching:

An examination of data used to support possible improvements to

the No Child Left Behind program while maintaining the important

concepts of performance standards and accountability.

Each of the research studies includes a discussion of the whys and hows of the

study. We illustrate the use of the four-step learning from data process with each

case study. A discussion of sample size determination, graphical displays of the

data, and a summary of the necessary ingredients for a complete report of the statistical findings of the study are provided with many of the case studies.

Examples and Exercises

We have further enhanced the practical nature of statistics by using examples and

exercises from journal articles, newspapers, and the authors’ many consulting

experiences. These will provide the students with further evidence of the practical

usages of statistics in solving problems that are relevant to their everyday lives.

Many new exercises and examples have been included in this edition of the book.

The number and variety of exercises will be a great asset to both the instructor and

students in their study of statistics.

Topics Covered

This book can be used for either a one-semester or a two-semester course. Chapters

1 through 11 would constitute a one-semester course. The topics covered would

include

Chapter 1—Statistics and the scientific method

Chapter 2—Using surveys and experimental studies to gather data

Chapters 3 & 4—Summarizing data and probability distributions

Chapters 5–7—Analyzing data: inferences about central values and

variances

Chapters 8 & 9—One-way analysis of variance and multiple

comparisons

Chapter 10—Analyzing data involving proportions

Chapter 11—Linear regression and correlation

The second semester of a two-semester course would then include model building

and inferences in multiple regression analysis, logistic regression, design of experiments, and analysis of variance:

Chapters 11–13—Regression methods and model building: multiple regression and the general linear model, logistic regression, and building

regression models with diagnostics

Chapters 14–19—Design of experiments and analysis of variance: design

concepts, analysis of variance for standard designs, analysis of covariance, random and mixed effects models, split-plot designs, repeated

measures designs, crossover designs, and unbalanced designs

Emphasis on Interpretation, not Computation

In the book are examples and exercises that allow the student to study how to

calculate the value of statistical estimators and test statistics using the definitional

form of the procedure. After the student becomes comfortable with the aspects of

the data the statistical procedure is reflecting, we then emphasize the use of computer software in making computations in the analysis of larger data sets. We provide

output from three major statistical packages: SAS, Minitab, and SPSS. We find that

this approach provides the student with the experience of computing the value of the

procedure using the definition; hence, the student learns the basics behind each

procedure. In most situations beyond the statistics course, the student should be using

computer software in making the computations for both expedience and quality of

calculation. In many exercises and examples, the use of the computer allows for more

time to emphasize the interpretation of the results of the computations without having to expend enormous amounts of time and effort in the actual computations.

In numerous examples and exercises, the importance of the following aspects

of hypothesis testing is demonstrated:

1. The statement of the research hypothesis through the summarization

of the researcher’s goals into a statement about population

parameters.

2. The selection of the most appropriate test statistic, including sample

size computations for many procedures.

3. The necessity of considering both Type I and Type II error

rates (α and β) when discussing the results of a statistical test of

hypotheses.

4. The importance of considering both the statistical significance and

the practical significance of a test result. Thus, we illustrate the

importance of estimating effect sizes and the construction of confidence intervals for population parameters.

5. The statement of the results of the statistical test in nonstatistical

jargon that goes beyond the statement “reject H0” or “fail to

reject H0.”

New to the Seventh Edition

●● There are instructions on the use of R code. R is a free software package

that can be downloaded from http://lib.stat.cmu.edu/R/CRAN.

Click your choice of platform (Linux, MacOS X, or Windows) for the

precompiled binary distribution. Note the FAQs link to the left for

additional information. Follow the instructions for installing the base

system software (which is all you will need).

●● New examples illustrate the breadth of applications of statistics to

real-world problems.

●● An alternative to the standard deviation, MAD, is provided as a

measure of dispersion in a population/sample.

●● The use of bootstrapping in obtaining confidence intervals and

p-values is discussed.

●● Instructions are included on how to use R code to obtain percentiles

and probabilities from the following distributions: normal, binomial,

Poisson, chi-squared, F, and t.

●● A nonparametric alternative to the Pearson correlation coefficient:

Spearman’s rank correlation, is provided.

●● The binomial test for small sample tests of proportions is presented.

●● The McNemar test for paired count data has been added.

●● The Akaike information criterion and Bayesian information criterion

for variable selection are discussed.

Additional Features Retained from Previous Editions

●● Many practical applications of statistical methods and data analysis

from agriculture, business, economics, education, engineering, medicine, law, political science, psychology, environmental studies, and

sociology have been included.

●● The seventh edition contains over 1,000 exercises, with nearly 400 of

the exercises new.

●● Computer output from Minitab, SAS, and SPSS is provided in

numerous examples. The use of computers greatly facilitates the use

of more sophisticated graphical illustrations of statistical results.

●● Attention is paid to the underlying assumptions. Graphical

procedures and test procedures are provided to determine if assumptions have been violated. Furthermore, in many settings, we provide

alternative procedures when the conditions are not met.

●● The first chapter provides a discussion of “What Is Statistics?” We

provide a discussion of why students should study statistics along with

a discussion of several major studies that illustrate the use of statistics

in the solution of real-life problems.

Ancillaries

●● Student Solutions Manual (ISBN-10: 1-305-26948-9;

ISBN-13: 978-1-305-26948-4), containing select worked solutions

for problems in the textbook.

●● A Companion Website at www.cengage.com/statistics/ott, containing

downloadable data sets for Excel, Minitab, SAS, SPSS, and others,

plus additional resources for students and faculty.

Acknowledgments

There are many people who have made valuable, constructive suggestions for

the development of the original manuscript and during the preparation of the

subsequent editions. We are very appreciative of the insightful and constructive

comments from the following reviewers:

Naveen Bansal, Marquette University

Kameryn Denaro, San Diego State University

Mary Gray, American University

Craig Leth-Steensen, Carleton University

Jing Qian, University of Massachusetts

Mark Riggs, Abilene Christian University

Elaine Spiller, Marquette University

We are also appreciative of the preparation assistance received from Molly Taylor

and Jay Campbell and of the scheduling of the revisions by Mary Tindle, the Senior

Project Manager at Cenveo Publisher Services, who made sure that the book

was completed in a timely manner. The authors of the solutions manual, Soma

Roy, California Polytechnic State University, and John Draper, The Ohio State

University, provided me with excellent input which resulted in an improved set of

exercises for the seventh edition. The person who assisted me to the greatest degree

in the preparation of the seventh edition was Sherry Goldbecker, the copy editor.

Sherry not only corrected my many grammatical errors but also provided rephrasing of many sentences, which made for a more straightforward explanation of statistical concepts. The students who use this book in their statistics classes will be

most appreciative of Sherry’s many contributions.

PART 1

Introduction

Chapter 1 Statistics and the Scientific Method

CHAPTER 1

Statistics and the Scientific Method

1.1 Introduction

1.2 Why Study Statistics?

1.3 Some Current Applications of Statistics

1.4 A Note to the Student

1.5 Summary

1.6 Exercises

1.1 Introduction

Statistics is the science of designing studies or experiments, collecting data, and

modeling/analyzing data for the purpose of decision making and scientific discovery when the available information is both limited and variable. That is, statistics is

the science of Learning from Data.

Almost everyone, including social scientists, medical researchers, superintendents of public schools, corporate executives, market researchers, engineers,

government employees, and consumers, deals with data. These data could be in the

form of quarterly sales ﬁgures, percent increase in juvenile crime, contamination

levels in water samples, survival rates for patients undergoing medical therapy,

census ﬁgures, or information that helps determine which brand of car to purchase.

In this text, we approach the study of statistics by considering the four-step process

in Learning from Data: (1) defining the problem, (2) collecting the data, (3) summarizing the data, and (4) analyzing the data, interpreting the analyses, and communicating the results. Through the use of these four steps in Learning from Data,

our study of statistics closely parallels the Scientific Method, which is a set of principles and procedures used by successful scientists in their pursuit of knowledge.

The method involves the formulation of research goals, the design of observational

studies and/or experiments, the collection of data, the modeling/analysis of the

data in the context of research goals, and the testing of hypotheses. The conclusion

of these steps is often the formulation of new research goals for another study.

These steps are illustrated in the schematic given in Figure 1.1.

This book is divided into sections corresponding to the four-step process in

Learning from Data. The relationship among these steps and the chapters of the

book is shown in Table 1.1. As you can see from this table, much time is spent discussing how to analyze data using the basic methods presented in Chapters 5–19.

FIGURE 1.1

Scientific Method Schematic

[Schematic of six boxes: Formulate research goal (research hypotheses, models); Design study (sample size, variables, experimental units, sampling mechanism); Collect data (data management); Draw inferences (graphs, estimation, hypotheses testing, model assessment); Make decisions (written conclusions, oral presentations); Formulate new research goals (new models, new hypotheses).]

TABLE 1.1

Organization of the text

The Four-Step Process, with the corresponding chapters listed under each step:

1 Defining the Problem

    1 Statistics and the Scientific Method

2 Collecting the Data

    2 Using Surveys and Experimental Studies to Gather Data

3 Summarizing the Data

    3 Data Description

    4 Probability and Probability Distributions

4 Analyzing the Data, Interpreting the Analyses, and Communicating the Results

    5 Inferences about Population Central Values

    6 Inferences Comparing Two Population Central Values

    7 Inferences about Population Variances

    8 Inferences about More Than Two Population Central Values

    9 Multiple Comparisons

    10 Categorical Data

    11 Linear Regression and Correlation

    12 Multiple Regression and the General Linear Model

    13 Further Regression Topics

    14 Analysis of Variance for Completely Randomized Designs

    15 Analysis of Variance for Blocked Designs

    16 The Analysis of Covariance

    17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models

    18 Split-Plot, Repeated Measures, and Crossover Designs

    19 Analysis of Variance for Some Unbalanced Designs

However, you must remember that for each data set requiring analysis, someone

has defined the problem to be examined (Step 1), developed a plan for collecting

data to address the problem (Step 2), and summarized the data and prepared the

data for analysis (Step 3). Then following the analysis of the data, the results of the

analysis must be interpreted and communicated either verbally or in written form

to the intended audience (Step 4).

All four steps are important in Learning from Data; in fact, unless the problem to be addressed is clearly defined and the data collection carried out properly,

the interpretation of the results of the analyses may convey misleading information because the analyses were based on a data set that did not address the problem

or that was incomplete and contained improper information. Throughout the text,

we will try to keep you focused on the bigger picture of Learning from Data

through the four-step process. Most chapters will end with a summary section

that emphasizes how the material of the chapter fits into the study of statistics—

Learning from Data.

To illustrate some of the above concepts, we will consider four situations

in which the four steps in Learning from Data could assist in solving a real-world

problem.

1. Problem: Inspection of ground beef in a large beef-processing facility.

A beef-processing plant produces approximately half a million packages of ground beef per week. The government inspects packages

for possible improper labeling of the packages with respect to the

percent fat in the meat. The inspectors must open the ground beef

package in order to determine the fat content of the ground beef.

The inspection of every package would be prohibitively costly and

time consuming. An alternative approach is to select 250 packages

for inspection from the daily production of 100,000 packages. The

fraction of packages with improper labeling in the sample of 250

packages would then be used to estimate the fraction of packages

improperly labeled in the complete day’s production. If this fraction

exceeds a set specification, action is then taken against the meat

processor. In later chapters, a procedure will be formulated to determine how well the sample fraction of improperly labeled packages

approximates the fraction of improperly labeled packages for the

whole day’s output.

2. Problem: Is there a relationship between quitting smoking and

gaining weight? To investigate the claim that people who quit

smoking often experience a subsequent weight gain, researchers

selected a random sample of 400 participants who had successfully

participated in programs to quit smoking. The individuals were

weighed at the beginning of the program and again 1 year later.

The average change in weight of the participants was an increase of

5 pounds. The investigators concluded that there was evidence that

the claim was valid. We will develop techniques in later chapters to

assess when changes are truly significant changes and not changes

due to random chance.

3. Problem: What effect does nitrogen fertilizer have on wheat production?

For a study of the effects of nitrogen fertilizer on wheat production,

a total of 15 fields was available to the researcher. She randomly

assigned three fields to each of the five nitrogen rates under investigation. The same variety of wheat was planted in all 15 fields. The

fields were cultivated in the same manner until harvest, and the

number of pounds of wheat per acre was then recorded for each of

the 15 fields. The experimenter wanted to determine the optimal

level of nitrogen to apply to any wheat field, but, of course, she could

run experiments on only a limited number of fields. After

determining the amount of nitrogen that yielded the largest production of wheat in the study fields, the experimenter then concluded

that similar results would hold for wheat fields possessing characteristics somewhat the same as the study fields. Is the experimenter

justified in reaching this conclusion?

4. Problem: Determining public opinion toward a question, issue,

product, or candidate. Similar applications of statistics are brought

to mind by the frequent use of the New York Times/CBS News,

Washington Post/ABC News, Wall Street Journal/NBC News, Harris,

Gallup/Newsweek, and CNN/Time polls. How can these pollsters

determine the opinions of more than 195 million Americans who are

of voting age? They certainly do not contact every potential voter in

the United States. Rather, they sample the opinions of a small number of potential voters, perhaps as few as 1,500, to estimate the reaction of every person of voting age in the country. The amazing result

of this process is that if the selection of the voters is done in an unbiased way and voters are asked unambiguous, nonleading questions,

the fraction of those persons contacted who hold a particular opinion

will closely match the fraction in the total population holding that

opinion at a particular time. We will supply convincing supportive

evidence of this assertion in subsequent chapters.
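
The assertion in Problem 4, that the opinions of as few as 1,500 voters can closely match those of the whole voting-age population, can be explored informally by simulation before the formal evidence is developed. The short Python sketch below is only an illustration under stated assumptions: the population fraction of 0.52, the number of simulated polls, the seed, and the function name are all hypothetical choices made for this example, and drawing each respondent independently is used as a stand-in for unbiased selection from a very large population.

```python
import random


def simulate_sample_fractions(true_fraction, sample_size, n_polls, seed=0):
    """Simulate n_polls polls and return the sample fraction from each one."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_polls):
        # Each respondent holds the opinion with probability true_fraction.
        holders = sum(rng.random() < true_fraction for _ in range(sample_size))
        fractions.append(holders / sample_size)
    return fractions


# Hypothetical population fraction holding a particular opinion.
TRUE_FRACTION = 0.52

fractions = simulate_sample_fractions(TRUE_FRACTION, sample_size=1500, n_polls=200)

# How far did the worst of the simulated polls land from the population value?
largest_miss = max(abs(f - TRUE_FRACTION) for f in fractions)
print(f"largest miss across {len(fractions)} simulated polls: {largest_miss:.3f}")
```

Changing sample_size to 250, the number of packages inspected in Problem 1, gives a rough sense of how much more the sample fraction varies when fewer units are examined.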

These problems illustrate the four-step process in Learning from Data.

First, there was a problem or question to be addressed. Next, for each problem a study or experiment was proposed to collect meaningful data to solve the

problem. The government meat inspection agency had to decide both how many

packages to inspect per day and how to select the sample of packages from the

total daily output in order to obtain a valid prediction. The polling groups had to

decide how many voters to sample and how to select these individuals in order

to obtain information that is representative of the population of all voters. Similarly, it was necessary to carefully plan how many participants in the weight-gain

study were needed and how they were to be selected from the list of all such

participants. Furthermore, what variables did the researchers have to measure

on each participant? Was it necessary to know each participant’s age, sex, physical fitness, and other health-related variables, or was weight the only important

variable? The results of the study may not be relevant to the general population

if many of the participants in the study had a particular health condition. In the

wheat experiment, it was important to measure both the soil characteristics of

the fields and the environmental conditions, such as temperature and rainfall, to

obtain results that could be generalized to fields not included in the study. The

design of a study or experiment is crucial to obtaining results that can be generalized beyond the study.

Finally, having collected, summarized, and analyzed the data, it is important

to report the results in unambiguous terms to interested people. For the meat

inspection example, the government inspection agency and the personnel in the

beef-processing plant would need to know the distribution of fat content in the

daily production of ground beef. Based on this distribution, the agency could then

impose fines or take other remedial actions against the production facility. Also,

knowledge of this distribution would enable company production personnel to

make adjustments to the process in order to obtain acceptable fat content in their

ground beef packages. Therefore, the results of the statistical analyses cannot

be presented in ambiguous terms; decisions must be made from a well-deﬁned

knowledge base. The results of the weight-gain study would be of vital interest to

physicians who have patients participating in the smoking-cessation program. If

a signiﬁcant increase in weight was recorded for those individuals who had quit

smoking, physicians would have to recommend diets so that the former smokers

would not go from one health problem (smoking) to another (elevated blood

pressure due to being overweight). It is crucial that a careful description of the

participants—that is, age, sex, and other health-related information—be included

in the report. In the wheat study, the experiment would provide farmers with

information that would allow them to economically select the optimum amount of

nitrogen required for their ﬁelds. Therefore, the report must contain information

concerning the amount of moisture and types of soils present on the study ﬁelds.

Otherwise, the conclusions about optimal wheat production may not pertain to

farmers growing wheat under considerably different conditions.

FIGURE 1.2

Population and sample

[Diagram: the population is the set of all measurements; the sample is the set of measurements selected from the population.]

To infer validly that the results of a study are applicable to a larger group

than just the participants in the study, we must carefully deﬁne the population

(see Deﬁnition 1.1) to which inferences are sought and design a study in which the

sample (see Deﬁnition 1.2) has been appropriately selected from the designated

population. We will discuss these issues in Chapter 2.

DEFINITION 1.1

A population is the set of all measurements of interest to the sample collector.

(See Figure 1.2.)

DEFINITION 1.2

A sample is any subset of measurements selected from the population.

(See Figure 1.2.)
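
Definitions 1.1 and 1.2 can be made concrete with the ground beef inspection problem from earlier in this section, in which 250 packages are selected from a daily production of 100,000. The Python sketch below is a minimal illustration, not a description of any actual inspection procedure: the 3% mislabeling rate and the seed are invented for the example, and a simple random sample without replacement is assumed as the selection method.

```python
import random

rng = random.Random(1)

# Population: one measurement (True = improperly labeled) for each of the
# 100,000 packages in a day's production. The 3% rate is a made-up value.
POPULATION_SIZE = 100_000
population = [rng.random() < 0.03 for _ in range(POPULATION_SIZE)]

# Sample: a subset of 250 measurements selected at random, without replacement.
sample = rng.sample(population, 250)

population_fraction = sum(population) / len(population)
sample_fraction = sum(sample) / len(sample)

print(f"population fraction improperly labeled: {population_fraction:.4f}")
print(f"sample fraction improperly labeled:     {sample_fraction:.4f}")
```

In practice only the sample fraction is observed; the population fraction is printed here solely because the simulated population is fully known.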

1.2 Why Study Statistics?

We can think of many reasons for taking an introductory course in statistics. One

reason is that you need to know how to evaluate published numerical facts. Every

person is exposed to manufacturers’ claims for products; to the results of sociological, consumer, and political polls; and to the published results of scientiﬁc

research. Many of these results are inferences based on sampling. Some inferences are valid; others are invalid. Some are based on samples of adequate size;

others are not. Yet all these published results bear the ring of truth. Some people (particularly statisticians) say that statistics can be made to support almost

anything. Others say it is easy to lie with statistics. Both statements are true. It

is easy, purposely or unwittingly, to distort the truth by using statistics when

presenting the results of sampling to the uninformed. It is thus crucial that you

become an informed and critical reader of data-based reports and articles.

A second reason for studying statistics is that your profession or employment

may require you to interpret the results of sampling (surveys or experimentation)

or to employ statistical methods of analysis to make inferences in your work. For

example, practicing physicians receive large amounts of advertising describing

the beneﬁts of new drugs. These advertisements frequently display the numerical

results of experiments that compare a new drug with an older one. Do such data

really imply that the new drug is more effective, or is the observed difference in

results due simply to random variation in the experimental measurements?

Recent trends in the conduct of court trials indicate an increasing use of

probability and statistical inference in evaluating the quality of evidence. The use

of statistics in the social, biological, and physical sciences is essential because all

these sciences make use of observations of natural phenomena, through sample

surveys or experimentation, to develop and test new theories. Statistical methods

are employed in business when sample data are used to forecast sales and proﬁt.

In addition, they are used in engineering and manufacturing to monitor product

quality. The sampling of accounts is a useful tool to assist accountants in conducting audits. Thus, statistics plays an important role in almost all areas of science,

business, and industry; persons employed in these areas need to know the basic

concepts, strengths, and limitations of statistics.

The article “What Educated Citizens Should Know About Statistics and Probability,” by J. Utts (2003), contains a number of statistical ideas that need to be

understood by users of statistical methodology in order to avoid confusion in the

use of their research findings. Misunderstandings of statistical results can lead to

major errors by government policymakers, medical workers, and consumers of this

information. The article selected a number of topics for discussion. We will summarize some of the findings in the article. A complete discussion of all these topics

will be given throughout the book.

1. One of the most frequent misinterpretations of statistical findings

occurs when a statistically significant relationship is established between

two variables and it is then concluded that a change in the explanatory variable causes a change in the response variable. As will be

discussed in the book, this conclusion can be reached only under

very restrictive constraints on the experimental setting. Utts examined a recent Newsweek article discussing the relationship between

the strength of religious beliefs and physical healing. Utts’ article

discussed the problems in reaching the conclusion that the stronger

a patient’s religious beliefs, the more likely the patient would be

cured of his or her ailment. Utts showed that there are numerous

other factors involved in a patient’s health and the conclusion that

religious beliefs cause a cure cannot be validly reached.

2. A common confusion in many studies is the difference between

(statistically) significant findings in a study and (practically) significant findings. This problem often occurs when large data sets are

involved in a study or experiment. This type of problem will be discussed in detail throughout the book. We will use a number of examples that will illustrate how this type of confusion can be avoided by

researchers when reporting the findings of their experimental results.

Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Utts’ article illustrated this problem with a discussion of a study that

found a statistically significant difference in the average heights of

military recruits born in the spring and in the fall. There were 507,125

recruits in the study and the difference in average height was about

1/4 inch. So, even though there may be a difference in the actual average heights of recruits born in the spring and in the fall, the difference is so

small (1/4 inch) that it is of no practical importance.

3. The size of the sample also may be a determining factor in studies

in which statistical significance is not found. A study may not have

selected a sample size large enough to discover a difference between

the several populations under study. In many government-sponsored

studies, the researchers do not receive funding unless they are able

to demonstrate that the sample sizes selected for their study are of

an appropriate size to detect specified differences in populations if

in fact they exist. Methods to determine appropriate sample sizes

will be provided in the chapters on hypothesis testing and experimental design.

4. Surveys are ubiquitous, especially during the years in which national

elections are held. In fact, market surveys are nearly as widespread

as political polls. There are many sources of bias that can creep

into the most reliable of surveys. The manner in which people are

selected for inclusion in the survey, the way in which questions are

phrased, and even the manner in which questions are posed to the

subject may affect the conclusions obtained from the survey. We will

discuss these issues in Chapter 2.

5. Many students find the topic of probability to be very confusing. One

of these confusions involves conditional probability where the probability of an event occurring is computed under the condition that a

second event has occurred with certainty. For example, a new diagnostic test for the pathogen Escherichia coli in meat is proposed to

the U.S. Department of Agriculture (USDA). The USDA evaluates

the test and determines that the test has both a low false positive rate

and a low false negative rate. That is, it is very unlikely that the test

will declare the meat contains E. coli when in fact it does not contain

E. coli. Also, it is very unlikely that the test will declare the meat does

not contain E. coli when in fact it does contain E. coli. Although the

diagnostic test has a very low false positive rate and a very low false

negative rate, the probability that E. coli is in fact present in the meat

when the test yields a positive test result is very low for those situations in which a particular strain of E. coli occurs very infrequently.

In Chapter 4, we will demonstrate how this probability can be computed in order to provide a true assessment of the performance of a

diagnostic test.

6. Another concept that is often misunderstood is the role of the degree

of variability in interpreting what is a “normal” occurrence of some

naturally occurring event. Utts’ article provided the following example. A company was having an odor problem with its wastewater

treatment plant. It attributed the problem to “abnormal” rainfall during the period in which the odor problem was occurring. A company

official stated that the facility experienced 170% to 180% of its

“normal” rainfall during this period, which resulted in the water in

the holding ponds taking longer to exit for irrigation. Thus, there was

more time for the pond to develop an odor. The company official did

not point out that yearly rainfall in this region is extremely variable.

In fact, the historical range for rainfall is between 6.1 and 37.4 inches

with a median rainfall of 16.7 inches. The rainfall for the year of the

odor problem was 29.7 inches, which was well within the “normal”

range for rainfall. There was a confusion between the terms “average” and “normal” rainfall. The concept of natural variability is crucial to correct interpretation of statistical results. In this example, the

company official should have evaluated the percentile for an annual

rainfall of 29.7 inches in order to demonstrate the abnormality of

such a rainfall. We will discuss the ideas of data summaries and percentiles in Chapter 3.
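
The conditional probability in item 5 can be illustrated with a short calculation. The following Python sketch applies Bayes' rule to a diagnostic test. The prevalence (0.1%) and the two error rates (1% each) are illustrative assumptions chosen for this sketch; they are not values from the USDA evaluation or from Utts' article.

```python
def positive_predictive_value(prevalence, false_positive_rate, false_negative_rate):
    """Return P(pathogen present | positive test) using Bayes' rule."""
    sensitivity = 1.0 - false_negative_rate            # P(positive | present)
    true_positive = sensitivity * prevalence           # P(positive and present)
    false_positive = false_positive_rate * (1.0 - prevalence)  # P(positive and absent)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs: a rare strain and a test with low error rates.
ASSUMED_PREVALENCE = 0.001
ASSUMED_FALSE_POSITIVE_RATE = 0.01
ASSUMED_FALSE_NEGATIVE_RATE = 0.01

ppv = positive_predictive_value(
    ASSUMED_PREVALENCE, ASSUMED_FALSE_POSITIVE_RATE, ASSUMED_FALSE_NEGATIVE_RATE
)
print(f"P(E. coli present | positive test) = {ppv:.3f}")
```

Under these assumed inputs, only about 9% of positive results correspond to meat that actually contains the pathogen, even though both error rates are 1%. When the pathogen is rare, the false positives from the many uncontaminated samples outnumber the true positives.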

The types of problems expressed above and in Utts’ article represent common

and important misunderstandings that can occur when researchers use statistics in

interpreting the results of their studies. We will attempt throughout the book to discuss possible misinterpretations of statistical results and how to avoid them in your

data analyses. More importantly, we want the reader of this book to become a discriminating reader of statistical findings, the results of surveys, and project reports.
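
The distinction in item 2 between statistical and practical significance can also be sketched numerically. The Python example below computes a two-sample z statistic for the 1/4-inch height difference among the 507,125 recruits. The standard deviation of 2.5 inches and the equal split between spring and fall births are assumptions made only for illustration; the study's actual values are not given here.

```python
import math


def two_sample_z(diff, sd, n1, n2):
    """z statistic for a difference in means, assuming a common known SD."""
    standard_error = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff / standard_error


DIFF_INCHES = 0.25          # difference reported in the text
TOTAL_RECRUITS = 507_125    # sample size reported in the text
ASSUMED_SD = 2.5            # hypothetical SD of heights, in inches

# Hypothetical equal split between the two birth seasons.
n_spring = TOTAL_RECRUITS // 2
n_fall = TOTAL_RECRUITS - n_spring

z_large = two_sample_z(DIFF_INCHES, ASSUMED_SD, n_spring, n_fall)
z_small = two_sample_z(DIFF_INCHES, ASSUMED_SD, 100, 100)
standardized_diff = DIFF_INCHES / ASSUMED_SD  # difference in SD units

print(f"z with all recruits:   {z_large:.1f}")
print(f"z with 100 per group:  {z_small:.2f}")
print(f"difference in SD units: {standardized_diff:.2f}")
```

Under these assumptions, the same 1/4-inch difference gives a z statistic above 30 with the full sample but below 1 with 100 recruits per group. The size of the difference, one-tenth of a standard deviation, is identical in both cases. The sample size changes the statistical significance, not the practical importance.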

1.3 Some Current Applications of Statistics

Defining the Problem: Obtaining Information from Massive Data Sets

Data mining is defined to be a process by which useful information is obtained

from large sets of data. Data mining uses statistical techniques to discover patterns

and trends that are present in a large data set. In most data sets, important patterns

would not be discovered by using traditional data exploration techniques because

the types of relationships between the many variables in the data set are either too

complex or because the data sets are so large that they mask the relationships.

The patterns and trends discovered in the analysis of the data are defined

as data mining models. These models can be applied to many different situations,

such as:

●● Forecasting: Estimating future sales, predicting demands on a power

grid, or estimating server downtime

●● Assessing risk: Choosing the rates for insurance premiums, selecting

best customers for a new sales campaign, determining which medical

therapy is most appropriate given the physiological characteristics of

the patient

●● Identifying sequences: Determining customer preferences in online

purchases, predicting weather events

●● Grouping: Placing customers or events into clusters of related items,

analyzing and predicting relationships between demographic characteristics and purchasing patterns, identifying fraud in credit card

purchases

A new medical procedure referred to as gene editing has the potential to

assist thousands of people suffering many different diseases. An article in the

Houston Chronicle (2013) describes how data mining techniques are used to

explore massive genomic databases to interpret millions of bits of data in a person’s DNA. This information is then used to identify a single defective gene,

which is cut out so that a correction can be spliced in. This area of research is referred to as

biomedical informatics and is based on the premise that the human body is a data

bank of incredible depth and complexity. It is predicted that by 2015, the average

hospital will have approximately 450 terabytes of patient data consisting of large,

complex images from CT scans, MRIs, and other imaging techniques. However,

only a small fraction of the current medical data has been analyzed, thus opening

huge opportunities for persons trained in data mining. In a case described in the

article, a 7-year-old boy tormented by scabs, blisters, and scars was given a new

lease on life by using data mining techniques to discover a single letter in his faulty

genome.

Defining the Problem: Determining the Effectiveness of a New Drug Product

The development and testing of the Salk vaccine for protection against poliomyelitis (polio) provide an excellent example of how statistics can be used in solving

practical problems. Most parents and children growing up before 1954 can recall

the panic brought on by the outbreak of polio cases during the summer months.

Although relatively few children fell victim to the disease each year, the pattern

of outbreak of polio was unpredictable and caused great concern because of the

possibility of paralysis or death. The fact that very few of today’s youth have even

heard of polio demonstrates the great success of the vaccine and the testing program that preceded its release on the market.

It is standard practice in establishing the effectiveness of a particular drug product to conduct an experiment (often called a clinical trial) with human participants.

For some clinical trials, assignments of participants are made at random, with half

receiving the drug product and the other half receiving a solution or tablet that does

not contain the medication (called a placebo). One statistical problem concerns the

determination of the total number of participants to be included in the clinical trial.

This problem was particularly important in the testing of the Salk vaccine because

data from previous years suggested that the incidence rate for polio might be less

than 50 cases for every 100,000 children. Hence, a large number of participants had

to be included in the clinical trial in order to detect a difference in the incidence rates

for those treated with the vaccine and those receiving the placebo.

With the assistance of statisticians, it was decided that a total of 400,000

children should be included in the Salk clinical trial begun in 1954, with half of them

randomly assigned the vaccine and the remaining children assigned the placebo. No

other clinical trial had ever been attempted on such a large group of participants.

Through a public school inoculation program, the 400,000 participants were treated

and then observed over the summer to determine the number of children contracting polio. Although fewer than 200 cases of polio were reported for the 400,000

participants in the clinical trial, more than three times as many cases appeared in

the group receiving the placebo. These results, together with some statistical calculations, were sufﬁcient to indicate the effectiveness of the Salk polio vaccine.

However, these conclusions would not have been possible if the statisticians and

scientists had not planned for and conducted such a large clinical trial.
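
The role of sample size in the Salk trial can be sketched with a rough calculation. The Python example below first computes the number of polio cases expected in each group of 200,000 children if the incidence rate were 50 per 100,000, which the text gives as an upper bound. It then applies a two-proportion z test to case counts of 150 (placebo) and 45 (vaccine). These two counts are hypothetical, chosen only to be consistent with "fewer than 200 cases" and "more than three times as many"; they are not the trial's reported results, and the test shown is one common choice rather than the calculation the original statisticians necessarily used.

```python
import math

CHILDREN_PER_GROUP = 200_000
ASSUMED_RATE = 50 / 100_000   # upper bound on incidence mentioned in the text

# Expected cases per group if the vaccine had no effect.
expected_cases_per_group = CHILDREN_PER_GROUP * ASSUMED_RATE


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing two proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / standard_error


# Hypothetical case counts, for illustration only.
placebo_cases = 150
vaccine_cases = 45

z = two_proportion_z(placebo_cases, CHILDREN_PER_GROUP,
                     vaccine_cases, CHILDREN_PER_GROUP)

print(f"expected cases per group: {expected_cases_per_group:.0f}")
print(f"z statistic: {z:.2f}")
```

With an incidence this low, each group of 200,000 is expected to yield only about 100 cases, which is why a much smaller trial would have produced too few cases to compare. With the hypothetical counts above, the z statistic is well beyond conventional cutoffs.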

The development of the Salk vaccine is not an isolated example of the use

of statistics in the testing and development of drug products. In recent years,

the U.S. Food and Drug Administration (FDA) has placed stringent requirements

on pharmaceutical ﬁrms wanting to establish the effectiveness of proposed new

drug products. Thus, statistics has played an important role in the development

and testing of birth control pills, rubella vaccines, chemotherapeutic agents in the

treatment of cancer, and many other preparations.

Defining the Problem: Improving the Reliability of Evidence in Criminal Investigations

The National Academy of Sciences released a report (National Research Council,

2009) in which one of the more important findings was the need for applying statistical methods in the design of studies used to evaluate inferences from evidence

gathered by forensic technicians. The following statement is central to the report:

“Over the last two decades, advances in some forensic science disciplines, especially the use of DNA technology, have demonstrated that some areas of forensic science have great additional potential to help law enforcement identify

criminals. . . . Those advances, however, also have revealed that, in some cases,

substantive information and testimony based on faulty forensic science analyses may have contributed to wrongful convictions of innocent people. This fact

has demonstrated the potential danger of giving undue weight to evidence and

testimony derived from imperfect testing and analysis.”

There are many sources that may impact the accuracy of conclusions inferred

from the crime scene evidence and presented to a jury by a forensic investigator.

Statistics can play a role in improving forensic analyses. Statistical principles can

be used to identify sources of variation and quantify the size of the impact that

these sources of variation can have on the conclusions reached by the forensic

investigator.

An illustration of the impact of an inappropriately designed study and

statistical analysis on the conclusions reached from the evidence obtained at

a crime scene can be found in Spiegelman et al. (2007). They demonstrate that

the evidence used by the FBI crime lab to support the claim that there was not

a second assassin of President John F. Kennedy was based on a faulty analysis

of the data and an overstatement of the results of a method of forensic testing

called Comparative Bullet Lead Analysis (CBLA). This method applies a chemical analysis to link a bullet found at a crime scene to the gun that had discharged

the bullet. Based on evidence from chemical analyses of the recovered bullet fragments, the 1979 U.S. House Select Committee on Assassinations concluded that all

the bullets striking President Kennedy were fired from Lee Oswald’s rifle. A new

analysis of the bullets using more appropriate statistical analyses demonstrated

that the evidence presented in 1979 was overstated. A case is presented for a new

analysis of the assassination bullet fragments, which may shed light on whether the

five bullet fragments found in the Kennedy assassination are derived from three or

more bullets and not just two bullets, as was presented as the definitive evidence

that Oswald was the sole shooter in the assassination of President Kennedy.

Defining the Problem: Estimating Bowhead Whale Population Size

Raftery and Zeh (1998) discuss the estimation of the population size and rate of

increase in bowhead whales, Balaena mysticetus. The importance of such a study

derives from the fact that bowheads were the ﬁrst species of great whale for

which commercial whaling was stopped; thus, their status indicates the recovery

prospects of other great whales. Also, the International Whaling Commission

uses these estimates to determine the aboriginal subsistence whaling quota for

Alaskan Eskimos. To obtain the necessary data, researchers conducted a visual

and acoustic census off Point Barrow, Alaska. The researchers then applied statistical models and estimation techniques to the data obtained in the census to

determine whether the bowhead population had increased or decreased since

commercial whaling was stopped. The statistical estimates showed that the

bowhead population was increasing at a healthy rate, indicating that stocks of

great whales that have been decimated by commercial hunting can recover after

hunting is discontinued.

Defining the Problem: Ozone Exposure and Population Density

Ambient ozone pollution in urban areas is one of the nation’s most pervasive environmental problems. Whereas the decreasing stratospheric ozone layer may lead

to increased instances of skin cancer, high ambient ozone intensity has been shown

to cause damage to the human respiratory system as well as to agricultural crops

and trees. The Houston, Texas, area has ozone concentrations that exceed the National Ambient Air Quality

Standard and are rated second only to those of Los Angeles. Carroll et al. (1997) describe how to analyze the hourly ozone measurements collected in Houston from 1980 to 1993 by 9 to 12 monitoring stations.

Besides the ozone level, each station recorded three meteorological variables:

temperature, wind speed, and wind direction.

The statistical aspect of the project had three major goals:

1. Provide information (and/or tools to obtain such information)

about the amount and pattern of missing data as well as about the

quality of the ozone and the meteorological measurements.

2. Build a model of ozone intensity to predict the ozone concentration

at any given location within Houston at any given time between 1980

and 1993.

3. Apply this model to estimate exposure indices that account for

either a long-term exposure or a short-term high-concentration

exposure; also, relate census information to different exposure

indices to achieve population exposure indices.

The spatial–temporal model the researchers built provided estimates demonstrating that the highest ozone levels occurred at locations with relatively small

populations of young children. Also, the model estimated that the exposure of

young children to ozone decreased by approximately 20% from 1980 to 1993. An

examination of the distribution of population exposure had several policy implications. In particular, it was concluded that the current placement of monitors

is not ideal if one is concerned with assessing population exposure. This project

involved all four components of Learning from Data: planning where the monitoring stations should be placed within the city, how often the data should be

collected, and what variables should be recorded; conducting spatial–temporal

graphing of the data; creating spatial–temporal models of the ozone data, meteorological data, and demographic data; and, ﬁnally, writing a report that could

assist local and federal ofﬁcials in formulating policy with respect to decreasing

ozone levels.

Defining the Problem: Assessing Public Opinion

Public opinion, consumer preference, and election polls are commonly used to

assess the opinions or preferences of a segment of the public regarding issues,

products, or candidates of interest. We, the American public, are exposed to the

results of these polls daily in newspapers, in magazines, on the internet, on the

radio, and on television. For example, the results of polls related to the following

subjects were printed in local newspapers:

●● Public confidence in the potential for job growth in the coming year

●● Reactions of Texas residents to the state legislature’s failure to expand

Medicaid coverage

●● Voters’ preferences for tea party candidates in the fall congressional

elections

●● Attitudes toward increasing the gasoline tax in order to increase

funding for road construction and maintenance

●● Product preference polls related to specific products (Toyota vs. Ford,

DirecTV vs. Comcast, Dell vs. Apple, Subway vs. McDonald’s)

●● Public opinion on a national immigration policy

A number of questions can be raised about polls. Suppose we consider a poll

on the public’s opinion on a proposed income tax increase in the state of Michigan.

What was the population of interest to the pollster? Was the pollster interested in

all residents of Michigan or just those citizens who currently pay income taxes?

Was the sample in fact selected from this population? If the population of interest

was all persons currently paying income taxes, did the pollster make sure that all

the individuals sampled were current taxpayers? What questions were asked and

how were the questions phrased? Was each person asked the same question? Were

the questions phrased in such a manner as to bias the responses? Can we believe

the results of these polls? Do these results “represent” how the general public

currently feels about the issues raised in the polls?

Opinion and preference polls are an important, visible application of statistics for the consumer. We will discuss this topic in more detail in Chapters 2 and

10. We hope that after studying this material you will have a better understanding

of how to interpret the results of these polls.

1.4 A Note to the Student

We think with words and concepts. A study of the discipline of statistics requires

us to memorize new terms and concepts (as does the study of a foreign language).

Commit these deﬁnitions, theorems, and concepts to memory.

Also, focus on the broader concept of making sense of data. Do not let details

obscure these broader characteristics of the subject. The teaching objective of this

text is to identify and amplify these broader concepts of statistics.

1.5 Summary

The discipline of statistics and those who apply the tools of that discipline deal

with Learning from Data. Medical researchers, social scientists, accountants,

agronomists, consumers, government leaders, and professional statisticians are all

involved with data collection, data summarization, data analysis, and the effective

communication of the results of data analysis.

1.6 Exercises

1.1 Introduction

Bio.

1.1 Hansen (2006) describes a study to assess the migration and survival of salmon released

from fish farms located in Norway. The mingling of escaped farmed salmon with wild salmon

raises several concerns. First, the assessment of the abundance of wild salmon stocks will be

biased if there is a presence of large numbers of farmed salmon. Second, potential interbreeding between farmed and wild salmon may result in a reduction in the health of the wild stocks.

Third, diseases present in farmed salmon may be transferred to wild salmon. Two batches of

farmed salmon were tagged and released in two locations, one batch of 1,996 fish in northern

Norway and a second batch of 2,499 fish in southern Norway. The researchers recorded the

time and location at which the fish were captured by either commercial fishermen or anglers

in fresh water. Two of the most important pieces of information to be determined by the

study were the distance from the point of the fish’s release to the point of its capture and the

length of time it took for the fish to be captured.

a. Identify the population that is of interest to the researchers.

b. Describe the sample.

c. What characteristics of the population are of interest to the researchers?

d. If the sample measurements are used to make inferences about the population

characteristics, why is a measure of reliability of the inferences important?

Env.

1.2 During 2012, Texas had listed on FracFocus, an industry fracking disclosure site, nearly

6,000 oil and gas wells in which the fracking methodology was used to extract natural gas.

Fontenot et al. (2013) report on a study of 100 private water wells in or near the Barnett Shale

in Texas. There were 91 private wells located within 5 km of an active gas well using fracking, 4

private wells with no gas wells located within a 14 km radius, and 5 wells outside of the Barnett

Shale with no gas well located within a 60 km radius. They found that there were elevated levels

of potential contaminants such as arsenic and selenium in the 91 wells closest to natural gas

extraction sites compared to the 9 wells that were at least 14 km away from an active gas well

using the fracking technique to extract natural gas.

a. Identify the population that is of interest to the researchers.

b. Describe the sample.

c. What characteristics of the population are of interest to the researchers?

d. If the sample measurements are used to make inferences about the population

characteristics, why is a measure of reliability of the inferences important?

Soc.

1.3 In 2014, Congress cut $8.7 billion from the Supplemental Nutrition Assistance Program

(SNAP), more commonly referred to as food stamps. The rationale for the decrease is that

providing assistance to people will result in the next generation of citizens being more dependent on the government for support. Hoynes (2012) describes a study to evaluate this claim. The

study examines 60,782 families over the time period of 1968 to 2009, which is subsequent to the

introduction of the Food Stamp Program in 1961. This study examines the impact of a positive and policy-driven change in economic resources available in utero and during childhood

on the economic health of individuals in adulthood. The study assembled data linking family

background in early childhood to adult health and economic outcomes. The study concluded

that the Food Stamp Program has effects decades after initial exposure. Specifically, access

to food stamps in childhood leads to a significant reduction in the incidence of metabolic

syndrome (obesity, high blood pressure, and diabetes) and, for women, an increase in economic self-sufficiency. Overall, the results suggest substantial internal and external benefits

of SNAP.

a. Identify the population that is of interest to the researchers.

b. Describe the sample.

c. What characteristics of the population are of interest to the researchers?

d. If the sample measurements are used to make inferences about the population

characteristics, why is a measure of reliability of the inferences important?

Med.

1.4 Of all sports, football accounts for the highest incidence of concussion in the United States

due to the large number of athletes participating and the nature of the sport. While there is general agreement that concussion incidence can be reduced by making rule changes and teaching

proper tackling technique, there remains debate as to whether helmet design may also reduce the

incidence of concussion. Rowson et al. (2014) report on a retrospective analysis of head impact

data collected between 2005 and 2010 from eight collegiate football teams. Concussion rates for

players wearing two types of helmets, Riddell VSR4 and Riddell Revolution, were compared. A

total of 1,281,444 head impacts were recorded, from which 64 concussions were diagnosed. The

relative risk of sustaining a concussion in a Revolution helmet compared with a VSR4 helmet

was 46.1%. This study illustrates that differences in the ability to reduce concussion risk exist

between helmet models in football. Although helmet design may never prevent all concussions

from occurring in football, evidence illustrates that it can reduce the incidence of this injury.

a. Identify the population that is of interest to the researchers.

b. Describe the sample.

c. What characteristics of the population are of interest to the researchers?

d. If the sample measurements are used to make inferences about the population

characteristics, why is a measure of reliability of the inferences important?

Pol. Sci.

1.5 During the 2004 senatorial campaign in a large southwestern state, illegal immigration was a major issue. One of the candidates argued that illegal immigrants made use of educational and social services without having to pay property taxes. The other candidate pointed out that the cost of new homes in their state was 20–30% less than the national average due to the low wages received by the large number of illegal immigrants working on new home construction. A random sample of 5,500 registered voters was asked the question, “Are illegal immigrants generally a benefit or a liability to the state’s economy?” The results were as follows: 3,500 people responded “liability,” 1,500 people responded “benefit,” and 500 people responded “uncertain.”

a. What is the population of interest?

b. What is the population from which the sample was selected?

c. Does the sample adequately represent the population?

d. If a second random sample of 5,000 registered voters was selected, would the results be nearly the same as the results obtained from the initial sample of 5,500 voters? Explain your answer.

Edu.

1.6 An American history professor at a major university was interested in knowing the history literacy of college freshmen. In particular, he wanted to find what proportion of college freshmen at the university knew which country controlled the original 13 colonies prior to the American Revolution. The professor sent a questionnaire to all freshman students enrolled in HIST 101 and received responses from 318 students out of the 7,500 students who were sent the questionnaire. One of the questions was “What country controlled the original 13 colonies prior to the American Revolution?”

a. What is the population of interest to the professor?

b. What is the sampled population?

c. Is there a major difference between the two populations? Explain your answer.

d. Suppose that several lectures on the American Revolution had been given in HIST 101 prior to the students receiving the questionnaire. What possible source of bias has the professor introduced into the study relative to the population of interest?

