
# Help with Homework

Stat 350A: Chapter 4.8
Part 1: The Binomial Distribution
EXAMPLE 1: Suppose a student takes a quiz that consists of four multiple choice questions. Each
question has five options. Unfortunately, the student did not review for the quiz and has to randomly
guess on each question (assume the guesses are independent of one another). Let Y = number of
questions the student guesses correctly.
(A) Calculate P(Y = 0)
(B) Calculate P(Y = 5)
(C) Calculate P(Y = 2)
• For the above example, what if the number of questions were 20?
The Binomial Distribution
• Properties of the Binomial:
◦1) There are two possible outcomes for each trial: “success” and “failure.”
◦2) There is a fixed number (n) of trials or observations.
◦3) Outcomes of the trials are independent of each other. If sampling without replacement,
then the sample size should be less than 10% of the population size.
◦4) The probability of success, π, is the same for each trial. The probability of
failure is 1 – π.
• A binomial random variable, Y, is the count of successful outcomes of a binomial experiment.
• Binomial Distribution: The distribution of the count Y of successes in a binomial experiment with
parameters n and π. The distribution of this random variable Y is described as follows:
Y ~ Bin(n, π)
◦NOTE: A Bernoulli random variable is a binomial with a single trial, i.e., Bernoulli(π) = Bin(1, π).
• Binomial Probabilities
◦If Y has a binomial distribution with n independent trials and probability π of success, and the
possible values of Y are 0, 1, 2, …, n, then the binomial probability of Y = y successes is
$$P(Y = y) = {}_{n}C_{y}\,\pi^{y}(1-\pi)^{n-y}$$
◦The number of ways of arranging y successes among n trials is the binomial coefficient:
$${}_{n}C_{y} = \frac{n!}{y!\,(n-y)!}$$
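The binomial formula can be checked with a short Python sketch. This is an illustration rather than part of the course materials: the helper name `binom_pmf` is hypothetical, it uses the standard-library `math.comb` (Python 3.8+) for nCy, and it applies the formula to Example 1 of Part 1 under the assumption π = 1/5, since each question has five options and the student guesses at random.

```python
from math import comb


def binom_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for Y ~ Bin(n, pi), computed as nCy * pi**y * (1 - pi)**(n - y)."""
    if not 0 <= y <= n:
        # Y can only take the values 0, 1, ..., n, so anything else has probability 0.
        return 0.0
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


# Part 1, Example 1: four questions, five options each, random guessing.
n, pi = 4, 1 / 5
p_a = binom_pmf(0, n, pi)  # (A) P(Y = 0) = 0.8**4
p_b = binom_pmf(5, n, pi)  # (B) P(Y = 5): impossible with only 4 questions
p_c = binom_pmf(2, n, pi)  # (C) P(Y = 2) = 6 * 0.2**2 * 0.8**2

# Follow-up: the same quiz with 20 questions instead of 4.
p_none_of_20 = binom_pmf(0, 20, pi)

print(f"P(Y=0) = {p_a:.4f}, P(Y=5) = {p_b:.4f}, P(Y=2) = {p_c:.4f}")
print(f"With 20 questions, P(Y=0) = {p_none_of_20:.4f}")
```

Running it gives P(Y = 0) = 0.4096, P(Y = 5) = 0 and P(Y = 2) = 0.1536. With 20 questions the same function gives P(Y = 0) ≈ 0.0115, a case where listing every outcome by hand would be impractical.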
Part 2: Binomial Examples
EXAMPLE 1: A San Diego interviewer has found that one in five of the people approached will agree
to take part in a survey on drug abuse. An interviewer approaches 7 people at random.
(A) Describe the distribution of Y, the number that will agree to take part in the survey.
(B) Find the probability that exactly four will agree to an interview.
(C) Find the probability that at most 2 will agree to an interview.
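A minimal Python sketch of Example 1, assuming Y ~ Bin(7, 0.2) since "one in five" gives π = 0.2. The helpers `binom_pmf` and `binom_cdf` are hypothetical names for illustration; "at most 2" is handled by adding the probabilities for 0, 1 and 2.

```python
from math import comb


def binom_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for Y ~ Bin(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


def binom_cdf(k: int, n: int, pi: float) -> float:
    """P(Y <= k), found by adding P(Y = y) for y = 0, 1, ..., k."""
    return sum(binom_pmf(y, n, pi) for y in range(k + 1))


# (A) Y ~ Bin(7, 0.2): 7 people approached, one in five agrees.
n, pi = 7, 0.2
p_exactly_four = binom_pmf(4, n, pi)  # (B) P(Y = 4)
p_at_most_two = binom_cdf(2, n, pi)   # (C) P(Y <= 2) = P(0) + P(1) + P(2)

print(round(p_exactly_four, 4), round(p_at_most_two, 4))
```

This gives P(Y = 4) ≈ 0.0287 and P(Y ≤ 2) ≈ 0.8520.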
EXAMPLE 2: A marketing study found that twenty-five percent of households own a gaming
console. Suppose we select 5 households at random.
(A) Describe the distribution of Y, the number of households that own a gaming console.
(B) What is the probability that exactly 3 of the 5 households own a gaming console?
(C) What is the probability that at least 2 of the 5 households selected will own a gaming console?
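For "at least" questions, the complement rule usually means fewer terms to add. The sketch below assumes Y ~ Bin(5, 0.25) and uses a hypothetical `binom_pmf` helper; it computes P(Y ≥ 2) both as 1 − [P(Y = 0) + P(Y = 1)] and by direct summation, so the two approaches can be compared.

```python
from math import comb


def binom_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for Y ~ Bin(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


# (A) Y ~ Bin(5, 0.25): 5 households, 25% own a gaming console.
n, pi = 5, 0.25
p_exactly_three = binom_pmf(3, n, pi)  # (B) P(Y = 3)

# (C) "At least 2" via the complement rule: P(Y >= 2) = 1 - [P(Y = 0) + P(Y = 1)].
p_at_least_two = 1 - (binom_pmf(0, n, pi) + binom_pmf(1, n, pi))

# Same answer by adding P(Y = 2) through P(Y = 5) directly.
p_at_least_two_direct = sum(binom_pmf(y, n, pi) for y in range(2, n + 1))

print(round(p_exactly_three, 4), round(p_at_least_two, 4))
```

This gives P(Y = 3) ≈ 0.0879 and P(Y ≥ 2) ≈ 0.3672.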
Part 3: Mean and Variance of the Binomial Distribution
• For a random variable Y such that Y ~ Bin(n, π):
‣ Mean: $\mu_Y = n\pi$
‣ Variance: $\sigma_Y^2 = n\pi(1-\pi)$
‣ Standard deviation: $\sigma_Y = \sqrt{n\pi(1-\pi)}$
EXAMPLE 1: Suppose that 18% of movie-goers say that horror is their favorite genre. If a random
sample of 70 movie-goers is selected, what are the (a) mean, (b) variance, and (c) standard deviation
for the number of movie-goers that prefer horror?
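A sketch of Example 1, assuming Y ~ Bin(70, 0.18). It applies the shortcut formulas for the mean, variance and standard deviation, then cross-checks them against the general definitions E(Y) = Σ y·P(Y = y) and Var(Y) = Σ (y − μ)²·P(Y = y) summed over y = 0, 1, …, 70.

```python
import math

# Part 3, Example 1: Y ~ Bin(70, 0.18).
n, pi = 70, 0.18

mean = n * pi                  # (a) mu_Y = n * pi
variance = n * pi * (1 - pi)   # (b) sigma_Y^2 = n * pi * (1 - pi)
sd = math.sqrt(variance)       # (c) sigma_Y = sqrt(n * pi * (1 - pi))

# Cross-check the shortcut formulas against the definitions of mean and variance.
pmf = [math.comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)]
mean_direct = sum(y * p for y, p in enumerate(pmf))
variance_direct = sum((y - mean_direct) ** 2 * p for y, p in enumerate(pmf))

print(f"mean = {mean:.2f}, variance = {variance:.3f}, sd = {sd:.4f}")
```

This prints mean = 12.60, variance = 10.332 and sd = 3.2143, and the direct sums agree with the formulas up to rounding error.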
EXAMPLE 2: Compute the mean and standard deviation of the Bernoulli distribution.
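One way to work Example 2 is to treat a Bernoulli variable as a single trial, Y ~ Bin(1, π), so P(Y = 1) = π and P(Y = 0) = 1 – π. The derivation below is a sketch from the definitions of expected value and variance. It agrees with the binomial formulas nπ and nπ(1 − π) at n = 1:

$$
\begin{aligned}
\mu_Y &= E(Y) = 0\cdot(1-\pi) + 1\cdot\pi = \pi \\
E(Y^2) &= 0^2\cdot(1-\pi) + 1^2\cdot\pi = \pi \\
\sigma_Y^2 &= E(Y^2) - \mu_Y^2 = \pi - \pi^2 = \pi(1-\pi) \\
\sigma_Y &= \sqrt{\pi(1-\pi)}
\end{aligned}
$$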
Stat 350A: Chapters 4.9-4.10
Part 1: Continuous Distributions
• A random variable is said to be continuous if the outcomes within any given interval of this
variable are not countable.
• Examples: Weight, height, income, GPA, speed of a car, time slept per night, etc.
• For continuous variables, we cannot construct a probability distribution as we did for discrete random variables.
• Instead, we use the area under a density curve to calculate the probability of a random variable
taking values in a specific interval. The probability density function for a continuous random
variable Y is a curve such that
◦1) The area under the curve for a particular interval, (a, b), is the probability that Y is in that
interval, i.e., P(a < Y < b).
◦2) P(Y = a) = 0 for any a.
◦3) The total area under the curve is one.
Part 2: The Standard Normal Distribution
• The standard normal distribution: A normal distribution whose mean is 0 and standard deviation is 1, denoted as Z ~ N(0, 1), i.e., Z-scores.
• Bell-shaped and symmetric about 0. Thus, P(Z < -a) = P(Z > a).
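To see the density-curve properties and the symmetry of the standard normal concretely, the Python sketch below approximates areas under the standard normal density with the trapezoid rule. The helper names `std_normal_pdf` and `area` are hypothetical, and the sketch assumes the tails beyond ±10 are negligible, so areas out to ±∞ are approximated by stopping at ±10.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Height of the standard normal density curve N(0, 1) at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under `pdf` between a and b with the trapezoid rule."""
    h = (b - a) / steps  # width of each thin slice; 0 when a == b
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


total = area(std_normal_pdf, -10, 10)    # property 3: total area is (about) one
point = area(std_normal_pdf, 0.5, 0.5)   # property 2: P(Z = 0.5) is a zero-width area
left = area(std_normal_pdf, -10, -1)     # approximates P(Z < -1)
right = area(std_normal_pdf, 1, 10)      # approximates P(Z > 1)

print(f"total = {total:.6f}, point = {point}, left = {left:.6f}, right = {right:.6f}")
```

The total area comes out as 1 to six decimals, the zero-width interval gives exactly 0, and the two tail areas agree (about 0.1587 each), illustrating P(Z < -a) = P(Z > a) with a = 1.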
• Areas under the standard normal distribution can be found in the Z Table (found at the end of the
package).
• Z-scores range over -∞ < Z < ∞.
Part 3: Finding Areas and Percentiles Under the Standard Normal Curve
• Finding areas given a Z-score:
EXAMPLE: Use the Z Table to find the following. Draw the picture first, shade the region you want, and look up the Z-score in the Z Table corresponding to that region.
(A) P(Z < -0.89)
(B) P(Z > 1.53)
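As a check on Z Table lookups, Python's standard-library `statistics.NormalDist` (Python 3.8+) provides the standard normal CDF, i.e. the area to the left of a Z-score. This sketch is a cross-check, not a replacement for drawing and shading the region. Results may differ from a four-decimal table in the last digit because of rounding.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution, Z ~ N(0, 1)

p_a = Z.cdf(-0.89)      # (A) P(Z < -0.89): area to the left of -0.89
p_b = 1 - Z.cdf(1.53)   # (B) P(Z > 1.53): the CDF gives left areas, so subtract from 1

# Symmetry check: P(Z < -0.89) equals P(Z > 0.89).
p_a_by_symmetry = 1 - Z.cdf(0.89)

print(f"(A) {p_a:.4f}  (B) {p_b:.4f}")
```

This prints (A) 0.1867 and (B) 0.0630, matching the values read from a standard Z Table.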
Research Study: Construction Costs for Nuclear Power Plants
Summary and Key Formulas
Exercises

CHAPTER 14 Analysis of Variance for Completely Randomized Designs
Introduction and Abstract of Research Study
Completely Randomized Design with a Single Factor
Factorial Treatment Structure
Factorial Treatment Structures with an Unequal Number of Replications
Estimation of Treatment Differences and Comparisons of Treatment Means
Determining the Number of Replications
Research Study: Development of a Low-Fat Processed Meat
Summary and Key Formulas
Exercises

CHAPTER 15 Analysis of Variance for Blocked Designs
Factorial Treatment Structure in a Randomized Complete Block Design
A Nonparametric Alternative—Friedman’s Test
Research Study: Control of Leatherjackets
Summary and Key Formulas
Exercises

CHAPTER 16 The Analysis of Covariance
16.1 Introduction and Abstract of Research Study
16.2 A Completely Randomized Design with One Covariate
16.3 The Extrapolation Problem
16.4 Multiple Covariates and More Complicated Designs
16.5 Research Study: Evaluation of Cool-Season Grasses for Putting Greens
Summary
Exercises

CHAPTER 17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
17.1 Introduction and Abstract of Research Study
17.2 A One-Factor Experiment with Random Treatment Effects
17.3 Extensions of Random-Effects Models
17.4 Mixed-Effects Models
17.5 Rules for Obtaining Expected Mean Squares
Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
17.6 Nested Factors
17.7 Research Study: Factors Affecting Pressure Drops Across Expansion Joints
17.8 Summary
17.9 Exercises

CHAPTER 18 Split-Plot, Repeated Measures, and Crossover Designs
18.1 Introduction and Abstract of Research Study
18.2 Split-Plot Designed Experiments
18.3 Single-Factor Experiments with Repeated Measures
18.4 Two-Factor Experiments with Repeated Measures on One of the Factors
18.5 Crossover Designs
18.6 Research Study: Effects of an Oil Spill on Plant Growth
18.7 Summary
18.8 Exercises

CHAPTER 19 Analysis of Variance for Some Unbalanced Designs
19.1 Introduction and Abstract of Research Study
19.2 A Randomized Block Design with One or More Missing Observations
19.3 A Latin Square Design with Missing Data
19.4 Balanced Incomplete Block (BIB) Designs
19.5 Research Study: Evaluation of the Consistency of Property Assessors
19.6 Summary and Key Formulas
19.7 Exercises

Appendix: Statistical Tables
References
Index
PREFACE
Intended Audience
An Introduction to Statistical Methods and Data Analysis, Seventh Edition, provides an overview of statistical methods for
students from a variety of disciplines. This book is intended to prepare students to
solve problems encountered in research projects, to make decisions based on data
in general settings both within and beyond the university setting, and finally to
become critical readers of statistical analyses in research papers and in news reports.
The book presumes that the students have a minimal mathematical background
(high school algebra) and no prior course work in statistics. The first 11 chapters
of the textbook present the material typically covered in an introductory statistics
course. However, this book provides research studies and examples that connect
the statistical concepts to data analysis problems that are often encountered in
undergraduate capstone courses. The remaining chapters of the book cover regression modeling and design of experiments. We develop and illustrate the statistical
techniques and thought processes needed to design a research study or experiment
and then analyze the data collected using an intuitive and proven four-step approach.
This should be especially helpful to graduate students conducting their MS thesis
and PhD dissertation research.
Major Features of Textbook
Learning from Data
In this text, we approach the study of statistics by considering a four-step process
by which we can learn from data:
1. Defining the Problem
2. Collecting the Data
3. Summarizing the Data
4. Analyzing the Data, Interpreting the Analyses, and Communicating
the Results
Case Studies
In order to demonstrate the relevance and critical nature of statistics in solving real-world problems, we introduce the major topic of each chapter using a case study.
The case studies were selected from many sources to illustrate the broad applicability of statistical methodology. The four-step learning from data process is illustrated through the case studies. This approach will hopefully assist in overcoming
the natural initial perception held by many people that statistics is just another
“math course.” The introduction of major topics through the use of case studies
provides a focus on the central nature of applied statistics in a wide variety of
research and business-related studies. These case studies will hopefully provide the
reader with an enthusiasm for the broad applicability of statistics and the statistical
thought process that the authors have found and used through their many years
of teaching, consulting, and R & D management. The following research studies
illustrate the types of studies we have used throughout the text.
• Exit Polls Versus Election Results: A study of why the exit polls
from 9 of 11 states in the 2004 presidential election predicted John
Kerry as the winner when in fact President Bush won 6 of the 11
states.
• Evaluation of the Consistency of Property Assessors: A study to
determine if county property assessors differ systematically in their
determination of property values.
• Effect of Timing of the Treatment of Port-Wine Stains with Lasers:
A prospective study that investigated whether treatment at a younger
age would yield better results than treatment at an older age.
• Controlling for Student Background in the Assessment of Teaching:
An examination of data used to support possible improvements to
the No Child Left Behind program while maintaining the important
concepts of performance standards and accountability.
Each of the research studies includes a discussion of the whys and hows of the
study. We illustrate the use of the four-step learning from data process with each
case study. A discussion of sample size determination, graphical displays of the
data, and a summary of the necessary ingredients for a complete report of the statistical findings of the study are provided with many of the case studies.
Examples and Exercises
We have further enhanced the practical nature of statistics by using examples and
exercises from journal articles, newspapers, and the authors’ many consulting
experiences. These will provide the students with further evidence of the practical
usages of statistics in solving problems that are relevant to their everyday lives.
Many new exercises and examples have been included in this edition of the book.
The number and variety of exercises will be a great asset to both the instructor and
students in their study of statistics.
Topics Covered
This book can be used for either a one-semester or a two-semester course. Chapters
1 through 11 would constitute a one-semester course. The topics covered would
include
Chapter 1—Statistics and the scientific method
Chapter 2—Using surveys and experimental studies to gather data
Chapters 3 & 4—Summarizing data and probability distributions
Chapters 5–7—Analyzing data: inferences about central values and
variances
Chapters 8 & 9—One-way analysis of variance and multiple
comparisons
Chapter 10—Analyzing data involving proportions
Chapter 11—Linear regression and correlation
The second semester of a two-semester course would then include model building
and inferences in multiple regression analysis, logistic regression, design of experiments, and analysis of variance:
Chapters 11–13—Regression methods and model building: multiple regression and the general linear model, logistic regression, and building
regression models with diagnostics
Chapters 14–19—Design of experiments and analysis of variance: design
concepts, analysis of variance for standard designs, analysis of covariance, random and mixed effects models, split-plot designs, repeated
measures designs, crossover designs, and unbalanced designs
Emphasis on Interpretation, not Computation
In the book are examples and exercises that allow the student to study how to
calculate the value of statistical estimators and test statistics using the definitional
form of the procedure. After the student becomes comfortable with the aspects of
the data the statistical procedure is reflecting, we then emphasize the use of computer software in making computations in the analysis of larger data sets. We provide
output from three major statistical packages: SAS, Minitab, and SPSS. We find that
this approach provides the student with the experience of computing the value of the
procedure using the definition; hence, the student learns the basics behind each procedure. In most situations beyond the statistics course, the student should be using
computer software in making the computations for both expedience and quality of
calculation. In many exercises and examples, the use of the computer allows for more
time to emphasize the interpretation of the results of the computations without having to expend enormous amounts of time and effort in the actual computations.
In numerous examples and exercises, the importance of the following aspects
of hypothesis testing is demonstrated:
1. The statement of the research hypothesis through the summarization
of the researcher’s goals into a statement about population
parameters.
2. The selection of the most appropriate test statistic, including sample
size computations for many procedures.
3. The necessity of considering both Type I and Type II error
rates (α and β) when discussing the results of a statistical test of
hypotheses.
4. The importance of considering both the statistical significance and
the practical significance of a test result. Thus, we illustrate the
importance of estimating effect sizes and the construction of confidence intervals for population parameters.
5. The statement of the results of the statistical test in nonstatistical
jargon that goes beyond the statement “reject H₀” or “fail to
reject H₀.”
New to the Seventh Edition
• There are instructions on the use of R code. R is a free software package
Click your choice of platform (Linux, MacOS X, or Windows) for the
precompiled binary distribution. Note the FAQs link to the left for
system software (which is all you will need).
• New examples illustrate the breadth of applications of statistics to
real-world problems.
• An alternative to the standard deviation, MAD, is provided as a
measure of dispersion in a population/sample.
• The use of bootstrapping in obtaining confidence intervals and
p-values is discussed.
• Instructions are included on how to use R code to obtain percentiles
and probabilities from the following distributions: normal, binomial,
Poisson, chi-squared, F, and t.
• Spearman’s rank correlation, a nonparametric alternative to the
Pearson correlation coefficient, is provided.
• The binomial test for small-sample tests of proportions is presented.
• The McNemar test for paired count data has been added.
• The Akaike information criterion and Bayesian information criterion
for variable selection are discussed.
Additional Features Retained from Previous Editions
• Many practical applications of statistical methods and data analysis
from agriculture, business, economics, education, engineering, medicine, law, political science, psychology, environmental studies, and
sociology have been included.
• The seventh edition contains over 1,000 exercises, with nearly 400 of
the exercises new.
• Computer output from Minitab, SAS, and SPSS is provided in
numerous examples. The use of computers greatly facilitates the use
of more sophisticated graphical illustrations of statistical results.
• Attention is paid to the underlying assumptions. Graphical
procedures and test procedures are provided to determine if assumptions have been violated. Furthermore, in many settings, we provide
alternative procedures when the conditions are not met.
• The first chapter provides a discussion of “What Is Statistics?” We
provide a discussion of why students should study statistics along with
a discussion of several major studies that illustrate the use of statistics
in the solution of real-life problems.
Ancillaries
• Student Solutions Manual (ISBN-10: 1-305-26948-9;
ISBN-13: 978-1-305-26948-4), containing select worked solutions
for problems in the textbook.
• A Companion Website at www.cengage.com/statistics/ott, containing
plus additional resources for students and faculty.
Acknowledgments
There are many people who have made valuable, constructive suggestions for
the development of the original manuscript and during the preparation of the
subsequent editions. We are very appreciative of the insightful and constructive
Naveen Bansal, Marquette University
Kameryn Denaro, San Diego State University
Mary Gray, American University
Craig Leth-Steensen, Carleton University
Jing Qian, University of Massachusetts
Mark Riggs, Abilene Christian University
Elaine Spiller, Marquette University
We also appreciate the preparation assistance received from Molly Taylor
and Jay Campbell, and the scheduling of the revisions by Mary Tindle, the Senior
Project Manager at Cenveo Publisher Services, who made sure that the book
was completed in a timely manner. The authors of the solutions manual, Soma
Roy, California Polytechnic State University, and John Draper, The Ohio State
University, provided me with excellent input, which resulted in an improved set of
exercises for the seventh edition. The person who assisted me to the greatest degree
in the preparation of the seventh edition was Sherry Goldbecker, the copy editor.
Sherry not only corrected my many grammatical errors but also rephrased many sentences, which made for a more straightforward explanation of statistical concepts. Students who use this book in their statistics classes will be
most appreciative of Sherry’s many contributions.
PART 1 Introduction
Chapter 1 Statistics and the Scientific Method
CHAPTER 1 Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises

1.1 Introduction
Statistics is the science of designing studies or experiments, collecting data, and
modeling/analyzing data for the purpose of decision making and scientific discovery when the available information is both limited and variable. That is, statistics is
the science of Learning from Data.
Almost everyone, including social scientists, medical researchers, superintendents of public schools, corporate executives, market researchers, engineers,
government employees, and consumers, deals with data. These data could be in the
form of quarterly sales figures, percent increase in juvenile crime, contamination
levels in water samples, survival rates for patients undergoing medical therapy,
census figures, or information that helps determine which brand of car to purchase.
In this text, we approach the study of statistics by considering the four-step process
in Learning from Data: (1) defining the problem, (2) collecting the data, (3) summarizing the data, and (4) analyzing the data, interpreting the analyses, and communicating the results. Through the use of these four steps in Learning from Data,
our study of statistics closely parallels the Scientific Method, which is a set of principles and procedures used by successful scientists in their pursuit of knowledge.
The method involves the formulation of research goals, the design of observational
studies and/or experiments, the collection of data, the modeling/analysis of the
data in the context of research goals, and the testing of hypotheses. The conclusion
of these steps is often the formulation of new research goals for another study.
These steps are illustrated in the schematic given in Figure 1.1.
This book is divided into sections corresponding to the four-step process in
Learning from Data. The relationship among these steps and the chapters of the
book is shown in Table 1.1. As you can see from this table, much time is spent discussing how to analyze data using the basic methods presented in Chapters 5–19.
FIGURE 1.1 Scientific Method Schematic (boxes: Formulate research goal: research hypotheses, models; Design study: sample size, variables, experimental units, sampling mechanism; Collect data: data management; Draw inferences: graphs, estimation, hypotheses testing, model assessment; Make decisions: written conclusions, oral presentations; Formulate new research goals: new models, new hypotheses)

TABLE 1.1 Organization of the text
1 Defining the Problem
    1 Statistics and the Scientific Method
2 Collecting the Data
    2 Using Surveys and Experimental Studies to Gather Data
3 Summarizing the Data
    3 Data Description
    4 Probability and Probability Distributions
4 Analyzing the Data, Interpreting the Analyses, and Communicating the Results
    5 Inferences about Population Central Values
    6 Inferences Comparing Two Population Central Values
    8 Inferences about More Than Two Population Central Values
    9 Multiple Comparisons
    10 Categorical Data
    11 Linear Regression and Correlation
    12 Multiple Regression and the General Linear Model
    13 Further Regression Topics
    14 Analysis of Variance for Completely Randomized Designs
    15 Analysis of Variance for Blocked Designs
    16 The Analysis of Covariance
    17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
    18 Split-Plot, Repeated Measures, and Crossover Designs
    19 Analysis of Variance for Some Unbalanced Designs
However, you must remember that for each data set requiring analysis, someone
has defined the problem to be examined (Step 1), developed a plan for collecting
data to address the problem (Step 2), and summarized the data and prepared the
data for analysis (Step 3). Then following the analysis of the data, the results of the
analysis must be interpreted and communicated either verbally or in written form
to the intended audience (Step 4).
All four steps are important in Learning from Data; in fact, unless the problem to be addressed is clearly defined and the data collection carried out properly,
the interpretation of the results of the analyses may convey misleading information because the analyses were based on a data set that did not address the problem
or that was incomplete and contained improper information. Throughout the text,
we will try to keep you focused on the bigger picture of Learning from Data
through the four-step process. Most chapters will end with a summary section
that emphasizes how the material of the chapter fits into the study of statistics—
Learning from Data.
To illustrate some of the above concepts, we will consider four situations
in which the four steps in Learning from Data could assist in solving a real-world
problem.
1. Problem: Inspection of ground beef in a large beef-processing facility.
A beef-processing plant produces approximately half a million packages of ground beef per week. The government inspects packages
for possible improper labeling of the packages with respect to the
percent fat in the meat. The inspectors must open the ground beef
package in order to determine the fat content of the ground beef.
The inspection of every package would be prohibitively costly and
time consuming. An alternative approach is to select 250 packages
for inspection from the daily production of 100,000 packages. The
fraction of packages with improper labeling in the sample of 250
packages would then be used to estimate the fraction of packages
improperly labeled in the complete day’s production. If this fraction
exceeds a set specification, action is then taken against the meat
processor. In later chapters, a procedure will be formulated to determine how well the sample fraction of improperly labeled packages
approximates the fraction of improperly labeled packages for the
whole day’s output.
2. Problem: Is there a relationship between quitting smoking and
gaining weight? To investigate the claim that people who quit
smoking often experience a subsequent weight gain, researchers
selected a random sample of 400 participants who had successfully
participated in programs to quit smoking. The individuals were
weighed at the beginning of the program and again 1 year later.
The average change in weight of the participants was an increase of
5 pounds. The investigators concluded that there was evidence that
the claim was valid. We will develop techniques in later chapters to
assess when changes are truly significant changes and not changes
due to random chance.
3. Problem: What effect does nitrogen fertilizer have on wheat production?
For a study of the effects of nitrogen fertilizer on wheat production,
a total of 15 fields was available to the researcher. She randomly
assigned three fields to each of the five nitrogen rates under investigation. The same variety of wheat was planted in all 15 fields. The
fields were cultivated in the same manner until harvest, and the
number of pounds of wheat per acre was then recorded for each of
the 15 fields. The experimenter wanted to determine the optimal
level of nitrogen to apply to any wheat field, but, of course, she could
run experiments on only a limited number of fields. After
determining the amount of nitrogen that yielded the largest production of wheat in the study fields, the experimenter then concluded
that similar results would hold for wheat fields possessing characteristics somewhat the same as the study fields. Is the experimenter
justified in reaching this conclusion?
Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
1.1
Introduction
4. Problem: Determining public opinion toward a question, issue,
product, or candidate. Similar applications of statistics are brought
to mind by the frequent use of the New York Times/CBS News,
Washington Post/ABC News, Wall Street Journal/NBC News, Harris,
Gallup/Newsweek, and CNN/Time polls. How can these pollsters
determine the opinions of more than 195 million Americans who are
of voting age? They certainly do not contact every potential voter in
the United States. Rather, they sample the opinions of a small number of potential voters, perhaps as few as 1,500, to estimate the reaction of every person of voting age in the country. The amazing result
of this process is that if the selection of the voters is done in an unbiased way and voters are asked unambiguous, nonleading questions,
the fraction of those persons contacted who hold a particular opinion
will closely match the fraction in the total population holding that
opinion at a particular time. We will supply convincing supportive
evidence of this assertion in subsequent chapters.
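The claim in item 4 can be made concrete with the usual large-sample margin of error for a sample proportion, 1.96 × sqrt(p(1 − p)/n). The sketch below is an illustration under stated assumptions, not the pollsters' actual procedure: it assumes a simple random sample that is a tiny fraction of the population and uses the conservative value p = 0.5. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample that is small relative to the
    population; p = 0.5 gives the largest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1500, 6000):
    print(n, round(margin_of_error(n), 4))
```

For n = 1,500 the margin comes out at roughly ±2.5 percentage points. The population size (more than 195 million) never enters the formula, which is why a national poll does not need a proportionally huge sample; note also that quadrupling n only halves the margin.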
These problems illustrate the four-step process in Learning from Data.
First, there was a problem or question to be addressed. Next, for each problem a study or experiment was proposed to collect meaningful data to solve the
problem. The government meat inspection agency had to decide both how many
packages to inspect per day and how to select the sample of packages from the
total daily output in order to obtain a valid prediction. The polling groups had to
decide how many voters to sample and how to select these individuals in order
to obtain information that is representative of the population of all voters. Similarly, it was necessary to carefully plan how many participants in the weight-gain
study were needed and how they were to be selected from the list of all such
participants. Furthermore, what variables did the researchers have to measure
on each participant? Was it necessary to know each participant’s age, sex, physical fitness, and other health-related variables, or was weight the only important
variable? The results of the study may not be relevant to the general population
if many of the participants in the study had a particular health condition. In the
wheat experiment, it was important to measure both the soil characteristics of
the fields and the environmental conditions, such as temperature and rainfall, to
obtain results that could be generalized to fields not included in the study. The
design of a study or experiment is crucial to obtaining results that can be generalized beyond the study.
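The random assignment in the wheat experiment (three fields to each of five nitrogen rates) is a completely randomized design, and carrying it out takes only a shuffle. In this minimal sketch the field numbers 1 to 15 and the rate labels N1 to N5 are hypothetical placeholders, since the text does not list the actual nitrogen rates; the function name assign_fields is likewise illustrative.

```python
import random
from typing import Dict, List, Optional, Sequence

def assign_fields(rates: Sequence[str], per_rate: int,
                  seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Randomly assign fields 1..len(rates)*per_rate, per_rate fields per rate."""
    fields = list(range(1, len(rates) * per_rate + 1))  # hypothetical field IDs
    random.Random(seed).shuffle(fields)  # random order of the fields
    # Consecutive blocks of the shuffled list become the treatment groups.
    return {rate: sorted(fields[i * per_rate:(i + 1) * per_rate])
            for i, rate in enumerate(rates)}

plan = assign_fields(["N1", "N2", "N3", "N4", "N5"], per_rate=3, seed=2016)
for rate, fields in plan.items():
    print(rate, fields)
```

Random assignment protects the comparison among the rates within the study. It does not by itself justify extending the best rate to fields whose soils and weather differ from those of the study fields, which is why measuring those conditions matters.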
Finally, having collected, summarized, and analyzed the data, it is important
to report the results in unambiguous terms to interested people. For the meat
inspection example, the government inspection agency and the personnel in the
beef-processing plant would need to know the distribution of fat content in the
daily production of ground beef. Based on this distribution, the agency could then
impose fines or take other remedial actions against the production facility. Also,
knowledge of this distribution would enable company production personnel to
make adjustments to the process in order to obtain acceptable fat content in their
ground beef packages. Therefore, the results of the statistical analyses cannot
be presented in ambiguous terms; decisions must be made from a well-defined
knowledge base. The results of the weight-gain study would be of vital interest to
physicians who have patients participating in the smoking-cessation program. If
a significant increase in weight was recorded for those individuals who had quit
smoking, physicians would have to recommend diets so that the former smokers
Chapter 1
Statistics and the Scientific Method
FIGURE 1.2
Population and sample
(Diagram: the population, the set of all measurements, contains the sample, the set of measurements selected from the population.)
would not go from one health problem (smoking) to another (elevated blood
pressure due to being overweight). It is crucial that a careful description of the
participants—that is, age, sex, and other health-related information—be included
in the report. In the wheat study, the experiment would provide farmers with
information that would allow them to economically select the optimum amount of
nitrogen required for their fields. Therefore, the report must contain information
concerning the amount of moisture and types of soils present on the study fields.
Otherwise, the conclusions about optimal wheat production may not pertain to
farmers growing wheat under considerably different conditions.
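Reporting in unambiguous terms often means giving an interval rather than a bare average. For the weight-gain study, the 400 participants and the 5-pound mean increase come from the text; the standard deviation of the individual weight changes is not reported, so the 10 pounds used below is purely a hypothetical value. The sketch uses the large-sample interval mean ± 1.96 × sd/√n, a technique developed formally in later chapters; the function name is illustrative.

```python
import math

def mean_change_ci(mean: float, sd: float, n: int, z: float = 1.96):
    """Large-sample 95% confidence interval for a mean change."""
    se = sd / math.sqrt(n)  # standard error of the sample mean
    return mean - z * se, mean + z * se

# n = 400 and the 5-pound mean gain come from the study described earlier;
# the 10-pound standard deviation is a HYPOTHETICAL value.
low, high = mean_change_ci(mean=5.0, sd=10.0, n=400)
print(f"95% CI for mean weight change: {low:.2f} to {high:.2f} pounds")
```

Under these assumptions the interval runs from about 4.0 to 6.0 pounds, a statement a physician can act on; a larger standard deviation would widen it proportionally.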
To infer validly that the results of a study are applicable to a larger group
than just the participants in the study, we must carefully define the population
(see Definition 1.1) to which inferences are sought and design a study in which the
sample (see Definition 1.2) has been appropriately selected from the designated
population. We will discuss these issues in Chapter 2.
DEFINITION 1.1
A population is the set of all measurements of interest to the sample collector.
(See Figure 1.2.)
DEFINITION 1.2
A sample is any subset of measurements selected from the population.
(See Figure 1.2.)
1.2
Why Study Statistics?
We can think of many reasons for taking an introductory course in statistics. One
reason is that you need to know how to evaluate published numerical facts. Every
person is exposed to manufacturers’ claims for products; to the results of sociological, consumer, and political polls; and to the published results of scientific
research. Many of these results are inferences based on sampling. Some inferences are valid; others are invalid. Some are based on samples of adequate size;
others are not. Yet all these published results bear the ring of truth. Some people (particularly statisticians) say that statistics can be made to support almost
anything. Others say it is easy to lie with statistics. Both statements are true. It
is easy, purposely or unwittingly, to distort the truth by using statistics when
presenting the results of sampling to the uninformed. It is thus crucial that you
become an informed and critical reader of data-based reports and articles.
A second reason for studying statistics is that your profession or employment
may require you to interpret the results of sampling (surveys or experimentation)
or to employ statistical methods of analysis to make inferences in your work. For example, you may need to interpret the
results of experiments that compare a new drug with an older one. Do such data
really imply that the new drug is more effective, or is the observed difference in
results due simply to random variation in the experimental measurements?
Recent trends in the conduct of court trials indicate an increasing use of
probability and statistical inference in evaluating the quality of evidence. The use
of statistics in the social, biological, and physical sciences is essential because all
these sciences make use of observations of natural phenomena, through sample
surveys or experimentation, to develop and test new theories. Statistical methods
are employed in business when sample data are used to forecast sales and profit.
In addition, they are used in engineering and manufacturing to monitor product
quality. The sampling of accounts is a useful tool to assist accountants in conducting audits. Thus, statistics plays an important role in almost all areas of science,
business, and industry; persons employed in these areas need to know the basic
concepts, strengths, and limitations of statistics.
The article “What Educated Citizens Should Know About Statistics and Probability,” by J. Utts (2003), contains a number of statistical ideas that need to be
understood by users of statistical methodology in order to avoid confusion in the
use of their research findings. Misunderstandings of statistical results can lead to
major errors by government policymakers, medical workers, and consumers of this
information. The article selected a number of topics for discussion. We will summarize some of the findings in the article. A complete discussion of all these topics
will be given throughout the book.
1. One of the most frequent misinterpretations of statistical findings
is when a statistically significant relationship is established between
two variables and it is then concluded that a change in the explanatory variable causes a change in the response variable. As will be
discussed in the book, this conclusion can be reached only under
very restrictive constraints on the experimental setting. Utts examined a recent Newsweek article discussing the relationship between
the strength of religious beliefs and physical healing. Utts’ article
discussed the problems in reaching the conclusion that the stronger
a patient’s religious beliefs, the more likely the patient would be
cured of his or her ailment. Utts showed that there are numerous
other factors involved in a patient’s health and the conclusion that
religious beliefs cause a cure cannot be validly reached.
2. A common confusion in many studies is the difference between
(statistically) significant findings in a study and (practically) significant findings. This problem often occurs when large data sets are
involved in a study or experiment. This type of problem will be discussed in detail throughout the book. We will use a number of examples that will illustrate how this type of confusion can be avoided by
researchers when reporting the findings of their experimental results.
Utts’ article illustrated this problem with a discussion of a study that
found a statistically significant difference in the average heights of
military recruits born in the spring and in the fall. There were 507,125
recruits in the study and the difference in average height was about
1/4 inch. So, even though there may be a difference in the actual average heights of recruits in the spring and the fall, the difference is so
small (1/4 inch) that it is of no practical importance.
3. The size of the sample also may be a determining factor in studies
in which statistical significance is not found. A study may not have
selected a sample size large enough to discover a difference between
the several populations under study. In many government-sponsored
studies, the researchers do not receive funding unless they are able
to demonstrate that the sample sizes selected for their study are of
an appropriate size to detect specified differences in populations if
in fact they exist. Methods to determine appropriate sample sizes
will be provided in the chapters on hypothesis testing and experimental design.
4. Surveys are ubiquitous, especially during the years in which national
elections are held. In fact, market surveys are nearly as widespread
as political polls. There are many sources of bias that can creep
into even the most reliable surveys. The manner in which people are
selected for inclusion in the survey, the way in which questions are
phrased, and even the manner in which questions are posed to the
subject may affect the conclusions obtained from the survey. We will
discuss these issues in Chapter 2.
5. Many students find the topic of probability very confusing. One
common confusion involves conditional probability, in which the probability of an event occurring is computed under the condition that a
second event has occurred with certainty. For example, a new diagnostic test for the pathogen Escherichia coli in meat is proposed to
the U.S. Department of Agriculture (USDA). The USDA evaluates
the test and determines that the test has both a low false positive rate
and a low false negative rate. That is, it is very unlikely that the test
will declare the meat contains E. coli when in fact it does not contain
E. coli. Also, it is very unlikely that the test will declare the meat does
not contain E. coli when in fact it does contain E. coli. Although the
diagnostic test has a very low false positive rate and a very low false
negative rate, the probability that E. coli is in fact present in the meat
when the test yields a positive test result is very low for those situations in which a particular strain of E. coli occurs very infrequently.
In Chapter 4, we will demonstrate how this probability can be computed in order to provide a true assessment of the performance of a
diagnostic test.
6. Another concept that is often misunderstood is the role of the degree
of variability in interpreting what is a “normal” occurrence of some
naturally occurring event. Utts’ article provided the following example. A company was having an odor problem with its wastewater
treatment plant. It attributed the problem to “abnormal” rainfall during the period in which the odor problem was occurring. A company
official stated that the facility experienced 170% to 180% of its
“normal” rainfall during this period, which resulted in the water in
the holding ponds taking longer to exit for irrigation. Thus, there was
more time for the pond to develop an odor. The company official did
not point out that yearly rainfall in this region is extremely variable.
In fact, the historical range for rainfall is between 6.1 and 37.4 inches
with a median rainfall of 16.7 inches. The rainfall for the year of the
odor problem was 29.7 inches, which was well within the “normal”
range for rainfall. There was confusion between the terms “average” and “normal” rainfall. The concept of natural variability is crucial to the correct interpretation of statistical results. In this example, the
company official should have evaluated the percentile for an annual
rainfall of 29.7 inches in order to determine whether such a rainfall
was in fact abnormal. We will discuss the ideas of data summaries and percentiles in Chapter 3.
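The calculation that item 5 defers to Chapter 4 is an application of Bayes' rule: P(present | positive) = P(positive | present) P(present) / P(positive). The rates in this sketch are hypothetical, chosen only to match the description of a test with low false positive and false negative rates applied to a rare strain; they are not taken from any USDA evaluation, and the function name is illustrative.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """P(E. coli present | test positive) via Bayes' rule."""
    true_pos = sensitivity * prevalence               # P(positive and present)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(positive and absent)
    return true_pos / (true_pos + false_pos)

# HYPOTHETICAL rates: 1% false negative (sensitivity 0.99), 1% false
# positive (specificity 0.99), strain present in 1 of every 1,000 samples.
print(round(positive_predictive_value(0.001, 0.99, 0.99), 3))
```

Under these assumptions only about 9% of positive results correspond to meat that actually contains the strain: among 1,000 samples, the single contaminated one yields about 0.99 true positives, while the 999 clean ones yield about 10 false positives.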
The types of problems expressed above and in Utts’ article represent common
and important misunderstandings that can occur when researchers use statistics in
interpreting the results of their studies. We will attempt throughout the book to discuss possible misinterpretations of statistical results and how to avoid them in your
data analyses. More importantly, we want the reader of this book to become a discriminating reader of statistical findings, the results of surveys, and project reports.
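Items 2 and 3 in the list above both come down to how the standard error of a difference between two means shrinks as the groups grow. The sketch below assumes, hypothetically, that recruit heights have a standard deviation of 3 inches and that the 507,125 recruits split evenly between spring and fall births; neither assumption comes from Utts' article. The sample-size function uses the standard formula for a two-sided, two-sample z test, n = 2(σ(z_a + z_b)/Δ)^2 per group, where z_a and z_b are the standard normal quantiles for 1 − α/2 and for the desired power, with the conventional choices α = 0.05 and power 0.80. Function names are illustrative.

```python
import math
from statistics import NormalDist

def z_for_difference(diff: float, sd: float, n_each: int) -> float:
    """z statistic for a difference of two means, equal group sizes."""
    se = sd * math.sqrt(2 / n_each)  # standard error of the difference
    return diff / se

def sample_size_per_group(diff: float, sd: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample z test to detect `diff`."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * (sd * (z(1 - alpha / 2) + z(power)) / diff) ** 2
    return math.ceil(n)

# HYPOTHETICAL inputs: height SD of 3 inches, recruits split evenly.
diff, sd = 0.25, 3.0
print("z statistic:", round(z_for_difference(diff, sd, 507_125 // 2), 1))
print("recruits needed per group:", sample_size_per_group(diff, sd))
```

Under these assumptions the 1/4-inch difference produces a z statistic near 30, far beyond any usual cutoff, while about 2,260 recruits per group would already detect it with 80% power. Statistical significance with half a million recruits therefore says little about practical importance; conversely, a study much smaller than the per-group figure could easily miss a real difference.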
1.3
Some Current Applications of Statistics
Defining the Problem: Obtaining Information
from Massive Data Sets
Data mining is defined to be a process by which useful information is obtained
from large sets of data. Data mining uses statistical techniques to discover patterns
and trends that are present in a large data set. In most data sets, important patterns
would not be discovered by using traditional data exploration techniques, either because
the relationships among the many variables in the data set are too
complex or because the data sets are so large that they mask the relationships.
The patterns and trends discovered in the analysis of the data are defined
as data mining models. These models can be applied to many different situations,
such as:
● Forecasting: Estimating future sales, predicting demands on a power
grid, or estimating server downtime
● Assessing risk: Choosing the rates for insurance premiums, selecting the
best customers for a new sales campaign, determining which medical
therapy is most appropriate given the physiological characteristics of
the patient
● Identifying sequences: Determining customer preferences in online
purchases, predicting weather events
● Grouping: Placing customers or events into clusters of related items,
analyzing and predicting relationships between demographic characteristics and purchasing patterns, identifying fraud in credit card
purchases
A new medical procedure referred to as gene editing has the potential to
assist thousands of people suffering from many different diseases. An article in the
Houston Chronicle (2013) describes how data mining techniques are used to
explore massive genomic databases to interpret millions of bits of data in a person’s DNA. This information is then used to identify a single defective gene,
which is cut out and replaced by splicing in a correction. This area of research is referred to as
biomedical informatics and is based on the premise that the human body is a data
bank of incredible depth and complexity. It is predicted that by 2015, the average
hospital will have approximately 450 terabytes of patient data consisting of large,
complex images from CT scans, MRIs, and other imaging techniques. However,
only a small fraction of the current medical data has been analyzed, thus opening
huge opportunities for persons trained in data mining. In a case described in the
article, a 7-year-old boy tormented by scabs, blisters, and scars was given a new
lease on life after data mining techniques were used to discover a single letter in his faulty
genome.
Defining the Problem: Determining the Effectiveness
of a New Drug Product
The development and testing of the Salk vaccine for protection against poliomyelitis (polio) provide an excellent example of how statistics can be used in solving
practical problems. Most parents and children growing up before 1954 can recall
the panic brought on by the outbreak of polio cases during the summer months.
Although relatively few children fell victim to the disease each year, the pattern
of outbreak of polio was unpredictable and caused great concern because of the
possibility of paralysis or death. The fact that very few of today’s youth have even
heard of polio demonstrates the great success of the vaccine and the testing program that preceded its release on the market.
It is standard practice in establishing the effectiveness of a particular drug product to conduct an experiment (often called a clinical trial) with human participants.
For some clinical trials, assignments of participants are made at random, with half
receiving the drug product and the other half receiving a solution or tablet that does
not contain the medication (called a placebo). One statistical problem concerns the
determination of the total number of participants to be included in the clinical trial.
This problem was particularly important in the testing of the Salk vaccine because
data from previous years suggested that the incidence rate for polio might be less
than 50 cases for every 100,000 children. Hence, a large number of participants had
to be included in the clinical trial in order to detect a difference in the incidence rates
for those treated with the vaccine and those receiving the placebo.
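The arithmetic behind the need for a large trial is short. At an incidence of at most 50 cases per 100,000 children, the expected number of cases in a group is rate × group size / 100,000. The group sizes in this sketch are illustrative choices, not figures from the trial, and the function name is hypothetical.

```python
def expected_cases(rate_per_100k: float, group_size: int) -> float:
    """Expected number of cases in a group at a given incidence rate."""
    return rate_per_100k * group_size / 100_000

# 50 per 100,000 is the upper incidence figure quoted in the text;
# the group sizes below are illustrative.
for size in (2_000, 20_000, 200_000):
    print(f"{size:>7,} children: expected cases = {expected_cases(50, size):g}")
```

A trial with a few thousand children per group would expect only about one case in an unprotected group, far too few to distinguish vaccine from placebo; even 200,000 children per group yield at most about 100 expected cases without vaccination.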
With the assistance of statisticians, it was decided that a total of 400,000
children should be included in the Salk clinical trial begun in 1954, with half of them
randomly assigned the vaccine and the remaining children assigned the placebo. No
other clinical trial had ever been attempted on such a large group of participants.
Through a public school inoculation program, the 400,000 participants were treated
and then observed over the summer to determine the number of children contracting polio. Although fewer than 200 cases of polio were reported for the 400,000
participants in the clinical trial, more than three times as many cases appeared in
the group receiving the placebo as in the vaccinated group. These results, together with some statistical calculations, were sufficient to indicate the effectiveness of the Salk polio vaccine.
However, these conclusions would not have been possible if the statisticians and
scientists had not planned for and conducted such a large clinical trial.
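The "statistical calculations" are not given in the text. One standard approach, sketched here under stated assumptions and not necessarily the analysis used in 1954, conditions on the total number of cases: if the vaccine had no effect and the two groups were the same size, each case would be about equally likely to fall in either group, so the vaccine-group count would be approximately Bin(n, 0.5), where n is the total number of cases. The split below is hypothetical, chosen only to match "fewer than 200" cases with "more than three times as many" in the placebo group; the function name is illustrative.

```python
from math import comb

def binomial_lower_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# HYPOTHETICAL split consistent with the text: fewer than 200 cases in all,
# with more than three times as many in the placebo group.
vaccine_cases, placebo_cases = 40, 140
total = vaccine_cases + placebo_cases
p_value = binomial_lower_tail(vaccine_cases, total)
print(f"P(vaccine-group count <= {vaccine_cases}) = {p_value:.1e}")
```

The resulting probability is vanishingly small, so a split this lopsided would be very hard to attribute to chance alone. The equal-group-size assumption matters: with unequal groups, 0.5 would be replaced by the vaccine group's share of the participants.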
The development of the Salk vaccine is not an isolated example of the use
of statistics in the testing and development of drug products. In recent years,
the U.S. Food and Drug Administration (FDA) has placed stringent requirements
on pharmaceutical firms wanting to establish the effectiveness of proposed new
drug products. Thus, statistics has played an important role in the development
and testing of birth control pills, rubella vaccines, chemotherapeutic agents in the
treatment of cancer, and many other preparations.
Defining the Problem: Improving the Reliability
of Evidence in Criminal Investigations
The National Academy of Sciences released a report (National Research Council,
2009) in which one of the more important findings was the need for applying statistical methods in the design of studies used to evaluate inferences from evidence
gathered by forensic technicians. The following statement is central to the report:
“Over the last two decades, advances in some forensic science disciplines, especially the use of DNA technology, have demonstrated that some areas of forensic science have great additional potential to help law enforcement identify
criminals. . . . Those advances, however, also have revealed that, in some cases,
substantive information and testimony based on faulty forensic science analyses may have contributed to wrongful convictions of innocent people. This fact
has demonstrated the potential danger of giving undue weight to evidence and
testimony derived from imperfect testing and analysis.”
There are many sources of error that may affect the accuracy of conclusions inferred
from the crime scene evidence and presented to a jury by a forensic investigator.
Statistics can play a role in improving forensic analyses. Statistical principles can
be used to identify sources of variation and quantify the size of the impact that
these sources of variation can have on the conclusions reached by the forensic
investigator.
An illustration of the impact of an inappropriately designed study and
statistical analysis on the conclusions reached from the evidence obtained at
a crime scene can be found in Spiegelman et al. (2007). They demonstrate that
the evidence used by the FBI crime lab to support the claim that there was not
a second assassin of President John F. Kennedy was based on a faulty analysis
of the data and an overstatement of the results of a method of forensic testing
called Comparative Bullet Lead Analysis (CBLA). This method applies a chemical analysis to link a bullet found at a crime scene to the gun that had discharged
the bullet. Based on evidence from chemical analyses of the recovered bullet fragments, the 1979 U.S. House Select Committee on Assassinations concluded that all
the bullets striking President Kennedy were fired from Lee Oswald’s rifle. A new
analysis of the bullets using more appropriate statistical analyses demonstrated
that the evidence presented in 1979 was overstated. The authors make a case for a new
analysis of the assassination bullet fragments, which may shed light on whether the
five bullet fragments found in the Kennedy assassination came from three or
more bullets rather than just two, the conclusion that had been presented as definitive evidence
that Oswald was the sole shooter in the assassination of President Kennedy.
Defining the Problem: Estimating Bowhead Whale
Population Size
Raftery and Zeh (1998) discuss the estimation of the population size and rate of
increase in bowhead whales, Balaena mysticetus. The importance of such a study
derives from the fact that bowheads were the first species of great whale for
which commercial whaling was stopped; thus, their status indicates the recovery
prospects of other great whales. Also, the International Whaling Commission
uses these estimates to determine the aboriginal subsistence whaling quota for
Alaskan Eskimos. To obtain the necessary data, researchers conducted a visual
and acoustic census off Point Barrow, Alaska. The researchers then applied statistical models and estimation techniques to the data obtained in the census to
estimate the population size and the rate of increase of the bowhead population since
commercial whaling was stopped. The statistical estimates showed that the
bowhead population was increasing at a healthy rate, indicating that stocks of
great whales that have been decimated by commercial hunting can recover after
hunting is discontinued.
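The rate of increase being estimated can be illustrated with the simplest possible model, constant geometric growth between two population estimates: N_end = N_start × (1 + r)^t. This is not the method of Raftery and Zeh, whose estimates came from the statistical models applied to the census data, and the counts in the sketch are hypothetical; the function name is illustrative.

```python
def annual_growth_rate(n_start: float, n_end: float, years: float) -> float:
    """Constant annual rate r such that n_end = n_start * (1 + r) ** years."""
    return (n_end / n_start) ** (1 / years) - 1

# HYPOTHETICAL population estimates, not figures from Raftery and Zeh (1998).
r = annual_growth_rate(5_000, 8_000, 15)
print(f"estimated rate of increase: {r:.1%} per year")
```

With these made-up counts the implied rate is a little over 3% per year. A real analysis must also account for the uncertainty in each census estimate, which this two-point calculation ignores.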
Defining the Problem: Ozone Exposure
and Population Density
Ambient ozone pollution in urban areas is one of the nation’s most pervasive environmental problems. Whereas the decreasing stratospheric ozone layer may lead
to increased instances of skin cancer, high ambient ozone intensity has been shown
to cause damage to the human respiratory system as well as to agricultural crops
and trees. The Houston, Texas, area has ozone concentrations that exceed the National Ambient Air Quality Standard and are rated second only to those of Los Angeles.
Carroll et al. (1997) describe how to analyze the hourly ozone measurements collected in Houston from 1980 to 1993 by 9 to 12 monitoring stations.
Besides the ozone level, each station recorded three meteorological variables:
temperature, wind speed, and wind direction.
The statistical aspect of the project had three major goals:
1. Provide information (and/or tools to obtain such information)
about the amount and pattern of missing data as well as about the
quality of the ozone and the meteorological measurements.
2. Build a model of ozone intensity to predict the ozone concentration
at any given location within Houston at any given time between 1980
and 1993.
3. Apply this model to estimate exposure indices that account for
either a long-term exposure or a short-term high-concentration
exposure; also, relate census information to different exposure
indices to achieve population exposure indices.
The spatial–temporal model the researchers built provided estimates demonstrating that the highest ozone levels occurred at locations with relatively small
populations of young children. Also, the model estimated that the exposure of
young children to ozone decreased by approximately 20% from 1980 to 1993. An
examination of the distribution of population exposure had several policy implications. In particular, it was concluded that the current placement of monitors
is not ideal if one is concerned with assessing population exposure. This project
involved all four components of Learning from Data: planning where the monitoring stations should be placed within the city, how often the data should be
collected, and what variables should be recorded; conducting spatial–temporal
graphing of the data; creating spatial–temporal models of the ozone data, meteorological data, and demographic data; and, finally, writing a report that could
assist local and federal officials in formulating policy with respect to decreasing
ozone levels.
Defining the Problem: Assessing Public Opinion
Public opinion, consumer preference, and election polls are commonly used to
assess the opinions or preferences of a segment of the public regarding issues,
products, or candidates of interest. We, the American public, are exposed to the
results of these polls daily in newspapers, in magazines, on the internet, on the
radio, and on television. For example, the results of polls related to the following
subjects were printed in local newspapers:
● Public confidence in the potential for job growth in the coming year
● Reactions of Texas residents to the state legislature’s failure to expand
Medicaid coverage
● Voters’ preferences for tea party candidates in the fall congressional
elections
● Attitudes toward increasing the gasoline tax in order to increase
funding for road construction and maintenance
● Product preference polls related to specific products (Toyota vs. Ford,
DirecTV vs. Comcast, Dell vs. Apple, Subway vs. McDonald’s)
● Public opinion on a national immigration policy
A number of questions can be raised about polls. Suppose we consider a poll
on the public’s opinion on a proposed income tax increase in the state of Michigan.
What was the population of interest to the pollster? Was the pollster interested in
all residents of Michigan or just those citizens who currently pay income taxes?
Was the sample in fact selected from this population? If the population of interest
was all persons currently paying income taxes, did the pollster make sure that all
the individuals sampled were current taxpayers? What questions were asked and
how were the questions phrased? Was each person asked the same question? Were
the questions phrased in such a manner as to bias the responses? Can we believe
the results of these polls? Do these results “represent” how the general public
currently feels about the issues raised in the polls?
Opinion and preference polls are an important, visible application of statistics for the consumer. We will discuss this topic in more detail in Chapters 2 and
10. We hope that after studying this material you will have a better understanding
of how to interpret the results of these polls.
1.4 A Note to the Student
We think with words and concepts. A study of the discipline of statistics requires
us to memorize new terms and concepts (as does the study of a foreign language).
Commit these definitions, theorems, and concepts to memory.
Also, focus on the broader concept of making sense of data. Do not let details
obscure these broader characteristics of the subject. The teaching objective of this
text is to identify and amplify these broader concepts of statistics.
1.5 Summary
The discipline of statistics and those who apply the tools of that discipline deal
with Learning from Data. Medical researchers, social scientists, accountants,
agronomists, consumers, government leaders, and professional statisticians are all
involved with data collection, data summarization, data analysis, and the effective
communication of the results of data analysis.
1.6 Exercises
1.1 Introduction
Bio.
1.1 Hansen (2006) describes a study to assess the migration and survival of salmon released
from fish farms located in Norway. The mingling of escaped farmed salmon with wild salmon
raises several concerns. First, the assessment of the abundance of wild salmon stocks will be
biased if there is a presence of large numbers of farmed salmon. Second, potential interbreeding between farmed and wild salmon may result in a reduction in the health of the wild stocks.
Third, diseases present in farmed salmon may be transferred to wild salmon. Two batches of
farmed salmon were tagged and released in two locations, one batch of 1,996 fish in northern
Norway and a second batch of 2,499 fish in southern Norway. The researchers recorded the
time and location at which the fish were captured by either commercial fishermen or anglers
in fresh water. Two of the most important pieces of information to be determined by the
study were the distance from the point of the fish’s release to the point of its capture and the
length of time it took for the fish to be captured.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Env.
1.2 During 2012, Texas had listed on FracFocus, an industry fracking disclosure site, nearly
6,000 oil and gas wells in which the fracking methodology was used to extract natural gas.
Fontenot et al. (2013) report on a study of 100 private water wells in or near the Barnett Shale
in Texas. There were 91 private wells located within 5 km of an active gas well using fracking, 4
private wells with no gas wells located within a 14 km radius, and 5 wells outside of the Barnett
Shale with no gas well located within a 60 km radius. They found that there were elevated levels
of potential contaminants such as arsenic and selenium in the 91 wells closest to natural gas
extraction sites compared to the 9 wells that were at least 14 km away from an active gas well
using the fracking technique to extract natural gas.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Soc.
1.3 In 2014, Congress cut \$8.7 billion from the Supplemental Nutrition Assistance Program
(SNAP), more commonly referred to as food stamps. The rationale for the decrease is that
providing assistance to people will result in the next generation of citizens being more dependent on the government for support. Hoynes (2012) describes a study to evaluate this claim. The
study examines 60,782 families over the time period of 1968 to 2009, which is subsequent to the
introduction of the Food Stamp Program in 1961. This study examines the impact of a positive and policy-driven change in economic resources available in utero and during childhood
on the economic health of individuals in adulthood. The study assembled data linking family
background in early childhood to adult health and economic outcomes. The study concluded
that the Food Stamp Program has effects decades after initial exposure. Specifically, access
to food stamps in childhood leads to a significant reduction in the incidence of metabolic
syndrome (obesity, high blood pressure, and diabetes) and, for women, an increase in economic self-sufficiency. Overall, the results suggest substantial internal and external benefits
of SNAP.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Med.
1.4 Of all sports, football accounts for the highest incidence of concussion in the United States
due to the large number of athletes participating and the nature of the sport. While there is general agreement that concussion incidence can be reduced by making rule changes and teaching
proper tackling technique, there remains debate as to whether helmet design may also reduce the
incidence of concussion. Rowson et al. (2014) report on a retrospective analysis of head impact
data collected between 2005 and 2010 from eight collegiate football teams. Concussion rates for
players wearing two types of helmets, Riddell VSR4 and Riddell Revolution, were compared. A
total of 1,281,444 head impacts were recorded, from which 64 concussions were diagnosed. The
relative risk of sustaining a concussion in a Revolution helmet compared with a VSR4 helmet
was 46.1%. This study illustrates that differences in the ability to reduce concussion risk exist
between helmet models in football. Although helmet design may never prevent all concussions
from occurring in football, evidence illustrates that it can reduce the incidence of this injury.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
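The counts in Exercise 1.4 can be turned into an overall concussion rate, and the reported relative risk can be read as a percentage reduction. The Python sketch below rests on two assumptions: the overall rate pools both helmet models (the exercise does not give per-helmet counts), and "relative risk" is taken to mean the Revolution group's risk divided by the VSR4 group's risk. The helper names `rate_per` and `relative_risk` are hypothetical illustrations, not code from Rowson et al. (2014).

```python
# Overall concussion rate and relative-risk interpretation for Exercise 1.4.
# All figures come from the exercise text; helper names are hypothetical.

TOTAL_IMPACTS = 1_281_444  # head impacts recorded, 2005-2010
CONCUSSIONS = 64           # concussions diagnosed among those impacts
REPORTED_RR = 0.461        # assumed: Revolution risk / VSR4 risk


def rate_per(events: int, exposures: int, per: int = 100_000) -> float:
    """Events per `per` exposures (here, concussions per 100,000 impacts)."""
    return events / exposures * per


def relative_risk(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Risk (events / exposures) in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


overall_rate = rate_per(CONCUSSIONS, TOTAL_IMPACTS)
# A relative risk below 1 means lower risk in group A; 1 - RR is the reduction.
risk_reduction = 1 - REPORTED_RR

print(f"Concussions per 100,000 impacts: {overall_rate:.2f}")
print(f"Impacts per diagnosed concussion: {TOTAL_IMPACTS / CONCUSSIONS:,.0f}")
print(f"Reduction in risk with the Revolution helmet: {risk_reduction:.1%}")
```

Run as written, it prints about 4.99 concussions per 100,000 impacts, about 20,023 impacts per diagnosed concussion, and a 53.9% reduction in risk. Because per-helmet counts are not reported here, `relative_risk` only records the definition and is not applied to study data.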
Pol. Sci.
1.5 During the 2004 senatorial campaign in a large southwestern state, illegal immigration was
a major issue. One of the candidates argued that illegal immigrants made use of educational
and social services without having to pay property taxes. The other candidate pointed out that
the cost of new homes in their state was 20–30% less than the national average due to the low
wages received by the large number of illegal immigrants working on new home construction. A
random sample of 5,500 registered voters was asked the question, “Are illegal immigrants generally a benefit or a liability to the state’s economy?” The results were as follows: 3,500 people
responded “liability,” 1,500 people responded “benefit,” and 500 people responded “uncertain.”
a. What is the population of interest?
b. What is the population from which the sample was selected?
c. Does the sample adequately represent the population?
d. If a second random sample of 5,000 registered voters was selected, would the
results be nearly the same as the results obtained from the initial sample of
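For Exercise 1.5, the raw counts are easier to compare as sample proportions. The Python sketch below computes each proportion and, as a rough gauge of how much a proportion might shift from one random sample to another, the usual estimated standard error sqrt(p̂(1 - p̂)/n). Treating the 5,500 responses as a simple random sample is an assumption of the sketch, and the standard error is shown only as an illustration; it is not part of the exercise.

```python
# Sample proportions for Exercise 1.5 (counts taken from the exercise text).
import math

responses = {"liability": 3_500, "benefit": 1_500, "uncertain": 500}
n = sum(responses.values())  # 5,500 registered voters

summary = {}
for answer, count in responses.items():
    p_hat = count / n
    # Estimated standard error of a sample proportion, assuming simple
    # random sampling: sqrt(p_hat * (1 - p_hat) / n).
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    summary[answer] = (p_hat, se)
    print(f"{answer:>10}: {count:>5,} of {n:,} = {p_hat:.1%} (SE about {se:.2%})")
```

The three proportions come out to about 63.6%, 27.3%, and 9.1%, and each estimated standard error is below one percentage point.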
Edu.
1.6 An American history professor at a major university was interested in knowing the history
literacy of college freshmen. In particular, he wanted to find what proportion of college freshmen
at the university knew which country controlled the original 13 colonies prior to the American
Revolution. The professor sent a questionnaire to all freshman students enrolled in HIST 101 and
received responses from 318 students out of the 7,500 students who were sent the questionnaire.
One of the questions was “What country controlled the original 13 colonies prior to the American
Revolution?”
a. What is the population of interest to the professor?
b. What is the sampled population?
c. Is there a major difference in the two populations? Explain your answer.
d. Suppose that several lectures on the American Revolution had been given in
HIST 101 prior to the students receiving the questionnaire. What possible source
of bias has the professor introduced into the study relative to the population of
interest?
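In Exercise 1.6, the gap between the students who were sent the questionnaire and those who answered it is easiest to see as a response rate. The short Python calculation below uses only the two counts given in the exercise; it quantifies nonresponse but does not by itself show whether respondents differ from nonrespondents.

```python
# Response rate for Exercise 1.6 (counts taken from the exercise text).
SENT = 7_500      # HIST 101 freshmen who were sent the questionnaire
RESPONDED = 318   # questionnaires returned

response_rate = RESPONDED / SENT
nonrespondents = SENT - RESPONDED

print(f"Response rate: {response_rate:.2%}")    # prints "Response rate: 4.24%"
print(f"Did not respond: {nonrespondents:,}")   # prints "Did not respond: 7,182"
```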