**Analysis of Variance: One-Way ANOVA**

■ The owner of my company, which publishes computer books, wants to know whether the position of our books in the computer book section of bookstores influences sales. More specifically, does it really matter whether the books are placed in the front, back, or middle of the computer book section?

■ If I am determining whether populations have significantly different means, why is the technique called *analysis of variance*?

■ How can I use the results of one-way ANOVA for forecasting?

We often have several different groups of people or items and want to determine whether data about the groups differs significantly. Here are some examples:

■ Is there a significant difference in the length of time that four doctors keep mothers in the hospital after they give birth?

■ Does the production yield for a new drug depend on whether the size of the container in which the drug is produced is large, small, or medium?

■ Does the drop in blood pressure attained after taking one of four drugs depend on the drug taken?

When you’re trying to determine whether the means in several sets of data that depend on one factor are significantly different, one-way analysis of variance, or ANOVA, is the correct tool to use. In the examples given above, the factors are the doctors, the container size, and the drug, respectively. In analyzing the data, we can choose between two hypotheses:

■ *Null hypothesis*, which indicates that the means of all groups are identical.

■ *Alternative hypothesis*, which indicates that there is a statistically significant difference between the groups’ means.

To test these hypotheses in Microsoft Office Excel 2007, we can use the Anova: Single Factor option in the Data Analysis dialog box. If the p-value computed by Excel is small (in this chapter, less than or equal to 0.15), we reject the null hypothesis and conclude that the means are significantly different. If the p-value is greater than 0.15, we do not reject the null hypothesis: the data gives no significant evidence that the population means differ. Let’s look at an example.

The owner of my company, which publishes computer books, wants to know whether the position of our books in the computer book section of bookstores influences sales. More specifically, does it really matter whether the books are placed in the front, back, or middle of the computer book section?

The publishing company wants to know whether its books sell better when a display is set up in the front, back, or middle of the computer book section. Weekly sales (in hundreds) were monitored at 12 different stores. At 5 stores, the books were placed in the front; at 4 stores, in the back; and at 3 stores, in the middle. Resulting sales are contained in the *Signif* worksheet in the file Onewayanova.xlsx, which is shown in Figure 50-1.

Does the data indicate that the location of the books has a significant effect on sales?

Figure 50-1 Book sales data

We assume that the 12 stores have similar sales patterns and are approximately the same size. This assumption allows us to use one-way ANOVA because we believe that at most one factor (the position of the display in the computer book section) is affecting sales. (If the stores were different sizes, we would need to analyze our data with two-way ANOVA, which I’ll discuss in Chapter 51, “Randomized Blocks and Two-Way ANOVA.”)

To analyze the data, on the Data tab, click Data Analysis, and then select Anova: Single Factor. Fill in the dialog box as shown in Figure 50-2.

Figure 50-2 Anova: Single Factor dialog box


We use the following configurations:

❑ The data for our input range, including labels, is in cells B3:D8.

❑ Select the Labels option because the first row of our input range contains labels.

❑ I’ve selected the Columns option because the data is organized in columns.

❑ I’ve selected C12 as the upper-left cell of the output range.

❑ The selected alpha value is not important. You can use the default value.

After clicking OK, we obtain the results shown in Figure 50-3.

Figure 50-3 One-way ANOVA results

In cells F16:F18, we see average sales depending on the location of the display. When the display is at the front of the computer book section, average sales are 900; when the display is at the back of the section, sales average 1400; and when the display is in the middle, sales average 1100. Because our p-value of 0.003 (in cell H23) is less than 0.15, we can conclude that these means are significantly different.
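The calculation behind the Anova: Single Factor output can be sketched in a few lines of Python. The sales values below are illustrative assumptions, not the worksheet contents (which are not reproduced here); they were chosen to match the figures quoted in this chapter: group sizes of 5, 4, and 3, group means of 9, 14, and 11 (in hundreds), and a within-groups sum of squares of 22. The function name `one_way_anova` is hypothetical. The p-value uses a closed form that holds only when the F statistic has 2 numerator degrees of freedom, that is, when there are exactly three groups.

```python
# Illustrative weekly sales in hundreds of books. These values are ASSUMED;
# they only reproduce the summary statistics quoted in the text.
sales = {
    "Front": [7, 10, 8, 9, 11],
    "Back": [12, 13, 15, 16],
    "Middle": [10, 12, 11],
}


def one_way_anova(groups):
    """Return (means, ss_between, ss_within, f_stat, p_value) for three groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {name: sum(values) / len(values) for name, values in groups.items()}

    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(values) * (means[name] - grand_mean) ** 2
        for name, values in groups.items()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - means[name]) ** 2 for name, values in groups.items() for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_between != 2:
        raise ValueError("the closed-form p-value assumes exactly three groups")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of the F distribution with (2, df_within) degrees of freedom.
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return means, ss_between, ss_within, f_stat, p_value


means, ss_between, ss_within, f_stat, p_value = one_way_anova(sales)
print(means)
print(round(ss_within, 2), round(f_stat, 2), round(p_value, 3))
```

For more than three groups, the p-value would need a general F-distribution routine (for example, from a statistics library) in place of the closed form.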

If I am determining whether populations have significantly different means, why is the technique called *analysis of variance*?

Suppose that the data in our book sales study is the data shown in the worksheet named *Insig*, shown in Figure 50-4 (also in the file Onewayanova.xlsx). If we run a one-way ANOVA on this data, we obtain the results shown in Figure 50-5.

Note that the mean sales for each part of the store are exactly as before, yet our p-value of 0.66 indicates that we cannot reject the null hypothesis: this data gives no significant evidence that the position of the display in the computer book section affects sales. The reason for this strange result is that in our second data set, we have much more variation in sales when the display is at each position in the computer book section. In our first data set, for example, sales when the display is at the front range from 700 to 1100, whereas in the second data set, sales range from 200 to 2000.

The variation of sales within each store position is measured by the sum of the squared deviations of the data from their group mean (the within-groups sum of squares). This measure is shown in cell D24 in the first data set and in cell F24 in the second. In our first data set, the sum of squares of data within groups is only 22, whereas in the second data set, the sum of squares within groups is 574! This large variation within the data points at each store position masks the variation between the groups (store positions) themselves and makes it impossible to conclude for the second data set that the difference between sales in different store positions is significant.
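The effect of within-group variation can be checked using only the summary figures quoted above: the group sizes (5, 4, 3), the group means (9, 14, 11, in hundreds), and the two within-groups sums of squares (22 and 574). The Python sketch below holds the between-groups variation fixed and changes only the within-groups sum of squares. The helper name `f_and_p` is hypothetical, and the p-value formula is a closed form that applies only with three groups (2 numerator degrees of freedom).

```python
# Summary figures quoted in the text (sales in hundreds of books).
counts = {"Front": 5, "Back": 4, "Middle": 3}
means = {"Front": 9, "Back": 14, "Middle": 11}

n = sum(counts.values())
grand_mean = sum(counts[g] * means[g] for g in counts) / n

# The between-groups variation is identical in both data sets,
# because the group means and group sizes are the same.
ss_between = sum(counts[g] * (means[g] - grand_mean) ** 2 for g in counts)
df_between = len(counts) - 1   # 2
df_within = n - len(counts)    # 9


def f_and_p(ss_within):
    """F statistic and p-value for a given within-groups sum of squares."""
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F with (2, df_within) degrees of freedom (three groups only).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


f_signif, p_signif = f_and_p(22)    # first data set
f_insig, p_insig = f_and_p(574)     # second data set

print(round(f_signif, 2), round(p_signif, 3))
print(round(f_insig, 2), round(p_insig, 2))
```

Because the F statistic divides the between-groups mean square by the within-groups mean square, a larger within-groups sum of squares shrinks F and raises the p-value even though the group means have not changed. This comparison of variances is what gives the technique its name.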

Figure 50-4 Book store data for which the null hypothesis is accepted

Figure 50-5 Anova results accepting the null hypothesis

How can I use the results of a one-way ANOVA for forecasting?

If there is a significant difference between group means, our best forecast for each group is simply the group’s mean. Therefore, in the first data set, we predict the following:

❑ Sales when the display is at the front of the computer book section will be 900 books per week.

❑ Sales when the display is at the back will be 1400 books per week.

❑ Sales when the display is in the middle will be 1100 books per week.

If there is no significant difference between the group means, our best forecast for each observation is simply the overall mean. Thus, in the second data set, we predict weekly sales of 1117, independent of where the books are placed.
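As a check on the 1117 figure, the overall mean is the average of the group means weighted by the number of stores in each group. The worked equation below assumes that the second data set has the same group sizes as the first (5, 4, and 3 stores), which is consistent with the group means being unchanged:

$$\bar{x} = \frac{5(9) + 4(14) + 3(11)}{12} = \frac{134}{12} \approx 11.17 \text{ hundred books} \approx 1117 \text{ books per week}$$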

We can also estimate the accuracy of our forecasts. The square root of the Within Groups MS (mean square) is the standard deviation of our forecasts from a one-way ANOVA. As shown in Figure 50-6, our standard deviation of forecasts for the first data set is 156. By the usual rule of thumb for roughly normal data (about 68 percent of values fall within one standard deviation of the mean and about 95 percent within two), this means that we would expect, for example:

❑ During 68 percent of all the weeks in which books are placed at the front of the computer book section, sales will be between *900 - 156 = 744* and *900 + 156 = 1056* books.

❑ During 95 percent of all weeks in which books are placed at the front of the computer book section, sales will be between *900 - 2(156) = 588* books and *900 + 2(156) = 1212* books.
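The forecast standard deviation and the two ranges above can be reproduced from figures already quoted in this chapter: a within-groups sum of squares of 22 (in hundreds of books, squared) and 12 stores in 3 groups. The Python sketch below is only an arithmetic check; the variable names are illustrative, and the 68 and 95 percent ranges rest on the assumption that forecast errors are roughly normal.

```python
import math

ss_within = 22            # within-groups sum of squares, first data set
df_within = 12 - 3        # observations minus number of groups
ms_within = ss_within / df_within   # Within Groups MS

# Sales are recorded in hundreds, so scale the result to books.
forecast_sd = round(math.sqrt(ms_within) * 100)

front_mean = 900          # forecast for the front position, in books
interval_68 = (front_mean - forecast_sd, front_mean + forecast_sd)
interval_95 = (front_mean - 2 * forecast_sd, front_mean + 2 * forecast_sd)

print(forecast_sd, interval_68, interval_95)
```

The same calculation applies to the back and middle positions by substituting their forecasts of 1400 and 1100 for `front_mean`.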

Figure 50-6 Computation of forecast standard deviation

**Problems**

You can find the data for the following problems in the file Chapter50data.xlsx.

**1.** For patients of four cardiologists, we are given the number of days the patients stayed in the hospital after open-heart surgery.

❑ Is there evidence that the doctors have different discharge policies?

❑ For what range of days are you 95 percent sure that a patient of Doctor 1 will stay in the hospital?

**2.** A drug can be produced by using a 400-degree, 300-degree, or 200-degree oven. You are given the pounds of the drug yielded when various batches are baked at different temperatures.

❑ Does temperature appear to influence the process yield?

❑ What is the range of pounds of the product that you are 95 percent sure will be produced with a 200-degree oven?

❑ If you believe that pressure within the container also influences process yield, does this analysis remain valid?