knowledge-database (beta)

Current group: comp.soft-sys.sas

geomean questions.

From:Carry Croghan
Subject:geomean questions.
Date:21 Jan 05 21:10:56 GMT
I have a couple of questions concerning geomean.

1. I had thought that geomean was available with proc means and/or proc
summary. Both geomean and gmean as a keyword have produced an error.
Is geomean available as an additional stat? If so, what is the keyword?

2. The note in SAS help says that there is a fuzz factor used for when
the variable is 0 or close to 0. Does any one know what that fuzz
factor is or what close to 0 is defined as?


I guess technically this is four questions in a couple of areas.

TIA.

Carry W. Croghan
Database Manager
EPA\ORD\NERL\HEASD
RTP, NC
From:Dale McLerran
Subject:Re: geomean questions.
Date:21 Jan 05 22:32:01 GMT
Carry,

GEOMEAN is a function available in the data step under version 9.x.
It is not an option available to the MEANS procedure. Being a data
step function, GEOMEAN computes the geometric mean across columns
(its arguments) within a single observation. More often, you want to compute the geometric
mean for one or more of your variables over all the observations
in your data set. To do this, you must first compute a variable
LOG_X = log(x) in a data step. From there, the simple way to
get the geometric mean of X would be to run PROC NLMIXED with
code

proc nlmixed data=mydata;
model log_x ~ normal(mu, var);
estimate "Geometric mean of X" exp(mu);
run;
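The data step that has to run before that PROC NLMIXED call is only described, not shown. Below is a minimal sketch of it. It assumes a hypothetical input data set RAW with a variable X; neither name comes from the post. It takes the log only for strictly positive X, so a zero or missing X leaves LOG_X missing instead of putting an invalid-argument message in the log. NLMIXED should then simply skip the observations where LOG_X is missing.

```sas
/* Hypothetical input data set RAW with one variable X (names are assumptions). */
data raw;
  input x @@;
  datalines;
1 2 4 8 16 0 .
;
run;

/* Build MYDATA, the data set the PROC NLMIXED step reads. */
data mydata;
  set raw;
  /* LOG(x) is undefined for x <= 0 and for missing x, so only take */
  /* the log of strictly positive values; the rest stays missing.    */
  if x > 0 then log_x = log(x);
run;
```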

Note that you could also compute the mean of LOG_X employing
your favorite mean calculation procedure (MEANS, UNIVARIATE,
etc.), output the mean value to a data set, and exponentiate
the mean value in a subsequent data step. Something like

proc means data=mydata;
var log_x;
output out=moments mean=mean_log_x;
run;

data geomean;
set moments;
geomean_x = exp(mean_log_x);
run;
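The next paragraph says the PROC MEANS/DATA step route is preferable for several variables. Here is a sketch of that generalization. It uses hypothetical variables X1 and X2 with made-up values, and arrays, so the same pattern extends to more variables by lengthening the lists.

```sas
/* Made-up data with two positive variables X1 and X2 (hypothetical names). */
data mydata2;
  array xs{*}  x1 x2;
  array lxs{*} log_x1 log_x2;
  input x1 x2;
  do i = 1 to dim(xs);
    /* nonpositive or missing values leave the log missing */
    if xs{i} > 0 then lxs{i} = log(xs{i});
  end;
  drop i;
  datalines;
1 1
2 10
4 100
;
run;

/* One pass of PROC MEANS gets the mean log of every variable */
proc means data=mydata2 noprint;
  var log_x1 log_x2;
  output out=moments2 mean=mean_log_x1 mean_log_x2;
run;

/* Exponentiate each mean log to get each geometric mean */
data geomeans2;
  set moments2;
  array m{*} mean_log_x1 mean_log_x2;
  array g{*} geomean_x1 geomean_x2;
  do i = 1 to dim(m);
    g{i} = exp(m{i});
  end;
  drop i;
run;

proc print data=geomeans2; run;
```

With these values GEOMEAN_X1 is 2 and GEOMEAN_X2 is 10.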

If you have multiple variables for which you wish to obtain the
geometric mean, then the PROC MEANS/DATA step approach would
probably be preferable. However, for a single variable, the
NLMIXED procedure handles both the mean computation and
exponentiation operations. Also, the NLMIXED procedure will
produce a standard error for the geometric mean. You can
compute a standard error for the geometric mean yourself using
statistics generated by the MEANS procedure and with some
data step code employing moment approximation formulae. But
since NLMIXED has those formulas implemented, why not make it
easy on yourself?
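The do-it-yourself standard error is mentioned but not shown. Below is a sketch that assumes the "moment approximation" meant is the usual first-order (delta-method) one. Because d/dmu exp(mu) = exp(mu), SE(geometric mean) is approximately exp(mean of LOG_X) times SE(mean of LOG_X). The data values are made up.

The result should be close to, but slightly larger than, the NLMIXED figure. PROC MEANS STDERR uses the n-1 divisor, while the NLMIXED maximum-likelihood variance estimate uses n.

```sas
/* Made-up positive data; LOG_X built the same way as in the post. */
data mydata;
  input x @@;
  log_x = log(x);
  datalines;
1 2 4 8 16
;
run;

/* Mean and standard error of the logs */
proc means data=mydata noprint;
  var log_x;
  output out=moments mean=mean_log_x stderr=se_log_x;
run;

/* Delta method: d/dmu exp(mu) = exp(mu), so                 */
/* SE(geometric mean) ~= exp(mean_log_x) * SE(mean_log_x).    */
data geomean_se;
  set moments;
  geomean_x    = exp(mean_log_x);
  se_geomean_x = geomean_x * se_log_x;
run;

proc print data=geomean_se; run;
```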

Dale




=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From:David L. Cassell
Subject:Re: geomean questions.
Date:22 Jan 05 00:36:39 GMT
Dale McLerran replied [in small part]:
> To do this, you must first compute a variable
> LOG_X = log(x) in a data step. From there, the simple way to
> get the geometric mean of X would be to run PROC NLMIXED with
> code
>
> proc nlmixed data=mydata;
> model log_x ~ normal(mu, var);
> estimate "Geometric mean of X" exp(mu);
> run;

Okay, can we have a show of hands? Does anyone else think this
is the SIMPLE way to get the geometric mean? :-) :-)

> exponentiation operations. Also, the NLMIXED procedure will
> produce a standard error for the geometric mean. You can
> compute a standard error for the geometric mean yourself using
> statistics generated by the MEANS procedure and with some
> data step code employing moment approximation formulae. But
> since NLMIXED has those formulas implemented, why not make it
> easy on yourself?

Yes. The very best part of using PROC NLMIXED.


Now, to keep this post from being completely content-free, let me
show one more way to get the geometric mean: the DoW loop way.
(Has anyone on SAS-L copyrighted "The Tao of the DoW" yet? :-)


options nocenter nodate nonumber ps=60 ls=75 noovp;

data temp1;
do y = 1 to 3;
do x = 3*y to 20*y by 3; output; end;
end;
y = 3; x = .; output;
run;

data temp2(keep = y count logsum geo_mean);
do until(last.y);
set temp1;
by y;
count = sum(count, not missing(x) );
logsum = sum(logsum, log(x) );
end;
geo_mean = exp( logsum / count );
run;

proc print data=temp2; run;


Note that I lobbed in a missing value for X as well. The code
will handle missing values relatively gracefully. Here's the
result:



Obs y count logsum geo_mean

1 1 6 13.1709 8.9814
2 2 12 35.7355 19.6477
3 3 18 61.4175 30.3283


The DoW-loop does our bookkeeping for us, so there's no explicit
pre-loop initialization, and minimal post-loop computations.
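One caveat on "gracefully": the COUNT line counts every nonmissing X. A zero or negative X would therefore be counted even though LOG(X) returns missing for it (with an invalid-argument message in the log), and GEO_MEAN would be wrong. In that case the denominator includes observations whose logs were never added.

Below is a variant that keeps the same DoW structure but only accumulates strictly positive values. The output name TEMP2_POS is hypothetical. For the test data it should reproduce the table above.

```sas
/* Rebuild the test data so this step runs on its own. */
data temp1;
  do y = 1 to 3;
    do x = 3*y to 20*y by 3; output; end;
  end;
  y = 3; x = .; output;
run;

/* Same DoW loop, but only strictly positive X feeds COUNT and LOGSUM, */
/* so numerator and denominator always refer to the same observations. */
data temp2_pos(keep = y count logsum geo_mean);
  do until(last.y);
    set temp1;
    by y;
    if x > 0 then do;          /* a missing X compares as less than 0 */
      count  = sum(count, 1);
      logsum = sum(logsum, log(x));
    end;
  end;
  /* leave GEO_MEAN missing for a BY group with no positive X */
  if count > 0 then geo_mean = exp(logsum / count);
run;

proc print data=temp2_pos; run;
```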


I still wonder, though... Why are people still so interested in
geometric means? Do they really believe they have log-normal
data? There are a host of really nice measures of central tendency
which PROC STDIZE will crank out in a heartbeat. The geometric mean
may be better than the arithmetic mean when the data have a long
right-hand tail, but it is still sensitive to outliers. It is not
robust or resistant.
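Here is a small illustration of the point about outliers, using made-up values. Replacing the largest value of a batch with one extreme value moves the geometric mean a lot, while the median does not change. The data step MEDIAN function is used here as a simple resistant comparison; it is not one of the PROC STDIZE measures mentioned above.

```sas
/* Illustration only: made-up values, one batch with a single extreme value. */
data outlier_demo;
  g_clean   = geomean(1, 2, 4, 8, 16);   /* 2**((0+1+2+3+4)/5) = 4 */
  m_clean   = median(1, 2, 4, 8, 16);    /* 4 */
  g_outlier = geomean(1, 2, 4, 8, 1e6);  /* jumps to about 36.4 */
  m_outlier = median(1, 2, 4, 8, 1e6);   /* still 4 */
run;

proc print data=outlier_demo; run;
```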

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
From:Dale McLerran
Subject:Re: geomean questions.
Date:22 Jan 05 01:25:52 GMT
--- "David L. Cassell" wrote:

> Dale McLerran replied [in small part]:
> > To do this, you must first compute a variable
> > LOG_X = log(x) in a data step. From there, the simple way to
> > get the geometric mean of X would be to run PROC NLMIXED with
> > code
> >
> > proc nlmixed data=mydata;
> > model log_x ~ normal(mu, var);
> > estimate "Geometric mean of X" exp(mu);
> > run;
>
> Okay, can we have a show of hands? Does anyone else think this
> is the SIMPLE way to get the geometric mean? :-) :-)
>

OK, I'll admit that the suggestion to use NLMIXED is non-intuitive.
I had not thought of it until I had started my post. Still, I
will have to argue that it is THE SIMPLE WAY to get the statistic,
at least for a single response.

We can construct a data step in which we cumulate sums and
frequencies, as David shows below. So, if we assess simplicity
in terms of how many data steps/procedures one must code in order
to get the statistic, there are "simpler" approaches.


> options nocenter nodate nonumber ps=60 ls=75 noovp;
>
> data temp1;
> do y = 1 to 3;
> do x = 3*y to 20*y by 3; output; end;
> end;
> y = 3; x = .; output;
> run;
>
> data temp2(keep = y count logsum geo_mean);
> do until(last.y);
> set temp1;
> by y;
> count = sum(count, not missing(x) );
> logsum = sum(logsum, log(x) );
> end;
> geo_mean = exp( logsum / count );
> run;
>
> proc print data=temp2; run;
>


OK, let's see those hands raised for the DoW approach versus

data temp2;
set temp1;
log_x = log(x);
run;

proc nlmixed data=temp2;
by y;
model log_x ~ normal(mu, var);
estimate "Geometric mean" exp(mu);
run;


Which one is simpler? Vote now. And in the spirit of recent
vote tabulations, vote often!

Dale


From:David L. Cassell
Subject:Re: geomean questions.
Date:21 Jan 05 22:11:58 GMT
Carry Croghan/RTP/USEPA/US@EPA wrote:
> I have a couple of questions concerning geomean.
>
> 1. I had thought that geomean was available with proc means and/or
> proc summary. Both geomean and gmean as a keyword have produced an error.
> Is geomean available as an additional stat? If so, what is the keyword?
>
> 2. The note in SAS help says that there is a fuzz factor used for when
> the variable is 0 or close to 0. Does any one know what that fuzz
> factor is or what close to 0 is defined as?
>
>
> I guess technically this is four questions in a couple of areas.

Hey, you work for ORD too! Small world.

Okay:

[1] No and no. The geometric mean isn't part of the keyword set, and it
isn't available as an additional stat.

[2] 'Close to zero' is what you get to define if you change the fuzz factor.
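On the fuzzing itself: SAS 9 also documents a GEOMEANZ function, described as the same statistic computed without fuzzing near-zero arguments (check that your release has it). Comparing the two functions on your own data is a quick way to see whether fuzzing matters.

The sketch below also shows that GEOMEAN works across its arguments within one observation and ignores missing arguments. The exact threshold GEOMEAN uses is not given here. The value 1E-20 is an arbitrary choice, so G_FUZZ is printed but deliberately not checked.

```sas
data fuzz_demo;
  g_row    = geomean(1, 2, 4);    /* across arguments: cube root of 8 = 2        */
  g_miss   = geomean(2, 8, .);    /* missing arguments are ignored: sqrt(16) = 4 */
  tiny     = 1e-20;
  g_fuzz   = geomean(tiny, 1);    /* may be fuzzed: TINY can be treated as 0     */
  g_nofuzz = geomeanz(tiny, 1);   /* no fuzzing: sqrt(1e-20 * 1) = 1e-10         */
run;

proc print data=fuzz_demo; run;
```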

If you want the geometric means of variables X1, X2, ..., you can get them
through a sneaky trick I learned from an old post of Nat Wooding:

proc sql noprint;
create table YourGeoMeans as
select exp(mean(log(x1))) as g_x1,
exp(mean(log(x2))) as g_x2,
.....
from YourData;
quit;
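Below is a self-contained version of the same trick with two made-up variables. It adds a hypothetical grouping variable G to show that a GROUP BY clause gives one row of geometric means per group. As with any log-based approach, zero or negative values make LOG return missing, so screen them first.

```sas
/* Made-up data: a grouping variable G and two positive variables X1, X2. */
data yourdata;
  input g $ x1 x2;
  datalines;
a 1 1
a 2 10
a 4 100
b 3 5
b 3 20
;
run;

proc sql;
  create table yourgeomeans as
  select g,
         exp(mean(log(x1))) as g_x1,   /* antilog of the mean of the logs */
         exp(mean(log(x2))) as g_x2
  from yourdata
  group by g;
quit;
```

Group a gives G_X1 = 2 and G_X2 = 10; group b gives G_X1 = 3 and G_X2 = 10.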


There's an old post of mine in the SAS-L archives which has even more on
building geometric (and other) means:

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0109C&L=sas-l&P=R18556

If you don't use the SAS-L archives, you should. They're a great
resource.


So... Why do you need to compute geometric means anyway?

David
From:omugeye
Subject:Re: geomean questions.
Date:21 Jan 2005 13:51:19 -0800
I think the geometric mean is not available in any of the PROCs. You might
have to compute it yourself using a data step or something. Note that
"the geometric mean is the antilog of the arithmetic mean of the
logs". See
http://ftp.sas.com/techsup/download/sample/samp_lib/basesampComputes_Geometric_Means.html
for details.
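Written out for strictly positive values x_1, ..., x_n, the identity quoted above is:

```latex
\mathrm{GM}(x_1,\dots,x_n)
  = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
  = \exp\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr)
```

For example, x = 3, 6, 9, 12, 15, 18 gives a log sum of 13.1709 and exp(13.1709/6) = 8.9814, which matches the y = 1 row of the DoW-loop output earlier in the thread.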
   

Copyright © 2006 knowledge-database   -   All rights reserved