From: Carry Croghan
Subject: geomean questions.
Date: 21 Jan 05 21:10:56 GMT
I have a couple of questions concerning geomean.
1. I had thought that geomean was available with proc means and/or proc summary. Both geomean and gmean as a keyword have produced an error. Is geomean available as an additional stat? If so, what is the keyword?
2. The note in SAS help says that a fuzz factor is used when the variable is 0 or close to 0. Does anyone know what that fuzz factor is, or how "close to 0" is defined?
I guess technically this is four questions in a couple of areas.
TIA.
Carry W. Croghan
Database Manager
EPA\ORD\NERL\HEASD
RTP, NC

From: Dale McLerran
Subject: Re: geomean questions.
Date: 21 Jan 05 22:32:01 GMT
Carry,
GEOMEAN is a function available in the data step under version 9.x. It is not an option available to the MEANS procedure. As a data step function, GEOMEAN computes the geometric mean across columns (variables) within each observation of a data set. More often, you want to compute the geometric mean for one or more of your variables over all the observations in your data set. To do this, you must first compute a variable LOG_X = log(x) in a data step. From there, the simple way to get the geometric mean of X would be to run PROC NLMIXED with code
proc nlmixed data=mydata;
  model log_x ~ normal(mu, var);
  estimate "Geometric mean of X" exp(mu);
run;
Note that you could also compute the mean of LOG_X employing your favorite mean calculation procedure (MEANS, UNIVARIATE, etc.), output the mean value to a data set, and exponentiate the mean value in a subsequent data step. Something like
proc means data=mydata;
  var log_x;
  output out=moments mean=mean_log_x;
run;
data geomean;
  set moments;
  geomean_x = exp(mean_log_x);
run;
If you have multiple variables for which you wish to obtain the geometric mean, then the PROC MEANS/DATA step approach would probably be preferable. However, for a single variable, the NLMIXED procedure handles both the mean computation and the exponentiation. Also, the NLMIXED procedure will produce a standard error for the geometric mean. You can compute a standard error for the geometric mean yourself using statistics generated by the MEANS procedure and some data step code employing moment approximation formulae. But since NLMIXED has those formulas implemented, why not make it easy on yourself?
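For readers who want the PROC MEANS/DATA step route with a standard error, here is a minimal sketch. It assumes the usual delta-method approximation, SE(exp(m)) ≈ exp(m) × SE(m), as the "moment approximation"; the data set MYDATA, the variables X1 and X2, and all output names are made up for illustration. The standard errors may differ slightly from those NLMIXED reports, since NLMIXED works from maximum likelihood estimates.

```sas
/* Hypothetical sample data: two strictly positive variables */
data mydata;
  input x1 x2;
  datalines;
1 2
10 8
100 32
;
run;

/* Step 1: take logs in a data step */
data logs;
  set mydata;
  log_x1 = log(x1);
  log_x2 = log(x2);
run;

/* Step 2: mean and standard error of each logged variable */
proc means data=logs noprint;
  var log_x1 log_x2;
  output out=moments mean=m1 m2 stderr=se1 se2;
run;

/* Step 3: back-transform; delta-method approximation for the SE (an assumption) */
data geomeans;
  set moments;
  gm_x1 = exp(m1);
  gm_x2 = exp(m2);
  se_gm_x1 = gm_x1 * se1;
  se_gm_x2 = gm_x2 * se2;
  keep gm_x1 gm_x2 se_gm_x1 se_gm_x2;
run;

proc print data=geomeans; run;
```

With this toy data the geometric means should come out at approximately 10 and 8. Adding more variables only means extending the VAR list and the OUTPUT names, which is why this route scales better than one NLMIXED run per variable.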
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From: David L. Cassell
Subject: Re: geomean questions.
Date: 22 Jan 05 00:36:39 GMT
Dale McLerran replied [in small part]:
> To do this, you must first compute a variable
> LOG_X = log(x) in a data step. From there, the simple way to
> get the geometric mean of X would be to run PROC NLMIXED with
> code
>
> proc nlmixed data=mydata;
>   model log_x ~ normal(mu, var);
>   estimate "Geometric mean of X" exp(mu);
> run;
Okay, can we have a show of hands? Does anyone else think this is the SIMPLE way to get the geometric mean? :-) :-)
> exponentiation operations. Also, the NLMIXED procedure will
> produce a standard error for the geometric mean. You can
> compute a standard error for the geometric mean yourself using
> statistics generated by the MEANS procedure and with some
> data step code employing moment approximation formulae. But
> since NLMIXED has those formulas implemented, why not make it
> easy on yourself?
Yes. The very best part of using PROC NLMIXED.
Now, to keep this post from being completely content-free, let me show one more way to get the geometric mean: the DoW loop way. (Has anyone on SAS-L copyrighted "The Tao of the DoW" yet? :-)
options nocenter nodate nonumber ps=60 ls=75 noovp;
data temp1;
  do y = 1 to 3;
    do x = 3*y to 20*y by 3; output; end;
  end;
  y = 3; x = .; output;
run;
data temp2(keep = y count logsum geo_mean);
  do until(last.y);
    set temp1;
    by y;
    count = sum(count, not missing(x) );
    logsum = sum(logsum, log(x) );
  end;
  geo_mean = exp( logsum / count );
run;
proc print data=temp2; run;
Note that I lobbed in a missing value for X as well. The code will handle missing values relatively gracefully. Here's the result:
Obs    y    count     logsum    geo_mean
  1    1        6    13.1709      8.9814
  2    2       12    35.7355     19.6477
  3    3       18    61.4175     30.3283
The DoW-loop does our bookkeeping for us, so there's no explicit pre-loop initialization, and minimal post-loop computations.
I still wonder, though... Why are people still so interested in geometric means? Do they really believe they have log-normal data? There are a host of really nice measures of central tendency which PROC STDIZE will crank out in a heartbeat. The geometric mean may be better than the arithmetic mean when the data have a long right-hand tail, but it is still sensitive to outliers. It is not robust or resistant.
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

From: Dale McLerran
Subject: Re: geomean questions.
Date: 22 Jan 05 01:25:52 GMT
--- "David L. Cassell" wrote:
> Dale McLerran replied [in small part]:
> > To do this, you must first compute a variable
> > LOG_X = log(x) in a data step. From there, the simple way to
> > get the geometric mean of X would be to run PROC NLMIXED with
> > code
> >
> > proc nlmixed data=mydata;
> >   model log_x ~ normal(mu, var);
> >   estimate "Geometric mean of X" exp(mu);
> > run;
>
> Okay, can we have a show of hands? Does anyone else think this
> is the SIMPLE way to get the geometric mean? :-) :-)
OK, I'll admit that the suggestion to use NLMIXED is non-intuitive. I had not thought of it until I had started my post. Still, I will have to argue that it is THE SIMPLE WAY to get the statistic, at least for a single response.
We can construct a data step in which we cumulate sums and frequencies, as David shows below. So, if we assess simplicity in terms of how many data steps/procedures one must code in order to get the statistic, there are "simpler" approaches.
> options nocenter nodate nonumber ps=60 ls=75 noovp;
>
> data temp1;
>   do y = 1 to 3;
>     do x = 3*y to 20*y by 3; output; end;
>   end;
>   y = 3; x = .; output;
> run;
>
> data temp2(keep = y count logsum geo_mean);
>   do until(last.y);
>     set temp1;
>     by y;
>     count = sum(count, not missing(x) );
>     logsum = sum(logsum, log(x) );
>   end;
>   geo_mean = exp( logsum / count );
> run;
>
> proc print data=temp2; run;
OK, let's see those hands raised for the DoW approach versus
data temp2;
  set temp1;
  log_x = log(x);
run;
proc nlmixed data=temp2;
  by y;
  model log_x ~ normal(mu, var);
  estimate "Geometric mean" exp(mu);
run;
Which one is simpler? Vote now. And in the spirit of recent vote tabulations, vote often!
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From: David L. Cassell
Subject: Re: geomean questions.
Date: 21 Jan 05 22:11:58 GMT
Carry Croghan/RTP/USEPA/US@EPA wrote:
> I have a couple of questions concerning geomean.
>
> 1. I had thought that geomean was available with proc means and/or proc
> summary. Both geomean and gmean as a keyword have produced an error.
> Is geomean available as an additional stat? If so, what is the keyword?
>
> 2. The note in SAS help says that there is a fuzz factor used for when
> the variable is 0 or close to 0. Does any one know what that fuzz
> factor is or what close to 0 is defined as?
>
> I guess technically this is four questions in a couple of areas.
Hey, you work for ORD too! Small world.
Okay:
[1] No and no. Geometric means aren't part of the keyword set. It isn't available as an additional stat.
[2] 'Close to zero' is what you get to define if you change the fuzz factor.
If you want the geometric means of variables X1, X2, ... you can get them through a sneaky trick I learned from an old post of Nat Wooding:
proc sql noprint;
  create table YourGeoMeans as
  select exp(mean(log(x1))) as g_x1,
         exp(mean(log(x2))) as g_x2,
         .....
  from YourData;
quit;
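The same trick extends to by-group geometric means with a GROUP BY clause. The sketch below reuses the TEMP1 test data from the DoW-loop post in this thread; the table name GEO_BY_Y and the column N_OBS are invented for illustration. It relies on LOG returning a missing value for a missing argument and on the SQL summary functions skipping missing values, so expect a note in the log for the missing X.

```sas
/* Test data copied from the DoW-loop post, including one missing X */
data temp1;
  do y = 1 to 3;
    do x = 3*y to 20*y by 3; output; end;
  end;
  y = 3; x = .; output;
run;

proc sql;
  create table geo_by_y as
  select y,
         count(x)          as n_obs,     /* nonmissing X values per group */
         exp(mean(log(x))) as geo_mean   /* antilog of the mean of the logs */
  from temp1
  group by y;
quit;

proc print data=geo_by_y; run;
```

If everything behaves as assumed, the GEO_MEAN column should agree with the values printed in the DoW-loop post (8.9814, 19.6477, 30.3283).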
There's an old post of mine in the SAS-L archives which has even more on building geometric (and other) means:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0109C&L=sas-l&P=R18556
If you don't use the SAS-L archives, you should. They're a great resource.
So... Why do you need to compute geometric means anyway?
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

From: omugeye
Subject: Re: geomean questions.
Date: 21 Jan 2005 13:51:19 -0800
I think the geometric mean is not available in any of the PROCs. You might have to compute it yourself using a data step or something. Note that "the geometric mean is the antilog of the arithmetic mean of the logs". See http://ftp.sas.com/techsup/download/sample/samp_lib/basesampComputes_Geometric_Means.html for details.
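As a quick sanity check of that identity, the sketch below compares the version 9 GEOMEAN data step function (mentioned elsewhere in this thread) with the antilog of the mean of the logs. The variable names and the three values are arbitrary choices for illustration.

```sas
data check;
  /* three arbitrary positive values held in V1-V3 */
  array v{3} v1-v3 (2 8 32);
  gm_func = geomean(of v1-v3);                      /* built-in, across columns */
  gm_logs = exp(mean(log(v1), log(v2), log(v3)));   /* antilog of mean of logs */
  put gm_func= gm_logs=;
run;
```

Both values written to the log should be approximately 8, the cube root of 2 × 8 × 32 = 512.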