knowledge-database (beta)

Current group: comp.soft-sys.sas

Re: possible in one step?

Re: possible in one step?  
Michael Murff
From:Michael Murff
Subject:Re: possible in one step?
Date:22 Jan 05 01:36:02 GMT
Okay, my first post on this problem was poorly motivated and not
sufficiently explained. Richard's appraoch looks promising, but it seems

to read the large file too many times given that I have to get many
thousands of instances of the SMALL. The point of SMALL is to run 12
dozen
regressions on it, so it has to be a unique subset of LARGE. The overall
program
will rull close to two million regs. Note that the sample code below
reads LARGE
and each instance of SMALL into memory to try and reduce I/O operations.

So let me try again. The big picture is that I have a
table LARGE (15K rows, 240 col); the first two rows of large will become
static variables that will be used in every instance of SMALL.

LARGE table contains a 20 year time series of monthly stock returns. An
observation is defined as a particular stock and its returns over time.
I
need to create a small dataset to be used in an analytic proc later on.
SMALL needs to contain all the data for one security (one row from
LARGE).
The SMALL dataset is then passed a proc, say PROC REG. When the analysis
for
that one "row" is complete then I need to go to the next one until all
15K
are complete.

Before I can pass the "row" from LARGE into the PROC I need to create
some
new variables based on some information from the first two rows of
LARGE. So
the first two rows of LARGE are transposed and concatenated onto every
instance of SMALL, while the ret_i is the transposed row of LARGE that
is of
interest.

%macro mockdata;

%let NMOS = 240;
%let NOBS = 2000;
%let SEED = 20050121;

/* mock up LARGE dataset */
data LARGE;
do head = 1 to 2;
id = head;
%local i;
%do i = 1 %to &NMOS;
ret&i = round (10 * ranuni(&seed), 0.0001);
%end;
output;
end;

do row = 1 to &NOBS;
id = row;
%local j;
%do j = 1 %to &NMOS;
ret&j = round (10 * ranuni(&seed), 0.0001);
%end;
output;
end;
format ret: 7.4;
drop row head;
run;

%mend;

%mockdata

data fixed;
set large(obs=2);
run;

proc transpose data=fixed out=mf prefix=ret; by id;
run;

data mf_rets;
set mf(keep=ret1 rename=(ret1=ret_m) firstobs=1 obs=240);
set mf(keep=ret1 rename=(ret1=ret_f) firstobs=241);
x1 = ret_m - ret_f;
x2 = x1*x1;
run;

sasfile work.large.data open;

%macro smalltoproc;
%do i = 1 %to 1; * (&NOBS - 2);

data small(keep=ret);
set large ( firstobs=&i obs=&i);
array mvwindow{*} ret1-ret240;
do j=1 to dim(mvwindow);
ret = mvwindow(j);
output;
end;
run;

data small2;
set small;
set mf_rets(drop=ret_m);
ret_adj = ret - ret_f;
run;

%end;

/* run rolling regressions on small2 dataset */
/* more code */
/*
sasfile small2.data open;

%do roll=1 %to 71;
proc reg data=small2(firstobs=&roll obs=%eval(&roll + 60)
outest=results;
%end;

%do roll=1 %to 71;
proc reg data=small2(firstobs=%eval(&roll + 61)
obs=%eval(&roll + 120) outest=results;
%end;

sasfile small2.data close;
*/

%mend;

%smalltoproc

sasfile work.large.data close;

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Richard
A. DeVenezia
Sent: Friday, January 21, 2005 11:52 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: possible in one step?

Michael Murff wrote:
> Hi SAS-L,
>
>
>
> I need to create a dataset, call it SMALL, where I am plucking one
> row off a very LARGE table, and transposing this row on the fly with
> a looping array and explicit output statment. I need to merge several
> columns from another dataset FIXED. Then I need to create some new
> variables that are the differences. So my final table contains one
> transposed row of LARGE and several variables from FIXED, and some
> difference variables using variables from both original datasets.
>
> This has to be done literally millions of times so efficiency is
> important. Could any one take a stab at reducing this to one step.

Why? Are you doing it for every row of LARGE ?

>
> Michael Murff's
-----------------------
%macro noname;

%do outerloop=1 %to &LARGENO;

data small(keep=ret_i cusip month);
set returns (keep=cusip ret&beg-ret&end firstobs=1 obs=1);
array mvwindow{*} ret&beg-ret&end;

do j=1 to dim(mvwindow);
month = compress('month'||put(j,3.));
ret_i = mvwindow(j);
output;
end;
run;

data small2;
set small;
set mf_rets(drop=ret_m);
ret_adj = ret_i - ret_f;
run;

%end;

%mend;
-----------------------

Suppose LARGENO = 1e8. Why loop 1e8 times and do nothing with small
beyond
making small2 (which gets overwritten each iteration)?
Do I presume RETURNS is the LARGE table ? Why pluck the first row
LARGENO
times ?

How large is mf_rets ? one row ? dim(mvwindow) rows ?

'month'||put(j,3.)... extraneous, not efficient, j is j, whether it is a
number or a text string containing the number as ascii codes.
values 'month1' - 'month100' are non-useful (inefficient)
transformations of
1-100

If you are transposing every row of LARGE, I would not use macro (or
maybe
barely).
Stuff mf_rets in an array.

maybe something like this.

DATA LARGET;

do _n_ = 1 by 1 until (arrayloaded);
set mf_rets end = arrayloaded;
array mf_rets (size large enough to contain all row of mf_rets)
_temporary_;
mf_rets[_n_] = ret_f;
end;

do _n_ = 1 by 1 until (largedumped);
set LARGE end=largedumped;
array mywindow {*} ret&beg - ret&end;
do enum = 1 to dim(mywindow);
return_adj = mywindow [enum] - mf_rets[enum];
end;
end;

keep largeid enum return_adj;

stop;
run;

--
Richard A. DeVenezia
http://www.devenezia.com/

>>> "Richard A. DeVenezia" 01/21/05 11:51 AM
>>>
Michael Murff wrote:
> Hi SAS-L,
>
>
>
> I need to create a dataset, call it SMALL, where I am plucking one
> row off a very LARGE table, and transposing this row on the fly with
> a looping array and explicit output statment. I need to merge several
> columns from another dataset FIXED. Then I need to create some new
> variables that are the differences. So my final table contains one
> transposed row of LARGE and several variables from FIXED, and some
> difference variables using variables from both original datasets.
>
> This has to be done literally millions of times so efficiency is
> important. Could any one take a stab at reducing this to one step.

Why? Are you doing it for every row of LARGE ?

>
> Michael Murff's
-----------------------
%macro noname;

%do outerloop=1 %to &LARGENO;

data small(keep=ret_i cusip month);
set returns (keep=cusip ret&beg-ret&end firstobs=1 obs=1);
array mvwindow{*} ret&beg-ret&end;

do j=1 to dim(mvwindow);
month = compress('month'||put(j,3.));
ret_i = mvwindow(j);
output;
end;
run;

data small2;
set small;
set mf_rets(drop=ret_m);
ret_adj = ret_i - ret_f;
run;

%end;

%mend;
-----------------------

Suppose LARGENO = 1e8. Why loop 1e8 times and do nothing with small
beyond
making small2 (which gets overwritten each iteration)?
Do I presume RETURNS is the LARGE table ? Why pluck the first row
LARGENO
times ?

How large is mf_rets ? one row ? dim(mvwindow) rows ?

'month'||put(j,3.)... extraneous, not efficient, j is j, whether it is a
number or a text string containing the number as ascii codes.
values 'month1' - 'month100' are non-useful (inefficient)
transformations of
1-100

If you are transposing every row of LARGE, I would not use macro (or
maybe
barely).
Stuff mf_rets in an array.

maybe something like this.

DATA LARGET;

do _n_ = 1 by 1 until (arrayloaded);
set mf_rets end = arrayloaded;
array mf_rets (size large enough to contain all row of mf_rets)
_temporary_;
mf_rets[_n_] = ret_f;
end;

do _n_ = 1 by 1 until (largedumped);
set LARGE end=largedumped;
array mywindow {*} ret&beg - ret&end;
do enum = 1 to dim(mywindow);
return_adj = mywindow [enum] - mf_rets[enum];
end;
end;

keep largeid enum return_adj;

stop;
run;


--
Richard A. DeVenezia
http://www.devenezia.com/
   

Copyright © 2006 knowledge-database   -   All rights reserved