Repeating a line of a data set for each line in another
Suppose you want to access the same information in every line of a data set, and that this information is data-dependent. For example, suppose you want to add the 25th, 50th, and 75th percentile of some variable to a data set, so that you can make a quartile indicator. There's likely some easy way to do this in sql, but I don't know sql. Here are three ways to do this using data steps.
Suppose our data set is t, and we need to add the quartiles of the variable x. First, find the 25th, 50th, and 75th percentiles and add them to a data set. (If you need less common perctiles, you can use proc univariate.
proc means q1 median q3 data = t;
var x;
output out = t2 q1 = q1 median=median q3 = q3;
run;
Now, we want to access those three percentile values in each line of t. The first two solutions involve adding the values to each obervation:
A. Use the point option to the set statement
data t3;
set t;
setme = 1;
set t2 point=setme;
quart = (x ne .) + (x gt q1) + (x gt median) + (x gt q3);
run;
This sets the t2 data set, with the percentile values, for each line of the t data set, with the raw data. The point option tells set to use the line of the t2 data set that's pointed to. It would be easier to say set t2 point = 1, but the point option requires a variable, so we create one with a constant value of 1.
B. Merge on a constant. (Not sure why you'd ever do this, if you know the other way!)
data t;
set t;
mb=1;
run;
data t2;
set t2;
mb=1;
run;
data t3;
merge t t2;
by mb;
run;
In this approach, we're tricking the merge statement by intentionally making the value of the merge by variable be constant in the original data set. This will generate a warning in the SAS log, and this approach requires two unnecessary data steps.
C. Use Macros and the call symput function. (Better than B, but still requires an extra data step.)
data _null_;
set t2;
call symput("quart1",q1);
call symput("quart2",median);
call symput("quart3",q3);
run;
data t3;
set t;
quart = (x ne .) + (x gt &quart1) + (x gt &median) + (x gt &quart3);
run;










