/* Monte Carlo Power estimation by Ken Kleinman */ /* Please contact ken.kleinman@gmail.com with */ /* observations, corrections, or complaints */ /* See http://support.sas.com/resources/papers/proceedings11/430-2011.pdf */ /* for a description of the macro */ /* Generates estimates of power for a range of sample sizes and alternatives, for one predictor, for linear, logistic, or Poisson regression. Predictors from a variety of possible distributions, including Bernoulli, Poisson, Normal, exponentional, and others. Results in a data set with estimated power and CI, plus optionally a graphic with an estimated power curve. */ /* Notes: This version allows linear, logistic, and poisson regression (with canonical links), as well as predictors with general distribution using the rand function, as long as they use 0 , 1 , or 2 parameters. This is an improvement over analytic solutions in the allowance of non-normal, non-dichotomous covariates. Logistic and linear regression with Normal and dichotomous predictors achieve results equivalent to analytic results. A probit regression is used to share information across sample sizes. */ /* The program is made up of two macros: GLMPOWER should not be used directly. It simulates data sets with a given alternative slope and given sample size, then fits the model to them and counts the number of times the CI for the parameter of interest excludes the user-input null value. MANYPOWER is what should be run. It calls GLMPOWER repeatedly, with different alternatives and sample sizes, then (optionally) makes a plot of the results. Uses the "a to b by c" syntax, except that the slope under the alternative can take values (and be incremented by amounts) less than 1 only by using a "divisor" term, due to limitations of SAS macro language. The output is raw data in data set probitpower; with key variables a) altslope (Alternative slope, RR or OR) b) nsubs (Number of subjects simulated) c) power (proportion rejections of the null for each altslope/nsubs combination d) ppower (with estimated values of power sharing information across number of subjects) e) loclpower and upclpower (confidence limits on ppower-- these will be narrower than the binomial CI for ppower if more than one sample size is assessed) options for MANYPOWER: Setting: outcomedist (default "NORMAL") defines the type of regression: "BERNOULLI" for logistic, "POISSON" for Poisson, "NORMAL" for linear scaley (default 1) = residual variance of the outcome, if outcome is normal; ignored otherwise xdist (deault "NORMAL") distribution of x, uses the rand function xloc = first parameter for the x distribution. MUST make this be . if there are no params. xscale = second paramater for the x dist. MUST make this be . if there is no second param Any distribution from the rand function can be used for the covariate, provided it uses 0, 1, or 2 parameters. The first parameter is xloc, the second is xscale. If only one parameter is needed, xscale must have a value of "." (without the quotes); if no parameter is needed, the both xloc and xscale must be "." . descending (default desc) if this has the value desc, then the proc genmod includes "desc" as an option-- no effect unless logistic regression to fit a logistic regression predicting probability = 0 (never needed????) use any other value Hypotheses: null (default 0) comparator for the estimated parameters in from the models intercept (default 0) intercept for the model generating the outcome data: THIS IS KEY FOR POISSON AND LOGISTIC (irrelevant for linear) alt1, alt2, alt3 = Truths under the alternative-- slope, OR, or OR, depending on the outcome. n1,n2, ... n10 = Number of subjects to simulate in each data set nreps = number of data sets to simulate per parameter combination Output: outputpower (default: YES) prints a table with the actual and smoothed estimated power for each alternative and n. Any other value will supress the table. Actual estimated power is called "power." Smoothed power is called "ppower." outputnsubs (default: YES) prints a table with the linearly interpolated n required for power = .1, ... , .9, within the range of the estimated power, for each alternative. CI also provided. estimated power and CI from the smoothed estimates. Any other value will supress the table. plotstyle (default: graphic) if this has a value of sg, the macro produces a sgpanel plot with a panel for each alternative slope if graphic, produces a proc gplot plot with a curve for each alternative if any other value, no plot is made (Note: if SAS 9.1.3 or less is used, proc sgpanel is not available, if SAS/GRAPH is not licensed/installed then proc gplot is not available) plotrows (default 2) number of rows if plotstyle = sg, otherwise ignored NOTES: Results are stored in temporary data set "probitpower" for direct use. Logistic: Input for the alternative the ODDS RATIO (per unit change in the predictor). (Log of this is beta in the model.) Input for the null is in terms of BETA. Intercept is the prob(Y=1 | x=0). Poisson: Input for the alternative the RELATIVE RATE. (Log of this is beta in the model.) Input for the null is in terms of BETA. Intercept is the mean of the Poisson when x=0. */ %macro glmpower(null=,altslope=,scaley=,xdist =, xloc=, xscale = , nsubjects=, nreps=, descending=, intercept=, outcomedist=,test=FALSE); /* First, simulate the data, &nreps times */ data simdata; slope = &altslope; scaley = &scaley; xdistlocation = &xloc; xdistscale = &xscale; inter = &intercept; nsubjects = &nsubjects; nreps = &nreps; do rep = 1 to nreps; do i = 1 to nsubjects; if %upcase(&xdist) = "EXPONENTIAL" then do; if xdistlocation ne . then x = xdistlocation*rand(&xdist); else x = rand(&xdist); end; else do; if xdistscale ne . then x = rand(&xdist,xdistlocation,xdistscale); else if xdistlocation ne . then x = rand(&xdist,xdistlocation); else x = rand(&xdist); end; if %upcase(&outcomedist) = "NORMAL" then do; linpred = inter + x * slope; y = linpred + (normal(0) * sqrt(scaley)); end; else if %upcase(&outcomedist) = "BERNOULLI" then do; expitinter = exp(inter)/(1 + exp(inter)); /* Is this used for anything? Maybe delete? */ logitinter = log(inter/(1-inter)); linpred = logitinter + x * log(slope); y = uniform(0) lt (exp(linpred)/(1 + exp(linpred))); end; else if %upcase(&outcomedist) = "POISSON" then do; loginter = log(inter); linpred = loginter + x * log(slope); y = ranpoi(0, exp(linpred)); end; output; end; end; run; /* Now, fit the model for each rep */ ods exclude all; ods output parameterestimates = paramskk ConvergenceStatus=converge; proc glimmix data = simdata ; by rep; model y %if %upcase(&descending) = DESC %then (desc) = x / solution cl dist= %if %upcase(&outcomedist) = "NORMAL" %then normal; %else %if %upcase(&outcomedist) = "BERNOULLI" %then bin; %else %if %upcase(&outcomedist) = "POISSON" %then poisson; ; run; run; quit; ods exclude none; data _null_; set converge; if status ne 0 then do; call symput('ftnote', 'Warning: At least one model did not converge'); end; run; %if "&ftnote" = 'Warning: At least one model did not converge' %then footnote "&ftnote"; %if &test=TRUE %then %do; proc print data = converge; where status ne 0; run; %put &ftnote; %end; /* count the number of times the lower CI limit is greater than the null, or upper CI is less than the null. This means we rejected the null. Why not just count the P for the slope being less than 0.05? Because the null might not be 0!*/ data sumreject; set paramskk end = eof; retain value 0; where effect = 'x'; if lower gt &null or upper lt &null then value = value + 1; if eof then output; run; /* add results from above proc univariate to the info on the simulation */ data result; set sumreject; reps = &nreps; nsubs = &nsubjects; altslope=&altslope; power = value/reps; run; %mend glmpower; /* testing glmpower directly %glmpower(null=0,altslope=.1,scaley=1,xdist ="NORMAL", xloc=0, xscale = 3, nsubjects=50, nreps=10, descending="desc", intercept=0, outcomedist="Normal"); proc print data = simdata (obs = 10); run; proc univariate data = simdata plot; var x; run; proc print data = paramskk; run; proc univariate data=paramskk; class parameter; var estimate stderr; run; proc print data=prep; where parameter = 'x'; run; proc print data = lockk; run; proc print data = result; run; ** below for testing only!; ods output parameterestimates = paramskk; proc glimmix data = simdata ; where rep = 1; model y = x / solution cl dist= poisson; ; run; run; quit; proc print data=paramskk; run; ods output parameterestimates = paramskk; proc genmod data = simdata ; where rep = 1; model y = x / dist= poisson; ; run; run; quit; proc print data=paramskk; run; */ %macro manypower(alt1=.,alt2=.,alt3=., n1=.,n2=.,n3=.,n4=.,n5=.,n6=.,n7=.,n8=.,n9=.,n10=., null=0,scaley=1, xdist="NORMAL",xloc=.,xscale =., nreps=,plotrows=2, descending = desc, intercept = 0, outcomedist = "NORMAL", plotstyle=graphic, outputpower=YES, outputnsubs=YES,test=FALSE); %let ftnote=; footnote ; /* because the log fills up fast with all of those notes */ %if &nreps > 501 %then options nonotes; ; /* heart of the macro-- calls glmpower multiple times */ data manyresults; set _null_; run; %do ii = 1 %to 10; %if &&n&ii ne . %then %do; %do i = 1 %to 3; /* change end of loop if more than 3 alts are allowed */ %if &&alt&i ne . %then %do; %glmpower(null = &null, scaley=&scaley,xdist = &xdist, xloc=&xloc, xscale=&xscale, nsubjects = &&n&ii, nreps = &nreps, altslope = &&alt&i, descending = &descending, intercept = &intercept, outcomedist = &outcomedist, test=&test); data manyresults; set manyresults result; keep value reps nsubs altslope power; run; %end; %end; %end; %end; /* below code makes nice output with estimated power fit by probit regression */ proc sort data = manyresults; by altslope; run; ods select none; proc logistic data = manyresults desc plots= none; model value/reps = nsubs/ link=probit; output out=probitpower pred=ppower upper=upclpower lower= lowclpower; by altslope; freq reps; run; ods select all; data a; set _null_; run; proc sort data = probitpower; by altslope ppower; run; %do i = 1 %to 3; /* change end of loop if more than 3 alts are allowed */ %if &&alt&i ne . %then %do; data a&i; retain x1 x2 y1 y2 lowx1 lowx2 lowy1 lowy2 upx1 upx2 upy1 upy2 .; set probitpower; where altslope = &&alt&i; do desiredpower = .1 to .9 by .1; if ppower lt desiredpower then do; x1=nsubs; y1=ppower; end; if lowclpower lt desiredpower then do; lowx1=nsubs; lowy1=lowclpower; end; if upclpower lt desiredpower then do; upx1=nsubs; upy1=upclpower; end; if ppower ge desiredpower and y2 lt desiredpower then do; x2=nsubs; y2=ppower; end; if lowclpower ge desiredpower and lowy2 lt desiredpower then do; lowx2=nsubs; lowy2=lowclpower; end; if upclpower ge desiredpower and upy2 lt desiredpower then do; upx2=nsubs; upy2=upclpower; end; if x1 ne . and (((x1 lt x2) and (y1 le desiredpower le y2)) or ((lowx1 lt lowx2) and (lowy1 le desiredpower le lowy2)) or ((upx1 lt upx2) and (upy1 le desiredpower le upy2))) then output; end; run; proc sort data=a&i nodupkey; by desiredpower x1 x2 y1 y2; run; data a; set a a&i; run; %end; %end; data est(keep=altslope desiredpower nsubsReq where=(nsubsReq ne .)) low(keep=altslope desiredpower lowclnsubsReq where=(lowclnsubsReq ne .)) up(keep=altslope desiredpower upclnsubsReq where=(upclnsubsReq ne .)); set a; if x1 lt x2 then do; m = (y2-y1)/(x2-x1); * Slope of line between 2 points closest to desiredpower; b = y1-m*x1; * Intercept of line between 2 points closest to desiredpower; nsubsReq = ceil((desiredpower-b)/m); * Number of subjects required for desiredpower; end; if lowx1 lt lowx2 then do; lowm = (lowy2-lowy1)/(lowx2-lowx1); * Slope of line between 2 points closest to desiredpower; lowb = lowy1-lowm*lowx1; * Intercept of line between 2 points closest to desiredpower; upclnsubsReq = ceil((desiredpower-lowb)/lowm); end; if upx1 lt upx2 then do; upm = (upy2-upy1)/(upx2-upx1); * Slope of line between 2 points closest to desiredpower; upb = upy1-upm*upx1; * Intercept of line between 2 points closest to desiredpower; lowclnsubsReq = ceil((desiredpower-upb)/upm); end; label upclnsubsReq='Upper limit on # of subjects required' lowclnsubsReq='Lower limit on # of subjects required'; run; proc sort data=est noduprec; by altslope desiredpower; run; proc sort data=low noduprec; by altslope desiredpower; run; proc sort data=up noduprec; by altslope desiredpower; run; data nsubs; merge est low up; by altslope desiredpower; if nsubsReq ne .; run; proc sort data = probitpower; by altslope nsubs; run; %if %upcase(&outputpower) = YES %then %do; proc print data=probitpower; run; %end; %if %upcase(&outputnsubs) = YES %then %do; proc print data=nsubs; run; %end; options notes; /* remainder draws the output, if requested */ title1 "nreps = &nreps, xloc = &xloc, xscale = &xscale "; title2 "x dist = " &xdist ", y dist = " &outcomedist; %if %upcase(&outcomedist) = "BERNOULLI" %then %let altlabel = "Alternative odds ratio"; %if %upcase(&outcomedist) = "POISSON" %then %let altlabel = "Alternative rate ratio"; %if %upcase(&outcomedist) = "NORMAL" %then %let altlabel = "Alternative slope"; %if &plotstyle = sg %then %do; ods html; proc sgpanel data = probitpower; panelby altslope / rows = &plotrows; series x = nsubs y = ppower / markerattrs = (color= white) ; rowaxis min=0 max=1 refticks values = (0 .8 .9 1); label altslope = &altlabel; run; ods html close; %end; %else %if &plotstyle = graphic %then %do; axis1 minor = none label=("Number of subjects"); axis2 minor = none order = (0,1) reflabel=("Power = .8" "Power = .9") label=("Power"); %do j = 1 %to 3; /* change end of loop if more than 3 alts are allowed */ %if &&alt&j ne . %then %do; symbol&j i = j v= none c= black l = &j w=2; %end; %end; proc gplot data = probitpower; plot ppower*nsubs=altslope/vref = .8,.9 haxis = axis1 vaxis = axis2; label altslope = &altlabel; run; quit; %end; ; %mend manypower; *options mprint mlogic symbolgen; * options nonotes; /* Test code: settings call for dichotomous predictor with p = .5, 100 data sets per alternative value vs null of log rate ratio = 0, 1000, 1100, 1200, 1400, 1700, and 2000 subjects per simulation, alternative rate ratios of 1.15, 1.2, 1.25, proc gplot output for a Poisson regression with a mean of 1 when x = 0 Results should be a sas/graph plot with three lines; top line near 1 across all sample sizes (RR=1.25), middle line with power from .9 to 1 (RR=1.2) lower line with power from about .7 to .9 (RR=1.15) %manypower(null = 0, scaley=1,xdist = "bernoulli", xloc=.5, xscale=., nreps = 100, n1=1000,n2=1100,n3=1200,n4=1400,n5=1700,n6=2000, alt1 = 1.15, alt2 = 1.20, alt3 = 1.25, plotrows=2,plotstyle=graphic, outcomedist="poisson", intercept = 1); */ /* %manypower(outcomedist="bernoulli", intercept=.5, xdist = "exponential", nreps = 100, n1=20, n5 = 100, alt1=2, alt2=1.5, plotstyle=none, outputpower=yes, outputnsubs=yes); %manypower(outcomedist="poisson", intercept = 1, xdist = "bernoulli", xloc=.5, nreps = 100, n1 = 1000, n2 = 1200, n3 = 1400, n4 = 1600, n5 = 1800, n6 = 2000, alt1 = 1.15, alt2 = 1.2, alt3 = 1.25, plotrows=2,plotstyle=graphic); %manypower(outcomedist="poisson", intercept = 1, xdist = "bernoulli", xloc=.5, nreps = 50, n1 = 500, n2 = 1000, n3 = 1500, n5 = 2500, alt1 = 1.15, alt2 = 1.18, plotrows=2,plotstyle=graphic,test=TRUE); options notes; options nomprint nomlogic nosymbolgen; proc print data = probitpower; run; /* ods graphics on; *ods html; proc sort data = manyresults; by altslope; run; ods select none; proc transreg data =manyresults solve ss2; by altslope; model identity(power) = mspline(nsubs / degree = 3); output out = kkout predicted; run; ods select all; proc print data = monsmooth; run; symbol1 i = j v = none c = black l = 1; symbol2 i = j v = none c = black l = 2; proc gplot data = monsmooth; plot ppower * nsubs = altslope; run; quit; symbol1 i = none v = dot c = black; symbol2 i = j v = none c = black l = 2; proc gplot data = kkout; by altslope; plot (power ppower) * nsubs /overlay; run; quit; */