kelceyx9vm

Stata 14 Serial Number 166



You should also assess convergence of your imputation model. This should be done for different imputed variables, but especially for those variables with a high proportion of missing values (e.g., a high fraction of missing information, FMI). Convergence of the imputation model means that the data augmentation (DA) algorithm has reached an appropriate stationary posterior distribution. Convergence for each imputed variable can be assessed using trace plots, which plot estimated parameters against iteration number. These plots can be requested using the saveptrace and mcmconly options.








The mcmconly option simply runs the MCMC algorithm for the same number of iterations it takes to obtain 10 imputations, without actually producing 10 imputed datasets. It is typically used in combination with saveptrace or savewlf to examine the convergence of the MCMC before imputation. Because no imputation is performed when mcmconly is specified, the add or replace options are not required with mi impute mvn.


Note that the saved trace file is not a true Stata dataset, but it can be loaded as if it were one using the mi ptrace use command, and its contents can be described without opening the file using the mi ptrace describe command. The trace file contains the imputation number, iteration number, regression coefficients, variances, and covariances.


Above is an example of two trace plots. There are two main things to look for in a trace plot. First, assess whether the algorithm appears to have reached a stable posterior distribution: the plotted values should remain relatively constant, with no apparent trend (indicating sufficient randomness in the coefficients, covariances, and/or variances between iterations). In our case, this appears to be true. Second, examine how long it takes to reach this stationary phase. In the above example, it appears to happen almost immediately, as no observable pattern emerges, indicating good convergence. By default, the burn-in period (the number of iterations before the first set of imputed values is drawn) is 100. If proper convergence does not appear to be achieved, the burn-in can be increased using the burnin option.
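Trace plots are inspected visually in Stata. As a rough numeric complement (a swapped-in heuristic, not something mi impute mvn computes), the sketch below fits a least-squares slope to a parameter trace after discarding a burn-in period; a slope near zero relative to the spread of the draws is consistent with the "no trend" criterion above. The function name, the simulated AR(1) chain, and its starting value are hypothetical.

```python
import random


def drift_after_burnin(trace, burnin=100):
    """Least-squares slope of a parameter trace after dropping the burn-in.

    A slope close to zero (relative to the spread of the trace) is consistent
    with a stationary chain; a clear slope suggests a remaining trend.
    """
    kept = trace[burnin:]
    n = len(kept)
    if n < 2:
        raise ValueError("trace must be longer than the burn-in period")
    x_mean = (n - 1) / 2                      # mean of iteration indices 0..n-1
    y_mean = sum(kept) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(kept))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


# Hypothetical chain: an AR(1) process started far from its stationary mean of 0.
rng = random.Random(1)
chain, value = [], 20.0
for _ in range(1000):
    value = 0.5 * value + rng.gauss(0, 1)
    chain.append(value)

print(drift_after_burnin(chain, burnin=0))    # includes the start-up transient
print(drift_after_burnin(chain, burnin=100))  # Stata's default burn-in length
```

A visual trace plot remains the primary check; a single slope cannot detect, for example, a chain that wanders slowly up and down.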


In the graph below, the x-axis shows the lag, that is, the distance between a given iteration and the iteration it is being correlated with; the y-axis shows the value of the correlation. The plot shows that the correlation is high when the MCMC algorithm starts but quickly drops to near zero after a few iterations, indicating almost no correlation between iterations and therefore no correlation between values in adjacent imputed datasets. By default, Stata draws an imputed dataset every 100 iterations; if the correlation remains high beyond that lag, you will need to increase the number of iterations between imputed datasets using the burnbetween option. See the Stata 15 mi impute mvn documentation for more information about this and other options.
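The autocorrelation plot can be approximated in a few lines of Python. This minimal sketch computes the sample autocorrelation of a parameter trace at each lag and reports the first lag at which it falls below a threshold; if that lag exceeds the number of iterations between imputations (100 by default, as noted above), burnbetween would need to be increased. The 0.05 threshold, the helper names, and the simulated chain are assumptions for illustration.

```python
import random


def autocorrelation(trace, lag):
    """Sample autocorrelation of a parameter trace at a given lag."""
    n = len(trace)
    mean = sum(trace) / n
    denom = sum((x - mean) ** 2 for x in trace)
    if denom == 0:
        raise ValueError("trace is constant; autocorrelation is undefined")
    num = sum((trace[t] - mean) * (trace[t + lag] - mean) for t in range(n - lag))
    return num / denom


def first_lag_below(trace, threshold=0.05, max_lag=200):
    """Smallest lag whose absolute autocorrelation falls below `threshold`."""
    for lag in range(1, min(max_lag, len(trace) - 1) + 1):
        if abs(autocorrelation(trace, lag)) < threshold:
            return lag
    return None  # correlation never dropped below the threshold


# Hypothetical AR(1) chain with coefficient 0.5; its true lag-k
# autocorrelation is 0.5**k, so it should die out within a few lags.
rng = random.Random(7)
chain, value = [], 0.0
for _ in range(2000):
    value = 0.5 * value + rng.gauss(0, 1)
    chain.append(value)

lag = first_lag_below(chain)
print(lag, "OK" if lag is not None and lag < 100 else "consider a larger burnbetween")
```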


2. Selecting the number of imputations (m)

Historically, the recommendation was three to five MI datasets. Relatively low values of m may still be appropriate when the fraction of missing information is low and the analysis techniques are relatively simple. More recently, however, larger values of m are often recommended. This change is partly driven by the dramatic increase in computing power available to the typical researcher, which makes it more practical to create and analyze multiply imputed datasets with a larger number of imputations. Recommendations for m vary: for example, five to twenty imputations for low fractions of missing information, and as many as 50 (or more) when the proportion of missing data is relatively high. Remember that estimates of coefficients stabilize at much lower values of m than estimates of variances and covariances of error terms (i.e., standard errors). Thus, to obtain appropriate estimates of these parameters, you may need to increase m. A larger number of imputations may also allow hypothesis tests with less restrictive assumptions (i.e., tests that do not assume equal fractions of missing information for all coefficients). Multiple runs of m imputations are recommended to assess the stability of the parameter estimates.
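To see why the variance part of an MI analysis needs more imputations than the point estimate, here is a sketch of Rubin's combining rules in Python, together with Rubin's relative-efficiency formula 1/(1 + FMI/m), which is one common (but not the only) guide for choosing m. The FMI returned here is the large-sample approximation without a degrees-of-freedom correction, and the input numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates Q_1..Q_m; variances: squared standard errors U_1..U_m.
    Returns the pooled estimate, the total variance, and an approximate FMI.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w + (1 + 1 / m) * b                                  # total variance
    fmi = (1 + 1 / m) * b / t   # large-sample approximation, no df correction
    return q_bar, t, fmi


def relative_efficiency(fmi, m):
    """Efficiency of m imputations relative to an infinite number of imputations."""
    return 1 / (1 + fmi / m)


# Hypothetical results for one coefficient from m = 5 imputed datasets.
q, total_var, fmi = pool_rubin([1.02, 0.97, 1.10, 0.95, 1.05],
                               [0.04, 0.05, 0.04, 0.05, 0.04])
print(q, total_var ** 0.5, fmi)
for m in (5, 20, 50):
    print(m, round(relative_efficiency(0.5, m), 3))   # 0.909, 0.976, 0.99
```

With FMI = 0.5, moving from 5 to 50 imputations raises the relative efficiency of the point estimate only from about 0.91 to about 0.99; the larger benefit of more imputations is a more stable between-imputation variance b, which is estimated from only m values.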


The previous article showed how to perform heteroscedasticity tests on time series data in STATA. It also showed how to apply a correction for heteroscedasticity so as not to violate the Ordinary Least Squares (OLS) assumption of constant error variance. This article shows how to test for serial correlation of errors, also called time series autocorrelation, in STATA. An autocorrelation problem arises when the error terms in a regression model are correlated over time, that is, when they are not independent of each other.


However, STATA does not provide the corresponding p-value. Instead, compare the Durbin-Watson d statistic with the lower and upper critical values (dL and dU) from the Durbin-Watson table to conclude whether serial correlation exists. Download the Durbin-Watson d table here.
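The Durbin-Watson statistic is simple to compute from OLS residuals: d = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of e_t^2. Values near 2 indicate no first-order autocorrelation, since d is approximately 2(1 - r1), where r1 is the lag-1 residual autocorrelation. A minimal Python sketch of the statistic and of the usual bounds decision for positive autocorrelation follows (for negative autocorrelation, apply the same rule to 4 - d). The critical values must come from the table for your sample size and number of regressors; the ones in the usage line are placeholders, not table entries.

```python
def durbin_watson(residuals):
    """Durbin-Watson d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Bounds test for positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0: positive serial correlation"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Hypothetical residuals; d_lower and d_upper are placeholders, not table values.
residuals = [0.5, 0.7, 0.6, 0.2, -0.1, -0.4, -0.6, -0.3, 0.1, 0.4]
d = durbin_watson(residuals)
print(round(d, 3), dw_decision(d, d_lower=0.9, d_upper=1.3))
```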


The Breusch-Godfrey LM test has some advantages over the classical Durbin-Watson d test. The Durbin-Watson test relies on the assumption that the residuals are normally distributed, whereas the Breusch-Godfrey LM test is less sensitive to this assumption. Another advantage is that it allows researchers to test for serial correlation at more than one lag, that is, correlation between the residuals at time t and t-k (where k is the number of lags). The Durbin-Watson test, by contrast, only tests for correlation between t and t-1. Therefore, when k is 1, both tests address the same hypothesis (first-order serial correlation) and will generally lead to the same conclusion.


Since the p-value (Prob > chi2) in the above table is less than 0.05 (5%), the null hypothesis of no serial correlation can be rejected. In other words, the residuals in the model are serially correlated, so this violation of the no-serial-correlation assumption should be corrected.
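The auxiliary-regression form of the Breusch-Godfrey test can be sketched in plain Python (standard library only). This is a textbook version, not Stata's own implementation: it fits OLS, regresses the residuals on the original regressors plus k lagged residuals (filling the missing initial lags with zeros, one common convention), and compares LM = n * R^2 with a chi-squared distribution with k degrees of freedom. The helper names and the simulated data are hypothetical.

```python
import math
import random


def ols_fit(y, X):
    """OLS via the normal equations; returns (residuals, R-squared).

    X is a list of rows and must include a constant column.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * c for v, c in zip(a[r], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    y_mean = sum(y) / len(y)
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return resid, 1 - sum(e * e for e in resid) / tss


def chi2_sf(x, df):
    """P(chi2_df > x) for a positive integer df, using closed forms."""
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return tail


def breusch_godfrey(y, X, k=1):
    """Breusch-Godfrey LM test for serial correlation up to lag k."""
    n = len(y)
    e, _ = ols_fit(y, X)
    # Auxiliary regression: residuals on the original regressors plus k residual lags.
    aux = [list(X[t]) + [e[t - j] if t - j >= 0 else 0.0 for j in range(1, k + 1)]
           for t in range(n)]
    _, r2 = ols_fit(e, aux)
    lm = n * r2
    return lm, chi2_sf(lm, k)


# Hypothetical data: y = 1 + 2x + u, where u follows an AR(1) with coefficient 0.9.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
u, prev = [], 0.0
for _ in range(200):
    prev = 0.9 * prev + rng.gauss(0, 1)
    u.append(prev)
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]
X = [[1.0, xi] for xi in x]

lm, p_value = breusch_godfrey(y, X, k=1)
print(f"LM = {lm:.2f}, p = {p_value:.4g}")
print("reject H0 at 5%" if p_value < 0.05 else "do not reject H0")
```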


Based on data simulated and analyzed using linear mixed-effects models, we evaluated the distribution of attained powers under different scenarios with varying intraclass correlation coefficient (ICC) of the responses, coefficient of variation (CV) of the cluster sizes, number of cluster-size groups, distributions of group sizes, and number of clusters. We explored the relationship between attained power and two allocation characteristics: the individual-level correlation between treatment status and time period, and the absolute treatment group imbalance. When computational time was excessive due to a scenario having a large number of possible allocations, we developed regression models to predict attained power using the treatment-vs-time period correlation and absolute treatment group imbalance as predictors.
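The two allocation characteristics can be computed directly for a given allocation. This sketch assumes a cross-sectional design in which cluster i contributes cluster_sizes[i] individuals in every period, period 0 is an all-control baseline, and cluster i is treated from period switch_steps[i] onward. TTC is computed as the individual-level Pearson correlation between treatment status and period, as described above; TGI is taken here to be the absolute difference between the numbers of treated and control observations, which is an assumed definition, not necessarily the one used in the study.

```python
import math


def allocation_characteristics(cluster_sizes, switch_steps, n_steps):
    """Return (TTC, TGI) for one allocation under the assumptions above."""
    sw = st = sx = stt = sxx = stx = 0.0
    treated = control = 0
    for size, step in zip(cluster_sizes, switch_steps):
        for period in range(n_steps + 1):       # period 0 is the all-control baseline
            x = 1 if period >= step else 0      # treatment status in this cluster-period
            # Each cluster-period cell holds `size` individuals with identical (period, x).
            sw += size
            st += size * period
            sx += size * x
            stt += size * period * period
            sxx += size * x * x
            stx += size * period * x
            if x:
                treated += size
            else:
                control += size
    cov = stx / sw - (st / sw) * (sx / sw)
    var_t = stt / sw - (st / sw) ** 2
    var_x = sxx / sw - (sx / sw) ** 2
    ttc = cov / math.sqrt(var_t * var_x)
    tgi = abs(treated - control)                # assumed definition of TGI
    return ttc, tgi


# Hypothetical four-cluster trial with two steps (two clusters switch at each step).
print(allocation_characteristics([5, 10, 20, 40], [1, 2, 1, 2], n_steps=2))
print(allocation_characteristics([5, 10, 20, 40], [2, 2, 1, 1], n_steps=2))
```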


The risk of attained power falling more than 5% below the expected or nominal power decreased as the ICC or number of clusters increased and as the CV decreased. Attained power was strongly affected by the treatment-vs-time period correlation. The absolute treatment group imbalance had much less impact on attained power. The attained power for any allocation was predicted accurately using a logistic regression model with the treatment-vs-time period correlation and the absolute treatment group imbalance as predictors.
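The risk described above is the share of allocations in the power distribution whose attained power falls more than a margin below the nominal power. A minimal sketch, assuming "5% below" means 5 percentage points (a relative reading would use nominal_power * 0.95 as the threshold instead); the function name and the power values are hypothetical.

```python
def risk_of_low_power(attained_powers, nominal_power, margin=0.05):
    """Share of allocations whose attained power is more than `margin` below nominal.

    Assumes `margin` is an absolute difference (percentage points on a 0-1 scale).
    """
    if not attained_powers:
        raise ValueError("need at least one attained power")
    threshold = nominal_power - margin
    low = sum(1 for p in attained_powers if p < threshold)
    return low / len(attained_powers)


# Hypothetical attained powers for a handful of allocations, nominal power 0.80.
print(risk_of_low_power([0.82, 0.79, 0.77, 0.74, 0.81, 0.70], nominal_power=0.80))
```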


In a stepped-wedge trial with unequal cluster sizes, the risk that randomization yields an allocation with inadequate attained power depends on the ICC, the CV of the cluster sizes, and the number of clusters. To reduce the computational burden of simulating attained power for every allocation, attained power can instead be predicted via regression modeling. Trial designers can reduce the risk of low attained power by restricting the randomization algorithm to avoid allocations with large treatment-vs-time period correlations.
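The restricted-randomization idea can be sketched for a small trial by enumerating every allocation, discarding those whose treatment-vs-time period correlation exceeds a cutoff, and drawing uniformly from the rest. Assumptions not taken from the study: each cluster contributes the same number of individuals in every period, period 0 is an all-control baseline, equal numbers of clusters switch at each step, and the cutoff rule is arbitrary. Exhaustive enumeration like this is only feasible for small designs.

```python
import itertools
import math
import random


def ttc(cluster_sizes, switch_steps, n_steps):
    """Individual-level correlation between treatment status and period."""
    cells = []  # (weight, period, treated) for every cluster-period
    for size, step in zip(cluster_sizes, switch_steps):
        for period in range(n_steps + 1):
            cells.append((size, period, 1 if period >= step else 0))
    w = sum(c[0] for c in cells)
    mt = sum(s * t for s, t, _ in cells) / w
    mx = sum(s * x for s, _, x in cells) / w
    cov = sum(s * (t - mt) * (x - mx) for s, t, x in cells) / w
    vt = sum(s * (t - mt) ** 2 for s, t, _ in cells) / w
    vx = sum(s * (x - mx) ** 2 for s, _, x in cells) / w
    return cov / math.sqrt(vt * vx)


def restricted_randomization(cluster_sizes, n_steps, max_ttc, rng):
    """Draw uniformly from allocations whose TTC does not exceed `max_ttc`."""
    per_step = len(cluster_sizes) // n_steps
    if per_step * n_steps != len(cluster_sizes):
        raise ValueError("this sketch assumes equal numbers of clusters per step")
    labels = [step for step in range(1, n_steps + 1) for _ in range(per_step)]
    # Every distinct assignment of clusters to steps; only feasible for small trials.
    allocations = sorted(set(itertools.permutations(labels)))
    acceptable = [a for a in allocations if ttc(cluster_sizes, a, n_steps) <= max_ttc]
    if not acceptable:
        raise ValueError("no allocation satisfies the TTC restriction")
    return rng.choice(acceptable), len(acceptable), len(allocations)


# Hypothetical trial: four clusters of unequal size, two steps, two clusters per step.
sizes = [5, 10, 20, 40]
all_ttcs = sorted(ttc(sizes, a, 2) for a in set(itertools.permutations([1, 1, 2, 2])))
cutoff = all_ttcs[len(all_ttcs) // 2 - 1]   # hypothetical rule: roughly the lower half
chosen, n_ok, n_all = restricted_randomization(sizes, 2, cutoff, random.Random(2024))
print(chosen, n_ok, "of", n_all, "allocations allowed")
```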


In this study, we investigated how different allocation characteristics interact with design factors to affect attained power and the risk of obtaining low attained power in cross-sectional stepped-wedge cluster randomized trials (SW-CRTs). This risk can be assessed by constructing the pre-randomization power distribution (PD) [24], defined as the distribution of attained powers obtained from all possible allocations that a randomization algorithm can generate. A good randomization algorithm will ensure that the risk of obtaining a low attained-power allocation is acceptably small. Identifying such an algorithm requires an understanding of what factors cause low attained power. The first aim of this work was to gain an understanding of the factors that affect the risk of low attained power. We used simulation to evaluate the attained power across different allocations under a wide variety of scenarios. While it is possible to assess the attained power using approximate analytic formulae (e.g., Hussey & Hughes [2]), the accuracy of those formulae has not been well investigated in the context of SW-CRTs, especially when the number of clusters or the cluster sizes are small. However, simulating the attained powers for all possible allocations is often computationally infeasible. Hence, our second aim was to develop regression models that could accurately predict the attained power for any allocation based on allocation characteristics (specifically the absolute treatment group imbalance, TGI, and the treatment-vs-time period correlation, TTC) and the attained powers from a sample of allocations. The results of this work provide guidance on how to assess attained power and avoid an unacceptably low attained power when designing a SW-CRT with unequal cluster sizes.
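A sketch of the second aim under simplifying assumptions: fit a model for logit(attained power) as a linear function of TTC and TGI using a sample of allocations whose attained power has been simulated, then predict power for the remaining allocations. This uses ordinary least squares on the logit scale, a simpler stand-in for the logistic regression described in the results, and every number in the usage lines is a made-up placeholder, not a simulation result.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_model(ttcs, tgis, powers):
    """Least squares on the logit scale: logit(power) ~ b0 + b1*TTC + b2*TGI."""
    rows = [(1.0, t, g) for t, g in zip(ttcs, tgis)]
    z = [logit(p) for p in powers]  # powers must lie strictly between 0 and 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve(xtx, xtz)


def predict_power(coefs, ttc, tgi):
    """Inverse-logit prediction of attained power for one allocation."""
    b0, b1, b2 = coefs
    return 1 / (1 + math.exp(-(b0 + b1 * ttc + b2 * tgi)))


# Placeholder "simulated" sample of allocations (made-up numbers).
coefs = fit_power_model([0.80, 0.82, 0.85, 0.88], [0, 20, 10, 40], [0.83, 0.81, 0.78, 0.72])
print(predict_power(coefs, ttc=0.83, tgi=15))
```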


However, the evaluation of attained power for all potential allocations can be computationally challenging, given the potentially huge number of possible allocations. For example, a twenty-cluster cross-sectional SW-CRT with unique cluster sizes and four clusters transitioning at each of five steps has more than 300 billion unique allocations.
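The allocation count in this example is a multinomial coefficient: 20 distinguishable clusters split into five groups of four gives 20!/(4!)^5 allocations. A short Python check (the helper name is hypothetical):

```python
from math import factorial, prod


def count_allocations(clusters_per_step):
    """Distinct ways to assign distinguishable clusters to steps, given how many
    clusters transition at each step (a multinomial coefficient)."""
    total = factorial(sum(clusters_per_step))
    return total // prod(factorial(k) for k in clusters_per_step)


print(f"{count_allocations([4] * 5):,}")   # 305,540,235,000
```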

