Abstract
Dichotomizing continuous variables is a popular technique for estimating the effect of risk factors on health outcomes in multivariate regression settings. Researchers follow this practice to simplify data analysis, which it unquestionably does. However, the thresholds used to dichotomize those variables are usually ad hoc, based on expert opinion or on mean, median, or quantile splits, and they can bias the estimated effect of a risk factor on a specific outcome and lead to its underestimation. In this paper, we suggest a semi-parametric method, paired with visualization, to improve threshold selection in variable dichotomization while accounting for mixture distributions in the outcome of interest and adjusting for covariates. For clinicians, these empirically based thresholds, if they exist, could be informative: they indicate the highest or lowest value of a risk factor beyond which no additional impact on the outcome should be expected.
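The underestimation described above can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not the semi-parametric method proposed in the paper: the uniform risk factor, the true threshold of 7, the effect size of 2, and the helper `dichotomized_effect` are all hypothetical choices made for illustration. With these settings, a median split mixes unaffected and affected subjects in the "high" group, so the estimated effect should fall near 2 × P(x > 7 | x > 5) = 1.2 rather than 2.

```python
import numpy as np


def dichotomized_effect(x, y, cutpoint):
    """Difference in mean outcome between x > cutpoint and x <= cutpoint."""
    high = x > cutpoint
    return y[high].mean() - y[~high].mean()


# Hypothetical data: the outcome jumps by `true_effect` once the risk
# factor exceeds `true_threshold`; below it there is no effect at all.
rng = np.random.default_rng(0)
n = 20_000
true_threshold = 7.0
true_effect = 2.0
x = rng.uniform(0.0, 10.0, size=n)
y = true_effect * (x > true_threshold) + rng.normal(0.0, 1.0, size=n)

# Ad hoc split at the sample median versus a split at the true threshold.
effect_at_median = dichotomized_effect(x, y, np.median(x))
effect_at_truth = dichotomized_effect(x, y, true_threshold)

print(f"median split estimate:   {effect_at_median:.2f}")
print(f"true threshold estimate: {effect_at_truth:.2f}")
```

The simulation only shows why the choice of cutpoint matters. It does not adjust for covariates or handle mixture distributions in the outcome, both of which the proposed approach is designed to address.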
Acknowledgments
The authors would like to thank the MESA investigators and staff for their flexibility on the use of their data for this work and the participants of the MESA study for their valuable contributions. This work was supported by the National Heart, Lung, and Blood Institute Grant 1 R21 HL081175-01A1. MESA was supported by contracts N01-HC-95159 through N01-HC-95165 and N01-HC-95169 from the National Heart, Lung, and Blood Institute.
Cite this article
Setodji, C.M., Scheuner, M., Pankow, J.S. et al. A graphical method for assessing risk factor threshold values using the generalized additive model: the multi-ethnic study of atherosclerosis. Health Serv Outcomes Res Method 12, 62–79 (2012). https://doi.org/10.1007/s10742-012-0082-1
DOI: https://doi.org/10.1007/s10742-012-0082-1