Journal of Postgraduate Medicine
 Open access journal indexed with Index Medicus & EMBASE  

ORIGINAL ARTICLE
 
Year : 2016  |  Volume : 62  |  Issue : 1  |  Page : 26-31  

Analysis of sparse data in logistic regression in medical research: A newer approach

S Devika¹, L Jeyaseelan¹, G Sebastian²
¹ Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu, India
² Department of Statistics, St. Thomas College, Palai, Kerala, India

Correspondence Address:
L Jeyaseelan
Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu
India

Background and Objective: Logistic regression is the usual method for analyzing a dichotomous response variable. However, its performance with sparse data is questionable. A common problem in this situation is an extremely high odds ratio (OR) with a very wide 95% confidence interval (CI) (OR: >999.999, 95% CI: <0.001, >999.999). In this paper, we address this issue by using the penalized logistic regression (PLR) method.
Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted at Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups, and the main exposure variable was hyponatremia status. Simulated datasets were created with different sample sizes and different numbers of covariates.
Results: A total of 23 cases and 50 controls were used for the analysis with the ordinary and PLR methods. The main exposure, hyponatremia, was present in nine (39.13%) of the cases and in four (8.0%) of the controls. All 23 hiccup cases were males, compared with 46 (92.0%) of the controls. This complete separation between gender and disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: <0.001, >999.999) under ordinary logistic regression, whereas PLR gave a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48). After adjustment for all confounding variables, PLR showed that hyponatremia was associated with 7.9 times higher odds of developing hiccups (95% CI: 2.06, 38.86), whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment showed that the estimated coverage probability of this method is close to the nominal 95% level even for small sample sizes and large numbers of covariates.
Conclusions: PLR performs almost identically to ordinary logistic regression when the sample size is large and is superior when cell counts are small.
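The separation problem described above can be illustrated with a minimal sketch. It assumes that PLR means Firth's penalized likelihood (a Jeffreys-prior penalty, one common form of PLR). The abstract does not name the penalty, so this is not necessarily the authors' implementation. The function names `fit_logistic`, `_invert` and `_sigmoid` are hypothetical. The data are only the crude gender-by-group counts reported above (23 male cases; 46 male and 4 female controls), without the confounders the authors adjusted for.

```python
import math


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_logistic(X, y, firth=False, max_iter=25, tol=1e-8):
    """Fit a logistic regression by Fisher-scoring iterations (hypothetical helper).

    firth=True uses Firth's modified score, which maximizes
    log L(beta) + 0.5 * log det I(beta) (a Jeffreys-prior penalty).
    Returns (coefficients, converged_flag).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        prob = [_sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        w = [q * (1.0 - q) for q in prob]
        # Fisher information I = X' W X
        info = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        inv = _invert(info)
        score = [0.0] * p
        for i, row in enumerate(X):
            resid = y[i] - prob[i]
            if firth:
                # Hat value h_i = w_i * x_i' I^{-1} x_i, then Firth's score adjustment
                h = w[i] * sum(row[j] * inv[j][k] * row[k] for j in range(p) for k in range(p))
                resid += h * (0.5 - prob[i])
            for j in range(p):
                score[j] += resid * row[j]
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False


# Crude gender-by-group counts from the abstract: columns are (intercept, male indicator).
# Rows: 23 male cases, 46 male controls, 4 female controls.
X = [[1.0, 1.0] for _ in range(23 + 46)] + [[1.0, 0.0] for _ in range(4)]
y = [1] * 23 + [0] * 46 + [0] * 4

beta_ml, conv_ml = fit_logistic(X, y, firth=False)
beta_f, conv_f = fit_logistic(X, y, firth=True)
print("ordinary MLE: log OR =", beta_ml[1], "converged:", conv_ml)
print("Firth PLR:    OR =", math.exp(beta_f[1]), "converged:", conv_f)
```

With these counts, the ordinary fit stops at the iteration limit (25, the same default as R's `glm`). Its gender coefficient grows by roughly one unit per iteration, giving an estimated OR above 10^10, which mirrors the >999.999 output. The Firth fit converges to log(23.5 × 4.5 / (46.5 × 0.5)), a crude OR of about 4.55, because in a saturated 2 × 2 model Firth's penalty is equivalent to adding 0.5 to every cell. This crude figure is not comparable to the adjusted OR of 5.35 reported above.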


How to cite this article:
Devika S, Jeyaseelan L, Sebastian G. Analysis of sparse data in logistic regression in medical research: A newer approach. J Postgrad Med 2016;62:26-31


How to cite this URL:
Devika S, Jeyaseelan L, Sebastian G. Analysis of sparse data in logistic regression in medical research: A newer approach. J Postgrad Med [serial online] 2016 [cited 2022 Sep 29];62:26-31
Available from: https://www.jpgmonline.com/article.asp?issn=0022-3859;year=2016;volume=62;issue=1;spage=26;epage=31;aulast=Devika;type=0

