Journal of Postgraduate Medicine
 Open access journal indexed with Index Medicus & ISI's SCI  
Year : 2016  |  Volume : 62  |  Issue : 1  |  Page : 26-31

Analysis of sparse data in logistic regression in medical research: A newer approach

1 Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu, India
2 Department of Statistics, St. Thomas College, Palai, Kerala, India

Correspondence Address:
L Jeyaseelan
Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0022-3859.173193


Background and Objective: Logistic regression is the usual method for analyzing a dichotomous response variable. However, its performance in the presence of sparse data is questionable. A common problem in such situations is a very high odds ratio (OR) with a very wide 95% confidence interval (CI) (OR: >999.999, 95% CI: <0.001, >999.999). In this paper, we address this issue using the penalized logistic regression (PLR) method.
Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted at Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence or absence of hiccups, and the main exposure variable was hyponatremia status. Simulated datasets were created with different sample sizes and different numbers of covariates.
Results: A total of 23 cases and 50 controls were analyzed using both ordinary logistic regression and PLR. The main exposure variable, hyponatremia, was present in nine (39.13%) of the cases and in four (8.0%) of the controls. All 23 hiccup cases were males, and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with ordinary logistic regression (OR: >999.999, 95% CI: <0.001, >999.999), whereas PLR gave a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48). After adjusting for all the confounding variables, hyponatremia was associated with a 7.9 times higher risk (95% CI: 2.06, 38.86) of developing hiccups using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of PLR is near the nominal level of 95% even for small sample sizes and for a large number of covariates.
Conclusions: PLR gives results nearly identical to ordinary logistic regression when the sample size is large and is superior when cell counts are small.
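
The separation problem described in the Results can be illustrated with a minimal sketch. The abstract does not say which penalty the authors used, so the code below assumes Firth's bias-reducing (Jeffreys prior) penalty, one common form of PLR. The function name `firth_logistic` and the variable names are hypothetical, and the data are only the gender-by-outcome counts quoted above (23 of 23 cases male, 46 of 50 controls male). Because the fit is univariate, it is not expected to reproduce the adjusted OR of 5.35 reported in the abstract.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Fit a logistic regression with Firth's penalty (an assumed form of PLR).

    Uses Fisher scoring on the modified score
        U*(beta) = X^T (y - p + h * (0.5 - p)),
    where h holds the diagonal of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info = X.T @ (w[:, None] * X)            # Fisher information X^T W X
        info_inv = np.linalg.inv(info)
        # Hat diagonal: h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Gender-by-outcome counts taken from the abstract:
# cases (hiccups = 1): 23 male, 0 female; controls (hiccups = 0): 46 male, 4 female.
male = np.array([1] * 23 + [1] * 46 + [0] * 4)
hiccups = np.array([1] * 23 + [0] * 50)
X = np.column_stack([np.ones(male.size), male])  # intercept + gender indicator

beta_hat = firth_logistic(X, hiccups)
or_male = float(np.exp(beta_hat[1]))             # penalized OR for male vs. female
print(f"Firth-penalized unadjusted OR for male gender: {or_male:.2f}")
```

With an intercept and a single binary covariate the model is saturated, and in that special case the Firth estimate coincides with adding 0.5 to every cell of the 2x2 table, giving (23.5 x 4.5) / (0.5 x 46.5), roughly 4.55. Ordinary maximum likelihood on the same table has no finite estimate because no female cases were observed, which is the behavior the abstract reports as OR: >999.999.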



Official Publication of the Staff Society of the Seth GS Medical College and KEM Hospital, Mumbai, India
Published by Wolters Kluwer - Medknow