Missing Values in Data Analysis: Ignore or Impute?
Ng, Chong Guan, and Yusoff M.S.B., (2011) Missing Values in Data Analysis: Ignore or Impute? Education in Medicine Journal, 3 (1). pp. 6-11. ISSN 2180-1932
Official URL: http://saifulbahri.com/eimj/2011/06/20/volume-3-issue-1-jan-june-2011/
University of Malaya. Faculty of Medicine
Universiti Sains Malaysia. School of Medical Sciences
Objective: Missing values are commonly encountered in data analysis in all types of research, and various methods have been introduced to handle them. This study aims to compare the results of complete data analysis, the missing indicator method, mean substitution and single imputation in dealing with this issue.
Methods: A total of 202 patients discharged from the psychiatric ward of University Malaya Medical Centre (UMMC) between 27 August 2007 and 15 April 2008 were recruited. General psychopathology was measured with the Brief Psychiatric Rating Scale (BPRS-24). Information on age, gender, race, marital status and psychiatric diagnosis was collected. On follow-up, patients who had an early readmission (<6 months) were identified. A logistic regression model predicting early readmission from all of these variables was fitted. To simulate a missing at random (MAR) situation, 10% (n=20) of the highest BPRS scores were deleted. Four different statistical methods were then used to deal with the missing values.
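The four approaches compared in the study can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the helper names are hypothetical, and "single imputation" is assumed here to mean deterministic regression imputation of BPRS from age, since the abstract does not specify the imputation model. The logistic regression step itself is omitted.

```python
import random
from statistics import mean


def simulate(n=202, seed=0):
    """Synthetic (age, bprs) records; the values are invented for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 65)
        bprs = 30 + 0.3 * age + rng.gauss(0, 5)
        rows.append({"age": age, "bprs": bprs})
    return rows


def delete_highest(rows, k=20):
    """Set the k highest BPRS scores to None to create missing values."""
    cutoff = sorted(r["bprs"] for r in rows)[-k]
    return [dict(r, bprs=None if r["bprs"] >= cutoff else r["bprs"]) for r in rows]


def complete_case(rows):
    """Complete data analysis: drop every record with a missing BPRS score."""
    return [r for r in rows if r["bprs"] is not None]


def missing_indicator(rows):
    """Missing indicator method: fill with 0 and add a 0/1 missingness flag."""
    return [
        dict(
            r,
            bprs=0.0 if r["bprs"] is None else r["bprs"],
            bprs_missing=int(r["bprs"] is None),
        )
        for r in rows
    ]


def mean_substitution(rows):
    """Mean substitution: replace missing scores with the observed mean."""
    observed_mean = mean(r["bprs"] for r in rows if r["bprs"] is not None)
    return [dict(r, bprs=observed_mean if r["bprs"] is None else r["bprs"]) for r in rows]


def single_imputation(rows):
    """Assumed variant: predict missing BPRS from age with a least-squares line."""
    obs = complete_case(rows)
    age_mean = mean(r["age"] for r in obs)
    bprs_mean = mean(r["bprs"] for r in obs)
    slope = sum((r["age"] - age_mean) * (r["bprs"] - bprs_mean) for r in obs) / sum(
        (r["age"] - age_mean) ** 2 for r in obs
    )
    intercept = bprs_mean - slope * age_mean
    return [
        dict(r, bprs=(intercept + slope * r["age"]) if r["bprs"] is None else r["bprs"])
        for r in rows
    ]


data = delete_highest(simulate())
```

Each function returns a new list of records that could then be passed to the same regression model, so that the resulting estimates can be compared against those from the original complete data.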
Results: BPRS score was significantly associated with early readmission (p<0.01) in the original complete dataset. The associations estimated by complete data analysis, the missing indicator method and mean substitution were biased and not statistically significant. Single imputation gave the estimate closest to the original association, and it remained significant (p<0.1).
Conclusion: Ignoring missing values will result in biased estimates in data analysis. Single imputation produced an unbiased estimate of the association in the MAR situation.