Title: Statistical models for predicting number of involved nodes in breast cancer patients.
Authors: Dwivedi, Alok Kumar; Dwivedi, Sada Nand; Deo, Suryanarayana; Shukla, Rakesh; Kopras, Elizabeth
Published In Health (Irvine Calif), (2010 Jul)
Abstract: Clinicians need to predict the number of involved nodes in breast cancer patients in order to ascertain severity, prognosis, and design subsequent treatment. The distribution of involved nodes often displays over-dispersion-a larger variability than expected. Until now, the negative binomial model has been used to describe this distribution assuming that over-dispersion is only due to unobserved heterogeneity. The distribution of involved nodes contains a large proportion of excess zeros (negative nodes), which can lead to over-dispersion. In this situation, alternative models may better account for over-dispersion due to excess zeros. This study examines data from 1152 patients who underwent axillary dissections in a tertiary hospital in India during January 1993-January 2005. We fit and compare various count models to test model abilities to predict the number of involved nodes. We also argue for using zero inflated models in such populations where all the excess zeros come from those who have at some risk of the outcome of interest. The negative binomial regression model fits the data better than the Poisson, zero hurdle/inflated Poisson regression models. However, zero hurdle/inflated negative binomial regression models predicted the number of involved nodes much more accurately than the negative binomial model. This suggests that the number of involved nodes displays excess variability not only due to unobserved heterogeneity but also due to excess negative nodes in the data set. In this analysis, only skin changes and primary site were associated with negative nodes whereas parity, skin changes, primary site and size of tumor were associated with a greater number of involved nodes. In case of near equal performances, the zero inflated negative binomial model should be preferred over the hurdle model in describing the nodal frequency because it provides an estimate of negative nodes that are at "high-risk" of nodal involvement.
PubMed ID: 21720569
MeSH Terms: No MeSH terms associated with this publication