On the Student-t Mixture Inverse Gaussian Model with an Application to Protein Production

Sobre el modelo gaussiano inverso mezclado t-Student y una aplicación a producción de proteínas

ANTONIO SANHUEZA¹, VÍCTOR LEIVA², LILIANA LÓPEZ-KLEINE³

¹Universidad de La Frontera, Departamento de Matemática y Estadística, Temuco, Chile. Professor. Email: asanhueza@ufro.cl
²Universidad de Valparaíso, Departamento de Estadística, CIMFAV, Valparaíso, Chile. Professor. Email: victor.leiva@uv.cl
³Universidad Nacional de Colombia, Departamento de Estadística, Bogotá, Colombia. Assistant professor. Email: llopezk@unal.edu.co


Abstract

In this article, we introduce a mixture inverse Gaussian (MIG) model based on the Student-t distribution, the MIG-t model, and apply it to bacterium-based protein production for the food industry. The model is useful mainly for describing data that follow positively skewed distributions, and it accommodates atypical observations better than its classical version. Specifically, we present a characterization of the MIG-t distribution and carry out a hazard analysis centered mainly on its hazard rate. We also discuss the maximum likelihood method, which in this case produces robust parameter estimates, and, to evaluate the potential influence of atypical observations, we develop a diagnostic analysis for the model. Finally, we apply the results to novel bacterium-based protein production data and statistically compare two types of protein producers using the likelihood ratio test based on the MIG-t model, as an alternative to the procedures available until now. This comparison matters in practice: evaluating protein production under both constructions allows practitioners to choose the more productive one before the bacterial culture is scaled to an industrial level.

Key words: Distribution mixture, Length-biased distributions, Likelihood methods, R computer language.
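
The two-group comparison by a likelihood ratio test described in the abstract can be illustrated with a minimal sketch. The MIG-t density is not stated on this page, so the sketch substitutes the classical inverse Gaussian distribution, whose maximum likelihood estimates have closed forms; it is not the authors' procedure. The function names (`ig_mle`, `ig_loglik`, `lrt_two_groups`) and the two samples are hypothetical and synthetic, and the chi-square reference distribution with 2 degrees of freedom assumes that both parameters are free in each group under the alternative.

```python
import math


def ig_mle(x):
    """Closed-form ML estimates (mu, lam) for the classical inverse Gaussian."""
    n = len(x)
    mu = sum(x) / n
    lam = n / sum(1.0 / xi - 1.0 / mu for xi in x)
    return mu, lam


def ig_loglik(x, mu, lam):
    """Log-likelihood of IG(mu, lam) with density
    sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x))."""
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * xi ** 3))
        - lam * (xi - mu) ** 2 / (2.0 * mu ** 2 * xi)
        for xi in x
    )


def lrt_two_groups(a, b):
    """Likelihood ratio test of H0: both groups share (mu, lam)
    against H1: each group has its own (mu, lam)."""
    pooled = list(a) + list(b)
    loglik_null = ig_loglik(pooled, *ig_mle(pooled))
    loglik_alt = ig_loglik(a, *ig_mle(a)) + ig_loglik(b, *ig_mle(b))
    # Clamp tiny negative values caused by floating-point rounding.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Synthetic positive measurements for two hypothetical producers
# (illustrative values only, not data from the article).
producer_a = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
producer_b = [5.0, 6.1, 4.8, 5.5, 6.3, 5.2]

stat, p_value = lrt_two_groups(producer_a, producer_b)
print(f"LR statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

A small p-value would suggest that the two producers do not share a common distribution. Under the MIG-t model the same structure would apply, but the closed-form estimates would be replaced by numerical maximization of the MIG-t log-likelihood.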


Resumen (English translation)

In this article, we introduce a mixture inverse Gaussian (MIG) model based on the Student-t distribution and apply it to bacterium-based protein production for the food industry. This model is especially useful for describing data that follow a positively skewed distribution, since it accommodates atypical observations better than its classical version. Specifically, we present a characterization of the MIG-t distribution and carry out a reliability analysis of this distribution centered mainly on the failure rate. We also discuss the maximum likelihood method, which in this case provides robust estimates of the model parameters. To evaluate the potential influence of atypical observations, we propose a diagnostic analysis for the distribution. Finally, we apply the results to the analysis of new data on bacterium-based protein production used in the food industry, and we statistically compare two types of producing bacteria using the likelihood ratio test based on the MIG-t model as an alternative methodology to the procedures available to date. This point is very important, since evaluating protein production using two different constructions allows researchers to choose the most productive type before proceeding to large-scale industrial culture.

Key words (translated from Spanish): length-biased distributions, R computer language, likelihood methods, distribution mixture.



[Received October 2010. Accepted March 2011]

This article can be cited in LaTeX using the following BibTeX entry:

@ARTICLE{RCEv34n1a09,
    AUTHOR  = {Sanhueza, Antonio and Leiva, Víctor and López-Kleine, Liliana},
    TITLE   = {{On the Student-t Mixture Inverse Gaussian Model with an Application to Protein Production}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2011},
    volume  = {34},
    number  = {1},
    pages   = {177-195}
}