Song Lab

Department of Biostatistics


Assessing Effect of Environmental Exposure to Toxicants on Human Growth Dynamics


Emerging areas of research in environmental health sciences and the ascendancy of new interdisciplinary approaches have led to new hypotheses and technologies for collecting massive, complex data that pose substantial data analysis challenges. Motivated by analytic needs arising from a major environmental health research center at the University of Michigan (U-M), the lab's primary goal is to develop novel statistical models and efficient algorithms to evaluate, interpret, and predict the impacts of prenatal and/or postnatal exposures to environmental risk factors on adverse child health and developmental outcomes.

Funded by an NIH methodology R01 grant, the lab aims to (i) develop semiparametric models and algorithms to evaluate the influence of prenatal and/or postnatal exposures to toxic mixtures on delayed somatic growth and sexual maturation during the adolescent period; and (ii) develop semiparametric stochastic models to evaluate the functional rate of growth changes during the 0-5 year age-period and its potential alterations driven by exposure mixtures. Stochastic differential equations are utilized to model both growth velocity and acceleration as functions of anthropometric characteristics and toxicant mixtures, and the resulting model helps investigators to study various child growth milestones such as timing of velocity peak and adiposity (BMI) rebound.
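As a rough illustration of how a stochastic differential equation can describe growth velocity, the sketch below simulates a toy model in which body size Y(t) accumulates a velocity V(t) that fluctuates around a long-run level. It is not the lab's semiparametric model: the mean-reverting form, the Euler-Maruyama discretization, and every name and parameter value (simulate_growth, exposure_shift, and so on) are assumptions made for illustration, and units are arbitrary.

```python
import math
import random


def simulate_growth(y0, v0, mean_velocity, reversion, sigma, dt, n_steps,
                    exposure_shift=0.0, seed=0):
    """Euler-Maruyama simulation of a toy growth model (illustrative only):

        dY(t) = V(t) dt
        dV(t) = reversion * (mean_velocity + exposure_shift - V(t)) dt
                + sigma dW(t)

    Returns three lists: times, sizes Y, and velocities V.
    """
    rng = random.Random(seed)
    # Hypothetical exposure effect: a shift in the long-run velocity.
    target = mean_velocity + exposure_shift
    times, sizes, velocities = [0.0], [y0], [v0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        v_prev = velocities[-1]
        velocities.append(
            v_prev + reversion * (target - v_prev) * dt + sigma * dw)
        sizes.append(sizes[-1] + v_prev * dt)
        times.append(times[-1] + dt)
    return times, sizes, velocities


def time_of_peak_velocity(times, velocities):
    """Return the time at which the simulated velocity is largest."""
    peak_index = max(range(len(velocities)), key=velocities.__getitem__)
    return times[peak_index]


# Unexposed versus exposed trajectories driven by the same random seed.
t, y_ref, v_ref = simulate_growth(50.0, 2.0, 1.0, 0.8, 0.1, 0.05, 100)
_, y_exp, v_exp = simulate_growth(50.0, 2.0, 1.0, 0.8, 0.1, 0.05, 100,
                                  exposure_shift=-0.2)
print("final size, unexposed:", round(y_ref[-1], 2))
print("final size, exposed:  ", round(y_exp[-1], 2))
print("time of peak velocity:", time_of_peak_velocity(t, v_ref))
```

Because both runs share a seed, the gap between the two final sizes reflects only the assumed exposure shift. A milestone such as the timing of the velocity peak can then be read directly off the simulated velocity path.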

  1. Baek, J., Zhu, B. and Song, P.X.-K. (2018). Bayesian analysis of infant's growth dynamics with in utero exposure to environmental toxicants. Annals of Applied Statistics (in press).
  2. Zhou, L., Li, H., Lin, H. and Song, P.X.-K. (2018). Evaluating functional covariate-environment interactions in the Cox regression model. Canadian Journal of Statistics (in press).
  3. Perng, W., Baek, J., Zhou, C.W., Cantoral, A., Tellez-Rojo, M.M., Song, P.X.-K. and Peterson, K.E. (2018). Associations of the infancy body mass index peak with anthropometry and cardiometabolic risk in Mexican adolescents. Annals of Human Biology (in press).
  4. Ma, S. and Song, P.X.-K. (2015). Varying index coefficient models. Journal of the American Statistical Association 110, 341-356.
  5. Zhu, B., Taylor, J.M.G. and Song, P.X.-K. (2011). Semiparametric stochastic modeling of the rate function in longitudinal studies. Journal of the American Statistical Association 106, 1485-1495.

Data Integration and Data Harmonization


With support from an NIH methodology R01 grant, the lab develops novel statistical methods to assess the validity of merging longitudinal cohort data and to perform joint analysis of the merged data. Merging longitudinal data sets from multiple cohorts helps yield a desirable age spectrum in growth analyses, but it is complicated by the underlying heterogeneity across different age cohorts. An efficient fused LASSO algorithm is proposed to detect potential inter-cohort heterogeneities, which are then accounted for to reach a valid joint analysis of the merged data.
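The fusion idea can be seen in the smallest possible case: two cohorts, one coefficient each. The sketch below is a toy illustration, not the lab's algorithm. It assumes a simple squared-error loss around each cohort's own estimate plus a penalty on the absolute difference between the two coefficients, for which the minimizer has a closed form. The function name fuse_two_cohorts and its arguments are hypothetical.

```python
def fuse_two_cohorts(est_a, est_b, lam):
    """Minimize, over (b_a, b_b), the toy fused-lasso objective

        0.5 * (b_a - est_a)**2 + 0.5 * (b_b - est_b)**2
        + lam * abs(b_a - b_b)

    est_a, est_b: cohort-specific coefficient estimates (hypothetical inputs).
    lam: fusion penalty; larger values favor a common coefficient.
    Returns the minimizing pair (b_a, b_b).
    """
    diff = est_a - est_b
    if abs(diff) <= 2.0 * lam:
        # The cohorts differ little relative to the penalty: fuse them
        # to a single shared value, the average of the two estimates.
        common = 0.5 * (est_a + est_b)
        return common, common
    # Otherwise the cohorts stay distinct, each pulled toward the other
    # by exactly lam.
    sign = 1.0 if diff > 0 else -1.0
    return est_a - lam * sign, est_b + lam * sign


# Similar cohorts are fused; clearly different cohorts are kept apart.
print(fuse_two_cohorts(1.0, 1.2, 0.5))
print(fuse_two_cohorts(1.0, 3.0, 0.5))
```

Coefficients that come out exactly equal flag cohorts that can be treated as homogeneous, while coefficients that remain different flag heterogeneity to be accounted for in the joint analysis. With more than two cohorts the penalty sums over pairs of coefficients and no such simple closed form applies.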

  1. Tang, L., Zhou, L. and Song, P.X.-K. (2018). Fusion learning algorithm to combine partially heterogeneous Cox models. Computational Statistics (in press).
  2. Wang, F., Wang, L. and Song, P.X.-K. (2016). Fused Lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics 72, 1184-1193.
  3. Tang, L. and Song, P.X.-K. (2016). Fused lasso in regression coefficients clustering – Learning parameter heterogeneity in data integration. Journal of Machine Learning Research 17, 1-23.
  4. Wang, F., Song, P.X.-K. and Wang, L. (2015). Merging multiple longitudinal studies with study-specific missing covariates: A joint estimating function approach. Biometrics 71, 929-940.
  5. Wang, F., Wang, L. and Song, P.X.-K. (2012). Quadratic inference function approach to merging longitudinal studies: Validation test and joint estimation. Biometrika 99, 748-754.

Kidney Paired Donation


An evolving strategy, known as Kidney Paired Donation (KPD), provides an approach to overcome the barriers faced by many patients with kidney failure who present with willing, but immunologically or blood type incompatible living donors. KPD programs use a computerized algorithm to match one incompatible donor/recipient pair to another pair with a complementary incompatibility, such that the donor of the first pair gives to the recipient of the second, and vice versa. More complex exchanges of organs involving three or more pairs are also considered, as are altruistic or non-directed donors (NDD) who donate a kidney voluntarily and thereby have the potential to create a chain of kidney transplants. Such chains have become increasingly important in KPD programs.
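The pairwise and three-way exchanges described above can be pictured as short cycles in a directed compatibility graph, where an arrow from pair i to pair j means the donor of pair i could give to the recipient of pair j. The sketch below enumerates such cycles by brute force. It is only an illustration under assumed inputs: the function name, the dictionary layout, and the example pool are hypothetical, and real KPD software uses far more efficient search and also handles chains started by non-directed donors.

```python
from itertools import permutations


def find_exchange_cycles(compatible, max_len=3):
    """List all exchange cycles of length 2..max_len.

    compatible: dict mapping each pair label to the set of pair labels
        whose recipient could receive a kidney from this pair's donor.
    Each cycle is returned once, as a tuple starting at its smallest label.
    """
    nodes = sorted(compatible)
    cycles = []
    for length in range(2, max_len + 1):
        for combo in permutations(nodes, length):
            if combo[0] != min(combo):
                continue  # skip rotations of a cycle already considered
            # Every donor must be compatible with the next recipient,
            # wrapping around from the last pair back to the first.
            if all(combo[(i + 1) % length] in compatible[combo[i]]
                   for i in range(length)):
                cycles.append(combo)
    return cycles


# Hypothetical pool of three incompatible pairs.
pool = {
    "A": {"B"},       # donor A can give to recipient B
    "B": {"A", "C"},  # donor B can give to recipient A or C
    "C": {"A"},       # donor C can give to recipient A
}
print(find_exchange_cycles(pool))
```

In this example the pool contains one two-way swap (A with B) and one three-way exchange (A to B, B to C, C to A).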

Funded by an NIH grant (renewed), the lab has developed important methods based on sets with fallback options, including extensions to incorporate various features of current KPD pools including partially directed donors, candidates with multiple incompatible donors, compatible donor-candidate pairs, deceased donor initiated chains, and donors from pools with differing genetic makeups. In addition, the lab is currently developing efficient algorithms to enumerate subsets of interest in a KPD pool and to evaluate the expected utility that such subsets would attain if selected.
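To make the notion of expected utility with fallback options concrete, the sketch below scores a small set of pairs by the expected number of transplants it yields when each proposed transplant may fail. It rests on simplifying assumptions that are not taken from the lab's methods: utility is a plain count of transplants, potential transplants succeed independently with known probabilities, and the best arrangement among the surviving transplants is always chosen. All names and probabilities are hypothetical, and the brute-force enumeration only suits very small sets.

```python
from itertools import permutations, product


def cycles_in(edges, nodes, max_len=3):
    """Cycles of length 2..max_len that use only the given nodes and edges."""
    found = []
    for length in range(2, max_len + 1):
        for combo in permutations(sorted(nodes), length):
            if combo[0] == min(combo) and all(
                    (combo[i], combo[(i + 1) % length]) in edges
                    for i in range(length)):
                found.append(combo)
    return found


def best_transplants(edges, nodes, max_len=3):
    """Largest number of transplants achievable with disjoint cycles."""
    best = 0
    for cycle in cycles_in(edges, nodes, max_len):
        remaining = nodes - set(cycle)
        best = max(best,
                   len(cycle) + best_transplants(edges, remaining, max_len))
    return best


def expected_transplants(edge_probs, max_len=3):
    """Expected transplants when each edge succeeds independently.

    edge_probs: dict mapping (donor_pair, recipient_pair) to the assumed
        probability that this potential transplant remains viable.
    """
    nodes = frozenset(n for edge in edge_probs for n in edge)
    edges = list(edge_probs)
    total = 0.0
    # Enumerate every success/failure pattern of the potential transplants.
    for outcome in product([True, False], repeat=len(edges)):
        prob = 1.0
        alive = set()
        for edge, ok in zip(edges, outcome):
            prob *= edge_probs[edge] if ok else 1.0 - edge_probs[edge]
            if ok:
                alive.add(edge)
        total += prob * best_transplants(alive, nodes, max_len)
    return total


three_way = {("A", "B"): 0.5, ("B", "C"): 0.5, ("C", "A"): 0.5}
with_fallback = dict(three_way)
with_fallback[("B", "A")] = 0.5  # extra edge allowing an A-B swap

print(expected_transplants(three_way))
print(expected_transplants(with_fallback))
```

The second set has the higher expected count because, when the three-way exchange fails, a two-way swap between A and B may still go ahead. That is the sense in which a fallback option adds value to a selected set.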

  1. Wang, W., Bray, M., Song, P.X.-K. and Kalbfleisch, J.D. (2018). An efficient algorithm to enumerate sets with fallbacks in a kidney paired donation program. Operations Research in Health Care (in press).
  2. Bray, M., Wang, W., Song, P.X.-K. and Kalbfleisch, J.D. (2018). Valuing sets of potential transplants in a kidney paired donation network. Statistics in Biosciences 10, 255-279.
  3. Ashby, V.B., Leichtman, A.B., Rees, M.A., Song, P.X.-K., Bray, M., Wang, W. and Kalbfleisch, J.D. (2017). A kidney graft survival calculator that accounts for mismatches in age, sex, HLA, and body size. Clinical Journal of the American Society of Nephrology 12(7): 1148-1160.
  4. Wang, W., Bray, M., Song, P.X.-K. and Kalbfleisch, J.D. (2016). A look-ahead strategy for non-directed donors in kidney paired donation. Statistics in Biosciences 9(2): 453-469.
  5. Bray, M., Wang, W., Song, P.X.-K., Leichtman, A.B., Rees, M.A., Ashby, V.B., Eikstadt, R., Goulding, A. and Kalbfleisch, J.D. (2015). Planning for uncertainty and fallbacks can increase the number of transplants in a kidney paired donation program. American Journal of Transplantation 15, 2636-2645.
  6. Li, Y., Song, P.X.-K., Leichtman, A.B., Rees, M.A. and Kalbfleisch, J.D. (2013). Decision making in kidney paired donation programs with altruistic donors. Statistics and Operations Research Transactions (SORT) 38, 53-72.
  7. Chen, Y., Li, Y., Kalbfleisch, J.D., Zhou, Y., Leichtman, A. and Song, P.X.-K. (2012). Graph-based optimization algorithm and software on kidney exchanges. IEEE Transactions on Biomedical Engineering 59(7), 1985-1991.

Networked Data Analysis


Data collected from networks are pervasive in practice. A network is a set of nodes, or vertices, joined in pairs by edges. An important feature of a network is that the distance between nodes may not be precisely defined by a numeric metric.

Supported by NSF funding, the lab pursues research with the following overarching goal: to develop quasi-likelihood theory and methods for regression analysis of multi-dimensional response variables on covariates that are collected from networks. Because data from a network are correlated across nodes, efficient statistical inference requires incorporating appropriate dependence structures into estimation and inference for regression parameters, and the lab is committed to addressing the analytic challenges this poses.

  1. Zhong, P., Lan, W., Song, P.X.-K. and Tsai, C. (2016). Tests for covariance structures with high-dimensional repeated measurements. Annals of Statistics 45, 1185-1213.
  2. Zhou, Y. and Song, P.X.-K. (2016). Regression analysis of networked data. Biometrika 103, 287-301.

Statistical Methods with Big Data


The lab focuses on developing simultaneous statistical inference for Big Data. The current literature mostly provides statistical inference on one variable at a time, which is not ideal in many practical settings where several parameters need to be examined simultaneously.
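A short calculation shows why one-at-a-time inference can fall short when several parameters matter jointly. The sketch below is a generic textbook illustration, not one of the lab's methods: it assumes the individual confidence intervals are independent, and it uses the classical Bonferroni adjustment as one simple way to restore joint coverage. The function names are hypothetical.

```python
def joint_coverage(marginal_level, n_params):
    """Joint coverage of n_params independent intervals,
    each built at the same marginal confidence level."""
    return marginal_level ** n_params


def bonferroni_marginal_level(joint_level, n_params):
    """Per-interval level that guarantees at least joint_level
    coverage for all n_params intervals together (Bonferroni)."""
    return 1.0 - (1.0 - joint_level) / n_params


# Ten separate 95% intervals cover all ten parameters together
# only about 60% of the time under independence.
print(round(joint_coverage(0.95, 10), 3))

# Building each interval at the Bonferroni-adjusted level restores
# joint coverage of at least 95%.
adjusted = bonferroni_marginal_level(0.95, 10)
print(round(adjusted, 3), round(joint_coverage(adjusted, 10), 3))
```

The Bonferroni adjustment becomes very conservative as the number of parameters grows, which is one reason dedicated simultaneous inference methods are of interest.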

  1. Tang, L., Zhou, L. and Song, P.X.-K. (2018). Method of divide-and-combine in regularized generalized linear models for big data. Working paper available on arXiv.
  2. Wang, F., Zhou, L., Tang, L. and Song, P.X.-K. (2018). Method of contraction-expansion for simultaneous inference in linear model. Working paper.
  3. Hector, E.C. and Song, P.X.-K. (2018). A distributed and integrated method of moments for high-dimensional correlated data analysis. Working paper. (This paper received the 2018 ENAR John Van Ryzin Award.)
  4. Luo, L. and Song, P.X.-K. (2018). Renewable estimation and incremental inference in generalised linear models with streaming datasets. Working paper. (This paper received the 2019 ENAR Distinguished Student Paper Award.)

This page was last modified on: 06/10/2018

Questions or comments with the site? Contact the maintainer (Mathieu Bray).