Summary
Fourth-year PhD student in Biostatistics. BS in Applied Mathematics and MS in Financial Statistics. Strong background in Statistics, Biostatistics and Mathematics. Advanced skills in bioinformatics, data analysis, machine learning, programming, mathematical modeling and quantitative finance. A fast, creative and energetic learner.
Education
- Medical College of Wisconsin, Milwaukee, WI, USA
Doctor of Philosophy, Biostatistics, 3.90/4.00 9/2020 – Expected 5/2025
Relevant Coursework: Inference, Clinical Trials, Survival Analysis, Statistical Genetics, Bayes, MCMC, Bioinformatics, Models & Methods, Linear Models, Biostatistical Computing (R, SAS, simulation), Consulting, etc.
- Rutgers University, New Brunswick, NJ, USA
Master of Science, Financial Statistics, 3.90/4.00 9/2018 – 5/2020
- Northwest University, Xi’an, Shaanxi, China
Bachelor of Science, Applied Mathematics, 3.41/4.00 9/2014 – 7/2018
Skills
- Core Domain Expertise: Data Analysis, Biostatistics, Single-cell Analysis (Seurat), Survival Analysis, Machine Learning, Missing data imputation, Mathematical Modeling, Programming, Financial Analysis, Regression, Meta-analysis, Feature selection, High-dimensional data analysis, Bayes, Statistical consulting, Quantitative finance, Time series analysis
- Computing and Programming: R, SAS, Python (PyTorch, TensorFlow), MATLAB, Emacs, OpenBUGS, Stan, NIMBLE, SPSS, C++, C, LaTeX, shell script, Microsoft Office Suite, LINGO, SQL, R Markdown, Rcpp
- Language skills: English, Chinese, Japanese
Research Experience
Missing data imputation based on sampling methods in single-cell analysis
- Impute dropout events in single-cell data using downsampling and sampling methods.
- Compare clustering performance with existing methods such as scImpute and RESCUE.
Pseudo-value approach for informative cluster size
- Implement Kaplan-Meier estimates, competing risk estimates and pseudo-values from scratch.
- Run GEE-based simulations using the pseudo-value approach to evaluate performance under informative cluster size.
Knockoff in case-cohort study based on group LASSO
- Implement group LASSO and knockoffs from scratch. Run simulations to estimate power and false discovery rate.
- Improve performance using derandomized knockoffs to control the false discovery rate.
Use of external information to improve statistical estimation for infant mortality data
- Preprocess infant mortality data and examine relationships between factors and outcomes using regression analysis.
- Perform a Monte Carlo simulation study to compare two types of estimation methods that use additional information.
Improving outcomes of Chronic Lymphocytic Leukemia: Analysis of the SEER database
- Estimate survival rates using the Kaplan-Meier estimator to assess whether survival improves over time.
- Perform regression analysis on survival data using the Cox proportional hazards model.
- For secondary malignancy data, fit a Poisson regression model to estimate rate ratios.
Patients’ perspective on post-operative success following bariatric surgery
- Using quality-of-life data, explore whether time or type of bariatric surgery influences the Physical Component Score (PCS) and Mental Component Score (MCS) of the SF-36 questionnaire.
- Assess whether PCS and MCS are related to Patient Health Questionnaire-9 (PHQ-9) and Rosenberg Self-Esteem Scale (RSE) scores.
Self-Reported Coping Strategies in Postlingually Deafened Adults and Speech Recognition Outcomes
- In a retrospective cohort study, characterize the degree to which individual coping strategies may influence speech perception following cochlear implantation.
- Perform correlation analysis among quality-of-life measures, speech outcome measures and coping strategy scores.