Projects
Thyroid carcinoma is the most common endocrine malignancy in children, and current guidelines recommend total thyroidectomy for nearly all pediatric cases. While effective, the procedure carries higher complication risks in children, including hypoparathyroidism and nerve injury. Improved preoperative diagnostics could reduce unnecessary surgeries and lifelong hormone dependence. Existing imaging-based approaches are subjective and variable. In this study, we demonstrate that genome-wide DNA methylation profiling robustly captures molecular features of pediatric thyroid carcinoma, including invasiveness and driver mutations. These findings support the potential of DNA methylation as a preoperative prognostic tool to inform treatment decisions and minimize surgical risk.

Clonal hematopoiesis of indeterminate potential (CHIP) arises from somatic mutations in hematopoietic stem cells that confer a growth advantage, and it is influenced by germline variation in DNA damage response (DDR) genes. Whether germline BRCA1/2 (gBRCA1/2) carrier status predisposes individuals to CHIP independent of cancer diagnosis or genotoxic therapy exposure remains unclear. Whole-exome sequencing data from the Penn Medicine Biobank (PMBB) were processed through somatic and germline variant calling pipelines to identify CHIP variants and gBRCA1/2 carriers, respectively. gBRCA1/2 carrier status was independently associated with increased CHIP prevalence in a propensity-matched population, suggesting that inherited DNA repair deficiency may promote clonal hematopoiesis in the absence of genotoxic therapy.

Breast cancer is a leading cause of cancer morbidity among women worldwide. Large biobanks linked to electronic health records enable ancestry-aware genetic discovery at scale. In this study, I used imputed genotype data from the Penn Medicine Biobank to perform an ancestry-stratified genome-wide association study (GWAS) of breast cancer risk. Employing clinical ICD-9/10 codes for case/control selection and genetic ancestry inference, I analyzed breast cancer susceptibility using REGENIE, a program for whole-genome regression modeling of large genome-wide association studies. These results contributed to the NCI-led Confluence Project, a large international consortium that has conducted the largest and most ancestrally diverse GWAS of breast cancer to date, nearly tripling the effective sample size of previous GWAS and substantially increasing sample diversity.

FALL 2025
Exome-wide association study of breast cancer in the Penn Medicine Biobank
Python
R
Shell scripting
REGENIE
PLINK
Exome
LPC
Breast cancer susceptibility is influenced not only by common genetic variation but also by rare, protein-altering variants that are not well captured by traditional genome-wide association studies. Whole-exome sequencing in large, clinically linked biobanks enables systematic interrogation of these rare variants at scale. In this study, we use whole-exome sequencing data from the Penn Medicine Biobank to perform an exome-wide association study of breast cancer risk using both single-variant and gene-based aggregation approaches. Employing ICD-9/10 codes for case/control selection, functional variant annotation, and multiple burden masks within the REGENIE mixed-model framework, we assess the contribution of rare coding variation to breast cancer susceptibility. This work establishes a scalable and reproducible exome analysis pipeline and contributes to the SIMPLEXO breast cancer project, supporting gene-level discovery in large biobank cohorts.
Conferences & Presentations
APRIL 2026
Second author, Poster Presentation at the AACR Annual Meeting; Cancer Research
SUMMER 2025
Methylation-based Prognosis of Pediatric Thyroid Carcinoma Invasiveness
American Physician Scientists Association Mid-Atlantic Conference
SPRING 2025
DNA methylation-based stratification of pediatric thyroid tumor invasiveness
University of Pennsylvania Spring Research Exposition & Women in STEM Symposium
FALL 2023
Cross-platform DNA methylome-based cancer classification
MidAtlantic Bioinformatics Conference
FALL 2023
Exploration of feature selection strategies in DNA methylome-based cancer classification
University of Pennsylvania Fall Research Exposition
