Jeremy Leipzig, PhD

Bioinformatics engineer & Technical PM. Reproducible research, genomics, pipelines, and metadata. Architect and product leader for early-stage therapeutic, diagnostic, and SaaS startups.

About Me

Career Highlights

  • 25+ years of experience spanning bioinformatics research, product management, and software development across academia and industry
  • Led four diagnostic, therapeutic, and SaaS startups in defining their bioinformatics product strategy
  • BS in Biology, MS in Computer Science, PhD in Information Science
  • 40+ peer-reviewed publications (4 first authorships). h-index of 30.
  • Expert in developing cloud-based pipelines for genomics, transcriptomics, and clinical applications
  • Founder of PhillyR user group and top-20 contributor to biostars.org bioinformatics Q&A community
  • Proven track record in scaling bioinformatics workflows from research prototypes to production systems
  • Author of O'Reilly book 'Data Mashups in R' and multiple bioinformatics software tools

Experience

08/2022 - Present

TileDB

Product Manager

Manage the population genomics product line, including product development, sales demos, customer support, and marketing.

  • Develop analysis workflows for biopharma and hospital partners
  • Lead product strategy for population genomics solutions
  • Drive customer acquisition and technical sales processes

09/2020 - 07/2022

Truwl

Content Lead

Led onboarding of tools, workflows, and high-impact analyses into the Truwl platform.

  • Developed benchmarking product and customer acquisition strategies
  • Curated and validated bioinformatics workflows for the platform
  • Established quality standards for computational reproducibility

09/2017 - 11/2019

Panorama Medicine

Bioinformatics Engineer

Developed cloud-based pipelines and analysis for drug repositioning efforts. First employee.

  • Built scalable cloud infrastructure for drug discovery analytics
  • Led data mining and competitive intelligence research initiatives
  • Designed automated workflows for pharmacological data analysis

06/2017 - 08/2018

CytoVas LLC

Senior Bioinformatics Scientist

Scaled up flow cytometry workflows for lab developed tests.

  • Developed novel statistical analyses for cell subpopulation measurement
  • Analyzed extracellular vesicles in clinical trials and experimental assays
  • Implemented quality control systems for diagnostic applications

11/2010 - 06/2017

Children's Hospital of Philadelphia (CHOP)

Senior Data Integration Analyst & GRIN Informatics Lead

Led bioinformatics core operations and developed tools for genomic variant analysis.

  • Developed tools for mitochondrial and exome variant analysis
  • Created ChIP-Seq and RNA-Seq reproducible reporting systems
  • Built GRIN epilepsy analysis portal and Jupyter-based variant discovery platform
  • Developed myBiC portal for bioinformatics report deliverables
  • Led CHOP team in pediatric genomics consortium data management

07/2007 - 11/2010

DuPont Crop Genetics

Senior Research Associate

Developed bioinformatics tools for agricultural genomics and high-throughput screening.

  • Built transcriptome assembly analysis and miRNA target scanning tools
  • Developed LIMS systems for high throughput mutagenesis screens
  • Implemented gene annotation pipelines for crop improvement programs

11/2004 - 07/2007

University of Pennsylvania - Bushman Lab

Bioinformatics Programmer

Developed bioinformatics pipeline for HIV integration site analysis.

  • Created annotation and statistical analysis tools for HIV integration sites
  • Analyzed microbial diversity and viral resistance mutations
  • Co-authored research on retroviral DNA integration patterns, published in Genome Research, Nature Medicine, and PLoS Computational Biology

Selected Publications

40+ publications (3 first author, 1 sole author, 1 book chapter, 1 book, 1 dissertation)

The role of metadata in reproducible computational research

Leipzig, J., Nüst, D., Hoyt, C.T., Ram, K., & Greenberg, J.

Patterns (Cell Press), 2021

Hierarchy‐guided neural network for species classification

Elhamod, M., Diamond, K.M., Maga, A.M., Bakis, Y., Bart, H.L., Mabee, P.M., Dahdul, W., Leipzig, J., Greenberg, J., Avants, B., & Karpatne, A.

Methods in Ecology and Evolution, 2021

Tests of Robustness in Peer Review

Leipzig, J.

Drexel University, 2021

Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species†

Leipzig, J., Bakis, Y., Wang, X., Elhamod, M., Diamond, K., Dahdul, W., Karpatne, A., Maga, M., Mabee, P., Bart, H.L., et al.

Research Conference on Metadata and Semantics Research, 2020

Computational Pipelines and Workflows in Bioinformatics

Leipzig, J.

Reference Module in Life Sciences, Elsevier, 2018

Predicting the Pathogenicity of Novel Variants in Mitochondrial tRNA with MitoTIP

Sonney, S., Leipzig, J., Lott, M.T., Zhang, S., Procaccio, V., Wallace, D.C., & Sondheimer, N.

PLoS Computational Biology, 2017

Elevated frequency of damaging mt tRNA mutations in children with autism spectrum disorders

Chalkia, D., Singh, L.N., Leipzig, J., Lvova, M., Derbeneva, O., Lakatos, A., Hadley, D., Hakonarson, H., & Wallace, D.C.

PLOS ONE, 2017

A review of bioinformatic pipeline frameworks*

Leipzig, J.

Briefings in Bioinformatics, 2017

Phy-Mer: a novel alignment-free and reference-independent mitochondrial haplogroup classifier

Navarro-Gomez, D., Leipzig, J., Shen, L., Lott, M., Stassen, A.P., Wallace, D.C., Wiggs, J.L., Falk, M.J., van Oven, M., & Gai, X.

Bioinformatics, 2014

The Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots effort to promote sharing of mitochondrial DNA sequencing data

Schoenfeld, R.A., Wong, L.J., Singh, L.N., Dimmock, D., Leipzig, J., Sweetser, D.A., ... & McCormick, E.M.

Mitochondrion, 2014

Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data

Glessner, J.T., Bick, A.G., Ito, K., Homsy, J., Rodriguez-Murillo, L., Fromer, M., Mazaika, E., Vardarajan, B., Italia, M., Leipzig, J., ... & Goldmuntz, E.

Circulation Research, 2014

Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease

Zhang, B., Gaiteri, C., Bodea, L.G., Wang, Z., McElwee, J., Podtelezhnikov, A.A., Zhang, C., Xie, T., Tran, L., Dobrin, R., Fluder, E., Clurman, B., Melquist, S., Narayanan, M., Suver, C., Shah, H., Mahajan, M., Gillis, T., Mysore, J., MacDonald, M.E., Lamb, J.R., Bennett, D.A., Molony, C., Stone, D.J., Gudnason, V., Myers, A.J., Schadt, E.E., Neumann, H., Zhu, J., & Emilsson, V.

Cell, 2013

De novo mutations in histone-modifying genes in congenital heart disease

Zaidi, S., Choi, M., Wakimoto, H., Ma, L., Jiang, J., Overton, J.D., Romano-Adesman, A., Bjornson, R.D., Breitbart, R.E., Brown, K.K., Carriero, N.J., Cheung, Y.H., Deanfield, J., DePalma, S., Fakhro, K.A., Glessner, J., Hakonarson, H., Italia, M.J., Kaltman, J.R., Kaski, J., Kim, R., Kline, J.K., Lee, T., Leipzig, J., ... & Lifton, R.P.

Nature, 2013

MITOMAP and MITOMASTER: using the MITOMAP database to complement analysis of a novel mitochondrial DNA phenotype

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., & Wallace, D.C.

Current Protocols in Bioinformatics, 2013

Gene expression profiling of the plasmodium of Physarum polycephalum

Barrantes, I., Leipzig, J., Marwan, W., & Starostzik, C.

PLOS ONE, 2012

IRF1 and miR-146a-5p inhibit glioblastoma cell growth through IGF-1R downregulation

Shi, Y., Chen, C., Zhang, X., Liu, Q., Xu, J.L., Zhang, H.R., Yao, X.H., Jiang, T., He, Z.C., Ren, Y., Cui, W., Xu, C., Liu, L., Cui, Y.H., Yu, S.Z., Ping, Y.F., Yao, X.H., Chen, J.N., Wang, B., Leipzig, J., ... & Bian, X.W.

Oncogene, 2011

Data Mashups in R

Leipzig, J., & Li, X.-Y.

O'Reilly Media, 2009

High-resolution human core-exome sequencing reveals a reduced penetrance of CFH and CFHR5 mutations in familial macular degeneration

Wang, K., Li, M., Hakonarson, H., Leipzig, J., & Bucan, M.

Human Genetics, 2008

HTLV-1 integration site selection: involvement of chromosomal fragile sites

Meekings, K.N., Leipzig, J., Bushman, F.D., Brighton, P., & Leib, D.

Virology, 2008

HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications

Wang, G.P., Ciuffi, A., Leipzig, J., Berry, C.C., & Bushman, F.D.

Genome Research, 2007

A genome-wide association study of HIV drug resistance

Hoffmann, C., Welz, T., Sabranski, M., Kolb, G., Wolf, E., Goebel, F.D., Leipzig, J., & Jaeger, H.

AIDS Research and Human Retroviruses, 2007

Selection of target sites for mobile DNA integration in the human genome

Berry, C., Hannenhalli, S., Leipzig, J., & Bushman, F.D.

PLoS Computational Biology, 2006

Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription

Lewinski, M.K., Bisgrove, D., Shinn, P., Chen, H., Hoffmann, C., Hannenhalli, S., Verdin, E., Berry, C.C., Ecker, J.R., & Bushman, F.D.

Journal of Virology, 2006

DNA repair, mutagenesis, and the control of retroviral integration

Barr, S.D., Leipzig, J., Shinn, P., Ecker, J.R., & Bushman, F.D.

DNA Repair, 2006

A role for LEDGF/p75 in targeting HIV DNA integration

Ciuffi, A., Llano, M., Poeschla, E., Hoffmann, C., Leipzig, J., Shinn, P., Ecker, J.R., & Bushman, F.D.

Nature Medicine, 2005

Host cell factors in HIV replication: meta-analysis of genome-wide studies

Bushman, F.D., Malani, N., Fernandes, J., D'Orso, I., Cagney, G., Diamond, T.L., Zhou, H., Hazuda, D.J., Espeseth, A.S., König, R., Bandyopadhyay, S., Ideker, T., Goff, S.P., Krogan, N.J., Frankel, A.D., Young, J.A., & Chanda, S.K.

Nature Reviews Microbiology, 2005

Integration targeting by avian sarcoma-leukosis virus and human immunodeficiency virus in the chicken genome

Barr, S.D., Leipzig, J., Shinn, P., Ecker, J.R., & Bushman, F.D.

Journal of Virology, 2005

The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome

Leipzig, J., Pevzner, P., & Heber, S.

Nucleic Acids Research, 2004

Effects of chronic administration of selected atypical antipsychotics on monoamine levels in rat striatum

Duncan, G.E., Leipzig, J.N., Mailman, R.B., & Lieberman, J.A.

Neuropharmacology, 2000

Differential effects of clozapine and haloperidol on ketamine-induced brain metabolic activation

Duncan, G.E., Leipzig, J.N., Mailman, R.B., & Lieberman, J.A.

Brain Research, 1999

Repeated administration of haloperidol, risperidone, or olanzapine to rats does not produce the pattern of metabolic changes between brain regions found in subjects with schizophrenia

Duncan, G.E., Sheitman, B.B., Leipzig, J.N., Adigun, O.K., & Lieberman, J.A.

Neuropsychopharmacology, 1998

* ISI Highly Cited

† Best Research Paper: 14th International Conference on Metadata and Semantics Research

Education

2021

Drexel University

PhD in Information Science

Specialized in computational biology, reproducible research, and bioinformatics methodology

2003

North Carolina State University

Master of Computer Science

Focus on algorithms, data structures, and software engineering

1997

Wake Forest University

Bachelor of Science in Biology

Pre-medical track with research experience in neuropharmacology

Research Interests

Reproducible Research & Metadata

Developing frameworks and tools to ensure computational reproducibility in biological research, with focus on metadata standards and pipeline documentation

Genomic Variant Analysis

Creating tools for mitochondrial genetics, exome analysis, and clinical genomics applications with emphasis on rare disease diagnostics

Cloud-Scale Bioinformatics

Building scalable workflows and platforms for population genomics, drug discovery, and precision medicine using AWS and GCP infrastructure

Bioinformatics Product Strategy

Translating research methodologies into commercial bioinformatics products, from startup strategy to enterprise solutions

Technical Skills

Programming Languages

R, Python, Perl, Java, Groovy, Scala, JavaScript, C++, C, PHP

Workflow Systems

Snakemake, WDL, Nextflow, CWL, Galaxy, Docker, Singularity

Cloud Platforms

AWS, Google Cloud Platform, Azure, Kubernetes, Terraform

Bioinformatics Tools

BLAST, BWA, GATK, SAMtools, Bioconductor, IGV, SnpEff, VEP

Data Analysis

RNA-Seq, ChIP-Seq, ATAC-Seq, Exome Analysis, Population Genomics, Flow Cytometry

Web Technologies

Django, Flask, Node.js, jQuery, D3.js, RESTful APIs, PostgreSQL, MongoDB

Product Management

Linear, Shortcut, JIRA, Asana, PRDs, User Stories, Roadmap Planning, Agile/Scrum, Kanban, OKRs, A/B Testing, Customer Interviews, Market Research, Go-to-Market Strategy, Feature Prioritization, Stakeholder Management, Technical Sales, Product Analytics

Notable Software & Tools

MITOMASTER

Web Application

Web application that allows clinicians to quickly investigate mitochondrial mutations in sequenced or genotyped samples. Used worldwide for mitochondrial genetics research.

myBiC

Django/Python

Django application that manages user authentication and presentation of bioinformatics deliverables. Streamlines report delivery for core facilities.

MitoTIP

Machine Learning

Machine learning tool for predicting pathogenicity of novel variants in mitochondrial tRNA genes. Published in PLoS Computational Biology.

InSiPiD

Pipeline

Integration Site Pipeline and Database - comprehensive toolset for managing viral integration site data processing, annotation, and analysis.

awesome-reproducible-research

Python (356 stars)

A curated list of reproducible research case studies, projects, tutorials, and media. Community-driven resource with 356+ stars.

SandwichesWithSnakemake

Tutorial (70 stars)

Beginner's tutorial to Snakemake workflow management system. Popular educational resource with 70+ stars.

snakemake-example

CSS/Snakemake (20 stars)

RNA-Seq Snakemake example with Jekyll homepage creation. Complete workflow demonstration with 20+ stars.

berrylogo

R (12 stars)

A better seqLogo implementation for creating sequence logos in R. Enhanced visualization tool with 12+ stars.

ancestryinformativemarkers

Genomics (7 stars)

Public Ancestry Informative Markers (AIMs) dataset and tools for population genetics analysis.

opensnp

HTML/R (7 stars)

Validation of published GWAS studies using OpenSNP volunteered data. Binderized for reproducible analysis.

blast-wrapper

Node.js

Node.js web-service wrapper for fetching pairwise BLAST alignments against a fixed reference.

ontopop

Python

Analysis tool to measure popularity and usage of biological ontologies across scientific literature.

Contact

Whitefish, MT
TileDB