Jeremy Leipzig, PhD

Bioinformatics engineer & Technical PM. Reproducible research, genomics, pipelines, and metadata. Architect and product leader for early-stage therapeutic, diagnostic, and SaaS startups. O'Reilly author & PhD.

About Me

My career has dealt primarily with writing software to visualize, explore, and manipulate biological data. I have worked as a bioinformatics software engineer and product manager in academia, in industry, and at diagnostic, therapeutic, and platform startups. In my product development roles I have helped software companies understand and navigate the bioinformatics space. I have over 40 peer-reviewed publications and an O'Reilly book. My undergraduate and master's training was in biology and computer science, with an emphasis on bioinformatics and statistical genetics, and I hold a PhD in information science with a dissertation on reproducible research.

With over 25 years of experience spanning academia and industry, I combine deep technical expertise with product management and business strategy. My work ranges from fundamental algorithm development to scaling production systems for enterprise clients.

Career Highlights

  • 25+ years of experience spanning bioinformatics research, product management, and software development across academia and industry
  • Led four diagnostic, therapeutic, and SaaS startups in defining their bioinformatics product strategy
  • BS in Biology, MS in Computer Science, PhD in Information Science
  • 40+ peer-reviewed publications (4 first authorships). h-index of 30.
  • Expert in developing cloud-based pipelines for genomics, transcriptomics, and clinical applications
  • Founder of PhillyR user group and top-20 contributor to biostars.org bioinformatics Q&A community
  • Proven track record in scaling bioinformatics workflows from research prototypes to production systems
  • Author of O'Reilly book 'Data Mashups in R' and multiple bioinformatics software tools

Experience

08/2022 - Present

TileDB

Product Manager

Manage the population genomics product line, including product development, sales demos, customer support, and marketing.

  • Develop analysis workflows for biopharma and hospital partners
  • Lead product strategy for population genomics solutions
  • Drive customer acquisition and technical sales processes
  • Present TileDB solutions at industry conferences and in technical talks

09/2020 - 07/2022

Truwl

Content Lead

Led onboarding of tools, workflows, and high-impact analyses into the Truwl platform.

  • Developed benchmarking product and customer acquisition strategies
  • Curated and validated bioinformatics workflows for the platform
  • Established quality standards for computational reproducibility
  • Presented platform capabilities and workflow demonstrations

09/2017 - 11/2019

Panorama Medicine

Bioinformatics Engineer

Developed cloud-based pipelines and analyses for drug repositioning efforts. First employee.

  • Built scalable cloud infrastructure for drug discovery analytics
  • Led data mining and competitive intelligence research initiatives
  • Designed automated workflows for pharmacological data analysis

06/2017 - 08/2018

CytoVas LLC

Senior Bioinformatics Scientist

Scaled up flow cytometry workflows for laboratory-developed tests (LDTs).

  • Developed novel statistical analyses for cell subpopulation measurement
  • Analyzed extracellular vesicles in clinical trials and experimental assays
  • Implemented quality control systems for diagnostic applications

11/2010 - 06/2017

Children's Hospital of Philadelphia (CHOP)

Senior Data Integration Analyst & GRIN Informatics Lead

Led bioinformatics core operations and developed tools for genomic variant analysis.

  • Developed tools for mitochondrial and exome variant analysis
  • Created ChIP-Seq and RNA-Seq reproducible reporting systems
  • Built GRIN epilepsy analysis portal and Jupyter-based variant discovery platform
  • Developed myBiC portal for bioinformatics report deliverables
  • Led CHOP team in pediatric genomics consortium data management

07/2007 - 11/2010

DuPont Crop Genetics

Senior Research Associate

Developed bioinformatics tools for agricultural genomics and high-throughput screening.

  • Built transcriptome assembly analysis and miRNA target scanning tools
  • Developed laboratory information management systems (LIMS) for high-throughput mutagenesis screens
  • Implemented gene annotation pipelines for crop improvement programs

11/2004 - 07/2007

University of Pennsylvania - Bushman Lab

Bioinformatics Programmer

Developed bioinformatics pipeline for HIV integration site analysis.

  • Created annotation and statistical analysis tools for HIV integration sites
  • Analyzed microbial diversity and viral resistance mutations
  • Published research on retroviral DNA integration patterns

01/2003 - 11/2004

NC State University - Dept. of Electrical Engineering

Web Developer

Developed various applications to manage student, employee, and equipment records.

  • Built database management systems for academic records
  • Created web interfaces for equipment tracking
  • Implemented student and employee management applications

08/2000 - 12/2001

The Trout Group

Consultant

Developed scientific presentations used in investor road shows for biotechnology clients.

  • Created compelling scientific narratives for biotech investor presentations
  • Collaborated with Enchira and other biotechnology companies
  • Translated complex scientific concepts for investment audiences

08/1997 - 08/1999

UNC School of Medicine - Duncan Lab

Research Technician

Responsible for all techniques involved in 2-deoxyglucose autoradiography studies of ketamine-induced psychotomimetic effects in rodents.

  • Conducted neuropharmacology research on ketamine effects
  • Performed 2-deoxyglucose autoradiography studies
  • Analyzed psychotomimetic effects in rodent models

Selected Publications

40+ publications (3 first author, 1 sole author, 1 book chapter, 1 book, 1 dissertation)

The role of metadata in reproducible computational research

Leipzig, J., Nüst, D., Hoyt, C.T., Ram, K., and Greenberg, J.

Patterns (Cell Press), 2021

Hierarchy‐guided neural network for species classification

Elhamod, M., Diamond, K.M., Maga, A.M., Bakis, Y., Bart, H.L., Mabee, P.M., Dahdul, W., Leipzig, J., Greenberg, J., Avants, B., & Karpatne, A.

Methods in Ecology and Evolution, 2021

Tests of Robustness in Peer Review

Leipzig, J.

Drexel University, 2021

Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species†

Leipzig, J., Bakis, Y., Wang, X., Elhamod, M., Diamond, K., Dahdul, W., Karpatne, A., Maga, M., Mabee, P., Bart, H.L., et al.

Research Conference on Metadata and Semantics Research, 2020

Computational Pipelines and Workflows in Bioinformatics

Leipzig, J.

Reference Module in Life Sciences, Elsevier, 2018

Predicting the Pathogenicity of Novel Variants in Mitochondrial tRNA with MitoTIP

Sonney, S., Leipzig, J., Lott, M.T., Zhang, S., Procaccio, V., Wallace, D.C., & Sondheimer, N.

PLoS Computational Biology, 2017

Elevated frequency of damaging mt tRNA mutations in children with autism spectrum disorders

Chalkia, D., Singh, L.N., Leipzig, J., Lvova, M., Derbeneva, O., Lakatos, A., Hadley, D., Hakonarson, H., & Wallace, D.C.

PLOS ONE, 2017

A review of bioinformatic pipeline frameworks*

Leipzig, J.

Briefings in Bioinformatics, 2017

Phy-Mer: a novel alignment-free and reference-independent mitochondrial haplogroup classifier

Navarro-Gomez, D., Leipzig, J., Shen, L., Lott, M., Stassen, A.P., Wallace, D.C., Wiggs, J.L., Falk, M.J., van Oven, M., & Gai, X.

Bioinformatics, 2014

The Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots effort to promote sharing of mitochondrial DNA sequencing data

Schoenfeld, R.A., Wong, L.J., Singh, L.N., Dimmock, D., Leipzig, J., Sweetser, D.A., ... & McCormick, E.M.

Mitochondrion, 2014

Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data

Glessner, J.T., Bick, A.G., Ito, K., Homsy, J., Rodriguez-Murillo, L., Fromer, M., Mazaika, E., Vardarajan, B., Italia, M., Leipzig, J., ... & Goldmuntz, E.

Circulation Research, 2014

Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease

Zhang, B., Gaiteri, C., Bodea, L.G., Wang, Z., McElwee, J., Podtelezhnikov, A.A., Zhang, C., Xie, T., Tran, L., Dobrin, R., Fluder, E., Clurman, B., Melquist, S., Narayanan, M., Suver, C., Shah, H., Mahajan, M., Gillis, T., Mysore, J., MacDonald, M.E., Lamb, J.R., Bennett, D.A., Molony, C., Stone, D.J., Gudnason, V., Myers, A.J., Schadt, E.E., Neumann, H., Zhu, J., & Emilsson, V.

Cell, 2013

De novo mutations in histone-modifying genes in congenital heart disease

Zaidi, S., Choi, M., Wakimoto, H., Ma, L., Jiang, J., Overton, J.D., Romano-Adesman, A., Bjornson, R.D., Breitbart, R.E., Brown, K.K., Carriero, N.J., Cheung, Y.H., Deanfield, J., DePalma, S., Fakhro, K.A., Glessner, J., Hakonarson, H., Italia, M.J., Kaltman, J.R., Kaski, J., Kim, R., Kline, J.K., Lee, T., Leipzig, J., ... & Lifton, R.P.

Nature, 2013

MITOMAP and MITOMASTER: using the MITOMAP database to complement analysis of a novel mitochondrial DNA phenotype

Lott, M.T., Leipzig, J.N., Derbeneva, O., Xie, H.M., Chalkia, D., Sarmady, M., Procaccio, V., & Wallace, D.C.

Current Protocols in Bioinformatics, 2013

Gene expression profiling of the plasmodium of Physarum polycephalum

Barrantes, I., Leipzig, J., Marwan, W., & Starostzik, C.

PLOS ONE, 2012

IRF1 and miR-146a-5p inhibit glioblastoma cell growth through IGF-1R downregulation

Shi, Y., Chen, C., Zhang, X., Liu, Q., Xu, J.L., Zhang, H.R., Yao, X.H., Jiang, T., He, Z.C., Ren, Y., Cui, W., Xu, C., Liu, L., Cui, Y.H., Yu, S.Z., Ping, Y.F., Yao, X.H., Chen, J.N., Wang, B., Leipzig, J., ... & Bian, X.W.

Oncogene, 2011

Data Mashups in R

Leipzig, J., & Li, X.-Y.

O'Reilly Media, 2009

High-resolution human core-exome sequencing reveals a reduced penetrance of CFH and CFHR5 mutations in familial macular degeneration

Wang, K., Li, M., Hakonarson, H., Leipzig, J., & Bucan, M.

Human Genetics, 2008

HTLV-1 integration site selection: involvement of chromosomal fragile sites

Meekings, K.N., Leipzig, J., Bushman, F.D., Brighton, P., & Leib, D.

Virology, 2008

HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications

Wang, G.P., Ciuffi, A., Leipzig, J., Berry, C.C., & Bushman, F.D.

Genome Research, 2007

A genome-wide association study of HIV drug resistance

Hoffmann, C., Welz, T., Sabranski, M., Kolb, G., Wolf, E., Goebel, F.D., Leipzig, J., & Jaeger, H.

AIDS Research and Human Retroviruses, 2007

Selection of target sites for mobile DNA integration in the human genome

Berry, C., Hannenhalli, S., Leipzig, J., & Bushman, F.D.

PLoS Computational Biology, 2006

Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription

Lewinski, M.K., Bisgrove, D., Shinn, P., Chen, H., Hoffmann, C., Hannenhalli, S., Verdin, E., Berry, C.C., Ecker, J.R., & Bushman, F.D.

Journal of Virology, 2006

DNA repair, mutagenesis, and the control of retroviral integration

Barr, S.D., Leipzig, J., Shinn, P., Ecker, J.R., & Bushman, F.D.

DNA Repair, 2006

A role for LEDGF/p75 in targeting HIV DNA integration

Ciuffi, A., Llano, M., Poeschla, E., Hoffmann, C., Leipzig, J., Shinn, P., Ecker, J., & Bushman, F.

Nature Medicine, 2005

Host cell factors in HIV replication: meta-analysis of genome-wide studies

Bushman, F.D., Malani, N., Fernandes, J., D'Orso, I., Cagney, G., Diamond, T.L., Zhou, H., Hazuda, D.J., Espeseth, A.S., König, R., Bandyopadhyay, S., Ideker, T., Goff, S.P., Krogan, N.J., Frankel, A.D., Young, J.A., & Chanda, S.K.

Nature Reviews Microbiology, 2005

Integration targeting by avian sarcoma-leukosis virus and human immunodeficiency virus in the chicken genome

Barr, S.D., Leipzig, J., Shinn, P., Ecker, J.R., & Bushman, F.D.

Journal of Virology, 2005

The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome

Leipzig, J., Pevzner, P., & Heber, S.

Nucleic Acids Research, 2004

Effects of chronic administration of selected atypical antipsychotics on monoamine levels in rat striatum

Duncan, G.E., Leipzig, J.N., Mailman, R.B., & Lieberman, J.A.

Neuropharmacology, 2000

Differential effects of clozapine and haloperidol on ketamine-induced brain metabolic activation

Duncan, G.E., Leipzig, J.N., Mailman, R.B., & Lieberman, J.A.

Brain Research, 1999

Repeated administration of haloperidol, risperidone, or olanzapine to rats does not produce the pattern of metabolic changes between brain regions found in subjects with schizophrenia

Duncan, G.E., Sheitman, B.B., Leipzig, J.N., Adigun, O.K., & Lieberman, J.A.

Neuropsychopharmacology, 1998

* ISI Highly Cited

† Best Research Paper: 14th International Conference on Metadata and Semantics Research

Education

Drexel University

PhD in Information Science

Dissertation on reproducible research and metadata in bioinformatics

North Carolina State University

Master of Computer Science

Focus on statistical genetics and alternative splicing

Wake Forest University

Bachelor of Science in Biology

Research experience in neurobiology

Research Interests

Reproducible Research & Metadata

Developing frameworks and tools to ensure computational reproducibility in biological research, with a focus on metadata standards and pipeline documentation

Genomic Variant Analysis

Creating tools for mitochondrial genetics, exome analysis, and clinical genomics applications with emphasis on rare disease diagnostics

Cloud-Scale Bioinformatics

Building scalable workflows and platforms for population genomics, drug discovery, and precision medicine using AWS and GCP infrastructure

Bioinformatics Product Strategy

Translating research methodologies into commercial bioinformatics products, from startup strategy to enterprise solutions

Technical Skills

Programming Languages

R, Python, Perl, Java, Groovy, Scala, JavaScript, C++, C, PHP

Workflow Systems

Snakemake, WDL, Nextflow, CWL, Galaxy, Docker, Singularity

Cloud Platforms

AWS, Google Cloud Platform, Azure, Kubernetes, Terraform

Bioinformatics Tools

BLAST, BWA, GATK, SAMtools, Bioconductor, IGV, SnpEff, VEP

Data Analysis

RNA-Seq, ChIP-Seq, ATAC-Seq, Exome Analysis, Population Genomics, Flow Cytometry

Web & Database

Django, Flask, Node.js, jQuery, D3.js, RESTful APIs, PostgreSQL, MongoDB, TileDB, Grails

Notable Software & Tools

MITOMASTER

Web Application

Web application that allows clinicians to quickly investigate mitochondrial mutations in sequenced or genotyped samples. Used worldwide for mitochondrial genetics research.

myBiC

Django/Python

Django application that manages user authentication and presentation of bioinformatics deliverables. Streamlines report delivery for core facilities.

InSiPiD

Pipeline

Integration Site Pipeline and Database: a toolset for managing viral integration site data processing, annotation, and analysis.

placenta

R/Bioinformatics

Comprehensive analysis pipeline and resources for placental genomics research and developmental studies.

opensnp

HTML/R (7 stars)

Validation of published GWAS findings using volunteered OpenSNP data. Binderized for reproducible analysis.

m6a

R/Bioinformatics

Analysis tools and workflows for N6-methyladenosine (m6A) RNA modification detection and quantification.

metadata-in-rcr

Documentation

Resources and examples for metadata standards in reproducible computational research. Supporting materials for academic publications.

clk

Python

Command-line toolkit for bioinformatics workflows and data processing automation.

fcs_flow_cytometry

R/Python

Core flow cytometry data analysis tools and utilities for FCS file processing.

fcs_parser

Python

Parser and data extraction tools for Flow Cytometry Standard (FCS) files.

fcs_utils

R

Utility functions and helper tools for flow cytometry data manipulation and quality control.

mcmanus_ant1

R/Bioinformatics

Analysis pipeline for mitochondrial ANT1 gene studies and associated metabolic pathways.

asciiruler

Python

Command-line tool for generating ASCII rulers and measurement guides for sequence analysis.

standard-velvet-assembly-report

Perl/Shell

Standardized reporting pipeline for Velvet genome assembly results and quality metrics.

awesome-reproducible-research

Python

A curated list of reproducible research case studies, projects, tutorials, and media. Community-driven resource with 356+ stars.

SandwichesWithSnakemake

Tutorial

Beginner's tutorial for the Snakemake workflow management system. Popular educational resource with 70+ stars.

snakemake-example

CSS/Snakemake

RNA-Seq Snakemake example with Jekyll homepage creation. Complete workflow demonstration with 20+ stars.

berrylogo

R

A better seqLogo implementation for creating sequence logos in R. Enhanced visualization tool with 12+ stars.

ancestryinformativemarkers

Genomics (7 stars)

Public Ancestry Informative Markers (AIMs) dataset and tools for population genetics analysis.

blast-wrapper

Node.js

Node.js web-service wrapper for fetching pairwise BLAST alignments against a fixed reference.

ontopop

Python

Analysis tool that measures the popularity and usage of biological ontologies across the scientific literature.

Contact

Whitefish, MT
TileDB