
Jeremy Leipzig, PhD
Bioinformatics engineer & Technical PM. Reproducible research, genomics, pipelines, and metadata. Architect and product leader for early-stage therapeutic, diagnostic, and SaaS startups.
About Me
Career Highlights
- 25+ years of experience spanning bioinformatics research, product management, and software development across academia and industry
- Led four diagnostic, therapeutic, and SaaS startups in defining their bioinformatics product strategy
- BS in Biology, MS in Computer Science, PhD in Information Science
- 40+ peer-reviewed publications (4 first authorships). h-index of 30.
- Expert in developing cloud-based pipelines for genomics, transcriptomics, and clinical applications
- Founder of PhillyR user group and top-20 contributor to biostars.org bioinformatics Q&A community
- Proven track record in scaling bioinformatics workflows from research prototypes to production systems
- Author of O'Reilly book 'Data Mashups in R' and multiple bioinformatics software tools
Experience
TileDB
Product Manager
Manage the population genomics product line, including product development, sales demos, customer support, and marketing.
- Develop analysis workflows for biopharma and hospital partners
- Lead product strategy for population genomics solutions
- Drive customer acquisition and technical sales processes
Truwl
Content Lead
Led onboarding of tools, workflows, and high-impact analyses into the Truwl platform.
- Developed benchmarking product and customer acquisition strategies
- Curated and validated bioinformatics workflows for the platform
- Established quality standards for computational reproducibility
Panorama Medicine
Bioinformatics Engineer
Developed cloud-based pipelines and analysis for drug repositioning efforts. First employee.
- Built scalable cloud infrastructure for drug discovery analytics
- Led data mining and competitive intelligence research initiatives
- Designed automated workflows for pharmacological data analysis
CytoVas LLC
Senior Bioinformatics Scientist
Scaled up flow cytometry workflows for laboratory-developed tests.
- Developed novel statistical analyses for cell subpopulation measurement
- Analyzed extracellular vesicles in clinical trials and experimental assays
- Implemented quality control systems for diagnostic applications
Children's Hospital of Philadelphia (CHOP)
Senior Data Integration Analyst & GRIN Informatics Lead
Led bioinformatics core operations and developed tools for genomic variant analysis.
- Developed tools for mitochondrial and exome variant analysis
- Created ChIP-Seq and RNA-Seq reproducible reporting systems
- Built GRIN epilepsy analysis portal and Jupyter-based variant discovery platform
- Developed myBiC portal for bioinformatics report deliverables
- Led CHOP team in pediatric genomics consortium data management
DuPont Crop Genetics
Senior Research Associate
Developed bioinformatics tools for agricultural genomics and high-throughput screening.
- Built transcriptome assembly analysis and miRNA target scanning tools
- Developed LIMS for high-throughput mutagenesis screens
- Implemented gene annotation pipelines for crop improvement programs
University of Pennsylvania - Bushman Lab
Bioinformatics Programmer
Developed a bioinformatics pipeline for HIV integration site analysis.
- Created annotation and statistical analysis tools for HIV integration sites
- Analyzed microbial diversity and viral resistance mutations
- Published groundbreaking research on retroviral DNA integration patterns
Selected Publications
Hierarchy‐guided neural network for species classification
Methods in Ecology and Evolution, 2021
Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species†
Research Conference on Metadata and Semantics Research, 2020
Computational Pipelines and Workflows in Bioinformatics
Reference Module in Life Sciences, Elsevier, 2018
Predicting the Pathogenicity of Novel Variants in Mitochondrial tRNA with MitoTIP
PLoS Computational Biology, 2017
Elevated frequency of damaging mt tRNA mutations in children with autism spectrum disorders
PLOS ONE, 2017
Phy-Mer: a novel alignment-free and reference-independent mitochondrial haplogroup classifier
Bioinformatics, 2014
The Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots effort to promote sharing of mitochondrial DNA sequencing data
Mitochondrion, 2014
Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data
Circulation Research, 2014
Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease
Cell, 2013
MITOMAP and MITOMASTER: using the MITOMAP database to complement analysis of a novel mitochondrial DNA phenotype
Current Protocols in Bioinformatics, 2013
IRF1 and miR-146a-5p inhibit glioblastoma cell growth through IGF-1R downregulation
Oncogene, 2011
High-resolution human core-exome sequencing reveals a reduced penetrance of CFH and CFHR5 mutations in familial macular degeneration
Human Genetics, 2008
HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications
Genome Research, 2007
A genome-wide association study of HIV drug resistance
AIDS Research and Human Retroviruses, 2007
Selection of target sites for mobile DNA integration in the human genome
PLoS Computational Biology, 2006
Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription
Journal of Virology, 2006
Host cell factors in HIV replication: meta-analysis of genome-wide studies
Nature Reviews Microbiology, 2005
Integration targeting by avian sarcoma-leukosis virus and human immunodeficiency virus in the chicken genome
Journal of Virology, 2005
The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome
Nucleic Acids Research, 2004
Effects of chronic administration of selected atypical antipsychotics on monoamine levels in rat striatum
Neuropharmacology, 2000
Differential effects of clozapine and haloperidol on ketamine-induced brain metabolic activation
Brain Research, 1999
Repeated administration of haloperidol, risperidone, or olanzapine to rats does not produce the pattern of metabolic changes between brain regions found in subjects with schizophrenia
Neuropsychopharmacology, 1998
* ISI Highly Cited
† Best Research Paper: 14th International Conference on Metadata and Semantics Research
Education
Drexel University
PhD in Information Science
Specialized in computational biology, reproducible research, and bioinformatics methodology
North Carolina State University
Master of Computer Science
Focus on algorithms, data structures, and software engineering
Wake Forest University
Bachelor of Science in Biology
Pre-medical track with research experience in neuropharmacology
Research Interests
Reproducible Research & Metadata
Developing frameworks and tools to ensure computational reproducibility in biological research, with focus on metadata standards and pipeline documentation
Genomic Variant Analysis
Creating tools for mitochondrial genetics, exome analysis, and clinical genomics applications with emphasis on rare disease diagnostics
Cloud-Scale Bioinformatics
Building scalable workflows and platforms for population genomics, drug discovery, and precision medicine using AWS and GCP infrastructure
Bioinformatics Product Strategy
Translating research methodologies into commercial bioinformatics products, from startup strategy to enterprise solutions
Technical Skills
Programming Languages
Workflow Systems
Cloud Platforms
Bioinformatics Tools
Data Analysis
Web Technologies
Product Management
Notable Software & Tools
MITOMASTER
Web application that allows clinicians to quickly investigate mitochondrial mutations in sequenced or genotyped samples. Used worldwide for mitochondrial genetics research.
myBiC
Django application that manages user authentication and presentation of bioinformatics deliverables. Streamlines report delivery for core facilities.
MitoTIP
Machine learning tool for predicting pathogenicity of novel variants in mitochondrial tRNA genes. Published in PLoS Computational Biology.
InSiPiD
Integration Site Pipeline and Database - comprehensive toolset for managing viral integration site data processing, annotation, and analysis.
awesome-reproducible-research
A curated list of reproducible research case studies, projects, tutorials, and media. Community-driven resource with 356+ stars.
SandwichesWithSnakemake
Beginner's tutorial for the Snakemake workflow management system. Popular educational resource with 70+ stars.
snakemake-example
RNA-Seq Snakemake example with Jekyll homepage creation. Complete workflow demonstration with 20+ stars.
berrylogo
A better seqLogo implementation for creating sequence logos in R. Enhanced visualization tool with 12+ stars.
ancestryinformativemarkers
Public Ancestry Informative Markers (AIMs) dataset and tools for population genetics analysis.
opensnp
Validation of published GWAS studies using OpenSNP volunteered data. Binderized for reproducible analysis.
blast-wrapper
Node.js web-service wrapper for fetching pairwise BLAST alignments against a fixed reference.
ontopop
Analysis tool to measure popularity and usage of biological ontologies across scientific literature.