

Genetics Data Engineer

 

Work Your Magic with us!  

 

Ready to explore, break barriers, and discover more? We know you’ve got big plans – so do we! Our colleagues across the globe love innovating with science and technology to enrich people’s lives with our solutions in Healthcare, Life Science, and Electronics. Together, we dream big and are passionate about caring for our rich mix of people, customers, patients, and planet. That's why we are always looking for curious minds that see themselves imagining the unimaginable with us.  

 

United As One for Patients, our purpose in Healthcare is to help create, improve and prolong lives. We develop medicines, intelligent devices and innovative technologies in therapeutic areas such as Oncology, Neurology and Fertility. Our teams work together across 6 continents with passion and relentless curiosity to help patients at every stage of life. Joining our Healthcare team means becoming part of a diverse, inclusive and flexible working culture that presents great opportunities for personal development and career advancement across the globe.

 

Senior Principal Data Engineer - Genetics 

Your Role 

You will advance our human quantitative genetics strategy by providing the data engineering fundamentals that enable downstream analysis. You will collaborate with quantitative geneticists, data scientists, data engineers, platform experts, IT, and others to ensure that data availability and quality are never the bottlenecks for our analyses. In that collaboration, you will provide the vision and implementation for how our FAIR data environment works across internal and external platforms such as biobank trusted research environments (TREs). 

You will work with other scientists to: 

  • Bring software tools, reference datasets, and genetic data into internal environments. 

  • Bring tools, containers, and reference data into TREs (e.g., UK Biobank Research Analysis Platform, All of Us Researcher Workbench) and manage their deployment and versioning. 

  • Ingest and maintain connections to genetic reference databases (OpenTargets, GWAS Catalog, ClinVar, dbSNP, OMIM, HGMD, ChEMBL, DrugBank) and integrate them with the internal knowledge graph (Synaptix) and analytics platforms. 

  • Perform and automate data QC for diverse genomic data types including SNP arrays, whole-exome sequencing (WES), whole-genome sequencing (WGS), and GWAS summary statistics. 

  • Develop, test, and execute reproducible analysis pipelines using workflow managers (e.g., Nextflow, WDL, Snakemake) and containerized environments (Docker, Singularity) for deployment within TREs. 

  • Return results from TREs to our internal platforms in accordance with each biobank's privacy and data protection policies. 

  • Optimize query performance and pipeline execution to support rapid-turnaround target assessments and in-licensing due diligence (~20-25 targets per year requiring fast genetic evaluation). 

  • Contribute to the design and implementation of agentic AI workflows for automated genetic evidence generation, integrating genetics pipelines with the broader agentic AI platform. 

  • Build and maintain interactive dashboards and data services that expose genetic evidence to project teams, leadership, and due diligence committees. 

  • Link genetic data to our AI tools and platforms, ensuring seamless data flow between genetic analyses and downstream decision-support systems. 

  • Automate routine analyses including standard safety assessments, target-disease association lookups, and genetic evidence reports to minimize geneticist time on repetitive tasks. 

  • Manage cloud compute budgets and optimize resource usage within TREs to maximize analytical throughput within allocated funding. 

 

Who You Are 

You have substantial expertise in data engineering for scientific and genomic data and are comfortable working on both strategic questions and hands-on implementation. You have:

  • Bachelor's or Master's degree in computer science, data engineering, bioinformatics, or a related field. 

  • A minimum of 5 years of relevant experience in data engineering, with significant exposure to genomics or bioinformatics data. 

  • Strong experience building production-grade data pipelines for genetic and genomic datasets, including familiarity with common formats (VCF, PLINK/BED/BIM/FAM, BGEN, GWAS summary statistics). 

  • Hands-on experience with biobank trusted research environments such as UK Biobank (DNAnexus), All of Us Researcher Workbench, or similar platforms. 

  • Strong expertise in Python and R for data ingestion, processing, cleaning, and pipeline orchestration. 

  • Experience with genomics-specific tools and frameworks (e.g., Hail, PLINK, bcftools, samtools, liftOver) and workflow managers (Nextflow, WDL, Snakemake). 

  • Expertise in working with large, complex datasets in cloud or HPC environments, including tools such as Spark, S3, and cloud-native compute platforms (AWS, GCP, Azure). 

  • Experience with containerization (Docker, Singularity) and infrastructure-as-code practices for reproducible deployments in secure environments. 

  • Solid understanding of data modeling, versioning, and reproducibility principles. 

  • Experience with methods and requirements for medical and genetic data privacy, including biobank data governance and controlled-access data handling. 

 

Preferred Qualifications 

  • Experience with additional biobank platforms or multi-ethnic datasets (e.g., Biobank Japan, Galatea, FinnGen). 

  • Familiarity with agentic AI frameworks or experience building LLM-integrated data pipelines and automated reporting tools. 

  • Experience building dashboards or visualization tools (e.g., Shiny, Streamlit, Plotly Dash) for scientific audiences. 

  • Background in pharmaceutical or biotech R&D environments, particularly supporting genetics or genomics teams. 

  • Experience with API development (REST/GraphQL) and MCP for serving analytical results to downstream applications. 

 

What we offer: We are curious minds that come from a broad range of backgrounds, perspectives, and life experiences. We believe that this variety drives excellence and innovation, strengthening our ability to lead in science and technology. We are committed to creating access and opportunities for all to develop and grow at their own pace. Join us in building a culture of inclusion and belonging that impacts millions and empowers everyone to work their magic and champion human progress!

 

Apply now and become a part of a team that is dedicated to Sparking Discovery and Elevating Humanity!

Job Requisition ID:  299291
Location:  Bangalore
Career Level:  D - Professional (4-9 years)
Working time model:  Full-time

North America Disclosure
The Company is committed to accessibility in its workplaces, including during the job application process. Applicants who may require accommodation during the application process should speak with our Candidate Services team at 844-655-6466 from 8:00am to 5:30pm ET Monday through Friday. If you are a resident of Connecticut or Colorado, you are eligible to receive additional information about the compensation and benefits, which we will provide upon request. You may contact 855 444 5678 from 8:00am to 5:30pm ET Monday through Friday for assistance.

Notice on Fraudulent Job Offers
Unfortunately, we are aware of third parties that pretend to represent our company and offer unauthorized employment opportunities. If you think a fraudulent source is offering you a job, please have a look at the following information.


