
Data Engineer - Enterprise Hadoop/Spark Platform

Job Title: Data Engineer - Enterprise Hadoop/Spark Platform

Job Location: Bangalore

Job Details:

 

The Enterprise Data Engineering Team is responsible for designing, developing, testing, and supporting data pipelines on Orgs's enterprise data management and analytics platform (Hadoop and other components), also referred to as "MCloud". In this role, you will be part of a growing, global team of data engineers who collaborate in a DevOps approach to enable Orgs's business sectors with state-of-the-art technology to leverage data as an asset and make informed decisions.

 

The MCloud platform comprises several technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or in Orgs's own data centers. These are:

  • Hortonworks Hadoop environment (development cluster and GXP-regulated production cluster) 
  • ELK stack (Elasticsearch, Logstash, Kibana)
  • Palantir Foundry platform (proprietary technology stack)

The technology focus of this role is Hadoop, Spark and related technologies; however, you may be required to collaborate with team members whose main focus is one of the other technologies.

     

It is important to note that this role requires experience working in a strictly regulated IT context and knowledge of the applicable good practices (preferably related to healthcare regulation). These include, but are not limited to: good documentation practices, software validation, change management for regulated software, and responsible management of deviations/non-conformances. Additional custom training will be provided, but willingness to work in a strictly regulated context is required.

     

Roles & Responsibilities:

  • Develop data pipelines in a Hadoop-based cluster environment (a minimal illustrative sketch follows this list)
  • Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
  • Review code developed by other data engineers and check it against platform-specific standards, cross-cutting concerns, coding and configuration standards and the functional specification of the pipeline
  • Create high-quality technical documentation; work must be documented in a professional and traceable way
  • Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
  • Advise technical team members and management staff
  • Deploy applications on MCloud platform infrastructure (especially Hortonworks Hadoop) with clearly defined checks
  • Implement changes and bug fixes via Orgs's change management framework and according to systems engineering practices (additional training will be provided)
  • Set up DevOps projects following Agile principles (e.g. Scrum)
  • Besides working on projects, act as third-level support for critical applications (partly GXP-regulated); analyze and resolve complex incidents/problems together with MCloud support team members
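
For illustration only: a minimal sketch of what such a data pipeline can look like on a Hadoop/Spark stack, written in Spark (Scala). All paths, table and column names are hypothetical placeholders, not actual MCloud artifacts:

    // Minimal Spark (Scala) sketch: read raw data from HDFS, transform it,
    // and store the curated result as a partitioned Hive table.
    // All names and paths are hypothetical placeholders.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object OrderIngestPipeline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("order-ingest-pipeline")
          .enableHiveSupport() // needed to write managed Hive tables on the cluster
          .getOrCreate()

        // Read: raw JSON files landed on HDFS (hypothetical path)
        val raw = spark.read.json("hdfs:///data/raw/orders/")

        // Transform: drop invalid rows and add a load timestamp
        val cleaned = raw
          .filter(col("order_id").isNotNull)
          .withColumn("load_ts", current_timestamp())

        // Store: write the curated dataset as a partitioned Hive table (hypothetical name)
        cleaned.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("curated.orders")

        spark.stop()
      }
    }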

     

Education

  • B.Sc. (or higher) degree in Computer Science, Engineering, Physics or related fields 

     

Professional Experience

  • 5+ years of experience in system engineering or software development 
  • 3+ years of intensive experience working with an Apache Hadoop distribution

     

Skills

  • Hadoop: Experience with Big Data platforms / Hadoop platform (ideally Hortonworks Data Platform)
  • ETL: Experience with ELT/ETL tools
  • Data management / data structures: Must be proficient in technical data management tasks, i.e. writing code to read, transform and store data
  • XML/JSON knowledge: Experience working with REST APIs
  • Shell scripting: Ability to write shell scripts (Linux shell and shell scripting)
  • Programming: Deep experience in software development with Scala or Java; potentially also Python or R (preferred)
  • SQL: Must be experienced in writing complex SQL statements (a hypothetical example follows this list)
  • IT project management / process understanding: SDLC experience; working in DevOps teams based on Agile principles (e.g. Scrum); ITIL knowledge (especially incident, problem and change management)
  • Regulated industry: Experience working in a regulated IT context (preferably Healthcare/GXP)
  • Languages: Fluent English skills (orally and in writing)
  • Linux: Experience working with the Unix CLI; basic knowledge of Enterprise Linux, ideally SUSE Linux (preferred)
  • Authorization: Basic understanding of user authorization (Apache Ranger preferred)
  • AWS: General knowledge of the AWS stack: EC2, S3, EBS, etc. (preferred)
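
Purely as a hypothetical illustration of the level of SQL involved (placeholder table and column names, not actual MCloud objects), such a statement might combine joins, aggregation and window functions, executed here through Spark SQL in Scala:

    // Hypothetical Spark SQL example: revenue per customer and month,
    // keeping only the top 10 customers of each month.
    // Table and column names are placeholders.
    import org.apache.spark.sql.SparkSession

    object OrderRevenueReport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("order-revenue-report")
          .enableHiveSupport()
          .getOrCreate()

        val report = spark.sql(
          """
            |SELECT *
            |FROM (
            |  SELECT c.customer_name,
            |         date_format(o.order_date, 'yyyy-MM') AS order_month,
            |         SUM(o.amount)                        AS revenue,
            |         RANK() OVER (PARTITION BY date_format(o.order_date, 'yyyy-MM')
            |                      ORDER BY SUM(o.amount) DESC) AS rnk
            |  FROM curated.orders o
            |  JOIN curated.customers c ON o.customer_id = c.customer_id
            |  GROUP BY c.customer_name, date_format(o.order_date, 'yyyy-MM')
            |) ranked
            |WHERE rnk <= 10
          """.stripMargin)

        report.show(truncate = false)
        spark.stop()
      }
    }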

 

Specific information related to the position:

  • Physical presence at the primary work location (Bangalore)
  • Must be able to work on call (via mobile) to support issues during evenings and weekends
  • Flexibility to work in CEST and US EST time zones (according to project demand/team rotation plan)
  • Willingness to travel to Germany, the US and potentially other locations (as per project demand)

     

Job Requisition ID:  184204
Location:  Bangalore SBS
Career Level:  D - Professional (4-9 years)
Working time model:  full-time


