Big Data Developer and Trainer Job Description





  • Develop complex SQL scripts for data analysis and extraction; develop and maintain programs as required for the ETL process.

  • Design and implement distributed data processing pipelines using Spark, Hive, Sqoop, Python, and other tools and languages prevalent in the Hadoop ecosystem. Design and implement end-to-end solutions.

  • Build utilities, user defined functions, and frameworks to better enable data flow patterns.

  • Research, evaluate and utilize new technologies/tools/frameworks centered around Hadoop and other elements in the Big Data space.

  • Define and build data acquisition and consumption strategies.

  • Build and incorporate automated unit tests, and participate in integration testing efforts.

  • Work with teams to resolve operational and performance issues.

  • Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and engineering best practices are defined and adhered to.

  • Assist in the development and training of the IT department.



  • Requires a four-year degree in Computer Science/Information Technology, Computer/Electrical Engineering or related discipline

  • Hands-on experience with big data tools such as Hadoop, Spark, Kafka, Hive, and Sqoop

  • MySQL/Oracle and/or NoSQL experience with the ability to develop, tune and debug complex SQL/NoSQL applications

  • Solid experience with Spark, including its different APIs, Spark SQL, and Spark Streaming

  • Hands-on experience with the Spark Python, Java, or Scala API, and with configuring Spark jobs

  • Solid experience with Hive, including HUE, joins, partitions, and buckets

  • Expert in SQL, including nested queries, stored procedures, and data modeling

  • Familiar with cloud technologies such as AWS S3, AWS Redshift, AWS EMR, AWS RDS, or similar

  • Experience with different data stores and related technologies such as HBase, Cassandra, MongoDB, Neo4j, and GraphQL

  • Hands-on experience with data pipelines and the ELK stack

  • Experience with shell scripting

  • Understanding of data pipeline deployment, either in the cloud or on-premises

  • Expert in at least one programming language, such as Python, Java, or Scala. Python is preferred.

  • Hands-on experience with creating dashboards using Tableau, Spark, or Power BI

  • Good understanding of data streaming tools like Kafka or RabbitMQ

  • Strong written and verbal communication skills

  • Ability to work both independently and as part of a team