Skill: Must have experience with Hadoop, MapReduce, Hive, Pig, Apache Spark, Spark SQL, Spark Streaming, Kafka, Sqoop, PySpark, Python scripting, Unix/Linux shell scripting, Teradata Database, Teradata Studio, Teradata Hadoop connector, Oracle 11g, Oracle SQL Developer, UC4 Automation, Jira v6.4.14, Confluence 5.8.18, GitHub, Erwin Data Modeler, Tableau, and ThoughtSpot.
Experience: At least 8 years of experience
Education: Must have a Bachelor's degree in Engineering, CIS, Information Technology, Computer Science, or a related field.
• Set up data pipelines using big data technologies such as Kafka and Spark Streaming to ingest large volumes of data. Leverage traditional SFTP servers, Talend, and GoldenGate where the use case calls for them.
• Develop, test, and deploy ETL in Hadoop, using technologies such as Hive, MapReduce, and Spark SQL, to process data acquired from various sources and create data marts.
• Create data models in Erwin Data Modeler, working with senior data architects.
• Load processed data into Teradata for downstream application and reporting needs.
• Load data into ThoughtSpot for user analytics.
• Work on large data sets using distributed computing methodologies.
• Set up highly scalable data pipelines for continuous data ingestion from sources such as databases, SFTP servers, and Kafka clusters.
• Participate in developing and documenting user stories, including development estimates and QA.
• Handle coding, technical design, data modeling, root cause analysis, investigation, debugging, and testing of the ETL processes for data mart creation.
• Load data into multiple destinations, such as Teradata, Hive, and ThoughtSpot, for various use cases.
• Collaborate with business partners, product managers, and other engineering teams.
• Experience creating daily Tableau dashboards from data in Teradata, Hive, and Spark is a plus.
• Work in an Agile environment and follow strict release guidelines.