Overview

About the Project:
The customer has been building an analytics platform since 2018. The platform runs on Hadoop and the Hortonworks Data Platform, and the customer plans to move it to Amazon EMR in 2021. Data from all of the customer's products flows into a single data lake on this platform, which also lets the customer run next-generation analytics on the accumulated data.
Architecture:
Hortonworks is the current vendor and will be replaced by Amazon EMR. Tableau will be the BI vendor. MicroStrategy is currently in use and will be phased out by early 2023.
All data is sent to the data lake, which the customer uses for industry reporting. A data science team uses this data to build new products and an AI model.
We will be moving to real-time streaming using Kafka and S3. We are running a POC to evaluate Dremio and Presto as the query engine.
We are migrating to version 2.0 on Amazon EMR and S3, and the query engine work falls under the 2.0 project.
Requirements:
● Strong technical expertise in database architecture, design, and data modeling
● Expertise in gathering and analyzing system requirements and designing conceptual and logical data models
● Ability to build conceptual, logical, and physical data models in Erwin and/or ER/Studio
● Strong SQL query writing skills
● Experience working with Big Data
● Ability to speak to the differences between the data modeling technologies and philosophies of Codd, Kimball, and Inmon
● Knowledge of data model types and terminology, including OLTP, OLAP, cubes, dimensional star/snowflake modeling, and graph/NoSQL
● Prior experience with databases such as Oracle, SQL Server, Aurora MySQL, and MongoDB
● Experience with Hive is an advantage
● Familiarity with data visualization tools such as Tableau and MicroStrategy
● Understanding of data naming standards, ability to apply them, and ability to explain your definition of “best practices in data modeling”
Nice to have:
● Experience working in the AWS ecosystem
English level:
● Intermediate
Responsibilities:
● Design and develop integration data models across multiple data sources and consolidate them into a single common or enterprise data model
● Maintain version control of the data model(s) and ensure consistency between the data model and its deployed instances
We offer:
● 20 working days of vacation and up to 20 working days of sick leave per year;
● Full payment of taxes;
● English courses;
● Flexible work schedule;
● Friendly environment;
● Medical insurance;
● Opportunity for career growth.

