Job Description
Key Responsibilities:
- Design and implement data pipelines using Azure Data Factory, Azure Databricks, and other Azure services.
- Develop and maintain ETL processes covering data extraction, transformation, and loading (a minimal PySpark sketch follows this list).
- Create and manage data models for Azure Data Lake and data warehouses.
- Write and optimize SQL queries, stored procedures, and functions.
- Integrate data from various sources, ensuring data quality and consistency.
- Implement CI/CD pipelines for data deployments using Azure DevOps.
- Apply Agile principles to manage and deliver data engineering projects effectively.
- Migrate data from legacy systems to Azure environments.
- Collaborate with cross-functional teams to ensure alignment with project goals and deliverables.
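To ground the pipeline and ETL responsibilities above, the sketch below shows one minimal PySpark ETL step of the kind this role would implement in Azure Databricks. It is illustrative only; the storage account, container paths, and column names (examplelake, order_id, order_date, amount) are hypothetical placeholders, not details of any actual environment.

```python
# Minimal PySpark ETL sketch: extract raw CSVs from a data lake, apply
# basic data-quality transforms, and load curated Parquet back to the lake.
# All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sample-etl").getOrCreate()

# Extract: read raw files landed in an ADLS Gen2 container (path is hypothetical).
raw = (spark.read
       .option("header", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

# Transform: deduplicate, enforce types, and drop rows failing a basic quality check.
clean = (raw
         .dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
         .withColumn("amount", F.col("amount").cast("double"))
         .filter(F.col("amount").isNotNull()))

# Load: write curated data as Parquet, partitioned by date, for downstream modeling.
(clean.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/"))
```

In practice a step like this would be parameterized and orchestrated from Azure Data Factory or a Databricks job rather than run ad hoc.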
Required Skills and Experience:
- Azure Experience: 3-5 years of hands-on experience with Azure Data Lake, Azure Databricks, and Azure Data Factory.
- Programming: Proficiency in Python, PySpark, and Spark SQL for data processing and analytics (a brief Spark SQL sketch follows this list).
- Data Engineering: Expertise in building data pipelines and managing ETL processes.
- Data Warehousing: Experience with data warehouse development, including Azure Synapse Analytics (formerly Azure SQL Data Warehouse) and Snowflake.
- DevOps and CI/CD: Strong understanding of DevOps practices and experience implementing CI/CD pipelines using Azure DevOps.
- Agile Methodologies: Familiarity with Agile principles, including Scrum cadences and roles, and experience using Agile tools such as JIRA or Azure DevOps.
- Big Data Frameworks: Experience with Spark, Hadoop, and Hive for processing large datasets.
- Data Integration: Knowledge of data integration tools and processes, including batch and real-time data integration.
- Database Management: Experience with databases such as MS SQL Server and Oracle, plus working knowledge of SQL, stored procedures, and functions.
- Documentation: Ability to write detailed requirements, functional, and technical documentation.
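As a hedged illustration of the Python, PySpark, and Spark SQL proficiency listed above, the sketch below registers a DataFrame as a temporary view and aggregates it with Spark SQL. The path, view name, and column names are assumptions made for the example only.

```python
# Minimal Spark SQL sketch: expose a curated dataset as a view and aggregate it.
# The path and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-spark-sql").getOrCreate()

# Assume a curated orders dataset already exists in the lake (path is an assumption).
orders = spark.read.parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/")
orders.createOrReplaceTempView("orders")

# The same aggregation logic a warehouse query or stored procedure would express.
daily_revenue = spark.sql("""
    SELECT order_date,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_revenue.show()
```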
Basic Understanding:
- Scheduling and Workflow Management: Experience with workflow management tools like Azure Data Factory or similar.
- Data Modeling: Knowledge of enterprise and semantic modeling using tools like ERwin, ER/Studio, or PowerDesigner.
- Cloud Architecture: Understanding of architecture and data modeling for data lakes on cloud platforms.
- Build and Release Management: Familiarity with build and release management practices and tools.
Strong In:
- Coding: Proficiency in Python and PySpark for writing data processing scripts.
- Big Data Frameworks: Hands-on experience with Spark or Hadoop.
- ETL Processes: Expertise in ETL toolsets and data processing.
- Code Management: Experience with version control using Git and platforms such as GitHub or Azure DevOps Repos.
- Data Formats: Experience with data formats such as JSON and XML (see the sketch after this list).
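For the JSON handling mentioned in the last item, a minimal sketch is shown below: reading semi-structured JSON with an explicit schema and flattening a nested field. The file path and field names are hypothetical.

```python
# Minimal sketch of ingesting JSON with an explicit schema and flattening it.
# Paths and field names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("json-ingest").getOrCreate()

# Explicit schema avoids costly inference and documents the expected shape.
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer", StructType([
        StructField("id", StringType()),
        StructField("country", StringType()),
    ])),
    StructField("amount", DoubleType()),
])

events = (spark.read
          .schema(schema)
          .json("abfss://raw@examplelake.dfs.core.windows.net/events/"))

# Flatten the nested customer struct for downstream relational modeling.
flat = events.select(
    "order_id",
    F.col("customer.id").alias("customer_id"),
    F.col("customer.country").alias("customer_country"),
    "amount",
)
flat.show()
```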