Responsibilities
===============
- Design, develop, and maintain data architectures, pipelines, and workflows for the collection, processing, storage, and retrieval of large volumes of structured and unstructured data from multiple sources.
- Collaborate with cross-functional teams to identify and prioritize data engineering requirements and to develop and deploy data-driven solutions to address business challenges.
- Build and maintain scalable, fault-tolerant, and high-performance data storage and retrieval systems (e.g., data lakes, data warehouses, databases) on cloud infrastructure such as AWS, Azure, or Google Cloud Platform.
- Develop and maintain ETL workflows, data pipelines, and data transformation processes to prepare data for machine learning and AI applications.
- Implement and optimize distributed computing frameworks such as Hadoop, Spark, or Flink to support high-performance and scalable processing of large data sets.
- Build and maintain monitoring, alerting, and logging systems to ensure the availability, reliability, and performance of data pipelines and data platforms.
Requirements
===============
- Bachelor’s or master’s degree in computer science, engineering, or a related field.
- At least 5 years of experience in data engineering, with a strong background in machine learning, cloud computing, and big data technologies.
- Experience with at least one major cloud platform (AWS, Azure, GCP).
- Proficiency in programming languages such as Python, Java, and SQL.
- Experience with distributed computing technologies such as Hadoop, Spark, and Kafka.
- Familiarity with database technologies, including relational (SQL), NoSQL, and NewSQL systems.
- Experience with data warehousing and ETL tools such as Redshift, Snowflake, or Airflow.
- Strong problem-solving and analytical skills.
Preferred Qualifications
===============
- Experience with DevOps practices and infrastructure tools such as Docker, Kubernetes, Ansible, or Terraform.
- Experience with data visualization tools such as Tableau, Superset, Power BI, Plotly, or D3.js.
- Experience with stream processing frameworks such as Kafka, Pulsar, or Kinesis.
- Experience with data governance, data security, and compliance.
- Experience with software engineering best practices and methodologies such as Agile or Scrum.