Job Description
As a Software Engineer with AI/ML platform experience, you will:
- Architect and design scalable Kubernetes platforms supporting both traditional ML and Large Language Models (LLMs).
- Provide client support for hosting AI/ML workloads on a GPU-powered Kubernetes platform.
- Lead the development of end-to-end ML pipelines, including data ingestion, model training, evaluation, and deployment.
- Drive AIOps initiatives across the Middleware platform by collaborating with cross-functional teams, including SRE and Software Engineers, to operationalize and optimize ML models effectively.
- Define and implement MLOps standard methodologies such as monitoring, logging, and automated maintenance of models in production.
- Develop infrastructure automation tools and frameworks to improve efficiency across teams.
- Ensure platform reliability, scalability, and performance through meticulous engineering practices.
- Conduct code reviews, establish standard processes, and mentor junior engineers.
- Stay updated on the latest trends in AI/ML to influence platform enhancements.
Minimum Qualifications / Requirements -
- Experience: 5–7 years of software engineering experience, including at least 2 years in machine learning-related roles.
- Expertise in Golang or Python, with hands-on experience with the Kubernetes platform and ML frameworks (TensorFlow, PyTorch).
- Consistent track record in designing and deploying scalable machine learning systems in production.
- Deep understanding of ML algorithms, data pipelines, and optimization techniques.
- Experience building CI/CD pipelines for ML workflows, including model monitoring and retraining.
- Proficiency in cloud platforms and orchestration tools for distributed systems.
- Strong problem-solving and debugging skills for complex, large-scale systems.
- Experience in mentoring engineers and driving technical decision-making.
Preferred Qualifications / Requirements -
- Kubernetes and Container Orchestration:
  - Advanced understanding of Kubernetes for managing production-grade systems and ensuring scalability.
  - Advanced experience with Docker and orchestration of complex services.
- Software Development:
  - Expertise in Golang or Python.
  - Ability to develop and enforce a secure software development lifecycle.
- MLOps Tools and Frameworks:
  - Experience architecting and optimizing workflows using Kubeflow Pipelines, KServe, Airflow, and MLflow.
  - Ability to design and implement efficient CI/CD pipelines for ML systems.
- Large Language Models (LLMs):
  - Understanding of LangChain and experience designing RAG systems.
  - Knowledge of integrating and scaling vector databases (e.g., Pinecone, FAISS) for real-world applications.
- Distributed Systems and Microservices:
  - Consistent track record of designing and leading the development of distributed systems.
  - Experience implementing robust inter-service communication patterns and solving scalability issues.