AI Engineer / AI Platform Engineer (Hybrid AI/HPC Infrastructure, Architecture, MLOps & LLMOps)

Hybrid
Belgrade, Beograd, Serbia
Consulting

Job description

We are looking for an AI Engineer / AI Platform Engineer to design, build, and operate hybrid AI and HPC infrastructure across on-premises environments, AWS, and GCP.

You will support R&D teams by providing scalable compute, MLOps, and LLMOps platforms that accelerate scientific discovery and enable efficient development, training, evaluation, and deployment of machine learning and large language models.

Key Responsibilities

Design and evolve hybrid AI/HPC infrastructure across on-premises, AWS, and GCP
Manage and scale cloud and on-premises HPC clusters, including Slurm-managed workloads
Build and operate a Kubeflow-based MLOps platform
Integrate MLOps workflows with Databricks as the core data layer
Use MLflow for experiment tracking, model lifecycle management, and model registry
Architect LLMOps infrastructure for LLM fine-tuning, evaluation, and inference at scale
Establish financial transparency for ML, LLM, GPU, cloud, and HPC usage
Drive automation through Terraform, Helm, CI/CD, and GitOps
Collaborate with R&D, data science, infrastructure, security, and platform teams

Job requirements

Required Skills & Experience

Proven experience as an AI Engineer, MLOps Engineer, Platform Engineer, DevOps Engineer, Cloud Engineer, or HPC Engineer
Strong hands-on experience with Kubernetes
Experience with Kubeflow or similar MLOps platforms
Knowledge of Slurm or comparable HPC workload managers
Practical experience with AWS cloud services
Experience with GCP is a strong advantage
Hands-on experience with Databricks and MLflow
Strong expertise in Terraform and infrastructure as code
Experience with Helm, CI/CD pipelines, and GitOps practices
Good understanding of Linux, networking, security, and cloud-native architectures
Experience with GPU, ML, LLM, or HPC workloads is highly beneficial

Certifications

Relevant certifications are considered a strong advantage.

Must have:
At least one relevant certification in cloud, Kubernetes, Terraform, data engineering, machine learning, or AI platform engineering, or equivalent proven hands-on experience.

Preferred certifications:

AWS Certified Machine Learning Engineer – Associate
Google Cloud Professional Machine Learning Engineer
Certified Kubernetes Administrator
HashiCorp Certified Terraform Associate
Databricks Certified Machine Learning Professional
Databricks Certified Generative AI Engineer Associate

Nice to have:

AWS Certified Solutions Architect
AWS Certified DevOps Engineer – Professional
Google Cloud Professional Cloud Architect
Google Cloud Professional Data Engineer
Additional security, cloud-native, or HPC-related certifications

Soft Skills

Structured, reliable, and proactive working style
Strong communication skills with technical and non-technical stakeholders
Ownership mindset for AI, ML, and HPC platforms
Passion for automation, reproducibility, and scalable infrastructure
Ability to work closely with R&D, data science, and engineering teams

What We Offer

Work on a modern hybrid AI/HPC platform
Support advanced R&D and scientific discovery
Exposure to ML, LLM, GPU, cloud, and HPC workloads
Modern tooling with Kubernetes, Kubeflow, Databricks, Terraform, and GitOps
Flexible working models for permanent employees and freelancers
Long-term growth in AI infrastructure, MLOps, and LLMOps