
AI Engineer / AI Platform Engineer (Hybrid AI/HPC Infrastructure, Architecture, MLOps & LLMOps)

  • Hybrid
    • Belgrade, Beograd, Serbia
  • Consulting

Job description

We are looking for an AI Engineer / AI Platform Engineer to design, build, and operate a hybrid AI and HPC infrastructure across on-premises environments, AWS, and GCP.

You will support R&D teams by providing scalable compute, MLOps, and LLMOps platforms that accelerate scientific discovery and enable efficient development, training, evaluation, and deployment of machine learning and large language models.

Key Responsibilities

  • Design and evolve hybrid AI/HPC infrastructure across on-premises, AWS, and GCP

  • Manage and scale cloud and on-premises HPC clusters, including workloads scheduled with Slurm

  • Build and operate a Kubeflow-based MLOps platform

  • Integrate MLOps workflows with Databricks as the core data layer

  • Use MLflow for experiment tracking, model lifecycle management, and model registry

  • Architect LLMOps infrastructure for LLM fine-tuning, evaluation, and inference at scale

  • Establish financial transparency for ML, LLM, GPU, cloud, and HPC usage

  • Drive automation through Terraform, Helm, CI/CD, and GitOps

  • Collaborate with R&D, data science, infrastructure, security, and platform teams

Job requirements

Required Skills & Experience

  • Proven experience as an AI Engineer, MLOps Engineer, Platform Engineer, DevOps Engineer, Cloud Engineer, or HPC Engineer

  • Strong hands-on experience with Kubernetes

  • Experience with Kubeflow or similar MLOps platforms

  • Knowledge of Slurm or comparable HPC workload managers

  • Practical experience with AWS cloud services

  • Experience with GCP is a strong advantage

  • Hands-on experience with Databricks and MLflow

  • Strong expertise in Terraform and infrastructure as code

  • Experience with Helm, CI/CD pipelines, and GitOps practices

  • Good understanding of Linux, networking, security, and cloud-native architectures

  • Experience with GPU, ML, LLM, or HPC workloads is highly beneficial

Certifications

Relevant certifications are considered a strong advantage.

Must have:
At least one relevant certification in cloud, Kubernetes, Terraform, data engineering, machine learning, or AI platform engineering, or equivalent proven hands-on experience.

Preferred certifications:

  • AWS Certified Machine Learning Engineer – Associate

  • Google Cloud Professional Machine Learning Engineer

  • Certified Kubernetes Administrator

  • HashiCorp Certified Terraform Associate

  • Databricks Certified Machine Learning Professional

  • Databricks Certified Generative AI Engineer Associate

Nice to have:

  • AWS Certified Solutions Architect

  • AWS Certified DevOps Engineer – Professional

  • Google Cloud Professional Cloud Architect

  • Google Cloud Professional Data Engineer

  • Additional security, cloud-native, or HPC-related certifications

Soft Skills

  • Structured, reliable, and proactive working style

  • Strong communication skills with technical and non-technical stakeholders

  • Ownership mindset for AI, ML, and HPC platforms

  • Passion for automation, reproducibility, and scalable infrastructure

  • Ability to work closely with R&D, data science, and engineering teams

What We Offer

  • Work on a modern hybrid AI/HPC platform

  • Support advanced R&D and scientific discovery

  • Exposure to ML, LLM, GPU, cloud, and HPC workloads

  • Modern tooling with Kubernetes, Kubeflow, Databricks, Terraform, and GitOps

  • Flexible working models for permanent employees and freelancers

  • Long-term growth in AI infrastructure, MLOps, and LLMOps
