Job Details
- Status: Active
- Category
- Posted: May 18, 2026
- Expires: Aug 16, 2026
- Work style: Hybrid
About the Role
We are seeking an experienced (5+ years), motivated, and hands-on Cloud Platform DevOps Engineer to join our North American AI and DevOps Platform Engineering team. In this critical role, you will be responsible for enhancing the stability, reliability, and performance of our AI and DevOps platforms, which support a diverse ecosystem of AI applications, developer tools, and CI/CD pipeline technologies across the organization. You will actively contribute to infrastructure design, implementation, and maintenance, and facilitate agile development within the team.
The ideal candidate is a strong technical leader who champions agile practices, drives continuous improvement, and excels in both coding and coaching. You have a deep understanding of the infrastructure and operational considerations for Artificial Intelligence and Machine Learning initiatives, proven hands-on experience with DevOps technologies such as Kubernetes, Docker, Helm, Ansible, or similar tools and CI/CD platforms, and proficiency in scripting and automation (e.g., Python, Bash). We are looking for someone with a track record of implementing scalable, resilient, and high-performance solutions, strong communication and collaboration skills, and the ability to mentor and guide junior team members. You will join a dynamic team committed to fostering innovation and collaboration.
Responsibilities:
Hands-on DevOps & Infrastructure Engineering
Design & Implementation: Lead the design, implementation, and ongoing management of secure, scalable, and resilient infrastructure components.
Secret & Certificate Management: Administer and maintain secret and certificate management solutions using HashiCorp Vault, including policy definition and integration.
Database Management: Perform hands-on administration and optimization of database systems (PostgreSQL, Oracle, MongoDB), including performance tuning, backup, and recovery strategies.
Workflow Orchestration: Deploy, monitor, and troubleshoot data orchestration workflows using Apache Airflow, and develop/optimize DAGs.
Messaging Systems: Implement and manage messaging queues such as Kafka and IBM MQ, including cluster setup and configuration.
API Integrations: Develop, maintain, and troubleshoot RESTful API and SOAP integrations critical for system connectivity.
Build Automation: Implement and optimize build and deployment processes using Gradle.
Container Orchestration: Design, implement, and manage container orchestration platforms with Kubernetes and Helm, including integration with CyberArk and HashiCorp Vault for secrets management. Create, debug, and troubleshoot Kubernetes Pods, Jobs, and Deployments using YAML.
Storage Management: Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3, with an awareness of storage requirements for AI/ML workloads.
Networking & Load Balancing: Set up and maintain load balancing solutions (e.g., Nginx, HAProxy, AWS ELB/ALB, Kubernetes Ingress controllers) for high availability and performance.
Monitoring & Logging: Implement, configure, and utilize comprehensive monitoring and logging solutions (Prometheus, Grafana, ELK Stack) to ensure system health and proactively identify issues, including those relevant to AI/ML applications.
Automation & Scripting: Develop robust automation scripts and tools using Python, Bash, Go, or similar languages to streamline operations and enhance efficiency.
Incident Response: Participate actively in on-call rotations, responding to and resolving critical incidents with hands-on troubleshooting.
Documentation: Create and maintain technical documentation, architecture diagrams, and runbooks for infrastructure components and processes.
Impediment Resolution: Proactively identify and resolve technical impediments and process bottlenecks within the team and across organizational boundaries, paying special attention to unique challenges posed by AI/ML infrastructure.
Backlog Refinement: Collaborate closely with stakeholders (e.g., product owners, technical leads) to ensure a well-defined and prioritized backlog for infrastructure work, technical debt, operational improvements, and AI/ML platform needs.
Process Improvement: Drive continuous improvement in the team's agile and DevOps practices, helping them adapt and optimize their workflow for maximum efficiency and quality.
Required Qualifications:
Hands-on DevOps & Infrastructure Engineering Expertise
Secret & Certificate Management: Proven hands-on experience with HashiCorp Vault (installation, configuration, policy management, integrations).
Database Administration: Strong hands-on experience with at least two of PostgreSQL, Oracle, or MongoDB (installation, tuning, replication, backup/restore).
Workflow Orchestration: Hands-on experience deploying, managing, and developing DAGs for Apache Airflow.
Messaging Systems: Solid hands-on experience with Kafka and/or IBM MQ (cluster setup, topic management, producer/consumer configuration).
Container Orchestration: In-depth hands-on experience with Kubernetes and Helm, including YAML configuration, troubleshooting Pods/Jobs/Deployments, and integrations with secrets management (CyberArk, HashiCorp Vault).
Storage Management: Practical experience with Kubernetes PVCs, Persistent Volumes, S3, and/or enterprise NAS solutions (e.g., SONiC NAS).
Monitoring & Logging: Strong hands-on experience with Prometheus, Grafana, and the ELK Stack (setup, dashboard creation, query optimization, alert configuration).
Scripting & Automation: High proficiency in Python, Bash, or Go for automation, tooling development, and system administration.
Cloud Platforms: Extensive hands-on experience with at least one major cloud provider (AWS, Azure, GCP).
Infrastructure as Code (IaC): Proficiency with IaC tools such as Terraform or Ansible.
CI/CD: Experience designing, implementing, and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).
API Integration: Experience with RESTful API and SOAP web services.
Build Tools: Proficiency with Gradle for build automation.
AI/ML Awareness & Support
AI/ML Infrastructure Concepts: Understanding of the specific infrastructure requirements for deploying, managing, and scaling Artificial Intelligence and Machine Learning workloads (e.g., GPU resources, specialized storage, MLOps pipelines).
Data for AI/ML: Awareness of data management strategies and data governance principles relevant to AI/ML models and training datasets.
Monitoring AI/ML Systems: Familiarity with metrics and monitoring approaches for the performance and health of AI/ML applications and their underlying infrastructure.
Agile & Leadership Skills
Working Scrum Master Experience: Proven experience acting as a Scrum Master within a technical team where you also performed significant hands-on engineering.
Agile & Scrum Mastery: In-depth knowledge and practical application of Agile principles and the Scrum framework.
Facilitation & Coaching: Excellent facilitation, coaching, and mentoring skills within a technical context.
Communication: Strong verbal and written communication skills, able to bridge technical and process discussions.
Technical Leadership: Ability to guide technical discussions, influence architectural decisions, and drive best practices.
Preferred Qualifications:
Certified ScrumMaster (CSM) or Professional Scrum Master (PSM) certification.
Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, GCP Professional Cloud DevOps Engineer).
Experience with site reliability engineering (SRE) principles and practices.
Familiarity with other Agile scaling frameworks (e.g., SAFe, LeSS).
Exposure to MLOps platforms or tools (e.g., Kubeflow, MLflow).
Education:
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent experience.
------------------------------------------------------
Job Family Group:
Technology
------------------------------------------------------
Job Family:
Applications Development
------------------------------------------------------
Time Type:
Full time
------------------------------------------------------
Primary Location Full Time Salary Range:
$94,300.00 - $141,500.00
------------------------------------------------------
Most Relevant Skills
Please see the requirements listed above.
------------------------------------------------------
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
Automated Processing and AI
We use automated processing, including artificial intelligence, for our legitimate business interests (or our reasonable and appropriate business purposes) to identify and align the candidate's skills and abilities with a specific job opening. Additionally, if you so choose, or consent, we can match your skills and abilities to other suitable roles at Citi.
Importantly, all our hiring processes and decisions, including determining your suitability for a role, are conducted, checked, and decided by individuals. Our automated processing and AI do not involve relying on automatic or autonomous decision-making. Please refer to any Jurisdictional Considerations, with specific provisions for your country (where relevant) for further details.
------------------------------------------------------
This job opening is for an existing job vacancy.
------------------------------------------------------
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.
This job accepts direct applications - no recruiter in between. Posted May 18, 2026.