Chennai, Tamil Nadu
Platform Engineering Senior Engineer #1034700
Job Description:
Job Title: OpenShift Virtualization Engineer / Cloud Platform Engineer (OpenShift Virtualization)
Job Summary:
- We are seeking a highly skilled and experienced OpenShift Virtualization Engineer to join our dynamic, globally positioned, on-premises cloud platform team.
- In this role, you will be responsible for the design, implementation, administration, and ongoing management of our OpenShift Virtualization (KubeVirt) service.
- You will ensure the stability, performance, and security of our virtualized workloads, leveraging your expertise in Kubernetes, OpenShift, and virtualization technologies.
- This position requires a strong understanding of day-2 operations, proactive monitoring, and a commitment to operational excellence.
Key Responsibilities:
- Observability, Monitoring, Logging, and Troubleshooting
- Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the OpenShift Virtualization (OSV) environment, including integration with tools like Dynatrace and Prometheus/Grafana.
- Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response.
- Develop capabilities to flag and report abnormalities and identify "blind spots" in
- Perform deep-dive Root Cause Analysis (RCA), using available tooling where appropriate, to quickly identify and resolve issues across the global compute environment.
- Pinpoint unhealthy components across the global compute environment (the needle in the haystack) to shorten time to resolution.
- Monitor VM health, resource usage, and performance metrics proactively
- Monitor for unusual activity that might indicate a compromise or misconfiguration.
- Solution Design & Consulting
- Provide technical consulting and expertise to application teams requiring OSV solutions
- Design, implement, and validate custom or dedicated OSV clusters and VM solutions for critical applications with unique or complex requirements (e.g., specialized appliances).
- Knowledge Management
- Create, maintain, and update comprehensive internal documentation and customer-facing content on the docs site to facilitate self-service and clearly articulate platform capabilities.
Skills Required:
- Kubernetes, OpenShift, Deployment, Services, Storage Capacity Management, Linux, Network Protocols & Standards, Python, Scripting, Dynatrace, VMware, Problem Solving, Technical Troubleshooting, Communications
- Ability to communicate and work with cross-functional teams and all levels of management
- Red Hat, Terraform, Tekton, Ansible, GitHub, GCP, AWS, Azure, Cloud Infrastructure, CI/CD, DevOps
Experience Required:
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- 5+ years of experience in IT infrastructure, including at least 2 years focused on Kubernetes and/or OpenShift.
- Proven experience with OpenShift Virtualization (KubeVirt) in a production environment.
- Strong understanding of Kubernetes concepts (Pods, Deployments, Services, Storage Classes, Operators, Custom Resources).
- Experience with Linux administration and networking fundamentals.
- Proficiency in scripting languages (e.g., Bash, Python) for automation.
- Experience with monitoring tools (e.g., Prometheus, Grafana, Dynatrace) and logging solutions.
- Solid understanding of virtualization concepts and technologies (e.g., KVM, VMware).
- Excellent problem-solving skills and the ability to troubleshoot complex issues across multiple layers of the stack.
- Strong communication and collaboration skills.
Experience Preferred:
Preferred Qualifications:
- Red Hat Certified Specialist in OpenShift Virtualization or other relevant certifications.
- Experience with Infrastructure as Code (IaC) tools like Ansible, Terraform, or OpenShift GitOps.
- Familiarity with software-defined networking (SDN) and software-defined storage (SDS) solutions.
- Experience with public cloud providers (AWS, Azure, GCP) and hybrid cloud architectures.
- Knowledge of CI/CD pipelines and DevOps methodologies.
Education Required:
- Bachelor's Degree
Education Preferred:
- Master's Degree
Additional Information:
Key Responsibilities:
- Capacity Management
- Conduct capacity planning and forecasting for the OpenShift Virtualization platform, including compute, memory, storage, and network resources, to ensure scalability and prevent resource exhaustion.
- Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization.
- Collaborate with application teams and stakeholders to understand future demand and project capacity needs.
- Develop and maintain capacity models and reports to support strategic planning.
- OSV Automation & Efficiency
- Develop automation solutions (scripts, playbooks) for repetitive OSV tasks, including configuration changes, VM management (such as snapshot removal), auditing, remediation, and integration with ticketing systems.
- Use automation to deliver operator updates and changes efficiently at scale.
- Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency.
- Role-Based Access Control (RBAC) deployment and auditing
- Namespace and Resource Quota management (CPU, Disk and Storage)