As a Principal, you'll lead our SRE and DevOps teams, driving the design, automation, and reliability of our infrastructure. Your expertise will be crucial in ensuring our systems are secure, robust, scalable, and efficient.
- Technical Leadership: Provide technical leadership to the SRE and DevOps teams, guiding them in designing, implementing, and maintaining highly available and scalable infrastructure.
- Exceptional Problem Solving: Candidates must be able to demonstrate the ability to start from an unknown and to design and implement novel solutions. A primary responsibility will be contributing to org-wide R&D efforts by prototyping next-generation infrastructure solutions in coordination with engineering teams. Mastery of Python and Bash is a must; knowledge of other programming/scripting languages is a bonus.
- System Architecture: Collaborate with software engineers and architects to design, develop, and maintain infrastructure solutions that support our applications and services.
- Automation: Champion automation across the development and operations lifecycle to increase efficiency, reduce manual work, and minimize downtime.
- Reliability and Performance: Develop strategies and implement practices to maintain high system reliability, availability, and performance, while continuously monitoring and improving system health.
- Security: Prioritize DevSecOps methodology and align with InfoSec on joint initiatives. Ensure that infrastructure and applications meet security and compliance standards.
- Incident Management: Lead incident response and post-mortem activities, with InfoSec leadership, to identify root causes, implement preventive measures, and improve system resilience.
- SSDLC: Design, implement, and monitor a Secure Software Development Life Cycle (SSDLC) process in alignment with InfoSec.
- Infrastructure as Code (IaC): Manage and expand the use of Infrastructure as Code tools and practices to ensure reproducibility and consistency in our environments.
- Scalability: Work on scaling infrastructure horizontally and vertically to support growing demands while optimizing costs.
- Monitoring and Alerting: Implement and maintain robust monitoring, alerting, and observability solutions to proactively identify and address issues.
- Cost Optimization: Continuously optimize cloud resource usage to maximize efficiency and reduce operational costs.
- Documentation: Create and maintain comprehensive documentation for infrastructure, processes, and best practices.
- Mentorship: Provide mentorship and technical guidance to junior engineers, fostering their professional growth.
Requirements
- Bachelor's or Master's degree in Computer Science or a related field, or equivalent work experience.
- 7+ years of proven experience as a Site Reliability Engineer, DevOps Engineer, or a similar role in a senior or leadership capacity.
- 7+ years of hands-on experience with cloud platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes).
- 5+ years of experience in implementing and maintaining robust monitoring, alerting, and observability solutions to proactively identify and address issues.
- 5+ years of experience with SIEM (Security Information and Event Management) tools.
- Proficiency in scripting and programming languages (e.g., Python, Bash, Go, Ruby).
- Deep knowledge of Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
- Deep knowledge of development languages and runtimes such as Node.js, Java, PHP, and Go.
- 7+ years of experience with CI/CD pipelines and automation tools (e.g., Jenkins, CircleCI, GitHub, GitLab CI/CD).
- Strong understanding of DevSecOps and InfoSec methodologies.
- Strong understanding of networking and data storage technologies.
- Strong knowledge of database technologies and data structures.
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Strong communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
- Certifications such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, or Kubernetes certifications are a plus.
- Experience with game engine platforms such as Unity and Unreal, and with their infrastructure needs, is a plus.