Engineering Leader

Job title: Engineering Leader

Company: Delta Air Lines

Job description: About Delta Air Lines

Delta Air Lines is the U.S. global airline leader in safety, innovation, reliability and customer experience. Powered by our employees around the world, Delta has for a decade led the airline industry in operational excellence while maintaining our reputation for award-winning customer service. With our mission of connecting the people and cultures of the globe, Delta strives to foster understanding across a diverse world and serve as a force for social good. We here at Delta are changing the way we do business from top to bottom, and IT is leading the way. Technology drives innovation that enriches our customers' experience of seamless travel on Delta flights. The stage is ripe for disruption. We are calling out to thought leaders as we strive to create meaningful and innovative solutions that help us realize our vision. Join Delta Technology Hub (DTH) in Bengaluru, India, to ideate with some of the best minds and unleash a wave of innovation.

Job Description

SRE Engineering Leader

Delta IT is on a journey to becoming the best IT organization in the airline industry, a journey of transformation. We are changing the way we do business from top to bottom as we strive to create meaningful and innovative solutions and are looking for team members to help us realize our vision.

Delta employees are thinkers, doers, and innovators. We are proactive. We are collaborative. We deliver impact to our customers. Let’s continue on our transformation journey in becoming a world-class IT organization while continually working at the world’s best airline!

JOB OVERVIEW:

We’re seeking a strong leader to manage the new Site Reliability Engineering (SRE) Execution team in the Software Engineering organization. Delta IT has embarked on modernizing its technology solutions in the cloud. The SRE Engineering Leader will lead a team of SREs who will work with Delta IT development squads to implement best practices for reliability and performance with the applications and services they support as they migrate into the cloud. Our ideal candidate is well-versed in modern cloud-based and on-prem architecture and experienced in leading teams responsible for implementing monitoring, alerting, and ops automation to reliably operate and maintain mission-critical and mission-vital production software applications.

The SRE Engineering Leader works with the SRE team to help developers improve the reliability and resiliency of Delta software solutions so they meet business requirements, by implementing SRE tools, processes, and best practices. SRE is what happens when you ask a software engineer to design an operations function. The SRE team works with development teams to advise on how to design, develop, and test for reliability, and to automate operational tasks for applications. The SRE team also helps troubleshoot incidents to address failure patterns, automate remediation through runbooks, and document application optimization.

RESPONSIBILITIES:

  • Support, influence, and communicate the mission, goals, objectives, and practices of the SRE team, the Software Engineering portfolio, and Delta IT.
  • Demonstrate leadership across multiple projects, peer groups, and departments in thought, initiative, responsibility, teamwork, actions, and culture.
  • Effectively manage relationships with IT leadership, Delta business leadership, and external customers/vendors working with Delta IT applications.
  • Lead, motivate, and develop staff. Lead the people; manage the work.
  • Help application development squads and product owners set service-level objectives (SLOs), service-level agreements (SLAs), and service-level indicators (SLIs) for the applications and services they support to ensure that Delta technology remains available and functional to support the world-class Delta Air Lines operation.
  • Ensure the observability platform is optimized and operating to support, and provide oversight of, the defined SLIs, SLOs, and SLAs across the portfolio of application services.
  • Work with the application development squads and the Enterprise Monitoring team to ensure applications are well-monitored and that any potential service disruption is detected at the earliest point, even before the failure when possible.
  • Drive implementation and evolution of DevOps reliability practices within the application development squads, including the building of CI/CD pipelines with thorough functional, performance, and security testing.
  • Work closely with development teams to evaluate the health, stability, and reliability of applications.
  • Utilize monitoring, alerts, dashboards, and management tools to ensure the availability, reliability and performance of applications and services.
  • Proactively work to implement and improve automation of application tasks.
  • Collaborate with development teams as well as monitoring and SRE subject matter experts to select enterprise tools and set standards for monitoring and ops automation.
  • Drive continuous improvement in on-call incident response and incident prevention processes for Delta application development squads.
  • Keep abreast of industry reliability trends through benchmarking and participation in professional associations, conferences, etc., in order to lead the strategic direction of the SRE practice at Delta.
  • Participate as needed in high-severity incident calls for Delta applications, providing thought leadership, coordinating incident triage, and escalating with related support teams as necessary.
  • Guide and mentor teams on the observability, scalability, and resiliency of web platforms.
  • Participate in analysis, design, development, testing, and implementation activities within the areas of responsibility.
  • Assist team in setting up and maintaining SLIs, SLOs, and error budgets for systems and applications.
  • Perform advanced troubleshooting of incidents in mission-critical systems and participate in preventative problem management activities. Reduce or eliminate manual/repetitive tasks.
  • Improve the resilience and reliability of systems to reduce or eliminate manual intervention in service recovery.

REQUIREMENTS:

  • Proven understanding of web technologies and distributed application architecture is required
  • Proven understanding of full life cycle software development methodologies is required
  • Proven ability to troubleshoot complex application logic flow is required
  • Excellent analytical, decision-making, and problem-solving skills
  • 12 or more years of experience in roles related to software engineering
  • 4 or more years of experience working in application support or Site Reliability Engineering
  • 4 or more years of experience in a leadership role on a development or engineering team
  • Demonstrated experience leading the implementation of SRE practices in development teams with diverse tech stacks.
  • Ability to ensure the appropriate SLA/SLO dashboards are developed.
  • Demonstrated experience leading the SRE guild or SRE chapter
  • Strong understanding of industry best practices for Site Reliability Engineering and ops automation.
  • Experience with ops automation using a scripting language such as Python or an automation tool such as Ansible.
  • Site Reliability Engineering: Knowledge of the theories and methodologies of reliability engineering; ability to design, develop and support various tools, services and applications to maintain a reliable site environment.
  • Performance Measurement and Tuning: Knowledge of system performance, testing, and programming; ability to monitor, measure, and optimize system performance and network communication.
  • CI/CD Pipeline: Knowledge of concepts, values, and tools applied in building Continuous Integration (CI), Continuous Delivery, and Continuous Deployment (CD) pipelines; ability to design, build, implement and maintain CI/CD pipelines to automate the software delivery process.
  • Software Release Management: Knowledge of strategies, practices, and tools for managing versions and distribution of software products and enhancements; ability to evaluate and improve release management practices and tools
  • Application Maintenance: Knowledge of production applications; ability to monitor application functions and resolve issues to maintain optimal conditions for system applications.
  • Software Engineering: Knowledge of software engineering; ability to deliver new or enhanced software products.
  • Agile Development: Knowledge of agile methodologies and the agile development lifecycle; ability to utilize formal agile methodologies, disciplines, practices, and techniques for the delivery of new and enhanced applications.
  • Container: Knowledge of the concepts, functions, and capabilities of container tools and techniques; ability to effectively apply containers in various IT business environments
  • Cloud Platform: Knowledge of the products and services regarding cloud platforms; ability to utilize related tools and technologies to develop cloud solutions and deploy applications on cloud platforms.
  • Proficient in production systems design, including High Availability, Disaster Recovery, Performance, Efficiency, and Security, as well as user, application performance, system, log, and time-series monitoring and dashboarding
  • Familiarity with open-source concepts and tools such as Prometheus, Grafana, and ELK
  • Knowledge of APM fundamentals or experience with tools such as New Relic or AppDynamics is good to have

PREFERRED QUALIFICATIONS:

  • Bachelor's degree in Computer Science, Information Technology, or a related field is preferred.
  • Experience with an APM tool such as Dynatrace, New Relic, AppDynamics, or Datadog is preferred.
  • AWS Certified SysOps Administrator or AWS Certified DevOps Engineer certification is a plus.
  • Experience with airline applications and infrastructure technology is a plus.
  • Experience leading the SRE Guild or chapter is a plus.
  • Experience developing ops automation in Tekton pipelines is a plus.
  • Experience developing applications and/or automation running in Red Hat OpenShift is a plus.

Location: Bangalore, Karnataka

Job date: Sun, 06 Nov 2022 06:03:54 GMT