Site Reliability Engineer – DevOps Requisition ID 6010

Job title: Site Reliability Engineer – DevOps Requisition ID 6010

Company: Qualys

Job description: We are seeking a highly motivated and talented Lead Site Reliability Engineer to work on Qualys’ DevOps Toolchains. Working with a team of engineers and architects, you will combine software development and systems engineering skills to build and run scalable, distributed and fault-tolerant systems.

The ideal candidate will write software to optimize day-to-day work through better automation, monitoring, alerting, testing and deployment.

Responsibilities

Communicate effectively with the DevOps managers on release milestones, sprints and roadmap activities
Co-develop and participate in the full lifecycle development of DevOps tool chains, from inception and design through deployment, operation and improvement, by applying scientific principles.
Increase the effectiveness, reliability and performance of DevOps tool chains by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.
Support the DevOps team before technologies are pushed to production release through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.
Ensure that the DevOps tool chains are maintained properly by measuring and monitoring availability, latency, performance and system health.
Advise the DevOps team on improving the reliability of the systems in production and scaling them based on need.
Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the DevOps tool chains.
Develop tools and automate the processes for large-scale provisioning and deployment of cloud platform technologies.
Participate in the on-call rotation for DevOps tool chains. During incidents, lead incident response and contribute to detailed, blameless postmortem analysis reports that are brutally honest.
Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis

Requirements

1+ years of relevant experience in running distributed systems at scale in production.
Expertise in one of the following programming languages: Java, Python or Go.
Proficient in writing Bash scripts
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL and VLANs
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
Proven record of handling production issues, planning escalation procedures, and conducting post-mortems, impact analyses, risk assessments and other related procedures.
Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field

EEO Employer/Vet/Disabled

Expected salary:

Location: Pune, Maharashtra

Job date: Sat, 25 Jun 2022 02:11:44 GMT
