Engineering Farm Architect


Job title: Engineering Farm Architect

Company: Nvidia

Job description: For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU – the engine of modern visual computing – the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research. Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning. This new model – where deep neural networks are trained to recognize patterns from massive amounts of data – has proven to be deeply effective at solving some of the most complex problems in everyday life.

The Engineering Farm Architect is responsible for architecting solutions around our large compute cluster to make it work efficiently and to improve the user experience for customers as well as for the engineers supporting the cluster. Much of our software development focuses on eliminating manual work through automation, performance tuning, and growing the efficiency of production systems. Practices such as limiting time spent on reactive operational work, blameless postmortems, and proactive identification of potential outages factor into the iterative improvement that is key to product quality and to interesting, dynamic day-to-day work. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.

What you will be doing:

Independently architect and design solutions using an object-oriented (OO) approach with the right data structures and efficient algorithms, and implement them following a regular SDLC process that includes requirements gathering, OO design, testing, deployment, and release.

Support large-scale infrastructure with monitoring, logging, and alerting to meet promised uptime.

Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.

Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management, and launch reviews.

Maintain infrastructure and services once they are live by measuring and monitoring availability, latency, and overall system health.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Practice sustainable incident response and blameless postmortems.

Understand a complex and vast infrastructure and support it during on-call weeks.

Work with different subject-matter experts (SMEs) to help provide customers with quality resolutions to production issues.

What we need to see:

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.

Experience with OO design, algorithms, data structures, and software design.

Experience in one or more of the following, using an object-oriented approach: Python, Perl, Java, C, C++, Go, or Ruby.

Experience in mentoring junior engineers or leading a team.

Basic understanding of SQL and NoSQL data platforms, database queries, and data analysis.

Interest in crafting, analyzing, and fixing large-scale distributed systems.

Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

Ability to debug and optimize code and automate routine tasks.

Ability to learn quickly and adapt to different platforms as the project requires.

Ways to stand out from the crowd:

Demonstrated experience architecting and building scalable, maintainable tools following software best practices

Demonstrated experience leading a project from inception to completion while making significant independent contributions

Good hands-on experience with schedulers like LSF and SLURM

Good understanding of Linux administration, or experience building automation around it

Experience in debugging infrastructure or UNIX-related issues

Expected salary:

Location: Bangalore, Karnataka

Job date: Tue, 05 Jul 2022 03:16:39 GMT
