SITE RELIABILITY ENGINEER
Full Time
India
Posted 3 years ago
Dear Candidate,
We are urgently looking for a SITE RELIABILITY ENGINEER for our prestigious client. Please find the detailed job description below.
Designation : SITE RELIABILITY ENGINEER
Experience : 3-7 years
CTC : Not a constraint for the right candidate
Job Location : WFH (REMOTE WORK)
About the Job
- Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to users' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our systems. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
- On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to us, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
- SRE’s culture of diversity, intellectual curiosity, problem-solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Responsibilities
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Lead sustainable incident response, blameless postmortems, and production improvements that result in direct business opportunities for the organization.
- Manage individual project priorities, deadlines, and deliverables.
- Provide guidance to other team members on managing end-to-end availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
- Be able to work in shifts.
Minimum qualifications:
- Bachelor’s degree in Computer Science, a related technical field involving software/systems engineering, or equivalent practical experience.
- Experience programming in at least one of the following languages: C, C++, Java, Python, or Go.
- Experience with algorithms and data structures.
- 3-5 years of experience in computing, distributed systems, storage, or networking.
Preferred qualifications:
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Ability to debug, optimize code, and automate routine tasks.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- Experience with algorithms and data structures and/or Unix/Linux systems internals (e.g., filesystems, system calls) and administration.
If interested, kindly reply with your updated resume as soon as possible.
Job Features
Job Category : IT