Avature's Coverage team is dedicated to maintaining and improving the quality of the monitoring tools and practices we rely on during on-call shifts and other incident-detection work. The team's scope ranges from managing and continuously improving our server and service monitoring and alerting to maintaining a holistic view of system reliability.
As a Cloud Reliability Engineer, you'll strive to implement tools and processes that improve observability, monitoring, and incident management, minimize emergency response time, and provide a pain-free experience for the teams involved in incident management.
Your challenges and objectives
Understand Avature's infrastructure and processes.
Contribute to defining standards with a DevOps/SRE mindset and advocate for them.
Identify and address weaknesses in our infrastructure to ensure service availability.
Develop strategies to mitigate and prevent interruptions in critical services.
Occasionally perform troubleshooting on ongoing incidents.
Your day-to-day activities
Participate in the definition and implementation of SRE policies and practices.
Collaborate with other infrastructure and development teams in the continuous improvement of their services' monitoring and observability.
Work with development and engineering teams to implement SRE practices from the early stages of the software development life cycle.
Engage in incident management, conducting post-mortem analyses and proposing preventive measures to avoid future disruptions.
Occasionally perform troubleshooting on ongoing incidents.
About you
Knowledge of observability: logs (e.g. ELK stack), metrics (e.g. Prometheus, Grafana), and tracing (e.g. Jaeger, OpenTelemetry).
Experience creating and maintaining fault-tolerant and distributed systems.
Solid experience in Linux system administration.
Analytical and troubleshooting skills.
Infrastructure-as-code mindset.
Software development (Python, Golang) and configuration management (Puppet, Ansible) skills.
Knowledge of incident management and related tools, such as Splunk On-Call.
About us
Avature is a market-leading enterprise SaaS solution provider for global talent acquisition and talent management. We have a strong commitment to high-quality engineering and customer service and are recognized innovators in the very large company market. We currently work with over 650 companies worldwide, including 110 of the Fortune 500, all of the Big Four consulting firms, the largest banks and manufacturers in the world, and five governments.
We design, build, implement, and support our product ourselves. With 26 releases a year and a strong commitment to innovation and quality engineering, our private cloud platform has become the product of choice for very large global organizations.
At Avature, we value opportunities to learn and grow within a dynamic, creative, and collaborative environment. We encourage autonomy and empower our people to approach challenges innovatively while bringing their unique perspective to the table. We offer a career development program that supports continuous learning and thoughtful leadership, and that meaningfully impacts each individual's professional trajectory.
What we offer
A fast-paced, energetic, and engaging environment.
Flexible hours.
Work remotely or come by the office as much as you want.
Four salary reviews per year.
Option to earn part of your salary in US dollars.
Three weeks of vacation from the first year.
Four weeks of paternity leave.
OSDE 310 health coverage (family plan).
Four days a year to attend events related to professional development.
End-of-year week off (December 26 to 31).
Internet service expenses.
Birthdays off.
An organizational culture that empowers everyone to be themselves is key to thriving in business, but more importantly, it is a pathway to creating a more equitable society. Avature fosters a diverse and inclusive environment and celebrates that each unique person brings something different to our team. We are committed to considering all qualified applicants equally and to promoting equal opportunities within our organization.