As an SRE, you will be responsible for responding to incidents and escalations. This includes on-call and escalation support, which may be required after office hours and scheduled on weekends. A support duty roster will be implemented.
Main responsibilities and activities:
- Implement solution monitoring and observability monitoring, and automate detection and response.
- Implement SLI and SLO measurements and monitoring in our Solution Monitoring.
- Conduct service improvement actions and review them with the team using SLI and SLO data.
- Troubleshoot incidents, perform post-incident analysis, and conduct root cause analysis.
- Implement workarounds to prevent incidents from recurring, and improve monitoring detection.
- Implement observability monitoring and perform distributed tracing analysis of applications.
- Deploy new application releases to the preproduction and production environments.
- Participate in and contribute to automation of deployment, testing, and monitoring detection.
- Collaborate with the SQC team on testing automation deployment and with DevOps on continuous delivery.
- Participate in planning and review sessions with the Development, DevOps, and Platform teams.
- Expand and grow the technical knowledge, skillsets, and expertise expected of an SRE.
- Create and document artifacts related to SRE practices, such as good practices, patterns, customized dashboards, workarounds, troubleshooting methods, and solution monitoring improvements.
This is a hybrid position: 3 days at the office and 2 days remote.
Minimum Requirements:
- At least 3 years of experience in software development.
- 2 years of experience in IT operations, IT support, or basic system administration.
- Experience in application maintenance is a must, especially in application troubleshooting, bug detection, fixing, testing, and application management.