Site Reliability Engineer (Senior) ID27316

AgileEngine is one of the Inc. 5000 fastest-growing companies in the US and a top-3 ranked dev shop according to Clutch.
We create award-winning custom software solutions that help companies across 15+ industries change the lives of millions.
If you like a challenging environment where you're working with the best and are encouraged to learn and experiment every day, there's no better place - guaranteed!

What you will do
- Day-to-day management of alerts, checking systems, and escalating issues as necessary;
- Be part of a team that provides 24×7 on-call support for critical SaaS events;
- Be available in emergencies when team members are unavailable or need help;
- Document issues and remediation steps;
- Proactively create appropriate monitors in the EKS/K8s ecosystem;
- Deploy to EKS/K8s clusters using Terraform and Helm;
- Learn and maintain existing infrastructure running under Docker Swarm;
- Improve existing infrastructure health by implementing checks and scripts to correct known issues;
- Maintain and develop deployment code;
- Automate tasks that are currently executed manually;
- Implement and integrate new technologies in our Cloud Infrastructure;
- Collaborate with other teams and departments to provide the highest level of support and assistance;
- Apply a real customer focus when planning deployments and updates, keeping the customer front of mind and considering the impact on them before making changes;
- Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide a best-in-class SaaS service to our customers;
- Perform RCA and take the necessary corrective actions to prevent the recurrence of issues;
- Create and assign alert-related actions to the appropriate team after the investigation;
- Handle support requests for environment-specific actions;
- Identify and provide automation requirements to improve RCA.

Must haves
- Hands-on experience as an AWS Cloud Engineer;
- Working knowledge of EKS, Terraform, and Helm;
- Working experience with Docker and Docker Swarm;
- Good understanding of AWS IAM roles and policies;
- Logging and monitoring of AWS resources using CloudWatch Logs;
- Experience working in a Linux environment;
- Proficiency in Bash and/or Python scripting;
- A strong understanding of web technologies such as REST APIs;
- Working experience with monitoring solutions such as Grafana and Prometheus;
- Excellent oral and written communication skills;
- Customer-facing communication skills to effectively explain issues and RCAs to customers;
- Experience in Product/Application Support for SaaS-based products;
- Understanding of APIs, Databases, Systems Architecture, and Design;
- Experience designing, implementing, and operating in a DevSecOps environment;
- Upper-intermediate English level.

The benefits of joining us
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whatever makes you the happiest and most productive.

Seniority level: Intermediate