The Technical Operations team is responsible for the stability, reliability, and performance of our production systems. As a Technical Operations Engineer, you will maintain, troubleshoot, and improve those systems, with a special emphasis on complex Web3 technologies. This position demands a blend of Web3 expertise, technical skills, analytical prowess, and a collaborative approach to ensure operational excellence.

What You'll Do

Chain Operations and Research: Lead the research, deployment, and operational management of new blockchain networks, ensuring efficient launch processes, thorough testing, and ongoing optimization of chain performance and reliability.
Advanced Web3 Support: Tackle complex Web3 issues, including conducting thorough post-mortems, providing technical direction in customer calls, and leading remediation efforts during incidents.
System Monitoring and Analytical Skills: Use advanced analytical and dashboard skills, including proficiency in tools like Grafana or Datadog, to monitor system performance and health.
Process and SLA/SLO Management: Define and enforce stringent SLO/SLA objectives and ensure the organization meets them, contributing to the overall health and reliability of our platform services.
Hands-On Problem Solving: Innovate to solve complex challenges quickly and efficiently, while being disciplined in task follow-through to minimize technical debt and implement preventive measures.
Collaboration and Support: Work closely with the Support L1 team and other cross-functional teams, fostering a collaborative environment to maintain system efficiency.

Proven Experience: At least 5 years in Technical Operations, SRE, or a similar role, with a deep understanding of Linux/Unix systems.
Deep Blockchain / Web3 Expertise: Ability to handle complex Web3-related issues with proficiency, including troubleshooting JSON-RPC responses, analyzing validator logs, and working directly with chain foundations on improving network performance.
System Optimization: Skilled in system optimization, including benchmarking using in-house tools, cost analysis and optimization, and system-level tuning by comparing various cloud providers, hardware configurations, and kernel parameters.
Analytical and Dashboard Proficiency: Demonstrated expertise in using tools like Grafana or Datadog for detailed system analysis and monitoring, essential for proactive system management and data-driven decision-making.
SLA/SLO and Incident Management Expertise: Proven ability in defining and adhering to SLA/SLO objectives, coupled with efficient incident management using tools like PagerDuty, ensuring operational reliability and customer satisfaction.
Proactive and Hands-On Approach: A proactive mindset with a hands-on approach to problem-solving, capable of innovating under pressure and committed to reducing risks and technical debt.
Communication and Collaboration: Excellent communication skills, with the ability to collaborate effectively across teams and with various stakeholders.
Personal Attributes: High energy and resilience, with a can-do mentality and a strong work ethic. Integrity, honesty, and maturity are key, along with a commitment to continuous improvement and a self-starter attitude.

Database Systems: Knowledge of database systems such as ScyllaDB, Redis, and Postgres.
WAF Optimization: Experience with WAF optimization and alerting, particularly with Cloudflare.
Modern Web Hosting: Familiarity with modern web hosting technologies, including lambda functions and caching strategies.