The Technical Operations team is responsible for the stability, reliability, and performance of our production systems. As a Technical Operations Engineer, you will maintain, troubleshoot, and improve those systems, with a special emphasis on complex Web3 technologies. The position demands a blend of Web3 expertise, technical and analytical skills, and a collaborative approach to operational excellence.
Timezone: US Eastern Time (ET)
What You'll Do
Chain Operations and Research: Lead the research, deployment, and operational management of new blockchain networks, ensuring efficient launch processes, thorough testing, and ongoing optimization of chain performance and reliability.
Advanced Web3 Support: Tackle complex Web3 issues, including conducting thorough post-mortems, providing technical direction in customer calls, and leading remediation efforts during incidents.
System Monitoring and Analytical Skills: Utilize advanced analytical and dashboard skills, including proficiency in tools like Grafana or DataDog, to monitor system performance and health.
Process and SLA/SLO Management: Define, enforce, and ensure the organization meets stringent SLO/SLA objectives, contributing to the overall health and reliability of our platform services.
Hands-On Problem Solving: Innovate to solve complex challenges quickly and efficiently, while being disciplined in task follow-through to minimize technical debt and implement preventive measures.
Collaboration and Support: Work closely with the Support L1 team and other cross-functional teams, fostering a collaborative environment to maintain system efficiency.
Proven Experience: At least 5 years in Technical Operations, SRE, or a similar role, with a deep understanding of Linux/Unix systems.
Deep Blockchain / Web3 Expertise: Ability to handle complex Web3-related issues with proficiency, including troubleshooting JSON-RPC responses, analyzing validator logs, and working with chain foundations directly on improving network performance.
DevOps Experience: Proficiency in automation and configuration management tools (e.g., Ansible, Terraform, Consul), and in programming languages such as Python, Go, or JavaScript. Familiarity with containerization technologies like Docker and Kubernetes.
System Optimization: Skilled in system optimization, including benchmarking using in-house tools, cost analysis and optimization, and system-level tuning by comparing various cloud providers, hardware configurations, and kernel parameters.
Analytical and Dashboard Proficiency: Demonstrated expertise in using tools like Grafana or DataDog for detailed system analysis and monitoring, essential for proactive system management and data-driven decision-making.
SLA/SLO and Incident Management Expertise: Proven ability in defining and adhering to SLA/SLO objectives, coupled with efficient incident management using tools like PagerDuty, ensuring operational reliability and customer satisfaction.
Proactive and Hands-On Approach: A proactive mindset with a hands-on approach to problem-solving, capable of innovating under pressure and committed to reducing risks and technical debt.
Communication and Collaboration: Excellent communication skills, with the ability to collaborate effectively across teams and with various stakeholders.
Personal Attributes: High energy, resilient, with a can-do mentality and a strong work ethic. Integrity, honesty, and maturity are key, along with a commitment to continuous improvement and a self-starter attitude.
Knowledge of database systems such as ScyllaDB, Redis, and Postgres
Experience with WAF optimization and alerting, particularly with CloudFlare
Familiarity with modern web hosting technologies, including lambda functions and caching strategies