EugeneRecruiter Since 2001
the smart solution for Eugene jobs

Senior Site Reliability Engineer (SRE)

Company: mParticle
Location: Remote
Posted on: May 3, 2021

Job Description:

At mParticle, we are passionate about building software that empowers our customers to make the most of their data. We count on our operations team and site reliability engineers (SREs) to keep our platform at peak performance and high availability, processing over 1 trillion events a month in near real-time, with no interruptions.

We are growing and expanding our customer deployments, and we are seeking an experienced senior SRE to join our operations team: someone who brings experience as well as fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

As a Site Reliability Engineer, you will be part developer, part operations, all continuous integration and delivery expert; you will be integral to the design, setup, automation, and maintenance of our entire integration and delivery pipeline. The ideal candidate has a deep software development background combined with effective communication skills to promote collaboration with developers, support engineers, customers, and senior management. They will work closely with development squads, our client-facing teams, and customers, as well as other engineers and developers, gathering requirements, architecting, and continuously delivering quality improvements to our platform.

As an mParticle Senior SRE, you will...

  • Be part of the PagerDuty rotation, responding to platform incidents and providing support for other engineers who are responding to customer issues
  • Use your daily interactions with the platform and your experience and skills to constantly improve our environment and ensure that issues do not reoccur
  • Maintain and augment our monitoring systems so that they alert on symptoms rather than causes
  • Be proactive and take ownership in identifying, raising, and resolving issues or deficiencies you see anywhere in our environment
  • Produce and improve internal documentation and SOPs where they are missing or lacking quality or details
  • Write new tools and improve existing ones to help automate and remove toil from the team
  • Live-debug applications and issues, and identify, resolve, or own the resolution of functionality and performance deficiencies
  • Identify performance issues with production applications and their configuration, and suggest or implement fixes
  • Contribute to our scale goals by identifying areas for improvement that can lead to higher efficiency
  • Automate yourself out of a job

You will be perfect for this role if you...

  • Have a bachelor's degree in computer science or another highly technical or scientific discipline
  • Are able to program (structured and OO) in one or more high-level languages, preferably Python and either C#, Java, or Go
  • Comfortably "own" the Linux shell
  • Have a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Have coding experience beyond simple scripts
  • Are experienced in debugging and performance tuning applications
  • Have an eye for edge cases, behaviors, and creative solutions
  • Are experienced with configuration management
  • Have an unstoppable urge to fix what is broken
  • Efficiently balance speed/iteration and quality
  • Are experienced with Terraform and Ansible

As a Senior SRE, we expect you to...

  • Fluently follow existing best practices for maintaining the health of supported applications and the platform, and for writing and testing code
  • Make impactful decisions about your technical contributions
  • Understand how our production systems work
  • Handle vague scope or identify improvements in small areas
  • Manage your work with little-to-no supervision
  • Actively collaborate with others through technical documentation
  • Troubleshoot and contribute to the resolution of moderate to complex production problems, and write post-mortems on them
  • Write SOPs for issues encountered and common tasks
  • Automate repetitive tasks using purpose-written code or commercially available tools
  • Detect inefficient common operational patterns and processes
  • Design and implement monitoring solutions for common or critical problems

As a technical resource and expert, you should be able to...

  • Troubleshoot and resolve medium-complexity issues, and be a core resource for the team in doing so
  • Have sufficient understanding of the mParticle pipeline to be able to assist in troubleshooting medium to complex platform issues
  • Write quality, clean, and maintainable code, following company best practices with minimal guidance
  • Develop sufficient domain understanding to sanity check and ensure the quality of your own output, as well as review that of other team members
  • Write custom code of medium to high complexity in at least 2 languages
  • Be the responsible/SME engineer for 2 or more internally-maintained supporting infrastructure components and have general knowledge of all platform components
  • Proactively research and keep up to date on the patterns, advancements, and evolutions of tools and technologies used in the mParticle pipeline
  • Identify problematic patterns in the mParticle applications, processes and tools and suggest and implement resolution options
  • Make small design decisions independently, making appropriate tradeoffs between simplicity and performance
  • Follow existing patterns to create new instances of projects, features, or architecture
  • Create novel architectures of small components within your area of expertise. This includes diagramming the architecture, assessing the trade-offs made and patterns applied, and estimating the effort and approximate timeline for the change
  • Understand the flow control of nearly any system, including those outside of your area of expertise, even if you cannot necessarily suggest improvements to them
  • Properly sense when to engage Security for a review of a potential change
  • Understand techniques used to troubleshoot and fix production bugs and issues
  • Develop solutions/code that reduce future operational burden (e.g. by adding appropriate self-healing, high levels of alerting/monitoring/logging, reducing alert noise, etc.)
  • Ensure that infrastructure resources are not wasted by consistently following provided best practices and rightsizing instances, and proactively identify areas that can benefit from changes that lead to cost savings
  • Contribute to the build and release tooling and infrastructure
  • Contribute to defining SLAs and SLIs

You should also be able to...

  • Be successful when working on a large feature or improvement of vague scope
  • Identify and push forward new features or enhancements that improve the functioning of a system or feature
  • Identify problems and contribute well-scoped solutions to the team's roadmap
  • Focus your work on what is most valuable for the team
  • Make and communicate accurate time estimates for your own work, potentially spanning multiple sprints
  • Manage projects that span multiple groups of stakeholders
  • Act as an effective facilitator for team meetings
  • Consistently communicate technical decisions through high-quality design docs, tech talks, and wiki contributions
  • Create documentation, train and mentor others
  • Be the role model for less experienced team members

Lastly, as part of mParticle and our Engineering organization, you should...

  • Participate in, own, and improve mParticle technical recruiting, onboarding, and branding
  • Act as a brand ambassador for mParticle Engineering
  • Drive the cultural direction of mParticle operations
  • Encourage people to be the best they can
