Site Reliability Engineer

Location: Atlanta, Georgia
Date Posted: 08-10-2018
Full Time Opportunity
Our highly respected client in the Atlanta area is currently seeking an experienced Site Reliability Engineer to be responsible for scaling some of the largest software products by automating the application infrastructure, deployment and monitoring of products in production.
Are you passionate about automation and large-scale, mission-critical software systems? If so, this position is for you!
You will join a group of highly experienced technology professionals that empowers you to make decisions quickly to deliver reliability improvements without the red tape that typically surrounds enterprise environments.
The ideal Site Reliability Engineer will have a passion for automating as much as possible and will constantly be on the lookout for areas where operational and code efficiencies can be improved. You will work directly with product engineering teams leveraging XP principles, and when you aren’t automating all the things, you will be proactively executing destructive tests and participating in “game day” exercises and related activities to improve the operational readiness of your products.
Qualifications
  • Object-oriented programming language (preferably Java)
  • 3-8 years of experience with production monitoring concepts and implementation, including synthetic, real-user, application performance, system, log and time-series monitoring and dashboarding, using tools such as AppDynamics, Dynatrace, New Relic, Splunk, Grafana, ELK, etc.
    • AWS, Oracle DB (PostgreSQL), Cassandra, Redis, Apache Kafka, Sterling, Elasticsearch, Jenkins, JavaScript, Confluence
  • Proficient in production systems design including high availability, disaster recovery, performance, efficiency and security
  • Modern scripting language (preferably Python)
  • Modern infrastructure automation toolkit such as Puppet or Chef
  • Linux- or Unix-based environment experience
  • Modern microservice-based architecture and operations
  • Destructive testing methodologies and tools such as Chaos Monkey
  • CI/CD automation
  • Version control systems such as Git or SVN
  • Cloud computing platform and associated automation patterns experience
  • Experience in defensive coding practices and patterns for high-availability
How will you make an impact?
  • Collaborate and pair with other product team members to create secure, reliable, scalable software solutions
  • Write custom code or scripts to automate infrastructure, monitoring services and test cases
  • Write custom code or scripts to perform “destructive testing” to ensure adequate resiliency in production
  • Create meaningful dashboards, logging, alerting and responses to ensure that issues are captured and addressed proactively
  • Identify insecure code areas and implement fixes as they are discovered, with or without tooling
Additional Information
  • Stable company & opportunity to move up the ladder
  • Experience reporting to a CIO
  • Moving towards Google Cloud
  • Leveraging leading-edge technologies
  • Awesome work culture and environment