We’ve talked about why stuff breaks in production when it didn’t in development (see “Works for me”), how Continuous Integration / Continuous Deployment helps detect failures, and how to set up a “like-production” staging environment in which to test your features.
At this point in the journey, we act as consultants – or Site Reliability Engineers – in the production environment. Remember the various types of pain along the journey that we help take away? The pain at this stage is not seeing the big picture: how all the different features interact and contribute to what you are trying to accomplish in your environment. Acting as your Site Reliability Engineer means that we regularly investigate issues at the code level across the entire workflow. In addition, we provide consultation on proven best practices for cloud-native software design.
See this Wikipedia article for a simple high-level summary of the differences between SRE and DevOps.