Operations
All of our solutions are tightly engineered and completely under our control. We ship all of our software as RPM packages, and every 30 minutes each system runs its Chef orchestration.
Monitoring
Figure: Monitoring Dashboard (../_images/screenshot-grafana.png)
We utilise the Grafana ecosystem to monitor all of our customer infrastructure, and most of our products have dedicated dashboards.
Figure: Logging Dashboard (../_images/screenshot-loki.png)
All system logs are shipped remotely into our Loki instance, where we can run scripted and ad hoc queries to get to the root cause of an issue.
We ship a comprehensive set of alerts, which our AlertManager manages in real time.
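As an illustration of what a scripted query might look like, the sketch below builds a request URL for Loki's `query_range` HTTP endpoint from a LogQL stream selector and a line filter. The base URL, the `host` label and the search term are assumptions made for the example, not values from our deployment.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical Loki base URL; replace with the real endpoint.
LOKI_URL = "http://loki.example.internal:3100"


def build_query_range_url(selector, needle, lookback, now=None, limit=100):
    """Build a query_range URL for log lines under `selector` containing `needle`."""
    end = now or datetime.now(timezone.utc)
    start = end - lookback
    # Escape backslashes and double quotes so the needle is a valid LogQL string.
    escaped = needle.replace("\\", "\\\\").replace('"', '\\"')
    # LogQL: a stream selector followed by a "line contains" filter.
    logql = f'{selector} |= "{escaped}"'
    params = {
        "query": logql,
        # Timestamps are sent as Unix epoch nanoseconds.
        "start": int(start.timestamp()) * 1_000_000_000,
        "end": int(end.timestamp()) * 1_000_000_000,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"


# Example: the last hour of lines mentioning "error" on an assumed host label.
url = build_query_range_url('{host="web01"}', "error", timedelta(hours=1))
```

The resulting URL can be fetched with any HTTP client. In practice the same LogQL expression can also be pasted directly into the Grafana Explore view.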
Incidents
All of our systems are defined as Infrastructure as Code, so when there is an incident we follow a software development process to reach resolution. This normally involves writing an InSpec control that proves the issue (and, once fixed, the resolution), followed by a new software release via our RPM/Entitlements (or a new version of a Chef artifact).
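The idea of one check that first proves the issue and later proves the resolution can be sketched as follows. This is a simplified stand-in written in Python, not an actual InSpec control (those are written in InSpec's Ruby DSL). The function names and version numbers are hypothetical, and the comparison handles only plain dotted versions rather than full RPM epoch/version/release semantics.

```python
def parse_version(version):
    """Turn a dotted version such as '1.4.2' into a comparable tuple (1, 4, 2)."""
    return tuple(int(part) for part in version.split("."))


def control_passes(installed_version, first_fixed_version):
    """Hypothetical control: pass only if the installed package contains the fix."""
    return parse_version(installed_version) >= parse_version(first_fixed_version)


# Hypothetical versions, for illustration only.
FIRST_FIXED = "1.4.3"

# On an affected system the control fails, which proves the issue.
before_release = control_passes("1.4.2", FIRST_FIXED)

# After the new release is installed the same control passes,
# which proves the resolution.
after_release = control_passes("1.4.3", FIRST_FIXED)
```

Keeping the check unchanged between the two runs is the point of the design: the only thing that differs is the state of the system under test.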