When a software developer dreams of working at a company, it may be to build something new or to further develop products they’re familiar with. Most developers probably don’t think about building foundational infrastructure primitives such as the Configuration Management Database (CMDB). After reading this post, that might change.
A CMDB is vital in the hyper-efficient operation of GoDaddy products and internal services. But what about a CMDB makes it so important? What about a CMDB requires the discipline of software development? How does software development in CMDB make GoDaddy products better? How did GoDaddy evolve its CMDB and make it a trustworthy source of truth for GoDaddy? This article answers those questions and describes how the CMDB has evolved over time at GoDaddy.
A CMDB is a database that stores information about hardware and software assets. It’s used to track and manage infrastructure in an organization’s technology environment. A CMDB usually has details about hardware, software, network devices, applications, services, and their dependencies. Simply put, a CMDB acts as a data warehouse and stores information about IT assets and the relationships between them.
GoDaddy has a centralized CMDB of infrastructure required for efficiently operating GoDaddy products and internal services.
Why a CMDB matters
A CMDB is important to GoDaddy engineers because it provides a complete picture of all physical and virtual hardware that runs GoDaddy products and its internal services. GoDaddy engineers rely on trustworthy data in the CMDB for their security, operational, and business use cases. Trustworthy data is essential to efficient IT Operations Management (ITOM) and data-driven business decision-making. The centralized CMDB enables the hyper-efficient operation of all GoDaddy products and internal services, including those obtained through acquisitions.
Believe it or not, a CMDB is important to GoDaddy customers too. An organized GoDaddy with efficient operations means lower prices and higher customer value. The GoDaddy CMDB is a competitive advantage that enables GoDaddy to pass cost savings on to customers. For example, the CMDB enables a Cost of Goods Sold analysis, which leads to making smarter infrastructure buy decisions in a just-in-time manner. It also means quicker troubleshooting, reduced incidents and downtime for customer websites, and better reliability of GoDaddy products.
Evolution of CMDB at GoDaddy
Prior to 2015, GoDaddy used a homegrown version of CMDB called Server Management Database. GoDaddy acquired several companies over the years, each tracking IT infrastructure in their own unique ways. For example, Host Europe Group (HEG) is a collection of many brands, each with its own CMDB. This caused challenges integrating acquisitions. In 2015 GoDaddy started using ServiceNow, including their ITOM and CMDB products. This change made it easier to integrate acquisitions and bring them into compliance with its centralized CMDB standard. All brands now migrate or integrate with the centralized ServiceNow-based CMDB to provide a single source of truth aggregated from disparate systems of record.
From inception to comprehensive and trustworthy
A CMDB doesn’t just happen. All assets, physical, and virtual, need key attribute values identified and populated. Throughout their lifecycle, assets change. Hardware refreshes take place. The CMDB isn’t a static database. When changes occur, the CMDB updates to reflect those changes. Without an exact accounting of all infrastructure and current state, engineers can’t trust the CMDB. So where do you start?
The best place to start when creating a trustworthy CMDB is to put initial values into the CMDB before and during provisioning. However, with existing deployed infrastructure, and to pick up changes over time, there needs to be a discovery mechanism.
Discovery (in the context of CMDB) means learning what’s out there with its key attributes in a continuous loop. Discovery finds infrastructure that skipped the initial CMDB onboarding process. Discovery also reaffirms key attributes of existing infrastructure over and over again. GoDaddy has used many discovery software packages, some vendor-supplied and some homegrown. These continuous discovery processes are paramount to building trust in the CMDB data.
Building trust around the data in the CMDB became crucial to GoDaddy. Hence, a program called CMDB Trustworthiness emerged and evolved. The CMDB Trustworthiness Standard is an internal document that describes CMDB attribute trustworthiness as the level of trust CMDB attributes have earned. It defines the scope and rules for calculating the CMDB Attribute Trustworthiness score. The following is an excerpt from the CMDB Trustworthiness Standard describing trustworthiness calculation, with an example.
How’s CMDB Attribute Trustworthiness measured?
CMDB Attribute Trustworthiness is the % of infrastructure attributes verified as correct.
Trustworthiness (T%) = Count of compliant attributes / Count of attributes
Count of compliant attributes: Total count of in-scope infrastructure attributes specified in this standard that comply with the respective trustworthiness rules defined in this document.
AttributeCount: Total count of in-scope infrastructure attributes specified in this document.
Trustworthiness is calculated and recorded daily for overall and each attribute. The following Trustworthiness Calculation Example helps provide a deeper understanding of the metric:
Pretend there are only six servers and only three Attributes in the CMDB. Each server will have all its attributes tested with trustworthiness rules. For each server, each attribute either passes its rules or doesn’t. Each server attribute gets a score of 0 or 1. 0 means one or more rules failed. 1 means all rules passed.
The hypothetical results in this example are as follows:
Attribute Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 T% Name 0 1 1 1 1 1 5/6 = 83.3% OS 1 0 0 0 1 1 3/6 = 50% Ram (MB) 1 1 1 1 1 1 6/6 = 100%
- Name has a Trustworthiness score of 83.3%
- OS has a Trustworthiness score of 50%
- Ram (MB) has a Trustworthiness score of 100%
- Overall CMDB Attribute Trustworthiness = 14 / 18 = 77.8%
There are over 50 measurable attributes. Each attribute has a description, scope, and list of trustworthiness rules. Attribute rules are often intertwined, for example, Model ID and Is Virtual, as explained in the following example.
Description: Stores information about the Model of the server.
The attribute is considered Trustworthy when all of these are true:
- [Model ID] IS NOT EMPTY
- IF [Model ID].[Display name] CONTAINS ANY OF THE FOLLOWING (caSe inSensitive): [cloud, openstack, vmware] THEN [Is Virtual] IS true
- IF [Model ID].[Display name] DOES NOT CONTAIN ANY OF THE FOLLOWING (caSe inSensitive): [cloud, openstack, vmware] THEN [Is Virtual] IS false
As you can see from the excerpt of the Model ID attribute, specific models indicate the Is Virtual attribute should be true. The rules about how the Model ID and Is Virtual attributes correlate also applies to the Is Virtual attribute. The validation tool considers both Model ID and Is Virtual untrustworthy if these rules are in violation. This is just one example. There are many more cross-dependencies between attributes.
Tracking Trustworthiness Over Time
How do we track trustworthiness over time? With a validation tool. The validation tool compares the trustworthiness rules to every attribute, enabling Trustworthiness reports. The Trustworthiness reports help GoDaddy identify servers with inaccurate data. This is important because the physical and virtual network topology and CMDB constantly change. The CMDB attempts to be an exact mirror of the network topology’s current state. The topology changes over time, and so must the CMDB. When trustworthiness rules are in violation, GoDaddy launches investigations to discover the root cause(s) and make corrections. Establishing better and better CMDB trustworthiness enables GoDaddy to use the CMDB to make data-driven decisions confidently.
CMDB use cases at GoDaddy
Besides simply tracking assets and their current states, a robust and well-mananged CMDB provides many opportunities to provide insights, automation, and compliance.
There are many use cases for a well managed CMDB. Some include:
- Financial services for budget tracking
- Public cloud asset management
- Server patch compliance reporting
- Integration with Out-Of-Band Management System
The following sections cover additional use cases at GoDaddy in more detail:
- Keeping up with hyper-growth
- Network automation
- Security compliance
- Incident management
Keeping up with hyper-growth
Imagine a product at GoDaddy is growing at a rate of 100K+ new customers monthly. If there are 1,000 customers per server, that’s 100 new servers needed every month. Now imagine how you would execute ordering 100 new servers every month and ship them to the correct locations with all the information needed to install them correctly. The CMDB makes this process much easier. An order placed with the vendor outputs a “ship sheet” that identifies the hardware with specifics, including serial numbers. An uploaded ship sheet matches the order. The software automates taking each piece of hardware on the ship sheet and creating a server in the CMDB. Other software pairs the hardware with its business purpose, then automates the provisioning. For servers, that’s writing the operating system and base configuration for its role.
Hardware provisioning uses network automation to keep track of switches and switch ports. For example, every server needs one or more IP Addresses. Servers reserve IP addresses from an address pool, then they’re assigned to the server. Network adapters on the server and ports on the switches get configured. Software manages this whole process, completely automated. The software that runs on the servers before provisioning detects the switch and connected switch ports. Switch port information relays to the CMDB, which automates the appropriate attribute changes so the CMDB mirrors their current states.
With thousands of assets, all requiring updates, backups, and periodic scans and checks, security compliance can be challenging. GoDaddy uses several vendor-supplied security software installed on GoDaddy-managed servers. Attributes like Is VMDR Target, Most Recent VMDR Discovery, Is EDR Target, and Most Recent EDR Discovery track server security software and dates last synced. Different patterns update these fields which all drive trustworthiness in the CMDB.
Something is wrong if a server’s Is VMDR Target attribute is true and Most Recent VMDR Discovery attribute is empty or outdated. Either the VMDR software that’s supposed to be running on the server isn’t running, or something else prevented the full process from completing successfully. The system generates reports and problem tickets for the owning product teams using this type of information so they can investigate and mitigate findings.
GoDaddy tracks the installation and gap (lack of installation) of security software using ratios, such as the following, over time.
VMDR installed compliance ratio: count of [Is VMDR Target] / count of recent [Most Recent VMDR Discovery].
VMDR installed compliance ratio charted over time is broken down by Business Service, Product, Product Line, and Business Unit.
VMDR gap: The count of [Is VMDR Target] minus the count of recent [Most Recent VMDR Discovery].
Sometimes vendor-supplied software facilitates syncing to CMDB (like with VMDR). However, in all cases, there is custom software to integrate and make it work for GoDaddy. VMDR-CMDB sync not only syncs Most Recent VMDR Discovery, it also syncs key attributes, such as Operating System, IP Address, RAM, and more. This makes the VMDR-CMDB sync process even more important to CMDB’s trustworthiness. It enables daily re-affirmation of key attributes, which adds to the trustworthiness of these attributes. For example, if the Operating System was AlmaLinux during provisioning, how much trust do you place in this months later? If the VMDR-CMDB sync process re-affirmed this value the same day you checked it, you can trust the attribute value is still accurate.
Although VMDR is used in this example, the pattern applies to each security software.
When an incident occurs, a combination of GoDaddy and third-party software creates an incident ticket in ServiceNow. The incident ticket includes key attributes like the affected server Name, Support Group, Business Service, and Service Tier.
The Support Group attribute of a server lists team members responsible for support and maintenance of a particular server, including the on-call schedule for members in that group. When an incident occurs, ServiceNow automation sends an SMS to the primary on-call member’s mobile number and optionally messages the Support Group’s Slack channel. If the primary on-call member doesn’t acknowledge the SMS, an escalation occurs with a robocall. If the call isn’t answered, then escalate the incident to the secondary on-call. If the secondary on-call doesn’t respond, the Support Group’s manager becomes the de-facto third on-call. If the Support Group’s manager doesn’t respond, the Global Operations Center (GOC) may further escalate, depending on the severity. This may include reaching out to other on-call team members, the management chain, or even the department VP.
Sometimes it can be useful to research incidents. For example, which servers had the most incidents over the last week, month, or year? Useful attributes to track include Assignment Group, Support Group, Business Service, Product, Product Line, and Business Unit. The total number of incidents isn’t the only useful metric to track. Other metrics like Time to Repair and Service Availability offer value. Other incident related questions to consider could be:
- When an incident occurs, what impact happened to which business services?
- What’s the service tier of the impacted business services?
Incidents that cause business service availability issues for top priority service tiers are of the highest importance. The service tier standard defines rules that consider revenue generation, reputation, compliance, dependence, and continuity. The point is, having all the servers in the CMDB with the correct Support Group and Business Service is important for incident management.
The CMDB enables GoDaddy to efficiently operate and evolve its infrastructure, products, and services with agility. GoDaddy operators and engineers use CMDB to search for trustworthy information about infrastructure in seconds, instead of hours or days. Establishing better and better CMDB trustworthiness enables GoDaddy to use the CMDB to make data-driven decisions confidently.
If you are early in your CMDB maturity cycle, we hope you find this write-up helpful as you develop your CMDB. If you already have a mature CMDB, we hope our use cases and approach resonate with you and you can still leverage a few helpful nuggets.
If you want to work with us on foundational infrastructure services for automating networks or data centers, virtualization, or distributed storage, check out our career page. We’d like to hear from you!