Key Factors in NCCM and CMDB Integration – Part 1 Discovery


“I am a rock, I am an island…” These lyrics by Simon and Garfunkel neatly summarize what most IT companies would like you to believe about their products: they are islands that stand alone and don’t need any other products to be useful. Despite what the vendors want, the truth is closer to the Rolling Stones lyric “We all need someone we can lean on”. Music history aside, interoperability and integration are among the most important keys to a successful IT Operations Management system. Why? Because no product truly does it all, and when integration is done correctly, the whole can be greater than the sum of its parts. Let’s take a look at the most common IT asset management structure and investigate the key factors in NCCM and CMDB integration.

Step 1. Discovery. The heart of any IT operations management system is a database of the assets that are being managed. This database is commonly referred to as the Configuration Management Database, or CMDB. The CMDB contains all of the important details about the components of an IT system and the relationships between them. This includes information about the components of an asset, such as physical parts and operating systems, as well as upstream and downstream dependencies. A typical item in a CMDB may have hundreds of individual pieces of information stored about it. A fully populated and up-to-date CMDB is an extremely useful data warehouse. But that raises the question: how does a CMDB get fully populated in the first place?

That’s where discovery software comes in. Inventory discovery systems can automatically gather these critical pieces of asset information directly from the devices themselves. Most hardware and software vendors have built-in ways of “pulling” that data from the device. Network systems mainly use SNMP. Windows servers can also use SNMP, as well as the Microsoft proprietary WMI protocol. Other vendors, like VMware, have an API that can be accessed to gather this data. Once the data has been gathered, the discovery system should be able to transfer it to the CMDB. The transfer may be a “push” from the discovery system to the CMDB, or a “pull” in the other direction, but there should always be a means of transfer. This matters all the more because the primary alternatives for populating the CMDB are entering the data manually (sounds like fun) or uploading spreadsheet CSV files (but how do those get populated?).
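To make the “push” direction concrete, here is a minimal sketch of merging discovered facts into a CMDB. It assumes an in-memory dictionary standing in for the CMDB; the `CmdbRecord` class, the `push_to_cmdb` function, and the field names are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class CmdbRecord:
    """One configuration item; the structure is illustrative only."""
    hostname: str
    attributes: dict = field(default_factory=dict)


def push_to_cmdb(cmdb: dict, discovered: list) -> dict:
    """Merge discovered facts into an in-memory CMDB keyed by hostname.

    Returns a summary counting how many items were created and updated.
    """
    summary = {"created": 0, "updated": 0}
    for facts in discovered:
        hostname = facts["hostname"]
        # Everything except the key itself is stored as an attribute.
        attrs = {k: v for k, v in facts.items() if k != "hostname"}
        if hostname in cmdb:
            cmdb[hostname].attributes.update(attrs)
            summary["updated"] += 1
        else:
            cmdb[hostname] = CmdbRecord(hostname, attrs)
            summary["created"] += 1
    return summary
```

A real integration would replace the dictionary with calls to the CMDB's own import interface, but the create-or-update decision stays the same.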

Step 2. Updating. Once the CMDB is populated and running, you are done with the discovery software, right? Um, wrong. Unless your network never changes (please email me if that is the case, because I’d love to talk to you), you need to update the CMDB constantly. In fact, many organizations reserve a place in the CMDB for pre-deployment, meaning that new systems which are due to come online soon are entered into the CMDB ahead of time. The good news is that the discovery system should be able to get that information out of the CMDB and use it as the basis for a future discovery run, which in turn adds details about the device back to the CMDB, and so on. When implemented properly and working well, this cyclical operation really can save enormous amounts of time and effort.
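The cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the CMDB is a plain dictionary, and the `status` and `address` fields and both function names are hypothetical.

```python
def discovery_targets(cmdb: dict) -> list:
    """Return addresses of items marked pre-deployment, sorted for stable output."""
    return sorted(
        item["address"]
        for item in cmdb.values()
        if item.get("status") == "pre-deployment"
    )


def record_discovery(cmdb: dict, name: str, details: dict) -> None:
    """Write discovered details back to the item and mark it as deployed."""
    cmdb[name].update(details)
    cmdb[name]["status"] = "deployed"
```

The next scheduled run calls `discovery_targets` again, so anything added to the CMDB in the meantime is picked up automatically.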

In the next post in this series, I’ll explore how having an up-to-date asset system makes other aspects of NCCM, like Backup, Configuration, and Policy Checking, much easier.


Thanks to NMSaaS for the article.


The First 3 Steps To Take When Your Network Goes Down

Whether it is the middle of the day or the middle of the night, nobody who is in charge of a network wants to get “that call”: there is a major problem and the network is down. It usually starts with one or two complaints, such as “hey, I can’t open my email” or “something is wrong with my web browser”, but those few complaints quickly turn into many, and suddenly you know there is a real problem. What you may not know is what to do next.

In this blog post, I will examine some basic troubleshooting steps that every network manager should take when investigating an issue. Whether you have a staff of 2 or 200, these common sense steps still apply. Of course, depending on what you discover as you perform your investigation, you may need to take some additional steps to fully determine the root cause of the problem and how to fix it.

Step 1. Determine the extent of the problem.

You will need to pinpoint the scope of the issue as quickly as possible. Is it confined to a single physical location, such as one office, or is it network-wide, including WANs and remote users? The answer can provide valuable insight into where to go next. If the problem is contained within a single location, then you can be pretty sure that the cause of the issue is also within that location (or at the very least that location plus any uplink connections to other locations).

It may not seem intuitive but if the issue is network wide with multiple affected locations, then sometimes this can really narrow down the problem. It probably resides in the “core” of your network because this is usually the only place that can have an issue which affects such a large portion of your network. That may not make it easier to fix, but it generally does help with identification.

If you’re lucky you might even be able to narrow this issue down even further into a clear segment like “only wireless users” or “everything on VLAN 100” etc. In this case, you need to jump straight into deep dive troubleshooting on just those areas.
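As a rough illustration of this triage logic, the sketch below groups incoming user reports by site and by network segment. The report format (`site` and `segment` keys) and the `classify_scope` function are assumptions made for the example, not part of any specific tool.

```python
def classify_scope(reports: list) -> str:
    """Return a rough description of outage scope from user reports.

    Each report is a dict with hypothetical "site" and "segment" keys.
    """
    if not reports:
        return "no reports"
    sites = {r["site"] for r in reports}
    segments = {r["segment"] for r in reports}
    # A single shared segment (e.g. wireless, VLAN 100) is the narrowest clue.
    if len(segments) == 1:
        return f"segment: {segments.pop()}"
    if len(sites) == 1:
        return f"single site: {sites.pop()}"
    # Many sites and many segments: the shared core is the likely suspect.
    return "multiple sites: suspect core"
```

For example, reports from two offices that all mention wireless would return `segment: wireless`, which tells you to start with the wireless infrastructure rather than the core.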

Step 2. Try to determine if it is server/application related or network related.

This starts with the common “ping test”. The big question you need to answer is this: do your users have connectivity to the servers they are trying to reach but, for some reason, cannot access the applications (which points to a problem in the servers or apps), or do they have no connectivity at all (which points to a network issue)?

This simple step can go a long way towards troubleshooting the issue. If there is no network connectivity, then the issue resides in the infrastructure, most commonly in L2/L3 devices and firewalls. I’ve seen many cases where the application of a single firewall rule was the cause of an entire network outage.

If there is connectivity, then you need to investigate the servers and applications themselves. Common network management platforms should be able to inform you of server availability including tests for service port availability, the status of services and processes etc. A widespread issue that happens all at once is usually indicative of a problem stemming from a patch or other update / install that was performed on multiple systems simultaneously.
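The decision in this step can be sketched as follows. The example uses Python's standard `socket.create_connection` to test a service port; the host-level ping result is passed in by the caller, because sending ICMP usually needs elevated privileges or the system `ping` tool. The function names are hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def classify_fault(host_reachable: bool, service_reachable: bool) -> str:
    """Map the two test results onto the network-versus-server question."""
    if not host_reachable:
        return "network"
    if not service_reachable:
        return "server/application"
    return "no fault detected"
```

In practice you would feed `classify_fault` the result of your ping test and of `tcp_port_open` for the application's port, then run the same pair of checks from more than one location before drawing conclusions.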

Step 3. Use your network management system to pinpoint, rollback, and/or restart.

Good management systems today should be able to identify when the problem first occurred and potentially even the root cause of the issue (especially for network issues). You also should have backup / restore capabilities for all systems. That way, in a complete failure scenario, you can always fall back to a known good configuration or state. Lastly, you should be able to then restart your services or devices and restore service.

In some cases there may have been a hardware failure that needs to be addressed before a device can come back online. Having spare parts or emergency maintenance contracts will certainly help in that case. If the issue is more complex, like the overloading of a circuit or system, then steps may need to be put in place to restrict usage until additional capacity can be added. With most datacenters running on virtualized platforms today, additional compute and storage capacity can in many cases be added in less than 60 minutes.

Network issues happen to every organization. Those that know how to effectively respond and take a step by step approach to troubleshooting will be able to restore service quickly.

I hope these three steps to take when your network goes down were useful. Don’t forget to subscribe to our weekly blogs.


Thanks to NMSaaS for the article.

Why Just Backing Up Your Router Config is the Wrong Thing To Do

One of the most fundamental best practices of any IT organization is to have a backup strategy and system in place for critical systems and devices. This is clearly needed for any disaster recovery situation and most IT departments have definitive plans and even practiced methodologies set in place for such an occurrence.

However, what many IT pros don’t always consider is how useful backups are for reasons other than DR, and that for most network devices (especially routers) it is not just the running configuration that should be saved. In fact, there are potentially hundreds of smaller pieces of information that, when properly backed up, can help with ongoing operational issues.

First, let’s take a look at the traditional device backup landscape, and then let’s explore how this structure should be enhanced to provide additional services and benefits.

Unlike server hard drives, network devices like routers do not usually fall within the umbrella backup systems used for mass data storage. In most cases a specialized system must be put in place for these devices. Each network vendor has special commands that must be used in order to access the device and request / download the configurations.

When looking at these systems, it is important to find out where the resulting configurations will be stored. If the system simply stores the data on an on-site appliance, then it is also critical to determine whether that appliance itself is backed up to an offsite, recoverable system. Otherwise the backups are not useful in a DR situation where the backup appliance may also be offline.

It is also important to understand how many backups your system can hold. For example, can you store only the last 10 backups, or only everything from the last 30 days? Are these configurable options that you can adjust based on your retention requirements? This can be critical for audit reporting, as well as when you need to roll back to a previous state (which may not be the most recent one).
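A retention policy like this is easy to express in code. The sketch below assumes one particular rule, chosen only for illustration: keep the newest `max_count` backups, then drop any of those older than `max_age_days`. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now, max_count=10, max_age_days=30):
    """Return the backup timestamps to retain, newest first.

    Assumed rule: take the newest max_count backups, then discard any of
    those that are older than max_age_days.
    """
    cutoff = now - timedelta(days=max_age_days)
    newest_first = sorted(timestamps, reverse=True)
    return [ts for ts in newest_first[:max_count] if ts >= cutoff]
```

Whatever rule you choose, check it against your audit requirements first: a count-only limit can silently discard the one configuration an auditor asks for.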

Lastly, does the system offer a change report showing what differences exist between selected configurations? Can you see who made the changes and when?
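For the “what changed” half of that question, a change report can be approximated with Python's standard `difflib` module. This is a minimal sketch: the `config_diff` name and the labels are made up for the example, and it does not answer who made the change or when, which needs data from the device logs or the backup system itself.

```python
import difflib


def config_diff(old: str, new: str,
                old_label: str = "previous", new_label: str = "current") -> list:
    """Return unified-diff lines between two configuration snapshots."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # Inputs have no trailing newlines, so add none to headers.
    ))
```

An empty list means the two snapshots are identical; otherwise, lines starting with `-` were removed and lines starting with `+` were added.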

In addition to the “must haves” explored above, I also think there are some advanced features that really can dramatically improve the operational value of a device / router backup system. Let’s look at these below:

  • Routers and other devices are more than just their config files. Very often they can provide output which describes additional aspects of their operation. To use the common (Cisco-centric) terminology, you can also get and store the output of a “show” command. This may contain critical information about the device's hardware, software, services, neighbors and more that could not be seen from the configuration alone. It can be hugely beneficial to store this output as well, because it helps you understand how the device is being used, what other devices are connected to it and more.
  • Any device in a network, especially a core component such as a router, should conform to company-specific policies for things like access and security. Both the main configuration file and the output from the special “show” commands can be used to check the device against any compliance policy your organization has in place.
  • All backups need to run both on a schedule (we generally see once per day as the most common schedule) and on an ad-hoc basis when a change is made. This second option is vital to maintaining an up-to-date backup system. Most changes to devices happen at some point during the normal work day. It is critical that your backup system can be notified (usually via log message) that a change was made and then immediately launch a backup of the device, and potentially a policy compliance check as well.
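The last two points can be sketched together: detect a change from a log message, then check the fresh configuration against policy. The pattern below is modeled on the Cisco IOS `%SYS-5-CONFIG_I` message; other vendors log changes differently, and the function names and the simple required/forbidden line policy are assumptions made for illustration.

```python
import re

# Modeled on the Cisco IOS "%SYS-5-CONFIG_I" message; other vendors differ.
CONFIG_CHANGE = re.compile(
    r"%SYS-5-CONFIG_I: Configured from (?P<source>\S+) by (?P<user>\S+)"
)


def parse_change_event(log_line: str):
    """Return who changed the config and from where, or None if no change."""
    match = CONFIG_CHANGE.search(log_line)
    if match is None:
        return None
    return {"source": match.group("source"), "user": match.group("user")}


def check_policy(config_text: str, required: list, forbidden: list) -> list:
    """Return a list of violations for a simple line-based policy."""
    lines = {line.strip() for line in config_text.splitlines()}
    violations = [f"missing: {rule}" for rule in required if rule not in lines]
    violations += [f"forbidden: {rule}" for rule in forbidden if rule in lines]
    return violations
```

A backup system would call `parse_change_event` on each incoming log line and, whenever it returns a result, launch a backup of that device followed by `check_policy` on the new configuration.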

Long gone are the days when simply logging into a router, getting the running configuration, and storing it in a text file was considered a “backup plan”. Critical network devices need to have the same attention paid to them as servers and other IT systems. Now is a good time to revisit your router backup systems and strategies and determine whether you are implementing a modern backup approach. As you can see, it's not just about backing up your router config.

Thanks to NMSaaS for the article.