Key Factors in NCCM and CMDB Integration – Part 2 – Change Configuration and Backup

In Part 1 of this series I discussed how an NCCM solution and a CMDB can work together to create a more effective IT inventory system. In this post, I will take that a step further and show how your change configuration process benefits from integration with that same CMDB.

In general, the process of implementing IT infrastructure change happens at three separate stages of an asset's lifecycle:

  1. Initial deployment / provisioning
  2. In production / changes
  3. Decommissioning / removal

In each of these stages, there is a clear benefit to having the system(s) that are responsible for orchestrating the change be integrated with an asset inventory / CMDB tool. Let’s take a look at each one to see why.

1. Initial Deployment / Provisioning

When a new device is ready to be put onto the network, it must go through at least one (and probably many) pre-deployment steps in order to be configured for its eventual job in the IT system. From “out of the box” to “in production” requires at least the following:

  1. Installation / turn-on / pretest of HW
  2. Load / upgrade of SW images
  3. Configuration of “base” information like IP address / FQDN / Management configuration
  4. Creation / deployment of full configuration

This may also include security policy testing and potentially manual acceptance by an authorized manager. It is best practice to control this process through an ITIL-compliant system using a software application which has knowledge of what is required at each step and controls the workflow and approval process. However, the CMDB / service desk can rarely, if ever, also process the actual changes to the devices. This is typically a manual process or (in the best case) is automated with an NCCM system. So, in order to coordinate that flow of activity, it is absolutely essential to have the CMDB be the "keeper" of the process and then "activate" the NCCM solution when it is time to make the changes to the hardware. The NCCM system should then be able to inform the CMDB that the activity was performed and also report back any issues or errors that may have occurred.
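To make that handoff concrete, here is a minimal Python sketch of the round trip. The endpoints, field names, and payloads are hypothetical stand-ins for whatever APIs your particular CMDB and NCCM products actually expose:

```python
import requests

CMDB_API = "https://cmdb.example.com/api"   # hypothetical CMDB / service desk endpoint
NCCM_API = "https://nccm.example.com/api"   # hypothetical NCCM endpoint

def execute_approved_change(ticket_id: str) -> None:
    # 1. The CMDB is the "keeper" of the process: pull the approved change ticket.
    ticket = requests.get(f"{CMDB_API}/changes/{ticket_id}").json()

    # 2. "Activate" the NCCM system to push the actual change to the device.
    job = requests.post(f"{NCCM_API}/jobs", json={
        "device": ticket["device"],
        "change_template": ticket["template"],
    }).json()

    # 3. The NCCM system reports the outcome (and any errors) back to the CMDB.
    requests.patch(f"{CMDB_API}/changes/{ticket_id}", json={
        "status": "implemented" if job["success"] else "failed",
        "errors": job.get("errors", []),
    })
```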

2. In Production / Changes

Once a device has been placed into production, at some point there will come a time when the device needs to have changes made to its hardware, software or configuration. Once again, the change control process should be managed through the CMDB / service desk. It is critical that, as this process begins, the CMDB has been kept up to date with the current asset information. That way there are no "surprises" when it comes time to implement the changes. This goes back to having a standard re-discovery process which is performed on a known schedule by the NCCM system. We have found that most networks require a full rediscovery about once per week to be kept up to date – but we have also worked with clients that adjust this frequency up or down as necessary.

Just as in the initial deployment stage, it is the job of the NCCM system to inform the CMDB as to the state of the configuration job including any problems that might have been encountered. In some cases it is prudent to have the NCCM system automatically retry any failed job at least once prior to reporting the failure.
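A retry wrapper along those lines can be as simple as the sketch below. The job callable and the timings are placeholders; a real NCCM product would expose its own retry settings:

```python
import time

def run_with_retry(job, retries: int = 1, delay_secs: int = 60):
    """Run an NCCM change job, retrying (once by default) before reporting failure."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return job()
        except Exception as err:   # in practice, catch the NCCM client's specific error
            last_error = err
            if attempt < retries:
                time.sleep(delay_secs)   # pause before retrying
    # All attempts failed: this is the point to report back to the CMDB / service desk.
    raise RuntimeError(f"Job failed after {retries + 1} attempts") from last_error
```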

3. Decommissioning / Removal

When the time has come for a device to be removed from production and/or decommissioned, the same type of process should be followed as when it was initially provisioned (but in reverse). If the device is being replaced by a newer system, then part of (or potentially the whole) configuration may simply be moved to the new hardware. This is where the NCCM system's backup process comes into play. As per all NCCM best practices, there should be a regular schedule of backups to make sure the configuration is known and accessible in case of emergency.

Once the device has been physically removed from the network, it must also either be fully removed from the CMDB or, at the very least, tagged as decommissioned. This has many benefits, including preventing the accidental purchase of support and maintenance for a device which is no longer in service, as well as preventing the NCCM system from attempting to perform discovery or configuration jobs on the device in the future (which would otherwise fail).

NCCM systems and CMDBs really work hand in hand to help manage the complete lifecycle of an IT asset. While it may be possible to accurately maintain two non-connected systems, the time and effort involved, not to mention the much greater potential for error, make the integration of your CMDB and NCCM tools a virtual necessity for large modern IT networks.

Thanks to NMSaaS for the article.

Webinar – Best Practices for NCCM

Most networks today have a "traditional" IT monitoring solution in place which provides alarming for devices, servers and applications. But as the network evolves, so do its complexity and security risks, and it now makes sense to formalize the processes, procedures, and policies that govern access and changes to these devices. Vulnerability and lifecycle management also play an important role in maintaining the security and integrity of network infrastructure.

Network Configuration and Change Management – NCCM is the “third leg” of IT management with traditional Performance and Fault Management (PM and FM) being one and two. The focus of NCCM is to ensure that as the network grows, there are policies and procedures in place to ensure proper governance and eliminate preventable outages.

Eliminating misapplied configurations can cut the proportion of network performance and security issues they cause from 90% to 10%.

Learn about the best practices for Network Configuration and Change Management to both protect and report on your critical network device configurations:

  1. Enabling of Real-Time Configuration Change Detection
  2. Service Design Rules Policy
  3. Auto-Discovery Configuration Backup
  4. Regulatory Compliance Policy
  5. Vendor Default and Security Access Policies
  6. Vulnerability Optimization and Lifecycle Announcements

Date: October 28th
Time: 2:00pm Eastern


Register for webinar NOW: http://hubs.ly/H01gB720

5 Reasons Why You Should Include LAN Switches in Your NCCM Scope

We’ve been doing a lot of blogging around here lately about NCCM and the importance of having an automated configuration and change management system. We’ve even published a Best practices guide for NCCM. One of the main points in any NCCM system is having consistent and accurate configuration backups of all of your “key” devices.

When I ask Network Managers to name their key devices, they generally start with WAN / Internet routers and Firewalls. This makes sense of course because, in a modern large-scale network, connectivity (WAN / Internet routers) & security (Firewalls) tend to get most of the attention. However, we think that it’s important not to overlook core and access switching layers. After all, without that “front line” connectivity – the internal user cannot get out to the WAN/Internet in the first place.

With that in mind, today’s blog offers up 5 Reasons Why You Should Include LAN Switches in Your NCCM Scope


1. Switch Failure

LAN switches tend to be some of the most utilized devices in a network. They also don't generally come with the top-quality hardware and redundant power supplies that core devices have. In many cases, they may also be located in less-than-pristine environments. Dirty manufacturing floors, dormitory closets, remote office kitchens – I have seen access switches in all of these places. When you combine a heavy workload with tough conditions and less expensive parts, you have a recipe for devices that will fail at a higher rate.

So, when the time comes to replace or upgrade a switch, having its configuration backed up, and a system which can automate the provisioning of the new device, can be a real time and workload saver. Just put the IP address and some basic management information on the new device, and the NCCM tool should be able to take care of the rest in mere minutes.

2. User Tracking

As the front line connectivity device for the majority of LAN users, the switch is the best place to track down user connections. You may want to know where a particular user is located, or maybe you are trying to troubleshoot an application performance issue; no matter what the cause, it’s important to have that connectivity data available to the IT department. NCCM systems may use layer 2 management data from CDP/LLDP as well as other techniques to gather this information. A good system will allow you to search for a particular IP/MAC/DNS and return connectivity information like which device/port it is connected to as well as when it was first and last seen on that port. This data can also be used to draw live topology maps which offer a great visualization of the network.
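As a rough illustration of the kind of lookup this enables, here is a small sketch over already-collected port data. The data model is invented for the example, since every NCCM product stores this information differently:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PortRecord:
    switch: str        # device the end-station was seen on
    port: str          # physical port, e.g. "Gi1/0/12"
    mac: str
    ip: str
    first_seen: datetime
    last_seen: datetime

def find_endpoint(records: list[PortRecord], key: str) -> list[PortRecord]:
    """Search collected switch-port data by IP or MAC address."""
    key = key.lower()
    return [r for r in records if key in (r.mac.lower(), r.ip)]

# e.g. find_endpoint(inventory, "00:1a:2b:3c:4d:5e")
# -> which switch/port the station is on, plus first/last seen timestamps
```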

3. Policy Checking

Another area where the focus tends to be on "gateway" devices such as WAN routers and firewalls is policy checking. While those devices certainly should have lots of attention paid to them, especially in the area of security policies, we believe it's equally important not to neglect the access layer when it comes to compliance. In general terms, there are two aspects of policy checking which need to be addressed on these devices: QoS policies and regulatory compliance policies.

The vast majority of VoIP and video systems will connect to the network via a traditional LAN switch. These switches, therefore, must have the correct VLAN and QoS configurations to forward traffic appropriately so that Quality of Service is maintained.

If your organization is subject to regulatory compliance standards such as PCI or HIPAA, then those regulations apply to all devices and systems that carry sensitive data or are connected to systems that do.

In both of these cases, it is incredibly important to ensure policy compliance on all of your devices, even the ones on the “edge” of your network.

4. Asset Lifecycle Management

Especially in larger and more spread out organizations, just understanding what you have can be a challenge. At some point (and always when you are least prepared for it) you will get the “What do we have?” question from a manager. An NCCM system is exactly the right tool to use to answer this question. Even though NCCM is generally considered to be the tool for change – it is equally the tool for information. Only devices that are well documented can be managed and that documentation is best supplied through the use of an automated inventory discovery system. Likewise, when it is time for a technology refresh, or even the build out of a new location or network, understanding the current state of the existing network is the first step towards building an effective plan for the future.

5. New Service Initiatives

Whether you are a large IT shop or a service provider – new applications and services are always coming. In many cases, they will require widespread changes to the infrastructure. The change may be small or large, but if it needs to be implemented on a number of systems at the same time, it will require coordination and automation to get it done efficiently and successfully. In some instances, this will only require changes to the core, but in many cases it will require changes to the switch infrastructure as well. This is what NCCM tools were designed to do, and there is no reason you should be handcuffed in your efforts to implement change just because you haven't added all of your devices into the NCCM platform.

Networks are complicated systems of many individual components spread throughout various locations, with interdependencies that can be hard to comprehend without the help of network management tools. While the temptation may be to focus on the core systems, we think it's critical to view all parts, even the underappreciated LAN switch, as equal pieces of the puzzle that should not be overlooked when implementing an NCCM system.


Thanks to NMSaaS for the article.

Key Factors in NCCM and CMDB Integration – Part 1 Discovery


"I am a rock, I am an island…" These lyrics by Simon and Garfunkel pretty appropriately summarize what most IT companies would like you to believe about their products: they are islands that stand alone and don't need any other products to be useful. Well, despite what they want, the truth is closer to the lyrics of the Rolling Stones – "We all need someone we can lean on". Music history aside, the fact is that interoperability and integration are among the most important keys to a successful IT Operations Management system. Why? Because no product truly does it all; and, when done correctly, the whole can be greater than the sum of the individual parts. Let's take a look at the most common IT asset management structure and investigate the key factors in NCCM and CMDB integration.

Step 1. Discovery. The heart of any IT operations management system is a database of the assets that are being managed. This database is commonly referred to as the Configuration Management Database, or CMDB. The CMDB contains all of the important details about the components of an IT system and the relationships between those items. This includes information regarding the components of an asset, like physical parts and operating systems, as well as upstream and downstream dependencies. A typical item in a CMDB may have hundreds of individual pieces of information stored about it in the database. A fully populated and up-to-date CMDB is an extremely useful data warehouse. But that begs the question: how does a CMDB get to be fully populated in the first place?

That's where discovery software comes in. Inventory discovery systems can automatically gather these critical pieces of asset information directly from the devices themselves. Most hardware and software vendors have built-in ways of "pulling" that data from the device. Network systems mainly use SNMP. Windows servers can use SNMP as well as the Microsoft proprietary WMI protocol. Other vendors like VMware also have an API that can be accessed to gather this data. Once the data has been gathered, the discovery system should be able to transfer it to the CMDB. It may be a "push" from the discovery system to the CMDB, or it could use a "pull" to go the other way – but there should always be a means of transfer, especially since the primary "alternatives" for populating the CMDB are entering the data manually (sounds like fun) or uploading spreadsheet CSV files (but how do those get populated?).
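As a sketch of what a discovery-then-push cycle looks like in practice, the snippet below reads two standard SNMP identity objects with the pysnmp library and posts them to a REST endpoint. The CMDB URL and payload shape are hypothetical; real products have their own import APIs:

```python
import requests
from pysnmp.hlapi import (CommunityData, ContextData, ObjectIdentity, ObjectType,
                          SnmpEngine, UdpTransportTarget, getCmd)

def discover_device(ip: str, community: str = "public") -> dict:
    """Pull basic identity information (sysName, sysDescr) from a device via SNMP."""
    error_indication, error_status, _, var_binds = next(getCmd(
        SnmpEngine(), CommunityData(community),
        UdpTransportTarget((ip, 161)), ContextData(),
        ObjectType(ObjectIdentity("SNMPv2-MIB", "sysName", 0)),
        ObjectType(ObjectIdentity("SNMPv2-MIB", "sysDescr", 0))))
    if error_indication or error_status:
        raise RuntimeError(f"SNMP query failed for {ip}")
    name, descr = (str(vb[1]) for vb in var_binds)
    return {"ip": ip, "sysName": name, "sysDescr": descr}

# Push the discovered attributes to the CMDB (the endpoint is a made-up example).
requests.post("https://cmdb.example.com/api/assets", json=discover_device("192.0.2.1"))
```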

Step 2. Updating. Once the CMDB is populated and running, you are done with the discovery software, right? Um, wrong. Unless your network never changes (please email me if that is the case, because I'd love to talk to you), you need to constantly update the CMDB. In fact, in many organizations, the CMDB has a place in it for pre-deployment, meaning that new systems which are to come online soon are entered into the CMDB. The good news is that your discovery system should be able to get that information out of the CMDB and then use it as the basis for a future discovery run, which in turn adds details about the device back to the CMDB, and so on. When implemented properly and working well, this cyclical operation really can save enormous amounts of time and effort.

In the next post in this series, I’ll explore how having an up to date asset system makes other aspects of NCCM like Backup, Configuration, and Policy Checking much easier.


Thanks to NMSaaS for the article.

5 Reasons Why You Must Back Up Your Routers and Switches

I've been working in the network management business for over 20 years, and in that time I have certainly seen my share of networks. Big and small, centralized and distributed, brand-name vendor devices in shiny datacenters, and no-name brands in basements and bunkers. The one consistent surprise I continue to encounter is how many of these organizations (even the shiny datacenter ones) lack a backup system for their network device configurations.

I find that a little amazing, since I also can't tell you the last time I talked to a company that didn't have a backup system for their servers and storage systems. I mean, who doesn't back up their critical data? It seems so obvious that hard drives need to be backed up in case of problems – and yet many of these same organizations, many of whom spend huge amounts of money on server backup, do not even think of backing up the critical configurations of the devices that actually move the traffic around.

So, with that in mind, I present five reasons why you must back up your routers and switches (and firewalls, WLAN controllers, load balancers, etc.).

1. Upgrades and new rollouts.

Network devices get swapped out all of the time. In many cases, these rollouts are planned and scheduled. At some point (if you're lucky) an engineer will think about backing up the configuration of the old device before the replacement occurs. However, I have seen more than one occasion when this didn't happen. In those cases, the old device is gone, and the new device needs to be reconfigured from scratch – hopefully with all of the correct configs. A scheduled backup solution makes these situations a lot less painful.

2. Disaster Recovery.

This is the opposite of the simple upgrade scenario. The truth is that many times a device is not replaced until it fails – especially those "forgotten" devices that sit at the edge of the network in ceilings, basements and far-flung places. These systems rarely get much "love" until there is a problem. Then, suddenly, there is an outage – and in the scramble to get back up and running, a central repository of the device configuration can be a time (and life) saver.

3. Compliance

We certainly see this more in larger organizations, but it also becomes a real driving force in smaller companies that operate in highly regulated industries like banking and healthcare. If your company falls into one of those categories, then chances are you actually have a duty to back up your devices in order to stay within regulatory compliance. The downside of being non-compliant can be harsh. We have worked with companies that were being financially penalized for every day they were out of compliance with a number of policies, including failure to have a simple router / switch / firewall backup system in place.

4. Quick Restores.

Ask most network engineers and they will tell you – we've all had that "oops" moment when we were making a configuration change on the fly and realized, just a second after hitting "enter", that we just broke something. Hopefully, we just took down a single device. Sometimes it's worse than that, and we go into real panic mode. I can tell you, it is at that exact moment that we realize how important configuration backups can be. The restoration process can be simple and (relatively) painless, or it can be really, really painful; and it all comes down to whether or not you have good backups.

5. Policy Checking.

One of the often overlooked benefits of backing up your device configurations is that it allows an NCCM system to automatically scan those backups and compare them to known-good configurations in order to ensure compliance with company standards. Normally, this is a very tedious (and therefore ignored) process – especially in large organizations with many devices and many changes taking place. Sophisticated systems can quickly identify when a device configuration has changed, immediately back up the new config, and then scan that config to make sure it's not violating any company rules. Regular scans can be rolled up into scheduled reports which provide management with a simple but important audit of all devices that are out of compliance.
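Detecting that a configuration changed between two backups can be sketched with nothing more than the Python standard library; real NCCM systems add the scheduling, storage, and reporting around it. The file paths here are illustrative:

```python
import difflib

def config_diff(previous: str, current: str) -> list[str]:
    """Return a unified diff of two config snapshots (an empty list means no change)."""
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous_backup", tofile="current_backup", lineterm=""))

old_cfg = open("backups/core1_week1.cfg").read()
new_cfg = open("backups/core1_week2.cfg").read()

delta = config_diff(old_cfg, new_cfg)
if delta:
    print("\n".join(delta))   # feed the changed lines into the policy scan / audit report
```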

Bottom Line:

Routers, switches and firewalls really are the heart of a network; unless they are running smoothly, everything suffers. One of the simplest yet most effective practices for helping ensure the operation of a network is to implement an automatic device configuration backup system.


Thanks to NMSaaS for the article. 

3 Reasons for Real Time Configuration Change Detection

So far, we have explored what NCCM is and taken a deep dive into device policy checking – in this post we are going to explore Real-Time Configuration Change Detection (or just Change Detection, as I will call it in this blog). Change Detection is the process by which your NCCM system is notified – either directly by the device or by a 3rd-party system – that a configuration change has been made on that device. Why is this important? Let's identify three main reasons that Change Detection is a critical component of a well-deployed NCCM solution.


1. Unauthorized change recording.

As anyone that works in an enterprise IT department knows, changes need to be made in order to keep systems updated for new services, users and so on. Most of the time, changes are (and should be) scheduled in advance, so that everyone knows what is happening, why the change is being made, when it is scheduled and what the impact will be on running services.

However, the fact remains that anyone with the correct passwords and privilege level can usually log into a device and make a change at any time. Engineers who know the network and feel comfortable working on the devices will often just log in and make "on-the-fly" adjustments that they think won't hurt anything. Unfortunately, as we all know, those "best intentions" can lead to disaster.

That is where Change Detection can really help. Once a change has been made, it will be recorded by the device, and a log can be transmitted either directly to the NCCM system or to a 3rd-party logging server which then forwards the message to the NCCM system. At the most basic level this means that if something does go wrong, there is an audit trail which can be investigated to determine what happened and when. It can also potentially be used to roll back the changes to a known good state.

2. Automated actions.

Once a change has been made (scheduled or unauthorized), many IT departments will wish to perform some automated actions immediately, without waiting for a daily or weekly schedule to kick in. Some of the common automated activities are:

  • Immediate configuration backup, so that all new changes are recorded in the backup system.
  • Launch of a new discovery. If the change involved any hardware or OS changes, such as a version upgrade, then the NCCM system should also re-discover the device so that the asset system has up-to-date information about it.

These automated actions can ensure that the NCCM and other network management applications are kept up to date as changes are made, without having to wait for the next scheduled job to start. This ensures that other systems are not acting "blindly" when they try to perform an action with or on the changed device.

3. Policy Checking. New configuration changes should also prompt an immediate policy check of the system to ensure that the change did not inadvertently breach a compliance or security rule. If a policy has been broken, then a manager can be notified immediately. Optionally, if the NCCM system is capable of remediation, then a rollback or similar operation can happen to bring the system back into compliance immediately.

Almost all network devices are capable of logging hardware / software / configuration changes. Most of the time these logs can easily be exported in the form of an SNMP trap or syslog message. A good NCCM system can receive these messages, parse them to understand what has happened and, if the log signifies that a change has taken place, take some action(s) as described above. This real-time configuration change detection mechanism is a staple of an enterprise NCCM solution and should be implemented in all organizations where network changes are commonplace.
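As a minimal sketch of that receive-and-parse step, the listener below watches UDP syslog for Cisco's config-change message. The pattern, and whatever action you trigger on a match, would be adapted to the vendors and NCCM product in your network:

```python
import re
import socketserver

# Cisco IOS logs "%SYS-5-CONFIG_I: Configured from ... by ..." after a config change;
# other vendors use different markers, so extend this pattern as needed.
CHANGE_PATTERN = re.compile(r"%SYS-5-CONFIG_I")

class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        message = self.request[0].decode(errors="replace")
        source_ip = self.client_address[0]
        if CHANGE_PATTERN.search(message):
            # Change detected: this is where you would trigger an immediate
            # backup, a rediscovery, and a policy check of the device.
            print(f"config change on {source_ip}: {message.strip()}")

if __name__ == "__main__":
    # Standard syslog is UDP/514, which usually requires elevated privileges to bind.
    with socketserver.UDPServer(("0.0.0.0", 514), SyslogHandler) as server:
        server.serve_forever()
```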


Thanks to NMSaaS for the article.

A Deeper Look Into Network Device Policy Checking

In our last blog post, "Why You Need NCCM As Part Of Your Network Management Platform", I introduced the many reasons that growing networks should investigate and implement an NCCM solution. One of those reasons is that an NCCM system can help with automation in a key area related to network security, compliance and availability – policy checking.

So, in this post, I will take a deeper dive into network device policy checking, which will (hopefully) shed some light on what I believe is an underutilized component of NCCM.

The main goal of policy checking in general is to make sure that all network devices are adhering to pre-determined standards with regard to their configuration. These standards are typically put in place to address different but interrelated concerns. Broadly speaking, these concerns break down into the following:

  1. Device Authentication, Authorization and Accounting (AAA, ACL)
  2. Specialized Regulatory Compliance Rules (PCI, SOX, HIPAA, …)
  3. Device Traffic Rules (QoS policies etc.)

Device Authentication, Authorization and Accounting (AAA)

AAA policies focus on access to devices – primarily by engineering staff – for the purposes of configuration, updating and so forth, as well as how this access is authenticated and tracked. Access to infrastructure devices is policed and controlled with the use of AAA (TACACS+ or RADIUS) servers and ACLs (Access Control Lists) so as to increase the security of access to device operating systems.

It is highly recommended to create security policies so that security access configurations can be policed for consistency, and reported on if they change or if vital elements of the configuration are missing.

Many organizations, including the very security conscious NSA, even publish guidelines for AAA policies they believe should be implemented.

They offer these guidelines for specific vendors such as Cisco and others, and they can be downloaded from their website (http://www.nsa.gov). The guidelines are useful to anyone interested in securing their network infrastructure, but they become hard requirements if you need to interact in any way with US government or military networks.

Some basic rules include:

  1. Establishing a dedicated management network
  2. Encrypting all traffic between the manager and the device
  3. Establishing multiple levels or roles for administrators
  4. Logging all device activity

These rules, as well as many others, offer a first step toward maintaining a secure infrastructure.
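A policy check against rules like these can be sketched as a set of required patterns scanned across each configuration backup. The Cisco-style lines below are examples only, not a complete or authoritative policy, and the file path is illustrative:

```python
import re

# Each named rule is a regex that must appear in every device configuration.
REQUIRED_LINES = {
    "AAA enabled": r"^aaa new-model",
    "TACACS+ login authentication": r"^aaa authentication login .*group tacacs\+",
    "SSH v2 for encrypted management": r"^ip ssh version 2",
    "Logging to a syslog host": r"^logging host \S+",
}

def check_aaa_policy(config_text: str) -> list[str]:
    """Return the names of required rules missing from a configuration backup."""
    return [name for name, pattern in REQUIRED_LINES.items()
            if not re.search(pattern, config_text, re.MULTILINE)]

violations = check_aaa_policy(open("backups/edge-sw-01.cfg").read())
for rule in violations:
    print(f"policy violation: {rule}")
```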

Specialized Regulatory Compliance Rules:

Many of these rules are similar to, and overlap with, the AAA rules mentioned above. However, these policies often include specialized additional components designed to satisfy restrictions imposed by regulatory law or certification requirements.

Some of the most common policies are designed to meet the requirements of devices that carry traffic with sensitive data like credit card numbers, or personal data like Social Security numbers or hospital patient records.

For example, according to PCI, public WAN link connections are considered untrusted public networks, and a VPN is required to securely tunnel traffic between a store and the enterprise network. The Health Insurance Portability and Accountability Act (HIPAA) also provides guidelines around network segmentation (commonly implemented with VLANs), where traffic carrying sensitive patient data should be separated from "normal" traffic like web and email.

If your company or organization has to adhere to these regulatory requirements, then it is imperative that such configuration policies are put in place and checked on a consistent basis to ensure compliance.

Device Traffic Rules:

These policies are generally concerned with the design of traffic flows and QoS. In large organizations and service providers (telcos, MSPs, ISPs) it is common to differentiate traffic based on pre-defined service types related to prioritization or other distinctions.

Ensuring service design rules are applied and policed is usually a manual process and is therefore susceptible to inaccuracies. Creating design policy rules provides greater control over the service offerings – e.g. QoS settings for enhanced service offerings, or a complete end-to-end service type – and ensures compliance with service delivery SLAs (Service Level Agreements).

Summary:

Each of these rules and potentially others should be defined and policed on a continuous basis. Trying to accomplish this manually is very time consuming, inefficient, and fraught with potential errors (which can become really big problems).

The best way to keep up with these policy requirements is with an automated, electronic policy checking engine. These systems should be able to run on a schedule and detect whether the devices under its control are in or out of compliance. When a system is found to be out of compliance, then it should certainly have the ability to report this to a manager, and potentially even have the ability to auto-remediate the situation. Remediation may involve removing any known bad configurations or rolling back the configuration to a previously known “good” state.


Thanks to NMSaaS for the article.

 

Why You Need NCCM As Part Of Your Network Management Platform

In the landscape of enterprise network management, most products (and IT professionals) tend to focus on "traditional" IT monitoring. By that I mean the monitoring of devices, servers, and applications for performance issues and faults. That makes sense, because most networks evolve in a similar fashion. They are first built out to accommodate the needs of the business. This primarily involves supporting access for people to the applications they need to do their jobs. Once the initial buildout is done (or at least slows down), the next phase is typically implementing a monitoring solution to notify the service desk when there are problems. This pattern of growth, implementation, and monitoring continues essentially forever until the business itself changes through an acquisition or (unfortunately) a shutdown.

However, when a business reaches a certain size, there are a number of new considerations that come into play in order to effectively manage the network. The key word here is "manage" as opposed to "monitor". These are different concepts, and the distinction is important. While monitoring is primarily concerned with the ongoing surveillance of the network for problems (think alarms that result in a service desk incident), network management is the set of processes, procedures, and policies that govern access to, and change of, the devices.

What is NCCM?

Commonly known by the acronym NCCM, which stands for Network Configuration and Change Management, NCCM is the "third leg" of IT management alongside traditional Performance and Fault Management (PM and FM). The focus of NCCM is to ensure that as network systems move through their common lifecycle (see Figure 1 below) there are policies and procedures in place that ensure proper governance of what happens to them.

Figure 1. Network Device Lifecycle (source: huawei.com)

NCCM therefore is focused on the device itself as an asset of the organization, and on how that asset is provisioned, deployed, configured, changed, upgraded, moved, and ultimately retired. Along each step of the way there should be controls put in place as to who can access the device (including other devices), how they can access it, what they can do to it (with and without approval) and so on. All NCCM systems should also incorporate logging and auditing so that managers can review what happened in case of a problem later.

These controls are becoming more and more important in today's modern networks. Depending on which research you read, between 60% and 90% of all unplanned network downtime can be attributed to a mistake made by an engineer when reconfiguring a device. Despite many organizations having strict written policies about when a change can be made to a device, the fact remains that many network engineers can and will log into a production device during working hours and make on-the-fly changes. Of course, no engineer willfully brings down a core device. They believe the change they are making is both necessary and non-invasive. But as the saying goes, "The road to (you know where) is paved with good intentions".

A correctly implemented NCCM system can therefore mitigate the majority of these unintended problems. By strictly controlling access to devices and forcing all changes to devices to be both scheduled and approved, an NCCM platform can be a lifesaver. Additionally, most NCCM applications use some form of automation to accomplish repetitive tasks which are another common source of device misconfigurations. For example, instead of a human being making the same ACL change to 300 firewalls (and probably making at least 2-3 mistakes) the NCCM software can perform that task the same way, over and over, without error (and in much less time).
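For instance, a bulk change like that ACL example can be scripted with an open-source library such as netmiko. This is an illustrative sketch under assumed inventory, credentials, and ACL contents, not a substitute for the scheduling and approval controls an NCCM product wraps around such a change:

```python
from netmiko import ConnectHandler

ACL_CHANGE = [
    "ip access-list extended BLOCK-TELNET",
    " deny tcp any any eq 23",
    " permit ip any any",
]

# In practice this inventory comes from the NCCM / CMDB asset database.
devices = [{"device_type": "cisco_ios", "host": ip,
            "username": "svc-nccm", "password": "********"}
           for ip in ("198.51.100.1", "198.51.100.2")]

for device in devices:
    with ConnectHandler(**device) as conn:
        conn.send_config_set(ACL_CHANGE)   # the same change, applied identically
        conn.save_config()
        print(f"{device['host']}: change applied")
```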

As NCCM is more of a general class of products and not an exact standard, there are many additional potential features and benefits of NCCM tools. Many of them can also perform the initial Discovery and Inventory of the network device estate. This provides a useful baseline of “what we have” which can be a critical component of both NCCM and Performance and Fault Management.

Most NCCM tools should also be able to perform a scheduled backup of device configurations. These backups are the foundation for many aspects of NCCM including historical change reporting, device recovery through rollback options, and policy checking against known good configurations or corporate security and access policies.
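A minimal version of such a scheduled backup, again sketched with netmiko and placeholder credentials, looks like the following; a cron job or the NCCM scheduler would run it for each device in the inventory:

```python
from datetime import date
from netmiko import ConnectHandler

def backup_config(host: str) -> str:
    """Fetch the running config and store it in a dated backup file."""
    device = {"device_type": "cisco_ios", "host": host,
              "username": "svc-backup", "password": "********"}
    with ConnectHandler(**device) as conn:
        config = conn.send_command("show running-config")
    path = f"backups/{host}_{date.today()}.cfg"
    with open(path, "w") as f:
        f.write(config)
    return path

backup_config("192.0.2.1")   # repeat for every device in the NCCM inventory
```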

Lastly, understanding of the vendor lifecycle for your devices such as End-of-Life and End-of-Support is another critical component of advanced NCCM products. Future blog posts will explore each of these functions in more detail.

The benefits of leveraging configuration management solutions reach into every aspect of IT.

Configuration management solutions also enable organizations to:

  • Maximize the return on network investments by 20%
  • Reduce the Total Cost of Ownership by 25%
  • Reduce the Mean Time to Repair by 20%
  • Reduce Overexpansion of Bandwidth by 20%

Because of these operational benefits, NCCM systems have become a critical component of enterprise network management platforms.


Thanks to NMSaaS for the article.

 

The Case for an All-In-One Network Monitoring Platform

There are many famous debates in history: dogs vs cats, vanilla vs chocolate, Coke vs Pepsi, just to name a few. In the IT world, one of the more common debates is "single platform vs point solution". That is, when it comes to the best way to monitor and manage a network, is it better to have a single management platform that can do multiple things, or is it better to have an array of tools, each specialized for a job?

The choice can be thought of as one between multitaskers and unitaskers: Swiss Army knives vs dedicated instruments. As with most things in life, the answer can be complex, and will probably never be agreed upon by everyone – but that doesn't mean we can't explore the subject and form some opinions of our own.

For this debate, we need to look at the major considerations which go into this choice. That is, what key areas need to be addressed by any type of network monitoring and management solution, and how do our two options fare in those spaces? For this post, I will focus on three main areas to try to draw some conclusions:

  • Initial Cost
  • Operations
  • Maintenance

1) Initial Cost

This may be one of the more difficult areas to really get a handle on, as costs can vary wildly from one vendor to another. Many of the "All-In-One" tools come with a steep entry price but then do not grow significantly after that. Other AIO tools offer flexible licensing options which allow you to purchase only the particular modules or features that you need, and then easily add on other features when you want them.

In contrast, the "point solutions" may not come with a large price tag individually, but you need to purchase multiple tools in order to cover your needs. You can therefore take a piecemeal approach to purchasing, which can certainly spread your costs out, as long as you don't leave critical gaps in your monitoring in the meantime. And, over time, the combined costs of many tools can become larger than that of a single system.

Newer options like pay-as-you-go SaaS models can greatly reduce or even eliminate the upfront costs for both AIO and point solutions. It is important to investigate whether the vendors you are looking at offer that type of service.

Bottom Line:

Budgets always matter. If your organization is large enough to absorb the initial cost of a larger umbrella NMS, this typically leads to a lower total cost in the long run, as long as you don't also need to supplement the AIO solution with too many secondary solutions. SaaS models can be a great way to get going with either option, as they reduce the initial cap-ex spend necessary.

2) Operations

In some ways, the real heart of the AIO vs PS question comes down to this: "which choice will help me solve issues more quickly?" Most monitoring solutions are used to respond when there is an issue with service delivery, and so the first goal of any NMS should be to help the IT team rapidly diagnose and repair problems.

When thought of in the context of the AIO vs PS debate, you need to think about the workflow involved when an alarm or ticket is raised. With an AIO solution, an IT pro would immediately use that system both to see the alarm and to dive into the affected systems or devices to try to understand the root cause of the problem.

If the issue is systemic (meaning that multiple locations/users/services are affected) then an AIO solution has the clear advantage of being able to see a more holistic view of the network as a whole instead of just a small portion as would be the case for many Point Solutions. If the AIO application contains a root cause engine then this can be a huge time saver as it may be able to immediately point the staff in the right direction.

On the other hand, if that AIO solution cannot see deeply enough into the individual systems to pinpoint the issues, then a point solution has an advantage due to its (typically) deeper understanding of the systems it monitors. It may be that only a solution provided directly by the systems manufacturer would have insight into the cause of the problem.

Bottom Line:

All-In-One solutions typically work best when problems occur which affect more than one area of the network, whereas point solutions may be required if there are proprietary components that don't have good support for standards-based monitoring like SNMP.

3) Maintenance

The last major consideration is one that I don't think gets enough attention in this debate: the ongoing maintenance of the solutions themselves, i.e. "managing the management solutions". All solutions require maintenance to keep them working optimally. There are upgrades, patches, server moves, etc. There are also the training requirements of any staff that need to use these systems. This can add up to significant time and energy "costs".

This is where AIO solutions can really shine. Instead of having to maintain and upgrade many solutions, your staff can focus on maintaining a single system. The same goes for training – think about how hard it can be to really become an expert in anything, then multiply that by the training required to become proficient in every tool your organization has purchased.

I have seen many places where the expertise in certain tools becomes specialized – and therefore becomes a single point of failure for the organization. If only "Bob" knows how to use that tool, then what happens when there is a problem and "Bob" is on vacation, or leaves the group?

Bottom Line:

Unless your organization can spend the time and money necessary to keep the entire staff fully trained on all of the critical network tools, then AIO solutions offer a real advantage over point solutions when it comes to maintainability of your IT management systems.

In the end, I suspect that this debate will never completely be decided. There are many valid reasons for organizations to choose one path over another when it comes how to organize their IT monitoring platforms.

In our view, we see some real advantages to the All-In-One solution approach, as long as the platform of choice does not have too many gaps in it which then need to be filled with additional point solutions.

Thanks to NMSaaS for the article.

 

External Availability Monitoring – Why it Matters

Remember the "good old days" when everyone that worked got in their car and drove to a big office building every day? And any application that a user needed was housed completely within the walls of the corporate datacenter? And partners / customers had to dial a phone to get a price or place an order? Well, if you are as old as I am, you may remember those days – but for the vast majority of you reading this, what I just described may seem about as common as a black-and-white TV.

The simple fact is that as the availability and ubiquity of the Internet has transformed the lives of people, it has equally (if not more dramatically) transformed IT departments. In some ways this has been an incredible boon; for example, I can now download and install new software in a fraction of the time it used to take to purchase and receive that same software on CDs (look it up, kids).

Users can now log in to almost any critical business application from anywhere there is a Wi-Fi connection. They can probably perform their job function nearly 100% from their phone… in a Starbucks… or on an airplane. But of course, with all of the good comes (some of) the bad – or at least difficult challenges for the IT staff whose job it is to keep all of those applications available to everyone, everywhere, all of the time. The (relatively) simple "rules" for IT monitoring need to be re-thought and extended for the modern workplace. This is where External Availability Monitoring comes in.

We define External Availability Monitoring (EAM) as the process through which your critical network services and the applications that run over them are continuously tested from multiple test points which simulate real world geo-diversity and connectivity options. Simply put, you need to constantly monitor the availability and performance of any public facing services. This could be your corporate website, VPN termination servers, public cloud based applications and more.

This type of testing matters because the most likely source of service issues today is not a call from Bob on the 3rd floor, but rather Jane, who is currently in a hotel in South America and is having trouble downloading from the corporate intranet the latest presentation which she needs to deliver tomorrow morning.

Without a proactive approach to continuous service monitoring, you are flying blind as to issues that impact the global availability – and therefore the operations – of your business.

So, how is this type of monitoring delivered? We think the best approach is to set up multiple types of tests (a minimal sketch of a few of these follows the list below), such as:

  • ICMP Availability
  • TCP Connects
  • DNS Tests
  • URL Downloads
  • Multimedia (VoIP and Video) tests (from external agent to internal agent)
  • Customized application tests
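As promised above, here is a minimal sketch of three of these checks using only the Python standard library. ICMP and multimedia tests need raw sockets or dedicated agents, so they are left out, and the target names are placeholders:

```python
import socket
import time
import urllib.request

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Time a TCP connect in milliseconds (raises OSError on failure)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - start) * 1000

def dns_lookup(name: str) -> str:
    """Resolve a hostname (raises socket.gaierror on failure)."""
    return socket.gethostbyname(name)

def url_status(url: str, timeout: float = 10.0) -> int:
    """Download a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status

print(f"TCP connect: {tcp_connect_ms('www.example.com', 443):.1f} ms")
print(f"DNS:         {dns_lookup('www.example.com')}")
print(f"HTTP status: {url_status('https://www.example.com/')}")
```

Run from each remote test point, results like these can be collected centrally to build the geo-diverse availability picture described above.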

These tests should be performed from multiple global locations (especially from anywhere your users commonly travel). This could even include work from home locations. At a base level, even a few test points can alert you quickly to availability issues.

More test points can increase the accuracy with which you can pinpoint some problems. It may be that the incident seems to be isolated to users in the Midwest or is only being seen on apps that reside on a particular cloud provider. The more diverse data you collect, the swifter and more intelligent your response can be.

Alerts should be enabled so that you can be notified immediately if there is an issue with application degradation, or “service down” situation. The last piece to the puzzle is to quickly be able to correlate these issues with underlying internal network or external service provider problems.

We see this trend of an “any application, anywhere, anytime” service model becoming the standard for IT departments large and small. With this shift comes an even greater need for continuous & proactive External Availability Monitoring.


Thanks to NMSaaS for the article.