Five Design Principles for the Network Architect - Supportability

(#5 of 7)

As this series has developed, we have addressed a number of areas that affect the detail of the technical solution to be deployed. Supportability is more concerned with the legacy that the deployment project leaves behind. As we have already seen, the purpose of the network is to provide a level of connectivity and availability for an application or service to a consumer of that service. The concept of supportability complements that by ensuring that appropriate monitoring of the environment is in place; that the correct tooling is available to maintain and change the infrastructure if required; and that processes are in place to ensure that service levels are maintained. This is the area that dovetails with the end customer's operational model and so is driven by their 'day 2' requirements.

Let's address these areas one by one.


Visibility

When we discussed availability, I mentioned that we can't have a 'one size fits all' approach to monitoring. As a network designer, you must consider who is going to consume the monitoring and how it is going to be used. In most cases, there will be multiple groups of people interested in the state of the network, for example:

  • end users, who just want to know that they have access to their specific applications and services;
  • Service Management, who might need to see a summary of the availability of all services;
  • technical support teams, who would be interested in the above but also in the individual devices, servers and circuits that deliver those services.

There are two variables here, then: the scope of visibility and the level of detail (or abstraction of it) that is required. As a designer, you might consider that you need to build two views of the network, one containing all of the topology and each node's status in detail, and another showing an abstracted red / amber / green view of locations and service status. You might want to offer the ability to drill down to add more detailed information such as security information, packet flows and so on.

Each of these views could then be constrained based on the viewer's scope of responsibility - a Head of Department might only need to see his or her location, whereas an IT guy with global responsibility needs to see it all.
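
As a rough illustration of those two variables, here is a minimal Python sketch that builds a detailed view and an abstracted red / amber / green summary from the same data, each optionally constrained to the viewer's locations. The device names, sites and the `NodeStatus` structure are all made up for the example; a real monitoring platform will have its own data model.

```python
from dataclasses import dataclass

# Severity order used for the roll-up: the worst status at a location wins.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


@dataclass(frozen=True)
class NodeStatus:
    name: str      # hypothetical device name
    location: str  # site the device belongs to
    status: str    # "green", "amber" or "red"


def detailed_view(nodes, allowed_locations=None):
    """Per-node view, optionally limited to the viewer's locations."""
    return [
        n for n in nodes
        if allowed_locations is None or n.location in allowed_locations
    ]


def summary_view(nodes, allowed_locations=None):
    """Abstracted view: one red / amber / green status per location."""
    summary = {}
    for n in detailed_view(nodes, allowed_locations):
        current = summary.get(n.location, "green")
        # Keep whichever of the two statuses is more severe.
        summary[n.location] = max(current, n.status, key=SEVERITY.get)
    return summary


nodes = [
    NodeStatus("lon-sw1", "London", "green"),
    NodeStatus("lon-rtr1", "London", "amber"),
    NodeStatus("par-sw1", "Paris", "green"),
    NodeStatus("nyc-fw1", "New York", "red"),
]

# Engineer with global responsibility: no location filter.
global_summary = summary_view(nodes)
# Head of Department: only their own site.
london_summary = summary_view(nodes, allowed_locations={"London"})
```

The same filter feeds both views, so scope of responsibility is defined once and the level of detail is chosen separately.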

Technically then, the components of visibility that need consideration might include (but are not necessarily limited to):

  • Monitoring platform and protocols (SNMP, NetFlow)
  • Event management (logging, ITSM, processes)
  • Reporting (real-time, periodic)
  • Trending and capacity management

Management tooling

Somebody somewhere is going to be responsible for managing the network and its constituent devices after project completion. The designer should have a full appreciation of how that management regime needs to be assembled for optimum operational efficiency.

There are different facets to network device management. The first might be considered to be that of lifecycle: ensuring that the devices are up to date in terms of support and software, and that there are no known vulnerabilities on those devices. If remediation is required, the issue is either reported or, where possible, remediated automatically.
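
A lifecycle check of this kind can be sketched in a few lines of Python. Everything here is an assumption for illustration: the inventory format, the model names, and the minimum and vulnerable version lists, which in practice would come from vendor advisories and your own standards.

```python
def parse_version(text):
    """Turn '17.9.4' into (17, 9, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def lifecycle_report(inventory, minimum_versions, vulnerable_versions):
    """List devices that need an upgrade or run a known-vulnerable release."""
    findings = []
    for device in inventory:
        model, version = device["model"], device["version"]
        if parse_version(version) < parse_version(minimum_versions[model]):
            findings.append((device["name"], "upgrade required"))
        elif version in vulnerable_versions.get(model, set()):
            findings.append((device["name"], "known vulnerability"))
    return findings


# Hypothetical inventory and policy data.
inventory = [
    {"name": "sw1", "model": "switch-a", "version": "17.9.4"},
    {"name": "sw2", "model": "switch-a", "version": "17.10.1"},
    {"name": "rtr1", "model": "router-b", "version": "7.3.2"},
]
minimum_versions = {"switch-a": "17.10.1", "router-b": "7.3.1"}
vulnerable_versions = {"router-b": {"7.3.2"}}

report = lifecycle_report(inventory, minimum_versions, vulnerable_versions)
```

Comparing versions as tuples of integers rather than as strings matters: as plain text, "17.9.4" would sort after "17.10.1".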

The next area of interest might be configuration management. Tooling might be available to back up and restore individual device configs, or to check them for compliance. However, while individual network devices will typically support a command-line interface for management, it is often more efficient to treat the collection of individual devices as a single system and apply a single configuration across them all. This can be done in a number of ways:

  • device programmability - the ability to use a scripting or programming language to change the configuration of an individual network device. Programs can be written to iterate through an inventory of individual devices, making changes to config as they go. This requires knowledge of the individual devices and their configuration options but, in skilled hands, probably provides optimal flexibility. Examples of this might include using a Python script to log in to a set of network switches and update usernames when a new IT staff member joins the team.
  • network automation - takes programmability to the next level: instead of scripting individual configuration loops, it makes it possible to define templates and apply them to your network inventory. Such a platform includes support for custom-written modules for different hardware platforms and abstracts details away behind calls to subroutines in those modules. There are many automation frameworks available; ones often used for network devices include Ansible, for handling and making changes to a network inventory, and Jinja, for managing configuration templates.
  • orchestration - a way of abstracting device configuration altogether by providing workflows that let the network engineer apply 'Best Practice' configurations automatically. These workflows can be created using an orchestration platform, or may be provided by an SDN controller, which essentially configures and manages devices and abstracts their detailed operation behind a series of menus, dialogues and dashboards. Cisco's NSO is a good example of the former, able to orchestrate workflows across multiple vendors' equipment; a product like ACI is much more vendor-specific: its controller (APIC) manages a network of Cisco Nexus 9K switches with a single abstract configuration, then configures the individual devices to achieve the desired intent.
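
To make the first two options a little more concrete, here is a hedged Python sketch of a template rendered once and applied across an inventory. It uses the standard library's `string.Template` as a stand-in for Jinja, and a dry-run function in place of a real device login; the hostnames, the IOS-style `username` line and the `send` callback are all assumptions for illustration.

```python
from string import Template

# Stand-in for a Jinja template; string.Template keeps the sketch stdlib-only.
USER_TEMPLATE = Template("username $user privilege $level secret $secret")

# Hypothetical inventory; a real one would come from a source of truth.
INVENTORY = [
    {"host": "sw-access-01", "platform": "ios"},
    {"host": "sw-access-02", "platform": "ios"},
]


def render_user_config(user, level, secret):
    """Render the config lines needed to add one user."""
    return [USER_TEMPLATE.substitute(user=user, level=level, secret=secret)]


def apply_to_inventory(inventory, commands, send):
    """Push the same commands to every device; collect per-device results."""
    results = {}
    for device in inventory:
        try:
            send(device, commands)
            results[device["host"]] = "ok"
        except Exception as exc:  # carry on so one failure does not stop the run
            results[device["host"]] = f"failed: {exc}"
    return results


def dry_run_send(device, commands):
    """Placeholder transport: prints instead of logging in to the device."""
    for line in commands:
        print(f"{device['host']}: {line}")


commands = render_user_config("newengineer", 15, "example-only")
results = apply_to_inventory(INVENTORY, commands, dry_run_send)
```

Passing the transport in as `send` keeps the template and the loop independent of how devices are actually reached, so the dry run could later be swapped for an SSH or API call without touching the rest.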

It is also necessary to consider centralised policy management. Perhaps the best way of automating the network is not to make config changes on the network devices themselves at all, but instead to configure the network to adhere to centrally-defined policies and change those when required. In this way, policy can be linked to ideas and constructs outside the network and closer to business operations. For example, by authenticating users when they connect to the network, it is possible to identify whether they are in a particular group of users. We can then apply security rules to that group of users.

Similarly, the network devices can identify the usage of a given application on the network and apply security, QoS or routing decisions based on that classification. In these cases, we can see that centralised policy - if defined correctly - can effectively change the behaviour of the network without the need to change configuration.
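
The idea can be sketched as a simple lookup, shown below in Python. The group names, applications and rule fields are invented for the example and do not reflect any particular policy product; the point is only that the decision changes when the central policy changes, not when a device is reconfigured.

```python
# Central policy: user group -> rules. Group names and fields are hypothetical.
POLICY = {
    "finance": {"allow": {"erp", "email"}, "qos": "business-critical"},
    "guest": {"allow": {"internet"}, "qos": "best-effort"},
}
# Users in no known group get nothing beyond best-effort treatment.
DEFAULT_RULES = {"allow": set(), "qos": "best-effort"}


def decide(policy, user_group, application):
    """Return (permitted, qos_class) for a user group using an application."""
    rules = policy.get(user_group, DEFAULT_RULES)
    return application in rules["allow"], rules["qos"]


before = decide(POLICY, "guest", "email")

# Change the central policy only; no device configuration is touched.
POLICY["guest"]["allow"].add("email")

after = decide(POLICY, "guest", "email")
```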


Processes

At least as important as the technical elements, if not more so, are the processes by which end users obtain a measurable service from the engineering team that manages the network. Typically there would be three phases in the lifecycle of network support:

1)      Implementation
2)      Operational adds, moves and changes (BAU)
3)      Decommissioning.

An initial network deployment is typically delivered as a project, complete with whatever rigour and process 'normal' project delivery has in an organisation: the customer is consulted up front for requirements, designs are agreed, implementation and testing of the new environment are carried out, and migration of live service is completed. Typically, in order for the project to complete, visibility and management tooling are put in place, documentation is completed and a handover is carried out to the team responsible for supporting the network in normal operations.

The BAU phase is where the success or otherwise of the design is proven. The designer must consider how end users are going to consume the network service and the information that the support engineers need to carry out their activities. The best approach is often to assume that the end user needs minimal understanding of the technical details of the environment. Don't ask them to provide details of VLANs and switch ports; ask them to submit requests in terms of how the network is to be used. Create standard service requests, for example to provision wireless coverage in a certain location: don't expect the user to explicitly request the appropriate DHCP scope options and PoE configuration.

Aim to provide as frictionless an experience as possible, and ensure that the supporting team is able to assemble configuration activity from templates wherever possible. This also gives them the option to automate and orchestrate whenever appropriate, using whichever tools are available. Put in place a simple feedback mechanism to show the status of requests to the customer, and then it becomes a matter of choice as to how the supporting team wishes to fulfil these requests.
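
One way to picture this is a small service catalogue that expands a user-level request into engineering tasks, sketched below in Python. The catalogue entry, task wording and status values are hypothetical; a real ITSM platform would hold these and track status for the customer.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue: each standard request maps to task templates.
SERVICE_CATALOGUE = {
    "wireless-coverage": [
        "reserve switch port with PoE at {location}",
        "configure DHCP scope options for access points at {location}",
        "install and register access point at {location}",
    ],
}


@dataclass
class ServiceRequest:
    service: str               # what the user asked for, in their terms
    location: str              # where they need it
    status: str = "submitted"  # simple status the customer can see
    tasks: list = field(default_factory=list)


def plan(request):
    """Expand a user-level request into engineering tasks from the catalogue."""
    templates = SERVICE_CATALOGUE[request.service]
    request.tasks = [t.format(location=request.location) for t in templates]
    request.status = "planned"
    return request


request = plan(ServiceRequest("wireless-coverage", "Building 2, floor 3"))
```

The user supplies only the service and the location; the DHCP and PoE details live in the catalogue, where the supporting team can change or automate them.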

Decommissioning should be similarly seamless: a simple request should kick off a workflow containing all the activities needed to reverse the installation, including removal of details from inventories, monitoring and management platforms.
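
A minimal way to express that in Python is to record the install steps as they complete and derive the decommissioning plan by reversing them. The step names are placeholders, and the functions only build lists rather than calling any real platform.

```python
# Hypothetical install steps, in the order they are carried out.
INSTALL_STEPS = [
    "add device to inventory",
    "add device to monitoring platform",
    "add device to management platform",
    "apply service configuration",
]


def provision(steps):
    """Return the completed steps (a stand-in for a real workflow log)."""
    completed = []
    for step in steps:
        completed.append(step)
    return completed


def decommission(completed):
    """Build the undo plan: reverse the install steps, last done first."""
    return [f"undo: {step}" for step in reversed(completed)]


completed = provision(INSTALL_STEPS)
undo_plan = decommission(completed)
```

Deriving the undo plan from the install record means a device cannot be left behind in a monitoring or management platform simply because someone forgot it was added there.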

To summarise, the designer's job is to assemble a network environment that fulfils the customer's technical requirements and is simple for them to consume. For the supporting engineering team, it should provide management and visibility options that strike the optimal balance of flexibility and capability.

As usual, really interested to hear your thoughts on this!

Previous> Security
Next> Simplicity

