Quality of Service (QoS) Introduction

QoS provides predictable management of network resources during times of congestion. Congestion happens when the memory buffer for a particular interface on a router fills up, which usually occurs when traffic is being pushed past the line rate of the cable. A router reserves a certain amount of memory for each interface, and when that memory is full it has no choice but to drop traffic. QoS gives you control over which packets get dropped. It can also limit traffic by either policing or shaping it. Policing watches the bandwidth of a particular stream of packets and drops any packets in excess of the configured rate. Shaping also watches the bandwidth of a stream of packets, but when the limit is exceeded it holds the excess packets in memory until the interface is less congested.
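The policing/shaping distinction can be sketched with a toy token bucket. This is an illustrative model only (the rates, burst size, and packet sizes are made up, and real platforms meter in far more detail): policing drops the excess, shaping queues it.

```python
from collections import deque

class TokenBucket:
    """Toy token bucket: 'rate' tokens are refilled per tick, up to 'burst'."""
    def __init__(self, rate, burst):
        self.rate, self.burst, self.tokens = rate, burst, burst

    def tick(self):
        self.tokens = min(self.burst, self.tokens + self.rate)

    def conforms(self, size):
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

def police(bucket, packets):
    # Policing: non-conforming packets are dropped immediately.
    return [p for p in packets if bucket.conforms(p)]

def shape(bucket, packets):
    # Shaping: non-conforming packets wait in a queue until tokens refill.
    sent, queue = [], deque(packets)
    while queue:
        if bucket.conforms(queue[0]):
            sent.append(queue.popleft())
        else:
            bucket.tick()          # wait one tick for more tokens
    return sent

pkts = [150, 150, 150]
print(police(TokenBucket(rate=100, burst=200), pkts))  # excess is dropped
print(shape(TokenBucket(rate=100, burst=200), pkts))   # everything sent, just delayed
```

The same traffic survives shaping but not policing, which is exactly why shaping toward an ISP that polices is recommended later in this article.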

During times of congestion on the network you can expect to see delay, jitter, and drops. Delay is simply the latency for a packet to reach its destination. Jitter is the variation in delay between received packets. Drops occur when traffic has to be discarded because of the congestion.

To understand QoS, it is best to understand the different switching and hardware architectures and how these different platforms handle packets: particularly how packets are stored in memory and how that memory relates to the forwarding process.

Network Equipment is very much like a computer

As network engineers, we know how to configure network equipment, analyze packets, and influence the forwarding decisions for those packets. However, sometimes we don’t know how the switches/routers actually do it! ‘It’ as in how switches/routers take packets and put them onto other interfaces. What is going on behind the scenes?

Switches and routers are just like a computer. They have storage. They have memory. They have a CPU. The big difference is that most network equipment has a thing called ASICs. ASIC stands for Application-Specific Integrated Circuit, and ASICs are really good at doing one thing and one thing only (or sometimes a subset of very specific tasks). That one thing could be looking up a MAC address in a MAC address table. Another example would be looking up the routing destination for an IP packet. Since these ASICs were made for a specific task, they perform these lookups very, very quickly. In contrast, CPUs on routers/switches are much slower at these lookups. If you compared the two for a single packet, a human could not tell the difference. However, when you are handling thousands upon thousands of packets, it makes a huge difference to use ASICs rather than CPUs for forwarding decisions.

While a standard PC uses RAM to store the operating system and various applications, network equipment uses memory the same way, but with a twist: it also uses memory to store packets ingressing and egressing the device. A network device has processes just like a computer. It runs an OS of some type, and it has processes that need to be stored in memory. Packets entering or leaving a network device also have to be stored somewhere, and that is where memory is used. There are lots of different network devices as well as a lot of different hardware architectures for them. But the key takeaway is that memory in network devices is used for two things:

  1. For its own OS/processes (routing protocols, SNMP, OS, etc.) – These use CPU Resources
  2. For packets traversing the device (Packet Lookup) – These use ASIC Resources

How routers deal with a packet

Below is a high-level chronological overview of how routers deal with packets:

  • 1. Packet arrives on the ingress interface and is placed in memory called the RX-Ring.
  • 2. Packet is then queued in the memory buffer. This is where the CPU (or ASIC) takes control of that portion of memory and reclassifies it.
  • 3. Forwarding decision is made (routing via IP, switching based on MAC, etc.)
  • 4. Packet is placed on the TX-Ring. The same memory is reclassified as TX-Ring, and the outbound interface of the packet then takes control of that portion of memory.
  • 5. Packet is transmitted out the egress media.

Think of the RX Ring and TX Ring as dedicated memory for that specific interface. Every port has both an RX Ring and a TX Ring. These rings are completely separate from queues and buffers (more on that later). QoS has no control over the RX Ring and TX Ring; QoS controls the handling of packets and congestion in the queues and buffers.

Depending on the memory architecture of the device, a packet may be physically moved from one memory chip to another, or simply reclassified in place without being moved.

Memory Architecture

There are two types of memory architectures for switches: shared memory and distributed memory. Shared memory is essentially one big block of memory used for all interfaces; packets coming in and out are reclassified and looked up by the ASIC linked to that memory. A device with distributed memory has dedicated ASIC/memory for each port or group of ports, with a common shared ring connecting all the ASICs’ memory together to tie them to other ports. Devices that use distributed memory are usually large switched chassis that have multiple line cards. Each line card has its own ASICs, and a high-speed ring/bus interconnects them all. Below is a high-level order of how packets are handled with shared/distributed memory.

How devices deal with packets (shared memory)

  • 1. Packet arrives on ingress interface
  • 2. Interface/Module ASIC forwards packet into a common shared memory pool.
  • 3. Forwarding decision is made by forwarding ASICs
  • 4. Memory ownership of packet buffer transferred to egress interface
  • 5. Packet transmitted onto the egress media

How devices deal with packets (distributed memory)

  • 1. Packets arrive on ingress interface
  • 2. Interface/Module ASIC places packet into memory (specific to a port/group of ports)
  • 3. Forwarding decision is made by forwarding ASIC
  • 4. Packet transmitted onto shared ring/bus to all egress interfaces
  • 5. Appropriate egress interface queues and then schedules the packet

Buffers and Queues

A buffer is physical memory used to store packets before and after a forwarding decision is made. On a router this memory can be allocated to interfaces as ingress/egress. In a shared memory architecture, certain parts of memory are dedicated as buffers; however, that same shared memory is also used for other CPU processes.

A queue is implemented differently depending on the platform. On routers, it is a logical part of the shared memory buffer. On switches, individual interfaces/linecards have their own memory which is used for interface queues. Think of a queue as a logical section of the physical memory (the buffer).

Configuration of buffers is not part of QoS. Buffer configuration involves modifying the quantity of buffers allocated for a particular packet size. QoS configuration applies to queues. With QoS you are not modifying the quantity of buffers allocated for a particular packet size. Instead, you are taking existing buffers that have already been defined as interface queues and modifying how packets are treated when inside those queues.
During times of no congestion, QoS is not needed because packets are transmitted First In, First Out (FIFO) up to the line rate of the interface. During times of congestion, more traffic arrives than the line rate of the interface can carry, and the queue fills up.
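The FIFO behavior under congestion can be sketched as simple tail drop: once the queue is full, every new arrival is discarded regardless of importance. This is an illustrative model (the queue limit is made up), showing why plain FIFO gives you no control over what gets dropped.

```python
from collections import deque

def enqueue_fifo(queue, packet, limit):
    """Tail drop: once the queue (buffer) is full, arriving packets are dropped."""
    if len(queue) < limit:
        queue.append(packet)
        return True
    return False          # dropped: with no QoS, the newest packet always loses

q = deque()
results = [enqueue_fifo(q, n, limit=3) for n in range(5)]
print(results)  # the first three fit, the rest are tail-dropped
```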

Integrated and Differentiated Services

Integrated Services is a QoS model in which every packet is guaranteed a certain minimum QoS from end to end. The initial RFCs were published by the IETF in the mid 1990s: RFC 1633, 2211, and 2212. RSVP is used as the primary protocol to set up the path, and it requires every node along the path to honor its reservation and keep per-flow state. This model did not gain much traction because it was unfeasible to implement across multiple vendors and organizations.

Differentiated Services is designed to address the challenges of Integrated Services. The relevant RFCs are RFC 2474, 2597, 2598, 3246, and 4594. The DiffServ model describes various behaviors to be adopted by each compliant node, called Per-Hop Behaviors (PHBs). Each device can apply QoS however it wants, with whatever method it sees fit. With Integrated Services, each packet had an end-to-end QoS guarantee. With Differentiated Services there is no such guarantee, and any given device may or may not be configured with QoS.


Traffic first must be divided into “classes”. A class of traffic will receive the same type of QoS treatment. Packets are analyzed to differentiate flows, and they are marked so that this analysis happens only a limited number of times, usually at the ingress edge of a network. Classification usually starts as a business decision based on the needs of the network: the whole idea is to identify traffic that is critical to the operation and quality of your business. After identifying what traffic is important, you can create rules to match that traffic – and mark it for QoS. Most ISPs will police ingress traffic, and traffic that is non-conforming (higher than the CIR) will be either dropped or marked down. Customers obviously don’t want any type of traffic drops, so shaping on the egress interface leading to your ISP is recommended.

Queuing

When egress traffic cannot immediately be transmitted (i.e., placed on the TX Ring), it is placed in an egress queue. A single egress interface may have multiple associated egress queues, differentiated by priority. QoS features designed for queuing provide control over which classified traffic is placed into each of these queues. Queuing can also preemptively drop traffic from within queues to make room for higher-priority traffic.


Scheduling is deciding which packets are put on the wire, and in what order, depending on their priority. On routers, QoS queuing features such as WFQ affect both queuing and scheduling behaviors. On switches, queuing and scheduling can be separate features. Traffic shaping is a function of scheduling.

Congestion Management

Congestion management features allow you to control congestion by determining the order in which packets are sent out an interface based on priorities assigned to those packets. Below is a high-level overview of the congestion management process:

  • Creation of queues
  • Assignment of packets to those queues based on the classification of the packet
  • Selectively dropping packets from within queues when those queues reach pre-defined thresholds
  • Scheduling of the packets in queue for transmission

Features for Congestion Management: WFQ, CBWFQ, PQ, LLQ, WRR, and SRR
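Of the features listed, WRR is the simplest to sketch. The following is an illustrative model only (queue names and weights are made up, and real platforms schedule bytes, not whole packets): each queue gets to send up to its weight's worth of packets per scheduling round.

```python
def weighted_round_robin(queues, weights):
    """One WRR pass: each queue may send up to 'weight' packets per round."""
    sent = []
    while any(queues.values()):
        for name, weight in weights.items():
            for _ in range(weight):
                if queues[name]:
                    sent.append(queues[name].pop(0))
    return sent

queues  = {"voice": ["v1", "v2"], "data": ["d1", "d2", "d3"]}
weights = {"voice": 2, "data": 1}
print(weighted_round_robin(queues, weights))  # voice sends twice per data packet
```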

Features for Congestion Avoidance: RED, WRED, WTD, and Policing
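The core idea behind RED (and the per-class variant WRED) is a drop probability that rises with average queue depth. A minimal sketch, with made-up thresholds: no drops below the minimum threshold, probabilistic drops between the thresholds, and guaranteed drops above the maximum.

```python
def red_drop_probability(avg_depth, min_th, max_th, max_p):
    """RED drop curve: 0 below min_th, ramping up to max_p at max_th,
    and 1.0 (tail-drop behavior) at or beyond max_th."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)

print(red_drop_probability(10, 20, 40, 0.1))  # below min threshold: never drop
print(red_drop_probability(30, 20, 40, 0.1))  # halfway between thresholds
print(red_drop_probability(45, 20, 40, 0.1))  # past max threshold: always drop
```

WRED simply runs a curve like this per traffic class, giving lower-priority markings more aggressive thresholds.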

Modular QoS Command-Line (MQC)

MQC allows QoS features that apply classification, policing, and so on to be configured independently and then linked together as needed. It is similar to the Modular Policy Framework (MPF) on the ASA. MQC utilizes class maps, policy maps, and service policies.

  • Class-maps are used to identify and classify traffic that you want to identify for QoS. Class-maps can reference ACLs to classify traffic, for example.
  • Policy-maps define what you want to do to the traffic. Each policy-map can reference multiple class-maps, and when you enter more than one class-map they are evaluated in order, much like an ACL. Policy-maps apply actions like policing and shaping to the traffic matched by the class-maps you created.
  • Service policy is used then to apply the policy-map to a particular interface in a particular direction. 
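A minimal example tying the three pieces together (the ACL name, class/policy names, interface, and rate below are hypothetical, not from the source):

```
! Hypothetical MQC example: classify Telnet via an ACL, police it, apply inbound
ip access-list extended TELNET-TRAFFIC
 permit tcp any any eq 23
!
class-map match-all CM-TELNET
 match access-group name TELNET-TRAFFIC
!
policy-map PM-POLICE-TELNET
 class CM-TELNET
  police 128000
!
interface GigabitEthernet0/1
 service-policy input PM-POLICE-TELNET
```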

Border Gateway Protocol (BGP) Fundamentals


BGP is the premier routing protocol that runs on the internet. It is used by many (if not all) Internet providers across the globe. BGP is designed as an Exterior Gateway Protocol (EGP), and it is the only EGP standardized across the internet. In earlier years, EGP (the protocol, not to be confused with the category EGP) was the first routing protocol developed to communicate network reachability between two Autonomous Systems. BGP was developed as an extension of and improvement upon EGP. BGP is defined in RFC 1771/4271. The other category of routing protocols is the Interior Gateway Protocols (IGPs), which include OSPF, EIGRP, and RIP. IGPs are meant to run within a single Autonomous System, whereas BGP is meant to run between Autonomous Systems.

BGP has a best-path algorithm to determine the best route to a particular destination. Up to 14 checks can be applied to each route learned via BGP to determine the best path for a given prefix. In contrast, IGPs really only use administrative distance and cost to determine the best path for a given destination. This makes BGP very flexible in influencing route selection. BGP by default does not do any type of load balancing. BGP advertises prefix/length pairs – otherwise known as Network Layer Reachability Information (NLRI). The term NLRI is used within the protocol to describe these prefixes.

IGPs comparison with BGP

BGP needs to form a neighbor relationship just like IGPs do. However, BGP neighbors must be configured statically; there is no way to dynamically discover a neighbor in BGP. BGP advertises prefixes just like IGPs, and also advertises the next hop for those prefixes. Another interesting thing about BGP is that neighbors do not have to be directly connected: two routers running BGP can form a neighbor relationship across multiple subnets. All BGP communication with a neighbor uses unicast TCP packets on port 179. This is a big difference from most IGPs, which use multicast packets to dynamically discover neighbors and advertise subnets. BGP advertises Path Attributes for each prefix/length to its neighbors so that routers can make a best-path selection; in comparison, IGPs advertise their metric/cost. BGP uses path-vector logic, which is similar to the distance-vector logic used by some IGPs. BGP emphasizes scalability in its design. It is not nearly as fast as IGPs, but it was not designed for speed: BGP was designed for mass-scale routing across the internet.

iBGP and eBGP

There are two types of neighbors in BGP: Internal BGP (iBGP) and External BGP (eBGP) neighbors. When two neighbors are in the same Autonomous System they are considered iBGP neighbors; when they are in different Autonomous Systems they are considered eBGP neighbors. BGP behaves differently in several ways depending on whether a neighbor is iBGP or eBGP, and the neighborship requirements differ as well. When BGP sends prefix updates to a neighbor, it updates the AS_PATH attribute depending on what type of neighbor it is sending the update to. When a router sends a prefix to an iBGP neighbor, it does not update the AS_PATH attribute because the Autonomous System number is the same for both neighbors. For an eBGP neighbor, however, it updates the AS_PATH attribute because the prefix is moving from one Autonomous System to another.

The AS_PATH attribute essentially tells the router receiving a BGP update which Autonomous Systems the update passed through before being received. The reason eBGP updates the AS_PATH attribute is that eBGP neighbors are not in the same AS, so the path is updated to reflect the AS the update is leaving. When a BGP router modifies the AS_PATH to send to an eBGP neighbor, it adds its own AS number (i.e. the latest) to the front of the list (on the left). So if you see a route listed as x.x.x.x/24 23 4000 56 702, AS 23 was the most recent AS to advertise the route; before that the update passed through AS 4000, and so on back to the originating AS on the right.
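The left-prepend behavior can be sketched in a few lines. This is a simplified illustration (a real update carries much more state), using the AS numbers from the example above:

```python
def advertise_ebgp(as_path, local_asn):
    """When sending to an eBGP neighbor, the local ASN is prepended (leftmost)."""
    return [local_asn] + as_path

path = []
for asn in [702, 56, 4000, 23]:   # the route crosses these ASes in this order
    path = advertise_ebgp(path, asn)
print(path)  # leftmost AS advertised the route most recently
```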

Autonomous System

We have mentioned Autonomous Systems but haven’t given them much attention. So what is an Autonomous System? An Autonomous System is a single organizational unit that administers and controls the networks belonging to that entity. An example would be the IT organization for an e-commerce website. Every company has its own network that it administers, and that is what an Autonomous System is. In terms of configuration, an Autonomous System is simply a number in BGP. For the rest of the article, Autonomous System/Autonomous System Number will be abbreviated to AS/ASN. AS numbers were originally defined as 16-bit integers, but were extended to 32-bit integers in RFC 4893. There are a few ways to write the number (hexadecimal, asplain, or asdot).

There are two kinds of AS Numbers: Public and Private

  • Public AS numbers can be advertised over the internet.
  • Private AS numbers are not advertised over the internet. They can only be used internally.

The ranges of Public and Private AS Numbers:

  • Public: 1-64495, 131072-4199999999
  • Private: 64512-65534, 4200000000-4294967294

All other numbers in the 0 to 4294967295 range are reserved.
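The ranges above can be captured in a small classifier. This is just a sketch of the public/private/reserved split as listed here:

```python
def asn_type(asn):
    """Classify an ASN per the public/private ranges above (others reserved)."""
    if 1 <= asn <= 64495 or 131072 <= asn <= 4199999999:
        return "public"
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    return "reserved"

print(asn_type(65001))       # private (16-bit private range)
print(asn_type(3356))        # public
print(asn_type(4294967295))  # reserved
```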

BGP Neighborship

! Start BGP with configuring the ASN

#router bgp [ASN]

! Configure a statically defined neighbor, and specify the remote ASN that the neighbor has

#neighbor [ip address] remote-as [asn]

To complete a neighbor relationship this has to be configured on both sides of the link.

Requirements to form a BGP neighborship:

  • The local router’s ASN must match the ASN the neighboring router references in its neighbor remote-as command.
  • The peer’s IP address must be reachable via a connected, static, or IGP route.
  • The BGP router IDs must not be the same on the two neighbors. BGP elects a router ID in similar fashion to other IGPs: 1. Use the setting from the router-id command. 2. Choose the highest numeric IP on a loopback interface. 3. Choose the highest numeric IP address on any non-loopback interface.
  • If configured, MD5 authentication must pass. This can be configured via the neighbor [ip address] password [key] command.
  • Each router must be able to complete a TCP three-way handshake with the BGP peer.
  • The source IP address used to reach the peer must match the peer’s BGP neighbor command.
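As a hypothetical example, a pair of routers meeting these requirements might be configured like this (the addresses, ASNs, and key below are made up for illustration):

```
! R1 in AS 65001, R2 in AS 65002, linked via 10.0.0.0/30
! R1
router bgp 65001
 neighbor 10.0.0.2 remote-as 65002
 neighbor 10.0.0.2 password SECRETKEY
! R2
router bgp 65002
 neighbor 10.0.0.1 remote-as 65001
 neighbor 10.0.0.1 password SECRETKEY
```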

When using the neighbor remote-as command, the source address will be that of the interface the route to the neighbor points out of. For redundancy you can change the source interface of the BGP packets to something like a loopback. Using a loopback interface is more redundant because it does not rely on a single physical interface being up to keep the neighbor relationship. You can also have two neighbor statements to the same router, one per link (different IPs, so there will be two neighbor statements). This will consume double the memory and CPU utilization on each router because, even though the router has a neighborship with the same box, it will receive the routes on both links.

When a router is trying to form an eBGP neighbor relationship, by default all eBGP messages have a TTL of 1. You can change this using the neighbor [ip address] ebgp-multihop command, which changes the TTL from 1 to 255. To change the source interface of BGP packets, use the neighbor [ip address] update-source [interface] command.

! Configure an eBGP neighbor for multihop (increases TTL)

#neighbor [ip address] ebgp-multihop 

! Force a router to use its source address for BGP packets to use the specified interface

#neighbor [ip address] update-source [interface]

! Verify 

#show ip bgp summary

iBGP vs eBGP Neighborship Differences

The main difference between iBGP and eBGP neighbors is that iBGP neighbors share the same ASN, while eBGP neighbors have different ASNs. The other difference is that the TTL value for iBGP neighbors is 255 by default, whereas for eBGP the TTL defaults to 1 and must be raised for the router to communicate with peers multiple hops away. Otherwise, the configuration of an iBGP and an eBGP relationship is the same.

BGP Neighbor States

There are various states that BGP goes through when forming a neighbor relationship with another BGP router. These states are the following:

  • Idle – The BGP process is either administratively down or awaiting the next retry attempt.
  • Connect – The BGP process is waiting for the TCP connection to be completed. During this state the BGP router is actively trying to start a TCP session with the neighbor. The connect-retry timer is started during this stage. If the connect-retry timer hits 0 and the TCP session was never completed, the neighbor state moves to Active.
  • Active – The TCP connection failed during the Connect state. The connect-retry timer is started again, but this time the router passively listens for an incoming TCP connection. The connect-retry timer specifies how long the BGP router will actively try to establish a TCP session; once it expires during the Connect state, the router stops actively initiating the session. The exact behavior is implementation-dependent, but ultimately the Active state means that the TCP three-way handshake failed.
  • OpenSent – The TCP connection exists, and a BGP Open message has been sent to the peer, but the matching Open message has not yet been received from the other router.
  • OpenConfirm – An Open message has been both sent to and received from the other router.
  • Established – All neighbor parameters match, the neighbor relationship works, and the peers can now exchange Update messages.
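The progression through these states can be sketched as a simplified state machine. This is an illustrative happy-path model only (real implementations have more transitions, timers, and error paths): Connect moves forward when the TCP handshake succeeds and falls back to Active when it fails.

```python
def next_state(state, tcp_ok=True):
    """Simplified BGP session state machine."""
    if state == "Idle":
        return "Connect"
    if state in ("Connect", "Active"):
        return "OpenSent" if tcp_ok else "Active"   # Active = handshake failed
    if state == "OpenSent":
        return "OpenConfirm"
    if state == "OpenConfirm":
        return "Established"
    return state   # Established is terminal here

state, history = "Idle", ["Idle"]
while state != "Established":
    state = next_state(state)
    history.append(state)
print(history)
```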

BGP Message Types

Every BGP message begins with the same header, and BGP messages are carried inside TCP segments. The header contains marker, length, and type fields. The marker field carries authentication information if configured; if not, it is all 1s. The type field contains a number identifying whether the message is an Open, Update, Keepalive, or Notification message.

BGP uses four (4) types of messages:

  • Open
  • Update
  • Keepalive
  • Notification

BGP Open Message

  • Used in neighborship establishment
  • BGP values and capabilities exchanged

BGP Update Message

  • Informs neighbors about withdrawn routes, changed routes, and new routes
  • Used to exchange PAs (Path Attributes) and the associated prefix-length (NLRI) that use those attributes

TLV stands for Type, Length, Value. The type value is a number that tells you what kind of path attribute follows. NLRI stands for Network Layer Reachability Information. Since the Path Attributes and Withdrawn Routes fields can vary in size, each is accompanied by a length field specifying how big it is.
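The TLV idea generalizes well and can be sketched in a few lines. This is a simplified illustration (real BGP path attributes also carry flag bits and sometimes two-byte lengths; the type numbers and values below are arbitrary):

```python
import struct

def encode_tlv(ptype, value):
    """Pack one attribute as Type-Length-Value (one byte each for T and L)."""
    return struct.pack("!BB", ptype, len(value)) + value

def decode_tlvs(data):
    """Walk a byte string, using each length field to find the next TLV."""
    attrs, i = [], 0
    while i < len(data):
        ptype, length = struct.unpack_from("!BB", data, i)
        attrs.append((ptype, data[i + 2:i + 2 + length]))
        i += 2 + length
    return attrs

blob = encode_tlv(2, b"\x00\x17") + encode_tlv(3, b"\x0a\x00\x00\x01")
print(decode_tlvs(blob))
```

The length field is what lets the receiver skip over a variable-sized value to find the next attribute, which is exactly why variable-size fields need one.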

BGP Notification Message

  • Used to signal a BGP error
  • Typically results in reset of neighbor relationship

BGP Keepalive Message

  • Sent on a periodic basis to maintain the neighbor relationship. The lack of receipt of a keepalive message within the negotiated hold time causes BGP to bring down the neighbor connection.
  • Only contains the BGP Header

BGP Table & Path Attributes

BGP has a table in which it stores and keeps all of its routes, called the BGP table. You can view the table by issuing show ip bgp. The output lists all the BGP-learned routes (locally injected plus learned routes). This command only shows a high-level view of the table, not the details of each entry.

The output of show ip bgp displays a high level overview of all the routes learned via BGP. To the left of the Network Column there are various codes to help identify the route:

  • * – Means it is a valid route and can be installed in the routing table
  • > – The best route BGP has discovered for that specific prefix
  • r – Failure to put this prefix in the IP routing table (a better route is already in the routing table, the routing table is maxed out (memory is full), or the VRF routing table limit was exceeded)
  • i – Learned about this prefix from a iBGP neighbor

A next hop of 0.0.0.0 means that the local router advertises this prefix itself, via either the network or redistribute command. The Path column shows the AS path through which the particular prefix was learned. A ? at the end of the path indicates an origin of incomplete, which typically means the prefix was redistributed into BGP within the local AS.

! Verify BGP Learned Routes

#show ip bgp [prefix/subnet]

#show ip bgp neighbors [ip address] received-routes

#show ip bgp neighbors [ip address] routes

#show ip bgp neighbors [ip address] advertised-routes

#show ip bgp summary

BGP uses multiple path attributes to determine the best path for a certain prefix. By default, if no BGP PAs have been explicitly set, BGP routers use the AS_PATH (autonomous system path) PA when choosing the best route among many competing routes. The AS_PATH attribute is also used to prevent routing loops: if a router receives a BGP Update and the AS_PATH (or AS_SET) contains an autonomous system number that matches its own, it drops the update. AS_SEQ is a component of the AS_PATH attribute; it is simply the ordered list of ASes a BGP prefix has passed through. When route summarization is performed on routes coming from multiple ASes, something called an AS_SET is used instead. An AS_SET contains all the ASes covered by the summary, but since the order cannot be determined, they are listed unordered in braces, like {6 8 2 5}.
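The loop-prevention check is simple enough to sketch directly (a simplified illustration, reusing the AS path from the earlier example with made-up local ASNs):

```python
def accept_update(as_path, local_asn):
    """eBGP loop prevention: reject any update whose AS_PATH contains our own ASN."""
    return local_asn not in as_path

print(accept_update([23, 4000, 56, 702], local_asn=100))  # our ASN absent: accept
print(accept_update([23, 4000, 56, 702], local_asn=56))   # our ASN present: loop, drop
```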

Injecting Routes into BGP

There are three (3) ways to inject routes into BGP:

  • By using the BGP network command
  • By using redistribution
  • By using route summarization

The network command in BGP is different from IGPs. It does not “turn on” BGP on an interface, nor does it allow for dynamic BGP neighborships on an interface (BGP neighbors are static anyway). It also does not enable hellos on the interface (BGP uses keepalives instead). The network command in BGP looks for an exact prefix/length match in the IP routing table and originates that prefix/length into the BGP table. It does not matter whether it is a directly connected, static, or IGP route: as long as the route lives in the routing table and is not itself a BGP route, the network command will inject it into BGP.

! To inject a route into BGP, use the following command in BGP config mode. The mask is optional. If the mask is omitted then the router assumes a classful boundary. 

#network [subnet] mask [mask]

There is also the auto-summary command in BGP. The auto-summary command does not affect any network command that includes the mask keyword: with a specific mask, the router looks in the routing table and advertises only that exact prefix/length. If the mask keyword is omitted, then with auto-summary enabled the router will advertise the classful route.

The classful route is added if either:

  • The exact classful route is in the routing table
  • Any subset routes of that classful network are in the routing table
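The classful boundary assumed when the mask is omitted comes from the first octet of the address. A minimal sketch of that rule:

```python
def classful_network(ip):
    """Classful prefix length implied when 'mask' is omitted: /8, /16, or /24."""
    first = int(ip.split(".")[0])
    if first < 128:
        return 8     # Class A
    if first < 192:
        return 16    # Class B
    return 24        # Class C

print(classful_network("10.1.2.3"))     # Class A -> /8
print(classful_network("172.16.0.1"))   # Class B -> /16
print(classful_network("192.168.1.1"))  # Class C -> /24
```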

The second way to inject routes into BGP is the redistribute command in BGP router config mode. This does essentially the same thing as the network command, but it can inject a lot more routes at once.

! Configure redistribution in BGP router config mode

#redistribute [static|ospf|eigrp|rip|connected]

This command has many other options, like applying route-maps and metrics; however, those are out of scope for this article.

The third way to add routes into BGP is summarization. This aggregates several smaller subnets into a larger one, which is advertised as a single prefix rather than multiple individual ones.

! Configure the prefix to be sent out as a BGP Update with accompanying length

#aggregate-address [prefix] [prefix-length] [summary-only]

If you do not specify the summary-only keyword, BGP will advertise both the summarized route and the specific routes; specifying summary-only advertises only the summary route to the neighbor. This command must be accompanied by a matching network or redistribute command so that at least one component route is in the BGP table. Applying this command alone will not create the summary route, even if the component routes are in your routing table.

BGP Advertising

BGP has two rules for advertising routes to its peers:

  • Only advertise the best route in any BGP Update (BGP will never send an update with two possible next hops)
  • Do not advertise iBGP learned routes to iBGP Peers

By default a router running BGP will only send networks it originates to its neighboring iBGP routers. Once a neighboring router receives those networks, it will not pass them on to other iBGP neighbors. The reason is loop prevention: when routes are advertised to iBGP neighbors, the AS_PATH attribute remains unchanged, so BGP cannot rely on it to detect loops within the AS. So by default, iBGP neighbors don’t send non-locally-generated routes to other iBGP neighbors. This behavior can be changed with configuration, however.

When BGP advertises a prefix to an eBGP neighbor, the advertising router changes the next-hop IP address. However, when a prefix is advertised to an iBGP neighbor, the next-hop IP address is not changed (this behavior is configurable). Routes learned from eBGP neighbors can pass through multiple iBGP neighbors, and since the next hop does not change along the way, routers receiving the route may or may not have IP reachability to the advertised next-hop address. Every time a BGP update is received on a BGP router (iBGP or eBGP), the router looks in its IP routing table to see whether the next-hop IP address is reachable. If it is not, the router will not install that BGP route into the routing table.

If a router running BGP receives an update from an iBGP neighbor, and the next hop IP address is not reachable then:

  • iBGP-learned routes will not be installed in IP Routing Table
  • iBGP-learned routes will not be advertised to any other BGP Peers
  • Viewable via the show ip bgp prefix/length command as inaccessible

There are a few ways to resolve this issue:

  • Advertise those IP addresses into the internal network (static route, IGP)
  • Use the neighbor next-hop-self command

The neighbor next-hop-self command changes the next-hop IP address to the source address of the neighbor statement you have with your iBGP neighbor. By default, as stated previously, when iBGP neighbors send updates the next-hop IP address is unchanged; this command forces the next hop to change to the source address of the neighbor relationship.

! Configure an iBGP neighbor to set the next-hop IP address to the source interface of its neighbor relationship in the update message

#neighbor [IP] next-hop-self

Encryption & Hashing Fundamentals

Let’s start with hashing

Hashing is a one-way mathematical function used to provide data integrity and authenticity. Data of any length can be used as the input to the algorithm; the output is a fixed-length hash (known as a digest), whose length depends on the bit size of the algorithm. Since the algorithm is one-way, it is computationally infeasible to reverse.
Because inputs can be longer than the digest itself, different inputs to the same algorithm can produce the same hash. This is where a collision attack comes into play: a collision attack is finding a different input that produces the same digest from the hashing algorithm. Increasing the bit size of the hashing algorithm makes collisions much harder to find.
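These properties are easy to see with Python's standard hashlib module: the digest length is fixed per algorithm regardless of input size, and any change to the input scrambles the output.

```python
import hashlib

message = b"the quick brown fox"
for name in ("md5", "sha1", "sha256"):
    digest = hashlib.new(name, message).hexdigest()
    print(name, len(digest) * 4, "bits:", digest)

# Any change to the input, however small, gives a completely different digest.
print(hashlib.sha256(b"abc").hexdigest() == hashlib.sha256(b"abd").hexdigest())
```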

Hashing algorithms:

  • MD5 (128 bit)
  • SHA1 (160 bit)
  • SHA2 (224, 256, 384, 512 bit)

Hashing alone is very easy to bypass with a man-in-the-middle attack, because the packet contents can be replaced along with the hash. The receiver cannot tell the difference: the input to the hash is simply the packet bits, so if an attacker modifies the packet and recomputes the hash, it will still match.

Hashed Message Authentication Code (HMAC) adds a key that is only known to the sender and receiver, which prevents the man-in-the-middle attack described above. Instead of using the packet bits alone as the hash input, the input of the hash is the packet bits AND the secret key. Devices using HMAC take the plain hash input and mix in the pre-configured key for each transaction.
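A quick illustration using Python's standard hashlib and hmac modules (the key and payload are made-up example values):

```python
import hashlib
import hmac

packet = b"routing update payload"
key = b"pre-shared-secret"  # known only to sender and receiver (example value)

# Plain hash: anyone who alters the packet can simply recompute the digest.
plain_digest = hashlib.sha256(packet).hexdigest()

# HMAC: the digest depends on the packet bits AND the secret key, so a
# man-in-the-middle without the key cannot forge a matching digest.
mac = hmac.new(key, packet, hashlib.sha256).hexdigest()

forged = hmac.new(b"wrong-key", packet, hashlib.sha256).hexdigest()
assert mac != forged  # without the key, the attacker's MAC never matches

# Receiver side: recompute with the shared key and compare in constant time.
assert hmac.compare_digest(mac, hmac.new(key, packet, hashlib.sha256).hexdigest())
```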

And now encryption algorithms

Encryption is a two-way mathematical function used to provide data confidentiality. The input of the encryption algorithm is the clear-text packet AND the secret key (either asymmetric or symmetric). The output of the algorithm is known as the cipher text. Once a device receives the cipher text on the other side of the encrypted tunnel, it uses the cipher text + secret key as the input of the algorithm. The output is the plaintext packet.
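The two-way property is easy to demonstrate with a toy cipher. XOR is used here only as an illustration of "same key encrypts and decrypts"; it is NOT a secure cipher, and real deployments use algorithms such as AES:

```python
# Toy sketch of symmetric encryption's two-way property: the same shared
# key both encrypts (plaintext -> ciphertext) and decrypts (ciphertext ->
# plaintext). XOR is for illustration only, not a real cipher.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"confidential payload"
key = b"shared-secret"

ciphertext = xor_cipher(plaintext, key)          # encrypt
assert ciphertext != plaintext
assert xor_cipher(ciphertext, key) == plaintext  # decrypt with the same key
```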
Asymmetric encryption uses a public and private key pair (for each direction of traffic) to encrypt and decrypt the data. 

Symmetric encryption – Uses the same key for encryption and decryption

  • Also known as shared key encryption
  • More efficient, cheaper to perform on hardware
  • Typical length 56-512 bits

Symmetric Algorithms:

DES – Data Encryption Standard (64-bit key, only 56 bit used for encryption)

3DES – Triple Data Encryption Standard (168-bit key, uses 3 keys of 56-bit)

AES – Advanced Encryption Standard (3 Versions: 128, 192, and 256 bit keys)

SEAL, IDEA, Blowfish, Serpent

Asymmetric encryption – Different key is used for encryption and decryption

  • Also known as public key encryption
  • Computationally expensive to perform
  • Key lengths typically vary from 512 bits to 32768 bits

Asymmetric Algorithms:

RSA – Rivest Shamir Adleman

DSA – Digital Signature Algorithm

DH – Diffie Hellman Algorithm

ECC – Elliptic Curve Cryptography (ECDH, ECDSA)

Open Shortest Path First (OSPF) Routing Protocol

What is OSPF? OSPF is a link-state routing protocol. OSPF uses a link-state database to build a tree of all the links that live in an area. OSPF uses the concept of ‘areas’ to limit the scope of these trees. Since OSPF is a link-state routing protocol, it inherently knows a lot more about a particular topology and its links than a distance-vector protocol does. A router running OSPF receives all of this information from other OSPF routers and keeps it in the link-state database (not just network and prefix information, but things such as bandwidth, L2 encapsulation, delay, etc.). After it has gathered all of this information, it runs the Shortest Path First (SPF) Algorithm. This algorithm calculates the cost to reach every network it has learned within the OSPF routing domain.

OSPF is an open standard, as in it can be implemented by multiple vendors. There are two versions of OSPF:

  • Version 2: RFC 1583, and RFC 2328
  • Version 3: RFC 5340

Version 2 and Version 3 are very similar. Version 3 keeps the same foundation as Version 2, but adds support for IPv6. With that, Version 3 has some changes in its configuration so that it can support IPv6.

OSPF Messages

All OSPF packets are identified with IP protocol number 89, found in the L3 header of the packet. For dynamic learning of neighbors, the multicast address (AllSPFRouters) is used for hello packets. Unicast is usually used for requests, updates, and acknowledgements.

Before any routes or links are learned about, each adjacent OSPF router must first form a neighbor relationship. Below is a summary of the different types of packets that OSPF uses for neighbor-forming, and updating topology changes:

  • Hello: The fundamental packet for discovering and maintaining neighborships in OSPF. It carries the key information for the formation of OSPF neighbors. A hello packet is sent periodically out of any OSPF-enabled interface; it is mainly used to form neighbors and to verify that a neighbor is still connected.
  • Database Description (DBD): Only used at the very beginning of the neighbor relationship process. Once hellos are exchanged, DBDs are exchanged soon after. A DBD is kind of like a table of contents of what each router has, but it does not provide the details of each specific link.
  • Link-State Request: After DBD is exchanged, routers may or may not know the details that are pointed out in the DBD exchange. So a Link State Request is sent out for that information from each router.
  • Link-State Update: Link State Update is then sent in response to the request message, with the details of what is missing.
  • Link-State Acknowledgement: Every Link State Update is also acknowledged, with this type of message.

OSPF Neighborship

For an OSPF neighbor relationship to form, the following parameters (found in the hello packet) must match:

  • Hello Interval
  • Dead Interval
  • Area ID
  • Subnet Mask
  • Stub Area Flag
  • Authentication

The following DOES NOT have to match for OSPF to form a neighbor relationship:

  • OSPF Router ID
  • List of neighbors reachable on the interface
  • Router Priority
  • Designated Router (DR) IP Address
  • Backup Designated Router (BDR) IP Address

The Hello Interval dictates how often hello packets are sent out, and the Dead Interval dictates how long a router waits before declaring a neighbor down. On a LAN interface, the default hello interval is 10s and the default dead interval is 40s. When you change the Hello Interval on an interface, the Dead Interval is automatically changed to 4x the hello interval. Be aware that changing the hello interval will take the neighbor relationship down if the other side is not changed to match.

This is how you configure those intervals on a per-interface basis (Cisco):

! Configure hello interval

#ip ospf hello-interval [value]

! Configure dead interval

#ip ospf dead-interval [value]

! Configure sub second hello interval and dead interval

#ip ospf dead-interval minimal hello-multiplier [multiplier]

! Verify

#show ip ospf interface [name]

In OSPF there is a chronological list of states that OSPF must go through for neighbors to be considered fully adjacent:

OSPF Neighbor States:

Attempt: You will only see this neighbor state when you configure static neighbors (for example, on a frame-relay interface). It basically means that OSPF has sent packets to the statically defined neighbor but has not heard anything back.

Init: The init state means hello packets have been received, but the received hellos do not yet list this router’s own RID in the neighbor field (basically, the neighbor has not recognized you yet). Say there are 4 routers on the same broadcast domain, and you receive a hello packet from one of the routers for the first time; in the hello packet you see two other routers listed, but not yourself. This means you are in the init state with that neighbor. From that point on, its hellos will populate that field.

2-Way: Hello packets have been exchanged and they contain this router’s own RID in the neighbor list. This verifies that the adjacent router has, at the very least, received your hellos. During the 2-way state the DR/BDR election is held. Routers that are DROTHERs in a DR/BDR election stay in the 2-way state with other DROTHERs; they do not exchange DBDs or anything else with the other DROTHERs, just with the DR/BDR. More on DR/BDR later*

Exstart: Exstart begins right after the 2-way state; the router goes into Exstart as soon as the first DBD message is received. In this state, an election is held for who is master and who is slave within an adjacency. This is done by both routers sending empty DBDs to each other containing their RIDs. The router with the higher RID is elected the master, while the other becomes the slave. The only reason this master/slave election exists is that DBD packets carry sequence numbers, and the master chooses the starting sequence number. DBDs are sent using unicast. Once the master/slave election is done, the exstart state ends.

Exchange: The exchange state starts right after the first DBD with all its headers filled is sent to the neighbor. Again, the DBD is simply a table of contents of what each router knows. A router will most likely need to send multiple DBD packets to fully describe its database to the neighbor. The neighboring router knows that it has received all of the DBDs because the last DBD packet carries a flag indicating that.

Loading: Once all DBDs have been exchanged and the routers have the same view of the LSIDs, they move to the loading state. For any missing LSA, the router sends a Link State Request (LSR). The router receiving the LSR replies with a Link State Update (LSU). Every LSU is acknowledged as well.

Full: When all LSAs have been sent, received, and acknowledged, the neighbor relationship goes to the FULL state (aka fully adjacent). The database is fully populated. At this point each router runs SPF to calculate the best paths for each subnet.

OSPF Router-ID:

OSPF elects a router ID (RID), just like other routing protocols. Think of the router ID as a name for the router. Router IDs are critical to the operation of OSPF. If two directly connected routers have the same router ID, they do not form a neighbor relationship, and a syslog message is generated. If they are separated by another router (and are in the same area), the neighbor relationship still forms, but a syslog message is generated reporting a duplicate router ID in the topology. If there is a duplicate router ID between routers in different areas, the routers will repeatedly flush each other’s LSAs, creating an LSA flood war. Since every LSA is stamped with the originating RID, duplicate RIDs in a topology corrupt the LSDB’s view of LSAs and sequence numbers. The RID will only change if the router has no neighbors or the OSPF process is cleared.

Router ID Election:

  • Configured in OSPF Process configuration
  • Highest Loopback IP
  • Highest IP address on active interface

! Configure router ID in OSPF configuration mode

#router-id [#]

! Verify

#show ip ospf neighbor

#show ip protocols

#show ip ospf database

OSPF MTU Mismatch:

Routers typically have a default IP MTU of 1500 bytes. MTU stands for Maximum Transmission Unit. It indicates how big a packet can be and still be forwarded out on a link. If a router needs to forward a packet larger than the outgoing interface’s MTU, it either fragments the packet or discards it, depending on the Don’t Fragment (DF) bit in the IP header: if the bit is set (1) the packet is dropped; otherwise, it is fragmented.
The MTU value on OSPF neighbor links should match. If there is an MTU mismatch between two OSPF routers, they will not be able to exchange topology information. The neighbors will reach the Exstart state and then go down, and a log message will be generated reporting “too many retransmissions”. The reason is that the MTU value is carried in the Database Description packets each side sends; since the values do not match for that link, the routers never get past that stage of the neighbor process.

OSPF Authentication:

OSPF supports either plain-text or MD5 authentication. OSPF does not support key-chain mode like EIGRP; the OSPF authentication key must be configured statically on the interface. Interface-level configuration takes precedence over global (aka area) configuration.

! Enable authentication on an interface

#ip ospf authentication [message-digest]
! Enable on all interfaces in an area by changing the area wide authentication (in global routing mode)

#area [#] authentication [message-digest] 

The authentication key can only be configured on a per-interface basis, not area-wide. There are three types of authentication:

  • Type 0: no authentication
  • Type 1: clear text authentication
  • Type 2: MD5 Authentication

OSPF supports multiple keys on the same interface, but not with key-chain. If you are using multiple keys on the same interface, then MD5 authentication must be used. 

! Configure key on interface for plain text

#ip ospf authentication-key [key-value] 

! Configure key on interface for MD5

#ip ospf message-digest-key [key number] md5 [key value]

! Verify

#show ip ospf interface [interface]

#debug ip ospf hello

#debug ip ospf adj

OSPF Network Types

OSPF classifies every link in a topology. The classification determines the operational characteristics of each interface:

  • Whether the router will use multicast to discover neighbors
  • If two or more OSPF routers can exist in the subnet attached to the interface
  • Whether the router should attempt to elect an OSPF DR (More on that later*) on that interface

OSPF picks a default network type based on the layer-2 encapsulation of the link.

The different network types are described below:


Broadcast

  • This network type discovers neighbors automatically
  • This network type supports the use of DR/BDRs
  • Hello & Dead Intervals: 10/40
  • Ethernet, FDDI, Token Ring
  • You can ‘force’ this network type with the #ip ospf network broadcast command at the interface level


Non-Broadcast Multi-Access (NBMA)

  • This network type does not discover neighbors dynamically
  • Hello & Dead Intervals: 30/120
  • Neighbors must be statically configured: #neighbor ip-address [priority priority]

The neighbor command can work with just one side of a link configured.

Consider the following frame-relay configuration:

interface Serial0

encapsulation frame-relay

no shut

ip address x.x.x.x

ip ospf 1 area x

In the above interface configuration, OSPF ‘guesses’ that this interface is a Non-Broadcast Multi-Access link. Since frame-relay can have multiple DLCIs on an interface, OSPF makes the assumption based on that; OSPF itself has no knowledge of the DLCI configuration on a frame-relay interface. Because it is a non-broadcast multi-access link, multicast is not supported on the interface, so neighbors HAVE to be statically configured to form neighbor relationships. You can, however, configure the interface as a broadcast link with the ip ospf network broadcast command. You still have to add the broadcast keyword to the DLCI mappings on the frame-relay side so the router knows where to send those multicast packets. When using this type of setup, the frame-relay cloud must be a full mesh, with every router connected to every other router.
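As a sketch (hypothetical addressing and DLCI number), forcing the broadcast network type on a frame-relay interface looks like this. Note the broadcast keyword on the DLCI mapping, which tells the router to replicate multicast hellos onto that DLCI:

! Example only: force the broadcast network type on a frame-relay link

interface Serial0

encapsulation frame-relay

ip address

ip ospf network broadcast

frame-relay map ip 102 broadcast

ip ospf 1 area 0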

Now consider the following frame-relay configuration:

interface Serial0

encapsulation frame-relay

no shut

interface Serial 0.101 [point-to-point|multipoint]

ip address x.x.x.x

ip ospf 1 area x

In the above configuration, OSPF bases the network type on the subinterface keyword: a point-to-point subinterface is treated as the point-to-point network type, while a multipoint subinterface is treated as non-broadcast multi-access. The previous example, by contrast, ‘guesses’ that the physical interface is an NBMA link, despite it possibly being point-to-point.


Point-to-Point

  • This network type does not elect a DR/BDR
  • This network type discovers neighbors dynamically
  • Hello & Dead Intervals: 10/40
  • To configure an interface as p2p: #ip ospf network point-to-point


Point-to-Multipoint

  • This network type does not elect a DR/BDR
  • This network type discovers neighbors dynamically
  • Hello & Dead Intervals: 30/120
  • Must be manually set with: #ip ospf network point-to-multipoint

How does the point-to-multipoint network type help with a partial-mesh topology?

  • Regardless of actual mask, each router advertises /32 LSAs for its connectivity to frame relay cloud
  • LSAs received on a P-2-MP sub interface are allowed to be flooded right back out the same interface to other neighbors (effectively split horizon is disabled because it goes to different DLCIs)

Changing a broadcast network into a point-to-multipoint network can have certain advantages. Static neighbor configuration allows per-neighbor cost configuration, using the neighbor x.x.x.x cost [x] command. Usually the cost is derived from the interface type (fast ethernet, serial, etc.). With point-to-multipoint non-broadcast, however, you can specify cost PER NEIGHBOR, not per interface.
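As a sketch (hypothetical addresses and costs), per-neighbor cost on a point-to-multipoint non-broadcast interface looks like this:

! Example only: per-neighbor cost with point-to-multipoint non-broadcast

interface Serial0

ip ospf network point-to-multipoint non-broadcast

router ospf 1

neighbor cost 10

neighbor cost 50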

How to memorize OSPF network types:

  • Any network type with the keyword nonbroadcast cannot discover neighbors dynamically and requires static neighbor configuration.
  • If the network type starts with point, it does not use a DR/BDR.
  • Only broadcast and point-to-point use the faster timers of hello 10 / dead 40.

What is DR/BDR?

DR stands for Designated Router and BDR stands for Backup Designated Router. On an OSPF-enabled link classified as a broadcast or non-broadcast network type, a DR/BDR election is initiated. The reason is that on these types of links, more than two OSPF routers may live on the segment. By electing a DR/BDR, these two routers act as the “hub” for the neighbor relationships on the link. The benefit is reduced LSA flooding and neighborship overhead.

The DR/BDR are elected based on information in the OSPF hello packet, which lists each router’s RID and a priority value. Whoever has the highest priority is elected the DR, with the second-highest becoming the BDR. If priorities are equal, the highest RID wins. A DR stays the DR as long as it remains connected to the LAN and the neighbor relationship doesn’t go down, even if a new router with a higher priority/RID is added to the link. Once the DR goes down, the BDR becomes the DR, and a new BDR is elected (if there is a candidate).

You can configure the priority on a per interface basis:

#ip ospf priority [value]

The DR and BDR for a particular segment listen to the multicast address (AllDRouters); this address is only used for packets going TO the DR/BDR. The DR itself sends its DBDs and updates to (AllSPFRouters). The DR is the only router that forms a FULL neighbor relationship with all other routers on the segment. The other routers (called DROTHERs) stay in the 2-way state with each other.

OSPF Areas

OSPF implements the concept of areas in the protocol itself. When you enable OSPF on a router’s interface, you must explicitly state which area it is part of. The area ID is a 32-bit number, commonly written as a simple decimal (0, 1, 2, …) or in dotted-decimal format. There are no specific criteria for which numbers to use; however, there is one cardinal rule about OSPF areas: all non-backbone areas must connect to the backbone area. The backbone area is Area 0; non-backbone areas are any areas that are not Area 0. Each area in an OSPF domain has its own Link State Database and its own SPF calculation for how to reach routers within the area. An Area Border Router (ABR) is a router with interfaces in more than one area. Link changes in a particular area do not force an SPF recalculation in other areas. When designing OSPF for a network, knowing how areas work is extremely important. You want to use area separation so that not every router in your topology runs an SPF recalculation for every change. Separating your links into areas also creates opportunities for route manipulation and prefix summarization.

Each OSPF interface is placed in an area. A router within an area floods LSAs to every router in that area. Each router has a link-state database where it keeps track of all the LSAs it learns, and from those LSAs builds a ‘tree’ of what the topology looks like. Every time an LSA is received, the parts of the tree based on that LSA are torn down and rebuilt, factoring in the new information.

OSPF Router Roles

Area Border Router (ABR) – Any router that connects to more than one area 

Autonomous System Border Router (ASBR) – Any router that connects multiple autonomous systems together (via the redistribute command)

Designated Router (DR) – In every broadcast domain a router is elected as the DR. The DR is responsible for receiving Type 1 and Type 2 LSAs on the multicast address (AllDRouters) and flooding those LSAs back out into the area on (AllSPFRouters). A router is elected the DR if it is the first router on that segment. If routers are powered on at the same time, the higher OSPF priority is elected the DR. If the priority is the same, the router with the highest router ID is elected the DR.

Backup Designated Router (BDR) – In every broadcast domain a router is elected as the BDR. The BDR is simply a backup to the DR. The router with the second-highest priority becomes the BDR.

DROTHERS – Routers that are not elected a DR or a BDR. These routers stay in 2-way neighbor state between each other.

OSPF Link-State Advertisements (LSAs)

In OSPF, each router stores data composed of individual link-state advertisements (LSAs) in its Link State Database (LSDB).

Each router within an OSPF area must have the same link-state database information. A router that belongs to more than one area maintains a separate LSDB for each area.

Each router individually runs the Shortest Path First (SPF) Algorithm, once for each area the router is a part of. Each router considers itself the root of the tree and ‘draws’ its branches toward each destination via the shortest path. LSAs in the OSPF LSDB are like pieces of a puzzle: the SPF process must examine the individual LSAs and see how they fit together based on their characteristics.

Types of LSAs:

Type 1 Router LSA: The Router LSA is the fundamental LSA for building the tree that OSPF calculates via the SPF Algorithm. This LSA describes the interfaces connected to a particular router running OSPF. One LSA is generated per area, per router. The body of one LSA combines one or more sub-headers, one for each interface in that area. The LSA is then flooded to all other routers in the area. The Router LSA is fundamentally just a description of the interfaces of that particular router, associated with its router ID. When another router in the area receives a Type 1 LSA, it associates the links with the Router ID of the originating router. A Type 1 LSA does not go beyond its own area.

To view Type 1 LSAs use the following command: #show ip ospf database router [RID]

The LS Sequence Number identifies the ‘version’ of the LSA, so to speak. If, say, an IP address changes on an interface participating in OSPF, OSPF first poisons the route by re-flooding that LSA with its age set to the maximum (3600 seconds). This effectively kills that LSA, and a new LSA with a new sequence number is used instead.

OSPF identifies a Type 1 LSA using a 32 bit link state identifier (LSID). Each router then uses its own OSPF router id as the LSID.

Each LSA (associated with a LSID) will then have link data for each interface depending on the type of interface:

  • Interface with no neighbors: the subnet number/mask is advertised, and the link is described as a ‘stub network’
  • Interface with a DR: the link data is the IP address of the DR, and the link is described as connected to a ‘transit network’
  • Interface with no DR: the link data lists the neighbor’s RID, and the link is described as connected to ‘another router (point-to-point)’. A point-to-point interface also creates a second link entry describing the network as a stub network, with the subnet and mask included.

Type 2 Network LSA: Type 2 LSA is generated for multi access networks. It is required for OSPF to properly map all connected routers to a single multi access network, like a LAN. The generation of a type 2 LSA depends on the existence of a DR. Only the DR in a particular multi access network creates this LSA. All other routers (BDRs and DROTHERs) do not. The LSA is flooded by the DR to all other routers in the area. The content of the LSA is the subnet, mask, and all the participating routers in that broadcast domain (RIDs). 

A type 2 LSA is not generated for a link that is connected to a stub network (or a network with only one router on it). However, once you form a  neighbor relationship on a multi access link then a type 2 LSA is flooded within the area.

! To view a Type 2 LSA

#sh ip ospf database network

Type 3 Network Summary LSA: Type 3 LSAs are not generated in single-area deployments of OSPF. You will only see a Type 3 LSA when more than one area exists.

Area Border Routers (ABRs) connect different OSPF areas together. ABRs do not forward Type 1 and Type 2 LSAs into other areas. Instead, an ABR generates a Type 3 LSA for each subnet in a particular area and advertises it into the other areas. The Type 3 LSA contains only subnet and route information, with no details about links: each Type 3 LSA consists of a subnet and the cost to reach that subnet from the ABR. A Type 3 LSA does not trigger a full SPF recalculation.

The ABR assigns an LSID of the subnet number being advertised. The ABR also adds its own RID in the LSA, because multiple ABRs can advertise the same subnet with the same LSID.

! To view a Type 3 LSA

#sh ip ospf database summary

Type 4 ASBR Summary LSA: This LSA is generated by Area Border Routers (ABRs). It is created when a Type 5 LSA is advertised throughout the whole OSPF domain. Since the Type 5 LSA is flooded across all areas, routers in other areas have no way to calculate how to reach the ASBR, which lives in a different area. So the ABRs create a Type 4 LSA, effectively saying: “to reach this ASBR’s Router-ID, come through my Router-ID”.

Type 5 AS External LSA: The Type 5 LSA is generated by ASBRs when redistributing routes from outside the OSPF domain into OSPF. Whoever does the redistribution becomes an ASBR. Type 5 LSAs are flooded within the entire OSPF domain, unchanged.

A Type 5 LSA has two sub types:

Sub-Type-1 tells all routers receiving this Type 5 LSA to perform cost calculation on it: the internal cost to reach the ASBR is added to the advertised cost.

Sub-Type-2 tells all routers receiving this Type 5 LSA NOT to perform cost calculation: the router installs the LSA with the original cost that the ASBR advertised, without adding any internal cost.

Type 6 Group Membership LSA: Defined for Multicast OSPF (MOSPF). MOSPF was never widely deployed, and Cisco IOS does not support this LSA type.

Type 7 NSSA External LSA: The Type 7 LSA is a LSA generated when a router (ASBR) within a NSSA, is redistributing routes into OSPF. This LSA is Flooded within the area, and learned by every router. Once the Type 7 is reached to an ABR, that Type 7 is converted to a Type 5 for the other areas to learn.

Type 8 External Attributes LSA: Originally intended to carry BGP attributes through an OSPF domain; it was never widely implemented. (In OSPFv3, Type 8 is redefined as the Link-Local LSA.)

Type 9-11 Opaque LSA: Generic LSAs used to extend OSPF. For example, Type 10 Opaque LSAs carry MPLS Traffic Engineering information within an area.

Each LSA has an age timer of 30 minutes. When no changes to an LSA occur for 30 minutes, the owning router increments the sequence number, resets the age to 0, and re-floods the LSA.

LSAs are poisoned by flooding the LSA to all neighbors with the age set to the max-age value (3600s).

Enabling OSPF (V2 and V3) on Cisco

You can enable OSPF on an interface in two ways:

  • With the network command (under routing config)
  • With the ip ospf [process] area [num] command (under interface config)

! Enable OSPFv2 routing with certain process number. You can run multiple instances of OSPF if desired. The number here is only locally significant.

#router ospf [number]

! Enable an interface with the network command. This command tells the router to look for any interfaces that start with the network address and enable OSPF on that interface. This automatically makes the interface start sending hellos out on the interface, and also advertising that network into the OSPF domain (that is, if it has any adjacent neighbors). The network command also specifies the area number of where the link resides.

#network [network address] [wildcard mask] [area #]

! Enable OSPFv3 routing with certain process number. Again, the number is only locally significant.

#router ospfv3 [number]

! Enable an interface with the ip/ipv6 ospf command in interface configuration

#ip ospf [process-id] area [number]

#ipv6 ospf [process-id] area [number]

! Verification

#debug ip ospf adj

#sh ip ospf neigh

#sh ip ospf interface

#sh ip protocols

#sh ipv6 ospf interface brief

#sh ipv6 ospf neigh

#sh ipv6 ospf database

#sh ipv6 protocols

#debug ipv6 ospf adj

OSPF Path Selection

OSPF analyzes each route it receives and determines the best path for each route by doing a metric calculation. OSPF calculates the metric by doing the following:

  • Analyze the LSDB to find all possible routes to reach a particular subnet
  • For each possible route, add the OSPF interface cost for all outgoing interfaces in that route.
  • Pick the route with the lowest total cost.

The OSPF Cost is a metric derived from the egress interface bandwidth. 

! View OSPF Cost for an interface running OSPF

#show ip ospf interface
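The “find all possible routes, sum the outgoing interface costs, pick the lowest” procedure above is effectively a shortest-path-first computation. A minimal sketch in Python, with a made-up three-router topology and costs:

```python
import heapq

# Minimal Dijkstra/SPF over outgoing-interface costs. The topology and
# costs below are invented for illustration.
def spf(graph, root):
    dist = {root: 0}
    pq = [(0, root)]
    while pq:
        cost, node = heapq.heappop(pq)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, if_cost in graph[node]:
            new_cost = cost + if_cost  # add the outgoing interface cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(pq, (new_cost, neighbor))
    return dist

graph = {
    "R1": [("R2", 1), ("R3", 10)],
    "R2": [("R1", 1), ("R3", 1)],
    "R3": [("R1", 10), ("R2", 1)],
}
# The two-hop path R1 -> R2 -> R3 (cost 2) beats the direct link (cost 10).
assert spf(graph, "R1")["R3"] == 2
```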

Intra-area routes have a destination inside the router’s own area; inter-area routes have a destination in another area. The terms describe whether traffic stays within an area or crosses area boundaries.


To calculate the best route to each subnet, a router analyzes the LSDB and does the following:

  • Finds all subnets inside the area, based on the stub interfaces listed in the Type 1 LSA and based on any Type 2 Network LSAs.
  • Run SPF to find all possible paths through the area topology.
  • Calculates the OSPF interface costs for all outgoing interfaces in each route, picking the lowest total cost route for each subnet as the best route. 


An ABR advertises a Type 3 Summary LSA to adjacent areas. Neighbors in those areas calculate the cost by adding the cost advertised in the LSA to the cost it takes to reach that ABR. In other words, it uses the same method as intra-area, plus the cost carried in the Type 3 Summary LSA.

Priority of route selection:

  • Intra-Area (Received a Type 1 LSA/Type 2 LSA)
  • Inter-Area (Received a Type 3 LSA)
  • External (Type 5 LSA or Type 7 LSA)

There are 3 ways to change the OSPF Cost/Metric:

  • Changing the reference bandwidth
  • Setting bandwidth
  • Configuring cost directly

OSPF Calculates the OSPF cost for an interface based on the following formula: reference-bandwidth / interface bandwidth = OSPF Cost

The default reference bandwidth is 100 Mbps.

The reference bandwidth can be changed using the following command: #auto-cost reference-bandwidth [Mbps]

This command is only locally significant to the router; to keep cost calculations consistent, it should be configured identically on every router in the OSPF domain.

The bandwidth can be changed using the following command: #bandwidth [#]

Other features, such as QoS and other routing protocols, also use the interface bandwidth value to influence their operation, so changing it affects more than just OSPF.

The cost can be changed directly using the following command: #ip ospf cost [value]
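Putting the formula together, a sketch of the calculation (with IOS-style integer truncation and a minimum cost of 1):

```python
# cost = reference-bandwidth / interface bandwidth, truncated to an
# integer, with a minimum value of 1.
def ospf_cost(interface_bw_mbps, reference_bw_mbps=100):
    return max(1, int(reference_bw_mbps / interface_bw_mbps))

assert ospf_cost(10) == 10      # Ethernet (10 Mbps)
assert ospf_cost(100) == 1      # FastEthernet (100 Mbps)
assert ospf_cost(1000) == 1     # GigE floors to 1 -- why raising the
                                # reference bandwidth matters on fast links
assert ospf_cost(1.544) == 64   # T1 serial
```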

OSPF Stub Areas

In OSPF there are four (4) variations of stub areas:

  • Stubby Area
  • Totally Stubby Area
  • Not-So-Stubby-Area (NSSA)
  • Totally Not-So-Stubby Area (Totally NSSA)

All types of stubby areas filter the Type 5 External LSA. Any area that starts with ‘Totally’ means that the ABR also filters out Type 3 LSAs. Any area that does not start with ‘Totally’ means that Type 3 LSAs are allowed to be learned and advertised in the area by the ABR.

Any area besides Area 0 can be defined as a stub area. A stub area allows the routers in the area to use a default route for forwarding packets toward the ABR rather than specific routes. The ABR injects a default route into the stub area, and Type 5 LSAs are not advertised into it. ABRs create the default route using a Type 3 LSA and flood it into the stub area. The default route has a metric of 1 unless otherwise configured using the command area [area-number] default-cost [cost]
When configuring stubby areas, all routers in the area must be configured as stub; if not, neighbor relationships will not form. This is based on the stub flag in the hello packet. A router in a stub area cannot become an ASBR: all the routers in a stub area have agreed that no Type 5 LSAs may be created in or advertised into the area, and creating an ASBR there would break that rule.

! Configure an area as a Stub Area in router config. This command should be configured on all routers in the area. 

#area [area-number] stub

! Specify metric for default route that ABR injects.

#area [area-number] default-cost [cost]

! Configure an area as a Totally Stub Area. This command only needs to be done on the ABR because it is the only router in an area that creates a Type 3 LSA.

#area [area-number] stub no-summary

Not-So-Stubby Areas (NSSA) are areas that allow a router in the area to become an ASBR with the help of a Type 7 LSA. In stub/totally stubby areas, routers cannot become ASBRs because Type 5 LSAs are filtered from all types of stub areas. The workaround is the Type 7 LSA, which carries the exact same contents as a Type 5 LSA. The Type 7 LSA is only generated by ASBRs in NSSAs, and the ASBR floods it within the entire NSSA. Once it reaches an ABR, the Type 7 is converted into a Type 5 LSA and flooded onward. NSSA must be configured on all routers in the area; when configuring NSSA, the 'NSSA is Supported' bit is set in hellos, and OSPF routers will not form a neighbor relationship if one side is a stub and the other is an NSSA.
When configuring an NSSA (as compared to stub/totally stubby areas), a Type 3 LSA of all 0s (aka a default route) is not injected automatically into the NSSA. An extra command is needed to configure this: area [area-number] nssa default-information originate

! Configure a NSSA in router config

#area [area-number] nssa

! Configure a Totally NSSA in router config. This only needs to be done on the ABR

#area [area-number] nssa no-summary

! Configure default route into a NSSA

#area [area-number] nssa default-information originate

! Verify

#show ip ospf

#show ip ospf database

#show ip ospf database database-summary

OSPF Route Summarization

OSPF only allows summarization at ABRs or ASBRs. The reason is that all OSPF routers in an area must have the exact same LSDB. Summarization is done on LSAs, not on routes. Summarization can be done to reduce the size of the LSDB (saving memory, CPU, etc.). It can also be used for path manipulation.

! Configure summarization on an ABR, use the following command in routing config

#area [area id] range [ip address] [mask] [cost [#]]

The configured area number refers to the area where the subnets you want to summarize exist. The summary will be advertised into all other areas connected to that area. There must be at least one subordinate subnet inside the range of the summary route for the summary route to actually be advertised. The ABR does not advertise the subordinate subnets' Type 3 LSAs, only the summarized Type 3 LSA. The ABR assigns the metric of the summary route's Type 3 LSA by default to match the best metric among all subordinate routes (aka the lowest metric). The area range command can also explicitly set the cost of the summary (instead of inheriting the lowest subordinate metric).

! Configure summarization at an ASBR under router config

#summary-address [prefix] [mask]

OSPF Default Routing

There are two ways to introduce a default route into OSPF, either within a specific area or throughout the whole OSPF domain. 

! Configure default route across all areas, use the default-information originate command in routing config

#default-information originate [always] [metric [metric-value]] [metric-type [metric-type]] [route-map [map-name]]

If you just type default-information originate, the router will look into its routing table, and if it has any route of all 0s (aka a default route), the router will allow OSPF to create a Type 5 LSA and flood it throughout the entire OSPF domain. The always keyword skips the check for an all-0s route in the routing table; the router creates and floods the LSA regardless of whether a default route is learned or configured.

When all the default parameters of this command are used, it injects a default route into OSPF as an external Type 2 route, using a Type 5 LSA with a metric of 1. A Type 5 LSA has two metric types: external type 1 (E1) and external type 2 (E2). Type 1 tells every router the LSA reaches that it may add to the metric as the route passes through each router. Type 2 tells routers not to modify the metric and to leave it unchanged throughout the whole OSPF domain. You can also use a route-map in the default-information originate command to 'track' routes in your routing table; if the tracked route disappears, the Type 5 LSA is poisoned.

OSPF Route Filtering

Routers in the same area MUST have the same LSAs in their LSDB. Therefore it's impossible to filter routes that are learned within an area. OSPF can, however, filter the origination of an LSA BETWEEN areas. This is accomplished by telling the border router not to generate Type 3/5 LSAs altogether. Type 3 LSAs are filtered by ABRs; Type 5 LSAs are filtered by ASBRs. Type 3 LSAs are filtered PRIOR to origination.

! Configure type 3 LSA filtering on ABRs using prefix lists:

#area [#] filter-list prefix [name] [in|out]

When ‘in’ is configured, IOS filters prefixes being created and flooded into the configured area

When ‘out’ is configured, IOS filters prefixes coming out of the configured area

! Configure area range for filtering 

#area [#] range [x.x.x.x] [mask] not-advertise

The not-advertise keyword turns the area range command (usually used for summarization) into a filtering mechanism; it does not require a prefix list or ACL. The big difference from the area filter-list command is that the area range command looks at the Type 1 and Type 2 LSAs for the range that you specify. If the area does not have a Type 1 or Type 2 LSA within that range, nothing is filtered. In other words, this command only filters intra-area routes of that area from being originated as Type 3 LSAs.

A distribute list, by contrast, filters routes from being installed into the local routing table. It does not stop the generation or learning of LSAs in any way.

Internet Protocol Version 6 (IPv6)

What is IPv6? It is the latest (not really new at this point lol) version of the IP protocol. The main reason this version was developed was to alleviate the IPv4 address exhaustion happening across the internet. It boasts a longer address of 128 bits vs the 32 bits of IPv4. It has also fundamentally changed the way some communication happens as compared to IPv4. These details will be discussed below. Without further ado, let's get into it!

IPv6 Structure

IPv6 is a 128-bit address represented in hexadecimal. Each character in an IPv6 address represents 4 bits of data, and those 4 bits are represented by a hexadecimal character (0-F). When we think of IPv4 and the whole idea of subnet masks and CIDR notation – that still holds true in IPv6 in the exact same way. The whole point of a subnet mask is to define what is the network portion and what is the host portion of a particular address. An IPv6 address with a /64 mask tells you that the first 64 bits are the network portion and the latter 64 bits are the host portion. In IPv4, the mask could be represented using CIDR (/24) or a subnet mask (e.g. 255.255.255.0). In IPv6, the only way it is represented is CIDR notation (which makes sense considering how long the address actually is).

Above, you will see an example of the structure of an IPv6 address. This has been defined in RFC 4291 as a global unicast address. More on the different types of IPv6 addresses later*. The takeaway from this chart is to get familiar with the full-length format of IPv6. You have eight (8) sixteen (16) bit sections, and they are all separated by a colon. Again, each character represents a hexadecimal character (0-F) of 4 bits.

How to Write IPv6

Since IPv6 is very long, it can be a pain to write sometimes. Luckily, IPv6 addresses can be shortened/abbreviated. Take for Example the address in the previous diagram:


This address can be shortened to the following:


Let's start by specifying the rules for shortening IPv6 addresses. (BTW, the shortened version of an IPv6 address can also be used in configuration.)

  1. Leading zeroes in a 16-bit block (i.e., within a colon-delimited block) can be omitted. So in our example, I shortened the IPv6 address: 0db8 turned into db8, and 0015 was shortened to just 15.
  2. A double colon (::) can be used in place of one or more consecutive all-zero 16-bit blocks. The double colon cannot be used more than once in an address. So in our example, I could essentially “skip” from block 15 all the way to block 1a2f just by putting a double colon.
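Both rules are what Python's ipaddress module applies when it compresses an address. The full-length address below is a hypothetical stand-in containing the blocks mentioned above, since the diagram's example address isn't reproduced here:

```python
import ipaddress

# A hypothetical full-length address containing the blocks mentioned above
# (0db8, 0015, 1a2f):
full = "2001:0db8:0000:0015:0000:0000:1a2f:0001"
addr = ipaddress.IPv6Address(full)

# Rule 1: leading zeroes in each 16-bit block are dropped (0db8 -> db8, 0015 -> 15).
# Rule 2: the longest run of consecutive all-zero blocks collapses to '::'.
assert addr.compressed == "2001:db8:0:15::1a2f:1"

# Long and short forms are the same address, and both are valid in configs:
assert ipaddress.IPv6Address("2001:db8:0:15::1a2f:1") == addr
```

Note that only one run of zeroes collapses to `::`; the lone all-zero block stays as a plain `0`.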

IPv6 L3 Header vs IPv4 L3 Header

There are a few key differences in the IPv6 header compared to the IPv4 header:

  1. Fragmentation is dealt with at the host level in IPv6. If a router receives a packet that is too big to be put on another link (i.e., the MTU is smaller for whatever reason), then a router running IPv6 will send a ‘too big’ ICMPv6 packet back to the host. The too-big ICMPv6 packet essentially tells the host: “hey, your packet is too big, chop it up into something smaller than x”. Compare this to IPv4, where routers actually perform the fragmentation of packets instead of the host. Since IPv6 routers push fragmentation to the host, the following fields are not in the IPv6 header: Identification, Flags, and Fragment Offset
  2. The flow label is used to uniquely identify a flow of packets. For example, if a certain host sends 100 packets to google.com, a unique flow number will be generated to identify the unique flow. This is not in IPv4 header.
  3. TTL Field is renamed to Hop Limit.
  4. Checksum is removed completely. The reason for this is that all upper level protocols already have an implementation for error-checking, so having it in the L3 header is redundant.

IPv6 Address Types

There are lots of different IPv6 address types. They all have a unique purpose and function in the operation of IPv6, much like the different IPv4 address types do.

Global unicast:

A global unicast address is a globally unique address (aka routable through the internet). Currently IANA has allocated only 2000::/3 to the global unicast pool (as of this writing).

Unique Local:

A unique local address (ULA) is an IPv6 address in the block fc00::/7, defined in RFC 4193. It is the approximate counterpart of the IPv4 private address space.

Link Local:

The link-local address can be used only on the local network link (aka unique to a VLAN). Link-local addresses are not valid nor recognized outside the subnet. FE80::/10 is the hexadecimal representation of the 10-bit binary prefix 1111111010; this prefix identifies the type of IPv6 address as link-local. Link-local addresses commonly use the EUI-64 method to derive their interface ID.


Multicast:

IPv6 multicast operates the same as in IPv4. A packet sent to a multicast address is delivered to all interfaces identified by the multicast address (in a given scope). In IPv6, FF00::/8 is the pool for multicast addresses. In FFxy multicast addressing, the x denotes permanent (0) or temporary (1) addressing. The y denotes the scope of the address:

  • y=1 means interface local (kinda like an interface-based loopback)
  • y=2 means link-local so they can’t be routed (within subnet)
  • y=4 means admin-local which is really a bit varying in scope
  • y=5 means site-local which should be your site’s physical infrastructure.  Routable yes, but not outside your site.
  • y=8 means organization-local which implies autonomous system number like in BGP (think Site prefix in 1.12 picture)
  • y=E means global scope – fully routable/usable on the Internet.

Solicited Node Multicast:

When an IPv6 interface is enabled on any device, a solicited-node multicast address is created too. For every IPv6 address assigned to an interface, a matching solicited-node multicast address is created (for link-local AND global unicast). The solicited-node multicast address starts with ff02::1:ff00:0/104, with the last 24 bits of the interface ID from the IPv6 address appended.

The purpose of the solicited-node multicast address is to eliminate the ARP/broadcast traffic that was originally used to find the MAC address of a particular host in IPv4. The solicited-node multicast address is also used for duplicate address detection on a subnet, more on that later*. In IPv6, the MAC address is found by initiating a neighbor discovery process, and this is where the solicited-node multicast address comes into play. The device sends a neighbor solicitation packet to an address starting with ff02::1:ff00:0/104. It knows how to build the specific solicited-node multicast address for an individual host because it extracts the last 24 bits of the interface ID of the IPv6 address you're trying to talk to and slaps them on the end: ff02::1:ff[24 bits]. The destination device is listening on that multicast address, so when it receives the packet, it responds with its full MAC address. This is similar to the operation of ARP in IPv4, except it uses multicast instead of broadcast – and the multicast effectively reaches only that specific host, because that host is the only one listening on that specific solicited-node multicast address.
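The derivation above can be sketched in a few lines of Python (the host address is hypothetical; only its last 24 bits carry through):

```python
import ipaddress

# Sketch: build the solicited-node multicast address by appending the last
# 24 bits of a unicast address to the ff02::1:ff00:0/104 prefix.
def solicited_node(unicast):
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

# Hypothetical host address -- only the final 24 bits (34:5678) matter:
assert str(solicited_node("2001:db8::abcd:1234:5678")) == "ff02::1:ff34:5678"
# The link-local address with the same interface ID maps to the same group:
assert str(solicited_node("fe80::abcd:1234:5678")) == "ff02::1:ff34:5678"
```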

ICMPv6 Neighbor Discovery

The collection of ICMPv6 features below makes up what is called the Neighbor Discovery Protocol (NDP). NDP is used for address resolution (finding MAC addresses), duplicate address detection, and Stateless Address Autoconfiguration (SLAAC).

ICMPv6 Message Types:

  • Neighbor Solicitation: During duplicate address detection, a device sends an ICMPv6 packet destined for its own solicited-node multicast address, sourced from its link-local address or :: (all 0s). If it receives a reply to that neighbor solicitation, then obviously that address is already in use on the local network. Duplicate address detection is used for link-local and global IPv6 addresses.
  • Neighbor Advertisement: Is the reply message to a Neighbor Solicitation message (either for duplicate address detection or to find a MAC address on a subnet)
  • Router Solicitation: A device that is configured for IPv6 Stateless Auto Configuration sends out a router solicitation with the destination as FF02::2. Only routers running IPv6 listen to this multicast address.
  • Router Advertisement: Routers send a router advertisement in response to a router solicitation message (the destination can be the link-local address of the specific node or FF02::1). Located in this RA is the prefix for the router's interface. Router advertisements are also sent periodically; they do not have to be in response to a solicitation message.

IPv6 Autoconfiguration

All interfaces on IPv6 nodes must have a link-local address, which is usually automatically configured from the interface identifier (EUI-64, derived from the MAC) and the link-local prefix FE80::/10. A link-local address enables a node to communicate with other nodes on the link (aka subnet) and can be used to further configure the node.

Nodes can connect to a network and automatically generate global IPv6 addresses without the need for manual configuration or help of a server, such as a Dynamic Host Configuration Protocol (DHCP) server. With IPv6, a device (typically the default gateway / router) on the link advertises global prefix in Router Advertisement (RA) messages, as well as its willingness to function as a default device for the link. RA messages are sent periodically and in response to device solicitation messages.

A node on the link can automatically configure global IPv6 addresses by appending its interface identifier (64 bits) to the prefixes (64 bits) included in the RA messages. The resulting 128-bit IPv6 addresses configured by the node are then subjected to duplicate address detection to ensure their uniqueness on the link. If the prefixes advertised in the RA messages are globally unique, then the IPv6 addresses configured by the node are also guaranteed to be globally unique. Device solicitation messages, which have a value of 133 in the Type field of the ICMP packet header, are sent by hosts at system startup so that the host can immediately autoconfigure without needing to wait for the next scheduled RA message. This is also referred to as Stateless Address Autoconfiguration (SLAAC).

EUI 64

With IPv6, you can set addresses manually just like in IPv4. However, since IPv6 interface IDs are bigger than MAC addresses (which are globally unique), why not convert those MAC addresses into the host portion of the address? That is exactly what EUI-64 does. It converts a 48-bit MAC address into a 64-bit counterpart and places that into the interface ID of the address. This is how it is accomplished:

A 64-bit interface ID is created by inserting the hex number FFFE in the middle of the MAC address. Also, the 7th bit of the first byte is flipped (a 0 there means the MAC address is a burned-in, globally unique MAC address; the conversion inverts it to 1). When this is done, the interface ID is commonly called the modified extended unique identifier 64 (EUI-64).

For example, if the MAC address of a network card is 00:BB:CC:DD:11:22, then the interface ID would be 02BB:CCFF:FEDD:1122.

Why is that so, you might ask?

Well, first we need to flip the seventh bit from 0 to 1.

MAC addresses are in hex format. The binary format of the MAC address looks like this:

hex 00BBCCDD1122 
binary 0000 0000 1011 1011 1100 1100 1101 1101 0001 0001 0010 0010

We need to flip the seventh bit:

binary 0000 0010 1011 1011 1100 1100 1101 1101 0001 0001 0010 0010

Now we have this address

hex 02BBCCDD1122

Next we need to insert FFFE in the middle of the address:

hex 02BBCCFFFEDD1122

The resulting Interface ID is 02BB:CCFF:FEDD:1122.
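The whole conversion can be sketched in a few lines of Python; the assertion reproduces the worked example above:

```python
# Modified EUI-64: flip the 7th bit (U/L bit) of the first byte and insert
# FFFE between the two halves of the 48-bit MAC address.
def eui64_interface_id(mac):
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = bytes([b[0] ^ 0b00000010]) + b[1:3] + b"\xff\xfe" + b[3:6]
    h = eui.hex()
    return ":".join(h[i:i + 4] for i in range(0, 16, 4))

# The worked example from the text:
assert eui64_interface_id("00:BB:CC:DD:11:22") == "02bb:ccff:fedd:1122"
```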


Anycast:

An anycast address identifies a group of interfaces, usually on different physical nodes. Packets that are sent to the anycast address go to the anycast group member node that is physically closest to the sender.

IPv6 Configuration (Cisco)

! Enables the forwarding of ipv6 unicast datagrams globally on the router. This also permits the router to send ICMPv6 RAs. 

#ipv6 unicast-routing

! Automatically configures an ipv6 link local address and enables ipv6 processing

#ipv6 enable

! Configure a global ipv6 address

#ipv6 address 2001::[xxxx]/[prefix-length]

! Configures a global ipv6 address with eui-64 format set on the lower order bits

#ipv6 address 2001:db8:0:1::/64 eui-64

! configure specific link local address

#ipv6 address fe80::[xxxx]/64 link-local

! Configure default route in ipv6

#ipv6 route ::/0 [next hop ipv6 address]

! verify/tshoot

#sh ipv6 int brief

#sh ipv6 routers

#sh ipv6 route

#debug ipv6 nd

! See which ipv6 devices have been mapped to which MAC address

#sh ipv6 neighbors

Under the state section of this command the following output can appear:

  • Incomplete – INCMP – Address resolution is in progress; the mapping is being established
  • Reachable – REACH – Mapping confirmed and established
  • Stale – STALE – Mapping was successfully established but has since expired and has not been reconfirmed
  • Delay – DELAY – Mapping is stale and traffic was recently sent; the device waits a short delay for confirmation before probing
  • Probe – PROBE – Neighbor solicitation messages are being re-sent to the remote interface on an ongoing basis in an attempt to re-establish the mapping

! Configures the interface to use stateless autoconfiguration. When this command is entered, a link-local address is created automatically, and the interface listens to RAs for global unicast address assignment.

#ipv6 address autoconfig

! Verify that address was configured on the interface

#show ipv6 interface FastEthernet 0/10

IPv6 Subnetting – How the hell do you do it?

It's actually really easy. It's the same concept as IPv4 subnetting, except you're dealing with hexadecimal, not dotted decimal.

Let’s look at a few interesting examples:

IANA will be using 2000::/3 for all global unicast routable addresses. But a /3 doesn't line up so neatly. What does a /3 even mean? Well again, it's a bits-to-bits comparison: if the first 3 bits of an address match, the address is considered globally routable, no matter what the remaining bits are.

2000 in binary is 0010 0000 0000 0000

IANA will be using the first 3 bits of this address for all globally routed addresses. So essentially, anything starting with a 2 or a 3.

Let’s look at another example: FC00::/7 (private address space for IPv6)

FC00 in binary is 1111 1100 0000 0000

So any address where the first 7 bits match those binary bits is a private address. Say, for example, I assigned an IPv6 private address of FD00::1/7 – this is still technically in the range of FC00::/7 even though it doesn't actually start with FC. The reason is that as long as the first 7 bits match, it is still in the range. I flipped the 8th bit, which changed the hexadecimal character to a D – but that is still a perfectly valid private address space assignment.
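This bits-to-bits membership check is easy to confirm with Python's ipaddress module:

```python
import ipaddress

# Membership only compares the first prefix-length bits.
ula = ipaddress.IPv6Network("fc00::/7")

# FD00::1 doesn't literally start with "FC", but its first 7 bits match:
assert ipaddress.IPv6Address("fd00::1") in ula
# FE80::1 differs within the first 7 bits, so it falls outside the range:
assert ipaddress.IPv6Address("fe80::1") not in ula

# Likewise, anything whose first 3 bits are 001 lands in 2000::/3:
gua = ipaddress.IPv6Network("2000::/3")
assert ipaddress.IPv6Address("3fff::1") in gua
assert ipaddress.IPv6Address("4000::1") not in gua
```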

In most scenarios, you will be given a /48 block (aka the site prefix in the above diagrams). From there, you can use the next 16-bit block to create different subnets. The last 64 bits can either be assigned manually for servers, or via some variation of DHCPv6 / SLAAC. But that does not mean you cannot make smaller subnets. You can still assign a /100, or even a /127 for point-to-point links. The same concepts of subnetting carry over from IPv4 to IPv6; it's just getting used to hex.

Spanning-Tree Protocol (STP)

Spanning-Tree Protocol (STP) is a protocol that prevents loops in a switched Ethernet network by blocking ports based on a couple of key parameters: (1) the priority of each bridge and (2) the port roles for each switch connection. STP uses Bridge Protocol Data Units (BPDUs) to converge the network. There is an election process using BPDUs to elect the root bridge of the network. Each BPDU carries the BID of who the sender thinks the root bridge is (themselves at first), along with its priority. Once a bridge sees a lower priority, it stops generating its own BPDUs and only forwards BPDUs from the root (this is only the case for vanilla 802.1D STP). Eventually one bridge is elected the root of the topology, and looped connections are blocked (e.g. put in discarding/blocking) based on alternate paths to the root bridge.

Sequencing of STP

1. Elect a root bridge in the topology. The root bridge is elected by determining the switch/bridge with the lowest bridge priority within the topology. By default all Cisco switches have a priority of 32768, with the full bridge ID formed as 32768.[internal MAC address]. The bridge with the lowest value in the topology becomes the root bridge. Every BPDU has fields for the sending bridge and root bridge priority. By default when a switch turns on, it automatically thinks it's the root bridge, but if it sees a lower priority, it goes silent (only in vanilla 802.1d STP) and only relays the root bridge's BPDUs from that point on. When the root bridge has been identified in the topology (usually after several seconds), it puts all of its ports in the Forwarding state, with a port role of Designated Port. The job of a designated port is to forward BPDUs. Designated ports and root ports are in the forwarding state, meaning they can forward 'user' traffic.

2. Find the root port on every switch that's not the root bridge. Every non-root switch in the topology has to find its root port, which is the port with the least cost to get back to the root bridge. This is calculated using the STP cost. Each BPDU sent to a non-root bridge carries a Cost to Root field, derived from the speeds of the interfaces along the path to the root bridge. A switch directly connected to the root bridge receives BPDUs with a cost of 0 (the root bridge costs nothing to reach itself); as the BPDU travels "down the line", each bridge adds its ingress port cost to the total. The lower the cost to the root, the easier it is to get there, and that port is put into the forwarding state as the root port. When a non-root bridge receives cost information on two possible ports toward the root bridge, it picks the port with the lowest total STP cost as the root port. If the root path cost is the same for both BPDUs, and the sending bridge ID is also the same, it uses the sending port ID field in the BPDU to decide. The sending port ID field contains the port number (ex: gigabit0/1) and the port priority (default 128); the lower port priority/number is chosen.

3. Find designated ports for the remaining segments on non-root bridges. Each segment gets exactly one designated port; these are the rest of the connections that wouldn't cause a loop in the network (aka toward end nodes).

4. The Ports that are left are non designated ports. These ports are put in the blocking state because they form a loop.
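The root-port tie-breakers in step 2 boil down to an ordered comparison, which can be sketched as tuple comparison in Python (the interface names and values are made up for illustration):

```python
# Candidates are compared field by field, lowest wins at each tie-breaker:
# (root path cost, sender bridge ID, sender port priority, sender port number)
def best_root_port(candidates):
    """candidates: list of (name, cost, sender_bid, sender_port_prio, sender_port_num)."""
    return min(candidates, key=lambda c: c[1:])[0]

# Lower root path cost wins outright:
assert best_root_port([("Gi0/1", 4, 32769, 128, 1),
                       ("Gi0/2", 8, 32769, 128, 2)]) == "Gi0/1"

# With cost and sender bridge ID tied, the lower sender port ID breaks the tie:
assert best_root_port([("Gi0/1", 4, 32769, 128, 2),
                       ("Gi0/2", 4, 32769, 128, 1)]) == "Gi0/2"
```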

Per-VLAN Spanning-Tree (PVST)

1. Cisco-proprietary, based on 802.1d

2. On by default for all Cisco Switches

3. Load sharing between VLANs (uses one port for one VLAN, and the alternate port for another VLAN)

Rapid Per-VLAN Spanning-Tree (Rapid PVST+)

1. specified in 802.1w

2. Proposal and Agreement bits are added as flags in the new BPDU header. If a bridge sends an agreement bit, the ports are immediately put in the states that STP calculated. This can only work on a full-duplex point-to-point link.

3. All switches generate BPDUs every 2 seconds, whereas in 802.1D (STP) only the root bridge originates BPDUs.

4. Uses BPDU protocol version 2 (compared to 0 in 802.1D)

5. When an RSTP port receives legacy 802.1d BPDUs, it falls back to legacy STP, and the inherent fast-convergence benefits of 802.1w are lost.

Port Roles for RSTP

1. Root port – same as 802.1d

2. Designated port – same as 802.1d

3. Alternate port – means the port received a BPDU from another switch that can also lead to the root bridge; it is a backup path to the root.

4. Backup port – means the port received a BPDU from its own switch (two ports on the same shared segment); it backs up the designated port on that segment.

BPDU Frame Format

STP Cost Table

  • 10 Mbps – 100
  • 100 Mbps – 19
  • 1000 Mbps – 4
  • 10 Gbps – 2
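These (classic 802.1D short-method) costs accumulate hop by hop toward the root, as described in step 2 of the STP sequencing; a minimal sketch:

```python
# Link speed (Mbps) -> classic 802.1D port cost, per the table above.
STP_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

def root_path_cost(link_speeds_mbps):
    """Sum the ingress-port costs along the path back to the root bridge."""
    return sum(STP_COST[s] for s in link_speeds_mbps)

# Two GigE hops plus one FastEthernet hop back to the root: 4 + 4 + 19
assert root_path_cost([1000, 1000, 100]) == 27
```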

Timers (defaults)

Hello Timer = 2s – the interval at which a local switch sends a BPDU on the link

Max Age = 20s – Every time a BPDU is received on a port, this timer is reset to 0. If Max Age ever reaches 20s and the port still hasn't seen a BPDU from the root bridge, the switch starts the process of electing a new root bridge. Set to 10 times the hello timer by default.

Forward Delay = 15s – this governs the port states (listening, learning, etc.). The forward delay applies to each port state when a port detects electrical signal.

Message Age = Similar to a TTL. The message age is incremented by 1 every time the BPDU passes through a bridge. Cisco devices do not enforce the age as a hard boundary, though.

Port State > Port Role

Forwarding > Root Port – port on the local switch that provides the least-cost path back to the root | Designated Port – port on the segment that provides the least-cost path back to the root

Blocking > Non-Designated Port

STP Port States

Disabled – Port that is in the down state or no cable plugged in. Does not participate in the STP topology. 

Blocking – Port is only allowed to receive BPDUs; it cannot send or receive data or learn MAC addresses.

When a port first detects electrical signal it goes through the following phases:

1. Listening State (15s) – only allows BPDUs to be sent to the CPU. Actively participates in STP. Cannot send or receive data.

2. Learning State (15s) – only allows BPDUs and does not send data, but can learn MAC addresses.

3. Root Port, Designated Port, or Non-Designated Port – by this point the switch has decided what role the port should take, but the port still has to sit through the listening and learning states (one forward delay each) before being set to it. RSTP instead uses the Proposal and Agreement bits in the BPDU to bring the port up in a quick one-two.

Bridge priority can only be set in increments of 4096. The reason is that the low 12 bits of the 16-bit priority field in the Bridge ID carry the VLAN number (the extended system ID), leaving only the top 4 bits for the priority itself – 16 possible values, each a multiple of 4096.
Modifying timers on a non-root bridge has no effect, because the only timers switches adhere to are the ones carried in the root bridge's BPDUs.
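The split of the 16-bit priority field can be sketched as simple arithmetic (32778 is what you would expect to see for VLAN 10 at the default priority):

```python
# Top 4 bits: configured priority (multiples of 4096).
# Low 12 bits: VLAN number (the extended system ID).
def bridge_priority_field(configured_priority, vlan):
    assert configured_priority % 4096 == 0, "priority must be a multiple of 4096"
    assert 0 <= vlan < 4096
    return configured_priority + vlan

# Default priority 32768 on VLAN 10:
assert bridge_priority_field(32768, 10) == 32778
# 4 usable bits means 16 possible priorities, each a step of 4096:
assert 32768 == 8 * 4096
```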

Policy Based Routing (PBR)

When a packet enters a router on a particular interface, the router normally forwards it based on the destination address alone – regardless of what source address the packet carries. The packet is routed using either a configured VRF or the default VRF. With PBR you can modify the path an IP packet takes based on your own criteria (a route-map). PBR is performed right before regular routing. If a packet coming into an interface does not match the PBR criteria, it is routed normally (in the default routing table or a configured VRF). PBR applied to an interface does not apply to locally generated traffic (e.g. pings or SSH sourced from the router itself). PBR can also be applied globally (all interfaces where routing occurs).

You can accomplish PBR by completing the following high-level tasks:

1. Define match criteria using ACL/Prefix-List (aka what traffic you want to modify routing for)

2. Assign the criteria to a Route-Map sequence, and specify parameters

! Configure ACL for matching Source/Dest IPs

! Configure Route-Map to match on IP/length and set parameters for that packet

#match ip address [ACL]


#match length [length]

! The default keyword simply uses the default routing table FIRST; if no route matches, the router uses the IP/interface specified in the command. The precedence and tos values can be changed for traffic that is routed via PBR.

#set ip next-hop [ip address(es)]

#set ip default next-hop [ip address(es)]

#set interface [interface type/num]

#set default interface [interface type/num]

#set ip precedence [value]

#set ip tos [value]

! Apply PBR to incoming interface

#ip policy route-map [name]

! Apply PBR for local generated traffic

#ip local policy route-map [name]

! Verify PBR

#sh ip policy

#sh route-map

#debug ip policy

Cisco VLAN Trunking Protocol (VTP)

VLAN Trunking Protocol (VTP) is a Cisco proprietary protocol that makes administration of VLANs across a L2 network easier. Simply put, VTP propagates VLANs across trunk links to other switches, so that only one configuration line needs to be changed in one switch, and the rest of the switches configure that VLAN # in their VLAN database.

In VTP, five things must happen for VTP to propagate VLANs:

  1. The devices must be in the same VTP domain (case sensitive)
  2. The devices must have the same VTP password (case sensitive)
  3. The VTP message being received must carry a higher configuration revision number than the device's own (shown in sh vtp status)
  4. VTP messages are only sent on trunk links, not access ports. A switch connected to another switch via an access port will not propagate VLANs through VTP.
  5. The device must be in server or client mode (not transparent). A transparent switch participates in the VTP domain but does not update its configuration revision number; it forwards VTP messages out its trunk ports.

There are 3 flavors of VTP:

  • v1
  • v2:
    • Supports Token Ring VLANs
    • Supports consistency checks.
    • In transparent mode, a v2 switch forwards VTP messages without checking the version information, whereas a v1 transparent switch checks the version before forwarding
  • v3:
    • Supports for the full range of VLANs (Normal AND extended)
    • Support for propagation of private VLANs (PVLANs)
    • Options for cleartext or Hidden VTP Passwords
    • Support for Propagation of 802.1s MST configuration info.
    • Can be turned off globally, or per-port

In all flavors of VTP, the vtp password is never displayed in the running-config

VTP v1 devices (that are v2 capable) will upgrade themselves to v2 if they:

  1. Detect a connected v2 neighbor
  2. Detect a connected v3 neighbor

A VTP v2 device will remain at v2 if a v3 neighbor is detected (even if it is v3 capable). VTP v3 must be manually configured; a switch does not automatically upgrade itself to v3 based on its neighbors.
VTP v1 and v2 automatically update the VTP domain name from incoming VTP messages if the domain name is not manually set/is NULL. VTP v3 does not have this functionality: in VTP v3 you must always manually configure the domain name to join the domain.
VTP v3 is backwards compatible with v2 (on a per port basis where it is detected).

The other major difference with VTP v3 is that all switches by default are still VTP servers, but they are considered “secondary servers”. A secondary server behaves much like VTP client mode: it cannot manually add or delete VLANs, and it cannot update other switches' VLAN databases. You then manually configure one of your switches as the “primary server”. There can only be one primary server per VTP domain, and it is the only switch allowed to make changes to the VLAN database and propagate them.

! Configure a vtp domain (Can be done from privileged EXEC or Configuration Terminal)

#vtp domain [name]

! Configure vtp password (Can be done from privileged EXEC or Configuration Terminal)

! When configured this way, it will display the password in the sh vtp password command. It will also store the password in cleartext in the vlan.dat file.

#vtp password [password]

! When configured this way, the sh vtp password command will instead show a 32-hex-character hash of the password (effectively hiding it). service password-encryption ALSO encrypts the contents of the password. In addition to scrambling the output of sh vtp password, the hidden keyword also scrambles the cleartext password in the vlan.dat file.

#vtp password [password] hidden

! Once you use the vtp password hidden command, use the secret keyword to specify the 32-hex-character hash on the OTHER switches

#vtp password [32-hex character] secret 

! Configure vtp version (Can be done from privileged EXEC or Configuration Terminal)

#vtp version [1 | 2 | 3] 

! Setting v3 device to a primary server (Can be done from privileged EXEC or Configuration Terminal)

#vtp primary 

! VTP pruning is disabled by default on cisco switches 

! VTP pruning is how switches in a VTP topology ‘prune’ VLANs from trunk connections to prevent unnecessary broadcasts. A switch that does not have an access port in a given VLAN sends a ‘vtp prune’ message to upstream trunks to prune that VLAN off the trunk

! Enable VTP pruning

#vtp pruning

! Verification

#sh vtp status

By default all ports on a Cisco Catalyst switch start out as access ports (switchport mode dynamic auto) and will negotiate a trunk via DTP messages. The switchport nonegotiate command tells the switch to not send DTP. DTP messages contain a field for the VTP domain, so DTP cannot negotiate a trunk if there is a VTP domain mismatch between the switches.
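The outcome of DTP negotiation for the dynamic modes can be sketched as a tiny predicate. This is a simplification under stated assumptions: only 'auto' (passive) and 'desirable' (active) are modeled; statically configured trunk/access ports and nonegotiate are out of scope:

```python
def dtp_negotiates_trunk(mode_a, mode_b, same_vtp_domain=True):
    """Will two DTP-speaking ports negotiate a trunk?

    Only dynamic modes are modeled: 'auto' waits passively,
    'desirable' actively initiates. A sketch, not real DTP.
    """
    if not same_vtp_domain:
        return False              # DTP frames carry the VTP domain; a mismatch blocks negotiation
    # At least one side must actively initiate (desirable);
    # two passive 'auto' ports both wait forever and stay access ports.
    return "desirable" in (mode_a, mode_b)

print(dtp_negotiates_trunk("desirable", "auto"))   # True
print(dtp_negotiates_trunk("auto", "auto"))        # False
```

This is why two switches left at the default dynamic auto never form a trunk between themselves.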

! Disable/enable DTP (on by default on a port)

#sw non

#no sw non

! Configure DTP to passively listen for DTP messages; the port negotiates a trunk if it receives a DTP message, and operates as an access port until then.

#sw mode dynamic auto

! Configure DTP to actively send DTP messages, and if it receives a reply it negotiates a trunk

#sw mode dynamic desirable 

Types of VLANs:

  1. Standard VLAN = 1-1005
  2. Extended VLAN = 1006-4094

When a standard VLAN is configured, it is copied into the running configuration and the vlan.dat file located in flash. When an extended VLAN is created it is only copied into the running configuration. You can only create extended VLANs when the switch is in vtp transparent mode (unless running VTP v3, which supports the extended range). If a switch is operating in vtp server mode and VLAN configuration exists in both vlan.dat and the startup config, it will ignore the startup config VLANs and use the standard VLANs from the vlan.dat file.

Bidirectional Forwarding Detection (BFD)

BFD is a lightweight keepalive protocol that runs independently from your routing protocol(s).
When a directly connected link goes down on a router (with no in-between device like a switch), the routing protocol re-converges almost instantly: the loss of L2 connectivity immediately brings down L3 connectivity, so the routing protocol can act on it fast. However, if a link goes down somewhere upstream (that is, not directly attached to the router), the routing protocol has to wait for its dead/hold timer to expire before re-converging. The benefit of BFD is that it can detect those upstream failures quickly and report them to the upper-layer routing protocol so it re-converges faster.

  • RFC 5880 – Bidirectional Forwarding Detection (BFD)
  • RFC 7419 – Common Interval Support in BFD

There are two versions of BFD

  • Version 0 – Echo mode with asymmetry
  • Version 1 – Echo mode with symmetry

BFD Configuration

! Turn on BFD at the interface level. The interval [ms] time is how frequently the interface sends BFD echo packets to its neighboring router. The min_rx [ms] time is how often it expects to receive BFD echo packets from its neighboring router. The multiplier is the equivalent of a routing protocol dead timer: if a router does not hear a BFD echo packet from its neighbor within [multiplier x min_rx] ms, it reports to the upper-layer routing protocol that the link is dead (provided the routing protocol end is configured)

#bfd interval [ms] min_rx [ms] multiplier [interval]
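The resulting worst-case detection time is simply the product of the two values:

```python
def bfd_detection_time_ms(min_rx_ms, multiplier):
    """Worst-case BFD failure detection time.

    A neighbor is declared down after `multiplier` expected packets
    are missed, i.e. after multiplier * min_rx ms of silence.
    """
    return min_rx_ms * multiplier

# 'bfd interval 50 min_rx 50 multiplier 3' -> failure detected in ~150 ms,
# versus multi-second dead/hold timers on most routing protocols
print(bfd_detection_time_ms(50, 3))  # 150
```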

! Associate BFD to your routing protocol, and interface

#router eigrp 1

#bfd [all-interfaces | interface {name}]

! Verification commands

#show bfd neighbors

#debug bfd [packet|event]

Virtual Port-Channel (vPC) on Cisco Nexus

Virtual Port-channel (vPC) – What is it?

vPC is a feature on Cisco Nexus switches that allows you to configure a port-channel across two separate switches. The benefit is that a server (or a switch – practically any device that does port-channeling) can create a port-channel with one uplink to one Nexus switch and the other uplink to another Nexus switch. Despite the port-channel being physically connected to two different switches, the two vPC peers synchronize their control plane information, which makes their associated member ports part of one logical switch. In a regular switching design this would cause MAC flapping between server port 1 and server port 2, but because vPC synchronizes the control-plane data and the associated member ports, this type of topology becomes possible.
vPC consists of 2 physical switches called ‘vPC Peers’. These switches must be identical for a vPC peer relationship to form; on a modular Nexus chassis, the two line cards must also be identical.

Cisco Fabric Services over Ethernet (CFSoE) is a protocol developed by Cisco to facilitate communication between two vPC peers. CFSoE synchronizes the MAC, ARP, and IGMP tables, and is also used to compare hardware revisions during the negotiation phase of the vPC peers.

vPC peers form a vPC Domain. The two vPC peers must have matching domain values to be considered in the same vPC domain. The LAG ID (LACP System ID) is inherited from the domain ID and shared across the peers, so when LACP runs on a vPC member, the two switches advertise the same LAG ID. Additionally, the configured vPC domain number is used in the LACP system identifier: Cisco uses the well-known MAC 00:23:04:ee:be:xx for LACP vPC identification, with the last 8 bits encoded with the configured vPC domain number. When creating a back-to-back vPC between Nexus switch pairs, the vPC domain numbers must not be identical, because the LACP system identifier is inherited from them.
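The MAC derivation above can be sketched in a couple of lines (a simplification of the real encoding; the reserved OUI prefix is from the text, the function itself is illustrative):

```python
def vpc_lacp_system_mac(domain_id):
    """Derive the LACP system MAC a vPC domain advertises.

    Cisco reserves 00:23:04:ee:be:xx for vPC LACP identification;
    the configured vPC domain number fills the last octet (8 bits).
    """
    return "00:23:04:ee:be:{:02x}".format(domain_id & 0xFF)

print(vpc_lacp_system_mac(10))  # 00:23:04:ee:be:0a
```

Two back-to-back vPC pairs configured with the same domain number would therefore advertise the same LACP system ID, which is exactly why the domain numbers must differ.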

vPC peers also elect a primary and a secondary vPC peer, based on the vPC role priority [1-65535]. The default role priority on a switch is 32667. If the role priority is the same on both switches, the MAC address between the two is used as a tiebreaker. The switch with the lower priority/MAC becomes the primary while the other becomes the secondary. Whenever a type 1 consistency mismatch is found on the vPC peers, the secondary disables its vPC member ports while the primary keeps them up.
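The election logic can be sketched as follows (switch names are hypothetical; "lower wins" applies to both the priority and the MAC tiebreaker):

```python
def vpc_primary(peer_a, peer_b):
    """Pick the operational primary vPC peer.

    Lower role priority wins; on a tie, the lower system MAC wins.
    Peers are (name, role_priority, mac) tuples - an illustrative
    sketch of the comparison, not the actual NX-OS election code.
    """
    key = lambda p: (p[1], p[2])   # compare (priority, mac); min() = 'lower wins'
    return min(peer_a, peer_b, key=key)[0]

a = ("nexus-1", 32667, "00:11:22:33:44:55")   # 32667 = default role priority
b = ("nexus-2", 32667, "00:11:22:33:44:66")
print(vpc_primary(a, b))  # nexus-1 (priorities tie, lower MAC wins)
```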

There is a feature called “vPC peer switch” that can be enabled on vPC peers. It makes both the vPC “primary” and “secondary” devices appear and operationally function as the root switch within a spanning-tree topology. Essentially, when you enable the feature, the ‘vPC system MAC’ is inherited into the spanning-tree bridge ID for all VLANs; the priority, however, is not changed. When you set the spanning-tree priority value to be the same as the root bridge (on the other switch), it does not cause a topology change by re-electing the root bridge. Instead, both the primary and secondary peers advertise the vPC system MAC and the priority.

vPC Peers track reach-ability between each other using Peer Keepalives. Peer Keepalive is a UDP Ping to a L3 interface on the other peer. Any type of L3 interface can be created to facilitate this UDP ping communication: Routed Interface, L3 Port Channel, SVI, etc.

Peers sync their control plane over the Peer Link. A Peer Link is just a layer 2 port-channel between the peers; a minimum of two 10G links is required for the vPC Peer Link. This is the link that synchronizes the MAC, ARP, and IGMP tables so that both switches know that a particular MAC lives off both ports – not just one (just like a regular port-channel, but on two different switches)

Member ports are the ports that are synchronized together to create the port-channel across the two switches, i.e. downlink and/or uplink ports.

Orphan ports are ports with a single-attached connection to a vPC peer (either by design, or by vPC peer failure)

Orphan Ports use modified loop prevention:

  • Traffic from remote Orphan is allowed to enter peer link and exit via local Member
  • Traffic from remote Member is allowed to enter via peer link and exit via local Orphan
  • Traffic from remote Member is not allowed to enter via peer link and exit via local Member
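The third rule above is the core vPC loop-prevention check. It can be sketched as a predicate for traffic that has crossed the peer link (an illustrative model, not NX-OS internals):

```python
def peer_link_egress_allowed(remote_ingress, local_egress):
    """Loop-prevention rule for traffic arriving over the vPC peer link.

    `remote_ingress` is the port type the frame entered on the remote
    peer, `local_egress` the port type it wants to exit locally;
    types are 'member' or 'orphan'. A frame that entered via a remote
    vPC member port may not exit a local vPC member port - the other
    member of that same vPC already delivered it.
    """
    return not (remote_ingress == "member" and local_egress == "member")

print(peer_link_egress_allowed("orphan", "member"))  # True
print(peer_link_egress_allowed("member", "orphan"))  # True
print(peer_link_egress_allowed("member", "member"))  # False
```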

By default, two vPC peers running HSRP do active/active forwarding instead of active/standby (no configuration needed). They achieve this by both vPC peers listening for the HSRP vMAC and routing the traffic locally. If the downstream server or switch hashes to the ‘standby’ HSRP router, the ‘standby’ switch will still listen for the vMAC and route the traffic itself (instead of switching the traffic over the peer link).
The only problem with the above is that HSRP/L3 failures are not communicated down to the vPC (or vice versa), so if an L3 function goes down, the vPC knows nothing about it. If the peer link goes down, the secondary disables its SVIs and member ports; if the routed ports on the primary also go down, the server is then isolated. In the case of HSRP, if the routed ports go down on one switch, the SVI for HSRP is still actively listening and trying to route traffic that it cannot actually route. What you would want to do is implement enhanced object tracking to also track the routed ports, so that if they go down, HSRP is switched from A/A to A/S. See below for config.

High-Level Configuration 

1. Turn on the vPC feature

2. Define the vPC Domain

3. Establish a vPC Peer Keepalive connection

4. Establish a vPC Peer Link connection

5. Establish the vPC Member ports

The above steps detail the order of operations when configuring/bringing up the vPC. A member port cannot be established unless the peer link is established, and a peer link cannot be established unless the keepalive connection is established.

! Turn on vPC feature

#feature vpc

! Define the vPC Domain

#vpc domain [1-1000]

! Define the peer keepalive destination (under vPC configuration mode). If you do not specify a VRF, the default management VRF will be used for the destination. If any other VRF is used you need to specify that VRF.

#peer-keepalive destination [IP Address] {vrf [name]}

! Create a port-channel between the two vPC peers. Under the port-channel you specify the peer link with ‘vpc peer-link’. When configuring a vPC, Spanning Tree Bridge Assurance must be enabled on the peer-link; by default it will configure the port-channel with ‘spanning-tree port type network’ (which essentially enables bridge assurance). The reason for this is that if a VLAN is created on one vPC peer but not the other, bridge assurance will prune that VLAN off the trunk.

#interface port-channel1

#switchport mode trunk

#vpc peer-link

#int eth1/x

#switchport mode trunk

#channel-group 1

#int eth1/x

#switchport mode trunk

#channel-group 1

! Create member ports (trunk or access). When you type just ‘vpc’ under the port-channel configuration, the vpc number is inherited from the port-channel number itself. So if you configure port-channel 11 and type just vpc, it will create vpc 11. The vpc numbers must match between both peers to form the member ports. The port-channel numbers between the two vPC peers, however, do not have to match – though for simplicity's sake it makes sense to match them on both sides.

#int eth1/x

#channel-group [#] mode active

#int port-channel[#]


#vpc {#}

! Configure peer-switch feature on vPC Peer Switches. This must be configured on both the Primary and Secondary Switch for the feature to work.

#vpc domain [#]

#vpc peer-switch

! Configure delay restore for the vPC Peer Link. Delay restore is a ‘wait timer’ for how long the vPC member ports must wait to come up after the vPC Peer Link is restored/initialized. This is mainly to let your L3 routing protocol converge before bringing up the vPC members.

#delay restore [#]

! Configure the vPC Auto Recovery timer

#auto-recovery reload-delay [#]

! Configure the vPC Secondary Device to not disable its SVIs if the Peer Link goes down

#dual-active exclude interface-vlan

! Configure the vPC orphan ports to also be disabled if the peerlink fails and secondary receives a Keepalive ping

#vpc orphan-port suspend

! vPC verification commands

#show vpc

#show vpc keepalive

#show vpc role

#show vpc consistency-parameters

#show port-channel compatibility-parameters 

#show run int po[#] membership

! Displays orphan ports that are attached to a vPC peer. This will only show ports that are not configured for a vpc. It will not show orphaned ports from a failure point of view.

#show vpc orphan-ports

! Configure enhanced object tracking for Nexus vPC. This allows other ports (such as uplink L3 ports) to be tracked so that if those interfaces fail, the secondary vPC peer becomes the primary. The same config can be applied to HSRP, where HSRP runs A/A by default; object tracking can force the SVIs to A/S. See below for the failure scenario
! Define the track object and the interface to ‘track’

#track [#] interface [interface name/num] line-protocol

! Combine the track objects into a track list tied to a boolean operator

#track [#] list boolean [OR|AND]

#object [#]

#object [#]

! Tie the Tracking group to the vpc

#vpc domain [#]

#track [#]

! Tie the tracking to HSRP

#int vlan [#]

#hsrp [#]

#track [#]

! Enable self-isolation in vPC under domain config (must be enabled on both peers)


vPC Initialization Order of Operations:

1. vPC Process Starts

2. IP/UDP 3200 Peer Keepalive connectivity established

3. Peer-Link adjacency forms

4. vPC Primary/Secondary role election

5. vPC Performs consistency checks

6. Layer 3 SVIs move to up/up

7. vPC Member ports move to up/up state

vPC Consistency Checks

The vPC peers perform ‘consistency checks’ before bringing the vPC peer link and member ports up.

1. vPC Peers sync control plane over Peer Link with Cisco Fabric Services (CFS)

2. Verifying hardware and configuration match (e.g. speed, duplex, STP config, LACP mode, etc.)

Three types of consistency checks:

Type 1 Global

  • Mismatch results in vPC failing to form (e.g hardware not matching, STP config not matching)

Type 1 Interface

  • Mismatch results in VLANs being suspended on vPC member (e.g STP port type network vs. normal)

Type 2

  • Mismatch results in syslog message but not vPC failure
  • Can result in failures in the data plane (e.g MTU mismatch)

vPC Failure Scenarios

vPC Member Port Failure Detection

  • vPC Peers exchange vPC member status over the peerlink
  • Failed Member Ports result in “orphan ports”
    • Orphan Ports are single attached ports that use a vPC VLAN
    • vPC VLANs are any VLANs allowed on the Peer Link
    • show vpc orphan-ports

vPC Peer Link Failure

  • vPC Secondary Pings Primary over Peer Keepalive
    • If vPC Primary responds
      • Disables vPC member ports on secondary
      • Disables SVIs on Secondary
      • Goal is to force end host to forward via primary
    • if vPC Primary is dead
      • Promote vPC Secondary to Operational Primary
      • Continue to forward traffic on new Primary
  • Peer Keepalive and Peer Link must not share fate, in order to prevent split brain
    • e.g. separate MGMT switch, separate port-channels on separate line cards

vPC Auto Recovery

  • Certain Failures can result in neither vPC Peers forwarding
  • Power Outage with node failure problem case
    • Power outage on both Peers
    • Only one Peer is restored
    • vPC Peer Keepalive never comes up
    • Means vPC Peer Link can never come up
    • Means vPC Member Ports can never come up
    • vPC Members are isolated
  • vPC Auto Recovery allows a single Peer to promote itself to Primary
    • If Peer Link does not initialize before auto recovery timeout, promote myself to primary and bring up Member ports
  • Gradual Failure problem case
    • vPC Peer Link goes down
    • vPC Secondary pings vPC Primary and gets response
    • vPC Secondary Disables vPC Member Ports
    • vPC Primary completely fails
    • vPC Secondary does not re-activate Member Ports
    • Member ports are isolated
  • vPC Auto Recovery Allows Secondary to detect this over Peer Keepalive
    • vPC Primary is continually tracked over vPC Peer Keepalive
    • Peer Keepalive failure at a later time results in Secondary promoting itself to Primary
    • Secondary re-activates its Member Ports

The default timeout for vPC Auto Recovery is 240s. After 240s, if the other peer is still not detected or reachable, it brings up the vPC member ports/the vPC (in both cases stated above).

vPC Orphan Port Failures Problem case:

  • vPC Primary And Secondary are Default Gateways for vPC VLAN hosts
  • Orphan Port exists on Secondary
  • vPC Peer Link fails, but Primary remains up
  • Secondary Pings Primary, gets a response
    • Disables vPC Member Ports
    • Disables SVIs
  • Orphan is now isolated from its Default Gateway

The above is only a problem if the orphan is connected to the secondary when the peer link fails. However, it is hard to predict which peer will be primary or secondary at any given time, since the roles depend on previous failures, upgrades, etc.
Fixes to the above problem:

  • Dual home the orphaned port
  • Single attached hosts connect to a single access switch, and then dual home the access switch to the vPC peer.
  • Single attached ports could use non-vPC VLANs
    • Port only counts as orphan if it is using a vPC VLAN
    • Non-vPC VLANs require additional east/west trunk between vPC Peers
  • Don’t disable SVI when Peer Link fails on Secondary
    • enable under vpc domain config: “dual-active exclude interface-vlan”

Problem Case:

  • Active/Standby Failover Device connects via orphan ports on to vPC Peers
  • Active Device Connects to vPC (operational) Secondary 
  • vPC Peer Link Fails, but primary remains up
  • Secondary Pings Primary, gets response
    • Disables vPC Member Ports
    • Disables SVIs
    • Active device sees port as still up/up and does not failover
    • Active device is now isolated from its default-gateway, and potentially L2

Fixes to the above problem:

  • Run Active/Active
  • Dual home each device to vPC Peers
  • Force the active device to failover to the vPC Primary
    • interface level “vpc orphan-port suspend”

Problem Case:

  • Peer Link and northbound routing links share same linecard
  • Peer Keepalive does not fate share with peerlink
  • vPC Primary linecard fails
    • Peer Link is Lost
    • Northbound routing lost
  • Secondary pings primary, gets a response
    • Disables vPC Member Ports
    • Disables SVIs
  • Layer 2 traffic is collected via primary, but cannot route to WAN
  • Servers Isolated

Fixes to the above problem:

  • Keepalive, peerlink, and routing links do not share fate
  • Enhanced object tracking on primary
    • If Peer link && WAN == Down, failover to secondary
  • vPC Self-Isolation
    • If peerlink && WAN == Down, tell secondary over keepalive
    • Available in nx-os 7.2 and later