Saku Ytti

Author Archives: Saku Ytti

Junos and DHCP relay

There are two different ways to configure DHCP in Junos, bootp helper and dhcp relay. These work in very different manner, bootp helper is being phased out and is not supported for example in QFX10k. Behaviour of bootp helper is obvious, it works like it works in every other sensible platform. Behaviour of dhcp-relay is very confusing and it's not documented at all anywhere.

If it's possible in your platform to configure bootp helper, do it. If not, complain to Junos about dhcp-relay implementation and ask them to fix it. The main problem with dhcp-relay implementation is that once you've configured it, you're punting all dhcp traffic in all interfaces. Normal transit traffic crossing your router is subject to this punt, so transit customers will experience larger jitter and delay of packets being punted and almost certainly reordering, because the non-dhcp packet that came after but was not subject to punt will be forwarded first. Technically reordering does not matter, as long as it does not happen inside a flow, but it's not desirable.

How the sequence of operation works in Junos for dhcp-relay:

  1. Transit packet touches ingress NPU
  2. After L2 lookup, before L3 lookup ingress NPU punts the transit Continue reading

Quick look at Trio ddos-protection with flow-detection

Some things are easy to protect with iACL and lo0 ACL but others are really hard, like BGP, you need to allow BGP from customers and from core, and it's not convenient or practical to handle them separately in lo0 ACL + policer. Luckily JunOS has feature called flow-detection, you turn it on with set system ddos-protection global flow-detection

I'm sending DoS from single source to lo0, my iBGP goes immediately down. After I turn on flow-detection iBGP connectivity is restored. Looking at PFE, we can see what is happening:

MX104-ABB-0(test13nqa1-re0.dk vty)# show ddos scfd asic-flows pfe idx rindex prot aggr IIF/IFD pkts bytes source-info --- ---- ------ ---- ---- ------- ------- -------- ---------- 0 0 721 1400 sub 338 21 79161 c158ef22 c158ef1f 53571 179 0 1 2679 1400 sub 356 11159404 2187242988 64640102 c158ef1f 179 179 0 2 2015 1400 sub 338 29 112468 c158ef23 c158ef1f 179 65020

Pretty nice and clear, 64.64.01.02 => c1.58.ef.1f is our attack traffic and it's getting its own policer, iBGP is stable, attack traffic is policed separately. Let's check those policers more closely:

MX104-ABB-0(test13nqa1-re0.dk vty)# show ddos scfd asic-flow-rindex 0 2679 PFE: 0 Flow Continue reading

Tourist trip to MX fabric

Tourist, because it's mostly original research so quality may be dubious.

You can infer lot about the fabric by looking at 'show hsl2 ...' commands. Let's start.

NPC0(test13nqe1-re1.dk vty)# show hsl2 asic mqchip(0) serdes MQCHIP(0) serdes table : MQCHIP(0)-Avago 65NM-0 [0xf300000]: 24 links 0 - 23 MQCHIP(0)-Avago 65NM-1 [0xf304000]: 24 links 24 - 47 MQCHIP(0)-Avago 65NM-2 [0xf308000]: 8 links 48 - 55 MQCHIP(0)-Avago 65NM-3 [0xf309000]: 8 links 56 - 63 MQCHIP(0)-Avago 65NM-4 [0xf30a000]: 8 links 64 - 71 MQCHIP(0)-Avago 65NM-5 [0xf30b000]: 8 links 72 - 79 MQCHIP(0)-Avago 65NM-6 [0xf30c000]: 8 links 80 - 87 MQCHIP(0)-Avago 65NM-7 [0xf30d000]: 8 links 88 - 95 MQCHIP(0)-Avago 65NM-8 [0xf30e000]: 8 links 96 - 103 MQCHIP(0)-Avago 65NM-9 [0xf30f000]: 8 links 104 - 111 MQCHIP(0)-Avago 65NM-10 [0xf310000]: 8 links 112 - 119 MQCHIP(0)-Avago 65NM-11 [0xf311000]: 8 links 120 - 127 MQCHIP(0)-Avago 65NM-12 [0xf312000]: 8 links 128 - 135 MQCHIP(0)-Avago 65NM-13 [0xf313000]: 8 links 136 - 143 MQCHIP(0)-Avago 65NM-14 [0xf318000]: 2 links 144 - 145 MQCHIP(0)-Avago 65NM-15 [0xf31a000]: 2 links 146 - 147

Avago is well known manufacturer of SerDes (SERialization / DESerialization), 65NM probably means Avago's 65nm lithography line of products. SerDes presentation here is unidirectional. But that is still quite large number of SerDes Continue reading

Capture your fancy, part two, Trio

Like with 7600/PFC3, it is possible to capture transit traffic on Juniper Trio (MPC, MX80, MX104, FPC5 etc). First decide what you know about the packet and convert that data to hex, it can be pretty much anywhere in the packet in the first 320B or so.

[[email protected] ~]% pry [1] pry(main)> '194.100.7.227'.split('.').map{|e|"%02x" % [e.to_i]}.join => "c26407e3" [2] pry(main)> '91.198.120.24'.split('.').map{|e|"%02x" % [e.to_i]}.join => "5bc67818"

I'm using boringly IPv4 addresses but I could have used anything. Unlike in PFC3 you do not need tell the location in the packet where the pattern must occur, you just tell pattern and any packet having that pattern anywhere is triggered, let's try it:

[email protected]> start shell pfe network tfeb0 TFEB platform (1000Mhz MPC 8544 processor, 1024MB memory, 512KB flash) TAZ-TBB-0(mec-pe1-re0.hel.fi vty)# test jnh 0 packet-via-dmem enable TAZ-TBB-0(mec-pe1-re0.hel.fi vty)# test jnh 0 packet-via-dmem capture 0x3 5bc67818c26407e3 TAZ-TBB-0(mec-pe1-re0.hel.fi vty)# test jnh 0 packet-via-dmem dump Received 116 byte parcel: Dispatch cookie: 0x0074000000000000 0x00 0x08 0x80 0xf0 0x80 0x08 0x5c 0x5e 0xab 0x0b 0x6e 0x60 0xb0 0xa8 0x6e 0x7c 0x60 0x52 0x88 0x47 Continue reading

Capture your fancy, part one, PFC3

It's often incredibly useful to be able to capture transit traffic, it's quick way to prove that you're actually receiving some frames and with any luck have good idea how and where you are sending them. It's unfortunately common, especially in 7600/6500 PFC3 to have bug where packets are not going where software FIB suggests they are. Luckily there is quite good tooling to inspect what really is happening. So we're taking a peek at 'ELAM'.

We have traffic coming in unlabeled to 7600 and going out labeled. Let's see how to capture it

psl2-pe2.hel.fi#show platform capture elam asic superman slot 5 psl2-pe2.hel.fi#show platform capture elam trigger dbus ipv4 help SEQ_NUM [5] QOS [3] QOS_TYPE [1] TYPE [4] STATUS_BPDU [1] IPO [1] NO_ESTBLS [1] RBH [3] CR [1] TRUSTED [1] NOTIFY_IL [1] NOTIFY_NL [1] DISABLE_NL [1] DISABLE_IL [1] DONT_FWD [1] INDEX_DIRECT [1] DONT_LEARN [1] COND_LEARN [1] BUNDLE_BYPASS [1] QOS_TIC [1] INBAND [1] IGNORE_QOSO [1] IGNORE_QOSI [1] IGNORE_ACLO [1] IGNORE_ACLI [1] PORT_QOS [1] CACHE_CNTRL [2] VLAN [12] SRC_FLOOD [1] SRC_INDEX [19] LEN [16] FORMAT [2] MPLS_EXP [3] REC [1] NO_STATS [1] VPN_INDEX [10] PACKET_TYPE [3] L3_PROTOCOL [4] L3_PT [8] MPLS_TTL [8] SRC_XTAG [4] DEST_XTAG [4] FF [1] Continue reading

JunOS and ARP Glean

I'm using Cisco vocabulary 'glean' here as I don't know better word for it. Glean is any IPv4 packet which is going to connected host which is not resolved. It is NOT an ARP packet, so ARP policers won't help you. They are punted, since you need to generate ARP packet and try to resolve them.

In 7600 we can use 'mls rate-limit unicast cef glean 200 50' to limit how many packets per second are punted to control-plane for glean purposes. How can we limit this in JunOS? As far as I can see, there is no way. But I remember testing this attack and was unable to break MX80, so why didn't it break?

First let's check what does connected network look like

[email protected]> show route forwarding-table destination 62.236.255.179/32 table default Routing table: default.inet Internet: Destination Type RtRef Next hop Type Index NhRef Netif 62.236.255.0/24 intf 0 rslv 828 1 xe-0/0/0.42

Ok, fair enough. Type 'rslv', which we can guess means packet is punted to control-plane for resolving ARP. Let's try to ping some address rapidly which does not resolve and check what it looks like

[email protected]> show Continue reading

JunOS ‘L3 incompletes’, what and why?

There is quite often chatter about L3 incompletes, and it seems there are lot of opinions what they are. Maybe some of these opinions are based on some particular counter bug in some release. Juniper has introduced also toggle to allow stopping the counter from working. It seems very silly to use this toggle, as it is really one of the few ways you can gather information about broken packets via SNMP.

What they (at least) are not

  • Unknown unicast
  • CDP
  • BPDU
  • Packet from connected host which does not ARP
  • Packet from unconfigured VLAN

What they (at least) are

  • IP header checksum error
  • IP header error (impossibly small IHL, IP version 3, etc)
  • IP header size does not match packet size

Troubleshooting

So if you are seeing them, what can you do? As it is aggregate counter for many different issues, how do you actually know which one is it and is there way to figure out who is sending them? Luckily for Trio based platforms answers and highly encouraging, we have very good tools to troubleshoot the issue.

To figure out what they exactly are, first you need to figure out your internal IFD index (not snmp ifindex)

im@ruuter> Continue reading

Why you should want metered INET?

When people think about metered, they may think about mobile roaming or old outrageous per minute PSTN billing. Those are not fair prices, they are not what I'm talking about.

Also INET should be always on, billing should take this into consideration, maybe once you exceed your paid capacity, your connection is policed to 256kbps unless you pay for more. You could get notice when this limit is nearing by SMS and Email.

Flat-rate billing is based on assumption that on average INET is not used much at all, in such scenario it works. Consumers get flat-rate stove-gas in Helsinki, because its use is almost non-existing. But services like Youtube and Netflix which are relatively new can alone be 2/3 of all your traffic, meaning what ever average use you planned for, it's not true, average use is increasing as more services users care for appear.


1. Quality

When you pay flat rate there is financial incentive for your operator not to provide you bits, every bit not provided improves your margins. Operators today regularly keep some ports congested, because it would be expensive to upgrade, instead they try get someone else to pay for it, if they have the Continue reading

Sorry state of JunOS control plane protection

I've been looking into how to protect MX80 11.4R5 from various accidental and intentional attempts to congest control plane and I'm drawing pretty much blank.

Main discoveries so far.

  1. ISIS always leaked to control plane, even when no 'family iso' or 'protocol isis' on interface
  2. PVST always leaked to control plane. Even when just 'family inet' configured to interface
  3. LLDP protocol not matched by ddos-protection feature
  4. Essentially impossible to protect against attack from eBGP
  5. ddos-protection feature mis-dimensioned

ISIS

This is pretty bad for anyone running ISIS, as you cannot use ddos-protection to limit ISIS, as it won't distinguish between bad and good ISIS. If you don't use ISIS, just set ddos-protection limit low and you're good to go.

ISIS is punted with different code than IP packets, but resolving the punt path it goes to the same path. This path is still seeing full wire rate, i.e. there isn't magic 10kpps limit before it

HCFPC2(le_ruuter vty)# show jnh 0 exceptions control pkt punt via nh PUNT(34) 9134818 1065269880 HCFPC2(le_ruuter vty)# show jnh 0 exceptions nh 34 punt Nexthop Chain: CallNH:desc_ptr:0xc02bbc, mode=0, rst_stk=0x0, count=0x3 0xc02bb8 0 : 0x127fffffe00003f0 0xc02bb9 1 : 0x2ffffffe07924a00 0xc02bba 2 : 0xda00601499000a04 0xc02bbb 3 : Continue reading

We don’t understand hashes

At least I don't, nor do I understand math. It only this week dawned to me, that we consistently choose wrong hash for password hashing.

When I started using Linux DES was the standard way to hash your passwd, then it was MD5, now at least Ubuntu is using SHA. And I can bet that in 2 years time SHA-3 (will be selected this year) will be used widely for protecting passwords.

But what were design goals for MD5 and SHA? Design goals are obviously avoidance of collisions and more importantly algorithmic cheapness in terms of computational requirements and ability to implement it cheaply and easily in hardware requiring no branching. So MD5 and SHA are _by design_ simple to brute-force in hardware, even the new SHA-3. You don't want your 'git commit' or 'sha3sum /dev/cdrom' to take days, you want it to be very very fast and very very unlikely to represent any other data.

It should be quite obvious that those requirements are orthogonal to the requirements of password hash. Avoidance of collisions is not critical to password hash and absolutely opposite requirement of computational expensiveness exists for password hash, it needs to be very expensive and it Continue reading

Silver bullet for home QoS

Rationale

I mentioned in one of the posts about how prioritizing small packets upstream is almost the proverbial silver bullet when it comes to QoS at home. I'm sure any ADSL user who uses interactive applications, such as SSH have noticed how laggy the SSH gets when you upload something from home, say your holiday pictures with scp to your web server. Also download is quite slow during upload. VoIP and online gaming will suffer too. Canonical solution is to use DSCP markings at sender end or DSCP mark based on IP address or port.

But I feel that is unnecessarily complex for typical home use scenario, since all of the important/interactive stuff are using small packets and the bandwidth hogging applications are all essentially doing MTU size packets. I've chosen <200B as small packet, which is arbitrary decision I did about decade ago when setting this up first time, I'm sure it could just as well be like 1300B. So without further rambling, I'll give IOS (ISR) and JunOS (SRX) examples how to roll this on your CPE.

IOS example

class-map match-any SMALL-PACKETS match packet length max 200 ! policy-map WAN-OUT class SMALL-PACKETS priority percent 75 class class-default fair-queue random-detect ! interface ATM0.100 point-to-point pvc 0/100 vbr-nrt 2000 2000 tx-ring-limit 3 service-policy output WAN-OUT ! !

JunOS example

[email protected]> show configuration interfaces vlan unit 0 family inet filter input FROM_LAN; [email protected]> show configuration firewall family inet filter FROM_LAN term small_packets { from { packet-length 0-200; } then { Continue reading

LLDP / 802.1AB-2009 blows

If you're designing L2 discovery protocol, I suppose one of your mandatory requirements is, that you can 'machine walk' the network, after you find one box. I.e. you are able to know your neighbor devices and their ports. LLDP makes no such guarantees

You have 4 mandatory TLVs, [0123], End of LLDPDU, Chassis ID, Port ID and TTL. Chassis ID has 7 subtypes which implementation is free to choose, EntPhysicalAlias (two distinct cases), IfAlias, MAC address, networkAddress, ifName or locally assigned. Port ID also has 7 subtypes which implementation is free to choose, ifAlias, entPhysicalAlias, MAC address, networkAddress, ifName, agent circuit ID, locally assigned.

Now you can send what ever trash via locally assigned and be fully compliant implementation. It seems that it would be wise to mandate sending management address (networkAddress) in ChassisID and SNMP ifindex in PortID (and any _additional_ ones you may want to send, i.e. more than 1, which is not allowed). This way you'd immediately know what OID to query and from which node. Obviously this makes assumption that we have IP address always and SNMP implementation always. If we absolutely must support some corner cases where this is not true, we should Continue reading

Future residential INET users, I’m so sorry

I never believed IPv6 will be NAT free, but as idealist I hoped there is good chance there will be mostly only 1:1 NAT and each and every connection will get own routable network, /56 or so, residential DSL, mobile data, everything

Unfortunately that ship has sailed, it's almost certain majority of residential/non-business products will only contain single directly connected network, since we (as a community, I don't want to put all the blame to IPv6 kooks) failed produce feasible technical way to do it and spent too much time arguing on irrelevant matters. I'm reviewing two ways to provide INET access on DSL, no PPPoX, as it's not done in my corner of the world, and show why it's not practical to provide the end customer routable network

Statically configure per customer interface

At DSLAM (or other access device) customer would be placed in unique virtual-circuit (Q, QinQ...) all would terminated on unique L3 logical interface in PE router. Interface would have static /64 ipv6 address and ipv6/56 network routed to say ::c/64. IPv4 could continue to be shared subnet via 'unnumbered' interface.

This is by far my favorite way of doing residential IPv6 it, it supports customer Continue reading

junos vrf-import funnies

Consider this configuration:

> show configuration routing-instances VRF1 instance-type vrf; route-distinguisher 42:1; vrf-import [ VRF1-IMPORT VRF-DEFAULT-IMPORT ]; vrf-export [ VRF1-EXPORT VRF-DEFAULT-EXPORT ]; vrf-table-label; > show configuration policy-options policy-statement VRF1-IMPORT from community [ VRF1 VRF2 ]; > show configuration policy-options policy-statement VRF-DEFAULT-IMPORT term cust_routes { from protocol bgp; then default-action accept; } > show configuration policy-options community VRF1 members target:42:1; > show configuration policy-options community VRF2 members target:42:2;

If you configure this on any router on your network, it'll work, VRF will import correct and only correct routes. This will give you assumption, that VRF import in JunOS works like this:

  1. start with empty array of routes to evaluate policy against
  2. when you hit 'match community' push matching routes from bgp.l3vpn.0 to the list
  3. evaluate rules normally against the list

If you create multiple of these to single router, and you only have single 'from community [ X ]' in each, it also works perfectly. However, if you have more than one community in 'from community' AND you have more than one VRF using the 'VRF-DEFAULT-IMPORT' things go wrong. If we have three routes:

  1. 10.10.1.0/24 RT:42:1
  2. 10.10.2.0/24 RT:42:1 RT:42:2 RT:42:3
  3. 10.10.3. Continue reading

no usage scenario for ssh-agent forwarding

Many people, especially those in consulting business have need to access multiple different organization 'jump boxes' from which they can ssh towards the organization servers. And due to security it makes sense to have different ssh key being allowed for different organization servers. For convenience people often allow ssh-agent towards the 'jump boxes'.

Problem with ssh-agent is, that it has no idea who is requesting the key signing, it could very well be organization1 evil admin asking for organization2 key, when sshing into organization2 jump-box, and your agent would simply allow this.

One solution to the problem could be that when ever signing is requested, user gets prompt 'localhost < organization2-jump < organization2 requests sign of organization1 identity, allow yes/no, [ ] always'. Now you'd have idea if sign request is legit or not. However this would require protocol changes to ssh, as ssh-agent has no idea who is requesting signing much less of the full path, which would be absolutely needed to make this feature work.

So I asked openssh dev mailing list, how this problem should be solved. Turns out there is recently added feature in openssh, which could potentially remove need for agent forwarding completely, to access organization1-server through organization1-jump you'd do ssh -oProxyCommand='ssh -W %h:%p organization1-jump' organization1-server, now obviously this is inconvenient, especially if there are more than 1 box through which you need to jump. .ssh/config can help somewhat:

# cat >> ~/.ssh/config Host org1-ultimate ProxyCommand ssh -W %h:%p org1-secondjump Host org1-secondjump ProxyCommand ssh -W %h:%p org1-firstjump ^d

Now you'd ssh 'ssh Continue reading

Playing hide and seek with JunOS

JunOS has some commands which either are unsupported, do not work in platform you're using, undocumented or unnecessary for vast majority of operators, these commands are hidden in the UI so they are only accessible if you know what (and more importantly why) you want (them).

Today I was searching for a way to quiet my SRX210HE-POE as it makes annoyingly lot noise, I failed to find configuration way to force it to normal spinning speed, but I did notice that CLI exposes hidden commands. I've actually found same in IOS several years back and wrote little perl script to search for them (exec only), it proved bad idea as several of them purposely crash your system. If you want to dig deeper, in IOS difference is incomplete and invalid command, however actually some commands are truly hidden in IOS, particular example is the toggle for unsupported transceivers.

Neither the JunOS nor IOS issue are something you can blame vendor at, vendor isn't trying to stop you from using them, they just want to be very clear that if you use them TAC ain't go your back.

The code is quick 2h hack (running it takes longer, but I'm certain Continue reading

When should you advertise default route?

Never

There are two typical scenarios when people carry default route in dynamic routing protocol, I'll address these separately and explain why you shouldn't do it, and what you should do instead.

CE (eBGP) PE

This is probably the most common scenario, maybe you're giving your customer default route, maybe it's your own firewall or really any situation where neighbor won't carry full routing table and neighbor isn't strictly same administrative domain.

Problem with default route here is, that if your PE gets disconnected from core, you're still originating the default route and CE is unaware of this and you're blackholing customer traffic until BGP is manually shutdown. You could conditionally advertise default, but that is just useless overhead, instead of default you should advertise to CE any aggregate route which is originated from multiple core boxes, such as your PA aggregate, or really any stable route originated from multiple places, but not local PE.

Customer would just add this to their router:

# ios ip route 0.0.0.0 0.0.0.0 192.0.2.0 name floating_default # junos route 0.0.0.0/0 { qualified-next-hop 192.0.2.0 { interface xe-0/0/0.0; } resolve; Continue reading

IPv6 ACL bypass

IPv6 designers recognized that IPv4 header has several faults, these were addressed to a different degree. Particularly annoying was IPv4 options which caused TCP/UDP/ICMP data to shift, as it made IPv4 header length variable. IPv6 header is fixed length, there is 'next-header' option, which will instruct how to parse data after IP header. Typically 'next-header' would be TCP, UDP or ICMP, and rest of packet would be exactly like in IPv4 (apart from mandatory checksum in UDP).

Where the complexity (some might say design fault) is that 'next-header' could be any large number of more exotic extension header, each of which have 'next-header' field themselves. Standard does not specify any limitation how many headers you could have, so you need to be able to parse packet up-to MTU length. The final extension header typically would contain TCP/UDP/ICMP and normal IPv4 style packet would follow.

Unfortunately no practical router has MTU wide view to the packet, you have 64B, 128B or 256B view, after which you are completely unaware of the packet content, it's just bits in memory which you cannot process in any meaningful way. Your PC won't have same problem, it does not have specialized hardware to quickly forward Continue reading

eBGP triggered blackhole for customers

Very many large scale transit providers, if not most of them support eBGP remote triggered blackhole via separate multihop eBGP session. I suspect this is, because they've used for very long time single shared route-map for transit customers, and it is not immediately obvious how you can support blackholing without customer specific route-map. Requiring customer specific route-map would probably be less than minor change in their provisioning systems. However, it is perfectly doable and same idea works just the same in JunOS and IOS, here is pseudoIOShy example how to do it:

router bgp N neighbor eBGP peer-group neighbor eBGP route-map eBGP-IN in neihgbor eBGP disable-connected-check neighbor CUSTIP peer-group eBGP neighbor CUSTIP prefix-list C-CUSTID-IN in ! route-map eBGP-IN permit 100 match community BLACKHOLE set ip next-hop BLACKHOLE set community BLACKHOLE additive route-map eBGP-IN permit 200 match ip address prefix-list eBGP-TRANSIT-FULL set community full-transit additive route-map eBGP-IN permit 300 match ip address prefix-list eBGP-TRANSIT-PARTIAL set comunity partial-transit additive route-map eBGP-IN permit 400 set ip address prefix-list eBGP-PUNCHOLE set community no-export additive ! ip prefix-list C-CUSTID-IN permit 192.0.2.0/24 le 32 ip prefix-list C-CUSTID-IN permit 10.10.42.0/28 le 32 ip prefix-list eBGP-TRANSIT-FULL permit 192.0.2.0/24 ip prefix-list eBGP-PUNCHOLE Continue reading

IPv6 and the enterprise of tomorrow(ish)

One of the great promises of IPv6 has been to get rid of NAT, no more will IT do RFC1918 and NAPT to single public IP. But how is IPv6 going to accomplish this, what is the magical toggle for it? Let's get disappointed.

Some devices, like Cisco IOS allow you to configure IPv6 prefix as 'macro', so you could tell that macro 'ME' is 2001:db8::/32 and everywhere where you write IPv6 address, you use macro 'ME'instead. So in theory, when your prefix changes, you simply change the macro. So the great renumbering benefit is ability to always get same size network. But of course this was true for IPv4 too, you got the network size you needed. Why isn't this utilized? Because enterprises don't have one Cisco IOS devices, they have plethora of devices from different vendors, firewalls, slb, ips, ids, servers, OSS systems and so forth, you'd still need to go in all of these to change the 'macro', not all devices even have the concept and quite frankly no enterprise of non-trivial size will even know without months of work _where_ and _what_ will need to be changed for renumbering to be successful. I know industry professionals Continue reading