Filippo Valsorda

Author Archives: Filippo Valsorda

NCC Group’s Cryptography Services audits our Go TLS 1.3 stack

The Cloudflare TLS 1.3 beta is run by a Go implementation of the protocol based on the Go standard library, crypto/tls. Starting from that excellent Go codebase allowed us to quickly start experimenting, to be the first wide server deployment of the protocol, and to effectively track the changes to the specification draft.

Of course, the security of a TLS implementation is critical, so we engaged NCC Group's Cryptography Services to perform an audit at the end of 2016.

You can find the codebase on the Cloudflare GitHub. It's a drop-in replacement for crypto/tls and comes with a go wrapper to patch the standard library as needed.

The code is developed in the open but is currently targeted only at internal use: the repository is frequently rebased and the API is not guaranteed to be stable or fully documented. You can take a sneak peek at the API here.

The final goal is to upstream the patches to the Go project so that all users of the Go standard library benefit from it. You can follow the process here.

Below we republish the article about the audit first appeared on the NCC Group's blog.

NCC Group's Cryptography Services Complete Continue reading

TLS 1.3 explained by the Cloudflare Crypto Team at 33c3

Nick Sullivan and I gave a talk about TLS 1.3 at 33c3, the latest Chaos Communication Congress. The congress, attended by more that 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years.

You can watch the recording below, or download it in multiple formats and languages on the CCC website.

The talk introduces TLS 1.3 and explains how it works in technical detail, why it is faster and more secure, and touches on its history and current status.

The slide deck is also online.

This was an expanded and updated version of the internal talk previously transcribed on this blog.

TLS 1.3 hits Chrome and Firefox Stable

In related news, TLS 1.3 is reaching a percentage of Chrome and Firefox users this week, so websites with the Cloudflare TLS 1.3 beta enabled will load faster and more securely for all those new users.

The last few days

You can enable the TLS 1.3 beta from the Crypto section of your control panel.

TLS 1.3 toggle

So you want to expose Go on the Internet

This piece was originally written for the Gopher Academy advent series. We are grateful to them for allowing us to republish it here.

Back when crypto/tls was slow and net/http young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That's not necessary anymore!

At Cloudflare we recently experimented with exposing pure Go services to the hostile wide area network. With the Go 1.8 release, net/http and crypto/tls proved to be stable, performant and flexible.

However, the defaults are tuned for local services. In this articles we'll see how to tune and harden a Go server for Internet exposure.

`crypto/tls`

You're not running an insecure HTTP server on the Internet in 2016. So you need crypto/tls. The good news is that it's now really fast (as you've seen in a previous advent article), and its security track record so far is excellent.

The default settings resemble the Intermediate recommended configuration of the Mozilla guidelines. However, you should still set PreferServerCipherSuites to ensure safer and faster cipher suites are preferred, and CurvePreferences to avoid unoptimized curves: a client using CurveP384 would cause up to a second of CPU to be consumed on our Continue reading

Dyn issues affecting joint customers

Today there is an ongoing, large scale Denial-of-Service attack directed against Dyn DNS. While Cloudflare services are operating normally, if you are using both Cloudflare and Dyn services, your website may be affected.

Specifically, if you are using CNAME records which point to a zone hosted on Dyn, our DNS queries directed to Dyn might fail making your website unavailable, and presenting a “1001” error message.

Some popular services that might rely on Dyn for part of their operations include GitHub Pages, Heroku, Shopify and AWS.

1001 error

As a possible workaround, you might be able to update your Cloudflare DNS records from CNAMEs (referring to Dyn hosted records) to A/AAAA records specifying the origin IP of your website. This will allow Cloudflare to reach your origin without the need for an external DNS lookup.

Note that if you use different origin IP addresses, for example based on the geographical location, you may lose some of that functionality by using plain A/AAAA records. We recommend that you provide addresses for many of your different locations, so that load will be shared amongst them.

Customers with a CNAME setup (which means Cloudflare is not configured in your domain NS records) where the main Continue reading

TLS nonce-nse

One of the base principles of cryptography is that you can't just encrypt multiple messages with the same key. At the very least, what will happen is that two messages that have identical plaintext will also have identical ciphertext, which is a dangerous leak. (This is similar to why you can't encrypt blocks with ECB.)

One Does Not Simply

If you think about it, a pure encryption function is just like any other pure computer function: deterministic. Given the same set of inputs (key and message) it will always return the same output (the encrypted message). And we don't want an attacker to be able to tell that two encrypted messages came from the same plaintext.

Same inputs, same output

The solution is the use of IVs (Initialization Vectors) or nonces (numbers used once). These are byte strings that are different for each encrypted message. They are the source of non-determinism that is needed to make duplicates indistinguishable. They are usually not secret, and distributed prepended to the ciphertext since they are necessary for decryption.

The distinction between IVs and nonces is controversial and not binary. Different encryption schemes require different properties to be secure: some just need them to never repeat, in which case we commonly Continue reading

An overview of TLS 1.3 and Q&A

The CloudFlare London office hosts weekly internal Tech Talks (with free lunch picked by the speaker). My recent one was an explanation of the latest version of TLS, 1.3, how it works and why it's faster and safer.

You can watch the complete talk below or just read my summarized transcript.

The Q&A session is open! Send us your questions about TLS 1.3 at [email protected] or leave them in the Disqus comments below and I'll answer them in an upcoming blog post.

Summarized transcript

TLS 1.2 ECDHE

To understand why TLS 1.3 is awesome, we need to take a step back and look at how TLS 1.2 works. In particular we will look at modern TLS 1.2, the kind that a recent browser would use when connecting to the CloudFlare edge.

TLS 1.2 ECDHE exchange

The client starts by sending a message called the ClientHello that essentially says "hey, I want to speak TLS 1.2, with one of these cipher suites".

The server receives that and answers with a ServerHello that says "sure, let's speak TLS 1.2, and I pick this cipher suite".

Along with that the server sends its key share. The Continue reading

The complete guide to Go net/http timeouts

When writing an HTTP server or client in Go, timeouts are amongst the easiest and most subtle things to get wrong: there’s many to choose from, and a mistake can have no consequences for a long time, until the network glitches and the process hangs.

HTTP is a complex multi-stage protocol, so there's no one-size fits all solution to timeouts. Think about a streaming endpoint versus a JSON API versus a Comet endpoint. Indeed, the defaults are often not what you want.

In this post I’ll take apart the various stages you might need to apply a timeout to, and look at the different ways to do it, on both the Server and the Client side.

SetDeadline

First, you need to know about the network primitive that Go exposes to implement timeouts: Deadlines.

Exposed by net.Conn with the Set[Read|Write]Deadline(time.Time) methods, Deadlines are an absolute time which when reached makes all I/O operations fail with a timeout error.

Deadlines are not timeouts. Once set they stay in force forever (or until the next call to SetDeadline), no matter if and how the connection is used in the meantime. So to build a timeout with SetDeadline you'll have to Continue reading

Yet Another Padding Oracle in OpenSSL CBC Ciphersuites

Yesterday a new vulnerability has been announced in OpenSSL/LibreSSL. A padding oracle in CBC mode decryption, to be precise. Just like Lucky13. Actually, it’s in the code that fixes Lucky13.

It was found by Juraj Somorovsky using a tool he developed called TLS-Attacker. Like in the “old days”, it has no name except CVE-2016-2107. (I call it LuckyNegative20¹)

It’s a wonderful example of a padding oracle in constant time code, so we’ll dive deep into it. But first, two quick background paragraphs. If you already know all about Lucky13 and how it's mitigated in OpenSSL jump to "Off by 20" for the hot and new.

If, before reading, you want to check that your server is safe, you can do it with this one-click online test.

TLS, CBC, and Mac-then-Encrypt

Very long story short, the CBC cipher suites in TLS have a design flaw: they first compute the HMAC of the plaintext, then encrypt plaintext || HMAC || padding || padding length using CBC mode. The receiving end is then left with the uncomfortable task of decrypting the message and checking HMAC and padding without revealing the padding length in any way. If they do, we call Continue reading

Building the simplest Go static analysis tool

Go native vendoring (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a vendor folder in your project. The compiler will then look there before searching the GOPATH.

The only annoyance compared to using a per-project GOPATH, which is what we used to do, is that you might forget to vendor a package that you have in your GOPATH. The program will build for you, but it won't for anyone else. Back to the WFM times!

I decided I wanted something, a tool, to check that all my (non-stdlib) dependencies were vendored.

At first I thought of using go list, which Dave Cheney appropriately called a swiss army knife, but while it can show the entire recursive dependency tree (format .Deps), there's no way to know from the templating engine if a dependency is in the standard library.

We could just pass each output back into go list to check for .Standard, but I thought this would be a good occasion to build a very simple static analysis tool. Go's simplicity and libraries make it a very easy task, as you will see.

First, loading the program

We use golang.org/x/tools/go/loader to Continue reading

Go coverage with external tests

The Go test coverage implementation is quite ingenious: when asked to, the Go compiler will preprocess the source so that when each code portion is executed a bit is set in a coverage bitmap. This is integrated in the go test tool: go test -cover enables it and -coverprofile= allows you to write a profile to then inspect with go tool cover.

This makes it very easy to get unit test coverage, but there's no simple way to get coverage data for tests that you run against the main version of your program, like end-to-end tests.

The proper fix would involve adding -cover preprocessing support to go build, and exposing the coverage profile maybe as a runtime/pprof.Profile, but as of Go 1.6 there’s no such support. Here instead is a hack we've been using for a while in the test suite of RRDNS, our custom Go DNS server.

We create a dummy test that executes main(), we put it behind a build tag, compile a binary with go test -c -cover and then run only that test instead of running the regular binary.

Here's what the rrdns_test.go file looks like:

// +build  Continue reading

Creative foot-shooting with Go RWMutex

Hi, I'm Filippo and today I managed to surprise myself! (And not in a good way.)

I'm developing a new module ("filter" as we call them) for RRDNS, CloudFlare's Go DNS server. It's a rewrite of the authoritative module, the one that adds the IP addresses to DNS answers.

It has a table of CloudFlare IPs that looks like this:

type IPMap struct {  
    sync.RWMutex
    M map[string][]net.IP
}

It's a global filter attribute:

type V2Filter struct {  
    name       string
    IPTable    *IPMap
    // [...]
}

Mexican Standoff CC-BY-NC-ND image by Martin SoulStealer

The table changes often, so a background goroutine periodically reloads it from our distributed key-value store, acquires the lock (f.IPTable.Lock()), updates it and releases the lock (f.IPTable.Unlock()). This happens every 5 minutes.

Everything worked in tests, including multiple and concurrent requests.

Today we deployed to an off-production test machine and everything worked. For a few minutes. Then RRDNS stopped answering queries for the beta domains served by the new code.

What. That worked on my laptop™.

Here's the IPTable consumer function. You can probably spot the bug.

func (f *V2Filter) getCFAddr(...) (result []dns.RR) {  
    f. Continue reading

DNS parser, meet Go fuzzer

Here at CloudFlare we are heavy users of the github.com/miekg/dns Go DNS library and we make sure to contribute to its development as much as possible. Therefore when Dmitry Vyukov published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear.

Hot Fuzz

Fuzzing is the technique of testing software by continuously feeding it inputs that are automatically mutated. For C/C++, the wildly successful afl-fuzz tool by Michał Zalewski uses instrumented source coverage to judge which mutations pushed the program into new paths, eventually hitting many rarely-tested branches.

go-fuzz applies the same technique to Go programs, instrumenting the source by rewriting it (like godebug does). An interesting difference between afl-fuzz and go-fuzz is that the former normally operates on file inputs to unmodified programs, while the latter asks you to write a Go function and passes inputs to that. The former usually forks a new process for each input, the latter keeps calling the function without restarting often.

There is no strong technical reason for this difference (and indeed afl recently gained the ability to behave like go-fuzz), but it's likely due to the different ecosystems in which they Continue reading

A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected

Last week ISC published a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.

CC BY 2.0 image by Ralph Aversen

The public summary tells us that a mistake in handling of queries for the TKEY type causes an assertion to fail, which in turn crashes the server. Since the assertion happens during the query parsing, there is no way to avoid it: it's the first thing that happens on receiving a packet, before any decision is made about what to do with it.

TKEY queries are used in the context of TSIG, a protocol DNS servers can use to authenticate to each other. They are special in that unlike normal DNS queries they include a “meta” record (of type TKEY) in the EXTRA/ADDITIONAL section of the message.

CC BY 2.0 image by Ralph Aversen

Since the exploit packet is now public, I thought we might take a dive and look at the vulnerable code. Let's start by taking a look at the output of a crashing instance:

03-Aug-2015 16:38:55.509 message.c:2352: REQUIRE(*name == ((void*)0)) failed, back trace  
03-Aug-2015 16:38:55.510 #0 0x10001510d in  Continue reading

Quick and dirty annotations for Go stack traces

CloudFlare’s DNS server, RRDNS, is entirely written in Go and typically runs tens of thousands goroutines. Since goroutines are cheap and Go I/O is blocking we run one goroutine per file descriptor we listen on and queue new packets for processing.

CC BY-SA 2.0 image by wiredforlego

When there are thousands of goroutines running, debug output quickly becomes difficult to interpret. For example, last week I was tracking down a problem with a file descriptor and wanted to know what its listening goroutine was doing. With 40k stack traces, good luck figuring out which one is having trouble.

Go stack traces include parameter values, but most Go types are (or are implemented as) pointers, so what you will see passed to the goroutine function is just a meaningless memory address.

We have a couple options to make sense of the addresses: get a heap dump at the same time as the stack trace and cross-reference the pointers, or have a debug endpoint that prints a goroutine/pointer -> IP map. Neither are seamless.

Underscore to the rescue

However, we know that integers are shown in traces, so what we did is first convert IPv4 addresses to their uint32 Continue reading

Setting Go variables from the outside

CloudFlare's DNS server, RRDNS, is written in Go and the DNS team used to generate a file called version.go in our Makefile. version.go looked something like this:

// THIS FILE IS AUTOGENERATED BY THE MAKEFILE. DO NOT EDIT.

// +build    make

package version

var (  
    Version   = "2015.6.2-6-gfd7e2d1-dev"
    BuildTime = "2015-06-16-0431 UTC"
)

and was used to embed version information in RRDNS. It was built inside the Makefile using sed and git describe from a template file. It worked, but was pretty ugly.

Today we noticed that another Go team at CloudFlare, the Data team, had a much smarter way to bake version numbers into binaries using the -X linker option.

The -X Go linker option, which you can set with -ldflags, sets the value of a string variable in the Go program being linked. You use it like this: -X main.version 1.0.0.

A simple example: let's say you have this source file saved as hello.go.

package main

import "fmt"

var who = "World"

func main() {  
    fmt.Printf("Hello, %s.n", who)
}

Then you can use go run (or other build commands like go build or go install Continue reading

Go has a debugger—and it’s awesome!

Something that often, uh... bugs¹ Go developers is the lack of a proper debugger. Sure, builds are ridiculously fast and easy, and println(hex.Dump(b)) is your friend, but sometimes it would be nice to just set a breakpoint and step through that endless if chain or print a bunch of values without recompiling ten times.

CC BY 2.0 image by Carl Milner

You could try to use some dirty gdb hacks that will work if you built your binary with a certain linker and ran it on some architectures when the moon was in a waxing crescent phase, but let's be honest, it isn't an enjoyable experience.

Well, worry no more! godebug is here!

godebug is an awesome cross-platform debugger created by the Mailgun team. You can read their introduction for some under-the-hood details, but here's the cool bit: instead of wrestling with half a dozen different ptrace interfaces that would not be portable, godebug rewrites your source code and injects function calls like godebug.Line on every line, godebug.Declare at every variable declaration, and godebug.SetTrace for breakpoints (i.e. wherever you type _ = "breakpoint").

I find this solution brilliant. What you get out Continue reading

Logjam: the latest TLS vulnerability explained

log-jam

Yesterday, a group from INRIA, Microsoft Research, Johns Hopkins, the University of Michigan, and the University of Pennsylvania published a deep analysis of the Diffie-Hellman algorithm as used in TLS and other protocols. This analysis included a novel downgrade attack against the TLS protocol itself called Logjam, which exploits EXPORT cryptography (just like FREAK).

First, let me start by saying that CloudFlare customers are not and were never affected. We don’t support non-EC Diffie-Hellman ciphersuites on either the client or origin side. We also won't touch EXPORT-grade cryptography with a 20ft stick.

But why are CloudFlare customers safe, and how does Logjam work anyway?

The image is "Logjam" as interpreted by @0xabad1dea.

Diffie-Hellman and TLS

This is a detailed technical introduction to how DH works and how it’s used in TLS—if you already know this and want to read about the attack, skip to “Enter export crypto, enter Logjam” below. If, instead, you are not interested in the nuts and bolts and want to know who’s at risk, skip to “So, what’s affected?”

To start a TLS connection, the two sides—client (the browser) and server (CloudFlare)—need to agree securely on a secret key. This process is called Continue reading

Help us test our DNSSEC implementation

For an introduction to DNSSEC, see our previous post

Today is a big day for CloudFlare! We are publishing our first two DNSSEC signed zones for the community to analyze and give feedback on:

www.cloudflare-dnssec-auth.com - a fully signed zone managed by CloudFlare
www.cloudflare-dnssec-cname.com - an external zone linking to a signed record with a CNAME

We've been testing our implementation internally for some time with great results, so we now want to know from outside users how it’s working!

Here’s an example of what you should see if you pull the records of, for example, www.cloudflare-dnssec-auth.com.

$ dig www.cloudflare-dnssec-auth.com A +dnssec

; <<>> DiG 9.10.1-P1 <<>> www.cloudflare-dnssec-auth.com A +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29654
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;www.cloudflare-dnssec-auth.com.    IN  A

;; ANSWER SECTION:
www.cloudflare-dnssec-auth.com.    300 IN  A   104.28.29.67  
www.cloudflare-dnssec-auth.com.    300 IN  A   104.28.28.67  
www.cloudflare-dnssec-auth.com.    300 IN  RRSIG   A  Continue reading