Jesse Kipp

Author Archives: Jesse Kipp

Workers AI Update: Hello Mistral 7B

This post is also available in Deutsch.

Workers AI Update: Hello Mistral 7B

Today we’re excited to announce that we’ve added the Mistral-7B-v0.1-instruct to Workers AI. Mistral 7B is a 7.3 billion parameter language model with a number of unique advantages. With some help from the founders of Mistral AI, we’ll look at some of the highlights of the Mistral 7B model, and use the opportunity to dive deeper into “attention” and its variations such as multi-query attention and grouped-query attention.

Mistral 7B tl;dr:

Mistral 7B is a 7.3 billion parameter model that puts up impressive numbers on benchmarks. The model:

  • Outperforms Llama 2 13B on all benchmarks
  • Outperforms Llama 1 34B on many benchmarks,
  • Approaches CodeLlama 7B performance on code, while remaining good at English tasks, and
  • The chat fine-tuned version we’ve deployed outperforms Llama 2 13B chat in the benchmarks provided by Mistral.

Here’s an example of using streaming with the REST API:

curl -X POST \
“{account-id}/ai/run/@cf/mistral/mistral-7b-instruct-v0.1” \
-H “Authorization: Bearer {api-token}” \
-H “Content-Type:application/json” \
-d '{ “prompt”: “What is grouped query attention”, “stream”: true }'

API Response: { response: “Grouped query attention is a technique used in natural language processing  (NLP) and machine learning  Continue reading

Streaming and longer context lengths for LLMs on Workers AI

Streaming LLMs and longer context lengths available in Workers AI

Workers AI is our serverless GPU-powered inference platform running on top of Cloudflare’s global network. It provides a growing catalog of off-the-shelf models that run seamlessly with Workers and enable developers to build powerful and scalable AI applications in minutes. We’ve already seen developers doing amazing things with Workers AI, and we can’t wait to see what they do as we continue to expand the platform. To that end, today we’re excited to announce some of our most-requested new features: streaming responses for all Large Language Models (LLMs) on Workers AI, larger context and sequence windows, and a full-precision Llama-2 model variant.

If you’ve used ChatGPT before, then you’re familiar with the benefits of response streaming, where responses flow in token by token. LLMs work internally by generating responses sequentially using a process of repeated inference — the full output of a LLM model is essentially a sequence of hundreds or thousands of individual prediction tasks. For this reason, while it only takes a few milliseconds to generate a single token, generating the full response takes longer, on the order of seconds. The good news is we can start displaying the response as soon as the first tokens are generated, Continue reading

Using the power of Cloudflare’s global network to detect malicious domains using machine learning

Using the power of Cloudflare’s global network to detect malicious domains using machine learning
Using the power of Cloudflare’s global network to detect malicious domains using machine learning

Cloudflare secures outbound Internet traffic for thousands of organizations every day, protecting users, devices, and data from threats like ransomware and phishing. One way we do this is by intelligently classifying what Internet destinations are risky using the domain name system (DNS). DNS is essential to Internet navigation because it enables users to look up addresses using human-friendly names, like For websites, this means translating a domain name into the IP address of the server that can deliver the content for that site.

However, attackers can exploit the DNS system itself, and often use techniques to evade detection and control using domain names that look like random strings. In this blog, we will discuss two techniques threat actors use – DNS tunneling and domain generation algorithms – and explain how Cloudflare uses machine learning to detect them.

Domain Generation Algorithm (DGA)

Most websites don’t change their domain name very often. This is the point after all, having a stable human-friendly name to be able to connect to a resource on the Internet. However, as a side-effect stable domain names become a point of control, allowing network administrators to use restrictions on domain names to enforce policies, for example Continue reading

Area 1 threat indicators now available in Cloudflare Zero Trust

Area 1 threat indicators now available in Cloudflare Zero Trust
Area 1 threat indicators now available in Cloudflare Zero Trust

Over the last several years, both Area 1 and Cloudflare built pipelines for ingesting threat indicator data, for use within our products. During the acquisition process we compared notes, and we discovered that the overlap of indicators between our two respective systems was smaller than we expected. This presented us with an opportunity: as one of our first tasks in bringing the two companies together, we have started bringing Area 1’s threat indicator data into the Cloudflare suite of products. This means that all the products today that use indicator data from Cloudflare’s own pipeline now get the benefit of Area 1’s data, too.

Area 1 threat indicators now available in Cloudflare Zero Trust

Area 1 built a data pipeline focused on identifying new and active phishing threats, which now supplements the Phishing category available today in Gateway. If you have a policy that references this category, you’re already benefiting from this additional threat coverage.

How Cloudflare identifies potential phishing threats

Cloudflare is able to combine the data, procedures and techniques developed independently by both the Cloudflare team and the Area 1 team prior to acquisition. Customers are able to benefit from the work of both teams across the suite of Cloudflare products.

Cloudflare curates a set of data feeds Continue reading