We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences. We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring — or sometimes failing to even fetch — robots.txt files.
The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.
We received complaints from customers who had both disallowed Perplexity crawling activity in their robots.txt files and also created WAF rules to specifically block both of Perplexity’s declared crawlers: PerplexityBot and Perplexity-User.
PlanetScale published a great article describing the high-level principles of how storage devices work and covering everything from tape drives to SSDs and network-attached storage — a must-read for anyone even remotely interested in how their data is stored.
I am a former high school teacher with a passion for networking and programming, especially […]
The post Python Scripts – From Classroom to Community - 1 first appeared on Brezular's Blog.
I am a former high school teacher with a passion for networking and programming, especially […]
The post From Classroom to Community 1 - Sort and Game first appeared on Brezular's Blog.
Earlier this year, a group of external researchers identified and reported a vulnerability in Cloudflare’s SSL for SaaS v1 (Managed CNAME) product offering through Cloudflare’s bug bounty program. We officially deprecated SSL for SaaS v1 in 2021; however, some customers received extensions for extenuating circumstances that prevented them from migrating to SSL for SaaS v2 (Cloudflare for SaaS). We have continually worked with the remaining customers to migrate them onto Cloudflare for SaaS over the past four years and have successfully migrated the vast majority of these customers. For most of our customers, there is no action required; for the very small number of SaaS v1 customers, we will be actively working to help migrate you to SSL for SaaS v2 (Cloudflare for SaaS).
Back in 2017, Cloudflare announced SSL for SaaS, a product that allows SaaS providers to extend the benefits of Cloudflare security and performance to their end customers. Using a “Managed CNAME” configuration, providers could bring their customer’s domain onto Cloudflare. In the first version of SSL for SaaS (v1), the traffic for Custom Hostnames is proxied to the origin based on the IP addresses assigned to the […]
Kubernetes networking is deceptively simple on the surface, until it breaks, silently leaks data, or opens the door to a full-cluster compromise. As modern workloads become more distributed and ephemeral, traditional logging and metrics just can’t keep up with the complexity of cloud-native traffic flows.
That’s where Calico Whisker comes in. Whisker is a lightweight Kubernetes-native observability tool created by Tigera. It offers deep insights into real-time traffic flow patterns, without requiring you to deploy heavyweight service meshes or packet sniffers. And here’s something you won’t get anywhere else: Whisker is data plane-agnostic. Whether you run the Calico eBPF data plane, nftables, or iptables, you’ll get the same high-fidelity flow logs with consistent fields, format, and visibility. You don’t have to change your data plane; Whisker fits right in and shows you the truth, everywhere.
Let’s walk through 5 network issues Whisker helps you catch early, before they turn into outages or security incidents.
Traditional observability tools often show whether a packet was forwarded, accepted or dropped, but not why. They lack visibility into which Kubernetes network policy was responsible or if one was even applied.
With Whisker, each network flow is paired with:
Linux 6.11+ kernels provide TCX attachment points for eBPF programs to efficiently examine packets as they ingress and egress the host. The latest version of the open source Host sFlow agent includes support for TCX packet sampling to stream industry-standard sFlow telemetry to a central collector for network-wide visibility. For example, Deploy real-time network dashboards using Docker compose describes how to quickly set up a Prometheus database and use Grafana to build network dashboards.
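static __always_inline void sample_packet(struct __sk_buff *skb, __u8 direction) {
    /* Look up the sampling rate configured for this interface. */
    __u32 key = skb->ifindex;
    __u32 *rate = bpf_map_lookup_elem(&sampling, &key);

    /* Skip interfaces without a map entry; otherwise keep roughly 1 in *rate packets. */
    if (!rate || (*rate > 0 && bpf_get_prandom_u32() % *rate != 0))
        return;

    /* Fill in the metadata that accompanies the sampled header. */
    struct packet_event_t pkt = {};
    pkt.timestamp = bpf_ktime_get_ns();
    pkt.ifindex = skb->ifindex;
    pkt.sampling_rate = *rate;
    pkt.ingress_ifindex = skb->ingress_ifindex;
    /* The route lookup is only performed for ingress packets (direction == 0). */
    pkt.routed_ifindex = direction ? 0 : get_route(skb);
    pkt.pkt_len = skb->len;
    pkt.direction = direction;

    /* Copy at most MAX_PKT_HDR_LEN bytes of the packet header. */
    __u32 hdr_len = skb->len < MAX_PKT_HDR_LEN ? skb->len : MAX_PKT_HDR_LEN;
    if (hdr_len > 0 && bpf_skb_load_bytes(skb, 0, pkt.hdr, hdr_len) < 0)
        return;

    /* Hand the sample to user space through the perf event buffer. */
    bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &pkt, sizeof(pkt));
}

SEC("tcx/ingress")
int tcx_ingress(struct __sk_buff *skb) {
    sample_packet(skb, 0);  /* 0 = ingress */
    return TCX_NEXT;        /* let the packet continue to the next program */
}

SEC("tcx/egress")
int tcx_egress(struct __sk_buff *skb) {
    sample_packet(skb, 1);  /* 1 = egress */
    return TCX_NEXT;
}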
The sample.bpf.c file […]
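The random 1-in-N test in the listing above can be tried outside the kernel. The following is a minimal user-space sketch, not part of the Host sFlow agent: should_sample() and estimate_packets() are hypothetical helper names, and the random value is passed in as a parameter in place of bpf_get_prandom_u32(). It also illustrates why the sampling rate travels with each sample: a collector can multiply the number of samples by the rate to estimate the total packet count.

```c
#include <stdint.h>

/* User-space stand-in for the sampling test in sample_packet().
 * rate:         configured 1-in-N sampling rate for the interface
 * random_value: plays the role of bpf_get_prandom_u32()
 * Returns 1 if the packet would be sampled, 0 if it would be skipped. */
static int should_sample(uint32_t rate, uint32_t random_value) {
    if (rate > 0 && random_value % rate != 0)
        return 0;
    /* As in the listing, a rate of 0 falls through and the packet is kept. */
    return 1;
}

/* Hypothetical collector-side helper: scale a sample count back up
 * to an estimate of the number of packets that crossed the interface. */
static uint64_t estimate_packets(uint64_t samples, uint32_t rate) {
    return samples * (uint64_t)rate;
}
```

With a rate of 400, for instance, 250 received samples would correspond to an estimated 100,000 packets. The estimate is statistical, so its accuracy improves as the number of samples grows.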
Is an LLM a stubborn donkey, a genie, or a slot machine (and why)? Find out in the Who is LLM? article by Martin Fowler.