In part one, we described our Analytics data ingestion pipeline, with BigQuery sitting as our data warehouse. However, having our analytics events in BigQuery is not enough. Most importantly, data needs to be served to our end-users.
In this article, we will detail:
- Why we chose Redshift to store our data marts,
- How it fits into our serving layer,
- Key learnings and optimization tips to make the most out of it,
- Orchestration workflows,
- How our data visualization apps (Chartio, web apps) benefit from this data.
Data is in BigQuery, now what?
Design Of A Modern Cache—Part Deux
This is a guest post by Benjamin Manes, who did engineery things for Google and is now doing engineery things as CTO of Vector.
The previous article described the caching algorithms used by Caffeine, in particular the eviction and concurrency models. Since then we’ve made improvements to the eviction algorithm and explored a new approach towards expiration.
Eviction Policy
Window TinyLFU (W-TinyLFU) splits the policy into three parts: an admission window, a frequency filter, and the main region. By using a compact popularity sketch, the historic frequencies are cheap to retain and look up. This allows for quickly discarding new arrivals that are unlikely to be used again, guarding the main region from cache pollution. The admission window provides a small region for recency bursts to avoid consecutive misses when an item is building up its popularity.
This structure works surprisingly well for many important workloads like database, search, and analytics. These cases are frequency-biased where a small admission window is desirable to filter aggressively...
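To make the frequency filter idea more concrete, here is a minimal sketch in Python. It is not Caffeine's implementation: the class `FrequencySketch`, the function `admit`, and the `width`/`depth` parameters are hypothetical names chosen for illustration, and the sketch omits the admission window and any aging of counters. It only shows how a count-min style sketch can estimate popularity in fixed space, and how a new arrival might be compared against the eviction victim before it is allowed into the main region.

```python
import hashlib


class FrequencySketch:
    """Tiny count-min sketch: approximate per-key frequency in fixed space."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # One hash per row, derived by salting the key with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def record(self, key):
        """Count one access to the key."""
        for row, col in self._indexes(key):
            self.rows[row][col] += 1

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available estimate.
        return min(self.rows[row][col] for row, col in self._indexes(key))


def admit(sketch, candidate, victim):
    """Admit the candidate only if it looks more popular than the victim."""
    return sketch.estimate(candidate) > sketch.estimate(victim)


sketch = FrequencySketch()
for _ in range(5):
    sketch.record("hot-key")
sketch.record("one-hit-wonder")

# The rarely used arrival does not displace the popular entry.
print(admit(sketch, "one-hit-wonder", "hot-key"))
```

Because the counters live in a fixed-size table rather than alongside cache entries, the history of keys that are no longer cached is kept at no extra cost, which is what lets the filter reject one-off arrivals cheaply.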
Stuff The Internet Says On Scalability For February 22nd, 2019
Wake up! It's HighScalability time:
Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 39 mostly 5 star reviews. They'll learn a lot and love you forever.
- 2%: of sales spent by consumer packaged goods companies on R&D (14% for tech); 272 million: metric tons of plastic are produced each year around the globe; 100+ fps: Google's Edge TPU; 6,000: bugs per million lines of code; 2.2 GB/sec: SIMD JSON parser; 20-30%: fall in DRAM prices; 8x: Russian hackers faster than North Korean hackers; 50%: EV car sales in China by 2025;
- Quotable Quotes:
- @davygreenberg: If I do a job in 30 minutes it’s because I spent 10 years learning how to do that in 30 minutes. You owe me for the years, not the minutes.
- @PaulDJohnston: Lambda done badly is still better than Kubernetes done well
- Ross Mcilroy: we now believe that speculative vulnerabilities on today's hardware defeat all language-enforced confidentiality with no known Continue reading
Sponsored Post: Software Buyers Council, InMemory.Net, Triplebyte, Etleap, Stream, Scalyr
Who's Hiring?
- Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.
- Need excellent people? Advertise your job here!
Fun and Informative Events
- Join Etleap, an Amazon Redshift ETL tool to learn the latest trends in designing a modern analytics infrastructure. Learn what has changed in the analytics landscape and how to avoid the major pitfalls which can hinder your organization from growth. Watch a demo and learn how Etleap can save you on engineering hours and decrease your time to value for your Amazon Redshift analytics projects. Register for the webinar today.
- Advertise your event here!
Cool Products and Services
- Shape the future of software in your industry. The Software Buyers Council is a panel of engineers and managers who want to share expert knowledge, contribute to improvement of software, and help startups in their industry. Receive occasional invitations to chat for 30 minutes about your area of expertise and software usage. No obligations, no marketing emails or sales calls. Upcoming topics include infrastructure and application monitoring, AI/ML platforms, and more. Learn Continue reading
Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections
Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we introduce Redis Cluster sharding, discuss its advantages and limitations, explain when you should deploy it, and show how to connect to your Redis Cluster.
Sharding with Redis Cluster
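As a rough illustration of how keys end up on different nodes: Redis Cluster maps every key to one of 16384 hash slots by taking the CRC16 of the key modulo 16384, and each node owns a subset of those slots. If the key contains a non-empty `{...}` section (a hash tag), only the text inside the first such section is hashed, so related keys can be forced into the same slot. The sketch below assumes Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 routine; the function name `key_slot` is hypothetical and not part of any Redis client.

```python
import binascii

SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot a key would map to."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Use the hash tag only when there is at least one byte between the braces.
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


# Keys sharing a hash tag land in the same slot, so multi-key
# operations on them can be served by a single node.
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

In practice a cluster-aware client performs this calculation for you and routes each command to the node that owns the slot.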
Stuff The Internet Says On Scalability For February 15th, 2019
Wake up! It's HighScalability time:
Opportunity crossed over the rainbow bridge after 15 years of loyal service. "Our beloved Opportunity remains silent."
Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 39 mostly 5 star reviews. They'll learn a lot and love you forever.
- 200 million: per day YouTube videos recommended on home page; $9.3 billion: 27% increase in AI funding; 70%: Microsoft security bugs are memory safety issues; 11: new version of Perl; 24%: serverless users are new to cloud computing; 1 million: SpaceX satellite uplinks; $500K: ticket to mars; $13 billion: Google's new datacenter construction; 59%: increase in Tesla Autosteer accidents; $.30: reddit per user revenue; 38%: Airbnb bugs preventable by using types; 60K: data breaches reported since GDPR; 350: theoretical max rock stone skips;
- Quotable Quotes:
- @gchaslot: Brian's hyper-engagement slowly biases YouTube: 1/ People who spend their lives on YT affect recommendations more 2/ So the content they watch gets more views 3/ Continue reading
Stuff The Internet Says On Scalability For February 8th, 2019
Wake up! It's HighScalability time:
Change is always changing. What will the next 5 years look like?
Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 35 mostly 5 star reviews. They'll learn a lot and love you forever.
- 16,000: Chrome bugs found with ClusterFuzz; $2,000,000: for Apple iOS remote jailbreak; $1 million: think twice when profiting from a bug; 0: clicks to over the air exploitation of Marvell Avastar Wi-Fi; $300: cost for a bounty hunter to track your phone's location; 321M: Twitter MAUs; 3: years of falling smartphone shipments; 50%: new development uses microservices; 8 inches: big difference in cell phone radiation; ...
- Quotable Quotes:
- @pczarkowski: As I keep telling people, if you have a kubernetes strategy you've already failed. Kubernetes should be an implementation detail at the tactical level to deal with the strategic imperative of solving the problems that are halting the flow of money.
- EFF: EU countries that do not have zero rating practices enjoyed a double digit drop Continue reading
Sponsored Post: Software Buyers Council, InMemory.Net, Triplebyte, Etleap, Stream, Scalyr
Who's Hiring?
- Triplebyte lets exceptional software engineers skip screening steps at hundreds of top tech companies like Apple, Dropbox, Mixpanel, and Instacart. Make your job search O(1), not O(n). Apply here.
- Need excellent people? Advertise your job here!
Fun and Informative Events
- Advertise your event here!
Cool Products and Services
- Shape the future of software in your industry. The Software Buyers Council is a panel of engineers and managers who want to share expert knowledge, contribute to improvement of software, and help startups in their industry. Receive occasional invitations to chat for 30 minutes about your area of expertise and software usage. No obligations, no marketing emails or sales calls. Upcoming topics include infrastructure and application monitoring, AI/ML platforms, and more. Learn more and join today.
- InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net
- For heads of IT/Engineering responsible for building an analytics infrastructure, Etleap is an Continue reading
Stuff The Internet Says On Scalability For February 1st, 2019
Wake up! It's HighScalability time:
Memory module for the Apollo Guidance Computer (Mike Stewart). The AGC weighed 70 pounds and had 2048 words of RAM in erasable core memory and 36,864 words of ROM in core rope memory. It flew to the moon.
Do you like this sort of Stuff? Please go to Patreon and do what comes natural. Need cloud? Stand under Explain the Cloud Like I'm 10 (35 nearly 5 star reviews).
- $10.9B: Apple's Q1 services revenue; 5 million: homes on Airbnb; 2.5-5%: base64 gzipped files close to original; 60MB/s: Dropbox per Kafka broker throughput limit; 12%: Microsoft's increased revenues; 9: new datasets; 900 million: installed iPhones; $5.7B: 2018 game investment; way down: chip growth;
- Quotable Quotes:
- Daniel Lemire: Most importantly, I claim that most people do not care whether they work on important problems or not. My experience is that more than half of researchers are not even trying to produce something useful. They are trying to publish, to get jobs and promotions, to secure grants and so forth, but advancing science is a secondary concern.
- @da_667: The moral of Continue reading
A Hybrid Cloud Approach from FraudGuard.io that Handles 50M Requests a Day
This is a guest post from Ryan Averill at FraudGuard.io.
At FraudGuard.io we are a team of just a few developers, all working with our customers to make their applications as safe as possible. We have been working on FraudGuard for about 3 years and we’ve had paying customers for more than 2 years now. The main idea behind FraudGuard is for us to get attacked so you don’t have to. In other words, we reduce the overall number of attacks your application receives each day by leveraging our threat data. We do this by taking attack data from our network of honeypots and sharing it with you directly via API. Instead of just running services like Maxmind, which update occasionally, we run the entire process in house so we can immediately share real-time attack data from around the world....