We at Instaclustr recently published a blog post on the most common data modelling mistakes that we see with Cassandra. This post was very popular and led me to think about what advice we could provide on how to approach designing your Cassandra data model so as to come up with a quality design that avoids the traps.
There are a number of good articles around that with rules and patterns to fit your data model into: 6 Step Guide to Apache Cassandra Data Modelling and Data Modelling Recommended Practices.
However, we haven’t found a step by step guide to analysing your data to determine how to fit in these rules and patterns. This white paper is a quick attempt at filling that gap.
This phase has two distinct steps that are both designed to gain a good understanding of the data that you are modelling and the access patterns required.
The first step is to get a good understanding of your data domain. As someone very familiar with relation data modelling, I tend to sketch (or at least think) ER diagrams to understand the entities, their keys and relationships. However, Continue reading
Hey, it's HighScalability time:
Ah, the good old days. This is how the FBI stored finger prints in 1944. (Alex Wellerstein). How much data? Estimates range from 30GB to 2TB.
If you like this sort of Stuff then please support me on Patreon. Also, there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies.
This is a guest post by Russell Sullivan, founder and CTO of Kuhirō.
Serverless is an emerging Infrastructure-as-a-Service solution poised to become an Internet-wide ubiquitous compute platform. In 2014 Amazon Lambda started the Serverless wave and a few years later Serverless has extended to the CDN-Edge and beyond the last mile to mobile, IoT, & storage.
This post examines recent innovations in Serverless at the CDN Edge (SAE). SAE is a sea change, it’s a really big deal, it marks the beginning of moving business logic from a single Cloud-region out to the edges of the Internet, which may eventually penetrate as far as servers running inside cell phone towers. When 5G arrives SAE will be only a few milliseconds away from billions of devices, the Internet will be transformed into a global-scale real-time compute-platform.
The journey of being a founder and then selling a NOSQL company, along the way architecting three different NOSQL data-stores, led me to realize that computation is currently confined to either the data-center or the device: the vast space between the two is largely untapped. So I teamed up with some Continue reading
Hey, it's HighScalability time:
Luscious visualization of a neural network as a large directed graph. It's a full layout of the ResNet-50 training graph, a neural network with ~3 million nodes, and ~10 million edges, using Gephi for the graph layout, to output a 25000x25000 pixel image. (mattfyles)
If you like this sort of Stuff then please support me on Patreon. And take a look at Explain the Cloud Like I'm 10, my new book for complete cloud newbies. Thanks for your support! It means a lot to me.
Hey, it's HighScalability time:
How a mutex works pic.twitter.com/TwFLAVs2yd
— ☠️??Cody??☠️ (@valarauca1) October 21, 2017
Perfect! Now, imagine a little dog snuck under Big Dog's cone of shame and covered the food with its own cone of shame, and it won't leave. That's deadlock. Imagine a stream of little dogs sneaking under Big Dog's cone so Big Dog nevers gets a bite. That's livelock.
This is a guest by Michele Palmia, now @EyeEm, good times @IBM, @UniPd and @UCC.
We’ve now been running computer vision models in production at EyeEm for more than three years - on literally billions of images. As an engineer involved in building the infrastructure behind it from scratch, I both enjoyed and suffered the many technical challenges this task raised. This journey has also taught me a lot about managing processes and relationships with different teams, tasks of an especially challenging nature in a dynamic startup environment.
What follows is an attempt to consolidate the computer vision pipeline history at EyeEm, some of the challenges we had to face, some of the learning we’ve gained, and a glimpse into its future.
What is the cloud? Why is it called a cloud? How does the cloud work? What does it mean when something is 'in the cloud'?
I wrote a new book: Explain the Cloud Like I'm 10, answering those questions for the complete beginner. It makes the perfect gift for Halloween. And Thanksgiving. And Christmas. Oh, and birthdays too!
The irony is, if you read HighScalability, you're not the target audience :-) Explain the Cloud Like I'm 10 is for people who hear about the cloud everyday and have wondered what it is.
Talking with people outside the tech bubble I've found the cloud is still a mystery. I think that's because almost every explanation of the cloud I could find was a rewording of the same unhelpful technobabble.
In Explain the Cloud Like I'm 10 I've used a lot of pictures and a lot of examples. I go slow and easy. I try really hard to build up an intuitive understanding of what the cloud is and how it works.
If you know of anyone who might benefit from a book like this, I'd appreciate it if you'd pass it on.
thanks!
Hey, it's HighScalability time:
Cassini's last image of Saturn, stitched together from 11 color composites, each a stack of three images taken in red, green, and blue channels. (Jason Major)
If you like this sort of Stuff then please support me on Patreon.
This is a guest post by Jake Lumetta, co-founder and CEO of ButterCMS.
ButterCMS lets developers add a content management system to any website in minutes. Our business requires us to deliver near-100% uptime for our API, but after multiple outages that nearly crippled our business, we became obsessed with eliminating single points of failure. In this post, I’ll discuss how we use Fastly’s edge cloud platform and other strategies to make sure we keep our customers’ websites up and running.
At its core, ButterCMS offers:
A dashboard for content editors
A JSON API for fetching content
SDK’s for integrating ButterCMS into native code
Hey, it's HighScalability time:
Tech is transforming how food is being grown. Lots of opportunity for local nerdy production. Greenhouses even look like dartacenters! (This Tiny Country Feeds the World)
If you like this sort of Stuff then please support me on Patreon.
Maybe programming will look something like the above video. Humans and AIs working together to produce software better than either can separately.
The computer as a creative agent, working in tandem with a human partner, to produce software, in a beautiful act of co-creation.
The alternative vision—The Coming Software Apocalypse—is a dead end. Better requirements and better tools have already been tried and found wanting. Requirements are a trap. They don't work. Requirements are no less complex and undiscoverable than code. Tools are another trap. Tools are just code that encode an inflexible solution to a problem that's already been solved.
Admittedly, I'm cheating. I have no idea how any of this will work, but here are the seeds of how it has already started:
Here's what we do know: neither tools or requirements are a silver bullet, they are a method of incrementally improving software quality. Software production quantity is not increased at all.
What we need is a manufacturing process that puts software production on an exponential curve. The only conceivable tool we have at Continue reading
Hey, it's HighScalability time:
LiDAR sees an enchanted world. (Luminar)
If you like this sort of Stuff then please support me on Patreon.
Ripple’s XRP Ledger is a blockchain-based payment network that transfers funds between any type of currency within a few seconds with average transaction costs of a fraction of a penny. The core of this peer-to-peer network is an open source C++ application called rippled. Ripple’s goal is to supplant the world’s existing legacy payment networks. As such, scalability is a continuous goal. This document describes how the rippled team has integrated performance engineering into its development processes, and how this has contributed to throughput gains of over 1000%.
Performance engineering practices deliver benefits in addition to measurable performance gains. These include the ability to report on the capabilities of the software so that users can feel confident that their needs will be met by the system. Performance engineering informs capacity planning and optimal configuration of environments to support the application. Many performance problems are caught and addressed before customers notice them. As process automation improves, each change to the software can be quickly assessed for improvement or regression. This methodology also makes better use of developer time by helping choose the most effective tasks for improving performance. Any software project serious about supporting global scale should integrate performance engineering Continue reading
Hey, it's HighScalability time:
Latency Numbers Every Programmer Should Know plotted over time. Click on and move the slider to see changes. There were a lot more blocks in 1990.
If you like this sort of Stuff then please support me on Patreon.
This is a guest post by Roger Jin, Software Architect at ButterCMS and co-author of Microservices for Startups.
For a profession that stresses the importance of naming things well, we've done ourselves a disservice with microservices. The problem is that that there is nothing inherently "micro" about microservices. Some can be small, but size is relative and there's no standard of unit of measure across organizations. A "small" service at one company might be one million lines of code while far less at another.
Some argue that microservices aren’t a new thing at all and rather a rebranding of Service Oriented Architectures, while others advocate for viewing microservices as an implementation of SOA similar to how Scrum is an implementation of Agile.
How do you align your team when no precise definitions of microservices exist? The most important thing when talking about microservices on a team is to ensure that you are grounded in a common starting point.
But ambiguous definitions don’t help with this. It would be like trying to put Agile into practice without context for what you are trying to achieve, or an understanding of precise methodologies like Scrum.
This is a guest post by Arnaud Granal, CTO at Adcash.
Adcash is a worldwide advertising platform. It belongs to a category called DSP (demand-side platform). A DSP is a platform where anyone can buy traffic from many different adnetworks.
The advertising ecosystem is very fragmented behind the two leaders (Google and Facebook) and DSPs help to solve this fragmentation problem.
If you want to run a campaign across 50 adnetworks, then you can imagine the hassle to do it on each adnetwork (different targetings, minimum to spend, quality issues, etc). What we do, is consolidate the ad inventory of the internet in one place and expose it through a self-service unified interface.
We are a technology provider; if you want to buy native advertisement, if you want to buy popups, if you want to buy banners, then it is your choice. The platform is free to use, we take a % on the success.
A platform like Adcash has to run on a very lean budget, you do not earn big money, you get micro-cents per transaction. It is not unusual to earn less than 0.0001 USD per impression.
Oh, by the way, we have 100 ms Continue reading