Hey, it's HighScalability time:
In the time it took you to read that sentence, NASA could have collected 1.73 gigabytes of data from its roughly 100 currently active missions. The collection never stops, and the rate is growing exponentially, so managing this data is an uphill task. But the data NASA collects is precious, and its significance to NASA's science and research is immense. NASA works hard to make this data as approachable and accessible as possible: for daily operations, for predictions about the universe, and for human well-being through its innovation and creativity.
In version 2.0 of its "Open Government Plan," released in 2012, NASA discussed, though did not go deeply into, the work it had been doing on "Big Data," acknowledging that it had much more to explore in this field.
We all know what big data is and what it's used for, so there's no need to define it yet again. Let's move on to the topic at hand.
NASA’s Big Data Challenge
Well, not exactly Fishin', but I'll be on a month-long vacation starting today. I won't be posting (much) new content, so we'll all have a break. Disappointing, I know. Please use this time for quiet contemplation and other inappropriate activities. Au revoir!
With Serverless, hiring less experienced developers can work out better than hiring experienced cloud developers. That's an interesting point I hadn't heard before, and it was made by Paul Johnston, CTO of movivo, in The ServerlessCast #6 - Event-Driven Design Thinking.
The thought process goes something like this...
An experienced cloud developer will probably think procedurally, in terms of transactional systems, frameworks, and big fat containers that do lots of work.
That's not how a Serverless developer needs to think. A Serverless developer needs to think in terms of small functions that do one thing linked together by events; and they need to grok asynchronous and distributed thinking.
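The "small functions linked together by events" idea can be sketched in a few lines. This is a minimal, hypothetical illustration (not code from the podcast): a toy in-process event bus stands in for a managed eventing service such as SNS/SQS or EventBridge, and each handler does exactly one thing before emitting the next event.

```python
import json

# Toy event bus standing in for a managed eventing service.
class Bus:
    def __init__(self):
        self.handlers = {}
        self.log = []  # record of every event fired, in order

    def on(self, name, fn):
        self.handlers.setdefault(name, []).append(fn)

    def emit(self, name, payload):
        self.log.append(name)
        for fn in self.handlers.get(name, []):
            fn(payload, self.emit)

def handle_upload(event, emit):
    """Do one thing: validate the raw event, then hand off via a new event."""
    record = json.loads(event["body"])
    emit("upload.validated", {"user": record["user"], "size": record["size"]})

def handle_validated(event, emit):
    """Do one thing: persist (here, pretend to), then signal completion."""
    emit("upload.stored", {"user": event["user"]})

bus = Bus()
bus.on("upload.received", handle_upload)
bus.on("upload.validated", handle_validated)
bus.emit("upload.received", {"body": json.dumps({"user": "u1", "size": 42})})
print(bus.log)  # ['upload.received', 'upload.validated', 'upload.stored']
```

Notice there is no central orchestrator: the system's behavior emerges from which handlers subscribe to which events, which is exactly the distributed, asynchronous thinking the post describes.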
So the idea is you don't need typical developer skills. Paul finds people with sysadmin skills have the right stuff. Someone with a sysadmin background is more likely than a framework developer to understand the distributed thinking that goes with building an entire system of events.
Paul also makes the point that once a system has been built, experienced developers will get bored, because Serverless systems don't require the same amount of maintenance.
For example, they had good success hiring a person with two years of vo-tech on-the-job training because they didn't have Continue reading
Data is the new currency. A phrase we’ve heard frequently in the wake of the story of Unroll.me selling user data to Uber.
Two keys to that story:
In both cases prevention requires user awareness. How do we get user awareness? Force meaningful disclosure. How do we force meaningful disclosure? Here’s an odd thought: use the tax system.
If data is the new currency then why isn’t exchanging data for use of a service a barter transaction? If a doctor exchanges medical services for chickens, for example, that is a taxable event at fair market value. It's a barter arrangement. A free service that sells user data is similarly bartering the service for data, otherwise said service would not be offered.
How would it work?
Service providers send out 1099-Bs to users for the fair market value of the service. Fair market value could be determined using a comparable for-pay service, or as a percentage of the income generated from the data being sold.
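The two valuation methods above can be made concrete with some back-of-the-envelope arithmetic. This is a hypothetical sketch with invented figures, not anything from the original post or actual IRS guidance:

```python
def fmv_comparable(paid_service_monthly_price, months):
    """Fair market value method 1: price of a comparable paid service."""
    return paid_service_monthly_price * months

def fmv_revenue_share(data_sale_income_per_user, share):
    """Fair market value method 2: a share of the income the provider
    earned selling this user's data."""
    return data_sale_income_per_user * share

# A year of a "free" service comparable to a hypothetical $5/month paid one:
print(fmv_comparable(5.00, 12))       # 60.0
# Or 50% of a hypothetical $30 the provider earned selling this user's data:
print(fmv_revenue_share(30.00, 0.5))  # 15.0
```

Under either method, the number that lands on the 1099-B is small per user, but the disclosure itself is the point: users would see, in dollars, what their data was worth.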
The IRS treats barter transactions as income received. Users would need to pay income Continue reading
Many of you may have already heard about the high performance of the Tarantool DBMS, about its rich toolset and certain features. Say, it has a really cool on-disk storage engine called Vinyl, and it knows how to work with JSON documents. However, most articles out there tend to overlook one crucial thing: usually, Tarantool is regarded simply as storage, whereas its killer feature is the possibility of writing code inside it, which makes working with your data extremely effective. If you’d like to know how igorcoding and I built a system almost entirely inside Tarantool, read on.
If you’ve ever used the Mail.Ru email service, you probably know that it allows collecting emails from other accounts. If the OAuth protocol is supported, we don’t need to ask a user for third-party service credentials to do that — we can use OAuth tokens instead. Besides, Mail.Ru Group has lots of projects that require authorization via third-party services and need users’ OAuth tokens to work with certain applications. That’s why we decided to build a service for storing and updating tokens.
I guess everybody knows what an OAuth token looks like. To refresh your memory, it’s a structure consisting of 3–4 fields:
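As a hedged illustration (the post's exact field list isn't shown here), a typical OAuth token record looks something like the following. Field names follow the common OAuth 2.0 convention and the values are fake:

```python
# A typical OAuth token record; values are fake.
token = {
    "access_token": "ya29.fake-access-token",  # short-lived credential
    "refresh_token": "1--fake-refresh-token",  # used to obtain new access tokens
    "expires_at": 1700000000,                  # unix timestamp of expiry
    "token_type": "Bearer",                    # the optional fourth field
}

def is_expired(token, now):
    """A token store's most basic job: know when to refresh."""
    return now >= token["expires_at"]

print(is_expired(token, 1700000001))  # True
```

The refresh logic around `expires_at` is what a token-updating service like the one described here has to run continuously, for every stored token.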
This is a guest repost by G Gordon Worley III, Head of Site Reliability Engineering at AdStage.
When I joined AdStage in the Fall of 2013 we were already running on Heroku. It was the obvious choice: super easy to get started with, less expensive than full-sized virtual servers, and flexible enough to grow with our business. And grow we did. Heroku let us focus exclusively on building a compelling product without the distraction of managing infrastructure, so by late 2015 we were running thousands of dynos (containers) simultaneously to keep up with our customers.
We needed all those dynos because, on the backend, we look a lot like Segment, and like them many of our costs scale linearly with the number of users. At $25/dyno/month, our growth projections put us breaking $1 million in annual infrastructure expenses by mid-2016 when factored in with other technical costs, and that made up such a large proportion of COGS that it would take years to reach profitability. The situation was, to be frank, unsustainable. The engineering team met to discuss our options, and some quick calculations showed us we were paying more than $10,000 a month for the convenience of Continue reading
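A quick sanity check on those numbers: at $25/dyno/month, a dyno count in the low thousands does push annual spend toward $1 million. The exact count below is an assumption for illustration; the post only says "thousands":

```python
DYNO_PRICE = 25   # dollars per dyno per month (from the post)
dynos = 3000      # hypothetical count; the post says "thousands"

monthly = dynos * DYNO_PRICE
annual = monthly * 12
print(monthly)  # 75000
print(annual)   # 900000
```

So even at the conservative end of "thousands," infrastructure alone approaches the $1 million annual figure the post cites, which is why the linear cost-per-user scaling was so alarming.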
This is a guest post by Apurva Davé, who is part of the product team at Sysdig.
Having worked with hundreds of customers on building a monitoring stack for their containerized environments, we've learned a thing or two about what works and what doesn't. The outcomes might surprise you, including the observation that instrumentation is just as important as the application when it comes to monitoring.
In this post, I wanted to cover some details around what it takes to build a scale-out, highly reliable monitoring system to work across tens of thousands of containers. I’ll share a bit about what our infrastructure looks like, the design choices we made, and tradeoffs. The five areas I’ll cover:
Instrumenting the system
Relating your data to your applications, hosts, and containers
Leveraging orchestrators
Deciding what data to store
How to enable troubleshooting in containerized environments
For context, Sysdig is the container monitoring company. We’re based on the open source Linux troubleshooting project by the same name. The open source project allows you to see every single system call down to process, arguments, payload, and connection on a single host. The commercial offering turns all this data into thousands of Continue reading