How we can transfer salesforce data to hadoop? It is big challenge to everyday users. What are different features of data transfer tools.
Hey, it's HighScalability time:
An interesting question came up on the mechanical-sympathy list about how to best benchmark a stack of different queue (aeron/argona, jctools, dpdk, pony) and transport (aeron, dpdk, seastar) options.
Who better to answer than Gil Tene, Vice President of Technology and CTO, Co-Founder, of Azul Systems? Here's his usual insightful and helpful response:
If you are looking at the set of "stacks" (all of which are queues/transports), I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughout modes.
The tech empower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "peddle to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.
With the tax deadline looming in the US and the future of the gig economy as the engine of scaling startup workforces under fire, there's an important point to consider: In the gig economy the entire social contract is kaput. Here's why.
Everyone who works in the US pays into the Social Security system. The whole idea of Social Security is young people pay in and old people take out.
When you are an employee Social Security taxes are taken directly out of your paycheck. You don't even have to think about it.
When you work in the gig economy you get a 1099-MISC at the end of the year. A 1099 reports payments made by the hiring company during the year and it's sent by the hiring company both to the worker and the IRS.
It's up to the worker to identify their income on their tax return as self employment income, which is subject to a Social Security tax of 15.3%. Most gig workers probably won't declare this income because a lot of them don't even know they are supposed to. My wife, Linda Coleman, a respected Enrolled Agent, says from people she has talked to Continue reading
Hey, it's HighScalability time:
This is a guest repost by Suresh Kondamudi from CleverTap.
Dealing with large datasets is often daunting. With limited computing resources, particularly memory, it can be challenging to perform even basic tasks like counting distinct elements, membership check, filtering duplicate elements, finding minimum, maximum, top-n elements, or set operations like union, intersection, similarity and so on
Probabilistic data structures can come in pretty handy in these cases, in that they dramatically reduce memory requirements, while still providing acceptable accuracy. Moreover, you get time efficiencies, as lookups (and adds) rely on multiple independent hash functions, which can be parallelized. We use structures like Bloom filters, MinHash, Count-min sketch, HyperLogLog extensively to solve a variety of problems. One fairly straightforward example is presented below.
We at CleverTap manage mobile push notifications for our customers, and one of the things we need to guard against is sending multiple notifications to the same user for the same campaign. Push notifications are routed to individual devices/users based on push notification tokens generated by the mobile platforms. Because of their size (anywhere from 32b to 4kb), it’s non-performant for us to index Continue reading
Hey, this is no joke, it's HighScalability time:
This is one of the most interesting build or buy questions of all time: should Apple build their own cloud? Or should Apple concentrate on what they do best and buy cloud services from the likes of Amazon, Microsoft, and Google?
It’s a decision a lot of companies have to make, just a lot bigger, and because it’s Apple, more fraught with an underlying need to make a big deal out of it.
This build or buy question was raised and thoroughly discussed across two episodes of the Exponent podcast, Low Hanging Fruit and Pickaxe Retailers, with hosts Ben Thompson and James Allworth, who regularly talk about business strategy with an emphasis on tech. A great podcast, highly recommended. There’s occasional wit and much wisdom.
We’ve recently added video streaming service to Mail.Ru Cloud. Development started with contemplating the new feature as an all-purpose “Swiss Army knife” that would both play files of any format and work on any device with the Cloud available. Video content uploaded to the Cloud mostly falls into one of the two categories: “movies/series” and “users’ videos”. The latter are the videos that users shoot with their phones and cameras, and these videos are most versatile in terms of formats and codecs. For many reasons, it is often a problem to watch these videos on other end-user devices without prior normalization: a required codec is missing, or the file size is too big to download, or whatever.
In this article, I’ll go into detail to explain how video playback works in Mail.Ru Cloud, and how we made the Cloud player “omnivorous” and ensured support on a maximum number of end-user devices.
This is a guest post by Christophe Limpalair based on an interview (video) he did with Jon Cowie, Staff Operations Engineer and Breaksmith @ Etsy.
Etsy has been a fascinating platform to watch, and study, as they transitioned from a new platform to a stable and well-established e-commerce engine. That shift required a lot of cultural change, but the end result is striking.
In case you haven't seen it already, there's a post from 2012 that outlines their growth and shift. But what has happened since then? Are they still innovating? How are engineering decisions made, and how does this shape their engineering culture? These are questions we explored with Jon Cowie, a Staff Operations Engineer at Etsy, and the author of Customizing Chef, in a new podcast episode.
Uber faced a challenge. They store a lot of trip data. A trip is represented as a 20K blob of JSON. It doesn't sound like much, but at Uber's growth rate saving several KB per trip across hundreds of millions of trips per year would save a lot of space. Even Uber cares about being efficient with disk space, as long as performance doesn't suffer.
This highlights a key difference between linear and hypergrowth. Growing linearly means the storage needs would remain manageable. At hypergrowth Uber calculated when storing raw JSON, 32 TB of storage would last than than 3 years for 1 million trips, less than 1 year for 3 million trips, and less 4 months for 10 million trips.
Uber went about solving their problem in a very measured and methodical fashion: they tested the hell out of it. The goal of all their benchmarking was to find a solution that both yielded a small size and a short time to encode and decode.
The whole experience is described in loving detail in the article: How Uber Engineering Evaluated JSON Encoding and Compression Algorithms to Put the Squeeze on Trip Data. They came up with a matrix of Continue reading
This quote is from Jeff Dean, currently a Wizard, er, Fellow in Google’s Systems Infrastructure Group. It’s taken from his recent talk: Large-Scale Deep Learning for Intelligent Computer Systems.
Since AlphaGo vs Lee Se-dol, the modern version of John Henry’s fatal race against a steam hammer, has captivated the world, as has the generalized fear of an AI apocalypse, it seems like an excellent time to gloss Jeff’s talk. And if you think AlphaGo is good now, just wait until it reaches beta.
Jeff is referring, of course, to Google’s infamous motto: organize the world’s information and make it universally accessible and useful.
Historically we might associate ‘organizing’ with gathering, cleaning, storing, indexing, reporting, and searching data. All the stuff early Google mastered. With that mission accomplished Google has moved on to the next challenge.
Now organizing means understanding.
Some highlights from the talk for me:
Real neural networks are composed of hundreds of millions of parameters. The skill that Google has is in how to build and rapidly train these huge models on large interesting datasets, Continue reading
"CS research is timeless." Lessons learned are always pertinent. @aysylu22 #qconlondon
— Paula Walter (@paulacwalter) March 8, 2016
There has been a renaissance in the appreciation of computer science papers as a relevant source of wisdom for building today's complex systems. If you're having a problem there's likely some obscure paper written by a researcher twenty years ago that just might help. Which isn't to say there aren't problems with papers, but there's no doubt much of the technology we take for granted today had its start in a research paper. If you want to push the edge it helps to learn from primary research that has helped define the edge.
If you would like to share your love of papers, be proud, you are not alone:
Steve Kerr, head coach of the record setting Golden State Warriors (my local Bay Area NBA basketball team), has this to say about what the team needs to do to get back on track (paraphrased):
What we have to get back to is simple, simple, simple. That's good enough. The simple leads to the spectacular. You can't try the spectacular without doing the simple first. Make the simple pass. Our guys are trying to make the spectacular plays when we just have to make the easy ones. If we don't get that cleaned up we're in big trouble.
If you play the software game, doesn't this resonate somewhere deep down in your git repository?
If you don't like basketball or despise sports metaphors this is a good place to stop reading. The idea that "The simple leads to the spectacular" is probably the best TLDR of Keep it Simple Stupid I've ever heard.
Software development is fundamentally a team sport. It usually takes a while for this lesson to pound itself into the typical lone wolf developer brain. After experiencing a stack of failed projects I know it took an embarrassingly long time for me to Continue reading