Google Bought Kaggle: With Benevolent or Nefarious Intentions?
Last Wednesday, on the Google Cloud Platform Blog, Fei-Fei Li (Chief Scientist of AI/ML, Google Cloud) announced that Google bought Kaggle, the popular machine learning competition platform. In that announcement, Li made it abundantly clear why Google bought Kaggle. Her stated reasons are as follows.
Why Google Bought Kaggle
- Google seeks to lower the barrier to entry for future AI professionals
- Google wants to make AI open to an ever-expanding network of application developers. (Think: “AI-Lite” – Developers can use and deploy machine learning algorithms in their applications without having to become machine learning experts.)
- Google wants to support Kaggle’s strong community of data scientists
Quick Note: Kaggle is a machine learning competition platform, but Google is talking about artificial intelligence here. Many people don’t know the difference between these two. If you’d like clarification on that point, you can watch the short webinar I posted on the topic here.
How Does Buying Kaggle Help Google Achieve Its Goals?
Kaggle offers a host of options to help data science newbies get better at doing data science. For starters, it gives new data scientists a chance to get their hands dirty using machine learning to solve real-world problems. It gives Kaggle competitors data to practice on – something that can be hard to find otherwise. After each competition, it gives them a chance to review and learn from the more successful approaches of other competitors.
Kaggle also offers something it calls Kernels. Kernels are environments that store all the input, output, and code needed for each analysis. For each competition you enter, you generate a kernel. You can then use your Kaggle Kernels as a sort of data science portfolio when seeking to get hired. From a learning perspective, Kernels can also be a useful mentoring tool for newbies, as they are collaborative. That means that inexperienced data scientists can ask for and receive feedback on the work in their kernels, or they can review feedback and discussion on other people’s kernels. This sort of collaborative co-mentoring goes a long way in helping new data science professionals gain the confidence and experience they need to land a job.
Google Naysayers Doubt Google’s Benevolence
Although Google’s acquisition of Kaggle sounds like a major win-win, the industry has its naysayers. Some developers have voiced concern that Fortune 500 companies may no longer want to host their competitions (and datasets) on Kaggle, since it is now owned by Google. Over on ycombinator.com, @throwawayange1 went so far as to write, “This was an acqui-hire. They paid for a few talented engineers. I expect it to be shut down in a year or two.”
One thing that all naysayers seem to agree on is the idea that Google did not acquire Kaggle out of the pure goodness of its heart – that it’s not a purely benevolent move. At the very least, when Google bought Kaggle, it bought itself access to Kaggle competitors – a group of people who are quite likely to be willing to pay for access to Google APIs (i.e., Google Scalable Machine Learning) for things like natural language processing and deep learning with TensorFlow.
My Thoughts On The Matter
What are my thoughts on the matter? Well, of course I don’t believe that Google acted in pure benevolence here. Google is a business and the job of a business is to make money – so, yes, when Google bought Kaggle, it was probably making some sort of strategic play. That said, I can think of only good things when I imagine Google infusing the Kaggle platform with free access to its data management resources. As for the concerns of Fortune 500 Kaggle contest hosts, I can see why some concern might be in order. There are several other places, though, where you can host your competition instead of on Kaggle. Take a look at HackerRank, Topcoder, or DrivenData.
But maybe I am missing some key insight here… What about you, what is your opinion? I’d love to hear your thoughts in the comments section below.