GitHub on BigQuery: Analyze all the code

Google, in collaboration with GitHub, is releasing an incredible new open dataset on Google BigQuery. So far you’ve been able to monitor and analyze GitHub’s pulse since 2011 (thanks, GitHub Archive project!) and today we’re adding the perfect complement to this. What could you do if you had access to analyze all the open source software in the world, with just one SQL command? The Google BigQuery Public Datasets program now offers a full snapshot of the content of more than 2.8 million open source GitHub repositories in BigQuery. Thanks to our new collaboration with GitHub, you’ll have access to analyze the source code of almost 2 billion files with a simple (or complex) SQL query. This will open the doors to all kinds of new insights and advances that we’re just beginning to envision. For example, let’s say you’re the author of a popular open source library. Now you’ll be able to find every open source project on GitHub that’s using it. Even more, you’ll be able to guide the future of your project by analyzing how it’s being used, and improve your APIs based on what your users are actually doing with it.
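To make the "find every project that's using my library" idea concrete, here is a minimal Python sketch. It rests on assumptions the announcement itself doesn't spell out: that the snapshot is exposed as the public dataset `bigquery-public-data.github_repos`, that its smaller `sample_contents` table has `content` and `sample_repo_name` columns, and that the third-party `google-cloud-bigquery` client is installed with credentials configured. The helper names `build_usage_query` and `find_repos_using` are made up for illustration.

```python
def build_usage_query(max_rows: int = 20) -> str:
    """Build a SQL query that counts, per repository, the files
    whose text contains the snippet passed as the @snippet parameter."""
    if max_rows < 1:
        raise ValueError("max_rows must be a positive integer")
    # Assumed table and column names; adjust to the actual dataset schema.
    return f"""
        SELECT sample_repo_name AS repo_name, COUNT(*) AS matching_files
        FROM `bigquery-public-data.github_repos.sample_contents`
        WHERE STRPOS(content, @snippet) > 0
        GROUP BY sample_repo_name
        ORDER BY matching_files DESC
        LIMIT {int(max_rows)}
    """


def find_repos_using(snippet: str, max_rows: int = 20):
    """Run the query and return (repo_name, matching_files) pairs.

    Needs network access and Google Cloud credentials, so the import
    is kept local to this function.
    """
    from google.cloud import bigquery  # third-party client library

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Pass the snippet as a parameter instead of pasting it into the SQL.
            bigquery.ScalarQueryParameter("snippet", "STRING", snippet)
        ]
    )
    rows = client.query(build_usage_query(max_rows), job_config=job_config).result()
    return [(row["repo_name"], row["matching_files"]) for row in rows]
```

Calling something like `find_repos_using("import requests")` would then list the repositories in the sample with the most files containing that line. The sample table is used here because the full tables are large and a scan over them may process a lot of data; a plain substring match is also only a rough proxy for real usage.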
