"This data set includes the complete social graph of 500,000 follow links as well as over 1,000,000 commits and 50,000 users."
"...a large fraction of [GitHub] users provide a location in their profile, which we can turn into geographic coordinates using a geocoding API like PlaceFinder..."
"For each repository, we extract the owner, collaborator, and contributor usernames, plus branch names. New usernames help to find new repositories, while branch names are used to fetch commit metadata. Using this method, the crawler uncovered 40,860 code repositories, representing 33,388 unique project names and 1,219,872 individual commits."
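The crawl loop described in this excerpt can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not the authors' crawler: the function `crawl`, the three callables it takes (`repos_of`, `repo_info`, `commits_of`), and the toy data are all hypothetical stand-ins for whatever API calls the original crawler made. It also shows why the repository count can exceed the count of unique project names: different owners can hold repositories with the same name.

```python
from collections import deque


def crawl(seed_users, repos_of, repo_info, commits_of):
    """Breadth-first crawl: users lead to repositories, repositories lead to new users.

    All three callables are hypothetical data-access hooks:
      repos_of(user)            -> iterable of repository ids ("owner/name")
      repo_info(repo)           -> dict with owner, collaborators, contributors, branches
      commits_of(repo, branch)  -> iterable of commit dicts with a "sha" key
    """
    seen_users = set(seed_users)
    seen_repos = set()
    commits = {}  # keyed by sha so a commit reachable from two branches is stored once
    queue = deque(seed_users)

    while queue:
        user = queue.popleft()
        for repo in repos_of(user):
            if repo in seen_repos:
                continue
            seen_repos.add(repo)
            info = repo_info(repo)

            # New usernames feed the queue and help find new repositories.
            people = {info["owner"], *info["collaborators"], *info["contributors"]}
            for name in sorted(people - seen_users):
                seen_users.add(name)
                queue.append(name)

            # Branch names are used to fetch commit metadata.
            for branch in info["branches"]:
                for commit in commits_of(repo, branch):
                    commits[commit["sha"]] = commit

    return seen_users, seen_repos, commits


# Toy in-memory data standing in for the remote service (entirely made up).
USER_REPOS = {
    "alice": ["alice/app"],
    "bob": ["alice/app", "bob/lib"],
    "carol": ["carol/app"],
}
REPOS = {
    "alice/app": {"owner": "alice", "collaborators": ["bob"],
                  "contributors": ["alice", "bob"], "branches": ["master"]},
    "bob/lib": {"owner": "bob", "collaborators": [],
                "contributors": ["bob", "carol"], "branches": ["master", "dev"]},
    "carol/app": {"owner": "carol", "collaborators": [],
                  "contributors": ["carol"], "branches": ["master"]},
}
COMMITS = {
    ("alice/app", "master"): [{"sha": "a1"}, {"sha": "a2"}],
    ("bob/lib", "master"): [{"sha": "b1"}],
    ("bob/lib", "dev"): [{"sha": "b1"}, {"sha": "b2"}],
    ("carol/app", "master"): [{"sha": "c1"}],
}

users, repos, commits = crawl(
    ["alice"],
    repos_of=lambda u: USER_REPOS.get(u, []),
    repo_info=lambda r: REPOS[r],
    commits_of=lambda r, b: COMMITS.get((r, b), []),
)
project_names = {r.split("/", 1)[1] for r in repos}
```

Starting from the single seed `alice`, the sketch reaches all three users and three repositories, but only two unique project names, because `alice/app` and `carol/app` share the name `app`.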
"In addition to crawled data, we use the complete GitHub user follower graph from Jan 19, 2011. This graph includes 452,248 links connecting 106,247 unique users, 47% (49,500) of which could be geocoded with the PlaceFinder API"