Using the powers of data for good, not evil

With Hurricane Sandy's arrival pending, the buzz and excitement surrounding this year's sold-out Strata + Hadoop World 2012 conference in New York City (Oct. 20-23) remained intense. Having just returned home after hours at the airport and days of cancelled and delayed flights (due to the Frankenstorm), I emerged from the conference with a fresh perspective on the many ways data can be used for good in our world.

What really stuck with me was the recurring theme of Google's slogan, "Don't be evil." In fact, it seems clear that big data represents an opportunity to actively do good for society, and this was reflected at the conference in a number of ways.

Some of the most exciting work in open data for the public good is being done in the cities of Chicago and New York. Chicago's chief data officer, Brett Goldstein, is seeking to develop data-driven city services that utilize data streams from social media, all in an open fashion and in a manner that is easily portable to other municipalities. They have developed simple services such as and by leveraging tech volunteers in the community.

Michael Flowers, analytics director for New York City, sought to improve detection of illegal housing conversions in the city. Instead of simply sending resources where the complaints are coming from, Flowers wanted to send people where the actual conditions are. His team engaged first responders, integrated their feedback with income and property data, and combined it with information from the 65,000 calls a day to 311 to devise a more accurate means of finding problem areas. The success was overwhelming: inspectors increased the vacate rate from 13 to 70 per cent of the calls that they responded to.

DataKind is another leader in the area of doing good with data and was present in full force. Their mission is to organize data "do-gooders" to help not-for-profits wrangle their big data problems. On the first day of the conference they organized a DataSprint, a one-day hackathon aimed at gaining insights on specific data problems. Data scientists gathered to help New York City analyze tree pruning data to predict how accidental damage to property by falling trees can be minimized. The DataSprint built on an earlier DataDive, an extended version of the Sprint, during which DataKind performed a preliminary tree block pruning analysis and also developed a Tree Storm Risk Map. The risk map shows which regions of the city are most susceptible to damage during a storm.

While it is easy to get caught up in this buzz and excitement, it remains to be seen what the true potential of big data is. The main uses of big data today are arguably still targeted ads and recommendation engines, hardly tools that will better the world for mankind. And along with every insight that is gained from Foursquare check-in data, the question arises of whom this data really belongs to and whether users' privacy is appropriately protected. The ethics of data can quickly become complex. For example, is your DNA data free for you to share? Or should your children have some stake or ownership? It will take some time yet to determine what the true implications of big data are and where along the technology hype curve we currently sit. Are we headed for a crash, or are we gradually making progress toward making big data a useful tool for society?

As fate would have it, real and useful applications of data goodwill came to light immediately after the conference, as Hurricane Sandy came bearing down on New York City. DataKind's Storm Risk Map provided an open-source tool for everyone in the affected region, and New York City made interactive evacuation and flood zone maps openly accessible. In other instances, businesses are openly sharing access to charging stations, and the tech community has banded together to volunteer their services in an effort to help restore damaged IT infrastructure. It seems that, at least for now, the slogan "Do Good with Big Data" prevails.