Remember the yip-yip aliens from Sesame Street? In spite of their advanced technology, meaningful communication always seemed to elude them; so when they encountered a telephone or a radio for the first time they would quizzically inquire, “meow?”

That confusion is kind of what’s happening in government right now: lots of new forms of data are becoming available, but nobody’s entirely sure what they’re looking at. Yip yip yip yip yip yip yip … crime maps? Yip yip yip … transit timetables? Yip yip yip … weather? Yip yip … zoning laws? Unemployment rates? Census data? Library inventories? School lunches? Birth certificates? Air quality? All that information is floating around out there somewhere, but is it usable? Nope nope nope nope nope nope nope.

“Data.gov publishes over 100,000 data sources basically in 100,000 formats,” Csaba Csoma explained to us. He’s part of CivicDB, a loosely organized group of engineers/coders/hackers/developers/designers working to liberate useful public information from incomprehensible databases. The group has regular meetings to bring order to the chaos: sort of a reverse knitting circle, untangling the knotted yarn back into neat little uniform balls.

“Data.gov is the right step,” he says, “The only problem is now the format. … When every city in US starts to publish, for example, crime data in their own internal format, we will have a lot of great and unusable data at our hands. Creating an application to just display this data on Google Maps would cost millions.” He’s not exaggerating: because governments don’t need to worry so much about turning a profit, they often make unprofitable decisions, such as Bay Area taxpayers spending $11 million on the dreadful 511.org, or $9.5 million on recovery.gov. That is SO MUCH MONEY, especially for an outcome that is not super-usable.

If you’ve ever tried to get at public information, it won’t really come as a surprise that government is, like Grover the waiter, confusing and unhelpful when it comes to sharing information. But we’re lucky enough to live in the Bay Area, where tech expertise and altruism might come to the rescue. Increasingly, private citizens see that something is broken, and want to learn how they can fix it.

CivicDB’s first goal is pretty fundamental: rather than building applications, for now they’re just working on establishing one common format for it all. Once all of the data is in a common language, it’ll become far more usable. But they aren’t starting from scratch, either: San Francisco recently announced that it would begin releasing new feeds of municipal data. It’s part of a larger trend: government agencies reversing old-fashioned secrecy in favor of openness. New websites like AnalyzeThe.us, Socrata.com, FollowTheMoney.org, and MAPLight.org provide foggy little windows into government data. But of course, nobody ever said those data feeds would be easy to use, and developers have found that there still isn’t enough rhyme or reason in the way that government information is organized. So development of useful tools for the public has been slow.

In other words, if humans can’t understand the data, then they can’t make an application that understands it. Yip yip yip yip yip yip yip.

The city’s level of commitment to this openness remains to be seen. Gavin made a lot of happy open-data sounds a few weeks ago, but he’s also a professional politician who’s gunning for a promotion right now, so all of our most cynical suspicions about everything he does are probably true. The city’s been slow to release feeds for things like 311; and data about Muni has suffered from questionable lawsuits and closed formats like PDF.

But whether or not it’s all an election ploy, some city employees are making the most of Gavin’s stated commitment to openness. Although they’re reluctant to talk to the press (yes, that’s right, the people working on improving transparency have shied away from publicizing their work), San Francisco civil servants maintain helpful wikis and extol the virtues of openness at conferences.

It’s easy to be annoyed at government inefficiency for creating this problem, but it’s a relief that there are folks working behind the scenes to fix it.

We’re still many stages of development away from seeing a finished, polished product. But if you want to gaze into a hypothetical crystal ball, you could probably use the website SeeClickFix as an example. SCF is a friendly, fun, accessible site that mashes up municipal data and maps, showing you where things are broken and where they need to be fixed. (You can see how the Appeal uses it here.)

And looking even further into the future, one day the city might use standardized formats not just to broadcast data, but also to collect it. Presently, Twitter is the only third-party reporting system used by the city. But Ben Berkowitz, the founder of SeeClickFix, tells the Appeal, “We believe that third party applications using municipal data are most effective when they are able to pass information both from government to citizen and from citizen to government.”

But the city has to walk before it can run. Right now, many government officials are still getting used to the idea that their data could be public; they only just this week agreed to publicize some data from 311, allowing developers to track citywide issues like they track software bugs.

For all the little victories like this, though, the city is still slow to provide useful data. But a few months or years down the line we could see a lot more SeeClickFixes, all with much richer data.

When will that happen? That all depends on how many converts CivicDB can win over with its work.
