Thursday, August 13, 2009

Atlanta crime data: project overview

I've been working on analyzing the city's crime data from 2004-2008, which I pulled from their crime mapping page and threw into a database. The crime mapping data actually provides the raw data about each incident, including address, number of victims, etc.

El hermano, who is a data monkey by day, has been helping me pull aggregate summaries for different types of crimes, which I've then thrown into a spreadsheet to analyze. This post will be a bit of "housekeeping" and methodology, and future posts will be all about what I've found in the data.

For starters, there is a lot of data. It is a bit too much data to be manageable without isolating particular types of crimes or areas of the city. I have decided to focus first on what I hear about the most from friends and in the papers:
  1. muggings
  2. home break-ins
  3. car break-ins.
While obviously all of the "big seven" crimes (Homicide, Rape, Robbery, Aggravated Assault, Burglary, Larceny, Auto Theft) are important, these three types of crime, to me, represent the types of crimes which make most people feel unsafe. Well, ALL the crime categories make people feel unsafe, but these three represent our most common fears and are fairly common. In 2008, there were 105 murders in Atlanta but 8,216 home burglaries. I'll be looking at all the crime categories over the next few weeks, but I'll be focusing extra attention on these.

I'm breaking the data up by NPUs. The APD zones are too large and there are too many neighborhoods or police beats to be useful. Atlanta neighborhoods are also vastly different sizes, so they aren't great for comparing data against each other. Using NPUs should break the city into manageable chunks of roughly similar sizes, but with small enough areas that we can see meaningful patterns.
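
For anyone curious what the NPU breakdown might look like in practice, here is a minimal Python sketch. It is not the actual queries el hermano wrote; the field names (`npu`, `year`, `crime_type`) and the sample records are made up for illustration, and the real APD files may be laid out differently.

```python
from collections import Counter


def counts_by_npu(incidents, crime_type):
    """Count incidents of one crime type per (NPU, year) pair.

    `incidents` is an iterable of dicts with the hypothetical keys
    "npu", "year" and "crime_type".
    """
    counts = Counter()
    for incident in incidents:
        if incident["crime_type"] == crime_type:
            counts[(incident["npu"], incident["year"])] += 1
    return counts


def change_by_npu(counts, start_year, end_year):
    """Return the change in incident count per NPU between two years."""
    npus = {npu for npu, _ in counts}
    return {
        npu: counts[(npu, end_year)] - counts[(npu, start_year)]
        for npu in sorted(npus)
    }


# Made-up sample records, only to show the shape of the input.
sample_incidents = [
    {"npu": "M", "year": 2007, "crime_type": "BURGLARY"},
    {"npu": "M", "year": 2008, "crime_type": "BURGLARY"},
    {"npu": "M", "year": 2008, "crime_type": "BURGLARY"},
    {"npu": "E", "year": 2007, "crime_type": "BURGLARY"},
    {"npu": "E", "year": 2008, "crime_type": "LARCENY"},
]

burglary_counts = counts_by_npu(sample_incidents, "BURGLARY")
burglary_change = change_by_npu(burglary_counts, 2007, 2008)
```

Grouping by (NPU, year) keeps the raw counts around, so the same table can feed both the "hot spot" view and the year-over-year change view.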

Cassie Branum, a grad student at Georgia Tech who I worked with on the ULI Competition earlier this year, is helping out by putting the data into GIS. She is fantastic to work with, and like el hermano is helping me out for free. We'll have some pretty neat maps showing which parts of the city are "hot spots" for various crimes, as well as which areas have seen large increases or decreases in activity.

Finally, there has been quite a bit of discussion with el hermano about what sort of summary measurements we should be looking at, and how best to present the data. I'd love to hear from you, my readers, about what you'd be most interested in finding out while we are crunching the data.

I foresee this being a weekly or bi-weekly feature, as we work our way through various crime categories. Classes start on Monday, so things will need to be staggered by necessity.


  1. Awesome!! This is very exciting- I've often wanted to be able to analyze this kind of stuff.

    Any possibility of sharing the database so other folks can run some queries too?? :) You could post a .sql file...

    I'll be watching your blog for maps and graphs and stuff, so cool!


  2. I'm considering putting the data up on Google Fusion Tables, where people can run some basic aggregation, sorting, and filters... putting up the whole thing, queries and all, is a definite possibility. The data itself is all public, and we just cleaned the data a bit - the rest is queries and analysis.

  3. The sql is really just a straight data import of the yearly files available on the APD website:

    Note that the full csv (just a simple cat *.csv > all.csv) is 126M. I'm not even sure that fusion tables will allow that (I think 100M is the limit, isn't it Brett?)

    The schema is just built from the 'column names' file.
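
A rough sketch of that kind of import, for anyone who wants to try it at home. This is not the exact script we used: it assumes SQLite, stores every column as text, and the table name, column names and sample rows below are hypothetical stand-ins for whatever the 'column names' file actually lists.

```python
import csv
import io
import sqlite3


def create_table(conn, table, columns):
    """Create a table with one TEXT column per name in `columns`."""
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_defs})')


def load_csv(conn, table, columns, csv_file, has_header=True):
    """Insert rows from an open CSV file object; return the number loaded.

    Rows whose field count does not match the schema are skipped.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # drop the header line
    rows = [row for row in reader if len(row) == len(columns)]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical column names and made-up rows, for illustration only.
columns = ["offense_id", "occur_date", "crime_type", "npu"]
sample_csv = io.StringIO(
    "offense_id,occur_date,crime_type,npu\n"
    "1,2008-01-05,BURGLARY,M\n"
    "2,2008-01-06,LARCENY,E\n"
)

conn = sqlite3.connect(":memory:")
create_table(conn, "incidents", columns)
loaded = load_csv(conn, "incidents", columns, sample_csv)
```

For the real yearly files you would open each one with `open(path, newline="")` and call `load_csv` once per file, which avoids having to build the big combined csv first.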

