Indianapolis Officer Involved Shootings¶
Geocoding and Mapping in Python¶
Open data is becoming more and more popular in cities across the US. Many cities have portals that give citizens access to data about various aspects of their community, including education, transportation, zoning, recreation, and crime. By providing this information, a city is able to promote honesty and transparency in local government. Indianapolis, the capital city of Indiana, has an open data portal called Open Indy.
For this post, I used the IMPD Officer Involved Shooting dataset. This dataset contains information on Indianapolis Metropolitan Police Department (IMPD) officer involved shootings in Indianapolis and Marion County from January 2014 to March 2017.
The goal of this post is to illustrate how to clean up a dataset and visualize the results. From this analysis, I hope to answer two questions. First, where are the shootings occurring in Indianapolis? Second, is the number of officer involved shootings increasing?
import numpy as np #Used for data wrangling
import pandas as pd #Used for data wrangling
import seaborn as sns #Used for data vis
import matplotlib.pyplot as plt #Used for data vis
from geopy.geocoders import ArcGIS #Used for geocoding
import folium #Used for creating Leaflet.js maps
from folium import plugins #Used for more advanced mapping options
%matplotlib inline
Step 1: Data Ingest and Wrangling¶
The data from Open Indy are available in a number of formats. For this analysis, because the dataset was small (only around 100 rows), I simply downloaded the data as a .csv file. The pandas Python package makes reading in a .csv file extremely easy:
data = pd.read_csv('/Users/jreinhart/Desktop/Data Analysis Github/Indianapolis /IMPD_Officer_Involved_Shootings.csv')
That's it. Isn't Python amazing? Pandas takes care of all the heavy lifting, and now we have a dataframe of our .csv file nicely sorted into rows and columns. Let's take a look at the first few rows of data.
data.head(5)
data.head(5) shows the first 5 rows of our dataframe. As you can see from even just the first few rows, the data is a little messy. Most importantly, for visualizing the location of the shootings, we don't have latitude and longitude as features. We do have the address of each incident, but this address is broken up into five fields: street number, name, type (st., ave., rd., etc.), direction, and the city.
In order to get the latitude and longitude of each incident, we can use geocoding to look up the coordinates for each postal address. However, before we can use geocoding, we need to create a new field in our data called "address" that will contain all of the pieces of the incident location in one place. So let's see what columns we need to combine.
data.columns
Looks like we'll need to combine the STREET_N, STREET, STREET_T, STREET_G, and CITY columns. Since we know all of the addresses are in Indiana, we can add that as well. We can use simple string manipulations to combine the columns:
data["address"] = data["STREET_N"].map(str) + " " + \
    data["STREET"].map(str) + " " + \
    data["STREET_T"].map(str) + " " + \
    data["STREET_G"].map(str) + " " + \
    data["CITY"] + " " + "Indiana"
data["address"][:10]
Great, our 'address' feature is shaping up! However, from the first few entries, we can see that some of the streets do not have a direction, so the text "nan" ended up in the address. Let's clean that up.
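#Remove only the standalone word "nan" (plus the space after it) so street names containing "nan" are left alone
data["address"] = data["address"].str.replace(r"\bnan\b\s*", "", regex=True).str.strip()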
data["address"][:10]
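The string replacement above works on the finished address. As an alternative, a minimal sketch (not part of the original analysis) could skip missing pieces while the address is being built, so the text "nan" never appears in the first place. The helper name `build_address` and the sample values are hypothetical.

```python
import math


def build_address(parts, state="Indiana"):
    """Join address pieces with single spaces, skipping missing values."""
    cleaned = []
    for part in parts:
        # pandas represents missing cells as float NaN; also skip None
        if part is None or (isinstance(part, float) and math.isnan(part)):
            continue
        text = str(part).strip()
        if text:
            cleaned.append(text)
    cleaned.append(state)
    return " ".join(cleaned)


# Hypothetical row: street number, name, type, direction (missing), city
example = build_address([1234, "MAIN", "ST", float("nan"), "Indianapolis"])
print(example)
```

With the dataframe from this post, the helper could be applied row by row, for example with `data[cols].apply(lambda row: build_address(row.tolist()), axis=1)`, where `cols` is assumed to list the five address columns.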
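Step 2: Geocoding¶
With each address in a single field, we can now geocode it. I used the ArcGIS geocoder from geopy to look up each address and collect its latitude and longitude.
geolocator = ArcGIS(timeout=10)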
latitudes = []
longitudes = []
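for address in data["address"]:
    location = geolocator.geocode(address)
    if location is None:
        #No match: print the address and store NaN so the lists stay aligned with the dataframe rows
        print(address)
        latitudes.append(np.nan)
        longitudes.append(np.nan)
        continue
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)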
Now that we have the coordinates, let's add them to our dataframe.
data["latitude"] = latitudes
data["longitude"] = longitudes
data.head(5)
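Geocoding services are remote calls, so it can help to look up each distinct address only once and to record failures explicitly. The sketch below shows one way to do that. `geocode_addresses`, `FakeLocation`, and `fake_lookup` are hypothetical names used for illustration; in the notebook, the `lookup` argument would be `geolocator.geocode`.

```python
def geocode_addresses(addresses, lookup):
    """Return (latitudes, longitudes) aligned with `addresses`.

    `lookup` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    cache = {}
    latitudes, longitudes = [], []
    for address in addresses:
        if address not in cache:
            location = lookup(address)
            if location is None:
                cache[address] = (None, None)
            else:
                cache[address] = (location.latitude, location.longitude)
        lat, lon = cache[address]
        latitudes.append(lat)
        longitudes.append(lon)
    return latitudes, longitudes


# Stand-in for a real geocoder so the example runs without network access
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


calls = []


def fake_lookup(address):
    calls.append(address)
    if address == "unknown":
        return None
    return FakeLocation(39.7684, -86.1581)


lats, lons = geocode_addresses(["a", "unknown", "a"], fake_lookup)
```

The repeated address "a" is only looked up once, and the failed lookup still gets a placeholder, so the results can be assigned straight to dataframe columns.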
Step 3: Mapping with Folium and Leaflet.js Maps¶
We now have everything we need to plot the locations of the incidents. The Folium package makes it easy to visualize your Python data in a Leaflet.js interactive map. Plotting is pretty similar to other charting packages: you first define the canvas (in this case the basemap, using folium.Map) and then plot markers, lines, and other features on top of the basemap.
indcoord = (39.7684, -86.1581) #Use to center our map on Indy
m = folium.Map(location=indcoord,
               zoom_start=11,
               control_scale=True,
               detect_retina=True)
lats = data["latitude"]
longs = data["longitude"]
locations = list(zip(lats, longs))
data.apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=100,
                                    color=None,
                                    fill_color="Green",
                                    fill_opacity=1).add_to(m),
    axis=1)
m
The map above plots the location of each officer involved shooting incident as a green circle. We can now see where each shooting took place. However, we can use the color of our dots to encode more information. Let's make a map that plots not only the location of each shooting, but colors the dot red if the citizen was killed. The CIT_COND_TYPE feature states what the condition of the citizen was following the shooting.
print(data.CIT_COND_TYPE.unique())
From the command above, there are only four conditions for the citizen as a result of the shooting: gunshot wound, no injuries noted or visible, death, or nan. Now we know how to filter the column (where CIT_COND_TYPE == 'Death'). We can use a lambda function to iterate through each row in our data that matches the criteria, and plot a circle marker at the coordinates of the incident.
data[data['CIT_COND_TYPE']=='Death'].INCNUM.count()
m = folium.Map(location = indcoord, zoom_start=11, tiles="Stamen Terrain", control_scale=True, detect_retina=True)
lats = data["latitude"]
longs = data["longitude"]
locations = list(zip(lats,longs))
data[data["CIT_COND_TYPE"] != "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=100,
                                    color=None,
                                    fill_color="Green",
                                    fill_opacity=1).add_to(m),
    axis=1)
data[data["CIT_COND_TYPE"] == "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=100,
                                    color=None,
                                    fill_color="Red",
                                    fill_opacity=1).add_to(m),
    axis=1)
m
Now our colors mean something! Not only can you see the location of each event, but we know that the red dots were fatal incidents. I also changed the basemap tiles so that the dots are a little easier to see. However, the dots are still a little hard to pick up, so let's make them a little bigger. Furthermore, let's add a heatmap layer so we can see if there are any locations containing clusters of shootings.
m = folium.Map(location = indcoord, zoom_start=11, tiles="Stamen Terrain", control_scale=True, detect_retina=True)
lats = data["latitude"]
longs = data["longitude"]
locations = list(zip(lats,longs))
data[data["CIT_COND_TYPE"] != "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=200,
                                    color=None,
                                    fill_color="Green",
                                    fill_opacity=1).add_to(m),
    axis=1)
data[data["CIT_COND_TYPE"] == "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=200,
                                    color=None,
                                    fill_color="Red",
                                    fill_opacity=1).add_to(m),
    axis=1)
#add_child replaces the deprecated add_children
m.add_child(plugins.HeatMap(data=locations,
                            radius=15,
                            blur=10,
                            gradient={0.2: '#2c7bb6',
                                      0.4: '#abd9e9',
                                      0.6: '#ffffbf',
                                      0.7: '#fdae61',
                                      0.8: '#d7191c'}))
m
That looks much better. From the map we can now see not only where the fatal shootings were, but also three areas with concentrations of incidents. This is where the power of Leaflet.js really shines. With an interactive map, we can zoom in to the clusters and see more precisely where the hotspots are. There is a cluster northwest of the Indianapolis Motor Speedway along West 34th Street. There is also a cluster southeast of the fairgrounds on North Sherman Drive. Finally, there is a third cluster east of downtown Indianapolis on East 10th Street.
In order to see the heatmap better, we can use the CartoDB dark_matter tiles, which provide a dark basemap.
m = folium.Map(location=indcoord, tiles="CartoDB dark_matter", zoom_start=11, control_scale=True, detect_retina=True)
data[data["CIT_COND_TYPE"] != "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=200,
                                    color=None,
                                    fill_color="Green",
                                    fill_opacity=1).add_to(m),
    axis=1)
data[data["CIT_COND_TYPE"] == "Death"].apply(
    lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]],
                                    radius=200,
                                    color=None,
                                    fill_color="Red",
                                    fill_opacity=1).add_to(m),
    axis=1)
#add_child replaces the deprecated add_children
m.add_child(plugins.HeatMap(data=locations,
                            radius=15,
                            blur=10,
                            gradient={0.2: '#2c7bb6',
                                      0.4: '#abd9e9',
                                      0.6: '#ffffbf',
                                      0.7: '#fdae61',
                                      0.8: '#d7191c'}))
m.save('/Users/jreinhart/Desktop/heatmap_shootings.html')
m
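The maps above repeat the same CircleMarker call for fatal and non-fatal incidents. One possible refactoring, sketched here under the assumption that only the fill color differs between the two groups, is to compute the marker options from CIT_COND_TYPE in one place. `marker_options` is a hypothetical helper, not part of folium, and the non-fatal label used in the example is illustrative.

```python
def marker_options(condition, radius=200):
    """Return CircleMarker keyword arguments for one incident.

    Fatal incidents ("Death") are drawn in red; everything else, including
    missing values, is drawn in green, matching the maps above.
    """
    fill = "Red" if condition == "Death" else "Green"
    return {
        "radius": radius,
        "color": None,
        "fill_color": fill,
        "fill_opacity": 1,
    }


fatal = marker_options("Death")
# Any value other than "Death" gets the non-fatal style (label is illustrative)
other = marker_options("Gunshot Wound")
```

In the notebook, this could replace the two filtered apply calls with a single loop over `data.iterrows()`, passing `**marker_options(row["CIT_COND_TYPE"])` to `folium.CircleMarker` along with the location.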
Step 4: Plotting Events with Seaborn¶
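We now know where the incidents are occurring, but we still do not know if the incidents are increasing or decreasing over time. To find out, we can convert the incident dates to datetimes, index the dataframe by date, and count the incidents in each year.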
data["OCCURRED_DT"] = pd.to_datetime(data["OCCURRED_DT"], format="%Y-%m-%d")
data["OCCURRED_DT"][:10]
data = data.set_index(data["OCCURRED_DT"])
gb_year = data.groupby(pd.Grouper(freq='A')).count() #pd.Grouper replaces pd.TimeGrouper, which was removed from pandas
gb_year
sns.set_style("white")
sns.set_style("ticks")
ax = sns.barplot(x=gb_year.index.year, y="INCNUM", data=gb_year, color="#222343")
sns.despine(top=True, right=True, bottom=True)
ax.set(xlabel="Year", ylabel='Number of Shootings')
g = sns.catplot(x=gb_year.index.year, y="INCNUM", data=gb_year, kind="point", color="#222343") #catplot with kind="point" replaces the older factorplot
g.set(xlabel="Year", ylabel='Number of Shootings')
Excellent! We now know the answer to our second question. From the bar chart and line chart above, we can see that there were more officer involved shootings in 2015 than in 2014 or 2016. However, the numbers of events in those years are fairly close. The count for 2017 is low because our data for that year only goes through March. Over the approximately three years of data available, the number of officer involved shootings in Indianapolis does not appear to be increasing or decreasing.
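Because 2017 covers only January through March, its raw yearly count is not directly comparable to the full years. A rough way to compare, sketched below with made-up dates rather than the real dataset, is to divide each year's count by the number of months the data covers. `incidents_per_month` and `months_covered` are hypothetical names.

```python
from collections import Counter
from datetime import date


def incidents_per_month(dates, months_covered):
    """Average incidents per month for each year.

    `months_covered` maps a year to how many months of data it contains;
    years not listed are assumed to be complete (12 months).
    """
    counts = Counter(d.year for d in dates)
    return {
        year: counts[year] / float(months_covered.get(year, 12))
        for year in sorted(counts)
    }


# Made-up dates for illustration only; not taken from the IMPD dataset
sample = [date(2016, 1, 5), date(2016, 7, 19), date(2016, 11, 2),
          date(2017, 2, 14)]
rates = incidents_per_month(sample, months_covered={2017: 3})
```

A monthly rate based on three months is still noisy, so this is best read as a sanity check on the yearly chart rather than a trend estimate.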