The
data link is here
import pandas as pd
csv_path = 'https://raw.githubusercontent.com/nygeog/data/master/nyc_crashes/data/NYPD_Motor_Vehicle_Collisions.csv'
inCSV = 'data/NYPD_Motor_Vehicle_Collisions.csv'
ouCSV = 'data/nypd_mv_collisions.csv'
df = pd.read_csv(inCSV).rename(columns=lambda x: x.lower())
#drop ones w/out valid lat #super lazy, just grabbing lat's above 35
df = df[(df.latitude > 35)]
#print df.dtypes
print len(df.index)
df.head(5)
The original CSV data (first 5 records)
| date | time | borough | zip code | latitude | longitude | location | on street name | cross street name | off street name | ... | contributing factor vehicle 2 | contributing factor vehicle 3 | contributing factor vehicle 4 | contributing factor vehicle 5 | unique key | vehicle type code 1 | vehicle type code 2 | vehicle type code 3 | vehicle type code 4 | vehicle type code 5 |
1 | 02/13/2015 | 21:45 | MANHATTAN | 10002 | 40.715622 | -73.994275 | (40.7156221, -73.9942752) | FORSYTH STREET | CANAL STREET | NaN | ... | NaN | NaN | NaN | NaN | 3168577 | PASSENGER VEHICLE | UNKNOWN | NaN | NaN | NaN |
2 | 02/13/2015 | 21:45 | MANHATTAN | 10001 | 40.747535 | -73.988307 | (40.7475349, -73.9883068) | WEST 31 STREET | BROADWAY | NaN | ... | Fatigued/Drowsy | NaN | NaN | NaN | 3169163 | TAXI | TAXI | NaN | NaN | NaN |
3 | 02/13/2015 | 21:45 | BRONX | 10462 | 40.833558 | -73.857732 | (40.8335582, -73.8577325) | WESTCHESTER AVENUE | PUGSLEY AVENUE | NaN | ... | Unspecified | NaN | NaN | NaN | 3169251 | PASSENGER VEHICLE | SPORT UTILITY / STATION WAGON | NaN | NaN | NaN |
4 | 02/13/2015 | 21:44 | MANHATTAN | 10017 | 40.748800 | -73.969846 | (40.7487997, -73.969846) | EAST 42 STREET | 1 AVENUE | NaN | ... | Other Vehicular | NaN | NaN | NaN | 3169176 | PASSENGER VEHICLE | PASSENGER VEHICLE | NaN | NaN | NaN |
5 | 02/13/2015 | 21:40 | STATEN ISLAND | 10304 | 40.617295 | -74.080479 | (40.6172954, -74.0804791) | PARK HILL AVENUE | OSGOOD AVENUE | NaN | ... | Unspecified | NaN | NaN | NaN | 3169614 | PASSENGER VEHICLE | PASSENGER VEHICLE | NaN | NaN | NaN |
5 rows × 29 columns
Mergin' the date (day) and time (hours)
#create datetime http://stackoverflow.com/questions/17978092/combine-date-and-time-columns-using-python-pandas
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df = df[['datetime','latitude','longitude']]
df = df[(df.datetime > '2014-01-01 00:00:01')] #query out only data from 2014 onward
df = df.sort('datetime')
df.to_csv(ouCSV,index=False)
print len(df.index)
192324 #count of records 2014+
df.head(5)
Clean and small data to import to CartoDB
| datetime | latitude | longitude |
228854 | 2014-01-01 00:01:00 | 40.725432 | -73.996771 |
228850 | 2014-01-01 00:01:00 | 40.767889 | -73.981512 |
228855 | 2014-01-01 00:01:00 | 40.750844 | -73.978608 |
229144 | 2014-01-01 00:15:00 | 40.588646 | -73.992452 |
228849 | 2014-01-01 00:20:00 | 40.689019 | -73.986157 |
Maps in CartoDB
PostGIS SQL Statement to grab 2014 from the 2014-2015/02/13 Table
SELECT * FROM nypd_mv_collisions_2014_20150213 WHERE datetime BETWEEN '2014-01-01 00:00:00' and '2015-01-01 00:00:00'
Density Hex Bins of 2014 Collisions
I'm not totally sure if when creating the classes if it ignores hex bins with no collisions or what. Need to look into that.
Density of 2014 Collisions over time (using Torque)
Animation of Density of Collisions. I think these are Kernel Densities but not sure, need to check CartoDB documentation.
Valentine's Day 2014 Collisions (Clickbait)
looks nice. would have been nicer if live ..
ReplyDelete