Projects
- Location Optimization for Homeless Shelters
- Effect of AirBnB on Property Market
- Quantifying the Lumps of Livability in New York and New Jersey
- Assess the green spaces in Dublin city using LIDAR data (Hackathon)
- Study how Heating Oil Based Boilers and Greenhouse Gases affect the air quality of NYC
- iListen - Customized Playlist for Every Mood (Hackathon)
- Hedonic Price Model to Explain Rent Prices in New York City
- Analyzing and Modeling Spatial Change – Space Time Geography
- Detecting Change in Urban Extent and Density: Machine Learning of Land Use Classes Applied to Houston, Texas
- Rent Burden Dashboard: A Data Visualization Project
- Public Sentiment Measures towards Police from Social Media
Location Optimization for Homeless Shelters
The motivation behind this project was the Mayor's aim to open 90 new homeless shelters across the city
while closing down the cluster homes and tertiary shelters. We used the NYC open data portal and scraped some
data from webpages where it wasn't available in tabular format. We considered some of the most important variables for our
model, such as proximity to the transit system and distance from schools. We ran spatial analysis based on these variables and
assigned a score to each zip code in New York City according to the weight of each variable. The final result was
a map that suggests which zip codes would be most suitable for a new homeless shelter.
For more information on the project, here's a link to our project report.
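As a rough illustration of the scoring step (not the project's actual code), a weighted score per zip code can be sketched as below. The variable names, weights, zip code labels and input values are all hypothetical, and the sketch assumes every variable has already been oriented so that a higher value is better.

```python
# Hypothetical weights; the real project's variables and weights may differ.
WEIGHTS = {"transit_proximity": 0.6, "school_distance": 0.4}


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_zip_codes(raw, weights):
    """Return {zip_code: weighted sum of normalized variables}."""
    zips = sorted(raw)
    scores = {z: 0.0 for z in zips}
    for var, weight in weights.items():
        normalized = min_max_normalize([raw[z][var] for z in zips])
        for z, value in zip(zips, normalized):
            scores[z] += weight * value
    return scores


# Made-up inputs for three placeholder zip codes (higher = better).
raw = {
    "zip_a": {"transit_proximity": 9, "school_distance": 2},
    "zip_b": {"transit_proximity": 5, "school_distance": 8},
    "zip_c": {"transit_proximity": 1, "school_distance": 5},
}
scores = score_zip_codes(raw, WEIGHTS)
best = max(scores, key=scores.get)
```

Normalizing each variable before weighting keeps a variable measured in large units from dominating the score.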
Back to top
Effect of AirBnB on Property Market
Through this project, we wanted to understand how AirBnB could affect the property market in New York City. Since communities
often debate whether to allow AirBnB in their neighborhood, we considered some important factors in determining
the correlation between housing prices and AirBnB listing numbers. We used data from DOF Annualized Rolling Sales, Inside Airbnb and
PLUTO to create our model. Since the effect of a listing cannot be constrained to a known region, we did buffer analysis
on the AirBnB listings to get some meaningful insights. The paper that we submitted for our project can be accessed through this link.
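The idea behind the buffer analysis can be sketched as follows. This is a simplified illustration rather than our actual workflow: it assumes coordinates are already projected to meters, counts listings around each property sale, and uses a hypothetical 300 m radius with made-up points.

```python
import math


def count_within_buffer(center, points, radius):
    """Count the points that fall inside a circular buffer around center."""
    cx, cy = center
    return sum(1 for (x, y) in points if math.hypot(x - cx, y - cy) <= radius)


# Made-up projected coordinates in meters.
sales = {"sale_1": (0.0, 0.0), "sale_2": (1000.0, 0.0)}
listings = [(100.0, 0.0), (0.0, 250.0), (900.0, 50.0), (5000.0, 5000.0)]

# Number of listings within 300 m of each sale (the radius is an assumption).
counts = {
    sale_id: count_within_buffer(xy, listings, 300.0)
    for sale_id, xy in sales.items()
}
```

The resulting counts can then be used as an explanatory variable alongside the sale price.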
Back to top
Quantifying the Lumps of Livability in New York and New Jersey
The project dealt with quantifying the spatial variables that make up a livable neighborhood. Through this project,
we wanted to see how the variables of livability, such as Education Level or Rent Burden, are correlated with each other spatially. We analyzed
the census tracts in New York and New Jersey, and ran Global Moran's I and Anselin's Local Moran's I to analyze the correlation at the global level (in this case, state level)
and at the local level (census tracts) for the two states. The analysis was carried out in ArcGIS and the data was collected from the American Community Survey (5-Year).
Our analysis with images can be found here.
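For readers unfamiliar with the statistic, Global Moran's I can be computed as in the sketch below. It is a plain-Python illustration of the standard formula, not the ArcGIS tool we used; the four-unit adjacency matrix and the values are made up.

```python
def global_morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Four units in a row; each unit neighbors the units directly beside it.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = global_morans_i([1, 2, 3, 4], adjacency)      # similar neighbors
alternating = global_morans_i([1, -1, 1, -1], adjacency)  # dissimilar neighbors
```

Positive values indicate that similar values sit next to each other, while negative values indicate a checkerboard-like pattern.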
Back to top
Assess the green spaces in Dublin city using LIDAR data (Hackathon)
This is by far one of the most interesting projects I've worked on. As part of a CUSP Hackathon, we were asked to assess and analyse
the changes in green spaces across the city of Dublin. What was more exciting was the use of LIDAR data (courtesy of Prof. Debra Laefer). Prof. Laefer has been working
for quite some time on LIDAR data collection for Dublin city and her work was recently published by NatGeo.
Owing to the size of the dataset, we could only assess some portions of Dublin city, and we were able to extract features that show the differences
between two datasets collected in different time periods. To see the cool images from LIDAR data, please visit our Github Page.
Back to top
Study how Heating Oil Based Boilers and Greenhouse Gases affect the air quality of NYC
Every year, as soon as the weather gets cold, we see streams of black smoke rising from buildings in New York.
Since many buildings still use age-old oil fuels for heating, it is worth studying the
air quality in the winter months to see how these boiler emissions affect the environment. Answering
this question is not easy, as it depends on multiple factors such as the type of oil being used, which building types
use these boilers (different laws apply to different building types), and whether the residents can afford the fuel prescribed by
the government. For my analysis, I studied the factors that cause an increase in Greenhouse Gases (GHG) and
how oil type contributes to it, considering the socio-economic factors of the city. From the variables deemed important
based on my research and previous studies, I chose 5 crucial variables that can help predict GHG emissions in the city. Further,
I considered specific pollutants such as PM2.5, nitrogen dioxide and sulphur dioxide, which make up most of the city's pollution index.
My project report can be accessed through this link.
Back to top
iListen - Customized Playlist for Every Mood (Hackathon)
iListen is a web application that aims to decipher a user's emotion from their song preferences and potentially help people
who need it through counseling. We built it because today's fast-paced lifestyle has precipitated numerous negative and sensitive emotions,
and new technologies have gradually distanced people from one another. By building a communication and counseling platform via music, we can re-create a smaller world. The app essentially
stores user preferences for a training set of songs from Spotify and gives a score to each 'like' or 'dislike'. Based on user feedback, the scores for each song are re-weighted and
a customized playlist is presented to the user. A Spotify player is embedded on our page so the user can listen to the playlist on our webpage itself. Go ahead and check out
our application.
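The like/dislike re-weighting loop can be sketched roughly as below. This is an assumed, simplified scheme for illustration only (a fixed step added or subtracted per vote, then a top-N playlist); the app's actual scoring rule, the function names and the song ids are hypothetical.

```python
def reweight(scores, feedback, step=1.0):
    """Return new scores after applying (song, liked) feedback pairs."""
    updated = dict(scores)
    for song, liked in feedback:
        change = step if liked else -step
        updated[song] = updated.get(song, 0.0) + change
    return updated


def build_playlist(scores, size):
    """Pick the highest-scoring songs, breaking ties alphabetically."""
    return sorted(scores, key=lambda song: (-scores[song], song))[:size]


# Placeholder song ids and feedback.
scores = {"song_a": 0.0, "song_b": 0.0, "song_c": 0.0}
feedback = [
    ("song_b", True),
    ("song_a", False),
    ("song_b", True),
    ("song_c", True),
]
scores = reweight(scores, feedback)
playlist = build_playlist(scores, size=2)
```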
Back to top
Hedonic Price Model to Explain Rent Prices in New York City
Buying a new house is always crucial as you need the best value for your money. For this, you consider a lot of variables
before finalizing the house you want to purchase. These variables could be internal to the house, such as number of bedrooms and square footage,
as well as external, such as proximity to schools and the subway or the safety of the neighborhood. As one would assume, not all these variables are of
equal importance to a person. In this project, I combined a few such variables to create a hedonic price model for New York City. In general, when we run
regression on our models, we fail to account for a very important feature, i.e. geography. To overcome this challenge, I used Geographically Weighted Regression (GWR)
to treat the spatial component as a variable and Spatially Adjusted Regression, which gives different weights to each variable based on its geography. My model
explains 80% of the variance in the observed values. For detailed analysis, you can access the report here.
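To give a feel for how geography enters the regression, here is a minimal sketch of the core idea behind GWR: fit a separate weighted regression at each location, with nearby observations weighted more heavily. It is not the model from the report; it uses a single predictor, a Gaussian distance kernel and a bandwidth of 10 units, all of which are assumptions, and the data is made up.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Distance-decay weight: 1 at the target, shrinking with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, observations, bandwidth):
    """Weighted least-squares line at target.

    observations: list of ((x_coord, y_coord), predictor, price).
    Returns (intercept, slope) of the locally weighted fit.
    """
    tx, ty = target
    w = [
        gaussian_weight(math.hypot(px - tx, py - ty), bandwidth)
        for (px, py), _, _ in observations
    ]
    total = sum(w)
    x_mean = sum(wi * x for wi, (_, x, _) in zip(w, observations)) / total
    y_mean = sum(wi * y for wi, (_, _, y) in zip(w, observations)) / total
    sxy = sum(
        wi * (x - x_mean) * (y - y_mean)
        for wi, (_, x, y) in zip(w, observations)
    )
    sxx = sum(wi * (x - x_mean) ** 2 for wi, (_, x, _) in zip(w, observations))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Two made-up clusters: price = 10 + 2x in the west, 10 + 5x in the east.
west = [((0, 0), 1, 12), ((1, 0), 2, 14), ((0, 1), 3, 16)]
east = [((100, 0), 1, 15), ((101, 0), 2, 20), ((100, 1), 3, 25)]
data = west + east

_, slope_west = local_fit((0, 0), data, bandwidth=10.0)
_, slope_east = local_fit((100, 0), data, bandwidth=10.0)
```

A single global regression would report one blended slope, whereas the local fits recover a different price effect in each part of the map.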
Back to top
Analyzing and Modeling Spatial Change – Space Time Geography
The Space-Time geography project was particularly interesting because until then I had only worked with space-constrained
movement data with no temporal component. With this project, I was able to add time as a third dimension to the movement data
and create a 3-D visualization of how a person moves in space and time. Some interesting use cases that I found were in the
domain of traffic management, as during rush hours some streets are heavily crowded while others are comparatively empty. With Space-Time
data for a city, we can visualize and predict alternative routes during such peak hours. The project report can be found here.
Back to top
Classifying Land Use Pattern using Satellite Imagery
The project was inspired by Stanford's research on predicting poverty in Africa using night-time satellite imagery.
In this project, we study the land use pattern of different cities and analyse the change in land use over the years. Since we
had limited time, we focused on Houston, TX and studied the change in land use pattern from 1993 to 2015 using
satellite images from the years 1993, 2003, 2007, 2011 and 2015. We trained a random forest model to classify high density urban,
low density urban, non-urban and water areas. The model was able to predict the high density and low density urban areas with good accuracy.
For detailed information on our analysis, here's a link to our paper.
Back to top
Rent Burden Dashboard: A Data Visualization Project
It's no news that everyone living in one of the metro cities across the world is paying hefty amounts of money
to rent a housing unit. But in recent years, the situation has gone from bad to worse: by 2014, nearly half of the US population
was living under rent burden. We wanted to quantify and visualize the trends across the major cities of the US, so
we created a dashboard using R-Shiny that displays time series graphs for rent price, change in median rent price and a time lapse
for 10 cities across the US over the 8-year period from 2009 to 2016.
Back to top
Capstone: Public Sentiment Measures towards Police from Social Media
Community support plays a crucial role in the effectiveness of the prosecutorial justice sector and law enforcement. However, measuring public sentiment towards relevant policy implementations is difficult. Conventional methods are labor-intensive, time-lagged, and not scalable. As an alternative, we investigate the viability of, and strategies for, leveraging social network services to assess public sentiment in a massive, timely, and effective manner. Focusing on Twitter, we first evaluate strategies to harvest relevant tweets and mitigate contamination, including keyword selection and streaming pipeline modification. Then, using the collected tweets regarding prosecutorial justice departments at both local and national levels, we assess whether a sentiment baseline exists based on the positive, neutral, or negative sentiments perceived by existing Natural Language Processing models. Finally, we investigate geographical and longitudinal variations in sentiment. The results show that measuring such sentiment is challenging. First, Twitter’s free Streaming API is ideal neither qualitatively nor quantitatively for collecting the desired data. Second, the data is highly contaminated due to inevitably ambiguous keywords. Finally, most tweets appear either neutral or slightly negative in the two models respectively. However, it is possible to observe and compare longitudinal trends and changes instead of absolute values. We conclude with recommendations for similar research pipelines.
Here's a link to the project webpage.
Back to top
- © GauravBhardwaj
- Design: HTML5 UP