Projects

Since my program is applied in nature, I have worked on projects for every course.

Location Optimization for Homeless Shelters
Effect of AirBnB on Property Market
Quantifying the Lumps of Livability in New York and New Jersey
Assess the green spaces in Dublin city using LIDAR data (Hackathon)
Study how Heating Oil Based Boilers and Greenhouse Gases affect the air quality of NYC
iListen - Customized Playlist for Every Mood (Hackathon)
Hedonic Price Model to Explain Rent Prices in New York City
Analyzing and Modeling Spatial Change – Space Time Geography
Detecting Change in Urban Extent and Density: Machine Learning of Land Use Classes Applied to Houston, Texas
Rent Burden Dashboard: A Data Visualization Project
Public Sentiment Measures towards Police from Social Media

Location Optimization for Homeless Shelters

The motivation behind this project was the Mayor's plan to open 90 new homeless shelters across the city while closing down the cluster homes and tertiary shelters. We used the NYC Open Data portal and scraped additional data from webpages where it wasn't available in tabular format. We considered some of the most important variables for our model, such as proximity to the transit system and distance from schools. We ran a spatial analysis based on these variables and assigned a score to each zip code in New York City according to the weight of each variable. The final result was a map that suggests which zip codes would be most suitable for a new homeless shelter. For more information on the project, here's a link to our project report.
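The weighted scoring step can be illustrated with a minimal Python sketch. The criterion names, the weights, and the assumption that a higher raw value means a more suitable zip code are all hypothetical; the actual variables and weights are in the project report.

```python
# Hypothetical criteria and weights, for illustration only.
WEIGHTS = {"transit_proximity": 0.5, "school_distance": 0.3, "existing_shelters": 0.2}


def min_max_normalize(values):
    """Rescale values to the range [0, 1] so criteria are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_zip_codes(rows, weights):
    """Return a weighted suitability score per zip code.

    rows maps a zip code label to a dict of raw criterion values,
    where a higher value is assumed to mean "more suitable".
    """
    zips = list(rows)
    scores = {z: 0.0 for z in zips}
    for criterion, weight in weights.items():
        normalized = min_max_normalize([rows[z][criterion] for z in zips])
        for z, value in zip(zips, normalized):
            scores[z] += weight * value
    return scores


def rank_zip_codes(rows, weights):
    """Return zip code labels ordered from most to least suitable."""
    scores = score_zip_codes(rows, weights)
    return sorted(scores, key=scores.get, reverse=True)
```

Normalizing each criterion before weighting keeps a variable measured in large units (for example, meters) from dominating the score.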

Effect of AirBnB on Property Market

Through this project, we wanted to understand how AirBnB could affect the property market in New York City. Since communities often debate whether or not to allow AirBnB in their neighborhood, we considered some important factors in determining the correlation between housing prices and the number of AirBnB listings. We used data from DOF Annualized Rolling Sales, Inside Airbnb and PLUTO to create our model. Since the effect of a listing cannot be constrained to a known region, we ran a buffer analysis on the AirBnB listings to get some meaningful insights. The paper that we submitted for our project can be accessed through this link.
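As a rough illustration of the buffer idea, the sketch below counts how many listings fall within a fixed radius of a property sale. It is an assumption-laden stand-in rather than our actual workflow: the function names, the use of great-circle distance on latitude/longitude pairs, and the radius are all chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def listings_within_buffer(sale, listings, radius_m):
    """Count listings whose distance to the sale location is at most radius_m.

    sale is a (lat, lon) tuple; listings is an iterable of (lat, lon) tuples.
    """
    return sum(
        1
        for lat, lon in listings
        if haversine_m(sale[0], sale[1], lat, lon) <= radius_m
    )
```

The resulting count per sale can then be used as a predictor alongside the property attributes.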

Quantifying the Lumps of Livability in New York and New Jersey

The project dealt with quantifying the spatial variables that make up a livable neighborhood. Through this project, we wanted to see how the variables of livability, such as Education Level or Rent Burden, are correlated with each other spatially. We analyzed the census tracts in New York and New Jersey, and ran Global Moran's I and Anselin's Local Moran's I to analyze the correlation at the global level (in this case, state level) and at the local level (census tracts) for the two states. The analysis was carried out in ArcGIS and the data was collected from the American Community Survey (5-Year). Our analysis with images can be found here.
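For readers unfamiliar with the statistic, here is a minimal Python sketch of Global Moran's I. Our analysis used ArcGIS, so this is only an illustration of the standard formula; the function name and the dense weights matrix are assumptions made for the example.

```python
def global_morans_i(values, weights):
    """Compute Global Moran's I.

    values:  list of n observations (one per spatial unit).
    weights: n x n list of lists, where weights[i][j] > 0 when
             units i and j are neighbors and weights[i][i] == 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights.
    s0 = sum(sum(row) for row in weights)
    # Cross-products of deviations for neighboring pairs.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    # Total squared deviation from the mean.
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)
```

Values above zero suggest that similar values cluster together in space, while values below zero suggest that neighbors tend to be dissimilar.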

Assess the green spaces in Dublin city using LIDAR data (Hackathon)

This is by far one of the most interesting projects I've worked on. As part of a CUSP Hackathon, we were asked to assess and analyse the changes in green spaces across the city of Dublin. What made it more exciting was the use of LIDAR data (courtesy of Prof. Debra Laefer). Prof. Laefer has been working on LIDAR data collection for Dublin city for quite some time, and her work was recently published by NatGeo. Owing to the size of the dataset, we could only assess some portions of Dublin city, but we were able to extract features that show the difference between two datasets collected in different time periods. To see the cool images from LIDAR data, please visit our Github Page.

Study how Heating Oil Based Boilers and Greenhouse Gases affect the air quality of NYC

Every year, as soon as the weather gets cold, streams of black smoke rise from buildings in New York. Since many buildings still use old heating oils, it is worth studying air quality in the winter months to see how these boiler emissions affect the environment. Answering this question is not easy, as it depends on multiple factors such as the type of oil being used, which building types use these boilers (different laws apply to different building types), and whether residents can afford the fuel prescribed by the government. For my analysis, I studied the factors that cause an increase in greenhouse gases (GHG) and how oil type contributes to it, taking into account the socio-economic factors of the city. From the variables deemed important in my research and previous studies, I chose 5 crucial variables that can help predict GHG emissions in the city. Further, I considered specific pollutants such as PM2.5, nitrogen dioxide and sulphur dioxide, which account for most of the city's pollution index. My project report can be accessed through this link.

iListen - Customized Playlist for Every Mood (Hackathon)

iListen is a web application that aims to decipher users' emotions based on their song preferences and potentially help people who need it through counseling. Our motivation is that today's fast-paced lifestyle has given rise to many negative and sensitive emotions, and new technologies have gradually pushed people apart. By building a communication and counseling platform around music, we can re-create a smaller world. The app essentially stores user preferences for a training set of songs from Spotify and gives a score to each 'like' or 'dislike'. Based on user feedback, the score for each song is re-weighted and a customized playlist is presented to the user. A Spotify player is embedded on our page so the user can listen to the playlist on our webpage itself. Go ahead and check out our application.

Hedonic Price Model to Explain Rent Prices in New York City

Buying a new house is always a crucial decision, as you want the best value for your money. You consider a lot of variables before finalizing the house you want to purchase. These variables could be internal to the house, such as the number of bedrooms and square footage, as well as external, such as proximity to schools and the subway or the safety of the neighborhood. As one would assume, not all of these variables are of equal importance to a person. In this project, I combined a few such variables to create a hedonic price model for New York City. In general, when we run regressions on our models, we fail to account for a very important feature: geography. To overcome this challenge, I used Geographically Weighted Regression (GWR) to treat the spatial component as a variable, and Spatially Adjusted Regression, which gives different weights to each variable based on its geography. My model explains 80% of the variance in the observed values. For detailed analysis, you can access the report here.
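To give a feel for the general idea behind GWR, the sketch below fits a separate weighted regression at each target location, with nearby observations counting more than distant ones. It is a simplified illustration and not the model from the report: it uses a single predictor, a Gaussian kernel, a fixed bandwidth, and planar (projected) coordinates, all of which are assumptions made for the example.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Kernel weight that decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, points, x, y, bandwidth):
    """Fit y = intercept + slope * x by weighted least squares around target.

    target:    (easting, northing) of the location being estimated.
    points:    list of (easting, northing), one per observation.
    x, y:      predictor and response values, one per observation.
    bandwidth: kernel bandwidth in the same units as the coordinates.
    """
    w = [gaussian_weight(math.dist(target, p), bandwidth) for p in points]
    sw = sum(w)
    # Weighted means of predictor and response.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted variance and covariance terms.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope
```

Calling `local_fit` once per location yields a coefficient surface, which shows where in the city a given attribute has a stronger or weaker association with price.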

Analyzing and Modeling Spatial Change – Space Time Geography

The Space-Time geography project was particularly interesting because, until then, I had only worked with spatially constrained movement data with no temporal component. With this project, I was able to add time as a third dimension to the movement data and create a 3-D visualization of how a person moves in space and time. Some interesting use cases that I found were in the domain of traffic management: during rush hours, some streets are heavily crowded while others are comparatively empty. With space-time data for a city, we can visualize and predict alternative routes during such peak hours. The project report can be found here.

Classifying Land Use Pattern using Satellite Imagery

The project was inspired by Stanford's research on predicting poverty in Africa using night-time satellite imagery. In this project, we set out to study the land use patterns of different cities and analyse the change in land use over the years. Since we had limited time, we focused on Houston, TX and studied the change in land use pattern from 1993 to 2015 using satellite images from 1993, 2003, 2007, 2011 and 2015. We trained a random forest model to classify high density urban, low density urban, non-urban and water areas. The model was able to predict the high density and low density urban areas with good accuracy. For detailed information on our analysis, here's a link to our paper.

Rent Burden Dashboard: A Data Visualization Project

It's no news that people living in major metropolitan cities around the world pay hefty amounts of money to rent a housing unit. But in recent years the situation has gone from bad to worse: by 2014, nearly half of the US population was living under rent burden. We wanted to quantify and visualize the trends across the major cities of the US, so we created a dashboard using R-Shiny that displays time series graphs for rent price, change in median rent price and a time lapse for 10 cities across the US over a period of 8 years, 2009 to 2016.

Capstone: Public Sentiment Measures towards Police from Social Media

Community support plays a crucial role in the effectiveness of the prosecutorial justice sector and law enforcement. However, measuring public sentiment towards relevant policy implementations is difficult. Conventional methods are labor-intensive, time-lagged, and not scalable. As an alternative, we investigate the viability of, and strategies for, leveraging social network services to assess public sentiment in a massive, timely, and effective manner. Focusing on Twitter, we first evaluate strategies to harvest relevant tweets and mitigate contamination. This includes keyword selection and streaming pipeline modification. Then, using the collected tweets regarding prosecutorial justice departments at both local and national levels, we assess whether a sentiment baseline exists based on the positive, neutral, or negative sentiments perceived by existing Natural Language Processing models. Finally, we investigate geographical and longitudinal variations in sentiment. The results show that measuring such sentiment is challenging. First, Twitter’s free Streaming API is ideal neither qualitatively nor quantitatively for collecting the desired data. Second, the data is highly contaminated due to the inevitably ambiguous keywords. Finally, most tweets appear either neutral or slightly negative in the two models respectively. However, it is possible to observe and compare longitudinal trends and changes instead of absolute values. We conclude with recommendations for similar research pipelines.
Here's a link to the project webpage.

Design: HTML5 UP