Data Hunt

Group: Val Healy, Tuyen Bui, Hayley Song

    For our data hunt, we chose to examine the 2013 Boston Employee Earnings dataset ( This dataset includes city workers’ names, title, department, earnings (broken down by type), and zip code.

One interesting finding is the seeming correlation between department and earnings. We (tentatively) found, by looking at the data, that Boston Police workers tend to be the highest paid city employees overall, with 44/50 of the highest paid workers being from that department. However, much of their earnings came from sources other than their regular pay, such as overtime, ‘other’, ‘detail’, and ‘quinn’.

We came up with three questions of the data, which are detailed below:

  1. How is the budget earnings allocation per department? Where is the money spent on people? Even though we noticed Boston Police workers seemed to be the “better paid”, when we look closer at the dataset, we can see that the Boston earnings budget is spent on Public Schools employees with over $600M VS $345M for the Boston Police Department. One way to understand it is that the Public Schools budget is high because it has to pay a higher number of employees (over 50,000 people).
  2. We were also curious about the relationship between the incomes and places of residency.  We conjectured that different income levels would contribute to where people choose to live; we would like to see the distribution of locations of residency grouped by the income levels.  The report provides us enough information to answer this question: total earnings and zip codes.  First we need to sort the data by income and group them into four income levels: low, low-middle, middle-high, high.  We need to have some context in order to set the breakpoints for these four categories. We realized that it would be helpful to have data on Massachusetts’s annual average or median income in 2013.  We were able to find the data by querying U.S. Census Bureau’s database. Using the data, we can establish the range for each category. Then, we can scatter-plot the distribution of each group on the map of Greater Boston Area.  The map can be easily found online, but we prefer to use python’s Basemap and Matlibplot libraries with the appropriate longitude and latitude to display the distribution.
  3. Lastly, we were interested in visualizing the breakdown of the Boston Police employees’ wages, as much of their earnings were comprised of earnings outside of their regular pay. What percentage of their pay is due to overtime or other sources? Does this percentage vary by position? How do they compare? To accomplish this, we would take the data from all police employees, add up the numbers in each category, and produce a pie chart of the results. If we wished to break the numbers down further, we could separate the data by position and create a set of pie charts. All of this data can be sourced in the original data sheet.

Leave a Reply

Your email address will not be published. Required fields are marked *