Drought Data Curtain

Ceri Riley and Val Healy

For our data sculpture, we decided to make a beaded curtain visualizing the percentage of cattle affected by drought over the past year. Our goal was to design a data-centric piece of home decor that would showcase this data while also serving as an attractive decoration. Our audience is anyone looking to incorporate data-driven design into their home space.

To reduce the number of strands and beads we needed, we first condensed the data by averaging the total percentage of cattle affected by drought for each month in the graph. Each monthly average determines the proportion of red beads placed at the bottom of that month’s strand, rounded to the nearest whole bead, as shown in the table below. One flaw of this method is that the first and final strands represent less than a full month’s worth of data, since we made the curtain in the middle of this month and our data set only covers the past year.

To design the curtain itself, we researched typical dimensions of pony beads and settled on 300 beads per strand; thus, three beads represent one percentage point, and the thirteen strands require 3,900 beads in total. The number of each type of bead on each strand is listed in the table below.
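
To make the bead arithmetic concrete, here is a minimal Python sketch of the conversion described above. It assumes the monthly averages have already been computed; the dictionary simply restates the percentages from the table, and the rounding to whole beads is ours.

    # Minimal sketch: convert each month's average drought percentage into
    # bead counts for a 300-bead strand (3 beads per percentage point).
    BEADS_PER_STRAND = 300

    monthly_pct = {
        "April 2014": 44.33, "May 2014": 45.75, "June 2014": 39.75,
        "July 2014": 35.8, "August 2014": 34.75, "September 2014": 30.4,
        "October 2014": 28.25, "November 2014": 27.75, "December 2014": 28.6,
        "January 2015": 25.75, "February 2015": 27.0, "March 2015": 30.2,
        "April 2015": 36.0,
    }

    total_red = total_blue = 0
    for month, pct in monthly_pct.items():
        red = round(pct / 100 * BEADS_PER_STRAND)   # "drought" beads
        blue = BEADS_PER_STRAND - red               # "empty" beads
        total_red += red
        total_blue += blue
        print(f"{month}: {red} red, {blue} blue")
    print(f"Total: {total_red} red, {total_blue} blue")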

Lastly, we decided to use red beads to represent the drought data due to the color’s association with danger, fire, and generally bad things. We chose to use blue beads to represent the percentage of cattle not affected by drought due to the color’s association with water and generally good things.


(picture coming soon)


Date             Percentage affected   Red (“drought”) beads   Blue (“empty”) beads
April 2014       44.33                 133                     167
May 2014         45.75                 137                     163
June 2014        39.75                 119                     181
July 2014        35.80                 107                     193
August 2014      34.75                 104                     196
September 2014   30.40                  91                     209
October 2014     28.25                  85                     215
November 2014    27.75                  83                     217
December 2014    28.60                  86                     214
January 2015     25.75                  77                     223
February 2015    27.00                  81                     219
March 2015       30.20                  91                     209
April 2015       36.00                 108                     192
Total                                  1302                    2598

Google Chrome Scraper Extension

What is the Google Chrome Scraper extension?

The Google Chrome Scraper extension is a browser extension for Google Chrome that allows users to quickly and easily scrape data from websites inside the browser.

How do you get started using it?

To get started, you can download the extension from the Chrome Web Store or from the scraper’s official website. The site also includes tutorials, documentation, screenshots, test sites to help first-time users practice with the tool, and links to a help forum and bug tracker. If you have no experience scraping, I recommend reading through these resources to get a sense of how the tool works, and watching the introductory video on the site.

If you know the basics of scraping, the documentation is still useful for getting a sense of the tool’s capabilities and specifics. For example, the scraper supports many types of selectors and selection methods. The extension also supports pagination, that is, scraping across multiple pages of a website.
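
To give a sense of what a selector tree with pagination looks like conceptually, here is a small sketch expressed as a Python dictionary. This is not the extension’s actual export format, and every field name below is an illustrative assumption; it only shows the idea of a start page, a pagination selector that follows “next” links, and a child selector that extracts values from each page.

    # Conceptual sketch only; not the extension's real configuration format.
    # It illustrates a selector tree: a start page, a pagination selector that
    # follows "next page" links, and a child selector that pulls text out of
    # every listing on each page. All names below are made up for illustration.
    selector_tree = {
        "start_url": "https://example.com/listings?page=1",  # hypothetical site
        "selectors": [
            {
                "id": "next_page",
                "type": "link",        # pagination: follow this link and repeat
                "css": "a.next",
                "follow": True,
            },
            {
                "id": "listing_title",
                "type": "text",        # extract the text of each matched element
                "css": ".listing h2",
                "multiple": True,      # one row per listing, not just the first
            },
        ],
    }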

How easy or hard is it? What skills do you need to become proficient in it?

This scraper is similar to many other scrapers available today. If one has experience scraping, the tool is very intuitive. If one has little or no experience scraping, the documentation is fairly comprehensive without being overwhelming, and there are plenty of opportunities for support.

However, to become proficient in the tool, one must understand the process of scraping, as well as concepts such as selector trees, different types of selectors, and pagination. Learning these concepts, as well as how to use the tool, does take some time and effort. It took me, with some prior scraping experience, about an hour to read through all of the documentation and to learn how to use the tool.

Because it is a Chrome extension, no other downloads, software, or accounts are required, unlike some other tools, such as import.io. The tool, much to the delight of my Linux-loving heart, has no operating system requirements. However, the extension works only in Chrome, so you must have a current version of the browser installed.

One drawback to the tool is that data can only be downloaded in CSV format, so if you are looking to use it in a database (MySQL, for example), you must import it manually or find some other workaround.
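
As a rough sketch of that workaround, the snippet below loads a scraped CSV into a database table. It uses Python’s built-in sqlite3 module as a stand-in for MySQL (with MySQL the pattern is the same, just through a MySQL driver), and the file name and column names are hypothetical.

    # Sketch: import a scraped CSV into a database table by hand.
    # sqlite3 stands in for MySQL here; the file and column names are made up.
    import csv
    import sqlite3

    conn = sqlite3.connect("scraped.db")
    conn.execute("CREATE TABLE IF NOT EXISTS listings (title TEXT, price TEXT)")

    with open("scraper_export.csv", newline="", encoding="utf-8") as f:
        rows = [(r["title"], r["price"]) for r in csv.DictReader(f)]

    conn.executemany("INSERT INTO listings (title, price) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()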

I also suspect, judging by its simplicity, that this tool is not as powerful as some other scrapers. However, for what it is (a free browser extension), it is quite capable.

Would you recommend this to a friend?

I would recommend this to anyone looking for an easy way to scrape website data into CSV form without having to create an account or download specialty software. Because the tool is a Chrome extension, acquiring it is simple. The documentation makes it fairly easy for novices to use, especially if they have a baseline knowledge of scraping.

Will you consider using it for your final data story?

I will definitely consider using this tool for my final data story. This scraper tool is free, intuitive, powerful, and requires no account. The documentation and web support are well-rounded, giving me confidence that I will be able to work out any snags I encounter.

Data Hunt

Group: Val Healy, Tuyen Bui, Hayley Song

    For our data hunt, we chose to examine the 2013 Boston Employee Earnings dataset (https://data.cityofboston.gov/Finance/Employee-Earnings-Report-2013/54s2-yxpg). This dataset includes city workers’ names, titles, departments, earnings (broken down by type), and zip codes.

One interesting finding is the apparent correlation between department and earnings. Looking at the data, we (tentatively) found that Boston Police workers tend to be the highest-paid city employees overall, with 44 of the 50 highest-paid workers coming from that department. However, much of their earnings came from sources other than their regular pay, such as overtime, ‘other’, ‘detail’, and ‘quinn’.

We came up with three questions of the data, which are detailed below:

  1. How are earnings allocated across departments? Where does the money spent on people go? Even though Boston Police workers seemed to be the better paid individually, a closer look at the dataset shows that the largest share of the city’s earnings budget goes to Public Schools employees, at over $600M versus $345M for the Boston Police Department. One way to understand this is that the Public Schools budget is high because it pays a much larger number of employees (over 50,000 people).
  2. We were also curious about the relationship between income and place of residence. We conjectured that income level influences where people choose to live, and we would like to see the distribution of residences grouped by income level. The report provides enough information to answer this question: total earnings and zip codes. First, we would sort the data by income and group it into four income levels: low, low-middle, middle-high, and high. We need some context to set the breakpoints for these categories, so it would be helpful to have Massachusetts’s average or median annual income in 2013, which we were able to find by querying the U.S. Census Bureau’s database. Using that data, we can establish the range for each category and then scatter-plot the distribution of each group on a map of the Greater Boston area. Such a map can easily be found online, but we would prefer to use Python’s Basemap and Matplotlib libraries with the appropriate longitudes and latitudes to display the distribution.
  3. Lastly, we were interested in visualizing the breakdown of Boston Police employees’ wages, since much of their earnings came from sources outside their regular pay. What percentage of their pay is due to overtime or other sources? Does this percentage vary by position, and if so, how? To find out, we would take the data for all police employees, sum the numbers in each category, and produce a pie chart of the results. If we wished to break the numbers down further, we could separate the data by position and create a set of pie charts. All of this data is available in the original data sheet (a rough sketch of these questions appears after this list).
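
As a rough sketch of how these questions might be approached in Python, assuming the dataset has been exported to CSV: the file name and column names below are guesses and would need to be checked against the actual report, and the income breakpoints here are simple quartiles rather than the Census-based categories described above.

    # Rough sketch for questions 1-3. File and column names are assumptions.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("employee_earnings_2013.csv")  # hypothetical export

    # Question 1: total earnings paid out by department.
    print(df.groupby("DEPARTMENT")["TOTAL EARNINGS"].sum().sort_values(ascending=False))

    # Question 2: bin employees into four income levels. pd.qcut uses quartiles;
    # in practice we would set breakpoints from Census income data instead.
    df["income_level"] = pd.qcut(
        df["TOTAL EARNINGS"], 4,
        labels=["low", "low-middle", "middle-high", "high"],
    )
    print(df.groupby("income_level")["ZIP"].nunique())

    # Question 3: breakdown of Boston Police pay by category.
    police = df[df["DEPARTMENT"] == "Boston Police Department"]
    pay_columns = ["REGULAR", "OVERTIME", "DETAIL", "OTHER", "QUINN"]
    totals = police[pay_columns].sum()
    totals.plot.pie(autopct="%1.0f%%", title="Boston Police pay by category")
    plt.ylabel("")
    plt.show()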

Food for Free: Finding the Story

Recently, our class worked to conceptualize our data mural for the organization Food For Free. I was absent for the last class, in which we carried out most of the visual design of the mural, so I will focus instead on the process from the previous class.

In this class session, the executive director of Food For Free visited our class to discuss the organization, its purpose, its functions, and its future goals. This visit exposed us first-hand to Food For Free, which was indispensable to our conceptualization of the organization.

In the readings, Segel and Heer emphasize the importance of narrative to the process of data visualization. Narratives draw readers to a story and help readers to understand and internalize the message being conveyed. Though the authors focused on its impact on journalistic practices, their insights on narrative apply well to our situation.

In our process, we first educated ourselves on the mission and practice of Food For Free. Once we felt well-grounded in the topic, we looked at some data about the group that was provided to us, brainstorming in groups to identify potential stories held within the data. Once each group came up with a viable story, we presented the stories to the rest of the class and combined them into one consolidated group story.

Unlike traditional storytelling, our mural will not have a beginning, middle, and end, nor will it include much verbal or written content. Instead, it will employ visual cues and symbolism to convey our story about the organization.

Val’s Data Log (2/8/15)

  • woke up, notified the sleep-tracking app on my Android phone
  • checked email, social media (replied to several emails and one Facebook message)
  • checked my living group’s meal plan signup Google spreadsheet
  • looked at Google Calendar for the upcoming week
  • checked CMS.619 syllabus (Google Doc) for clarification on assignment
  • clicked hyperlink to class blog (to read others’ blogs)
  • checked wunderground.com in anticipation of the snowstorm (data on website sourced from many measurement sites, my visit logged)
  • read several articles posted on Facebook.com by friends (clicks and likes logged by Facebook, views of articles logged by their respective websites)
  • logged 2 books in my goodreads.com “to-read” queue
  • showered (total water use measured by City of Cambridge)
  • brushed teeth several times during the day (water use)
  • used the bathroom several times throughout day (water use)
  • ate communal food at living group (all communal food purchased with house card, logged by the financial services corporation (i.e., Visa/AmEx/etc.); total food use measured weekly by “stewards” who purchase food)
  • heated food on gas stove (amount used measured by gas supplier)
  • used various electronic devices (electricity used is measured by electricity supplier)
  • logged TA work hours for this week on MIT’s Atlas website
  • listened to music on Bandcamp.com (plays logged by site)
  • listened to music on Grooveshark (data both logged by site and sent to my Last.fm account)
  • worked on several assignments on LibreOffice (data stored on my computer)
  • reblogged 2 Tumblr posts, added 14 to queue, liked 5 (data logged by Tumblr)
  • printed assignments and readings for lab class (documents downloaded to computer from course site and data sent to printer)
  • spent most of the day with my cell phone in my pocket (data usage and location tracked)
  • spent time with housemates, most of whom also have data and location-tracking phones
  • notified sleep tracking phone app that I was going to bed

Social Mapping the City

In this TED talk, Dave Troy presents social maps of cities that he created by analyzing users’ Twitter data and locations. He analyzed each user’s primary interest, color-coded it, and mapped it to the user’s location, drawing lines between connected users. He found that, in each city, users’ primary interests tended to clump geographically, creating interest boroughs of sorts.

Given that his map of Baltimore specifically labeled the “Geek” area as also the “TEDx” area of the city, it seems that his intended audience for the TED talk consists of other data geeks and TED enthusiasts. In addition, I think the maps could be useful for urban sociologists and those who study the connections between online social behavior and offline location, culture, and behavior.

Troy’s research and presentation aim to examine social separation within cities, which he views as a social construct that we could collectively choose to abandon. While I feel that the data visualizations, taken without comment, provide useful information, I do not agree with his conclusion. Though he mentions gentrification in his talk, he does not seem to acknowledge that many of the people in the cities he examines (specifically, those being pushed out by gentrification) literally cannot afford to move into other areas of the city because they are too expensive. His presentation does not effectively account for this.

I do, however, feel that the data visualizations themselves effectively show the separations of and connections between different interest bubbles in these cities. It would be interesting if he could somehow incorporate income distribution into the visualizations, as I suspect there is a significant correlation, and doing so might help reveal some of the economic underpinnings of these bubbles.