Analyzing Text Data

After a great introduction to text analysis and probability mass functions (PMFs) from Allen Downey of Olin College (author of Think Stats), students had a chance to play with quantitative text analysis.  Grabbing lyrics from a website and analyzing them with our WordCounter tool, the students looked at the words and phrases used most often by various artists.
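If you want to try this kind of counting in code, here is a minimal Python sketch of the idea: count words and adjacent word pairs, then turn the counts into a PMF (each word's share of all the words). This is only an illustration under my own assumptions, not how the WordCounter tool works internally; the function names and the sample text are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word appears."""
    return Counter(tokenize(text))


def bigram_counts(text):
    """Count how often each pair of adjacent words appears."""
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


def pmf(counts):
    """Turn raw counts into a PMF: each item's share of the total count."""
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


# Placeholder text for illustration, not real lyrics.
sample = "you and me and you and I"
counts = word_counts(sample)
print(counts.most_common(2))   # the two most frequent words
print(pmf(counts)["you"])      # share of all words that are "you"
```

To compare artists, you could run `pmf(word_counts(...))` on each artist's lyrics and look at the values for words like "you", "i" and "me". Using shares rather than raw counts keeps an artist with more songs from dominating the comparison.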

Here are some notes on running this text analysis activity.

Here are some pictures of what they sketched out in the 20 minutes I gave them:

which artists talk about “you”, “I” and “me”
who Johnny Cash talks about
which parts of the body Nicki Minaj and Eminem talk about
the repeating chorus refrains of the Indigo Girls
how much different artists talk about love
what Coldplay sings about
narcissism among big-time artists


Our Questions about Food Security

We’ll be forming teams for final projects.  Each project will explore some topics related to food security. To help start forming teams, we brainstormed topics and questions you are interested in exploring.  Read this list and decide which topic is most interesting to you:

Local Food / Sustainability

  • role of small farming in current global food economy
  • farmers market prices vs. other supermarkets (how can we make local products more accessible, esp to low-income households)
  • local food & community-building
  • effectiveness / scalability of CSA/local food (also freegans)
  • how can we address local food suppliers? what problems are they facing?
  • how far do people around the world have to travel to get food?
  • comparative look at success of community garden types (public, private, school, government-run, non-profit, etc)

Environment / Climate Change

  • how has climate change affected food security?
  • climate change & crop production in the midwest USA
  • relation between climate change & food security & what we can do to affect either / both
  • impact of projected climate change on food suppliers
  • fluctuations in the amount of food aid a community needs as seasons change
  • environmental / toxicological food security impact from long-term contamination of food staples in populations


  • nutrition distribution (ex % fat, % protein) around the world
  • how and what are we gonna eat in 20 years? food vs. future
  • what is the total monetary and nutritional value at the cut-off value for food insecurity, and how do these values vary by state or by country
  • what are the best practices for feeding children in a safe and healthy way?
  • how can we improve nutrition in local public schools?

Economics / Indicators

  • compare / analyze food insecurity level and other health stats (obesity, heart disease, mental disorder-depression, addiction, etc)
  • food security as an indicator of other measurements / inequalities of wellbeing (i.e. economics, education)
  • relationship between economic condition and what is considered an average / balanced meal (beliefs about food / nutrition)
  • price fluctuations and impact on poor
  • relationship between economic condition and nutrition availability – what can you eat on a certain budget (by country)
  • how does food insecurity impact education / academic performance among children in the US?
  • relationship between school food programs and school performance by economic condition
  • quantifying social & economic outcomes of food security?

Outreach / Education

  • food security education around Boston area – in school (for kids), for parents
  • how can we better educate people about food security?


  • investigating gov policies that exacerbate (or help address) food insecurity, what are the drivers / forces at work? who benefits from such policies?
  • comparisons of different strategies of tackling food security issue
  • how have governments been keeping their promises wrt food security goals?
  • how to identify policy gaps and other factors outside of food supply levels that correlate with food insecurity?
  • do efforts directed at food security end up being counter-productive? (the source of the problem lies elsewhere?)


  • how can you confirm that food is properly utilized (ex. eaten)?
  • how can we improve food usage and limit waste?
  • how can we use data to characterize and understand the causes and scope of food waste around the world?
  • how do the locations with the highest food waste compare to those with the highest food insecurity? Is there a correlation between food waste and food security?

Food System

  • genetically engineered crops (starting w/green revolution) and food security
  • effect of GMOs on food supply / safety
  • pesticides / GMO and food security
  • animal agriculture / factory farming
  • how does availability of natural resources (water, land, etc) affect food security of a community?
  • how is agriculture being impacted (urbanization, desertification) and how does it affect food security?

Other Stuff

  • relating yelp (micro) and dining data to macro scale / views such as food security indicators published by UN
  • how to solve food deserts
  • food stamps and how that affects # of visitors to soup kitchens, etc per week (I think they’re given out @ the beginning of the month)
  • prison food security
  • how food insecurity changes the decisions you make every day
  • how do you ensure accuracy of data? (regarding nutrition, diet, etc)

What are Ethical Uses of Data?

Ethical questions are critical to effective and responsible use of data.  Since they are often overlooked, I’ll be making a special effort to weave conversations about ethics into each module of this course. There are no standards in the industry around ethics right now, though there are many efforts underway.

In our review of Joel Gurin’s paper Open Governments, Open Data: A New Lever for Transparency, Citizen Engagement, and Economic Growth, students reflected on ethical questions related to three proposed scenarios.  Below is a short summary of their first set of responses to these scenarios.

Scenario 1: Big Data

a company is logging purchases made by each customer and using the transaction data to make personalized marketing efforts

The key questions discussed were about:

  • ownership – people could reasonably assume they own this information, not the companies
  • transparency – people often aren’t aware this data is being collected about them
  • secondary uses – this data is often sold to third parties to do analysis
  • unintended impacts – citing the famous Target “you’re pregnant” story
  • reinforcing existing filter bubbles – personalized marketing might reinforce purchase decisions that you don’t want to make anymore

Scenario 2: Open Data

a data analytics firm is analyzing social media sentiments towards a politician to gauge their electability

Here students were concerned about:

  • representativity – social media is seldom a reflection of society at large
  • trustworthiness – people often make things up
  • ownership / permission – people posting to social media often aren’t giving explicit permission to these uses

Scenario 3: Local Data

a city government is using a 311 phone service to monitor and resolve constituent concerns

The students had these questions about this situation:

  • trustworthiness – constituents could make fake reports to get people in trouble
  • anonymity – one student shared a story of poorly anonymized data
  • accuracy – many of the calls might be hard to categorize in their system, and their code-book might be inconsistently applied

Painting a Food for Free Data Mural

We finished our Data Mural process by painting the mural we designed together!


After finding the data-driven story we wanted to tell and then collaboratively sketching out the mural, it is great to see it finished!

Special guest Emily Bhargava helped turn the sketch I created into a design on a large tarp.  Then we all worked together – some finalizing the data to include while others painted:


Here is the final picture:

food-for-free mural


Designing a Food for Free Data Mural

To kick off our class, we’ve been creating a data mural for Food for Free, a fantastic local non-profit with a large food-rescue program.  Once we turned their data into a story, we spent a class turning that story into a visual design to paint as a mural!  This was a fairly standard data mural process, though very short on time, and I tried some new things to connect the visual back to the data it started from!

The Story

We started off by reminding ourselves of the story we found in the data:


Seeding the Visual Design

First we did some word-webs to try and make some of the more abstract words concrete.  This activity helps by giving us a visual vocabulary we can pull from while designing an image-based narrative.  For this story, I chose to pull out the words impact, partner, waste, and security – we made a word web for each. Here’s the one the students made for impact:


Then we did our pass-around drawing exercise, to try and turn the data-driven story into a visual narrative.  The students made a ton of great drawings:


I introduced a few new wrinkles to this activity this time around.  On the second-to-last pass, I asked students to look back at the word-webs and see if each of those key concepts was incorporated in the drawing in front of them.  If one wasn’t, I asked them to try and add it.  In addition, on the last pass, I asked students to look at the design and back at the initial data handout.  If there was a piece of the narrative that could easily be linked to and supported by some of the data, I asked them to add it.  This change brought us back full-circle to the data we started with, and helped us keep the visual narrative well connected to the qualitative and quantitative data it came from.

Synthesizing a Mural Design

Looking at these all together, we saw some commonalities that we really liked:

  • Food for Free was represented by a truck a lot
  • the recipients of their services were almost always drawn as people, while the donors were drawn as buildings
  • there were a lot of roads being used as scaffolding to connect visual elements
  • there were a number of drawings that used plants to symbolize growing

After discussing these, and other observations, we decided to go with a tree as the central visual metaphor.  The roots would be the donors; the trunk would bring food to the leaves where recipients could pick off fruit and eat it.  Food for Free trucks would be like little ants, moving up and down the roots and tree.  Here’s the super-rough sketch I put together during the discussion in class:


Next Steps

Over the next few days, my collaborator Emily will turn this into a polished design and we’ll prep a canvas for it.  In the next class, we’ll paint it!  We only have an hour and a half, so the design won’t be too complicated.

A Data Story about Food for Free

To kick off the semester we welcomed Sasha Purpura, the Executive Director of Food for Free, to share some of their food rescue data and information about food rescue and food insecurity in the US.  Food for Free does food rescue, and other programs, in the area of Cambridge, MA.  Students pored over a Food for Free Data Handout we created, looking for stories they might want to tell.


This is a bit of a departure for the data mural efforts, as all past ones have involved the community group themselves, and the people they serve.  This one, however, is being designed by the students in the class, not Food for Free staff and program recipients.  That is worth acknowledging, but not a barrier to the process.  This mural is primarily an exercise for the students in low-tech story-finding and visual design; the secondary goal is to deliver something of use to Food for Free.

Each of the four teams found a story they wanted to tell:





As you can see, they ranged from very focused, to more broad.  Two of them focused on Cambridge, while others looked at impact.

In an abbreviated version of our story-selection process, we defined criteria for selecting a story and then decided to go with a merged story that I proposed:

The data show that Food for Free is growing our work with local partners to have an even greater impact on the issue of food security in Cambridge.  We want to tell this story because there is still food waste in the area and we want to bring on more partners to help us fulfill our mission.


What Do We Want to Learn?

To help me focus the various modules, I asked everyone to write up some sticky notes indicating what they wanted to learn from this course.  Certainly there is an aspect of “you don’t know what you don’t know”, but the exercise is still valuable for me.  These goals line up very nicely with the syllabus I have planned!

what do you want to learn

Here is the typed-up list:

  • working with unstructured data
  • image design
  • data -> story
  • data visualization tools
  • what defines effective data presentation and how to achieve it
  • telling a compelling story
  • data cleaning tools
  • use-cases and lessons learned
  • implementation in community (experience or) how to
  • technical and design visualization skills
  • what is valuable data?
  • how to better use the data as a narrative
  • best ways to visualize data in different contexts
  • where to go for data
  • develop my creative thinking skills
  • what is an effective visualization?
  • the process of going from an idea to implementation and how you know if you are successful or if you need to revise
  • what makes effective visualization (tools / concepts)
  • when to use what type of presentation
  • statistics!  what does the data mean
  • data storytelling, digital tools ( for cleaning, collecting, presenting)
  • filtering data
  • different ways to present (apart from video or graphs)
  • techniques on storytelling
  • ethical data visualization
  • turning quantitative -> qualitative (w/out losing meaning)
  • making websites beautiful
  • overview of all methods to do data projects
  • data processing tools
  • how to see patterns and interesting (?) in numbers
  • techniques for effectively representing data for social justice
  • new tools and strategies
  • narration building / story boarding
  • (internet-based) data mining skills!
  • meaningful info from data
  • what makes people change minds
  • have a chance to apply good design principles
  • how to make data beautiful!
  • how to make data viz visually appealing / design language
  • related fields / areas of work and theory
  • how can you make complex ideas accessible without losing clarity
  • background overview
  • databases / data cleaning
  • ways to engage different communities with data
  • effective ways to visualize data (design focus)
  • types of visualization (ex. beyond basic charts / plots)
  • how to most effectively use context
  • hands on experience with data and compare different visualizations
  • how to leave a lasting impression

Learning from Each Other

One of our first activities was to assess what skill sets people had in the room.  I am particularly interested in opportunities for us to all learn from each other, so we spent some time having folks indicate whether they were a novice or an expert on the following topics:

  • data munging
  • graphic design
  • statistical analysis
  • visualization
  • writing

Of course this type of work needs more skills than that, but these are some key ones.  It turned out that we have a good distribution of skill levels on all these topics!

Here are the big papers we used, so you can see for yourself!

data design stats viz writing

What do “Data”, “Storytelling” and “Studio” mean?

One of our first exercises focused on creating a shared definition of the title of this course.  The words “data”, “storytelling” and “studio” are all kind of nebulous!  In order to figure out what we all think they mean, we created sticky notes about what each of those words meant to us.  After clustering them, we ended up with a better shared sense of the course’s title.


Data: presentation, quantitative, processing, objectivity, materials, science, context


Storytelling: audience, empathy, relate-ability, facts/connections, purpose


Studio: creativity, groups work on projects, open space

I was pleased to see that these matched my own expectations and plans fairly well!


This is the shared class blog for the CMS.631 / CMS.831 Data Storytelling Studio course at MIT (Spring 2015).  Many of your homework assignments will require you to submit blog posts here.  Feel free to cross-post them to your own blog.  Much of the conversation about data, finding stories, creating presentations, and creating change happens online; you need to add your voice to that conversation if you plan to do this type of work.