Friday, September 5, 2008

In progress

I'm still working on scraping Craigslist (and hoping I don't crash any computers), but here are my ideas for organizing it:

I'd like to focus on the content of the posts, with respect to the characteristics of the poster, location and date. For now I think I'm going to focus on the personals section, since it contains the most personal characteristics. So here are the potential organizational systems, according to LATCH:
  • Location: Geographic location by region or by city.

  • Alphabetical: name of poster (as determined by the word following "my name is "; this would probably be pretty rare anyway though), name of city

  • Temporal: Date, Day of the Week

  • Categorical: Men-seeking-women, strictly-platonic, missed connections, etc.; gender of the poster.

  • Hierarchical: poster's age, poster's weight/height, length of entry, number of reposts (this could be tricky to count though), number of times a particular word/phrase is used
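
A few of the attributes in the list above could be pulled out of a post's raw text roughly like this. It's only a sketch: the function name `extract_features` and the field names are made up for illustration, and the "my name is" rule is the untested heuristic from the list, not something checked against real posts.

```python
import re

# Heuristic from the list above: the name is the word following "my name is ".
NAME_PATTERN = re.compile(r"\bmy name is ([A-Za-z]+)", re.IGNORECASE)


def extract_features(body, phrase):
    """Return a few sortable attributes for one post body (hypothetical fields)."""
    match = NAME_PATTERN.search(body)
    return {
        "name": match.group(1) if match else None,  # most posts will have none
        "length": len(body.split()),                # entry length in words
        "phrase_count": body.lower().count(phrase.lower()),
    }


features = extract_features(
    "Hi, my name is Sam. I am fun loving and easygoing.", "fun loving"
)
```

Counting reposts isn't covered here, since that would mean comparing posts against each other rather than looking at one post at a time.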
And then I had a couple ideas for organizing the temporality of the information. Ideally I'd like the user to have some control over this order.

Word Use > Poster's Characteristics > Category > Date > Location
Date > Category > Poster's Characteristics > Word Use > Location

Since I'm working in an interactive medium, I'd like to make grouping a user-defined aspect. For example, the user might compare entries that contain the phrase "fun loving" to those that don't.
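
The "fun loving" comparison could be sketched as a simple two-way split. This assumes each scraped post is a dictionary with a `body` field, which is a made-up structure for illustration; `normalize` and `split_by_phrase` are hypothetical helper names. Hyphens are treated as spaces so that "fun-loving" and "fun loving" land in the same group.

```python
import re


def normalize(text):
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower().replace("-", " ")).strip()


def split_by_phrase(posts, phrase):
    """Partition posts into (containing, not_containing) for a user-chosen phrase."""
    needle = normalize(phrase)
    containing, not_containing = [], []
    for post in posts:
        target = containing if needle in normalize(post["body"]) else not_containing
        target.append(post)
    return containing, not_containing


sample_posts = [
    {"id": 1, "body": "Fun-loving guy looking for a hiking partner"},
    {"id": 2, "body": "Quiet reader seeks the same"},
    {"id": 3, "body": "I'm fun  loving and like to cook"},
]
with_phrase, without_phrase = split_by_phrase(sample_posts, "fun loving")
```

This is plain substring matching, so it would also match the phrase inside longer words; that seems acceptable for a first pass.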

Hurricanes part 2

So with the hurricane information I've been looking at the number of deaths, location in America, and category of storm (which is based on wind speed).

L: The location where the hurricane hits, where the deaths occurred
A: Alphabetically by name of hurricane, location names
T: Date, time the hurricane hit
C: Obviously, the storm category based on the Saffir-Simpson Hurricane Scale
H: hierarchy of storm categories, number of deaths, which location was hit first

Temporal Map:
  • storm category (wind speed)
  • location of storm
  • length of time the storm is over this location
  • number of deaths in the place (if I can find separate information from place to place)
There is also the amount of damage done and media coverage of the area. I have been looking at information from the past 35 years, the amount of time the Saffir-Simpson Hurricane Scale has been used.

LATCH: college communities project

I've narrowed down my topic to examining the top schools on two of the Princeton Review's lists:

Gay Community Accepted, and its opposite, Alternative Lifestyles not an Alternative.

So far there are some interesting patterns, but also some places where I expected to see patterns and didn't. The pieces of info I've chosen for now are some that show interesting patterns/lack of patterns.

Order of information:

  • Geographical location- showing where each school is, the school's name, and which list it's on.
    • Possibly an alphabetical list of schools shown before geographical locations?
  • Religious Affiliation (yes or no, if yes, what is it)
  • Campus Environment (rural, town, city, metropolis)
  • % of students who are female
  • Quality of life rating (number, scale from 0 to 100)


I know it's more than three types of information, but it was hard enough not to put more types in there! Ideally, I want people to be able to turn on and off different pieces of information to see patterns.

I'm not sure how many schools to do for each. I'm thinking the top five from each list, which would mean ten total. I don't want to overwhelm users with info, but if I only do the top 3 or so, I may be missing some interesting patterns.

Stacie, please let me know if this wasn't the sort of thing you meant for this assignment, I wasn't quite sure... I have lots of ideas about how specifically this info should be visualized, but I didn't know if we should talk about that just yet.

Tuesday, September 2, 2008

My proposal: Digital self

Everybody uses the Internet, but nobody really pays attention to what we are actually doing on it. The Internet covers everything we need in our daily lives. We search for food, maps, housing, jobs, friends, and even potential significant others. Over time we build a digital self on the Internet that we barely know about. The browser may know us better than we know ourselves, because it stores our daily browsing history. I find it particularly interesting to observe how a person's daily browsing history changes over time and adds up to an image of a digital being: your digital self on the Internet, mirroring your inner identity and values.

I want to build a dynamic information visualization piece through which you could compare your real self and your digital self, in order to know yourself better.

Current Problem:
No visualizations for personal browsing history

Audience:
Every Internet user

Context:
Everyday.

Fast Food Restaurants

Hello,

When I first got this assignment, I wasn't exactly sure what direction I wanted to head in, but I figured I'd do something food related. I want to focus on fast food restaurants and how they affect our daily lives. As you all know, fast food is basically something that you can get quicker and cheaper than at your average restaurant. Powerhouse names such as McDonald’s, Burger King, Wendy’s, Taco Bell, and Domino’s have expanded into different countries and sometimes put their own twist on what they serve based on the local culture. That’s one piece of information I would like to look into, because I find it interesting when a Burger King in Asia serves different kinds of food than one in the United States.

Another thing that I have in mind is the expansion rate of fast food restaurants. In the past, I’ve been told that there is a McDonald’s on literally every block of NYC, which suggests that the franchise is expanding greatly. So that is something I would focus on as well: the number of fast food restaurants being established yearly. I would also look into how fast food affects the customers who consume it. An example would be the movie Super Size Me, which shows how fast food can turn someone from healthy into a total mess. Because of this, I would want to compare how the general population's health and weight changed as fast food franchises expanded. Other content I have in mind:

- calories in the food served by different fast food franchises
- how much is consumed by the general public
- which age group it affects the most

Overall, I believe my main focus is how fast food affects the general public. I believe this is important because people are concerned about what they are eating. I actually viewed the nutrition facts for a few of the franchises online, and I feel like there should be something that makes them visually clearer.

Project Definition

Craigslist, the internet classified community founded by Craig Newmark, is well known as an internet anomaly. Despite being an ad-free and almost entirely free-to-use service run by only 25 employees, it provides a medium for local listings that rivals major internet corporations for job postings, real estate, personal ads, and private sales. The Craigslist database is organized regionally to promote local exchange, and broken down hierarchically into subjects to facilitate relevance of postings. The posts are searchable, filterable, and collectively moderated through flagging. In all, the collection is optimized for seeking and locating a specific query.

While Craigslist is established as a useful tool in providing classified services, it has also inadvertently developed a following for being a rich catalogue of transient human pursuit. Each posting is loosely formatted and therefore gives a glimpse at the personality behind the item or service being sought or requested. Though it's efficient as a localized search tool, Craigslist is difficult to navigate in its capacity as a social scrapbook. With the exception of its relatively small collection of "Best of Craigslist", the site is effectively impossible to browse across multiple locations or throughout multiple categories. To more easily understand and appreciate the human elements contained in Craigslist, its data should be represented in a form that allows users to understand subjective trends and browse entries out of curiosity rather than necessity.

This proposed alternate representation of Craigslist will most benefit members of Gen-X and Gen-Y who reject the notion of the internet as an information superhighway and instead choose to embrace the chaos in its messy collection of fascinating tidbits. These users are mall rats and window-shoppers of the web. An alternate representation of Craigslist enables these users to immerse themselves among the people represented by the posts, to feel connected and be inspired.

Since Craigslist itself exists in cyberspace, it would be both logical and convenient for an alternate representation to be posted online, in order to be equally accessible. The current representation of Craigslist is infamously text-based, meaning the only graphical information is the images associated with individual postings. This is useful for approaching Craigslist in a left-brain sense, for filtering and seeking information. To perceive the network of catalogued human pursuit, Craigslist should be abstracted so that large amounts of data can be perceived simultaneously and dynamically. In addition to existing organizations of data that show location, category, date, and (where appropriate) age, time-span, cost and/or compensation, the alternate representation could show use of descriptive vocabulary, emphasis of self (e.g. use of "I", "Me", "Myself" versus "You", "Them", etc.), and tone (use of all-capitalization, exclamation marks, profanity, etc.).
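
As a rough illustration of the self-emphasis and tone measures described above, a post's text could be reduced to a handful of counts. This is a sketch under assumptions: the word lists are a guess that extends the "I", "Me", "Myself" versus "You", "Them" examples, the name `tone_features` is hypothetical, and profanity is left out because it would need a word list of its own.

```python
import re

# Assumed word lists, extending the examples given in the text.
SELF_WORDS = {"i", "me", "my", "myself"}
OTHER_WORDS = {"you", "your", "them", "they"}


def tone_features(body):
    """Count simple self-emphasis and tone markers in one post body."""
    words = re.findall(r"[A-Za-z']+", body)
    lowered = [w.lower() for w in words]
    return {
        "self_refs": sum(w in SELF_WORDS for w in lowered),
        "other_refs": sum(w in OTHER_WORDS for w in lowered),
        # All-caps words of two or more letters, so a lone "I" is not counted.
        "all_caps_words": sum(len(w) > 1 and w.isupper() for w in words),
        "exclamations": body.count("!"),
    }


example = tone_features("I want YOU to call me NOW!!! My friends say I am great.")
```

Raw counts favor long posts, so dividing by the post's word count would probably be needed before comparing entries.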

Ideally, this information representation could update itself by sharing the Craigslist database, or by automatically scraping the website. Otherwise the information could be periodically scraped and re-posted independent of the database's own updates. Since the alternate representation would be digital, it could easily be manipulated to adapt to the changing data source.

Monday, September 1, 2008

Project Definition

Hurricanes are one of the most dangerous “natural disasters” in the world. They often include flooding, thunderstorms, and tornadoes. People often don’t know how dangerous these storms can be; death is also a common factor in these storms. It isn’t always easy to know how dangerous a storm will be; after all, it’s only rain and fast wind blowing. People should be able to know how dangerous it is outside, so that they are better able to protect themselves. Is it even safe to stay home, or should they evacuate? The news can report the information, but after a while it stops making sense and just becomes statistics.

With access to the storm's category and forecasted path, people can predict what might happen and prepare for it. Are there any connections between a storm's category and how much wind or rain damage there will be? With this information, government officials may also be able to predict how much money they are going to have to spend repairing the damage, especially considering the increase in the amount of money that the United States has spent in the past twenty years.

Information I am planning to look into
Category of storm related to amount of rain
Damage from wind
Damage from flooding
Number of tornadoes in a particular storm
When in the season a storm occurs
The duration of the storm
The number of deaths in a storm
The amount of media coverage for each storm during and after

With this information people in affected areas of the United States may be able to better prepare themselves for what is to come. They can compare how bad the storm is to when in the storm season it occurs, what category it is before it hits land, and what type of damage they should be prepared for before the storm really hits. Is there time to escape?

more ideas

So I had two other ideas rattling around in my head that might actually be useful to other people.

1. Categorizing the plants in the surrounding area. I know that Pittsburgh has many bikers, hikers, and campers, people who like the outdoors. In any case it might be good to have a way to tell the poisonous plants from the safe plants and to know what plants can do what for you: whether they are safe to eat, etc.

2. Information on hurricanes: how they are tracked, which hurricanes do the most damage (ones with faster wind or more rain), how much money it takes to rebuild after such an occurrence, and what the media coverage is like.

A few directions

I've been mulling over topics and I've managed to pare down my brainstorming into a few ideas:

1. I'm interested in working with a large volume of highly organized data. I've poked around on the CIA World Factbook and I like that the raw data is very dry and concrete. I did read the discussion on Team 5's Blog, so I have some concerns that the data might become dull after several weeks, but I also think the variety of potential comparisons could keep it interesting.

2. I'm interested in working with data that shows human subjectivity. I've done some small projects in the past with data from Craigslist and, like the CIA Factbook, I like that there's a large volume of easily scrape-able data. Unlike the Factbook though, Craigslist data is highly driven by personal opinions and vernacular which could be much more interesting but, as a tradeoff, it's less organized and harder to parse into organized comparisons.

3. I'm interested in working with data outside the typical scheme of contemporary information visualizations: survey data, web trends, demographics, politics, etc. I came across two interesting databases on dinosaurs by The Natural History Museum and The Arts and Letters Corporation. Between them there's a nice collection of straightforward data about physical characteristics, diet, time span and fossil locations. I think it could be interesting to relate the fossil location data to the relative location of the continents at the time of their existence (e.g. Pangea, Laurasia, etc).

So, any feedback on which idea would be stronger, more interesting, or more effective in a portfolio would be much appreciated. Thanks.