Saturday, July 11, 2009

Photos from Hypertext 2009 are up on Flickr: http://ping.fm/UnLCT
All photos from Hypertext are tagged with ht09 in Flickr: http://ping.fm/ChUGB
All Twitter tweets from Hypertext 2009 can be found at http://ping.fm/GhJG1
If you haven't already, connect with your fellow Hypertexters on LinkedIn: http://ping.fm/3fz60
Browse Hypertext presentations from SlideShare: http://ping.fm/06rL9
Or use FriendFeed to see them all together: http://ping.fm/TarnG

Wednesday, July 1, 2009

Keynote on Day 3: Relating Content through Web Usage by Ricardo Baeza-Yates

Ricardo is VP of Yahoo! Research in Barcelona, Santiago, and Haifa, Israel. He was a PhD student at the University of Waterloo and maintains ties with universities in Spain. His talk is on Relating Content through Web Usage. He is co-author of Modern Information Retrieval, a standard textbook in the field. According to Ricardo, web search is no longer just about document retrieval: there is now a new breed of search experiences that draw on the Wisdom of Crowds behind Web 2.0. Search is evolving beyond documents towards identifying a user's task and helping to complete it. The challenges, however, are online operation and scalability.

We now have more complete information available in one search, such as shortcuts, deep links and enhanced results. But search is a matter of content vs. intent: the premise is that users don't want to search, they want to get tasks done and go straight to their answers. We search when we don't know what to ask or whom to ask. We are now moving from a web of pages to a web of objects. Objects have attributes, which will be missing, noisy or incomplete, but that is OK. Attributes define faceted search. The question, however, is how we get structured objects and attributes. They will come from metadata, the semantic web and ontologies, from web usage, and from building out an open ecosystem.
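To make the "web of objects" idea concrete, here is a minimal Python sketch of faceted search over objects whose attributes may be missing. The objects, attribute names and helper functions are made up for illustration; nothing here comes from the talk or from any Yahoo! system.

```python
from collections import Counter

# Hypothetical objects; attributes may be missing or incomplete.
OBJECTS = [
    {"name": "Hotel A", "city": "Torino", "type": "hotel", "stars": 4},
    {"name": "Cafe B", "city": "Torino", "type": "restaurant"},
    {"name": "Hotel C", "city": "Milano", "type": "hotel"},
    {"name": "Museum D", "type": "museum"},  # city is missing, and that is OK
]

def facet_counts(objects, attribute):
    """Count the values of one attribute, skipping objects that lack it."""
    return Counter(o[attribute] for o in objects if attribute in o)

def filter_by(objects, **facets):
    """Keep objects whose attributes match every requested facet value."""
    return [o for o in objects
            if all(o.get(key) == value for key, value in facets.items())]

if __name__ == "__main__":
    print(facet_counts(OBJECTS, "city"))
    print([o["name"] for o in filter_by(OBJECTS, city="Torino", type="hotel")])
```

Objects without a given attribute simply drop out of that facet instead of breaking the search, which is one way to read the remark that missing attributes are acceptable.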

The AOL experience showed that queries and clicks are private data, and crawling the web is expensive. James Surowiecki, a New Yorker columnist, wrote in his 2004 book The Wisdom of Crowds: "Under the right circumstances, groups are remarkably intelligent." So what do we get from the wisdom of crowds? Popularity, diversity, quality and coverage. The wisdom of crowds is crucial for search ranking: we use text (web writers and editors), links (web publishers), and now tags (web taggers). What Baeza-Yates proposes next is using all the queries as well.

Twenty years later, the basic ideas of cross references and dynamic links from Frank Tompa in 1988 are still relevant. Yahoo Research has some demos of its research; one is TagExplorer, which is based on tag similarity. How is this done? First, the mined tags need to be classified, and tag semantics are derived using WordNet. Baeza-Yates showed a TagExplorer demo in which you can find tags related to locations, subjects and activities based on a query, using Torino as the example. Based on this and on finding similar pictures, we can tag pictures automatically. We could also suggest tags to people based on a picture, but if you do that in Flickr, it is no longer a folksonomy. The tags would be biased towards the algorithm, and that is what we don't want.
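The talk did not say how TagExplorer computes tag similarity. As a rough illustration only, the sketch below relates tags by how often they appear on the same photos, using Jaccard overlap. The photo data and function names are hypothetical, and the WordNet classification step is not covered.

```python
from collections import defaultdict

# Hypothetical photo -> tags data, for illustration only.
PHOTOS = {
    "p1": {"torino", "mole", "architecture"},
    "p2": {"torino", "mole", "night"},
    "p3": {"torino", "river", "rowing"},
    "p4": {"paris", "architecture"},
}

def photos_by_tag(photos):
    """Invert the mapping: tag -> set of photo ids carrying that tag."""
    index = defaultdict(set)
    for photo_id, tags in photos.items():
        for tag in tags:
            index[tag].add(photo_id)
    return index

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_tags(photos, tag, top=3):
    """Rank other tags by how strongly they co-occur with `tag`."""
    index = photos_by_tag(photos)
    target = index.get(tag, set())
    scores = [(jaccard(target, ids), other)
              for other, ids in index.items() if other != tag]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by name
    return [(other, round(score, 2)) for score, other in scores[:top] if score > 0]

if __name__ == "__main__":
    print(related_tags(PHOTOS, "torino"))
```

A real system would work on far more data and would then sort the related tags into locations, subjects and activities, which this sketch leaves out.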

We can also do visual annotations by associating text with a visual area, as Flickr does and as Facebook does when tagging people. Content-based image retrieval works by first extracting visual features and describing them, and then building a visual vocabulary using k-means clustering. This is an example of combining tagging and visual image retrieval. Besides WordNet, you can also use Wikipedia search to drive the algorithm. Using this, Yahoo Research has created Correlator to find relations in Wikipedia. Correlator works by retrieving related sentences and ranking them.
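The visual-vocabulary step can be sketched in plain Python. This is a toy version under stated assumptions: the "descriptors" are made-up 2-D points rather than real image features, the k-means is the textbook Lloyd iteration with a seeded random start, and none of the function names come from the talk.

```python
import random

# Hypothetical 2-D "descriptors"; real visual features have many more dimensions.
DESCRIPTORS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; each final centroid is one 'visual word'."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids

def bag_of_visual_words(descriptors, vocabulary):
    """Describe an image as a histogram over the visual vocabulary."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest(d, vocabulary)] += 1
    return histogram

if __name__ == "__main__":
    vocabulary = sorted(kmeans(DESCRIPTORS, k=2))
    print(vocabulary)
    print(bag_of_visual_words([(0, 0), (1, 1), (10, 10)], vocabulary))
```

Once every image is reduced to such a histogram, images can be compared like text documents, which is what makes it natural to combine with tags.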

The next part of Baeza-Yates' talk is Web Usage. We can use clicks (users following hyperlinks) and queries, which express user interest. For example, query q4 can be considered related to query q3 because the words in their pages are similar and because users clicked on them. We can see what people are looking for by mapping queries to the ODP (Open Directory Project). You can also do hierarchical clustering on the graph (Francisco, Baeza-Yates and Oliveira).
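As a rough illustration of relating queries through clicks, the sketch below builds a small query graph in which two queries are linked when users clicked the same URLs for both. The click log, the example URLs and the Jaccard weighting are assumptions for illustration; they are not the method presented in the talk, and the hierarchical clustering step is not shown.

```python
from itertools import combinations

# Hypothetical (query, clicked URL) pairs, for illustration only.
CLICK_LOG = [
    ("torino hotels", "hotels.example/torino"),
    ("hotel turin", "hotels.example/torino"),
    ("hotel turin", "travel.example/turin"),
    ("turin travel guide", "travel.example/turin"),
    ("k-means tutorial", "ml.example/kmeans"),
]

def clicks_by_query(log):
    """Group the clicked URLs by the query that led to them."""
    clicks = {}
    for query, url in log:
        clicks.setdefault(query, set()).add(url)
    return clicks

def query_graph(log, threshold=0.0):
    """Edges between queries, weighted by Jaccard overlap of clicked URLs."""
    clicks = clicks_by_query(log)
    edges = {}
    for q1, q2 in combinations(sorted(clicks), 2):
        shared = clicks[q1] & clicks[q2]
        weight = len(shared) / len(clicks[q1] | clicks[q2])
        if weight > threshold:
            edges[(q1, q2)] = weight
    return edges

if __name__ == "__main__":
    for (q1, q2), weight in query_graph(CLICK_LOG).items():
        print(q1, "<->", q2, weight)
```

Note that "torino hotels" and "turin travel guide" share no clicks, yet both connect to "hotel turin", so a clustering step over this graph could still group all three.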

So what are some of the open issues? Data volume versus better algorithms, explicit versus implicit social networks (are there any fundamental similarities?), how to evaluate with (small) partial knowledge, and user aggregation vs. personalization. The result is a virtuous cycle that improves the web.

So now it's questions. The first question was about how Yahoo Research's work on tagging and search compares with Wolfram Alpha. Baeza-Yates answered that the two come from different ends of the spectrum. Yahoo Research is making some of its datasets, such as Yahoo Answers, available to researchers.


Tuesday, June 30, 2009

Lada Adamic's Keynote Address

Markus Strohm has written a great blog entry liveblogging Lada Adamic's keynote today on The Social Hyperlink. Thanks, Markus!

At Stanford, the social web started with Club Nexus, and then Orkut was started. From online profiles you can find social patterns and discover links between people in the hyperlink structure, called social hyperlinks.

Session 1: Hypertext Structure and Usage

This session, Hypertext Structure and Usage, is chaired by Peter Brusilovsky and is part of the Systems track. The first presenter is Mark Bernstein from Eastgate Systems. Mark is talking about On Hypertext Narrative, a paper based on his book "Reading Hypertext". We want hypertext to do what we cannot do in print. Hypertext tells a story and has a plot. Plot, not story, is where we find meaning. Little Red Riding Hood is the first social software, according to Mark. When do we tell the reader that the wolf has run ahead and eaten grandma? We have four kinds of links. Stretchtext, with no navigation (or at least no departure), replaces a piece of hypertext with some other hypertext, essentially "stretching" the text. Our business is about varying plot, not varying story. Text stays itself; electronic text replaces itself.

The second paper is on Bringing Your Dead Links Back to Life. The authors developed PageChaser, a system for finding the new locations of moved Web pages, as part of the WISH project. They asked: what's wrong with Google? It doesn't work because it needs an index built in advance, relies on keyword matching, and we don't know where the page is. PageChaser uses location bias and link authorities. They developed a comprehensive set of heuristics for finding likely locations; many other researchers focus only on broken links and not on location factors. Very interesting and relevant work.

Day 2 of Hypertext 2009 Opening Session

After some technical difficulties, the Hypertext conference has started! The acceptance rate was approximately 31%, which is in line with previous Hypertext conferences. There will be an ACM Student Research Competition with 13 posters. Its session is tomorrow from 4:45 to 6:05 pm, with winners announced at the closing tomorrow at 6:10 pm. The competition is sponsored by Microsoft Research.

At noon, there will be a pitch (or madness) session for the posters and demos, where each presenter has just one minute to speak. The posters and demos session will run from 6:10 to 7 pm in the A foyer and Room (Sala) B. The social dinner is at Societa Canottieri Caprera, Corso Moncalieri 22, at 8:30 pm. There will also be an awards session for the Douglas Engelbart and Ted Nelson awards for best papers. There are social tools for Hypertext 2009, please see this URL: use #ht09 for Twitter posts and the ht09 tag for Flickr photos, and you can use Nokia Friend View on your phone for location-based Twitter-type posts.

Hypertext 2010 will be in Toronto, Canada, from June 14-17, 2010. There will be a SIGWEB Town Meeting today at 2:10 pm in Sala A to report on SIGWEB.

It's Day 2 of Hypertext, with keynote speaker Lada Adamic talking about the Social Hyperlink http://ping.fm/HNktu