Nothing new under the sun…

Hello all. This is a guest post from a University of Edinburgh colleague of the EDC MOOC course team. My name is Anne-Marie Scott, and I’m the Technology Enhanced Learning Services Team Manager; in short, I look after the IT side of the University’s eLearning services. I gave a (very) small amount of help with the EDC News site that was used as part of the EDC MOOC. Now that there has been a little time to pause and reflect, it seems like a good time to talk about the site and, in particular, to get a little technical about what was done, how and why. For the most part, we were only able to do what we did because others had generously shared their own experiences, hints, tips and tricks via blogs and online tutorials. On that basis I think it’s fair to say that our site was mostly an assembly of other people’s good ideas, but I hope this post can help someone else looking to do something similar.

First of all, some information about the site in action during the course, to give you a flavour…

The EDC News site was meant to work like a daily newsletter, aggregating posts about the course from participants’ personal blogs. Over the lifetime(ish) of the course, 931 personal blog urls were submitted for aggregation. Those urls generated around 1,340 posts on the news site, and from what we can see, posts may have come from as many as 300 of the 900+ blogs submitted.

Google Analytics reports that the site was visited around 1,430 times by 997 unique visitors. Half of the visits were from people who had visited before, and almost everyone viewed only the front page. The front page is set up to show 100 posts, and you can see from the graph below that, at the rate new posts were added, a post was likely to stay on the front page for a couple of days at most.
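As a rough rule of thumb, a post stays on a front page showing the newest N posts for about N divided by the number of new posts per day. A minimal Python sketch of that arithmetic; the rate of 50 posts a day used below is a hypothetical figure chosen for illustration, not a number taken from our analytics.

```python
def days_on_front_page(page_size: int, posts_per_day: float) -> float:
    """Approximate how long a post stays on a front page that shows the
    newest `page_size` posts, assuming new posts arrive at a steady daily rate."""
    if posts_per_day <= 0:
        raise ValueError("posts_per_day must be positive")
    return page_size / posts_per_day


# Illustrative only: 50 posts/day is a hypothetical rate, not a measured one.
print(days_on_front_page(100, 50))  # 2.0
```

Real posting rates are bursty (weekends, week boundaries in the course), so treat this as an average rather than a guarantee for any individual post.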


The eagle-eyed will already have spotted that the EDC News site uses WordPress, and the key piece that makes it all work under the bonnet is the FeedWordPress plugin (we did consider gRSSHopper as an alternative to WordPress, but ruled it out). Much kudos and credit must go to Martin Hawksey at JISC CETIS here: his immensely helpful post on MOOC aggregation was the best source of information, bar none, for deciding which tool to use.

The most immediate problem we faced was that whatever we did would have to work on a shared hosting server. We had no major concerns about the potential number of visitors, but I was worried about how FeedWordPress would cope with the business of aggregation. The quickest solution I could think of was to find a cloud-based tool that could do the ‘heavy lifting’ of aggregating multiple RSS feeds and picking out the content we wanted. That would keep the number of outgoing connections from WordPress low and reduce the amount of processing on our server. I took a quick look at Google Reader and then, by a circuitous route, found Yahoo Pipes. As it turns out, if I’d just read a bit more of Martin Hawksey’s blog I could have found Pipes sooner…

The final piece of the puzzle, how to manage the collection of blog urls from course participants, was put together by Jeremy Knox, based on a very neat tutorial about using a Google spreadsheet as the source of a Yahoo Pipe. Jeremy and other course colleagues also took care of the site’s visual style, which is a big plus for WordPress, as this is easy to do.
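In our setup the Pipe read the spreadsheet directly, with no code involved. For anyone scripting the same step, here is a minimal Python sketch that pulls the submitted urls out of a CSV export of the spreadsheet, dropping blanks, malformed entries and duplicates. The column name `Blog URL` and the idea of working from a CSV export are assumptions for illustration, not a description of our actual sheet.

```python
import csv
import io
from urllib.parse import urlparse


def read_blog_urls(csv_text: str, column: str = "Blog URL") -> list[str]:
    """Return the distinct, well-formed http(s) urls from one spreadsheet column.

    `column` is a hypothetical header name; change it to match your own form.
    """
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip blank or malformed submissions
        if url not in seen:  # keep the first occurrence, preserve order
            seen.add(url)
            urls.append(url)
    return urls
```

Deduplicating on the exact string is deliberately simple; `http://` and `https://` versions of the same blog would still count as two entries and need a human eye.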

So, the full setup works like this:

  1. Blog urls are submitted by course participants to a Google spreadsheet.
  2. A series of Yahoo Pipes, each handling 20 of the blog urls, loads the content of those blogs and keeps only the posts tagged #edcmooc.
  3. The aggregated RSS feed from each pipe is fed into FeedWordPress.
  4. The FWP+ Limit size of posts plugin truncates the incoming posts and creates the ‘Continue Reading’ link to the source blog post.
  5. The feeds are run manually once a day (due to some local server constraints, but you could automate this with a cron job).
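Pipes and the FWP+ plugin did this work for us without any code, but the logic of steps 2 and 4 can be sketched in plain Python. This is a hypothetical illustration, not how Pipes or the plugin are implemented: the item dictionaries (with `title`, `content`, `categories` and `link` keys), the 50-word limit and the function names are all assumptions, and `content` is treated as plain text rather than HTML.

```python
import html
from typing import Iterator

BATCH_SIZE = 20  # feeds per Pipe; the size that worked for us under the 30-second Pipes timeout
TAG = "edcmooc"


def batches(urls: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Split the submitted urls into fixed-size chunks, one chunk per Pipe."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def has_tag(item: dict, tag: str = TAG) -> bool:
    """True if the item lists the tag as a category or mentions '#tag' in its text."""
    cats = {c.lower().lstrip("#") for c in item.get("categories", [])}
    text = f"{item.get('title', '')} {item.get('content', '')}".lower()
    return tag in cats or f"#{tag}" in text


def truncate(item: dict, max_words: int = 50) -> str:
    """Shorten a plain-text post body and append a 'Continue Reading' link to the source."""
    words = item.get("content", "").split()
    summary = " ".join(words[:max_words])
    if len(words) > max_words:
        summary += "..."
    link = html.escape(item["link"])
    return f'{html.escape(summary)} <a href="{link}">Continue Reading</a>'
```

With 931 submitted urls, `batches` yields 47 chunks (46 of 20 and one of 11), so WordPress polls 47 aggregated feeds instead of 931 individual ones, which is the reduction in outgoing connections we were after.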

Some gotchas we hit:

  • Yahoo Pipes has a 30-second timeout – hence chunking the feeds into batches of 20, which seemed to be the optimum number. I also added a ‘last 72 hours’ date filter to stop the feeds growing too big over time.
  • FeedWordPress also has a timeout. By default it is 20 seconds – increase this to match the 30-second timeout in Pipes.
  • Quite a few of the submitted urls had problems, so some admin/curation is always required. Quite a few also contained no #edcmooc content – urls are often submitted with the best of intentions.
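The ‘last 72 hours’ filter and the matching timeouts are easy to reproduce if you script any part of this yourself. A minimal Python sketch, assuming each parsed feed item carries a timezone-aware `published` datetime (a hypothetical field name); `fetch_feed` simply shows the standard library’s `urlopen` with a 30-second timeout and is not part of FeedWordPress or Pipes.

```python
from datetime import datetime, timedelta
from urllib.request import urlopen

WINDOW = timedelta(hours=72)  # mirrors the 'last 72 hours' date filter in each Pipe
FEED_TIMEOUT_SECONDS = 30     # matched to the 30-second Pipes timeout


def recent_items(items: list[dict], now: datetime, window: timedelta = WINDOW) -> list[dict]:
    """Keep only items published within `window` of `now`.

    Items with no `published` date are dropped rather than guessed at.
    """
    cutoff = now - window
    return [i for i in items if i.get("published") is not None and i["published"] >= cutoff]


def fetch_feed(url: str, timeout: float = FEED_TIMEOUT_SECONDS) -> bytes:
    """Download a feed, giving up if the server takes longer than `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Dropping undated items is a design choice: it keeps the window honest, but it will also silently hide posts from blogs whose feeds omit dates, which is one more thing to check during curation.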

To conclude, this proved to be a reasonably quick and very cheap way of doing blog aggregation, using tools that didn’t need much technical intervention to keep things running. After the initial setup, most of the work was ‘administrative’ (making more pipes, adding feeds to WordPress, updating the feeds) and, with clear instructions, is pretty accessible to anyone.

For anyone who wants to consider doing something similar then, this would be our ‘shopping list’:

  • Google spreadsheet
  • Yahoo Pipes
  • WordPress
  • FeedWordPress plugin
  • FeedWordPress Duplicate Post Filter plugin
  • FWP+ Limit size of posts plugin
  • Google Analytics for WordPress plugin (this is entirely optional)
  • ‘Admired’ theme by Brad Thomas (again optional – use what you like)