
Using Crowd Wisdom to Annotate the Web

Since January I’ve been intensively researching the area of citation practices in academia as part of the HighWire MRes ‘Special Topics’ module: I’m really intrigued by it all. As is often the way with these things, I’ve probably gone about it the wrong way, arriving with an idea of a solution (I was thinking about applications of linked/semantic data…) before properly understanding the problem. Thankfully, learning this kind of thing is exactly why I’m doing an MRes.

I’m actually consciously trying to resist being consumed by my obsession with this topic, yet at the same time trying to reach some feeling of completeness in my knowledge of the subject. Part of the issue is that most (not all) of the academics I’ve spoken to about my concerns with citation practices react quickly and deeply, suggesting that this is ‘the way things are’, inherently political, unbounded… and generally a difficult area to work in. They’re probably right, but that doesn’t mean I shouldn’t go there. I’ve already written a 2,000-word literature review titled ‘Impact Metrics: Lies, Damned Lies, Statistics’, and a mock EPSRC funding proposal, ‘Expressing Research Output Through Linked Data’, on these subjects, so I won’t elaborate here. However, my thinking in this area did lead me to consider annotation as a method for making various practices on the web more transparent, and potentially a way of mitigating the Matthew effect.

The Matthew effect suggests that ‘the rich get richer and the poor get poorer’. It applies in academia too: a highly cited paper is far more likely to accrue further citations than a paper that has none. Thomson Reuters actually run a ‘Highly Cited’ service; on their front page they state:

“Once achieved, the highly cited designation is retained. With each new list, we add highly cited individuals, departments and laboratories to this elite community.”

I don’t want to appear objectionable, but it is quite a scary proposition. They’re saying that once they (Thomson Reuters) have awarded this accolade, it is enshrined forever, and thus the ‘elite’ community is created. This touches on my issues with impact measures per se. It is impossible to explain the nuances of a lot of literature, knowledge, or learning, and to express why or how it is valuable, by way of a number. The content of academic literature (excepting tables, figures, etc.) is qualitative. Regardless of the field there’s a qualitative element. So why don’t we discuss it in qualitative terms? Plain English…! “This is relevant because…” or “I disagree because…”

I don’t think we should ignore statistically based metrics. I don’t think we should ignore citation counts. I do think that being highly cited (whether or not Thomson Reuters invite you into their club) is usually a great thing, helping both authors and researchers who need to access relevant literature. However, we’re missing out on the subjective. And the subjective has value. Even worse, if we’re counting citations and making a judgement on them, we assume that they’re quite an objective thing, which is a total fallacy. Why one paper receives citations and another doesn’t could come down to almost anything: being friends with somebody, the typeface, an artefact of the indexing process, or simply the keywords chosen to describe the paper.
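To make the ‘rich get richer’ dynamic concrete, here is a minimal, purely illustrative simulation (my own sketch, nothing to do with any real citation index): each new citation goes to a paper with probability proportional to the citations it already holds, plus a small constant so uncited papers still have some chance. The head of the distribution pulls away very quickly.

```python
import random

# Purely illustrative simulation of cumulative advantage ("the rich get richer").
# 100 papers start with zero citations; each new citation is awarded with
# probability proportional to (current citations + 1), so already-cited papers
# become ever more likely to be cited again. All numbers are made up.
random.seed(42)
papers = [0] * 100            # citation count per paper

for _ in range(5000):         # hand out 5,000 citations, one at a time
    weights = [count + 1 for count in papers]
    winner = random.choices(range(len(papers)), weights=weights, k=1)[0]
    papers[winner] += 1

papers.sort(reverse=True)
top_ten_share = sum(papers[:10]) / sum(papers)
print(f"Top 10 papers hold {top_ten_share:.0%} of all {sum(papers)} citations")
print("Most cited:", papers[:5], "  Least cited:", papers[-5:])
```

The ‘+ 1’ is doing a lot of work: make it larger and the effect weakens towards a uniform spread; remove it entirely and a paper with zero citations can never pick up its first one.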

I had a really great lecture from Wolfgang Emmerich. Although the lecture was really about the agile software development methods used at Wolfgang’s company Zuehlke, we also tested out the wisdom of the crowd. Wolfgang had each of us guess the weight of a motorcycle. We revealed our first guesses, discussed, then re-guessed. We took the mean of the second guesses, and the ‘crowd’ (only about 10 of us) was within 5% of the correct weight. Pretty impressive, I thought.
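As a toy sketch of that aggregation step (the guesses and the ‘true’ weight below are invented for illustration, not the actual figures from the lecture): take the mean of the individual estimates and compare it with the true value.

```python
# Toy sketch of the crowd-wisdom exercise. The guesses and the 'true' weight
# below are invented for illustration; they are not the figures from the lecture.
guesses_kg = [230, 180, 260, 210, 150, 275, 195, 240, 205, 220]
true_weight_kg = 215

crowd_estimate = sum(guesses_kg) / len(guesses_kg)
relative_error = abs(crowd_estimate - true_weight_kg) / true_weight_kg

print(f"Crowd estimate: {crowd_estimate:.1f} kg (true weight {true_weight_kg} kg)")
print(f"Relative error: {relative_error:.1%}")
```

With only ten guesses the mean is vulnerable to a single wild outlier (a median would be more robust), but the mean is what we used, and it worked.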

So… I postulate that the wisdom of the crowd, combined with an open annotation system, could be a massively important tool for adding extra value to things like citations. On an exploratory punt at working with Mendeley on this for my summer project, I was pointed toward hypothes.is by William Gunn (head of Academic Outreach at Mendeley). Hypothes.is are developing an open annotation system, relying on crowd-sourced and reputation-based data, to annotate everything… After watching this short video (below) I had one of those terrible yet affirming moments. The thought running through my head was “Again?!?! Again!!!? Why does every idea I have seem to have already been had by somebody infinitely more able to deliver on it than myself?” I had had this idea before, but kind of wrote it off as being ‘too big’, eventually sanitising it down so much that I was just thinking about annotating citations. As it is, I think the ambition embodied by the project potentially has the power to transform the web. It’s also a reminder to me to ignore those authority figures who suggest that the area of interest is ‘too big’ or ‘too political’ or ‘just the way things are’. Sometimes you’ve got to throw caution to the wind, just like Mario Capecchi did.
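To give a flavour of what annotating a citation could look like as data, here is a rough, hypothetical sketch loosely inspired by the W3C Open Annotation idea of a target plus a body. The field names, URIs and quoted text are mine for illustration; they are not hypothes.is’s actual schema.

```python
import json

# Rough, hypothetical sketch of an annotation attached to a citation in a paper,
# loosely inspired by the W3C Open Annotation model (a target plus a body).
# Field names, URIs and text are illustrative only, not hypothes.is's actual schema.
annotation = {
    "type": "Annotation",
    "creator": "some-researcher",                      # could carry reputation data
    "target": {
        "source": "http://example.org/citing-paper",   # placeholder URI for the citing paper
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "as demonstrated by Smith et al. (2010)",  # the citation being annotated
        },
    },
    "body": {
        "type": "TextualBody",
        "value": "Cited as support, but Smith et al. only tested the single-user case.",
    },
}

print(json.dumps(annotation, indent=2))
```

The interesting part isn’t the record itself, but what a crowd of such records, weighted by reputation, would let you say about a citation beyond the bare fact that it exists.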