Cloud computing and big/open data: tools for cosmic data Aikido?

It isn’t by chance that Rob Potts (colleague, new friend, and universal tool for metamorphosing the everyday) and I have referenced each other’s blogs this week. It will continue, I hope. We are in fact attempting to instigate a general cross-referencing of the blogs that our HighWire cohort have to write for various modules this term, in order to stimulate some innovation.

The title of this post is a shout-out to Rob’s blog Sustain and Release. Read that post and you might see what I mean about him being a universal tool for (occasionally in)coherent metamorphosis of the everyday into visual, spoken and cognitive metaphor (has he infected me?!). Rob says, with regard to sustainability:

Is this a ‘damned if you do, damned if you don’t’ situation? What is the smart way to proceed? Perhaps now we need to begin to work with matter, cosmic Aikido.

Professor Gordon Blair presented a lecture on cloud computing to us today. It didn’t contain anything I wasn’t at least a little aware of before. However, in the way that any good presenter can, Gordon’s lecture did make me think about these things in detail – something that is happening consistently in my time at Lancaster.

So back to the Aikido and cloud computing. Cloud computing isn’t a distinct thing. It’s Google Apps, it’s Dropbox, it’s Amazon EC2, it’s BitTorrent, it’s iCloud, it’s the data centres that startups and corporate giants use to harbor their data.

There is no hard and fast definition, and the list above could be a very long one, but in essence, cloud computing is a whole host of overlapping technologies. This is very well demonstrated in this image taken from Cloud computing: state-of-the-art and research challenges (Qi Zhang, Lu Cheng, Raouf Boutaba 2010), via Gordon Blair.

Cloud Computing Architecture

End users see the different levels as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS). The diagram shows the kinds of resources related to each of these layers, and the column on the right gives real-world examples of each one. IaaS generally refers to quite raw, ‘low’ level stuff (such as simply having a virtual machine running Ubuntu, or Windows 8, that you have access to via ‘the cloud’). PaaS takes it up a level: maybe you will have access to a programming framework, or a database. You don’t really have to care how it works, but you know you can access it for your own ends. Lastly, SaaS covers the kinds of things that I use every day, such as Dropbox, Facebook and Google Apps: user-facing applications. Sometimes you might find that a SaaS is built atop one or both of the two layers below it.
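If it helps to see the layering written down, here is a minimal Python sketch of the three service models as I’ve described them. The names `LAYERS`, `layer_of` and `layers_below` are my own inventions for illustration, and the examples are only the ones mentioned in this post, not any official taxonomy.

```python
# Illustrative only: the three service layers, lowest first, with the
# examples mentioned in this post (not an official or complete taxonomy).
LAYERS = {
    "IaaS": ["virtual machine running Ubuntu", "virtual machine running Windows 8"],
    "PaaS": ["programming framework", "database"],
    "SaaS": ["Dropbox", "Facebook", "Google Apps"],
}


def layer_of(example):
    """Return the layer name for an example from this post, or None if unknown."""
    for layer, examples in LAYERS.items():
        if example in examples:
            return layer
    return None


def layers_below(layer):
    """Return the layers that a given layer may be built on top of."""
    order = list(LAYERS)  # dicts keep insertion order: IaaS, PaaS, SaaS
    return order[: order.index(layer)]


print(layer_of("Dropbox"))    # SaaS
print(layers_below("SaaS"))   # ['IaaS', 'PaaS']
```

The only point of the ordering is that a SaaS product may sit on one or both of the layers beneath it; real services blur these boundaries far more than a dictionary can show.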

Cloud computing is great. It’s very clever, and it is made possible by the bandwidth available these days and by a hyper-connectivity that is an intriguing area of study in its own right. With it I can happily go to the University campus knowing that the papers I need to read are stored in Mendeley’s online repository, my music is in Google Music, and any other documents I need are accessible from Dropbox. I also know that if I have an innovative startup idea today, I can easily get the computing power needed to support it online – without huge outlay – by tomorrow, and that the solution will be scalable. It’s incredible.

There is of course the hidden cost. It’s hard to find a reliable figure for this, but you could argue, legitimately, that searching Google twice (incidentally, for each hyperlink in this document I’ve searched Google at least once…) uses the same amount of energy as boiling a kettle. It isn’t fair, I don’t think, to make that comparison directly. However, what is undeniably true is that the energy involved in running cloud-based services, and the infrastructure that supports them, is magnificently huge. As an evangelist for a general movement towards sustainability, and a leading expert in distributed computer systems, Gordon is actually put in a sticky place by this, I would say; I don’t envy him on that front. I am aware of sustainability issues, and increasingly care about them (and I want to do something about them) but… fortunately I’m not an expert in distributed systems! Conversely, it’s a damn good job that some of the eminent experts in this field appreciate sustainability.

In the same session Gordon covered some issues related to big data and open data. I actually abhor the term big data, as it happens, based on its inherent ambiguity – but no matter. Cloud computing is one of the factors that has enabled big data to splurge across the world, and as a result big data has become a significant area of study (and – excuse me – a big business).

Big data, I think, should be respected and watched. Respected because it can harness great power, potentially for both good and evil. Watched to make sure that this power is controlled equitably. It scares me to think how much information Google hold on me. It scares me to think how much money our personal data is worth to corporations. It scares me to think that my DNA or health records might become part of this big data craze and come to be in the hands of corporations concerned with profiting from them. But at the same time the quirky correlations between Google search results and things like house prices or influenza outbreaks, if they continue to emerge and hold up, have huge potential for good in the world (those are just two examples of how people can utilise Google’s big data; there are many other vendors, types and examples). Another interesting story of how scary big data is comes from Malte Spitz. Spitz wondered one day exactly what data his mobile phone company was collecting about him. After a lengthy legal battle he finally received a file that contained 35,830 lines of coded data. From this data you can virtually relive Spitz’s life over an extended period. I really recommend this TED talk, delivered by Spitz, where he sums it up beautifully.

Cloud computing and big data (and indeed the Internet as a whole) share a thirst for energy, and there are no signs of this appetite abating. I find when talking to colleagues that some find it incredibly easy to become ‘anti’ quite quickly when thinking about this. The gloomy global outlook on sustainability, combined with the bitterness most people feel about the privacy and trust issues related to big data, is a heady mix that can make those concerned with it appear reactionary. Conversely, others who conclude (I think quite pragmatically) that big data and sustainability issues are here to stay oftentimes become equally vocal, and it isn’t difficult to find confusing theories that lead a logical observer toward a head-in-the-sand approach to the dilemmas here (on account of how entangled the issues are). A third camp, and that is where I see myself, is optimistic that the benefits of big data can be realised while the issues of trust and privacy are dealt with sensibly. Revolutionists aside, I haven’t heard any convincing argument as to how we could realistically dispense with these innovations now that we have them.

The final part of this cosmic data Aikido jigsaw is open data. Open data is as broad a topic as big data, so I won’t go into any detail, but broadly speaking it means data that are publicly available. You could say that open data are to information what the open source movement is to software applications. Like open source, some see it as a tool or a model that fits into current paradigms, while others see it as an entirely different philosophy. I think it has the potential to be both. One example of an open data project is OpenStreetMap, a global map that is made for people, by people, and is owned by people. New York City has a large repository of data that covers everything from wireless hotspots in the city (the most frequently viewed, via the open data portal) through to after school programmes, privately owned public spaces, fiscal stimulus data, and refuse collection tonnages. Another example of open data at work came after the large 2010 earthquake in Haiti, when the OpenStreetMap data for Port-au-Prince went from being virtually nonexistent to some of the richest cartographic data that’s ever existed. This data was used by aid organisations and health agencies to great effect. In NYC you can view crime statistics on an interactive map, and maybe plan a safer route home, or decide where to live accordingly. It’s early days for open data, but so far some of the applications really have had impact, and are almost heartwarming to my mind.

It’s a difficult thing to imagine, but I really believe that all of the elements of the system could be modeled to demonstrate that the utility and methods behind cloud computing can deliver the benefits of open and big data in a scalable and sustainable way. Apart from a hell of a lot of work and ingenuity, it would require a ‘global’ cooperation. If you take global to mean whatever system you’re looking at, rather than ‘planet-wide’, then I think this really could be a reality in the not-so-distant future.

So what am I thinking? A hella distributed computer system. These distributed systems (some of which could be termed cloud computing) are really powerful. I’m imagining a system where every device would contribute its spare processing power and storage to the cloud, whether it be a phone, tablet, laptop, supercomputer, or a whole datacentre. All data would be owned by everybody, thus forcing a collective responsibility for how much of it there is, and how it is used. To metaphoricalise it: imagine the world had a single well for drinking water. Nobody in their right mind would use it all up too quickly, neither would they treat it irresponsibly and contaminate it. Indeed if anybody tried to do either of those things, then everybody else would try to stop them. Interestingly, the way I’m imagining this system, it could pretty much alleviate the privacy/trust issues associated with big data. You see, I think the best way to incentivise generators of big data to generate and store only the data they need, and also to ensure that it is treated sensitively, is to store it in an entirely open cloud.
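To make the ‘every device chips in’ idea a little more concrete, here is a toy Python sketch of pooling spare capacity. Everything in it is an assumption of mine: the `Device` class, the `pooled_capacity` function, and especially the numbers, which are made up purely for illustration and are not measurements of anything.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A hypothetical participant in the shared cloud."""
    name: str
    spare_cores: float       # idle processing capacity, in CPU cores
    spare_storage_gb: float  # unused storage, in gigabytes


def pooled_capacity(devices):
    """Add up the spare capacity that every device contributes."""
    total_cores = sum(d.spare_cores for d in devices)
    total_storage = sum(d.spare_storage_gb for d in devices)
    return total_cores, total_storage


# Made-up numbers, purely for illustration.
devices = [
    Device("phone", 1.0, 8.0),
    Device("laptop", 2.0, 100.0),
    Device("datacentre", 500.0, 50000.0),
]

cores, storage = pooled_capacity(devices)
print(cores, storage)  # 503.0 50108.0
```

Adding things up is the easy part, of course. The sketch says nothing about scheduling, trust, or the energy cost of moving data between devices, which is where the real work would be.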

I realise, having gotten to the end of this constructed idea of cloud-based cosmic data Aikido, that it is a little Utopian. Maybe a lot. However, there isn’t really anything to suggest that the idea couldn’t work on a relatively local level (look at Diaspora and BitTorrent), before being scaled up. Each increment of the network size would represent a net (pun?) ‘saving’, and a further step towards generating and using data responsibly.

Going a few steps further down the technological discovery line, you can imagine how the Utopian vision described above could be supported by ubiquitous computing. It would be a challenge to quantify, but I dare say that if you added up all the spare processing and storage capacity that exists within the incredibly pervasive computer power in the world (including all the processors, not only in phones but in refrigerators, cars, escalators, boilers, routers, etc) – then you could maybe replace a large number of the energy-gobbling (and expensive) data centres that power the (current incarnation of the) cloud and big data. On another note, imagine that the way we store data could be disrupted by storing it inside DNA. Living storage devices could be the answer to the practical problem of how to store seemingly infinite data (however, of course, this raises a whole myriad of ethical and trust concerns in its own right). It’s all possible though.

The European Union recently announced a €1 billion research project around large-scale brain simulations. I’m fascinated by it, and ever so slightly scared too. Depending on the outcome, maybe ubiquitous computing could become a vehicle for hosting a large-scale virtual brain in a distributed manner. I think maybe I’ve watched too much Ghost in the Shell.

Put briefly, I love the things you can do with cloud computing, big data, and open data. I’m also aware that there are impacts. Computing is ubiquitous, but we’ve got to that stage without much thought for how to sustain it, or how to get the most out of it. Maybe it isn’t practicable, but I’d like to think that there is a way in which all of these linked arenas could be put to use and lean on each other’s strong points, while containing the negative connotations related to how we see them now.