This post was originally published on the Biochemical Society blog on 22 October 2015.
People tend to use the word ‘astronomical’ to describe enormously large numbers. Dr Bissan Al-Lazikani, a computational biologist and data scientist from the Institute of Cancer Research, joked at the recent Policy Lunchbox event on big data that we should call huge numbers ‘genomic’, rather than ‘astronomical’ – it’s just more accurate that way.
During its 20 years in space the Hubble Space Telescope has produced around 45 terabytes (TB) of information (one terabyte is 1,000,000,000,000 bytes or 1,000 gigabytes – whichever you prefer!). Once sequenced, one genome takes up around 1TB, but if we add all the information we could get from a patient, including imaging, we could potentially have up to 50TB of data. Each of us holds more data than we have received from the Hubble Space Telescope!
Bissan has led the development of canSAR, the world’s largest publicly available cancer knowledgebase, which has over 140,000 users and is helping to uncover hidden drug targets all over the world. We invited her to tell us more about the challenges facing big data in healthcare and the policy issues involved.
Challenge #1: Data availability
Now we know that there is a lot of data in healthcare and it is big, ‘genomic’, one might say. But how much of it is actually accessible in a useful way?
Most current patient consent forms do not cover complicated statistical data analysis. Bissan argued that we need a general patient consent form that wouldn’t go into too much detail on what kind of statistical analysis will take place, as patients are rarely concerned with that as long as the research has the potential to benefit them or other people. Another issue regarding consent is the lack of engagement with the public on data protection and analysis. People want to know what happens to their samples and are curious to learn; she said we shouldn’t ‘dumb down’ the science or leave the public out of the important conversations. We also tend to assume that patients are very protective of their data.
Challenging this assumption, she threw out questions such as: Do you have an Apple Watch? Maybe you have signed up to 23andMe for personal genome testing? It seems a lot of people have, which means they are comfortable sharing their healthcare data when they feel the service is secure, e.g. through the legal agreements people sign with these companies. Finally, once we get the patients’ consent, it is crucial to make sure that the trials are designed for big data analysis.
Challenge #2: Lack of relevant expertise
Maths, statistics and computation have been identified as vulnerable skills within the UK bioscience and biomedical science research base, given their importance in modern science. There is still a division between computer scientists and biologists, though more and more integrated degrees are being launched. Bissan said that training in maths and computer science with applications in healthcare would be most useful in practice.
Challenge #3: Technologies
Mobile phones that have 128GB of storage may seem impressive, but we still haven’t solved the problems of storing, transferring and manipulating big data (according to Bissan, the easiest way to transfer large amounts of data is to physically carry the hard drive to another building). To maximise the value of the data at hand, it is important to be able to transfer it not only from one lab to another but also between organisations, as a patient might deal with a variety of hospitals and centres throughout their life. She emphasised that increased investment in infrastructure would make a difference and make data manipulation easier.
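To put the hard-drive remark into rough numbers, here is a back-of-the-envelope sketch in Python. The 50TB figure is the per-patient estimate quoted earlier in this post; the link speeds are illustrative assumptions rather than figures from the talk, and the calculation assumes the connection runs at full speed the whole time, which real networks rarely manage.

```python
def transfer_days(data_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to send `data_tb` terabytes over a link of the given speed.

    Assumes 1 TB = 10**12 bytes and an ideal, fully used connection
    (no protocol overhead, no interruptions).
    """
    bits = data_tb * 1e12 * 8                  # terabytes -> bits
    seconds = bits / (link_mbit_per_s * 1e6)   # Mbit/s -> bit/s
    return seconds / 86_400                    # seconds -> days


# Hypothetical link speeds, chosen only for illustration.
for speed in (100, 1_000, 10_000):
    print(f"{speed:>6} Mbit/s: {transfer_days(50, speed):.1f} days")
```

Under these assumptions, one patient’s 50TB would take several days to send over a 1 Gbit/s link, which is why walking a hard drive to the next building can still win.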