Give me all my data, Facebook: Interview with Professor Ruben Verborgh (Part 1)
24 Sep 2019
Ever wondered how much Facebook really knows about you? Ever wondered how to get all of your personal data back from Facebook? Yeah, thought not. Ruben Verborgh, professor of Semantic Web technology at Ghent University, on the other hand, decided he was up for the task.
In this interview we talk to Ruben about his time living in Facebook town, his mission to get his data back from Facebook and why he believes our data should stay in one place…
Andy: When did you start getting into data privacy?
Ruben: For me, it was mainly my time working with open data that led me to data privacy. For example, in the medical industry there are clinical trials, and lots of data about those trials. We started asking what would happen if we could use machines to make sense of all that data. That journey brought me to Stanford University, where a group still works on all this biomedical data to this day.
The interesting thing was the timing. All of this happened in 2016, when some big data things were happening, such as the Brexit vote, which nobody saw coming. While I was working at Stanford, I was living in Facebook town. I was immersed in Silicon Valley culture and living alongside generation Zuckerberg.
During my time near Facebook HQ, you could really feel that strange things were happening. I remember that one day in August 2016 they suddenly fired all their human editors. Human editors controlled the trending news on Facebook, so when the company decided to replace them with algorithms it was quite a shock. It was a really interesting move, because they could then say "look, the news feed is based on an algorithm, so we don't have control over it". Another thing I saw was Facebook preparing for fake news. 2016 wasn't that long ago, but back then fake news was not yet a thing.
Andy: What else did you experience during your time in Silicon Valley?
Ruben: They really believe that technology can solve everything. I think that's a mistake, and a problem for society. Their attitude is "technology will fix it", but they're not too concerned about how technology interacts with society. Just because you can doesn't mean that you should, and there is little thought given to ethics there. They don't mean it badly; they're just people who are very passionate about technology in isolation, and they forget to look at the bigger picture.
Andy: For many people, being told "here you go, manage and control all your data" could be pretty overwhelming and feel too much like hard work! How do you see change happening?
Ruben: The first remark is about empowerment. We're not going to tell people that they HAVE to control their data; if they don't want to, that's fine. But it's good for people to have the possibility. Right now nobody has that option.
Let's take the classic Facebook example. If you want to interact with people, you almost have to have a Facebook account, and you pay with your data. What we're saying is that if you want to be in control of your data and choose what to share, you can. There are pieces of data you care about and pieces you don't; you can take control of the ones that matter to you and not worry about the less important ones.
Andy: We really liked your campaign to get your personal data out of Facebook. Why did you decide to do it?
Ruben: First of all, there are no hard feelings. I used Facebook for several years, mostly for my professional life. I never posted on Facebook directly; it was always my tweets being echoed there. One day Facebook decided to stop the Twitter integration, which meant no one was interacting with me anymore, so I decided to leave. But when I leave, I want my data back.
The thing was, I didn't just want my data back; I wanted all the data they hold on me, including the stuff I didn't put on there myself. You know, things like what they've added about me. What Facebook does is say "hey, here's a tool to get your data out", but it only gives me the data I put on there myself. I wanted all the rest.
Andy: How did you get on? Did you get your data back…
Ruben: The process is still ongoing. So far they have not given me a single piece of data. My interest is twofold. On the one hand, I want my data. On the other hand, I want to find out what kind of data they have about me. Do they have spending data? Geographical data? What do they know about me? I haven't received answers to any of those questions yet.
What I did find out is how they handle subject access requests. I knew for some time that they had an internal GDPR department. They had been preparing for GDPR and they know what they're doing. However, from my experience and the way they react, they seem to have settled on a strategy of delay and confusion. It is deliberate: they try to make it seem like they don't understand what you are asking for. They say things like "we'll need more time". But if you think about it, that's their strategy: act confused about the request and avoid being helpful.
Andy: Why do they do that?
Ruben: They want people to give up, they want to make it as hard as possible to get their personal data. They misinterpret your questions, they reply to things that you haven’t said. I think they are very afraid to set a precedent, because the moment they give me my data, everyone else has a shortcut.
Another thing I found is that they got a bit careless. They probably didn't expect me to keep on replying, and apparently they had not anticipated me making those conversations public. I have Google snippets of their answers. Some of their replies to me contain really interesting, questionable statements. They have funny interpretations of GDPR and thought they would get away with it. If more people start doing this and a court case happens (which is inevitable), I'll have some interesting pieces of evidence.
What I think we all need to do is bombard all of those companies with access requests, because right now the number of subject access requests is so low that they can afford to have humans reply to them. Once the volume of access requests gets high enough, they will have to send the data and provide automated APIs.
Yes, even Facebook doesn’t have enough data, you can even see they are desperate to get more data and now they’re releasing the Libra project to get transaction data. The whole model of data harvesting is broken.
Andy: If you’re a business and you want to move away from data harvesting…
Ruben: Then you have to compete on quality of service, just like we used to. Over the last couple of years, every company has become a data company. You're selling shoes? Guess what, you need data to sell shoes, so you can predict exactly when people want them. But those companies don't want to be data companies; they want to make shoes.
I have companies knocking on my door saying "we don't want data". I'll give you a good example. There's a company here that uses AI to match people with jobs. But they don't want to own data, because the moment they start harvesting data, they are in competition with LinkedIn. So they say: "please lend us your data and we'll use AI to give you good matches, but we don't want to store your data." This philosophy of not collecting data is becoming more and more common.
Data has become a liability, especially as it's expensive to store and you have to justify every single piece of it. Companies are tired of it; only a few companies actually want to hold data. Other companies never wanted to be data companies in the first place, but they're stuck with it.
Andy: What do you think is the future of all this? Privacy, data ownership? Data harvesting?
Ruben: I think we will arrive at a future where people are in control of their own data, because it's the only thing that makes sense. Governments are there to help you, but they have massive problems with GDPR and with their IT processes. For example, agencies B and C are not allowed to see some things that agencies A and D need to access. With Solid, where your data lives in your own personal online data store, or "pod", agencies A, B, C and D each ask your pod directly for only the data they need, so no data ever moves around and you are in control at every point.
What we should be doing is inverting the paradigm: what if data doesn't move? What if data stays in one place? Suddenly it becomes easier for companies and governments to do their data processing. For every business, processes become simpler, people are in control, and everything works together. It also reduces risk: right now governments and companies have many copies of the same data and don't know which one is the latest. I think we need to go back to the source of the data, and that's you! After all, this makes much more sense.