I attended a seminar titled "Fixing our Opaque, Fragmented and Disparate Big Data" on November 2, 2016.
It was given by Augustin Chaintreau at Université de Montréal: http://diro.umontreal.ca/departement/colloques/une-nouvelle/news/augustin-chaintreau-fixing-our-opaque-fragmented-39038/
These are my notes.
We hear so much about Big Data these days. Is there anything wrong with Big Data? This question is the starting point and motivation for the work that was presented.
Goals:
- Build an infrastructure for transparency;
- Empower users and social scientists;
- Expose discrimination that is created or exacerbated by services using Big Data.
Big Data represents an industrial revolution, just like when manufacturing became large-scale. Large-scale commerce paved the way for a retail price that would be fixed and public. This happened not only in the field of commerce, but also in that of information (pay for quality content with advertisement) and politics (nationwide addresses make accountability possible).
You would think that the Internet only perfected this process! Well, not really. Price discrimination has been exposed (penalizing users with ZIP codes in rural areas, not as given in the checkout form, but as stored in their browsers' cookies!). Bias in sponsored vs. organic search results has been exposed. Some vendors use technologies that ensure certain Facebook ads are not shown to users of a given ethnicity. Around Ferguson, law enforcement was monitoring Facebook, Twitter, Instagram. And the list goes on: these issues make headlines on a regular basis.
Considering that uses of Big Data reinforce bias, discrimination, and stereotypes, our challenge is to reconcile Big Data with our values.
Big Data is intrinsically about 'discriminating' (identifying target users, grouping users based on certain features, etc.), so its disparate impact comes as no surprise [1]. If Big Data is made for discrimination, how can we keep the good part of it while avoiding the unfair part? Examples of what is considered unfair include the targeting of vulnerable communities and the isolation of historically marginalized people. Of course, this assumes some agreement on what counts as good use and what counts as abuse (say, according to US consensus).
The goal would be to reduce inequalities instead of exacerbating them. If we do not pay attention, we risk eroding long-standing civil rights protections.
A proper implementation of Big Data systems would address:
- Transparency (reveal data usage);
- Reconciliation (cross-domain risks);
- Network effects (hegemony risks);
and would
- act at root level;
- be reusable;
- be scalable (deployable now).
Traditionally, solutions have been developed along the lines of privacy [2] and fair use [3]. Here, the speaker offers a solution based on transparency, accountability, and empowerment of users. The problem is a classical CS one, that of learning functions from examples:
{i} are some inputs (keywords in an email, in a search query...);
a is a target output (say, an ad);
Is there a targeting function f_a for each a (and what exactly is its form, assuming it is a monotone Boolean function)?
In this context, they have developed Sunlight [4] and XRay [5].
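To make the learning problem above concrete, here is a minimal sketch of my own. It is not the method used by Sunlight or XRay. It assumes the simplest possible form for f_a, a monotone disjunction ("the ad is shown if at least one trigger input is present"), and noise-free observations, and it applies the textbook elimination algorithm: any input that appears in an observation where the ad was not shown cannot be a trigger. The keywords and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# One observation: (inputs present in the account, was ad `a` shown?)
Observation = Tuple[FrozenSet[str], bool]


def learn_monotone_disjunction(observations: Iterable[Observation]) -> Set[str]:
    """Elimination algorithm for a monotone disjunction (assumes no noise).

    Start with every input seen as a candidate trigger, then drop every
    input that occurs in an observation where the ad was NOT shown.
    """
    obs: List[Observation] = list(observations)
    candidates: Set[str] = set()
    for inputs, _ in obs:
        candidates |= inputs
    for inputs, shown in obs:
        if not shown:
            candidates -= inputs
    return candidates


def predict(triggers: Set[str], inputs: FrozenSet[str]) -> bool:
    """f_a(inputs): True if at least one trigger input is present."""
    return bool(triggers & inputs)


def is_consistent(triggers: Set[str], observations: Iterable[Observation]) -> bool:
    """Check that the learned function reproduces every observation."""
    return all(predict(triggers, inputs) == shown for inputs, shown in observations)


# Hypothetical observations: keywords placed in test accounts, and whether
# the ad appeared for each account.
observations: List[Observation] = [
    (frozenset({"loan", "vacation"}), True),
    (frozenset({"vacation", "recipe"}), False),
    (frozenset({"loan", "recipe"}), True),
    (frozenset({"recipe"}), False),
    (frozenset({"debt"}), True),
]

triggers = learn_monotone_disjunction(observations)
print(sorted(triggers))                       # candidate trigger keywords
print(is_consistent(triggers, observations))  # does f_a explain the data?
```

Real ad targeting is noisy and the targeting function may combine inputs in richer ways, so a sketch like this only illustrates the shape of the problem: infer f_a from (inputs, output) examples.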
At this point, the presentation got rushed and the slides were going by fast. Nonetheless, interesting points were made. Social sharing may typically worsen inequalities (by analogy with public services, where users at the centre of a network benefit a great deal without contributing much, and vice versa). Also, a single tracking signal may not be sufficient to identify you (think mobile), but the matching of two tracking signals may be...
In conclusion, these contributions aim at reversing today's opacity, fragmentation, and hegemony of Big Data (and at showing that this is actually possible). So there is hope!
[1] https://mathbabe.org/2014/10/20/big-datas-disparate-impact/
[2] http://people.seas.harvard.edu/~salil/research/CompDiffPriv-crypto.pdf
[3] https://arxiv.org/abs/1510.02377
[4] http://www.cs.columbia.edu/~djhsu/papers/sunlight.pdf
[5] https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-lecuyer.pdf
Data Transparency Lab Conference 2016 (November 16–19) http://www.datatransparencylab.org/