CY, Feb 25, 2015
CrowdSurf: Empowering Informed Choices in the Web
Hassan Metwalley, Stefano Traverso, Marco Mellia et al.
When surfing the Internet, individuals leak personal and corporate information to third parties whose businesses, legitimate or not, revolve around the value of collected data. The implications are serious, from a person unwillingly exposing private information to an unknown third party, to a company unable to manage the flow of its information to the outside world. The point is that individuals and companies are increasingly kept out of the loop when it comes to controlling private data. With the goal of empowering informed choices in information leakage through the Internet, we propose CROWDSURF, a system for comprehensive and collaborative auditing of data that flows to Internet services. Similarly to open-source efforts, we enable users to contribute to building awareness and control over privacy and communication vulnerabilities. CROWDSURF provides the core infrastructure and algorithms to let individuals and enterprises regain control over the information exposed on the web. We advocate CROWDSURF as a data processing layer positioned right below HTTP in the host protocol stack. This enables the inspection of clear-text data even when HTTPS is deployed, and the application of processing rules that are customizable to fit any need. Preliminary results obtained by running a prototype implementation on ISP traffic traces demonstrate the feasibility of CROWDSURF.
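To make the idea of customizable processing rules concrete, the following is a minimal sketch of how a layer sitting below HTTP might decide what to do with an outgoing request. It is not CROWDSURF's implementation: the `Rule` type, the three actions (`block`, `redact`, `allow`), the host-suffix matching, and the `apply_rules` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule format; not taken from the CROWDSURF paper."""
    host_suffix: str            # matches this host and any of its subdomains
    action: str                 # "block", "redact", or "allow"
    redact_params: tuple = ()   # query parameters removed when action == "redact"


def apply_rules(url, rules):
    """Return the URL to send, or None if the request should be dropped.

    The first matching rule wins; requests matching no rule are allowed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    for rule in rules:
        if host == rule.host_suffix or host.endswith("." + rule.host_suffix):
            if rule.action == "block":
                return None
            if rule.action == "redact":
                # Keep every query parameter except the ones named by the rule.
                kept = [
                    (key, value)
                    for key, value in parse_qsl(parts.query, keep_blank_values=True)
                    if key not in rule.redact_params
                ]
                return urlunsplit(parts._replace(query=urlencode(kept)))
            return url
    return url


# Example rules, using made-up hosts.
rules = [
    Rule("tracker.example", "block"),
    Rule("ads.example", "redact", ("uid",)),
]
blocked = apply_rules("https://tracker.example/pixel?uid=42", rules)
redacted = apply_rules("https://cdn.ads.example/a?uid=42&page=home", rules)
untouched = apply_rules("https://news.example/story?id=7", rules)
```

In this sketch the first request is dropped, the second is forwarded without its `uid` parameter, and the third passes through unchanged. A shared, community-maintained rule list would be one way to realize the collaborative auditing the abstract describes, but that mapping is also an assumption.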
HC, Feb 22, 2016
WeBrowse: Mining HTTP logs online for network-based content recommendation
Giuseppe Scavo, Zied Ben Houidi, Stefano Traverso et al.
A powerful means to help users discover new content in the overwhelming amount of information available today is sharing in online communities such as social networks or crowdsourced platforms. This approach falls short in the case of what we call communities of a place: people who study, live or work at the same place. Such people often share common interests but either do not know each other or fail to actively engage in submitting and relaying information. To counter this effect, we propose passive crowdsourced content discovery, an approach that leverages the passive observation of web clicks as an indication of users' interest in a piece of content. We design, implement, and evaluate WeBrowse, a passive crowdsourced system which requires no active user engagement to promote interesting content to users of a community of a place. Instead, it extracts the URLs users visit from traffic traversing a network link to identify popular and interesting pieces of information. We first prototype WeBrowse and evaluate it using both ground truth and real traces from a large European Internet Service Provider. Then, we deploy WeBrowse on a campus of 15,000 users, and in a neighborhood. Evaluation based on our deployments shows the feasibility of our approach. The majority of WeBrowse's users welcome the quality of content it promotes. Finally, our analysis of popular topics across different communities shows that users in the same community of a place share more interests with one another than with users from other communities, thus confirming the promise of WeBrowse's approach.
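The core idea of passive crowdsourced discovery, treating observed visits as implicit votes, can be sketched as follows. This is an illustrative simplification and not WeBrowse's actual algorithm: the `popular_urls` function, the `(user_id, url)` record format, and the `min_users` threshold are assumptions, and user identifiers are assumed to be anonymized before reaching this step.

```python
from collections import defaultdict


def popular_urls(clicks, min_users=2):
    """Rank URLs by the number of distinct users who visited them.

    `clicks` is an iterable of (user_id, url) pairs, a hypothetical record
    format standing in for entries extracted from HTTP logs. Counting
    distinct users rather than raw requests keeps one user's repeated
    visits from inflating a URL's popularity.
    """
    users_by_url = defaultdict(set)
    for user_id, url in clicks:
        users_by_url[url].add(user_id)

    ranked = [
        (url, len(users))
        for url, users in users_by_url.items()
        if len(users) >= min_users
    ]
    # Most visitors first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Example with made-up, anonymized records.
clicks = [
    ("u1", "http://news.example/a"),
    ("u2", "http://news.example/a"),
    ("u1", "http://news.example/a"),   # repeat visit, counted once
    ("u3", "http://blog.example/b"),
    ("u1", "http://video.example/c"),
    ("u2", "http://video.example/c"),
    ("u3", "http://video.example/c"),
]
top = popular_urls(clicks, min_users=2)
```

Here `top` lists `http://video.example/c` with three distinct visitors, followed by `http://news.example/a` with two; the URL seen by a single user is filtered out. A deployed system would also need to separate intentional clicks from automatically loaded resources, which this sketch does not attempt.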