Drupal and Privacy: A Long Way Still to Go

Nedjo Rogers on December 2, 2008

How well privacy is protected on the web depends not only on the policies and practices of a particular site but also on the software the site is built on. Drupal excels at enabling individuals to share information. How strong is Drupal's corresponding support for protecting personal information and privacy? To answer this question, we'll look at the display of private information, IP address logging, cookie use, and users' control over their own data.

Public Display of Private Information

When users enter personal information on a website, they generally expect that at least some of it will be handled carefully and protected from inappropriate access. How well does Drupal restrict access to sensitive personal information? Here, Drupal measures up fairly well. The minimal information that an individual must enter to register with a minimally configured Drupal site is a user name and an email address. While the user name is publicly visible, the email address is protected and accessible only to site administrators.

Drupal core ships with the Profile module, which captures and presents user information. A site administrator can create custom fields for user profiles, to be filled in either by site administrators or by users themselves as they register or edit their profiles. The Profile module distinguishes "public" and "private" fields, and access to "private" field data is appropriately restricted by permission.

However, the Profile module is limited in its implementation, and many sites choose to replace it with one or another approach that builds user profiles as "node" types. Here, depending on the implementation, there is plenty of potential for site admins to inadvertently expose private information. For example, a site designer might create a custom content type for users to enter their profile information. Special expertise would then be needed to link this profile to the "access user profiles" permission or otherwise provide nuanced control over who can access which portions of the profile data. The relative weakness of the core Profile module means that in practice user profile data are handled in many different ways, with varying degrees of protection for private information.

Besides user profile data, there is the issue of protecting other potentially sensitive content. The "node access" system built into Drupal core enables fine-grained control over who can access particular pieces of content. Numerous contributed modules provide different options for node access. A widely used one, Organic groups, allows self-organizing groups to share information internally or with the broader public. Within content records, access to individual fields can be controlled with the field permissions module that comes with the Content Construction Kit (CCK) package. So, in the area of protecting potentially sensitive information, Drupal gets relatively good marks.
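To make the idea of "public" and "private" fields concrete, here is a minimal, language-neutral sketch in Python of filtering profile fields by permission before display. It is not Drupal's code: the permission name, the ProfileField type, and the rule that owners may see their own private fields are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical permission name, loosely modelled on the
# "access user profiles" permission; not taken from Drupal's code.
VIEW_PRIVATE = "view private profile fields"


@dataclass(frozen=True)
class ProfileField:
    name: str
    value: str
    private: bool = False


def visible_fields(fields, viewer_permissions, viewer_is_owner=False):
    """Return only the fields this viewer is allowed to see.

    Assumption: owners always see their own private fields.
    """
    can_see_private = viewer_is_owner or VIEW_PRIVATE in viewer_permissions
    return [f for f in fields if can_see_private or not f.private]


profile = [
    ProfileField("city", "Victoria"),
    ProfileField("phone", "555-0100", private=True),
]
```

The point of the sketch is that the check happens in one place, at display time. A profile built as a custom content type gets no such filter unless the site builder adds one.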

IP Address Logging

IP addresses are one of the main pieces of information that can be used to identify website users. Law enforcement and other government bodies regularly use them to track and identify individuals. Where governments are cracking down on political dissent, IP address logging can literally be a matter of life and liberty for dissidents posting information that governments or security forces want to suppress.

Visitor IP addresses are often logged by the web server, so there may be information capture going on that Drupal has no control over. Nonetheless, it's important to look at how well Drupal protects users from IP address logging. Drupal core logs visitor IP addresses in multiple places: the accesslog, comments, flood, sessions, and watchdog tables. Various contributed modules do their own IP logging.

A contributed module, IP anonymize, provides a fairly good solution. However, it is limited by having to act after the fact: it works by deleting data that Drupal has already logged. A fuller approach would require changes in Drupal core. An issue proposing an option to disable IP logging exists but hasn't attracted much attention. A solution in core could provide an API that contributed modules could build on, giving site admins control over IP logging in contributed modules as well as in Drupal core.
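The after-the-fact approach can be sketched roughly as follows. This is an illustration in Python against an in-memory SQLite database, not the IP anonymize module's actual code. The table names come from the list above, while the `hostname` and `timestamp` column names, the cutoff logic, and the function names are assumptions.

```python
import sqlite3

# Tables the article lists as holding visitor IP addresses.
IP_TABLES = ("accesslog", "comments", "flood", "sessions", "watchdog")


def scrub_ip_addresses(conn, cutoff, tables=IP_TABLES):
    """Blank the stored address on every row older than `cutoff`.

    Returns the number of rows changed. The column names `hostname`
    and `timestamp` are assumptions made for this sketch.
    """
    changed = 0
    for table in tables:
        # Table names come from the fixed tuple above, never from user input.
        cur = conn.execute(
            f"UPDATE {table} SET hostname = '' "
            "WHERE timestamp < ? AND hostname <> ''",
            (cutoff,),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


def demo():
    """Build toy tables, scrub rows older than t=500, report the result."""
    conn = sqlite3.connect(":memory:")
    for table in IP_TABLES:
        conn.execute(f"CREATE TABLE {table} (hostname TEXT, timestamp INTEGER)")
        conn.execute(f"INSERT INTO {table} VALUES ('192.0.2.10', 100)")
        conn.execute(f"INSERT INTO {table} VALUES ('192.0.2.20', 900)")
    changed = scrub_ip_addresses(conn, cutoff=500)
    remaining = conn.execute(
        "SELECT hostname FROM watchdog ORDER BY timestamp"
    ).fetchall()
    return changed, remaining
```

The sketch also shows the limitation: until the cleanup runs, every address sits in the database, and any row newer than the cutoff keeps its address. Only an option not to record the address in the first place would close that window.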

Cookie Handling

Ever tried to log into a Drupal site with a browser that doesn't accept cookies? Chances are that if you did, you got a broken site. That's because Drupal doesn't work without cookies, yet it provides no error message to help users understand why the site is broken.

Cookies play a key role in the collection of information about website users. To take a single example, Google relies heavily on cookies in its quest to gather richly articulated data sets on most or all internet users. Part of its strategy is offering numerous "free" services, all of which require cookie acceptance and serve to collect detailed information on users. Through cookies, Google can link individuals' email, calendaring, and other social network data with other data sources such as their search behaviour and advertising response.

Privacy-conscious web users can set their browser not to accept cookies and then add exceptions for particular sites. But they need some cue that lets them know when to add an exception. By silently breaking when cookies are not accepted, Drupal presents real barriers to users who wish to be selective about their cookie use.

It's worth reviewing why Drupal requires cookies in the first place. Drupal uses user "sessions" to keep track of a visitor throughout a site visit. PHP (the language Drupal is written in) provides two ways of passing the session ID: through cookies and through the URL. The URL approach is an alternative for when cookies are not available. Before Drupal 4.7, Drupal supported both methods. However, in an attempt to fix a bug in the URL passing, support for sessions without cookies was broken, and it has not been restored.

The problem of Drupal breaking when cookies aren't accepted was recognized over five years ago, when an issue was opened. Like countless others, that issue contains a patch by the amazing chx. But it hasn't received the testing or further work that would solidify the fix and allow it to be applied. Until this is fixed, Drupal users who don't set their browsers to accept cookies automatically will be left scratching their heads and wondering why they can't log in.
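One common way to give users the missing cue is a test-cookie round trip: plant a harmless cookie when the login form is served, and if it has not come back when the form is submitted, explain the problem instead of failing silently. The Python sketch below illustrates that general technique only. It is not the patch from the Drupal issue, and the cookie name, function names, and message wording are hypothetical.

```python
from http.cookies import SimpleCookie

TEST_COOKIE = "cookie_check"  # hypothetical name, chosen for this sketch


def login_form_headers():
    """Headers to send with the login form: plant a harmless test cookie."""
    cookie = SimpleCookie()
    cookie[TEST_COOKIE] = "1"
    cookie[TEST_COOKIE]["path"] = "/"
    return [("Set-Cookie", cookie[TEST_COOKIE].OutputString())]


def cookies_enabled(cookie_header):
    """True if the browser sent the test cookie back with the form post."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return TEST_COOKIE in cookie


def handle_login_post(cookie_header):
    """Return an explanatory message, or None to continue with login."""
    if not cookies_enabled(cookie_header):
        return ("This site needs cookies to keep you logged in. "
                "Please allow cookies for this site and try again.")
    return None  # carry on with normal authentication
```

A message like this is exactly the cue a privacy-conscious user needs in order to add a per-site exception, and it does not require restoring URL-based sessions.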

User Account Deletion

A key aspect of privacy protection is the level of control users have over their own data. Drupal enables users to create accounts and share information. After doing so, do they have control over the information they provided?

If cookie handling is a longstanding issue, it pales next to the issue of user account deletion. For this one, we need to go back to the eighth post on drupal.org. "Deleting one's own account is a basic privacy requirement," ax, one of Drupal's earliest developers, argued in December 2001. Seven years later, we're no closer to supporting this basic functionality. Even if they want to, Drupal site administrators can't give users the ability to delete or even to disable their own accounts (at least, not without custom code). Effectively, Drupal enables users to contribute data but not to control it.
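What self-service deletion involves can be sketched in a few lines. The Python example below uses a toy SQLite schema and is purely illustrative: the table names, the choice to reassign a departing user's posts to an anonymous uid of 0, and the function name are all assumptions, not a description of Drupal's schema or of any existing patch.

```python
import sqlite3

ANONYMOUS_UID = 0  # assumption: uid 0 stands for "no account"


def cancel_own_account(conn, acting_uid, target_uid, keep_posts=True):
    """Delete `target_uid`, but only when requested by that same user."""
    if acting_uid != target_uid or target_uid == ANONYMOUS_UID:
        raise PermissionError("users may only cancel their own account")
    with conn:  # one transaction: every step succeeds or none does
        if keep_posts:
            # Keep the content but sever its link to the person.
            conn.execute("UPDATE posts SET uid = ? WHERE uid = ?",
                         (ANONYMOUS_UID, target_uid))
        else:
            conn.execute("DELETE FROM posts WHERE uid = ?", (target_uid,))
        conn.execute("DELETE FROM profile_values WHERE uid = ?", (target_uid,))
        conn.execute("DELETE FROM users WHERE uid = ?", (target_uid,))


def demo():
    """Create a toy site with one user, then let that user leave."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE profile_values (uid INTEGER, field TEXT, value TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, uid INTEGER, body TEXT);
        INSERT INTO users VALUES (7, 'alex');
        INSERT INTO profile_values VALUES (7, 'phone', '555-0100');
        INSERT INTO posts VALUES (1, 7, 'hello');
    """)
    cancel_own_account(conn, acting_uid=7, target_uid=7)
    users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    profile = conn.execute("SELECT COUNT(*) FROM profile_values").fetchone()[0]
    post_owner = conn.execute("SELECT uid FROM posts WHERE id = 1").fetchone()[0]
    return users, profile, post_owner
```

The hard design question is what happens to the departing user's content. The sketch offers both options (keep it anonymously or delete it) through the `keep_posts` flag, a decision a real implementation would probably leave to the site administrator.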

What's Left to Do?

Protecting privacy rights is not an abstract or theoretical concern. Whole branches of government and industry exist for the purpose of carrying out surveillance and amassing detailed individual profiles of citizens, workers, and consumers. For these often shadowy agencies, data exchanges on the web (including the tracking of digital thumbprints via IP addresses and the use of cookies to link users across multiple areas of web use) are a key focus of information gathering. Through its design choices, software like Drupal can help enhance citizens' understanding of and action around personal information protection. Or it can obscure or prevent effective citizen control over data access.

How does Drupal measure up in terms of privacy protection? The answer seems to be that protection of personal data from public access is pretty strong, but elsewhere in Drupal there's a lot still to do.

The relatively "free and easy" attitude toward privacy in Drupal may reflect the personal attitudes of many Drupal contributors. Drupal is all about enabling people to share information. Far from being concerned with protecting their own privacy, users of many social networking platforms sometimes seem intent on exposing as much as possible about themselves, whether or not the rest of the world really wants to know! It may be that personal experiences predispose the Drupal community to be relatively laissez-faire when it comes to enshrining privacy protection and user control over their own data. While there are now hundreds of groups at groups.drupal.org, there isn't yet one focusing on privacy. The category privacy on the groups site currently contains only a single post.

How to improve things? Probably the way that anything happens in Drupal: because a group of people is interested and prepared to do the work. It seems pretty clear that those of us interested in privacy need to do a bit of organizing. A good first step might be to create a privacy group on groups.drupal.org. For many of the issues identified in this review, patches already exist. A bit of concerted energy could go a long way toward ensuring that privacy protection, too, is significantly improved in the next version of Drupal.
