Guarding Your Site Users' Anonymity

The Affordable Living Calculator (ALC) helps users calculate how long they, or someone in their care, can afford to stay in the various assisted-living facilities they are considering. To this end, they must enter the fee schedule of each facility as well as their personal financial information: assets, income, and monthly expenses. To encourage people to use the Calculator, and to feel safe while doing so, I want ALC users to be able to save their data in a database so that it is available the next time they sign in, without leaving any contact information that might reveal their identity.

To avoid administering users' identities myself while still safeguarding them, I decided to use Janrain Engage, a third-party authentication service. Engage, formerly known as rpxnow, lets users sign in through a provider they already have an account with, such as Google or Twitter. A user of a relying site clicks the logo of their preferred provider, and a page opens on that provider's domain. The page asks for the user's name and password and lists exactly what information about them will be shared if they continue, such as gender, e-mail address, or birth date. Some providers let users customize what information they share.

Once the user authenticates, they are redirected to an address on the relying party's domain along with a token, which the relying party uses to contact Janrain again for the user's identity and shared information. The only item of interest to ALC is the user's OpenID identifier, a URL that uniquely identifies that user; everything else is ignored. (For security, I would prefer that Janrain send ALC nothing but the OpenID URL, but they currently do not support that. Several developers, including myself, have petitioned Janrain for this feature.)
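The token exchange described above can be sketched in Python with only the standard library. This is a minimal sketch, assuming Janrain's v2 `auth_info` endpoint (`https://rpxnow.com/api/v2/auth_info`), which takes `apiKey` and `token` parameters and returns JSON with a `stat` field and a `profile.identifier` field. The function names are hypothetical and not ALC's actual code. The point of the sketch is that only `profile.identifier` leaves `extract_identifier`; the rest of the shared profile is discarded immediately.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Janrain Engage's v2 auth_info endpoint; verify against Janrain's documentation.
AUTH_INFO_URL = "https://rpxnow.com/api/v2/auth_info"


def fetch_auth_info(token: str, api_key: str) -> dict:
    """POST the one-time token from the redirect back to Janrain and return the parsed JSON reply."""
    body = urllib.parse.urlencode({"apiKey": api_key, "token": token}).encode("ascii")
    with urllib.request.urlopen(AUTH_INFO_URL, data=body, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_identifier(auth_info: dict) -> str:
    """Return only the OpenID identifier URL; every other profile field is deliberately ignored."""
    if auth_info.get("stat") != "ok":
        # Assumed failure shape: {"stat": "fail", "err": {...}}
        raise ValueError("Janrain authentication failed: %r" % (auth_info.get("err"),))
    return auth_info["profile"]["identifier"]
```

In a request handler, one would call `extract_identifier(fetch_auth_info(token, API_KEY))`, where `token` comes from the redirect and `API_KEY` is a placeholder for the site's Engage key.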

The ALC data-server takes the OpenID URL and hashes it with SHA-1, producing a 40-character hexadecimal string. It is exceedingly unlikely that two different URLs will map to the same value when hashed this way. This hash is what the ALC data-server saves as the user's identity. It is practically impossible to reverse the hash and recover the original URL, and the ALC data-server saves no other data that might identify the user or allow anyone to contact them. The user's data therefore remains completely anonymous, even to someone with direct access to the database. At the same time, the user can get back to their data at any time, because signing in again yields the same OpenID URL and therefore the same hash.
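The hashing and storage step might look like the following Python sketch, which uses only the standard-library `hashlib` and `sqlite3` modules. The table name, column names, and helper functions are hypothetical illustrations, not the actual ALC data-server. The only identity column holds the SHA-1 hex digest, so no stored row contains a URL, name, or e-mail address.

```python
import hashlib
import json
import sqlite3
from typing import Optional


def anonymize(openid_url: str) -> str:
    """Return the SHA-1 digest of the OpenID URL as 40 lowercase hex characters."""
    return hashlib.sha1(openid_url.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the hypothetical schema holds only a hash and the user's data."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " user_hash TEXT PRIMARY KEY,"  # SHA-1 of the OpenID URL; the URL itself is never stored
        " data TEXT NOT NULL)"          # calculator inputs serialized as JSON
    )
    return conn


def save_data(conn: sqlite3.Connection, openid_url: str, data: dict) -> None:
    """Insert or overwrite the user's calculator data under the hashed identity."""
    conn.execute(
        "INSERT OR REPLACE INTO users (user_hash, data) VALUES (?, ?)",
        (anonymize(openid_url), json.dumps(data)),
    )
    conn.commit()


def load_data(conn: sqlite3.Connection, openid_url: str) -> Optional[dict]:
    """Look up a returning user by re-hashing the OpenID URL they signed in with."""
    row = conn.execute(
        "SELECT data FROM users WHERE user_hash = ?", (anonymize(openid_url),)
    ).fetchone()
    return json.loads(row[0]) if row is not None else None
```

The same OpenID URL always hashes to the same digest. As a result, `load_data` finds a returning user's row even though the database never holds the URL itself.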

The downside of this scheme is that we have no way to contact ALC users to notify them of updates or other useful information that may become available. However, this information can be provided on a different site that lets users subscribe and receive e-mails about resources and updates. On this alternate site, their contact information is not directly linked to their identities in the ALC database. I have not been able to determine whether a user's OpenID URL can be deduced from his or her e-mail address at the same provider. For example, my own identity URL on Google contains a 40-character random string that is unrelated to my Gmail address. This does not have to be the case for all OpenID providers, however; in fact, an article by Sam Ruby cited below shows how users may choose their own custom OpenID URLs. Therefore, I will advise concerned users to subscribe with an e-mail address from a different provider than the one they use to sign in to ALC.

It is perfectly feasible to implement OpenID authentication without Janrain, with the ALC data-server negotiating the exchange directly with each identity provider. However, each identity provider has its own interface to this service, so letting Janrain act as a middleman significantly reduces the learning curve and development time.

Resources
