When we launched our online password manager, we dubbed it the first example of a zero-knowledge web application. We simply meant that Clipperz knows nothing about its users and their data. It was a simplistic and inaccurate definition: the zero-knowledge paradigm needs to be better defined. Our fault.
The original idea aimed to leverage the internet to manage personal information, especially sensitive information. And without disclosing any information to the server providing the service!
The browser is a ubiquitous and familiar tool, and we wanted to use it as a gateway to the online vault containing a user’s most precious data. Giulio Cesare was rather skeptical: he had been developing web applications for over six years and knew how much data it is possible to collect about users.
Nonetheless, we focused for months on designing a sound architecture for a new breed of “privacy aware” web applications. The basic idea was to deliver a no-trust-needed service, where users had the ability to inspect and verify anything running in their browser. We had to shift attention away from trusting us and let users focus on trusting the application.
It was fun and frustrating at the same time. Privacy and security constraints were popping up everywhere. Despite that, we grew convinced that many useful web applications can (and should) be developed by applying the following zero-knowledge methodology.
1. Host-proof hosting
To avoid storing readable data on the server, a zero-knowledge web application should encrypt and decrypt the data inside the browser. A neat idea, though not a new one. Richard Schwartz, Michael Mahemoff and others introduced this concept under the name of host-proof hosting in the first half of 2005, a few months before we started the Clipperz blog and project. Here is their definition from the AjaxPatterns wiki:
Host sensitive data in encrypted form, so that clients can only access and manipulate it by providing a passphrase which is never transmitted to the server. The server is limited to persisting and retrieving whatever encrypted data the browser sends it, and never actually accesses the sensitive data in its plain form. All encryption and decryption takes place inside the browser itself.
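To make the pattern concrete, here is a minimal sketch in Python rather than browser JavaScript. Everything in it is an assumption made for illustration: the function names, the BlobStore class standing in for the server, and above all the cipher, which is a toy keystream built from HMAC-SHA256 so that the example runs with the standard library alone. It provides no integrity protection and is not what Clipperz uses; a real application would rely on a vetted cipher.

```python
import hashlib
import hmac
import secrets


def derive_key(passphrase: str, salt: bytes) -> bytes:
    # Stretch the passphrase into a 32-byte key. This runs on the client only.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 over nonce || counter (stand-in for a real cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(passphrase: str, plaintext: bytes) -> bytes:
    # Fresh salt and nonce per record; both are stored next to the ciphertext.
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    key = derive_key(passphrase, salt)
    stream = keystream(key, nonce, len(plaintext))
    return salt + nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt(passphrase: str, blob: bytes) -> bytes:
    salt, nonce, body = blob[:16], blob[16:32], blob[32:]
    key = derive_key(passphrase, salt)
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))


class BlobStore:
    """Hypothetical server: it persists and returns opaque bytes, nothing else."""

    def __init__(self):
        self._blobs = {}

    def put(self, record_id: str, blob: bytes) -> None:
        self._blobs[record_id] = blob

    def get(self, record_id: str) -> bytes:
        return self._blobs[record_id]
```

The point of the design is that the passphrase and the derived key only ever appear as arguments on the client side; the BlobStore methods never receive them.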
2. Hide nothing
A zero-knowledge application should be trusted for itself and not because of the reputation of its developers. Therefore full access to the source code of the application is required.
This does not imply that a zero-knowledge application should be free or open source. As an example, Clipperz was originally released under a reference license meant to allow security code reviews while the core crypto libraries were released under a BSD license.
2.1 Code inspection
Developers of zero-knowledge web applications must provide the exact same files that are loaded into the browser when accessing the application.
Usually these files are difficult, almost impossible, to work with: spaces and comments have been removed and variables have been renamed. To make life easier for code reviewers, it’s recommended to maintain the source files in their original form and provide instructions on how to derive the compressed and optimized versions (see Clipperz build environment).
2.2 Code integrity
Performing a code security review is a complex matter, and it’s quite likely that most users will rely on reviews performed by others.
However, any zero-knowledge web application should provide an easy way to verify that the application downloaded by the browser is the same application built from the code available for inspection.
Ideally we envision a solution that is completely browser based and relies on a redundant and distributed network of servers not associated with the application provider. Each third party server hosts the fingerprint of the zero-knowledge web application, i.e. the checksum of its source code.
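A rough sketch of how such a check might look, written in Python for brevity. The mirror names, the choice of SHA-256 as the fingerprint and the "at least min_agree mirrors must match" policy are all assumptions made for this example, not part of any existing Clipperz service.

```python
import hashlib


def fingerprint(app_bytes: bytes) -> str:
    # Assumption: the fingerprint is the SHA-256 hex digest of the delivered file.
    return hashlib.sha256(app_bytes).hexdigest()


def matches_mirrors(app_bytes: bytes, published: dict, min_agree: int) -> bool:
    # `published` maps a (hypothetical) third-party server name to the
    # fingerprint it reports for the application.
    local = fingerprint(app_bytes)
    agreeing = [name for name, value in published.items() if value.lower() == local]
    return len(agreeing) >= min_agree
```

Requiring several independent servers to agree means that a single compromised mirror, or the application provider alone, cannot vouch for a tampered build.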
At the moment, Clipperz is providing a less than ideal solution.
The Clipperz website hosts both the MD5 and SHA1 checksums of the above file, along with instructions on how to compute the checksums on your local machine.
(Any proposal to improve the above scheme is welcome!)
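For readers who want to run the check themselves, computing both checksums locally could look like the following Python sketch. The function name and the path argument are placeholders for this example; they are not taken from the Clipperz instructions.

```python
import hashlib


def file_checksums(path: str, chunk_size: int = 65536) -> dict:
    # Read the file in chunks so large files do not need to fit in memory.
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

The two hex strings returned are then compared, character by character, with the values published on the website.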
3. Prevent code changes
3.1 Download before login
The whole source code must be downloaded to the browser before the user signs in.
3.2 Avoid code injection
To reassure users that the web application they logged into won’t morph into a malicious program, a true zero-knowledge application should adopt the following measures:
Never, ever, use the “eval” function on data loaded from the server
The eval function offers great flexibility since it’s able to “run” any string. But if a web application uses it to process data provided by the server, then any kind of code could easily be injected, thus hijacking the original application.
Limit the use of the “document.write” function
Keep its use to the bare minimum, allowing for closer inspection when it is really necessary to use it.
Never, ever, load any html content from the server
Loading ‘html’ chunks from the server is another easy way to subvert the behavior of the application. Just imagine what would happen if the server could push this little ‘html’ snippet:
The scary part is that this token could be hidden anywhere, even attached to a legitimate response. For this reason, all the HTML elements used by a zero-knowledge application must be loaded together with the source code before the sign-in phase.
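The measures above boil down to treating everything the server sends as data, never as code or markup. The article is about JavaScript running in the browser; the sketch below is only a Python analogue of the same discipline, and the template, the field name "label" and the function names are hypothetical.

```python
import html
import json

# Markup shipped with the application itself, before sign-in (hypothetical).
CARD_TEMPLATE = '<li class="card">{label}</li>'


def parse_response(raw: str) -> dict:
    # json.loads only builds data structures; unlike eval, it cannot run code.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


def render_card(payload: dict) -> str:
    # Server-provided text is escaped, so any markup inside it stays inert.
    label = html.escape(str(payload.get("label", "")))
    return CARD_TEMPLATE.format(label=label)
```

With this split, the only markup that ever reaches the page is the template that was available for inspection before the user signed in.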
4. Learn nothing
There are countless design decisions that could disclose information to the server. Sometimes data leaks are easy to detect, sometimes they are very subtle and dangerous. A zero-knowledge application should take the utmost care to work with as little information as possible. It’s easy to fall for a new fancy feature that can destroy the whole security architecture …
Consider the protocol behind user authentication. The following paragraph clearly explains why a zero-knowledge application should adopt the SRP protocol or an equivalent verifier-based protocol.
While any reasonably secure authentication protocol is expected not to leak any information about the password to eavesdroppers, protocols classified as zero-knowledge do not even leak any information about the password to the legitimate host (except the fact that the party at the other end really does know it). This subset of verifier-based protocols is strong indeed, since the host never stores plaintext-equivalent information and is never given any such information during the course of authentication. (from srp.stanford.edu)
SRP is more complex and slower than traditional methods, but it’s perfect for achieving zero-knowledge! Moreover, it can be deployed without revealing either the password or the username to the host (as we do in the Clipperz password manager).
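To give a feel for how a verifier-based protocol works, here is a condensed Python sketch of the SRP-6a arithmetic, with client and server run in one process. It is an illustration under loud assumptions, not the Clipperz implementation: the group (N = 2**521 - 1 with g = 3) is a placeholder picked for brevity rather than a vetted SRP group, the username is left out, and the final exchange of proofs plus the sanity checks on A and B that a real deployment needs are omitted.

```python
import hashlib
import secrets

# Placeholder group for illustration only: N is the Mersenne prime 2**521 - 1.
# A real deployment would use a vetted group, such as those listed in RFC 5054.
N = 2**521 - 1
G = 3


def to_bytes(n: int) -> bytes:
    return n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")


def H(*parts: bytes) -> int:
    # Hash length-prefixed parts with SHA-256 and return the digest as an integer.
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big") + part)
    return int.from_bytes(digest.digest(), "big")


K = H(to_bytes(N), to_bytes(G))  # SRP-6a multiplier


def private_key(salt: bytes, password: str) -> int:
    # x is derived on the client and never sent to the server.
    return H(salt, password.encode("utf-8"))


def register(password: str):
    # The server stores only the salt and the verifier v = g^x mod N.
    salt = secrets.token_bytes(16)
    return salt, pow(G, private_key(salt, password), N)


def login(password: str, salt: bytes, verifier: int):
    a = secrets.randbelow(N - 2) + 1                # client ephemeral secret
    A = pow(G, a, N)                                # sent to the server
    b = secrets.randbelow(N - 2) + 1                # server ephemeral secret
    B = (K * verifier + pow(G, b, N)) % N           # sent to the client
    u = H(to_bytes(A), to_bytes(B))                 # scrambling parameter
    x = private_key(salt, password)
    client_secret = pow((B - K * pow(G, x, N)) % N, a + u * x, N)
    server_secret = pow((A * pow(verifier, u, N)) % N, b, N)
    return client_secret, server_secret
```

Both sides arrive at the same secret only when the client knows the password behind the stored verifier, and at no point does the password, or anything equivalent to it, cross the wire.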
As a consequence of the “learn nothing” mantra, every zero-knowledge application should be completely anonymous, or at least it should make it impossible to relate the real name or email of a user to his data.