Anatomy of a zero-knowledge web application

UPDATED ENTRY

When we launched our online password manager, we dubbed it the first example of a zero-knowledge web application. We simply meant that Clipperz knows nothing about its users and their data. It was a simplistic and inaccurate definition: the zero-knowledge paradigm needs to be better defined. Our fault.

The original idea aimed to leverage the internet to manage personal information, especially sensitive information. And without disclosing any information to the server providing the service!

The browser is a ubiquitous and familiar tool, and we wanted to use it as a gateway to the online vault containing users’ most precious data. Giulio Cesare was rather skeptical: he had been developing web applications for over six years and he knew how much data it is possible to collect about users.

Nonetheless, we focused for months on designing a sound architecture for a new breed of “privacy aware” web applications. The basic idea was to deliver a no-trust-needed service, where users had the ability to inspect and verify anything running in their browser. We had to shift attention away from trusting us and let users focus on trusting the application.

It was fun and frustrating at the same time. Privacy and security constraints were popping up everywhere. Despite that, we grew convinced that many useful web applications can (and should) be developed by applying the following zero-knowledge methodology.

1. Host-proof hosting

In order to avoid storing readable data on the server, a zero-knowledge web application should encrypt and decrypt the data inside the browser. A neat idea, though not a new one. Richard Schwartz, Michael Mahemoff and others introduced this concept under the name of host-proof hosting in the first half of 2005, a few months before we started the Clipperz blog and project. Here is their definition from the AjaxPatterns wiki:

Host sensitive data in encrypted form, so that clients can only access and manipulate it by providing a passphrase which is never transmitted to the server. The server is limited to persisting and retrieving whatever encrypted data the browser sends it, and never actually accesses the sensitive data in its plain form. All encryption and decryption takes place inside the browser itself.

Eventually Ajax made pure browser-based cryptography a reality. Javascript implementations of crypto functions have been around for years, but Javascript alone can’t remember data between page loads. This causes an annoying issue since it forces the user to re-enter the passphrase each time. On the other hand, an application developed with Ajax techniques tends to not actually do page transitions, hence solving the problem of keeping a persistent key to perform crypto operations.
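The host-proof pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Clipperz implementation: it is written for Node.js so that it runs as-is (a browser would use its own crypto library), the names deriveKey, encryptRecord, decryptRecord and the in-memory "server" map are hypothetical, and the single SHA-256 key derivation is a placeholder.

```javascript
const crypto = require('node:crypto');

// Placeholder key derivation (an assumption of this sketch): SHA-256 of the passphrase.
function deriveKey(passphrase) {
  return crypto.createHash('sha256').update(passphrase, 'utf8').digest(); // 32 bytes
}

// Encrypt on the client. The returned object is all the server ever receives.
function encryptRecord(key, plaintext) {
  const iv = crypto.randomBytes(16); // fresh initial counter block for every record
  const cipher = crypto.createCipheriv('aes-256-ctr', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv: iv.toString('hex'), data: data.toString('hex') };
}

// Decrypt on the client with the key that never left it.
function decryptRecord(key, record) {
  const decipher = crypto.createDecipheriv('aes-256-ctr', key, Buffer.from(record.iv, 'hex'));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(record.data, 'hex')),
    decipher.final(),
  ]);
  return plain.toString('utf8');
}

// Stand-in for the remote host: it persists and returns opaque blobs, nothing else.
const server = new Map();

// The key lives only in memory for the lifetime of the page (here: the process).
const key = deriveKey('correct horse battery staple');
server.set('card-1', encryptRecord(key, 'bank PIN: 1234'));
const roundTrip = decryptRecord(key, server.get('card-1'));
```

Because the page is never reloaded, the key variable survives between operations, which is the point made above about Ajax. Note that CTR mode gives confidentiality only; a real design would also need to authenticate the ciphertext.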

2. Hide nothing

A zero-knowledge application should be trusted for itself and not because of the reputation of its developers. Therefore full access to the source code of the application is required.

This does not imply that a zero-knowledge application should be free or open source. As an example, Clipperz is released under a reference license meant to allow security code reviews but prohibiting copying and forking. Only the core crypto libraries are released under an AGPL license (originally BSD; see Update 2 below).

UPDATE 1
Clipperz code is now also available under an AGPL license. See the Clipperz Community Edition project. Read more here.

UPDATE 2
The Clipperz Crypto Library, now Javascript Crypto Library, changed its license from BSD to AGPL. As a consequence it was moved from Google Code to SourceForge. Read more here.

2.1 Code inspection

Developers of zero-knowledge web applications must provide the same exact files that are loaded into the browser when accessing the application.

Usually these files are quite difficult, almost impossible, to work with: spaces and comments have been removed, variables have been renamed. To make life easier for code reviewers, it’s recommended to maintain the source files in their original form and provide instructions on how to derive the compressed and optimized versions (see the Clipperz build environment).

2.2 Code integrity

Performing a code security review is a complex matter, and it’s quite likely that most users will rely on reviews performed by others.

However, any zero-knowledge web application should provide an easy way to verify that the application downloaded by the browser is the same application built from the code available for inspection.

Ideally we envision a solution that is completely browser based and relies on a redundant and distributed network of servers not associated with the application provider. Each third party server hosts the fingerprint of the zero-knowledge web application, i.e. the checksum of its source code.

At the moment, Clipperz is providing a less than ideal solution.

  • The whole application is condensed into a single file containing all the resources needed to run the application in the browser: html, css, javascript and also the images (except on IE).

  • The Clipperz website hosts both MD5 and SHA1 checksums of the above file along with the instructions on how to compute the checksum on your local machine.

(Any proposal to improve the above scheme is welcome!)
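The manual check described above boils down to hashing the downloaded file and comparing the result with the published values. A minimal sketch for Node.js follows; the function names and the sample bytes are hypothetical, and in practice the bytes would come from the saved application file while the expected values would come from the Clipperz website.

```javascript
const crypto = require('node:crypto');

// Hex digest of a buffer with the given algorithm ('md5', 'sha1', ...).
function fingerprint(fileBytes, algorithm) {
  return crypto.createHash(algorithm).update(fileBytes).digest('hex');
}

// True only if every published checksum matches the downloaded bytes.
function matchesPublished(fileBytes, published) {
  return Object.entries(published).every(
    ([algorithm, expected]) => fingerprint(fileBytes, algorithm) === expected.toLowerCase()
  );
}

// Demo with in-memory bytes standing in for the single-file application.
const appFile = Buffer.from('<html><!-- whole application in one file --></html>', 'utf8');
const published = { md5: fingerprint(appFile, 'md5'), sha1: fingerprint(appFile, 'sha1') };

// One appended tag is enough to change both checksums.
const tampered = Buffer.concat([appFile, Buffer.from('<script src="/hijack.js"></script>', 'utf8')]);

const originalOk = matchesPublished(appFile, published);
const tamperedOk = matchesPublished(tampered, published);
```

Here originalOk is true and tamperedOk is false. The check is only as trustworthy as the channel the published checksums arrive through, which is why the text above envisions hosting them on independent third-party servers.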

3. Prevent code changes

Zero-knowledge applications are basically huge Javascript programs running in the browser. Therefore it’s of the utmost importance to implement the necessary measures to stop any attempt to modify the code executed by the browser.

3.1 Download before login

The whole source code must be downloaded to the browser before the user signs in.

This is an essential requirement! If additional chunks of source code were downloaded from the server after the login phase, the user wouldn’t have any chance to verify in advance the security of the web application. Therefore not a single line of Javascript code should be moved to the browser after a successful user authentication.

3.2 Avoid code injection

Since Javascript is a very powerful and dynamic language, the borders between data and code are quite blurred.

To reassure users that the web application they logged into won’t morph into a malicious program, a true zero-knowledge application should adopt the following measures:

  • Never, ever, use the “eval” function on data loaded from the server
    The eval function offers great flexibility since it can “run” any string. But if a web application uses it to process data provided by the server, then any kind of code could easily be injected, thus hijacking the original application.

  • Limit the use of the “document.write” function
    Keep its use to the bare minimum, allowing for closer inspection when it is really necessary to use it.

  • Never, ever, load any html content from the server
    Loading ‘html’ chunks from the server is another easy way to subvert the behavior of the application. Just imagine what would happen if the server could push this little ‘html’ snippet: <script src="/hijack.js"></script>

    The scary part is that this token could be hidden anywhere, even attached to a legitimate response. For this reason, all the html elements used by a zero-knowledge application must be loaded together with the source code before the sign-in phase.
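The measures above amount to treating everything the server sends strictly as data. A small sketch of that idea, assuming a current JavaScript runtime where JSON.parse is built in; the function names, the "card" object and the markup are hypothetical and only serve the illustration.

```javascript
// Parse a server response strictly as data. JSON.parse never executes anything,
// whereas eval would run whatever string the server chose to send.
function parseServerData(responseText) {
  return JSON.parse(responseText);
}

// Neutralize markup characters before a value is placed into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The markup itself ships with the application; the server supplies only the text.
function renderCardTitle(card) {
  return '<li class="card">' + escapeHtml(card.title) + '</li>';
}

const legit = parseServerData('{"title": "My bank"}');
const hostile = parseServerData('{"title": "<script src=\\"/hijack.js\\"></script>"}');

const legitMarkup = renderCardTitle(legit);
const hostileMarkup = renderCardTitle(hostile); // the tag arrives as inert text
```

A response that is code rather than data, such as the string alert(1), makes parseServerData throw a SyntaxError instead of running. In a browser, assigning the value to a text node's content would achieve the same effect as escapeHtml.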

4. Learn nothing

There are countless design decisions that could disclose information to the server. Sometimes data leaks are easy to detect, sometimes very subtle and dangerous. A zero-knowledge application should pay maximum attention to work with as little information as possible. It’s easy to fall for a new fancy feature that can destroy the whole security architecture …

Consider the protocol behind user authentication. The following paragraph clearly explains why a zero-knowledge application should adopt the SRP protocol or an equivalent verifier-based protocol.

While any reasonably secure authentication protocol is expected not to leak any information about the password to eavesdroppers, protocols classified as zero-knowledge do not even leak any information about the password to the legitimate host (except the fact that the party at the other end really does know it). This subset of verifier-based protocols is strong indeed, since the host never stores plaintext-equivalent information and is never given any such information during the course of authentication. (from srp.stanford.edu)

SRP is more complex and slower than traditional methods, but it’s perfect for achieving zero-knowledge! Moreover, it can be deployed without revealing either the password or the username to the host (as we do in the Clipperz password manager).
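To make the verifier idea concrete, here is a toy walk-through of the SRP-6a arithmetic for Node.js. It is a sketch under loud assumptions, not the Clipperz code and not a usable implementation: the group (N, g) is far too small and was picked only to keep the numbers short, the way H encodes its inputs is invented for this example, and both parties run inside one function so that the two session secrets can be compared directly (a real exchange proves them to each other without revealing them).

```javascript
const crypto = require('node:crypto');

// TOY GROUP, illustration only: 2^127 - 1 is prime but much too small for real use.
const N = (1n << 127n) - 1n;
const g = 3n;

// Hash a mix of strings and BigInts to a BigInt (encoding is an assumption of this sketch).
function H(...parts) {
  const h = crypto.createHash('sha256');
  for (const p of parts) h.update(typeof p === 'bigint' ? p.toString(16) : String(p)).update('|');
  return BigInt('0x' + h.digest('hex'));
}

// Square-and-multiply modular exponentiation.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const randomBigInt = () => BigInt('0x' + crypto.randomBytes(32).toString('hex'));

// Registration: the client computes the verifier; the host stores only { salt, verifier }.
function register(C, P) {
  const salt = crypto.randomBytes(16).toString('hex');
  const x = H(salt, C, P); // private value, never sent
  return { salt, verifier: modPow(g, x, N) };
}

// One authentication run; returns true when both sides derive the same secret.
function authenticate(C, P, stored) {
  const k = H(N, g);
  const a = randomBigInt();                                // client ephemeral secret
  const A = modPow(g, a, N);                               // sent to the host
  const b = randomBigInt();                                // host ephemeral secret
  const B = (k * stored.verifier + modPow(g, b, N)) % N;   // sent to the client
  const u = H(A, B);

  // Client side: needs the password to recompute x.
  const x = H(stored.salt, C, P);
  const base = (((B - k * modPow(g, x, N)) % N) + N) % N;
  const clientSecret = modPow(base, a + u * x, N);

  // Host side: uses only the stored verifier.
  const hostSecret = modPow((A * modPow(stored.verifier, u, N)) % N, b, N);
  return clientSecret === hostSecret;
}
```

For example, after const stored = register('c-value', 'p-value'), calling authenticate('c-value', 'p-value', stored) returns true, while a wrong P makes the two secrets diverge. At no point does the host receive P or anything it could replay as P; it only ever holds the salt and the verifier.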

As a consequence of the “learn nothing” mantra, every zero-knowledge application should be completely anonymous, or at least it should make it impossible to relate the real name or email of a user to his data.


You are ingenuous

(I am not English and translate my words with Babelfish from French, therefore excuse for my English.)

What you say about zero-knowledge is evocative, but ridiculous. When I connect from my PC to Clipperz, I pass through tens of nodes (a traceroute is enough to see it), and every one of these nodes keeps a log of everything that passes. Therefore, even if you say that your application knows nothing, someone (the police?) could know who connected to you, when they connected, and where else they have gone. This makes zero-knowledge a naive utopia and, indirectly, an ugly joke on the customer. This makes me angry.

Giap

No need for anger

Dear Giap, all the traffic between your browser and the Clipperz server is encrypted twice: through the Clipperz encryption and the SSL channel.

Therefore nobody in the middle can learn anything about the exchanged data. The only information available to Clipperz is what is listed in this post.

We are very serious and we know that security is not a joke.

Feel free to ask any question on our forum.

Marco

Keys

Hi, I am curious as to how user passphrases translate to keys. It would seem that at some strange level, anything that you do client side is open to me seeing a transformation, and would be hard to salt and such (more specifically, any private nonce that you do use as a part of this transformation would have to show up in my browser, mostly so I can decrypt all that fine data I encrypted elsewhere). Which leads me to imagine (I admit to being lazy and not looking over your source) that at some level you’re using something very closely derived from the passphrase, and hence something I could calculate given a set of passphrases. This somehow makes me feel that you’d have issues with dictionary attacks.

This seems strange, since I imagine the usecase for a secure password storage is to allow one to have reasonably secure passwords protected by a more memorable passphrase.

Panda

Zero Knowledge

The phrase “zero knowledge” has a precise technical meaning, and abusing it will draw the criticism of theoretical cryptographers.

Choose your words carefully.

Hat Tipping those who have worked on this

Congratulations on this initiative, since the world does need more privacy aware applications.

Just a small comment, you might want to acknowledge some of the work & history of work in this area.

Especially the company Zero-Knowledge Systems (where I was President) who invented and developed a number of systems to enable privacy enabled applications similar to what you describe here.

Also there is a large body of research on Zero-Knowledge protocols in cryptography that probably deserve some attribution.

A case study on the work we did at Zero-Knowledge Systems is available here http://tinyurl.com/2q3kqp

Given that you are working in the area of privacy & security and using the zero knowledge concept this kind of attribution or acknowledgment would be useful since I’ve had a number of people ask me if we are affiliated with your project.

Re: Keys

The passphrase is used to derive three values:

  1. the ‘C’ value of the SRP protocol (aka: username)
  2. the ‘P’ value of the SRP protocol (aka: password)
  3. the ‘key’ to decrypt the index card, where all the other card keys are stored.

These values are computed using the following expressions:

  1. C = sha256(sha256(concat(username, passphrase)));
  2. P = sha256(sha256(concat(passphrase, username)));
  3. key = sha256(sha256(passphrase));

C and P are later used to perform the SRP authentication protocol, while key is used to encrypt/decrypt the index card using AES256 in CTR mode, with a different salt each time.
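For readers who want to experiment, the three expressions can be transcribed for Node.js as follows. This is a sketch, and several details are assumptions rather than facts about Clipperz: concat is taken to be plain string concatenation, strings are encoded as UTF-8, and the outer sha256 is applied to the raw bytes of the inner digest. The name deriveCredentials is hypothetical.

```javascript
const crypto = require('node:crypto');

// SHA-256 over bytes, returning the raw 32-byte digest.
const sha256 = (data) => crypto.createHash('sha256').update(data).digest();

// Derive the SRP identity (C), the SRP password (P) and the index-card key.
function deriveCredentials(username, passphrase) {
  const C = sha256(sha256(Buffer.from(username + passphrase, 'utf8')));
  const P = sha256(sha256(Buffer.from(passphrase + username, 'utf8')));
  const key = sha256(sha256(Buffer.from(passphrase, 'utf8')));
  return { C: C.toString('hex'), P: P.toString('hex'), key };
}

const creds = deriveCredentials('joe', 'a long memorable passphrase');
// creds.C and creds.P go into the SRP exchange; creds.key (32 bytes) fits AES-256.
```

Swapping the order of username and passphrase is what makes C and P differ, so the value sent as the "username" reveals nothing about the value used as the "password".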

Hope this helps.

What you do is neat and

What you do is neat and cool, but please drop the zero-knowledge label. It was not meant to be used like this.

I agree

Yup, overuse of the ‘zero-knowledge’ phrase.

Alternatives?

“Zero knowledge proofs” or “zero knowledge protocols” are certainly well known concepts to cryptographers and security experts.

At Clipperz we use the expression “zero-knowledge web application” to mean a special web app that follows the four principles above. Nothing more, nothing less. I don’t think it could be misleading.

However, “zero knowledge web application” is not the name of the company, nor the name of a product/service. Therefore if anybody comes up with a better alternative, we are ready to change label.

Thanks, Marco

Thanks Austin

@ Austin

Thanks for your kind words, I really appreciate it.

And thanks for pointing me to the paper about Zero Knowledge Systems. I have to admit that I did not know much about your company and its products. However the following paragraph really resonated with me!

“[…] The very name of the company was both an homage to cryptography (a “zero-knowledge proof” is a standard cryptographic protocol) and a promise of a specific relation between the company and its clients (Zero-Knowledge would know nothing of its clients – not even who they were). No one need trust Zero-Knowledge to protect privacy; one need only trust their software. One could look at the code to see what it could or couldn’t do, what Zero-Knowledge could or couldn’t know. […]”

Those are exactly the same motivations behind our decision to call Clipperz a “zero knowledge web application”. You can even find similar sentences all over our website! “Don’t trust us, trust our code” has been our mantra since the beginning, in 2005!

I’m sorry if some people think that you are affiliated with Clipperz, I’m definitely going to write a post trying to clarify this issue.

Thanks, Marco

Footprints

Hey, interesting idea!

I think I see where Giap was going with his post… Though everything is encrypted, there is still a bit of a footprint left on all the servers between me and the Clipperz servers.

This can leave a pertinent positive indicator that person/IP XYZ is accessing Clipperz. I believe this is why Giap is saying there is no such thing as “zero-knowledge”.

I am thinking that 0-footprint usage of the internet is nigh impossible, and if Giap needs security at that level, maybe he should look at offline security management.

I’m confused as to what

I’m confused as to what you have achieved. It seems like you just wrote a normal javascript client side application and you use your ‘web host’ as a web service for accessing a database, and you make sure you only send encrypted data to the web service. What’s the original contribution? Am I missing something?

What Clipperz does know

@Nelz

I absolutely agree with you: zero-footprint is nigh impossible. See my answer above to Giap, and take a look at this post.

Marco

Nothing new here, but ...

@Anonymous

Sure, there is nothing incredibly new or revolutionary, but I challenge you to find another web application that adopts the same strict criteria or provides comparable privacy and security for your data. :-)

Marco

Could you please explain me

Could you please explain to me the meaning of ‘SRP’?

SRP

SRP stands for Secure Remote Password, a protocol developed at Stanford University by Tom Wu. It provides a better way to perform password-based authentication. It is believed that SRP achieves the theoretical limit of security that can be offered by a purely password-based protocol.

You can read more here or enjoy this interview with its inventor.

Marco

Another way to do it

So assuming you trust clipperz through the code review, the real trick is getting code download verified and working. A firefox plugin seems to be one way to handle that. Sign off on the plugin and do the verifications and that way a user only has to verify each time they update the plugin rather than every time they navigate to clipperz.
