Thursday, May 23, 2013

Securely Importing JS Modules, Part 2 - Block Element Referencing

..."The summary being - you can't allow anyone else reference to either your element OR your callback function. But HOW??"

That's the question, isn't it: how do you create a script element, add it to the page, and make it load an external script, without ANYONE being able to get a reference to it?

Think about it this way - how does anyone on the page get a reference to anything else?

Answer: function calls, the global scope, or the DOM.

Internal scoping blocks the second option, so DOM references and function calls remain. Your code certainly isn't making any calls, but the page is - all the time - through events. So the list becomes "events and DOM references". DOM references can be blocked simply by removing your script element from the DOM: the only way to guarantee no one gets a reference to it directly is to remove the element before the script that attached it has finished running. Obviously that's not enough time for the external script to actually load - and every browser handles what happens next differently. Grrrrr.
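
In code, the attach-then-remove trick looks roughly like this (a minimal sketch - the function name and the use of document.head are mine, not from any particular loader):

    // Minimal sketch: the element only exists in the DOM for the duration of this
    // synchronous block, so no other script ever gets a turn to go looking for it.
    function stealthLoad(url) {
        var script = document.createElement('script');
        script.src = url;                      // browser queues the request on append
        document.head.appendChild(script);
        document.head.removeChild(script);     // detached again before we return control
        // Whether the external file still loads and runs at this point is
        // browser-specific, which is exactly the problem described below.
    }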

Firefox just runs the script no matter what. Yay Firefox! Or boo?? Maybe I should be concerned that I can do this...?

IE, unsurprisingly, makes my life very difficult. In IE, the element has to still be attached to the DOM when the call tree makes its way back up to the top, which means I have to give up focus. Fortunately, I still get first dibs on adding event listeners to the element (since I made it). By listening for the element's mutation events I can stop their propagation - and on modern IEs, stop immediate propagation - and remove the element from the DOM. Since these events start on my new element, I can stop them before they reach anyone else. Except one, which in IE 7 and 8 still hits the parent element of the script - but by then I've removed the element and there's no reference to it in the event, so "whew! close one". The only catch is that I have to keep repeating this until the browser actually DOES have the script loaded and cached, and runs it in between my attach and my remove. But I CAN do it.
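
A rough sketch of the suppression idea, for the modern-IE path only (the legacy attachEvent path and the exact set of events differ, and the names here are my own):

    // Because the listeners are added to the element itself, they fire first and can
    // cut the mutation events off before they bubble to anything else on the page.
    function attachQuietly(script) {
        ['DOMNodeInserted', 'DOMNodeRemoved'].forEach(function (type) {
            script.addEventListener(type, function (e) {
                e.stopPropagation();
                if (e.stopImmediatePropagation) { e.stopImmediatePropagation(); }
            }, false);
        });
        document.head.appendChild(script);
        document.head.removeChild(script);
        // In practice this attach/remove gets retried until the (now cached) script
        // actually runs in between the two calls.
    }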

Suppressing the appropriate events on all three major browsers, and across all their (sometimes crappy) versions, was a ROYAL PAIN in my rear. But I managed it. With a few caveats.

See my next post for those details...

So.
Since I can block* all references to the element, no one can add listeners to it, and therefore no one can fake/counterfeit my loading of the external resource. But there still has to be a pipeline between the loaded resource and the loader. The loaded JS will run in the page scope - nothing I can do about that - so I have to find a way for it to run, pull its contents into my own private scope, and remove its traces from the global scope before anyone else can do anything about it. One way would be to pass a "key" from within the private scope to the loaded JS, which then uses that "key" to set a global object that can only be referenced using that "key". A very helpful fact is that the event listeners on the element are guaranteed to run immediately after the loaded script runs (the onload attribute first, the event listeners second, and in order - yes, they actually run in order in my tests). Given that fact, I don't actually need a private key; I just need an agreed-upon reference that will sit on the global scope.
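
A sketch of that hand-off, assuming the loader and the module have agreed on a global name ahead of time (window.__handoff and the URL are purely illustrative, and the element-hiding part is omitted here):

    // The loaded file's very last statement would be something like:
    //     window.__handoff = { /* the module's exports */ };
    //
    // And in the loader, inside its private scope:
    (function () {
        var imported;                                    // lives only in this closure
        var script = document.createElement('script');
        script.src = 'https://example.com/module.js';    // illustrative URL
        script.addEventListener('load', function () {
            imported = window.__handoff;                 // runs right after the module itself
            try { delete window.__handoff; }             // erase the global trace immediately
            catch (e) { window.__handoff = undefined; }
        }, false);
        document.head.appendChild(script);
    }());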

At this point we just have two scripts, one running right after the other. Guaranteed. So it becomes a simple thing to pass information from the first to the second, and then remove the traces of the first one. The second script is in a private scope, so BOOM, we have secure loading.

No entity on the page can possibly even know anything just happened.

As I said, see my next post for some details.

Securely Importing JS Modules

Why? That's the first question you might be asking. If a context is "compromised" then it's already too late, so there's no point in worrying. Weeell, I'm in a situation where I'm passing sensitive information on foreign domains, and I simply can't let the locals interfere, intercept, or counterfeit.

"And what's the point anyways? You have to assume, that on the client browser, everything is PUBLIC information."  NO! That's the whole point of this project. I WANT security. I want to break that paradigm everyone seems quite happy with. The only question I start with, is "is that possible?"


Some Background


Imports and modules are going to be part of the ECMAScript 6 specification, aka JavaScript 2.0. Maybe you've noticed that in a <script> element you can specify which version of JavaScript you're using - that's because JavaScript really has evolved over the years, and pre-1.5 JS is a lot like pre-1.5 Java. Well, eventually we'll think the same thing about pre-2.0 JavaScript. But until then, we have to make do with 1.5 and implement the 2.0 functionality we want ourselves.

Likely the most important things that 2.0 allows for are classes, imports, and modules. Imports and modules have been implemented in Firefox for some time now - likely because the creator/caretaker of JavaScript works at Mozilla. If you've created a Firefox plugin, you've seen the concept of modules and exports/imports: basically, you specify which variables you'll be importing from a module that you're loading. The module is designed to run in its own context - its own "window" - and only sends the loader's window the variables specified by the load (in FF, the module specifies the vars, but in JS 2.0, the loader AND the module specify them).
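
For anyone who hasn't written an extension, that mechanism looks roughly like this (the module name and resource:// URL are made up for the example):

    // greet.jsm - a Firefox JavaScript code module
    var EXPORTED_SYMBOLS = ["greet"];          // the module declares what it exposes
    function greet(name) {
        return "hello " + name;
    }
    var hiddenCounter = 0;                     // NOT exported, so never leaves this file

    // In the importing code:
    var scope = {};
    Components.utils.import("resource://myaddon/greet.jsm", scope);
    scope.greet("world");                      // only the exported symbols arrive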

These features are what I set out to duplicate. On deciding what path to take, I examines the features of the different JS loader's I could find in the wild. In particular, an API called AMD seemed popular, as were the loader's which implemented it. This website was helpful : http://addyosmani.com/writing-modular-js/ . A more extensive list of options and features can be seen here : https://spreadsheets.google.com/ccc?key=0Aqln2akPWiMIdERkY3J2OXdOUVJDTkNSQ2ZsV3hoWVE&hl=en#gid=2
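
For reference, an AMD-style module and its consumer look roughly like this (module names are illustrative):

    // Defining a module: list your dependencies, receive them as arguments,
    // return whatever you want to expose.
    define(['dep/a', 'dep/b'], function (a, b) {
        return {
            doSomething: function () { return a.value + b.value; }
        };
    });

    // Consuming it elsewhere:
    require(['myModule'], function (myModule) {
        myModule.doSomething();
    });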

I wasn't immediately impressed with the idea of AMD, but after two days of reading about it, I understood why it was designed the way it is. As flexible and robust as JS can be, it's hard to make a general-case loader, but I think AMD is 'pretty good' as a standard. Still, I've no idea why one would want to use a 'standard' API in this case (there are no external callers), except for the sake of it being a 'standard'. I have a specific use case with three goals in mind - security, small footprint, and non-blocking cross-domain loading - so there's no reason to use a standard for standard's sake. So I looked into some of the alternatives on the Google Spreadsheet linked above - all the ones under 2 KB minified. After looking at the source code of each, I found that all of them 'did too much' and didn't really address my (aforementioned) main concerns. Specifically, none that I found cared much about security - they focused on speed, convenience, features, and footprint. Which is great, but I could hack any of them into loading malicious code instead of the intended modules **. Which is bad.

** Note that I don't mean what you probably think I mean. "Of course they can be used to load malicious code, they're LOADERS!", right?? What I mean is that when my script makes a call to load another script, anyone can "fake" that load and make my script (in my private, secure scope) run their code instead, breaking that "security".

So with that in mind, I set out to make my own importer which fit all my other needs (simply: small, non-blocking, asynchronous) but was also as tight as possible on security. I'm not really sure how "secure" I can make this at this point. I'd love for it to be "impossible" for other scripts to access my information, but I suppose I could accept "very, very difficult". Protecting functions and variables in your own JS file is easy, but how do you protect a second, loaded JS file when the browser makes the load, the element, and the status all public knowledge?


So let's think about this a bit.
The naive load: just a regular document.body.appendChild(myScriptElement). Your new JS loads and runs. The problem is that it can't tell the difference between the loader JS and a stranger - there's no way to selectively block access to its functions and variables. Anything can be overwritten or used by someone else. And you can't scope things internally either, because then you'd be locking out the original (loader) script too.
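
Spelled out, the naive version is just this (URL illustrative):

    var myScriptElement = document.createElement('script');
    myScriptElement.src = 'https://example.com/module.js';
    document.body.appendChild(myScriptElement);
    // From here on, any other script can find the element (document.scripts,
    // mutation events, ...) and everything the loaded file defines lands on the
    // shared global scope, where anyone can read or overwrite it.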

So you have to leave things in the open, but you don't want anyone changing your functions. Idea: using an entry way (one function), attach everything in the loaded JS's scope to the internal scope of the loader JS, then delete the reference to yourself. This would hand all of the second JS's assets to the protected scope of the first JS - which no external scope can reference. Problem: this only works if you trust that the first one through the "entry way" is the loader. If the first one through the gate is a stranger, then you've just given everything away to that stranger instead of the loader. How can we guarantee that the first caller of the gateway function is the loader?
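
Roughly what that gateway looks like - window.__entry is a name I made up, and the exports are placeholders:

    // The *loaded* file exposes exactly one global function. Whoever calls it first
    // gets the module's assets, and then the doorway deletes itself.
    window.__entry = function () {
        var exports = {
            secretThing: function () { /* ... */ }
        };
        try { delete window.__entry; }            // close the gate behind us
        catch (e) { window.__entry = undefined; }
        return exports;
    };

    // The loader then calls, from inside its private scope:
    //     var myModule = window.__entry();
    //
    // The flaw: nothing stops a stranger from calling __entry() before the loader does.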

The only guarantee the browser provides is that the element's "onload" (or "onreadystatechange") functions will be called before the ones attached using attachEvent/addEventListener. "OK, great, so just make your callback the element's attribute function"? Well, no - because if we assume malicious code is on the page, then it could simply replace you as the attribute callback. All the malicious code needs is a reference to the element. In fact, even if you placed some checks, malicious code could store your callback, replace it, and then, when the event has occurred, swap the loaded module for a malicious one, reattach your function, and dispatch your callback. A whole list of nasty things could happen if malicious code got a reference to your callback function. The summary being - you can't let anyone else get a reference to either your element OR your callback function. But HOW??
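
To make the threat concrete, here's the kind of swap a stranger could pull off with nothing but a reference to the element (everything here is hypothetical - the lookup, the fake module, the names):

    // Attacker code, running somewhere else on the page:
    var victim = document.scripts[document.scripts.length - 1]; // found the element somehow
    var realCallback = victim.onload;                 // store the loader's callback...
    victim.onload = function () {
        window.myModule = { /* counterfeit exports */ };  // ...swap in a fake module...
        victim.onload = realCallback;                     // ...put the callback back...
        realCallback.call(victim);                        // ...and let it run, none the wiser
    };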

That's part 2...

Security Sandbox, Flash, Data URLs, and the Chrome Browser


Recently I was creating a test page (HTML) for injecting HTML from a textarea into an iframe. Of course I was doing more than just that, but we don't need to go into it. I didn't want the iframe to be on the same domain as the page (i.e. I didn't want the injected HTML/JS to have access to the parent frame, the parent frame's DOM Storage, or the parent frame's cookies). I could, of course, have just loaded the iframe from a random subdomain and injected the HTML using innerHTML on its root HTMLElement node - that would fix my problem, but it would be nice not to have to. So I chose to use Data URLs.
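
The setup was essentially this (element IDs are just for the example):

    // Take whatever is typed in the textarea and give it to the iframe as a data: URL,
    // so the injected document ends up in a different (empty-string) origin.
    var html   = document.getElementById('htmlInput').value;
    var iframe = document.getElementById('sandboxFrame');
    iframe.src = 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
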
Big mistake.
Interestingly, the "domain" of a Data URL is an empty string. Which no matter what, is different than the parent frame - even running a local system file, the domain is "localhost", which is obviously not equal to the empty string and therefore cannot talk to each other. I tried to just set cookies or DOM Storage from this "" (empty string) domain, but Chrome throws Exceptions. I tried setting the document.domain of the parent frame to "", but again, got exceptions. Great, actually - all is working well, as it's supposed to, given Chrome's sandboxing. Just checking.
My real problem started when I mixed Flash into the equation.
The parent frame (whether local or from the web) can insert a Flash object which has access to the page scope, so from within the Flash I can call functions which exist on the web page. This is critical to certain features of my SWF. There are restrictions to this too, but I'm fully aware of them and used to them - it all makes sense anyway, with regard to cross-scripting and security.
However, the anomaly which sparked my interest was the fact that a Flash object in the "" domain simply cannot get access to the page scope, no matter how much the page and the SWF want to talk. All I can figure is that Chrome sees the domain as "" while Flash sees the domain as something else, which would trigger cross-scripting security. But worse than that, it seems that Flash sees the "" domain as LOCAL (non-network) **, whereas Chrome sees it as just another domain.
** This was something I figured out later, and it made all the problems make sense. I thought I was dealing with some bug or oversight, but it turns out I was trying to hack my way through the most important piece of Flash security.

This means that there is simply no way to enable page-scope access for the SWF inside the Data URL page without going to the global Flash settings and allowing ALL domains, here: http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager04.html .
That's most likely not something people are willing to do. As a matter of fact, there's a bug on that settings page when using both the "System" Flash settings AND the settings at that link, which means devs like me who actually use the development features in that panel might have some problems.

This got me thinking about ways around it. As I said above at the **, I didn't yet realize I was dealing with intentional security. After all, I don't need complex communication - just some small strings. Passing information TO the SWF is, as always, super easy; passing information FROM the SWF is not. Within the SWF, without ExternalInterface, one is limited to app settings and network connections. I tried changing app settings - things like name, id, width/height, dispatching events, etc. - which I knew wouldn't work, and sure enough, they didn't. Then I tried from the other direction, using the DOM object, which is apparently called an "NPObject" ("native plugin object", I imagine - just a wrapper around the Flash, restricting access to the object). On pages where the SWF already has access to the scope, you can read and write attributes on it, but sure enough, I didn't have any luck within the Data URL page. I tried tricks like a javascript: URL from the SWF, but that was blocked. I tried the hash-message method, but again it didn't work - I think due to the Data URL. I did have one idea which seems rather exotic: since Flash is treating the app as local (if indeed it is), I could change the page location to another Data URL which loads the SWF itself as a Data URL. That second SWF would appear to be from the same domain as the page and as such get page access. Two big IFs with this - can you even use a Data URL for a Flash object, and even then, would it really allow page access? Maybe one day I'll try it.

UPDATE: I tried it. Didn't work. In modern versions of Flash they've removed the ability (as in, you used to be able to do it) to use Data URIs. It would have been a major security hole if it had worked. I'm just a year or two too late. Drat.