Archive for the 'Programming' Category

Code Markup for WordPress

Thursday, July 5th, 2007

I’ve been looking for a decent code markup plugin for WordPress so that I can include source code fragments in WordPress.

Problem is, using the <CODE> tag in conjunction with <PRE> injected extra blank lines ( <BR/> ) into the code.

Using Code Markup, I was able to do it.

But there was a trick…

First, the plugin requires that the <code> tag be in lowercase. Internally, I was using uppercase so it’d stand out visually to me. In theory, HTML tags ought to be case insensitive, but the filter required lowercase. I’m going to look at this as a “good thing” since it allows me both worlds. I just wish I had found it by a means other than clever guesswork.

Second, if you want spaces preserved, you need to put your code block inside of a <pre> tag. This is actually well documented on the Code Markup site.

Third and finally, do not go sprinkling HTML entities like &amp; in your code; let the filter do it for you.

Unused local variables, a gotcha that’ll getcha

Friday, May 4th, 2007

Recently I attended the No Fluff Just Stuff conference again and learned about a free, fantastic static code analyzer for Java, called PMD. It can be used standalone or integrated into many popular IDEs like NetBeans or Eclipse. For the curious, I’d tell you what PMD stands for, but no one really knows; worse yet, I can’t stop myself from typing PDM.

PMD has a nifty rule that allows it to locate Unused Local Variables.

Very quickly I was able to walk through our entire code base, identify things that were assigned to, and subsequently not used, and remove them.

Gotcha #0
The first major hit of the day is that you’ll want to do a Clean on your project before you start. Believe it or not, some project building steps can build intermediate .java files from your master source code. Problem is, if they appear anywhere inside your project’s directory structure when PMD is making the sweep, they get analyzed too. And you don’t want that.

Gotcha #1
It turns out that there’s a reverse ripple effect in performing this kind of code cleanup. After you’ve removed code, you’ll want to perform subsequent passes until all concerns are removed.

Take for instance this trivial case:
  B = A;
  C = B;
  // C is unused!

What happens is that once you remove C, it turns out B may no longer be used. Remove B, and it’s even possible A may no longer be used either.

Additionally, this can eventually lead to additional Unused Imports, and removing those can also decrease build time.

Gotcha #2 — The Real Evil
Normally, this kind of code clean up is absolutely harmless, although there’s one error of omission that a developer can make which will create problems and cause a silent failure.

Here’s a case where a static code analyzer recommends removing a very important line of code:

boolean doSomething(int x) {
  // Do something very important with x
  return result;
}

...
  boolean result = doSomething( x ); // Do Something Important
  // result not used
...

If the return value from a method isn’t used, then the static analyzer will assume the method doesn’t need to be called, and it will recommend commenting out the line — causing your program to silently break.

This is not an error on the part of the tool!

The error was actually that of the developer for not checking the return results of the method.

To correct the problem with the source code, the developer has three options:

  1. Change the method signature to be of type void.
  2. Throw an exception from the method.
  3. Check the return value of the called method.

Otherwise, the real error is that the method may attempt to do something, fail, and communicate back to the caller that something went wrong, but the caller blindly trusts that things are okay.

If you’re not going to check a return value, you shouldn’t be incurring the overhead of sending one. If the library writer provided one for a reason, then you should be using it.

Conclusion
When a code analyzer makes a recommendation, ask yourself what implied rules about your code the analyzer is assuming. Rather than blaming the tool, the better solution is fixing the source.

It may be best to comment out, rather than deleting, code that initially seems superfluous.

Finally, after a massive code sweep, run those unit tests.

Using </SCRIPT> In A JavaScript Literal

Wednesday, April 25th, 2007

I’m currently working on an application that takes content from various web resources, munges the content, stores it in a database, and on demand generates interactive web pages, which includes the ability to annotate content in a web editor. Things were humming along great for weeks until we got a stream of data which made the browser burp with a JavaScript syntax error.

Problem was, when I examined the automatically generated JavaScript, it looked perfectly good to my eyes.

So, I reduced the problem down to a very trivial case.

What would you suppose the following code block does in a browser?

<HTML>
<BODY>
  start
  <SCRIPT>
    alert( "</SCRIPT>" );
  </SCRIPT>
  finish
</BODY>
</HTML>

Try it and see.

To my eyes, this should produce an alert box with the simple text </SCRIPT> inside it. Nothing special.

However, in all browsers (IE 7, Firefox, Opera, and Safari) on all platforms (XP/Vista/OS X) it didn’t. The close tag inside the quoted literal terminated the scripting block, printing the closing punctuation.

Change </SCRIPT> to just <SCRIPT>, and you get the alert box as expected.

So, I did more reading and more testing. I looked at the hex dump of the file to see if perhaps there was something strange going on. Nope, plain ASCII.

I looked at the JavaScript documentation online, and the only things they suggest escaping are the single and double quotes, as well as the backslash which does the escaping. (Note we’re using forward slashes, which require no escapes in a JavaScript string.)

I even got the 5th Edition of JavaScript: The Definitive Guide from O’Reilly, and on page 27, which lists the comprehensive escape sequences, there is nothing magical about the forward slash, nor this magic string.

In fact, if you start playing with other strings, you get these results:
  <SCRIPT> …works
  <A/B> …works
  </STRONG> …works
  <\/SCRIPT> …displays </SCRIPT>, and while I suppose you can escape a forward slash, there should be no need to. Ever. See prior example.
  </SCRIPT> …breaks
  </SCRIPTX> …works (note the extra character, an X)

With JavaScript, what’s in quotes is supposed to be flat, literal, uninterpreted, meaningless text.
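As a small sanity check on that claim, here is a sketch (the function name is made up for illustration) showing that the escaped and unescaped spellings are the very same string once JavaScript has parsed them. Whatever is going wrong, it isn't the string itself.

```javascript
// Both literals denote the same 9-character string as far as JavaScript cares.
function scriptCloseTagLiterals() {
  // Built in two pieces so this snippet can itself be pasted into a page
  var plain = "</SCRIPT" + ">";
  // "\/" is simply "/" inside a JavaScript string literal
  var escaped = "<\/SCRIPT>";
  return { plain: plain, escaped: escaped, same: plain === escaped };
}

console.log(scriptCloseTagLiterals().same); // true
```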

It was after this I turned to ask for help from several security and web experts.

Security Concerns


Why security experts?

The primary concern is obviously cross site scripting. We’re taking untrusted sites and displaying portions of the data stream. Should an attacker be able to insert </SCRIPT> into the stream, a few comment characters, and shortly reopen a new <SCRIPT> block, he’d be able to mess with cookies, twiddle the DOM, dink with AJAX, and do things that compromise the trust of the server.

The Explanation


The explanation came from Phil Wherry.

As he puts it, the <SCRIPT> tag is content-agnostic. Which means the HTML Parser doesn’t know we’re in the middle of a JavaScript string.

What the HTML parser saw was this:

<HTML>
<BODY>
  start
  <SCRIPT>alert( "</SCRIPT>
  " );
  </SCRIPT>
  finish
</BODY>
</HTML>

And there you have it, not only is the syntax error obvious now, but the HTML is malformed.

The processing of JavaScript doesn’t happen until after the browser has understood which parts are JavaScript. Until it sees that close </SCRIPT> tag, it doesn’t care what’s inside - quoted or not.

Turns out, we all have seen this problem in traditional programming languages before. Ever run across hard-to-read code where the indentation conveys a block that doesn’t logically exist? Same thing. In this case instead of curly braces or begin/end pairs, it was the start and end tags of the JavaScript.

Upstream Processing


Remember, this wasn’t hand-rolled JavaScript. It was produced by an upstream piece of code that generated the actual JavaScript block, which is much more complex than the example shown.

It is getting an untrusted string which, to be shoved inside of a JavaScript string, not only has to be sanitized, but also escaped in such a way that the HTML parser cannot accidentally treat the string’s contents as a legal (or illegal!) tag.

To do this we need to build a helper function to scrub data that will directly be emitted as a raw JavaScript string.


  1. Escape all backslashes, replacing \ with \\, since backslash is the JavaScript escape character. This has to be done first so as not to escape other escapes we’re about to add.
  2. Escape all quotes, replacing ' with \', and " with \" — this stops the string from getting terminated.
  3. Escape all angle brackets, replacing < with \<, and > with \> — this stops the tags from getting recognized.

private String safeJavaScriptStringLiteral(String str) {
  // Order matters: backslashes first, so later escapes are not re-escaped
  str = str.replace("\\", "\\\\"); // escape single backslashes
  str = str.replace("'", "\\'");   // escape single quotes
  str = str.replace("\"", "\\\""); // escape double quotes
  str = str.replace("<", "\\<");   // escape open angle bracket
  str = str.replace(">", "\\>");   // escape close angle bracket
  return str;
}

At this point we should have generated a JavaScript string which never has anything that looks like a tag in it, but is perfectly safe to an XML parser. All that’s needed next is to emit the JavaScript surrounded by a <![CDATA[ ... ]]> block, so the HTML parser doesn’t get confused over embedded angle brackets.

From a security perspective, I think this also goes to show that lone JavaScript fragment validation isn’t enough; one has to take it in the full context of the containing HTML parser. Pragmatically speaking, the JavaScript alone was valid, but once inside HTML, became problematic.
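For anyone who wants to poke at the idea without a Java toolchain, here is a rough JavaScript port of the same three steps, plus a round-trip check. It is only a sketch: the roundTrip helper is made up for illustration, and it inherits the limits of the Java helper above (newlines and other control characters are not handled).

```javascript
// Hypothetical JavaScript port of the three escaping steps.
function safeJavaScriptStringLiteral(str) {
  return str
    .replace(/\\/g, "\\\\") // 1. backslashes first
    .replace(/'/g, "\\'")   // 2. single quotes
    .replace(/"/g, '\\"')   //    and double quotes
    .replace(/</g, "\\<")   // 3. open angle brackets
    .replace(/>/g, "\\>");  //    and close angle brackets
}

// Wrap the escaped text in double quotes and let the JavaScript parser
// read it back, to confirm the original string survives the trip.
function roundTrip(untrusted) {
  var literal = '"' + safeJavaScriptStringLiteral(untrusted) + '"';
  return new Function("return " + literal + ";")();
}
```

The escaped text no longer contains the exact sequence that closed the script block early, yet JavaScript still reads back the original characters.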

An Advanced Crash Course in AJAX

Thursday, April 12th, 2007

Let’s say that you’ve got a basic understanding of JavaScript, you roughly know what AJAX is, and you can twiddle the DOM, but now it’s time for the rubber to meet the road and you want to get up to speed, know about the quirks, and learn hidden tidbits that come from head bludgeoning against the wall experience.

This guide is a quick romp through AJAX, stopping at all the little pieces that you might not know about.

Making a Request / Response


It turns out that basically every browser on the planet does XMLHttpRequest the same way, with the exception of the evil Internet Explorer, which uses ActiveXObjects distributed with the operating system, and even then it does so inconsistently. In theory, the new IE7 is supposed to conform to the “right” way, but given there are still so many 5.0, 5.5, and 6.0 IE browsers out there, this has made a rat’s nest out of what should have been simple code to start with. Here’s the fundamental code that returns a browser-neutral object:

function createRequest() {
  var request = null;
  try {
    request = new XMLHttpRequest(); // Everyone but IE
  }
  catch (trymicrosoft) {
    try {
      request = new ActiveXObject("Msxml2.XMLHTTP"); // Newer IE
    }
    catch (othermicrosoft) {
      try {
        request = new ActiveXObject("Microsoft.XMLHTTP"); // Older IE
      }
      catch (failed) {
        request = null;  // Always check for NULL!
      }
    }
  }

  if ( request == null ) { // Might as well check here
    alert("Error creating request object!");
  }
  return request; // null on failure, so callers must still check
}

An aside: It turns out that Internet Explorer 5.x on the Mac (an old, broken, and discontinued product from Microsoft) doesn’t work — AJAX can’t be done, as there is no ActiveX control, and they don’t do it the “standard” way with the browser.

To use such a function in your pages, you’d do something like this:

function getSomething() {
  var request = createRequest();
  var url = "http://www.yourhost.com/serverside";
  request.open("GET", url, true);  // true = asynchronous
  // The handler is not handed the request, so pass it along ourselves
  request.onreadystatechange = function () { callback(request); };
  request.send(null);  // or whatever data
}

Note: if you use POST, then you also have to set a request header with .setRequestHeader(). Usually this will be “Content-Type” with a value of “application/x-www-form-urlencoded” when just sending form data. Otherwise the server has no idea what is being sent in the POST.
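Here is a minimal sketch of that POST case. The names encodeForm and postForm are hypothetical, and the request argument is assumed to be whatever createRequest() handed back (or anything else with the same three methods). Note that the header has to be set after open() and before send().

```javascript
// Encode a plain object as application/x-www-form-urlencoded text.
function encodeForm(fields) {
  var pairs = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      pairs.push(encodeURIComponent(name) + "=" + encodeURIComponent(fields[name]));
    }
  }
  return pairs.join("&");
}

// Send the fields as a form POST on an already-created request object.
function postForm(request, url, fields) {
  request.open("POST", url, true); // true = asynchronous
  request.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  request.send(encodeForm(fields));
}
```

Passing the request in, rather than creating it inside, keeps the helper easy to try out with a stand-in object.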

The callback function, which can be different for each request, needs to check a ready state and a status, as the callback gets called each time the ready state changes (typically four times during the process):

function callback(request) {
  if ( request.readyState == 4 ) { // 4 = response downloaded
    if ( request.status == 200 ||  // 200 = success
         request.status == 304 ) { // 304 = not modified
      // Do something
      var response = request.responseText;
    }
  }
}

Note: readyState is a read-only property; you can’t set it.

Another Note: you can use responseXML instead of responseText for XML! You manipulate it just like the DOM, making use of getElementsByTagName(). This requires a Content-Type of “text/xml” from the server to work.

Libraries, like Prototype and jQuery, abstract all this away so that you simply provide the URL, GET/POST type, Async/Sync type, and a callback. They also offer special forms for common tasks, like filling in a DIV with pulled content with one call.

Note: if you find yourself calling getElementById(), you need to look at the $() function of these libraries. And, yes, a dollar sign is a legal identifier character, making a lone dollar sign a valid variable or function name.

Keep in mind, if you start adding other third-party Prototype libraries, they may fail if you use the jQuery enhancements. Fret not, because you can have your cake and eat it too. jQuery lets you unhook itself from the standard $ shortcut convention by calling jQuery.noConflict().

It’s tempting to skim through this, assuming that “you get it” — but the devil is in the details.

Here’s where things get extra tricky.


Browsers Cache Dynamic Responses
Internet Explorer and Opera actually cache the response to a given URL request. That means if you do a GET, the first one will work, but subsequent ones will not. The browser will go “oh, I remember sending this before, here’s the response I got.” As such, you either need to use POSTs or attach a dummy variable, with something like new Date().getTime() as part of the parameters to force it to a unique URL each time.
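A sketch of the dummy-variable trick follows. The function name uniqueUrl and the parameter name nocache are made up for illustration; any name your server ignores will do.

```javascript
// Append a throwaway parameter so every GET has a URL the browser has not cached.
// The optional second argument exists only to make the result predictable in tests.
function uniqueUrl(url, stamp) {
  var separator = url.indexOf("?") === -1 ? "?" : "&";
  var value = stamp === undefined ? new Date().getTime() : stamp;
  return url + separator + "nocache=" + value;
}
```

You would then call request.open("GET", uniqueUrl(url), true) instead of passing the bare URL.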

Script elements need an end tag!
It’s often useful to put JavaScript into its own .js file. Note however that the SCRIPT tag, for historical reasons, expects content. The content can be empty, but there must be an explicit closing tag.

Illegal: <SCRIPT type="text/javascript" src="yourlib.js" />
Legal: <SCRIPT type="text/javascript" src="yourlib.js"></SCRIPT>

Never Use innerHTML
Additionally, you’ll find that a lot of examples use innerHTML to set the contents of an element, like a DIV. This is wrong. It is not part of the W3C DOM specification, and future browsers may not support it; in fact, some browsers already don’t support it now. Use DOM code; it works on any platform. Plus, libraries like Prototype and jQuery have special shortcuts making it possible to access elements by id, element, CSS class, XPath, and even as a collection. Seriously, the examples in your books are dated and wrong. Look for methods like .text() and .html() instead.

Other useful values: document.documentElement (the root node), .parentNode, .childNodes, .firstChild, .lastChild, .nodeType*, .nodeName, .nodeValue, .getAttribute(), .setAttribute().

* Once again, IE has problems, this time with the Node type.

Set Behaviors Elsewhere, If You Can
It’s also tempting to sprinkle code in onClick handlers, but that can make modification difficult for mass changes, not to mention making the HTML uglier. A library called Behavior solves this problem elegantly. You write regular, clean HTML and it will use JavaScript to add behaviors to the tags you specify after the fact. Simply define a set of rules and apply them.

Note: the onclick property of a DOM object is all lowercase, not camel-cased.

DOMs Reorganize, They Don’t Copy
There’s also some other DOM magic that isn’t obvious. If you have a DOM tree and you get a reference to an element, and then you do an otherElement.appendChild(firstElement) to some other node element, since a DOM node can only have one parent, it actually gets moved. That is, you don’t have to delete anything.

Stuff About AJAX Libraries You Wanna Know


Drag’n’Drop …uh, no… Sortables
Drag’n’Drop in the browser world of AJAX means dragging DIVs and such to different locations on the screen, some of which can themselves be containers. If you’re looking to rearrange the elements within a container, that is called Sortables. These are container elements (like OL’s and DIV’s) which contain things (like LI’s, DIV’s, and IMG’s), and maintain the order of them. By far the best sortables example I’ve seen was done by Greg Neustaetter and he explains how he did it.

The HTML ID Does Matter!
Turns out many of the AJAX libraries do trickery based upon the ID of the elements. As you’re aware, every ID on a page must be unique for the HTML to be valid. When the AJAX libraries go looking for elements, this must be true. Additionally, the IDs often have special meanings. For instance, in order to report sequences, the IDs had to be in the form of string, underscore, integer (e.g., Item_10). You can also use a dash instead. The library will let you serialize the numerical parts into a string. So, if your id happens to contain additional dashes or underscores, or forgets the numerics, bad things can happen.
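To make the naming convention concrete, here is a sketch of what such a serialization might look like. This is not any library's actual code: serializeOrder is a made-up name, and the exact output format a given library produces may differ.

```javascript
// Turn ids like "Item_10" or "Item-3" into "list[]=10&list[]=3".
// Ids that break the string/separator/integer pattern are rejected outright.
function serializeOrder(listName, ids) {
  var parts = [];
  for (var i = 0; i < ids.length; i++) {
    var match = /^[A-Za-z]+[_-](\d+)$/.exec(ids[i]);
    if (match === null) {
      throw new Error("Unexpected id format: " + ids[i]);
    }
    parts.push(listName + "[]=" + match[1]);
  }
  return parts.join("&");
}
```

A real library is more likely to skip or mangle a malformed id silently than to throw, which is exactly why the extra dashes and underscores are worth avoiding.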

Note: In HTML 4, an ID must start with a letter. After that you can use letters, numbers, dashes, underscores, colons, and periods. IDs are case sensitive, and though their technical size limit is 64K (wow!), don’t count on your browser to honor that. Long IDs can make things slow and chew up memory.

Be Careful With Arrays
There’s a lot of clever overloading going on. Sometimes a parameter is an element, sometimes it’s a class name, and sometimes it’s an array. When it comes to sortables (and drag’n’drop), often you need to provide a list of valid containers. This is done by creating an array of strings with the appropriate names and passing that to the AJAX call. As such, it isn’t mandatory to have an array to make things sortable, but only when you’re crossing containers.

Metadata
It is actually possible to pass collections of metadata inside of a class attribute! This can be very handy.

<P ID="thing" class="foo bar { xyzzy: 'plugh', abc: 123 }" />

jQuery has a plug-in called metadata (the documentation is in the JavaScript code) that lets you access this.

$("#thing").data().xyzzy returns “plugh”
$("#thing").data().abc returns 123
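If you're curious how metadata could be fished out of a class attribute at all, here is a rough sketch. It is not the plug-in's code, parseClassMetadata is a made-up name, and it evaluates the braces as JavaScript, so it is only fit for markup you wrote yourself.

```javascript
// Pull the { ... } portion out of a class string and evaluate it as an
// object literal. Never feed this untrusted markup: it runs whatever it finds.
function parseClassMetadata(className) {
  var match = /\{[\s\S]*\}/.exec(className);
  if (match === null) {
    return {}; // no metadata present
  }
  return new Function("return (" + match[0] + ");")();
}

var meta = parseClassMetadata("foo bar { xyzzy: 'plugh', abc: 123 }");
console.log(meta.xyzzy, meta.abc); // plugh 123
```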

AJAX Responses
If an AJAX response returns text, you can access it with .responseText. If an AJAX response returns XML, you can access it with .responseXML and read it just like you would the DOM. And, if the AJAX response sends straight HTML, you can always inject it directly into an element with Prototype’s Ajax.Updater.

Currently, the hard part of the problem is taking an XML response from the server and transforming that fragment into HTML using XSLT on the client side.

Normally, a full XML document is transformed into HTML and loaded into the DOM, at which point AJAX takes over. The problem is, while AJAX allows for modifying the content of an element, the XML-to-HTML phase is already past. Just as XMLHttpRequest() has many different historical quirks, XSLT support and implementation is even worse.

Supposedly, however, there is a library called zXml, and it has a transformToText() function which, in theory, provides cross browser support.

XSLT and AJAX

See the benefits of XSLT. Download, unzip, and drag the .XML file into your browser.


XSLT Example
XSLT_Example.zip

This example separates content, structure, and presentation.

But let’s discuss XSLT in the context of an entire page.

The magic of XSLT allows the transformation of any arbitrary XML to well-formatted HTML by rules that you define. And, what’s really spiffy is that you can use XSLT to automatically generate AJAX code as well. However, there are a few tricks to know and a few kinks to watch out for.

XSLT position()
This function returns the node’s position within the set of nodes currently being processed. The thing to look out for? Whitespace is also a node! As such, if you’re using an <xsl:apply-templates />, you want to make sure the select attribute specifically lists the kind of node you want, and not just some parent element.

XSLT replace() is XSLT 2.0
Turns out some browsers have a problem with XSLT v2.0. Evil. Just evil. And, along this line, so are variables. Some of the really nice features of XSLT might not be possible.

Firefox Hangs When XSLT Generates Scriptaculous
The Scriptaculous effects library is very clever by being very modular. When you include it, dependencies allow just the pieces you want to load. This has the advantage of making the pages very lightweight. It appears to do this feat of magic by injecting content into the DOM at the point you include the <SCRIPT>…</SCRIPT> tag. Only problem is, if the DOM is being generated on the fly by XSLT, bad things can happen. Surprisingly, this seems to be a Firefox-only problem, and I’ve reported the problem to the authors of Scriptaculous. If I get no response, I’m going to the Mozilla people next. IE does not appear to be affected, nor is Safari.

Containers Get Instances
I had some XSLT code which was building my arrays, and deferring initialization of sortable containers until page load completion time. The problem was, for some reason, the containers were not getting initialized at completion. The result was that certain elements weren’t functioning. When I tried initializing as I went, each container got an instance of the array. Follow that again slowly. Sortable containers don’t get a reference to an array, they get a copy of the array, and if the array (which contains all containers you’re allowed to interact with) isn’t fully initialized, your page is broken. Admittedly, a lot of this problem happened because the load order of things between libraries wasn’t clear. Each AJAX library usually hooks into the OnLoad() call, so you better not have one, but you’ll need to see if it put itself first, or last, in the chain.

That sums it up…


That sums up the mental core dump. If you happen to have any tidbits, trivia, or embarrassing corrections, I’d love to hear from you.

Understanding jQuery

Wednesday, April 4th, 2007

I’ve been playing a lot with AJAX recently, and have discovered a library that I’m quickly falling in love with: jQuery.

To provide you with context, I’m a software engineer and have been developing commercial applications for well over twenty years. I played with JavaScript when it was young, and I was unimpressed. I played with JavaScript when it was a little more mature, but because of Microsoft’s horrific incompatibilities with Internet Explorer versus the way the rest of the world worked, I gave up. Perhaps prematurely. But, nonetheless, I didn’t pay any more attention to the world of scripting on the web than whatever problem I had demanded.

Then along came Ruby on Rails. I was surprised to learn that someone had actually written a library to abstract away JavaScript differences — what a clever solution! To that end, I started looking at Prototype and became impressed at the cleverness of the helper functions. That got me to look at Scriptaculous, and suddenly the world of JavaScript didn’t seem so bleak.

But jQuery. Wow. This library resonated with me very quickly, and I started thinking in and doing more functional programming than I had ever done before (as opposed to procedural and object oriented). The library was so easy to use that I was able to do quite a lot with it without understanding it. That was months ago, but today something clicked. I started to see in my mind’s eye exactly how jQuery does its magic, and in such a way as to describe it to someone who’s never used one of these AJAX libraries before.

Javascript, she ain’t that bad


JavaScript allows one to define classes. Those classes can be extended; don’t think in terms of derived subclasses, but rather of actually plastering additional methods onto a pre-established class. Additionally, those methods can have overloaded signatures. And for the sake of brevity, identifiers we’d never use in other languages are perfectly acceptable short names. We’ve been taught that although identifiers can start with things like dollar signs and underscores, we should stay away, since these are for library writers and operating systems people. Even though a single dollar sign might be syntactically legal, one should never do it; though in a world where network speed and space matter, such short names are encouraged. Finally, blocks of code, the very stuff you would call functions, can exist all on their own; all you need is a reference to them, and they don’t need a name.
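A tiny sketch of those ideas in one place. Counter and its methods are made-up names for illustration, and the optional argument stands in for what I'm loosely calling an overloaded signature.

```javascript
// A "class" is just a constructor function plus methods on its prototype.
function Counter() {
  this.count = 0;
}

Counter.prototype.increment = function () {
  this.count += 1;
  return this;
};

var early = new Counter(); // created before add() exists

// Plaster on another method after the fact; existing instances get it too.
Counter.prototype.add = function (amount) {
  this.count += (amount === undefined ? 1 : amount); // argument is optional
  return this;
};

// A lone dollar sign is a legal name, and the function itself is anonymous.
var $ = function (counter) {
  return counter.increment().add(5).count;
};

console.log($(early)); // 6
```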

Accept all the above as a given, and a tribute to what’s become of JavaScript while you’ve been playing in other languages.

Grokking jQuery


Now at this point, I’ll express my conceptual view of jQuery, and while it may not be technically correct or even how it’s implemented, the mental model will give you gross insights as to how you ought to use the library.

Imagine if you will a class called jQuery. Rather than having to type out jQuery each time, we use an alias, a simple dollar sign (the shortest legal identifier that’s not alphanumeric). Its sole job, internally, is to maintain a collection of references to pre-existing elements in your DOM. This collection may be empty, contain one, or more elements. As the developer using jQuery, you need never see this list; you only deal with the jQuery object itself. Ever.

jQuery has many overloaded constructors, which is how it learns what elements to keep in its internal list. You can provide it a reference to an element, a kind of element, an id of an element, a CSS class used by elements, an XPath to one or more elements, straight blocks of HTML, etc. It can even use another jQuery object (which contains a list). jQuery has exotic syntax for picking very specific elements based on conditions and attributes; it even has filters for removing elements from the list.

The actual list isn’t important, because after it’s done with the constructor, all you have is a jQuery object. And the only thing you can do at that point is call jQuery methods. But, oh how clever is jQuery!

Anytime you call a method of jQuery, it does an internal for-each across its internal list, applying your method to every DOM element in its internal collection. What’s more, when it’s done, it returns the very jQuery object that was just used.

Object oriented developers know what this means: you can chain methods, creating long strings of behaviors!

jQuery directly manipulates the DOM in a browser-specific manner under the hood, so that you get one, simple, transparent, elegant way of expressing what you want. The actual implementation details are no concern; if a method exists, it works the same way everywhere, regardless of browser.

And, because jQuery operates on numerous elements by twiddling the DOM, it’s possible to write a small piece of code but hook it in all over the place… a process that used to be quite tedious, but can now be done after the fact, meaning your raw HTML is uncluttered.
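To make the mental model tangible, here is a toy version of it. MiniQuery is a made-up name, it works on plain objects rather than DOM elements, and it is emphatically not how jQuery is implemented; it only mirrors the description above: a hidden list, a for-each per method, and a return value you can chain.

```javascript
// Toy model: wrap a list of items, apply each method to all of them, return
// an object so that calls can be chained.
function MiniQuery(items) {
  if (items instanceof MiniQuery) {
    this.items = items.items.slice(); // built from another MiniQuery
  } else if (Array.isArray(items)) {
    this.items = items.slice();       // built from a list
  } else {
    this.items = [items];             // built from a single item
  }
}

MiniQuery.prototype.each = function (fn) {
  for (var i = 0; i < this.items.length; i++) {
    fn(this.items[i], i);
  }
  return this; // returning the object is what makes chaining work
};

MiniQuery.prototype.attr = function (name, value) {
  return this.each(function (item) { item[name] = value; });
};

// In this toy, filtering hands back a new, narrower collection.
MiniQuery.prototype.filter = function (test) {
  return new MiniQuery(this.items.filter(test));
};

var links = [{ tag: "a" }, { tag: "p" }, { tag: "a" }];
new MiniQuery(links)
  .filter(function (item) { return item.tag === "a"; })
  .attr("color", "red")
  .attr("visited", false);
```

After the chain runs, both anchors in links carry the new properties and the paragraph is untouched, without the caller ever looking at the internal list.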

The Simple Example


Let’s look at a simple tutorial-like example.


$(".xyzzy a").click(function(){
  alert("Magic!");
  return false;
});

Quite literally, this says create a jQuery object that is a collection of every anchor in containers with a class of ‘xyzzy’, then assign its onClick event handler to reference a function (that has no name!) that displays an alert message.

In Conclusion


That’s pretty much it. There are two things to learn: the various constructs and types that can be passed to the constructors and filters, and the various methods that affect those elements. That’s the meat of it.

jQuery has other helper functions and such, but those are easily mastered. And, once you’ve got those under your belt, check out the plug-ins that are additional methods bolted on to the jQuery object.

Great XSLT Tool for OS X

Tuesday, March 27th, 2007

While working on some XML and XSLT stuff, I ran into some strange problems where transformed XML content was making Firefox spin its wheels forever and Safari was having problems rendering XSL variables.

I wasn’t engaged in a browser war shoot out, I just wanted to know that the XSLT was correctly transforming the XML into the desired output. As various tools were slowly slipping from my fingertips, I figured I might just have to go back to the command line.

But then I discovered XSLPalette. It’s a “free, native, XSLT 2.0, XPath 2.0, and XQuery 1.0 debugging palette” for OS X (and it’s a Universal Binary).

All I have to say is that, as a developer, I’m impressed with the ease this tool provides for trying different XSLT engines. It does basically one thing, and that one thing very, very well. I like that in developer tools.

You give the palette an XML file and an XSLT file, select the engine, and it does the transformation, showing you messages along the way, in addition to the transformed output, a collapsible view, and a browser-like rendered view.

Walt gives XSLPalette a thumbs up!

Inside the Seven Dimensional Problem Space of Quality Assurance

Sunday, January 28th, 2007

The other night I happened to have dinner with an old friend, Jeff Voas. He was telling me about a new problem he was working on in which he hypothesizes there are only seven dimensions that describe all computing problem implementations.

While these dimensions are truly independent, and thus orthogonal to one another, it helps to visualize them in the following manner: you have software that runs on hardware which exists inside some environment, these three things are subject to threats; in addition there are non-functional requirements (such as performance and reliability), and everything is operated within a set of defined policies. All of these things are in respect to time.
[Figure: Seven Dimensions of the Computing Problem Space]
Jeff challenged me to come up with any problem that didn’t fit within this model. I could not.

Jeff also pointed out another interesting attribute of his model: time and threat space could not be locked down. Everything else could be set into stone, frozen forever.

The implication of this is fairly straightforward: even if you don’t change anything, new threats can be discovered, resulting in your having to change at least one of the other dimension points to compensate. If one could quantify a baseline as a function of these seven attributes, it would become possible to measure changes as a whole. Even better, risk and change impacts can be better assessed and communicated.

What interested me, however, was the reason two of these dimensions could not be locked down, while the others could. I shared my thoughts with Jeff, who after hearing them, sadly pointed out it was a little too late to get this new insight into the IEEE paper.

[Figure: Physical Three Space]

Put aside the model we were working with and consider for just a moment the real physical world of three dimensional space that we live in. Those dimensions are up/down, left/right, in/out to keep things simple.

Clever sorts will blurt out “you forgot time, time is the fourth dimension.” They’d be wrong, because they’re jumping ahead of themselves. Time is not space, but is merely an aspect of where something is in space. Should one actually write it out as a tuple, yes, you get (X, Y, Z, time), and mathematically you can work such problems as having four variables, all independent, thus mathematically orthogonal, and treat them as if they were all dimensions. But, and this is key, I’m not using that definition for dimension. I mean it in the purer sense of the word, meaning that it is possible to move forwards and backwards along any dimensional axis.

Here’s the key: time is not a dimension, but a vector. It only goes in one direction.

Now, here’s a little puzzle for the brainiacs in the group. What other attribute of our physical real universe is also a vector and not a bidirectional dimension?

The answer happens to be entropy; the universe is slowly falling into a state of disorder, and there’s nothing we can do about it. Any amount of effort to reinstill order in one place just speeds up entropy somewhere else, even if it’s just consumption of energy or heat loss.

Turning back to Jeff’s model, I proposed that he actually only had five dimensions and two vectors. The reason time and threat space could not be locked was because they were vectors. He pondered and bought into that notion.

Then comes the zinger. If we only know of two vectors in the real world, and the model attempts to quantify real world problems, and there are two vectors in the model, then is it possible that the threat space is entropy?

There are few moments in life where you actually get to see the gears turn and smoke come out of the ears of a bright Ph.D., and I watched Jeff retreat into his own mind for a minute or so and then reemerge - he concluded with me that it was, and that it was a shame the IEEE article had already been submitted.

Where is &_= Coming From? (…not a typo…)

Friday, January 12th, 2007

I was recently playing with Prototype, the JavaScript framework whose Ajax.Request object wraps a cross-browser XMLHttpRequest.

My server was reporting problems with the messages being sent from AJAX, and after a quick debugging session, I found that everything AJAX was sending had a “&_=” appended onto the end of it.

This clearly looks like a bogus parameter, say appended to a GET sequence, designed to pacify something. A little bit of digging on Google, and it appears it was introduced to resolve an old problem in Apple’s Safari.

Problem is, Prototype is still sending it, and when I sent an XML message to my server, the SAX parser didn’t take too kindly to the extra cruft at the end of the document.

If you open up prototype-1.4.0.js, and jump to line 631, you’ll see a line that looks like this:
if (parameters.length > 0) parameters += '&_=';

…removing it solves the problem. I found this more elegant than making my server pre-process an XML message.

Ruby and Rails - Some Observations for Newcomers

Tuesday, January 9th, 2007

For this post to be of any use to you, see if you are sitting in the following boat:

  • you’re an experienced software developer
  • you are well versed in and frequently use C, C++, Java, C# and/or Objective-C
  • you can code in Awk, Perl, and/or Python
  • regular expressions don’t scare you
  • you can write HTML, XML, XSLT, CSS, and feel that JavaScript is a toy language, though you’re impressed with what people are currently doing with it
  • you easily mastered PHP, JSP, and/or ASP
  • you run Linux/FreeBSD, you installed a webserver, and it might even be running FastCGI
  • you know SQL, maybe even stored procedures, and you use MySQL and/or PostgreSQL without incident
  • you are no stranger to programming languages, and secretly have Assembler, Pascal, Fortran, BASIC, under your belt
  • you have a solid working grasp on object oriented programming and design
  • …you’ve heard all this great news about Ruby on Rails, and you want to give it a try.

You’ve seen the Ruby videos where they make an application in 15 minutes. You got the two must-have books: Programming Ruby and Agile Web Development with Rails. You even downloaded Ruby and got Rails installed.

Yet, despite all that, you hit the weirdest stumbling blocks: you can’t find the methods being called in either your code or the framework, there are a lot of scripts generating more terse config files than code, there are object oriented things going on you’ve never seen before, and even with the documentation things are going slow. Feel like you’re just not getting it, even when you put it down and come back to it later?

That’s basically how my initial experience with Ruby on Rails went.

But then I had a breakthrough: some things suddenly became apparent, and that provided the clarity I needed to pick up and start coding. If you’re sitting in the same boat, I’m about to share them with you so that you can start becoming productive too.

My first mistake was not having a good development environment. This is almost imperative in order to pick up Ruby on Rails. I tried it on Windows, I tried it on a Linux install, but what got me through was this configuration:

What all of this gives you is an instant Ruby on Rails configuration that’s totally graphical and insulated from everything else on your machine. Want to upgrade? Drop a new copy of Locomotive on your machine. Want to deploy? Move your project directory to your production system. Learning Ruby by picking up Rails is possible, but can be frustrating.

As for book resources, you’re missing one: Ruby on Rails: Up and Running. This has got to be one of the best resources for non-Ruby developers. I’ll be honest… if you already know Rails, this isn’t for you. If you don’t know a good deal of programming, this also isn’t for you. What makes this book nice is that it is not the blind type-what-I-type “tutorials” where you just follow along. No, it explains what’s going on, as well as what’s happening under the hood. The problem, though, is that some of the stuff happening is dang clever, and there are things that may take a little bit of thinking to wrap your mind around.

Over on Ruby-Doc.org, there is a presentation called “10 Things Every Java Programmer Should Know About Ruby” that Jim Weirich gave at OSCON 2005. While worth the read, he makes one very interesting point up front. He teaches C programming to old school Fortran programmers, and what he concludes, from looking at a lot of bad C code, is that Fortran-like code can be written in any language. The point being his students aren’t thinking in C when they are writing in C. And that was my problem with Ruby.

I had glanced over the Ruby book enough to recognize most of the language constructs such that I could read and understand what I was looking at, and I browsed the API to see it was extensive and useful. Some languages fall short of libraries, and Ruby doesn’t make that mistake. But what wasn’t apparent was that I was still thinking like a C++ / C# / Java programmer, and falling back to procedural languages didn’t help either. To use Ruby means learning some neat, new stuff.

For example, we’ve all used the Model-View-Controller (MVC) pattern, but did you know Rails uses a variant called Model2? Did you know that when you generate a Rails project, it includes a web server? While WEBrick is common, and lighttpd is known for speed, Rails is now defaulting to one called Mongrel. I suggest checking Mongrel’s FAQ; I learned more computer science in an evening, just following links, than I would have in a semester or two of college. It turns out that, beyond BNF, there’s ABNF, which can be used with Ragel (a Lex / Flex / Yacc / Bison-like lexer/tokenizer) to generate not just code (including for the D programming language) but also state machine graphics using dot files for Graphviz. Someone then wrote an ABNF file for HTTP 1.1, thereby generating a very fast web server as a huge state machine. There are also links to fast data structures like the Ternary Search Tree (with code). It seems that some kind of lambda calculus is used, making code blocks and closures act like fast include templates for web pages. …don’t get scared off; like I said, it certainly appears that those who use Ruby think differently.

Rails’ philosophy is convention over configuration and coding; meaning, if you do things the Rails way, you won’t have to write a lot of complicated configuration files, nor will you have to write a lot of code. That poses the first major problem: you can’t just search for something in the project, because more likely than not, default behavior is causing it to happen. You need documentation, not grep, to survive.

The good news is that if Rails is anything, it’s consistent. The rails script generates every project with the same layout, and once you learn it, you’re set. Luckily, most of the real action happens in just a few directories. Your code happens in a directory called app, and your database happens in a directory called db; that knowledge alone will get you along 90% of the way.

Here are some other interesting Ruby tid-bits that tripped me up.

I’ll assume you know the subtle differences between scalars, constants, variables, pointers, and references. Ruby introduces something unexpected: identifiers that start with a colon, called symbols. What’s going on is that you’re making symbol table entries directly. The idea is that you can save a lot of space if you’re reusing arbitrary keys but don’t care about the value. The newbie will get these confused with strings or variables, just as a novice doesn’t understand the difference between a pointer and a reference. Learn it, it’s everywhere.
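
To make the symbol-versus-string distinction concrete, here is a minimal sketch. The variable names are mine, chosen only for illustration; it compares object identity for two equal strings and for two uses of the same symbol, then uses symbols as hash keys.

```ruby
# Two strings with the same text are still separate objects,
# while every use of :name refers to one shared symbol.
first  = String.new("name")
second = String.new("name")

same_text   = (first == second)      # compares contents
same_string = first.equal?(second)   # compares object identity
same_symbol = :name.equal?(:name)    # symbols are interned

# Symbols are the conventional choice for hash keys.
person = { name: "Ada", language: :ruby }
lookup = person[:name]

puts same_text, same_string, same_symbol, lookup
```

The strings compare equal by content but are different objects; the symbol is the same object every time, which is where the space saving comes from.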

Ruby also has method names that end with ?, !, and =. If you’re looking for something magical to happen, don’t. The ? and ! are simply visual sugar to let you know you’re asking a question or doing something dangerous; a trailing = marks a setter, which is what lets you call the method using assignment syntax.
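
A short sketch of the three suffixes in one place. Basket and its methods are hypothetical names made up for this example, not part of any library.

```ruby
class Basket
  attr_reader :label

  def initialize
    @items = []
    @label = "untitled"
  end

  def add(item)
    @items << item
    self
  end

  # Trailing "?": by convention, answers a yes/no question.
  def empty?
    @items.empty?
  end

  # Trailing "!": by convention, the dangerous or in-place version.
  def clear!
    @items.clear
    self
  end

  # Trailing "=": invoked when you write basket.label = "..."
  def label=(text)
    @label = text.strip
  end
end

basket = Basket.new
basket.add("apple")
basket.label = "  fruit  "   # calls label=
was_empty = basket.empty?
basket.clear!
now_empty = basket.empty?
```

Nothing here is enforced by the language beyond the assignment syntax; the ? and ! are just characters in the method name.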

Rails is clever in that it uses scripts. You tell a script to generate a model, and it generates a class for you (based on ActiveRecord::Base), and a database “migration” (based on ActiveRecord::Migration). The database table and the class are magically tied together by the framework and naming conventions. Migrations, it turns out, are incremental changes to the database, allowing you to move forward and backward in time to different versions of the schema. You can even execute code to populate and initialize data. Quite clever. ActiveRecord manages the object-relational-mappings (ORM), providing quite a bit of API in the process.
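
The forward-and-backward idea behind migrations can be sketched in plain Ruby. To be clear, this is a toy and not the ActiveRecord::Migration API: Step, MIGRATIONS, and migrate_to are made-up names, and the "schema" is just a hash of table names to column lists.

```ruby
# Toy model of migrations: each step knows how to apply itself (up)
# and how to undo itself (down). Not the real Rails API.
Step = Struct.new(:version, :up, :down)

MIGRATIONS = [
  Step.new(1,
           ->(schema) { schema[:people] = [:id, :name] },
           ->(schema) { schema.delete(:people) }),
  Step.new(2,
           ->(schema) { schema[:people] << :email },
           ->(schema) { schema[:people].delete(:email) })
]

# Walk the schema forward or backward until it sits at `target`.
def migrate_to(schema, current, target)
  if target > current
    steps = MIGRATIONS.select { |m| m.version > current && m.version <= target }
    steps.each { |m| m.up.call(schema) }
  else
    steps = MIGRATIONS.select { |m| m.version <= current && m.version > target }
    steps.reverse_each { |m| m.down.call(schema) }
  end
  target
end

schema  = {}
version = migrate_to(schema, 0, 2)         # apply steps 1 and 2
latest  = schema[:people].dup
version = migrate_to(schema, version, 1)   # undo step 2
rolled_back = schema[:people].dup
```

Because every step carries its own undo, moving to an earlier version is just a matter of running the down halves in reverse order.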

Rails allows for three different databases: development where you do all your dangerous stuff, an optional test database which gets reloaded with virgin test data after each automated test to keep tests independent, and an optional production database for doing the real work - it never gets zapped.

Ruby has a lot of things that take object oriented programming to the next level. Those who have programmed in Objective-C or Smalltalk might recognize some of the constructs. Messages are really, and honestly, separated from methods. You send an object a message, and that in turn invokes a method. This is not like C++, Java, or C# where the notation object.method() simply performs a lookup. No. A message with parameters is sent to the object, and some twiddling might happen with that message before it decides which method to invoke. This means you might, and quite deliberately, send an object a message for which it has no method! Such messages can be forwarded, mutated, or dropped on the floor. Or, even more magical, you can capture messages and replay them, and not just to the same object.
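
A minimal sketch of sending messages by name, using Ruby’s built-in public_send and respond_to?. Greeter is a hypothetical class invented for the example.

```ruby
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end
end

greeter = Greeter.new

direct  = greeter.hello("Jeff")                 # usual dot notation
by_name = greeter.public_send(:hello, "Jeff")   # message chosen at run time

knows_hello = greeter.respond_to?(:hello)
knows_bye   = greeter.respond_to?(:goodbye)

# A message the object has no method for raises NoMethodError,
# unless the class chooses to intercept it.
begin
  greeter.public_send(:goodbye)
  outcome = :handled
rescue NoMethodError
  outcome = :no_method
end
```

The message name is just data (a symbol), which is what makes forwarding, recording, and replaying messages possible.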

In fact, this is how Rails accomplishes much of its magic with ActiveRecord. When you send a message like .find_by_name, there is no such method on the object. This triggers a missing method routine, which does string parsing on the message, finds that it starts with “find_by_”, and then looks in its ORM associated table to see if it has a column called “name”. If so, it passes that information to some routine that manages object persistence. As such, you’ll find you use a lot of human readable “methods” (which are really messages) that simply don’t exist, and that’s why grep can’t find them.
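
Here is a plain-Ruby sketch of that technique using the method_missing hook. PersonTable, ROWS, COLUMNS, and column_for are made up for illustration and work against an in-memory array; the real ActiveRecord talks to a database and is considerably more involved.

```ruby
class PersonTable
  ROWS = [
    { name: "Ada",   city: "London" },
    { name: "Grace", city: "New York" }
  ].freeze
  COLUMNS = [:name, :city].freeze

  # Ruby calls this when no method matches the incoming message.
  def self.method_missing(message, *args)
    column = column_for(message)
    return super unless column
    ROWS.find { |row| row[column] == args.first }
  end

  # Keep respond_to? honest about the messages we intercept.
  def self.respond_to_missing?(message, include_private = false)
    !column_for(message).nil? || super
  end

  # "find_by_name" -> :name, or nil if it is not a known column.
  def self.column_for(message)
    match = message.to_s.match(/\Afind_by_(\w+)\z/)
    return nil unless match
    column = match[1].to_sym
    COLUMNS.include?(column) ? column : nil
  end
end

found   = PersonTable.find_by_name("Grace")
missing = PersonTable.find_by_city("Paris")
defined_normally = PersonTable.methods(false).include?(:find_by_name)
```

Note that find_by_name never appears as a definition anywhere in the source, which is exactly why searching the project for it comes up empty.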

To make matters even worse, classes can be extended at any time. You can add new methods (messages!) dynamically. Real ones. Even to ones you don’t own. Including language primitives, which are also true objects. There’s no need for interfaces, just objects that respond to the same message. Weird, eh? Things get stranger when you can extend individual instances, not just the class as a whole.
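
A small sketch of all three ideas: reopening a built-in class, relying on objects that answer the same message instead of an interface, and extending a single instance. The names doubled, Duck, Robot, and shout are hypothetical.

```ruby
# Reopen a built-in class and add a method to it.
class Integer
  def doubled
    self * 2
  end
end

# No interface needed: anything that answers `speak` will do.
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

sounds = [Duck.new, Robot.new].map(&:speak)

# Extend one instance only; other strings are unaffected.
special = String.new("hello")
def special.shout
  upcase + "!"
end

plain = String.new("hello")
answer = 21.doubled
shouted = special.shout
plain_can_shout = plain.respond_to?(:shout)
```

Patching a core class like Integer affects the whole program, so in real code it is worth doing sparingly.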

Naturally your language snobs will get all up in arms about the total lack of type safety and so forth, but it turns out all of the automated tests tend to prevent you from getting into that kind of trouble in the first place. The end result is physically less code that happens to be very readable.

Ruby does use notations, like an at-sign to prefix instance variables. And, just like C# has get/set accessors, Ruby has a shortcut to turn members into accessors. Hmm, what else? Well, null (spelled nil in Ruby) happens to be a perfectly legal object that you can send messages to, and that gets rid of much of the famous ‘null pointer exception,’ although a message that nil doesn’t understand still raises an error.
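
A quick sketch of the at-sign, the accessor shortcut (attr_accessor), and nil as an object. Person is a hypothetical class for illustration.

```ruby
class Person
  # Generates both `name` and `name=` for the @name instance variable.
  attr_accessor :name

  def initialize(name = nil)
    @name = name
  end
end

anonymous = Person.new
anonymous_name = anonymous.name   # nothing was assigned yet

nil_class = nil.class             # nil is an instance of NilClass
nil_text  = nil.to_s              # an empty string
nil_check = nil.nil?

anonymous.name = "Jeff"
named = anonymous.name
```

Because nil answers messages such as nil? and to_s, a lot of defensive checks read naturally rather than as special cases.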

Ruby doesn’t require a statement terminator; it’s more like Python in that the end of a line is the end of the statement, unless of course you’re in the middle of one, such as when defining a block of code. Ruby does something else neat, which is kind of like Java’s anonymous inner classes: it lets you create blocks of code with named parameters. You can store these blocks, pass them around, and invoke them later from other routines. It feels a lot like C++’s Standard Template Library (STL). More than function pointers, more than method pointers, just isolated blocks of reusable functionality. Virtually every message has optional parameters and the ability to take an optional block of code; these get passed to some method eventually.

One of the more confusing aspects visually, at least for me, is the fact that because these things are messages, parentheses aren’t needed. Thus you’ll see many examples where an object is allocated with .new, and immediately following it is a do-end block that makes no logical sense, especially if it were executed immediately after the allocation. Turns out that’s not what’s going on. The new method, like any Ruby method, can take a block of code in addition to its parameters. (Remember parameters and code blocks are different and both can be passed.) The new method can hand this block of code to the created object, which is free to shove it away in some member variable for some use later.
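
A minimal sketch of storing a block and of a method that takes ordinary parameters plus an optional block. The names add_tax, tripler, and transform are invented for the example.

```ruby
# A block of code stored in a variable and invoked later.
add_tax = lambda { |price, rate| (price * (1 + rate)).round(2) }
total = add_tax.call(10.0, 0.2)

# A method with an optional parameter and an optional block.
def transform(numbers, offset = 0)
  shifted = numbers.map { |n| n + offset }
  return shifted unless block_given?
  shifted.map { |n| yield(n) }
end

plain    = transform([1, 2, 3])
shifted  = transform([1, 2, 3], 1)
scaled   = transform([1, 2, 3], 1) { |n| n * 10 }

# A stored block can be handed to a method with &.
tripler = proc { |n| n * 3 }
tripled = transform([1, 2], 0, &tripler)
```

The parameters and the block travel separately, which is why the same method reads sensibly with or without a trailing block.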
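
The .new-followed-by-do-end pattern can be sketched like this. Button is a made-up class; the point is only that the block is captured at construction time and run later.

```ruby
class Button
  # The &on_click parameter captures the block passed to Button.new.
  def initialize(label, &on_click)
    @label = label
    @on_click = on_click   # stored now, run later
  end

  def click
    @on_click.call(@label) if @on_click
  end
end

log = []

# No parentheses, and the do-end block belongs to the call to new.
button = Button.new "Save" do |label|
  log << "#{label} clicked"
end

before_click = log.dup
button.click
after_click = log.dup
```

Nothing in the block runs when the object is created; it only runs when click is called.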

Lots of what-would-be-complicated-in-other-languages-stuff is accomplished by clever anonymous code blocks being passed around. These constructs just don’t exist in many languages, and that’s why they feel foreign, and that’s why Ruby can be confusing to pick up.

Ruby comes with an interactive interpreter (irb), and Rails adds a console (script/console) that runs in the context of your project. In development mode, each Rails web request reloads the Ruby class files. And while slow (by production standards), this gives instant feedback to code changes. But, fret not, the production version does all the correct caching you’d expect.

Surprisingly, knowing just this little bit of information above is enough to give you the jump forward if you’re stuck. And it turns out learning, using, adopting, and accepting the Ruby way happens very quickly.

Programming Library Conventions

Wednesday, December 27th, 2006

A side study of mine is how developers write (and organize) libraries and then [inadequately] document them.

Consistency is a good thing, and while I’ve never seen the following fact explicitly pointed out, it does represent some extra thought on the part of the Java library authors.

With the realization that applications are not just for USA English speakers, Unicode support is becoming mandatory. A single byte allows for only 256 values (and standard ASCII defines just 128 characters), but Unicode supports everything, including foreign characters.

Java’s strings use Unicode characters, not bytes, although we all know a Unicode character is represented by a sequence of one or more bytes. This is why the storage size of the representation is not necessarily the same as the string’s length.
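
The character-versus-byte distinction is not unique to Java. As a small illustration in Ruby (an assumption on my part that a UTF-8 encoded string is a fair stand-in for the general point), the same word has one length in characters and another in bytes:

```ruby
# "naive" with a diaeresis on the i, written with an escape so the
# example does not depend on the source file's encoding.
word = "na\u00EFve"

characters = word.length     # counts characters
bytes      = word.bytesize   # counts bytes in the UTF-8 encoding
encoding   = word.encoding.name
```

Here the accented letter takes two bytes in UTF-8, so the byte count is one larger than the character count.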

With the Java libraries, anything that talks about Readers and Writers is working with content in terms of Characters.

Anything that talks about Input and Output is working with content in terms of raw bytes.

Knowing that is how the library is sliced up makes it much easier to find the routine you’re looking for.

