Thursday, May 31, 2012

Decoding XFDL

    When I first started to work on my own XFDL Viewer for Apple's OS X, I knew that it wasn't going to be easy. What I didn't know was just how hard it was going to be.

    The layout of an XFDL document is not only poorly formatted, it's also inconsistent. Let me explain. Let's take normal, everyday "Text Labels". These are static text that say things like "Name:" or "Organization:". These are not meant to be editable, nor should they be. However, within the XML content there are a number of things that I really don't like.

    You would think the layout would be something like this:

    <Top> 45 </Top>
    <Left> 100 </Left>
        <Name> Arial </Name>
        <Size> 8 </Size>
        <FColor> Black </FColor>
        <Style> Bold </Style>
    <Value> Some text to display on the form </Value>

    That's not really how it looks, but you get the idea. For the most part, it really is just that easy. However, a number of forms do something nasty. They make it look like this:

  <Top> 45 </Top>
  <Left> 100 </Left>
  <Value> Some stuff to put on the form </Value>
    <Font Information>

    Then they add a new field:

<LineSpacing>1.5</LineSpacing>

    This makes parsing the XML data really hard to do in any sort of "logical" manner. Problems like this ultimately lead to this:

    And this: 

    In case you are wondering what the problem is in the second one, the text on the line after "Routine Uses:" extends way past the "edge" of the page. I should have seen this coming, but in all fairness and naïvety, I made the mistake of assuming. I assumed a lot, apparently.
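    One way around the ordering problem is to look fields up by name instead of by position. Here is a minimal Python sketch of that idea, not the viewer's actual code. The tag names follow the simplified examples above rather than real XFDL, and the `Label` and `Font` wrapper elements, the `DEFAULTS` table, and `read_label` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified sample in the "nasty" order: Value before the font block,
# plus the extra LineSpacing field. Label and Font are hypothetical wrappers.
SAMPLE = """
<Label>
  <Top> 45 </Top>
  <Left> 100 </Left>
  <Value> Some stuff to put on the form </Value>
  <Font>
    <Name> Arial </Name>
    <Size> 8 </Size>
  </Font>
  <LineSpacing>1.5</LineSpacing>
</Label>
"""

# Assumed fallback values for fields a form leaves out.
DEFAULTS = {
    "Top": "0",
    "Left": "0",
    "Value": "",
    "Name": "Helvetica",
    "Size": "8",
    "LineSpacing": "1.0",
}


def read_label(xml_text):
    """Collect known fields by tag name, wherever they appear in the label."""
    root = ET.fromstring(xml_text)
    fields = dict(DEFAULTS)
    for element in root.iter():
        has_text = element.text is not None and element.text.strip()
        if element.tag in fields and has_text:
            fields[element.tag] = element.text.strip()
    return fields
```

    Because every field is found by name and falls back to a default, a form that reorders its elements or adds a new one should not push text into the wrong slot.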

    Why did I let this happen? Speed of development. Rather than taking a long time planning the XML parser, I just went for it. What a mess. Also, there is actually a section within the XFDL called "ToolBar". This is an area on the main window that has things like a cool Army background and dynamic buttons. What do these buttons do? They offer:
  • Next Page
  • Previous Page
  • Print
  • Attach File
  • Save
  • Save As
  • Email
    Nothing that can't be hard coded into the program. Nothing. So I figured I would strip out the entire "ToolBar" section and just concentrate on the document itself. I still think this is a good idea, as these "buttons" rely on JavaScript embedded within the document to perform their functions.
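    Stripping a section like that can be sketched in a few lines of Python. This is only an illustration under some assumptions: that the document has already been parsed into an element tree, and that the section can be recognized by an element name matching "toolbar" regardless of case or namespace. The helper names `local_name` and `strip_sections` are made up here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def strip_sections(root, name="toolbar"):
    """Remove every element whose local name matches `name`, ignoring case.

    Returns the number of elements removed.
    """
    removed = 0
    # Snapshot the parents first so removing children is safe while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if isinstance(child.tag, str) and local_name(child.tag).lower() == name.lower():
                parent.remove(child)
                removed += 1
    return removed
```

    Everything inside the removed element, including any embedded scripts, goes with it, so the rest of the viewer never has to look at those buttons.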

    So, as you can see, I am still working on it. However, until I can build a good, stable XML parser that handles the myriad possible document errors, progress is going slowly. Yes, there are actually errors within the documents, such as XML tags that are never closed, or a few hundred different items labeled with the same tags in no particular order.
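    To give a feel for what "handling errors" might mean, here is a rough Python sketch of a forgiving reader that does not care whether a tag is ever closed. It borrows the standard library's `html.parser` as a lenient tokenizer, which is a stand-in technique and not a real XML parser: it lowercases tag names, and it treats a few HTML names such as `script` and `style` specially. `LenientFieldParser` and `read_fields` are made-up names.

```python
from html.parser import HTMLParser


class LenientFieldParser(HTMLParser):
    """Pair each opening tag with the text that follows it, closed or not."""

    def __init__(self):
        super().__init__()
        self.fields = []      # (tag, text) pairs in document order
        self._current = None  # most recently opened tag still waiting for text

    def handle_starttag(self, tag, attrs):
        self._current = tag

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._current is not None:
            self.fields.append((self._current, text))
            self._current = None


def read_fields(markup):
    """Return (tag, text) pairs without requiring well-formed markup."""
    parser = LenientFieldParser()
    parser.feed(markup)
    parser.close()
    return parser.fields
```

    Feeding it something like `<Top> 45 <Left> 100 </Left>`, where `Top` is never closed, still yields a pair for each field instead of a parse error. A strict XML parser would reject that input outright.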

    While I could try my hand at reverse engineering the software that already exists, that would be both boring and illegal. What I am doing is analyzing a document format and trying to make something out of it from scratch. This is much more fun and rewarding than simply hacking Windows software onto a Mac.

Until next time ... 
