Hexadeciman - 16-bit Programming Blog

Tuesday, August 25, 2009

Open Source Struggle

After having decided on Python 2.6 as a platform it's time to find an IDE and get on with a few tutorials. You may snicker at me wanting an IDE but I'm trying to keep the barrier to entry as low as possible initially so I don't get discouraged. I'm already coming in cold to Python and Ubuntu so having to learn emacs/vi as well is a step too far for me. Plus I have a .NET background so I'm used to the power of Visual Studio.

I decided on Eclipse as an IDE mainly because it's full-featured and roughly equivalent to Visual Studio. I may take a look at Java as well so using the same IDE for multiple languages is a bonus.

Pydev
Pydev is a plugin for Eclipse which supports Python development. This is where the journey gets a bit interesting. The organisation for Pydev is horrendous. Whereas projects like Python and Eclipse have easy to understand websites that provide a unified front, Pydev is all over the place.

Apparently Pydev is owned by Aptana although it is still free and open source. There are also "Pydev extensions" which provide some extra functionality, also free and open source. Why these are two distinct things escapes me; the split only fragments the product. Some of the downloads are from SourceForge, some are from Aptana and others from Fabioz. All I wanted was a simple plugin, so why not just bundle it all together as one Eclipse plugin with a link to the update site on the front page?

After going through the Getting Started section I had Pydev installed correctly and was ready to start. First things first, I need to download the Subclipse plugin which seamlessly integrates Subversion commands into Eclipse. So I find the update URL on the Subclipse site and enter it into Eclipse to which it replies "No features found on the selected site(s)". First roadblock.

Updating Eclipse
So I Google 'Ubuntu Subclipse "no features found"' and I come up with a very limited result set. Some of them suggest using the Sun JDK instead of GCJ which I have no idea how to do. Others suggest making sure my Eclipse 3.2 version is up-to-date. I ask Eclipse for a list of updates and it asks me what mirror I wish to use. Naturally I select Australia, which gets me an error message saying there were network problems: a 404, meaning the mirror couldn't find the file it was asked for. Great. So I try again with a US server, bingo! I assume I want to install the latest version which is 3.2.2 so I check that and continue. Up pops an error saying it can't find particular files and that the installation has failed. Second roadblock.

So I Google the error message and it turns out that I need to install Eclipse 3.2.1 before I can install 3.2.2. There seriously needs to be a version check before allowing me to witlessly try to install a version that can't be installed! Again I fire up the Eclipse update UI and find the 3.2.1 version (for some reason under "Find new features" rather than "Updates") and install it. Back to the update UI, I select 3.2.1 Patches as I assume I'll need this as well to update to 3.2.2. Trying to select that results in an error saying

Eclipse Java Development Tools 3.2.1 performance patch (bug:159325) (1.0.0) requires feature "org.eclipse.jdt (3.2.1.r321_v20060905-R4CM1Znkvre9wC-)".

Great. Now where would I find that and why can't it just automatically resolve this dependency?
I give up on this for now and take a look back at some of the other Google results for installing Subclipse.

Wrong Subclipse Version?
Some results suggest it could be something to do with JavaHL, whatever that is. I find a link to a Tigris page explaining which versions of Subclipse are compatible with which versions of Subversion. I check my Subversion version which is 1.5.4 so apparently I want the 1.4.x version of Subclipse. I was trying to use the 1.6.x version previously which is probably why Eclipse didn't like that update URL. Whoops.

No luck with the 1.4.x version either, it still says "no features found ...". This is becoming a real hassle. I wish Subclipse was just another checkbox in Synaptic Package Manager so I could avoid all this fuss. I have to say that it hasn't been an easy experience so far and the initial respect I had for the environment has been marred by this wild goose chase.

Back To Updating Eclipse
Having exhausted that line of inquiry I decide to update Eclipse again. One search result suggests opening Eclipse as a superuser using "sudo eclipse" and installing the Eclipse 3.2.2 SDK, which works. I go back to the update UI and it's still showing me 3.2.1 Patches as an available option which seems weird considering I just upgraded to 3.2.2. I install 3.2.2 Patches and now I'm up to the latest version.

Back To Updating Subclipse
Considering everything's been working OK as superuser, I'll try to add the Subclipse plugin as superuser and see what happens. No luck, still says "no features found". Well my last resort is to add Subclipse as an archive. I shouldn't really have to do this but I'll give it a go, so I'll download the Subclipse zip file and add it to Eclipse as an archived site. This still doesn't work. Something is definitely missing from my Eclipse installation but I have no idea what. My solution is to install every update and feature for Eclipse that I can from the update site. OK almost everything in the Callisto Discovery site tells me that I need Eclipse RCP 3.2.2. WTF?!? I thought I just installed 3.2.2!

Reinstall
I have decided to uninstall Eclipse and reinstall it via Synaptic Package Manager to start again from scratch. After a "Complete Uninstall" and reinstall I still have all of the updates I had applied. How in god's name am I supposed to revert to a plain vanilla Eclipse? Do I have to reinstall the OS?

First Steps With Ubuntu

As a programmer using squalid decades-old technology at work, I'm looking to delve into something more interesting in my spare time.

Linux
First of all I figured I'd look into this new Linux thing that everyone's talking about. The most approachable distribution seems to be Ubuntu so I initially downloaded Ubuntu 9.04 Desktop and tried it out in a virtual machine.

A very slick and easy to use installer let me set up all the essentials and get in to the desktop. I'm impressed that out of the box it has recognised my video card, network card and sound card. Everything is up and running nicely.

Then I took some time out to learn about how the file system is arranged in Linux and a few basic terminal commands. Next I want to get some tools installed so I can start a hobby LAMP project including a webapp, web services and MySQL/PostgreSQL. First of all I installed Eclipse, Subversion and Apache from Synaptic Package Manager. It's a very simple point-and-click GUI to install and update all software on the machine which is fantastic. I really wish Windows had this instead of a mixture of various updating mechanisms which each want to run in their own process. Apparently Windows 7 has this feature for drivers but not for applications.

Python/Perl/PHP
I shopped around and decided on learning Python to play around with. My initial field of candidates was Perl, PHP and Python. Perl is just too ugly and hacky- so many implicit rules to remember and its OO syntax is obviously a bolted-on afterthought. I'll definitely use this at work though as there are plenty of times when a quick hacky regex script is just what I need. PHP looks good for hacking together code quickly but feels like a modern day VB6 which quickly turns into spaghetti code. It also shares the bolted-on OO syntax.

Python
A couple of the reasons that I chose Python are:
- Google is using it fairly extensively and has contributed many performance updates to the interpreter. All of these languages are interpreted and therefore are much slower than their compiled counterparts. If some of the smartest programmers in the world (Google) are contributing to speed it up then it has the best chance at good performance.
- Easy syntax, great library support and community
- Google AppEngine can be used with Python which provides a free and highly scalable playground for me to test out my code. The only downside is I'd have to learn about the BigTable storage system which probably doesn't suit my needs right now.

Deciding between Python 2.6 and 3.1 was hard. 3.1 is the latest and greatest but in the real world I think 2.6 and earlier are almost universally used. For my career I think the best option is to go with 2.6 so any skills I learn are transferrable to the real world.

Next up you can see my struggles to get the tools I need working to start on my Python development.

Thursday, July 2, 2009

Crossing the Barrier

Although the Win16 API is reasonably complete for the simple things you might want to do, there comes a time when you really wish you had access to the Win32 API to get functionality like threading, modern common dialogs, named pipes and all those goodies.

There are a number of ways to get access to the Win32 API from a 16-bit application and they each have their pros and cons. I'll detail one here and another in a followup post as they are both reasonably involved.

ActiveX EXE
VB3 has an inbuilt ability to call into 32-bit out-of-process ActiveX components. Out-of-process means that the component runs in its own 32-bit process and communicates to the 16-bit application via some sort of IPC that I don't care about. If you have the knowledge and wish to fill me in, please do so!
Example (VB3):



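Dim oActiveX As Object

' Object references must be assigned with Set
Set oActiveX = CreateObject("MyObjectPackage.MyObjectName")

' No parentheses around the arguments when the return value is discarded
oActiveX.SomeFunctionName "Parameter1", 2, "3"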

This looks very close to how VBScript instantiates and uses ActiveX components because it is exactly the same mechanism.

Note that when accessing the component from 16-bit code, it has to be an EXE as 32-bit DLLs cannot be loaded into 16-bit address space. This is the method we started off using - writing the ActiveX EXEs in VB6 and accessing them from VB3 - and it was very smooth until we were made aware of the fact that users like to use our application in a "thin client" setup.

I've termed this setup "thin client" for want of a better term. In a standalone install, the application (around 4 GB) would be installed on the target machine and used by one user at a time. Our users cottoned on to the fact that they could share the application directory and use the application from any machine on the (Windows) network by mapping a network drive to that location and creating a shortcut to the main executable. Smart.

The reason this is a popular setup is because the application contains critical business information which is updated on a regular basis- monthly at least. So every month we stamp thousands of DVDs/CDs and courier them to our customers who dutifully install them or use them as coasters. The install on an old machine - Pentium 3 and 4 with 4x or 8x CD-ROMs are the average - takes around 45 minutes. A client that has a dozen or more machines to install the application on quickly looks for a faster solution, hence the "thin client" approach. (By the way if you think our business model is begging for a web application, so do we and we're in the midst of building it).

This approach totally hamstrings our use of 32-bit ActiveX EXEs because of how the mechanism works. When VB3 requests an ActiveX object to be created it passes the object's name (known as a ProgID), and Windows looks in the registry under HKEY_CLASSES_ROOT\{ProgID} for that name. If it finds it, it retrieves the CLSID (a GUID) and looks up HKEY_CLASSES_ROOT\CLSID\{CLSID} to find the EXE/DLL which implements the object. If a user is executing the application from a remote machine, they won't have the ProgID or CLSID of our object in their registry so the application will fail.
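The two-step lookup can be modelled in a few lines of plain C. This is only a toy: an in-memory table stands in for the registry, and the GUID, the EXE path and the function names (reg_lookup, resolve_server) are all made up for illustration. I'm also assuming the usual COM layout, where the name passed to CreateObject (a ProgID) has a CLSID subkey and an out-of-process server is listed under LocalServer32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row of a toy "registry": a key path and its default value. */
typedef struct {
    const char *key;
    const char *value;
} RegEntry;

/* Hypothetical entries for a registered ActiveX EXE.
   The GUID and the path are invented for this sketch. */
const RegEntry kRegistry[] = {
    { "MyObjectPackage.MyObjectName\\CLSID",
      "{00000000-0000-0000-0000-000000000001}" },
    { "CLSID\\{00000000-0000-0000-0000-000000000001}\\LocalServer32",
      "C:\\MyApp\\MyObject.exe" }
};
const size_t kRegistryCount = sizeof kRegistry / sizeof kRegistry[0];

/* Return the value stored under key, or NULL if the key is absent. */
const char *reg_lookup(const RegEntry *reg, size_t n, const char *key)
{
    size_t i;
    assert(reg != NULL && key != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(reg[i].key, key) == 0)
            return reg[i].value;
    }
    return NULL;
}

/* Resolve a ProgID to the path of the EXE that implements it.
   Returns NULL if either registry step fails. */
const char *resolve_server(const RegEntry *reg, size_t n, const char *prog_id)
{
    char key[256];
    const char *clsid;
    int len;

    /* Step 1: ProgID -> CLSID */
    len = snprintf(key, sizeof key, "%s\\CLSID", prog_id);
    if (len < 0 || (size_t)len >= sizeof key)
        return NULL;
    clsid = reg_lookup(reg, n, key);
    if (clsid == NULL)
        return NULL;

    /* Step 2: CLSID -> server executable */
    len = snprintf(key, sizeof key, "CLSID\\%s\\LocalServer32", clsid);
    if (len < 0 || (size_t)len >= sizeof key)
        return NULL;
    return reg_lookup(reg, n, key);
}
```

Calling resolve_server(kRegistry, kRegistryCount, "MyObjectPackage.MyObjectName") returns the EXE path, while passing a count of 0 models the remote machine whose registry has neither entry: the lookup fails at step 1 and the object can't be created.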

I have come up with a stop-gap solution by writing a script to distribute with the application that will register the ActiveX EXE locally if it is being run across the network. This is an extra step our users have to take (albeit a small one), and any extra step they have to take is a personal failure to me.

A cleaner solution must be found! See my post "All That Thunk" for my cleaner solution although it makes you feel dirty inside (coming soon).

Sunday, June 21, 2009

Upgrade Path

If you are wondering why we bother struggling on with a 16-bit application when it would be much easier to support if ported to 32-bit then that makes two of us.

There are several reasons why this doesn't happen:

1. Cost/Benefit
Taking a moderately sized 16-bit VB and C code base and porting them to 32-bit would take a long time and be quite risky. The business cannot see a reason to sink a lot of developer-hours into an endeavour which - in their eyes - at best is going to result in an indistinguishable application and at worst will cost a huge amount of time and money. That's not mentioning the opportunity cost of having developers working on the port when they could be making beneficial modifications to the existing 16-bit code base. Management sees no value in the 32-bit port.

This sounds incredibly naive and dangerous: if Microsoft removes 16-bit support from a future OS we will be in serious trouble and will need to port to 32-bit or stop making money.

2. Third Party Components
The functionality of the application relies on some 3rd party components for UI controls, ZIP compression and other bits and pieces. The components are supplied as VBX files which are analogous to ActiveX controls in 32-bit land.

With some of the components we are supplied both 16-bit VBX's and 32-bit OCX's (ActiveX) which is a definite plus if we are going to do the port. For the rest of the controls we would be forced to either find 32-bit equivalents that are available and supported, or write our own from scratch. Either option immediately increases the risk involved. Writing our own ActiveX controls is outside the expertise of our staff and finding equivalent controls would require an amount of massaging of the existing code to get the new controls to fit properly.

3. Deprecated Product
Currently there is an effort to re-invent the product as an online solution in Flash/Flex rather than distributing the application on disc to our customers. The effort has been ongoing for a number of years now and is still yet to see a full scale commercial release.

We are hesitant to pour a lot of effort into porting the existing application to 32-bit when it will be retired some time in the near future. This has been the official company line for years although there is still active development on the product by a team of about six so my faith in the product being retired any time soon is low.

4. Tried It Before
At the request of the then lead developer the product was designated to be ported from 16-bit VB3 and VC++ 1.52 to the (then) ultra-modern VB6 and VC++ 6.0. Full liberty was taken with the structure of the VB6 code including liberal usage of classes and other advanced features.

Major headaches were encountered straight away when it was realised that there are subtle differences between how VB3 and VB6 utilise memory. For example, VB3's strings are ASCII with a 16-bit length prefix whereas VB6's strings are UTF-16 with a 32-bit length prefix. Considering we use a lot of Get and Put commands in VB to read and write data to and from disk, there is a lot of hassle involved in porting these modules as the formats are totally incompatible. From memory there is also something different about the Len function between the two versions which made measuring structure sizes impossible.
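To make the size difference concrete, here is a small C sketch that writes the same text in both layouts as described above. It is an illustration only: the function names are mine, the prefixes are written little-endian, the 32-bit prefix counts bytes (as a BSTR's does), and only plain ASCII input is handled. It says nothing about what Get and Put actually write to disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* VB3-style layout: 16-bit length prefix followed by one byte per character.
   Returns the number of bytes written to out. */
size_t encode_len16_ascii(const char *s, unsigned char *out, size_t cap)
{
    size_t len = strlen(s), i;
    assert(len <= 0xFFFF && cap >= 2 + len);
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)((len >> 8) & 0xFF);
    for (i = 0; i < len; i++)
        out[2 + i] = (unsigned char)s[i];
    return 2 + len;
}

/* VB6-style layout: 32-bit byte-count prefix followed by two bytes per
   character (UTF-16LE). ASCII input only, so the high byte is always 0.
   Returns the number of bytes written to out. */
size_t encode_len32_utf16(const char *s, unsigned char *out, size_t cap)
{
    size_t len = strlen(s), bytes = len * 2, i;
    assert(cap >= 4 + bytes);
    out[0] = (unsigned char)(bytes & 0xFF);
    out[1] = (unsigned char)((bytes >> 8) & 0xFF);
    out[2] = (unsigned char)((bytes >> 16) & 0xFF);
    out[3] = (unsigned char)((bytes >> 24) & 0xFF);
    for (i = 0; i < len; i++) {
        out[4 + 2 * i] = (unsigned char)s[i];
        out[4 + 2 * i + 1] = 0;
    }
    return 4 + bytes;
}
```

For the five-character string "Hello" the first layout takes 2 + 5 = 7 bytes and the second 4 + 10 = 14, so any code that assumes a fixed record size based on one layout reads garbage from the other.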

When the port was more than three quarters done it was entered for initial QA and immediately raised red flags on the performance front. The 32-bit application was an order of magnitude slower than the 16-bit original largely due to VB6's terrible implementation of classes which brings the interpreter to a crawl.

After several rounds of performance enhancements the 32-bit version ended up several times slower than its antique counterpart which was still completely unacceptable. Later on the project would be canned entirely in favour of the new web based version which has been written and rewritten six times now by various teams.

It turns out that it's not so elementary to replace a dinosaur. If it works, people will use it- whether it is written in VB3 or the latest and greatest web language du jour matters not to the end user. The web is definitely where we want to end up with the product but it is a hard sell when the user is used to a very responsive rich desktop application and is shown a slow, mostly mouse-based Flash/Flex application. Would you be happy with that upgrade?

Visual C++ 1.52

A lot of the more complex functionality in our application is offloaded into 16-bit C/C++ DLLs which are authored with the last 16-bit Microsoft C++ IDE- Microsoft Visual C++ 1.52. The DLLs are mainly plain C for compatibility with VB3 and include VBAPI.H/LIB to allow manipulation of VB3 arrays, strings and variants. In some places we have used MFC but I try not to as it is just another dependency to cause problems.

In comparison with VB3, VC++ is lightning fast as it is compiled rather than interpreted. Similar to VB3, it has a fully fledged debugger with breakpoints, stack traversal and variable inspection.

Most of what we do is looking up information in huge (multi gigabyte is huge in the 16-bit world) data files. Some of these files use custom hand-written compression and/or are "encrypted" with a simple ROT-style algorithm to avoid trivial data harvesting by curious users. Performing this decryption and decompression in VB3 code is out of the question as it is far too slow and in some cases completely impossible.
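For anyone who hasn't seen one, a ROT-style scheme just shifts every byte by a fixed amount and wraps around. The sketch below is a generic version in C, not our actual algorithm: the function names and the idea of a single shift value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* "Encrypt" in place: rotate every byte forward by shift, wrapping at 256. */
void rot_encode(unsigned char *buf, size_t len, unsigned char shift)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)((buf[i] + shift) & 0xFF);
}

/* "Decrypt" in place: rotate every byte back by the same shift. */
void rot_decode(unsigned char *buf, size_t len, unsigned char shift)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)((buf[i] - shift) & 0xFF);
}
```

Decoding with the same shift restores the original bytes. It stops a curious user opening the file in Notepad and nothing more, which is all it is meant to do.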

The one major trick I can share with you in VC++ 1.52 is that there are certain conditions under which the debugger will refuse to recognise a breakpoint. You have a line marked with a breakpoint and you absolutely know it has been executed but the IDE doesn't break into the code and acts as if the breakpoint doesn't exist. The solution is to force a software breakpoint in assembler (ASM) to trigger the debugger to break. In ASM, you issue software interrupt 3 and because C has inline assembler, it's a one-liner:

__asm int 3;

Because this is compiled into the DLL, this breakpoint will be hit every time the instruction is executed. Sometimes - such as in the guts of an inner loop - this is undesirable so you must make it conditional:

/* strcmp comes from string.h */
if (strcmp(value, "expected value") == 0)
  __asm int 3;

I can't count the number of times that single instruction has saved my bacon. ASM FTW!

Apart from that, VC++ 1.52 is very solid and not too different from its grandchild VC++ 6.0, which I have used extensively. The main difference is you are targeting the 16-bit Windows API, rather than the Win32 API. Most things that you take for granted when using the Win32 API are gone- all you can really achieve with the 16-bit Windows API is synchronous file I/O, network I/O and registry modifications.

It is possible to use the Win32 API from 16-bit code using a process called thunking. This will be detailed in a post of its own as it is rather complicated, at least in the way that we use it.

Saturday, June 20, 2009

Visual Basic 3.0

There are many things to love and to hate about Visual Basic 3.0 (VB3). Released in 1993, this tool was revolutionary in its time, providing a WYSIWYG form (dialog) editor when the next best thing was hand coding forms in C++. The fact that anyone with a mouse could now design a GUI application with basic logic made ranks of developers nervous that they would become obsolete.

Pros:
The VB3 IDE (for want of a better acronym) is replete with line-by-line debugging, breakpoints, variable inspection, stack traversal and a surprising number of modern features.

The form designer is very simple and really not that different from the UI designer in Visual Studio from recent years. Click and drag to create your controls, give them an ID then refer to them in code by that ID in event handler functions. Visual Basic 3 the language is quite feature-complete for a dinosaur. All of the mainstays are there- If/ElseIf/Else, Select Case, While loops and the infamous GoTo. It has automatic memory management and dynamic typing (through the Variant type) so you don't need to worry too much about types or memory allocation if your application is moderately sized.

It is entirely imperative with no concept of user classes. As with C, classes can be approximated by creating a module and having all public functions take a user defined type as the first argument. VB3's integration with 16-bit C DLLs is also remarkable, so if VB3 doesn't meet the performance requirements you can easily drop to C for a lightning fast binary search lookup or other required heavy lifting.
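Here is roughly what that pattern looks like on the C side, using the binary search lookup as the example. The names (LookupTable, LookupTable_Init, LookupTable_Find) are invented for this sketch and it is not lifted from our code base; the point is just that the struct plays the role of the object and every "method" takes it as the first argument.

```c
#include <assert.h>
#include <stddef.h>

/* The "user defined type": holds what would be the object's fields. */
typedef struct {
    const long *keys;   /* must be sorted in ascending order */
    size_t      count;
} LookupTable;

/* "Constructor": point the table at an existing sorted array. */
void LookupTable_Init(LookupTable *self, const long *sorted_keys, size_t count)
{
    assert(self != NULL);
    self->keys = sorted_keys;
    self->count = count;
}

/* "Method": binary search. Returns the index of key, or -1 if absent. */
long LookupTable_Find(const LookupTable *self, long key)
{
    size_t lo = 0, hi;
    assert(self != NULL);
    hi = self->count;                 /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (self->keys[mid] == key)
            return (long)mid;
        if (self->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

The VB3 version of the same idea is a module whose public functions all take the user defined type as their first parameter.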

Cons:
There are two major limits we run into when maintaining a large VB3 code base. First is the limit on number of controls per form. On the main form of our application there are images, grids, a toolbar, menus and invisible controls such as timers.

I assume VB3 has an internal numeric identifier for each control and that the ID must be an 8-bit number, because there is a limit of 256 controls on any single form. This leads us to the ridiculous situation where if something is to be added to the main form, I must ask my manager which toolbar button or other control should be removed. He understands this is simply a limitation of the tool so usually prepares his change requests with what should be added and hence which controls should be removed.
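The arithmetic behind that guess is simple: an 8-bit field has 2^8 = 256 distinct values, numbered 0 to 255. A tiny C sketch, with function names of my own invention and no claim about how VB3 really allocates its IDs:

```c
#include <assert.h>
#include <limits.h>

/* Number of distinct values an unsigned field of the given width can hold. */
unsigned long id_capacity(unsigned bits)
{
    assert(bits < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << bits;
}

/* Next ID for a hypothetical 8-bit control index: wraps to 0 after 255,
   so a 257th control would collide with the first. */
unsigned char next_control_id(unsigned char id)
{
    return (unsigned char)((id + 1) & 0xFF);
}
```

id_capacity(8) gives 256, and asking for the ID after 255 wraps back to 0, which is why a tool built this way has to refuse the extra control rather than hand out a duplicate.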

The next major limitation is the number of global variables. Internally VB3 stores all global symbols (global variables and constants, public module functions, external function declarations) in a table of fixed size. I'm not sure how big this is but it must be around 1,000 entries. VB3 will let you know right away when you go over the limit as you will no longer be able to run the application in the IDE or compile it to EXE.

I can understand this limit but the thing that bugs me is that constants are included in the global symbol table. We have huge lists of constant integers that act as enumerations. For example, because we ship the product worldwide we have a list of natural languages that the application may be running in. That's about 30 global symbol table entries taken up right there. This forces us to eliminate global constants and replace them with literal values (magic numbers) in the code that consumes them. This is just asking for trouble but it's the only way we can squeeze in new event handlers and global functions. This is very common:

' These global constants are commented out because
' we don't have room for them
' Global Const gEnglish% = 1
' Global Const gSpanish% = 2
...
Select Case gCurrentLanguage
  Case 1  ' gEnglish = 1
  Case 2  ' gSpanish = 2

Also, because of this limit we are forced to combine functions where possible to keep the global symbol table entries down. This leads to some very confusing code and breaks down the idea of having modules as classes. Spaghetti code, here we come!

Event handling is done by creating a function in the form's module named controlName_eventName() and all of the plumbing is done behind the scenes for you. Because it is so easy to learn the language and the event handlers are such a natural place to put code, many VB3 applications are spaghetti with no real separation of UI code from business logic or data access. With discipline you can create a two or three tiered application using VB3 but it really does tempt you to break the rules and put in filthy hacks.

Visually, the GUI controls in a VB3 application are limited to those available in Windows 3.11, which makes your application stick out like a sore thumb when running on a more recent version. For a legacy application I guess this doesn't really matter. Plenty of libraries and government departments are still running character mode DOS applications but these days it's via terminal software running on XP or Vista. Users that have to use an old application daily don't have the luxury of complaining about it- if it does the job then nobody cares what it looks like.

One of my pet peeves with VB3 is that the project file doesn't keep track of the breakpoints introduced into code. This means that you can be in the middle of an investigation into strange behaviour, carefully setting breakpoints and observing variable values, when for no reason the IDE will GPF (crash) and everything's gone. Because the IDE crashes reasonably frequently - several times a day if I'm using it full time - you end up with a Ctrl-S hair trigger. Every line of code I write is followed by a save to make sure it isn't lost. I have seen several hours' work simply disappear due to an IDE crash and vowed to never let it happen again.

Backward Compatibility

Microsoft goes to great lengths to ensure that software written for a previous OS works on the next version. If businesses realise that upgrading to the latest version of Windows will cause their mission critical legacy apps to stop working then they will simply not upgrade (see: Vista).

This means that software written for Windows 3.11 can run on the 32-bit editions of Vista and 7 with only minor modifications. The mechanism by which this works is called Windows on Windows (WoW), which entails 16-bit applications being hosted in a process known as the NT Virtual DOS Machine (NTVDM.exe).

System calls are intercepted by NTVDM which mimics a native 16-bit OS. The running application is none the wiser that it is running in a virtual machine and goes about its business as if it's running on Windows 3.11. These system calls are presumably delegated to the host OS, with the results re-packaged to be returned to the hosted application. This process - going from 16-bit to 32-bit and back - is called "thunking" and is a topic I will go into detail about in further posts.

There are several factors that make 16-bit programming a headache. First of all the tools are not meant for development at the scale that we are doing it. The 16-bit memory model limits you to around 1MB of total memory so memory allocation has to be managed very carefully. Stepping over that line will result in a crash. In future posts I will detail the extreme contortions we go through to stay under the memory limit and within the boundaries.
