I’ve been rather quiet over the past year, apologies for that.

Planning

But it wasn’t all for nothing: some of that time went into porting Cannon Fodder from a disassembly of the DOS version to C++. After looking at the disassembly and finding that the game had some 660 functions, I originally decided to port 3 functions a day, which would mean the project would be completed in 6 months.

Just to ensure a day off could occur, 6 functions a day were ported, and the entire project was completed within 4 months. In the end, it wasn’t enough work… so support for the Amiga version was added, along with the “Amiga Format Christmas Special” and the “Amiga Power Plus” versions.

Porting Secrets

The biggest secret to porting a game, whether the original source is available or not, is to start at the beginning and do one line of code at a time. Attempting to re-engineer, take shortcuts, or do anything short of this will undoubtedly end badly. Do the port, get it running. Then make changes (keep them small and simple, you want the game to be true to the original!).

Patience and persistence. No one ever said this work is easy, or was going to be quick. It’s also not guaranteed everything will go smoothly.

Bugs

Many bugs were accidentally introduced during the porting of Cannon Fodder (at one point, it was calculated I was making 1 mistake every 3 functions), some of which took many hours of debugging (including single stepping the original in SoftICE, inside a Windows 2000 VM, while single stepping my code in Visual Studio).

The worst part is that sometimes you hunt down a bug, only to find it actually occurs in the original game.

Patience

Due to the boredom which sometimes occurs while porting large functions of assembly (some of the functions in Cannon Fodder are well in excess of 1000 lines of assembly), a small video was put together for your viewing pleasure (ensure you switch to 1080p, or it will just be a blur).

x86 to C++ Video

Of course, this isn’t how I work… Generally Visual Studio is open on the left and the disassembly is open on the right of the screen, and I literally do one line of assembly at a time.

Multi Platforms

If you did take the time to look over the history of the repository, you’ll probably notice the Amiga/PC split just prior to the completion of all the ‘sprites’… at which point, classes were created to handle everything Amiga/PC specific, and any calls to the original functions were replaced with calls into objects (which are created on startup of a specific game platform).
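As a rough illustration of that split, the sketch below puts the platform specific behaviour behind one interface and picks the implementation once at startup. The class, enum and function names are hypothetical and are not taken from the Open Fodder source.

```cpp
#include <memory>
#include <string>

// Hypothetical platform tag; not a name from the real project.
enum class Platform { PC, Amiga };

// Everything platform specific sits behind one interface.
class Graphics {
public:
    virtual ~Graphics() = default;
    virtual std::string name() const = 0;
};

class GraphicsPC : public Graphics {
public:
    std::string name() const override { return "PC"; }
};

class GraphicsAmiga : public Graphics {
public:
    std::string name() const override { return "Amiga"; }
};

// Called once at startup; the rest of the game only talks to Graphics.
std::unique_ptr<Graphics> createGraphics(Platform platform) {
    if (platform == Platform::Amiga)
        return std::unique_ptr<Graphics>(new GraphicsAmiga());
    return std::unique_ptr<Graphics>(new GraphicsPC());
}
```

The ported functions then call into the object instead of a PC only routine, so adding another platform means adding another class rather than touching every call site.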

The final code isn’t all that clean: it contains large numbers of GOTOs (while I can’t verify it, it does appear the original engine was written in assembly)… and the Amiga specific palette handling code was done very poorly, although it does work as intended.

To aid in the conversion of Amiga sprite data, a few PHP scripts were put together; one of them is available here: https://gist.github.com/segrax/00ad90a9bc54e2518c38

 

Project Link: https://github.com/segrax/openfodder

 

Mission 5: Those Vicious Vikings

Lately I’ve been learning a lot about various file systems and how they store data out to the disk.

Specifically, the various methods available under ZFS have been of interest to me, and after reading the ZFS On Disk specification, it was clear the documentation was incomplete (especially in relation to RAID-Z). This led me to begin reading many lines of code until it was understood.

ZFS uses virtual devices (vdevs) to create pools, and a pool can be made up of more than one vdev. A vdev itself can then be made up of one or more of the following:

  • Files
  • Partitions
  • Entire Drives

In this post, I’m going to concentrate on how multiple drives are combined in RAID-Z mode to provide a single device to a pool.

Many drives become one

Let’s assume we have 3 drives, named D1, D2, and D3. To simplify things, each drive only has 12 blocks (sectors). The calculations/allocations are done by a function named ‘vdev_raidz_map_alloc’, located in a file named ‘vdev_raidz.c’ (the FreeBSD source is pointed to by this post, but the code should be the same in any current version of ZFS).

This function takes a few parameters:

  • pointer to the data
  • size of the data
  • target offset
  • on disk block (sector) size
  • number of drives
  • number of parity devices

Based on the provided information, it calculates the position to use on each physical device (remembering, the upper layers in ZFS keep track of which blocks are free in the virtual device and pass the values to this function), returning a structure with this information.

Drive Block (sector) Allocations

To the pool, we have one continuous drive, which looks like this (with each block being labelled from 1 to 36 — Each Device has 12 blocks)

Virtual Device
D1          | D2          | D3
01 02 03 04 | 05 06 07 08 | 09 10 11 12
13 14 15 16 | 17 18 19 20 | 21 22 23 24
25 26 27 28 | 29 30 31 32 | 33 34 35 36

To the mapping function, each device looks like this (with each block being labelled from 1 to 12, on each device):

Drives with blocks
D1          | D2          | D3
01 02 03 04 | 01 02 03 04 | 01 02 03 04
05 06 07 08 | 05 06 07 08 | 05 06 07 08
09 10 11 12 | 09 10 11 12 | 09 10 11 12

Based on the algorithms at the start of the function, we can determine exactly how the data is spread across the drives…

This table shows how the virtual device blocks are spread onto the three drives, with the blocks on each device being labelled using the block number from the virtual device.

Drives with blocks
D1          | D2          | D3
01 04 07 10 | 02 05 08 11 | 03 06 09 12
13 16 19 22 | 14 17 20 23 | 15 18 21 24
25 28 31 34 | 26 29 32 35 | 27 30 33 36
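The round-robin layout in the table above can be sketched as a small mapping function. This is only an illustration of the table, not the code from ‘vdev_raidz.c’; the names BlockLocation and locateBlock are made up for this example.

```cpp
#include <cstdint>

// Where a virtual-device block lives on the physical drives.
struct BlockLocation {
    std::uint32_t drive;  // 0-based drive index: 0 = D1, 1 = D2, 2 = D3
    std::uint32_t block;  // 1-based block number on that drive
};

// Map a 1-based virtual block number onto (drive, block on that drive).
// Consecutive virtual blocks are dealt out to the drives in turn.
inline BlockLocation locateBlock(std::uint32_t virtualBlock,
                                 std::uint32_t driveCount) {
    const std::uint32_t index = virtualBlock - 1;  // switch to 0-based
    return BlockLocation{index % driveCount, index / driveCount + 1};
}
```

For example, virtual block 17 lands on D2 as that drive’s sixth block, which matches its position in the table.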

An Example

Let’s write a file to the virtual device. Again for simplicity, let’s use a size that is a whole number of blocks.

  • Block: 2 (this is the target block in the virtual device; remember, the upper layers keep track of used/free blocks, so it’s of no concern at this level)
  • Size: 1024 bytes (2 blocks)

We have a 3 drive system, which means 1 drive of space is used for parity… meaning storing this file will require 3 blocks in total (each row of data blocks written across the drives requires 1 block of parity).

This means we use virtual blocks 2 (on D2), 3 (on D3) and 4 (on D1), with D2 (the first block used) storing the parity.

Example file blocks
D1 | D2 | D3
01 [04] 07 10 | [02] 05 08 11 | [03] 06 09 12
13 16 19 22 | 14 17 20 23 | 15 18 21 24
25 28 31 34 | 26 29 32 35 | 27 30 33 36

Blocks in [brackets] are used by the file; 02 (on D2) holds the parity.

The total size, including parity, is returned to the upper layer, so keeping track of all used blocks is possible (the upper layer has also already confirmed that enough blocks are available at the requested location, prior to calling the function).
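A simplified sketch of how the columns for this example could be worked out is shown below. It is loosely modelled on the arithmetic at the start of ‘vdev_raidz_map_alloc’, but it works in whole blocks, only handles a write that fits in a single row, and uses hypothetical names (Column, mapWrite), so treat it as an aid to the example rather than the real function.

```cpp
#include <cstdint>
#include <vector>

// One block of the write, as placed on a physical drive.
struct Column {
    std::uint32_t drive;   // 0-based drive index (0 = D1)
    std::uint32_t row;     // 0-based block offset on that drive
    bool parity;           // true for parity, false for data
};

// startIndex is the 0-based virtual block (virtual block 2 is index 1).
// Assumes the data fits in one row: dataBlocks <= driveCount - parityCount.
std::vector<Column> mapWrite(std::uint32_t startIndex, std::uint32_t dataBlocks,
                             std::uint32_t driveCount, std::uint32_t parityCount) {
    std::vector<Column> columns;
    const std::uint32_t firstDrive = startIndex % driveCount;
    const std::uint32_t firstRow   = startIndex / driveCount;
    for (std::uint32_t c = 0; c < parityCount + dataBlocks; ++c) {
        std::uint32_t drive = firstDrive + c;
        std::uint32_t row   = firstRow;
        if (drive >= driveCount) {  // ran past the last drive: wrap around
            drive -= driveCount;
            ++row;
        }
        // Parity goes first, followed by the data blocks.
        columns.push_back(Column{drive, row, c < parityCount});
    }
    return columns;
}
```

Calling mapWrite(1, 2, 3, 1) gives parity on D2, then data on D3 and on the second block of D1, which are virtual blocks 2, 3 and 4 from the example.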

Recently while improving support for the C64 version of Maniac Mansion in ScummVM, I found myself re-writing large chunks of assembly in C… and it was decided the time to write another blog post had come.

Note: This is designed as a one for one translation demonstration, not a complete conversion of an application between different systems and does not solve all the issues. Eventually I will do a post on a complete port from dis-assembly.

1. Getting Started

First we need to pick a target. As promised in an earlier post (Cartridge Dumping), further investigation will be done on the ‘EpyxFastLoad’ cartridge; for now, we will just use it as an easy target.

Tools

  1. da65 (which comes as part of the cc65 package)
  2. Cartridge Dump (which was acquired earlier)

 

2. Dump the dis-assembly of the binary

For da65 to function correctly, we need to create an ‘info’ file with the basic information about the ROM. The first two bytes of a normal C64 cartridge contain the ‘entry point’ of the first executable code, and a marker (0xC3 C2 CD 38 30, ‘CBM80’) follows at 0x8004. On start-up, the C64 Kernal checks for these bytes by reading 0x8004 – 0x8008; if they exist, the address stored at 0x8000 is jumped to.

This means, in our ‘info’ file, we mark the beginning of the file as a byte table (I’ve already peeked ahead, so I’m sure everything before 0x8030 is data, as well as an address table at 0x8047).
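Checking a dump for that marker is straightforward. The sketch below assumes the dump is a raw image of the ROM as it appears from 0x8000 (no CRT header), and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The five marker bytes expected at 0x8004 (offset 4 in the dump).
constexpr std::uint8_t kCartMarker[5] = {0xC3, 0xC2, 0xCD, 0x38, 0x30};

// True if the dump carries the marker the Kernal looks for on start-up.
bool hasCartridgeMarker(const std::vector<std::uint8_t>& rom) {
    if (rom.size() < 9)
        return false;
    for (std::size_t i = 0; i < 5; ++i)
        if (rom[4 + i] != kCartMarker[i])
            return false;
    return true;
}

// Entry point stored little-endian in the first two bytes (0x8000, 0x8001).
// The caller must make sure the dump holds at least two bytes.
std::uint16_t entryPoint(const std::vector<std::uint8_t>& rom) {
    return static_cast<std::uint16_t>(rom[0] | (rom[1] << 8));
}
```

Running this over a dump before feeding it to da65 is a cheap way to confirm the file starts where you think it does.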

Execute the application

3. Finding some code

The very beginning of the dis-assembly shows us nothing worth porting: it sets memory registers, then copies the address table at 0x8047-0x8056 to the stack. This causes each function to be executed in order, ending with the final start-up function at 0x8057.

The next piece (from the start-up function) is a little more interesting, so we will work with it.

4. Porting some code

Just about the easiest starting approach, is to just map the instructions one to one, into ‘pseudo’ C.

Here is the first part, along with the ASM

We can see from this, a never ending loop if the value returned from LC00B() is not 0.

There is also an access to the byte array at the beginning of the ROM, which is copied to upper memory (it copies the word ‘FASTLOAD’ to the keyboard buffer in this case), and there is a piece of code which is copied into LC00B (in a real situation, we would leave this code out, as copying 6502 code into memory to jump to is rather useless; we would re-write this code separately and jump to it).

The second part is a little trickier as it uses the 6502 Zero Page. We can instantly see it’s writing to 0xAE and 0xAF, then referencing them like a pointer.

This looks like it’s going to be a problem…

 

5. A Problem (overflows, bytes and words)

We now have a problem: this code will not work if we just ‘translate’ it, as it’s checking for overflows in bytes and using pointers… so let’s try again, with a few minor changes. Let’s remove the bytes, and use something bigger to store the number (a pointer is fine on a 32bit platform for this purpose, as it is a pointer in the original system).

Of course, this leaves the problem that the ‘address’ it’s pointing to is unlikely to exist on a modern system, and even if it does, it’s not going to hold the data we are expecting!

We could tackle that by mallocing some memory and chucking the original ROM from 0x8011 to 0xA000 into it, but it’s pointless!

This is actually a simple checksum: it adds together every byte from 0x8011 to 0xA000, and compares the result to the values stored in 0x800F, 0x800A. If the check fails, it prints a message and then loops forever.
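Written cleanly in C++, the same idea might look like the sketch below. It assumes the dump is held in a buffer where offset 0 corresponds to 0x8000, that the end address 0xA000 is exclusive, and that the sum is kept in 16 bits (two bytes are stored for it in the ROM). None of this is copied from the cartridge code, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Add together the bytes in [firstOffset, endOffset) of the dump.
// A plain loop index replaces the two zero page bytes the 6502 code
// uses as a pointer, so no byte overflow handling is needed here.
std::uint16_t additiveChecksum(const std::vector<std::uint8_t>& rom,
                               std::size_t firstOffset, std::size_t endOffset) {
    std::uint16_t sum = 0;
    for (std::size_t i = firstOffset; i < endOffset && i < rom.size(); ++i)
        sum = static_cast<std::uint16_t>(sum + rom[i]);  // wraps past 0xFFFF
    return sum;
}
```

For an 8K dump the call would be additiveChecksum(rom, 0x11, 0x2000), with the result compared against the value stored in the ROM header.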

 

Hopefully this gives you a feel for how simple translating code between ASM and C actually is… doing it completely and cleanly is a bit more work, but one step at a time.

 

Not long ago, while attempting to dump C64 cartridges, I discovered that 3 of my C64s had died at some point over the last few years. After some probing, it became clear there were multiple issues; one board had been previously repaired, with at least 4 chips being replaced.

On one board, a track at the Kernal ROM had lifted and been patched with wire wrap… it wasn’t looking good.

Then it happened: while trying to think of my next Arduino based project, it was decided that nothing lives forever, so why not get some desoldering practice and remove some chips from the worst of the boards… in the process, it would give me access to spare parts, and a CPU which could possibly be hooked up to my Arduino for some fun!

 

Step 1: Get the Parts

First, I needed to remove the MOS 6510 from the C64 motherboard.

so we take out the bottom screws..

C64 outside the hood

then the screws from the internal shield / heat-sink

C64 under the hood

40 pins later…  (this photo was taken after the Kernal was removed too)

6510 Removed

Got it out in one piece!

MOS6510 Just Removed from board

 

Step 2: Figure out the specs and the lay of the land

Now we need to decide how to go about wiring this up. Let’s take a look at the 6510 Data Sheet: according to this we need a minimum of 50 kHz for “Clock In”, with a minimum pulse width of 430 nanoseconds (ns) and a minimum of 1000 ns for each clock cycle.

Anything under 430 ns will cause issues (such as clock glitching, which is when parts of the circuit inside the CPU don’t have enough time to complete while power is available, allowing strange behavior to occur, such as skipping execution of instructions).

We quickly put this piece of code together to test the number of times ‘loop’ is called per second (with basically no code in it)

The result: about 84 kHz…

Loop Result

But an Arduino has a number of Pulse Width Modulation (PWM) pins, most of which run at 900Hz by default (the exception being Timer 0, which runs at 490)… It turns out that with some tuning, we can set Timer 0 to run at 62KHz.

After setting everything up, I found that it’s rather difficult to single step from the ‘loop’ function while the CPU is cycling at high speed, and despite the documentation stating it requires 50KHz, it actually ran at 490Hz without a problem…

 

Step 3: Wiring the Bread Board

First we grab the 6510 Pin-Out. It’s pretty simple: 16 Address lines, 8 Data lines, Clock, Ready, Reset and RW are the pins we need to wire to the Arduino for now.

IRQ, NMI, AEC and VCC all attach to +5v, and GND connects to ground.

We place our chip in the breadboard…

MOS6510 in Breadboard

15 minutes later, all wired up and ready to go

As each CPU pin was wired up, the Arduino pin number used was put into a variable for easy use later.

 

Step 4: Coding the Sketch

With the pin numbers already set up, our next goal is to write some functions to control these pins as required by the CPU. A quick list of the pins we need functions for:

Pins and Purpose
Pin       | Direction | Purpose
Clock     | In        | Frequency the chip is running at
Ready     | In        | Single step the CPU
Reset     | In        | Start the CPU reset sequence
ReadWrite | Out       | Is the CPU reading or writing the address bus
Address   | In/Out    | Address to be read or written to
Data      | In/Out    | Data just read from memory or to be written to memory

How does each pin work? A quick check of the documentation told me:

Pin Control
Pin   | How
Clock | Constant High-Low switching
Ready | When held 'HIGH' the CPU will execute normally with each clock cycle
Reset | When held 'LOW' for a few cycles the CPU will jump to the Reset vector

Setup

First we set up the Arduino pins to the correct input/output mode… if you’re wondering why the Address pins have been skipped, it’s because they change state, and can be read/written depending on what the CPU is doing, so we set their direction when we attempt to read them.

Reading/Writing Address Lines and Data

To set the ‘Data’ pins from a provided byte

To read the ‘Address’ Pins,

Memory Simulation

As we have no address bus or memory… we need to ‘simulate’ one for the CPU (otherwise, how will it know what to do?). A simple array will work for our purposes.

If you’re wondering what this bit of code does (note we skip bytes 0 and 1 because they’re used internally by the processor)

Main Loop

The main loop is fairly simple,

  1. Read the Address Lines
  2. Read the Read/Write Pin
  3. Output via serial port Read/Write Status: Address: Data  (if Read/Write or Address has changed)
  4. Set the Data Port to the correct byte for the ‘Memory’ being accessed (Note we force the reset vector to return 0x0002)
  5. Check for commands from the serial port

Because we have no bus and are manually setting the Data pins, we cannot have the CPU running past a single instruction without this loop executing, as data read/writes would be missed.
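The memory side of that loop can be sketched in plain C++ as below. The stand-in program (two NOPs and a JMP back to 0x0002), the choice to return a NOP for unmapped addresses, and all the names are assumptions made for illustration; the sketch on Github remains the reference.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The 6510 fetches its start address from 0xFFFC (low byte) and 0xFFFD (high byte).
constexpr std::uint16_t kResetVector  = 0xFFFC;
constexpr std::uint16_t kProgramStart = 0x0002;  // bytes 0 and 1 are skipped

// Hypothetical program: NOP, NOP, JMP $0002.
const std::array<std::uint8_t, 5> kProgram = {0xEA, 0xEA, 0x4C, 0x02, 0x00};

// Byte to place on the Data pins when the CPU reads from `address`.
std::uint8_t readMemory(std::uint16_t address) {
    if (address == kResetVector)                       // forced reset vector, low byte
        return static_cast<std::uint8_t>(kProgramStart & 0xFF);
    if (address == kResetVector + 1)                   // forced reset vector, high byte
        return static_cast<std::uint8_t>(kProgramStart >> 8);
    if (address >= kProgramStart) {
        const std::size_t offset = static_cast<std::size_t>(address - kProgramStart);
        if (offset < kProgram.size())
            return kProgram[offset];
    }
    return 0xEA;  // assumption: anything unmapped reads as NOP
}
```

After a reset the CPU reads 0xFFFC and 0xFFFD, receives 0x02 and 0x00, and starts fetching instructions from 0x0002.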

Commands accepted via the serial port

  • ‘r’ will start the 6510 reset process
  • ‘z’ will single step the 6510

You can find the entire sketch on Github

 

Final Result

There we have it: we can now reset and single step the CPU, watching the output as it runs… of course, we could make a more complex program to execute, and write data back to memory if we wanted…

Serial output


The Running Arduino


My first experience with RAID-5 was over a decade ago now, running on my file server at home with an expensive Intel RAID controller protecting my data for many years.

How it really worked was a bit of a mystery, spreading the data across a number of drives and taking up one drive worth of space for ‘parity’.

A few years ago I heard about ZFS. The benefits of the software based RAID-Z immediately stood out… coupled with some of the other features, it became an obvious choice for my next file server, and twice since then, it has saved two of my servers from data loss when drives have failed.

Recently, the old question of how it all worked came up again… and after some reading, I stumbled onto the ‘on-disk format’ documentation for ZFS and read it through. It’s extremely detailed, and with the information learned, I headed over to the git repository for ZFS and started snooping around.

Eventually I found line 606 of ‘vdev_raidz.c’: a simple loop XORing data together.

The entire process turned out to be very simple: you take a file, split it into sector sized blocks, and XOR the same indexed byte from each block together to create the ‘parity’ block. The effect this has is that one entire block can be missing or corrupted… and using the parity block, it can be recovered.

Data Blocks

Let’s pretend we have a 4 drive RAID5, our pretend sectors are only 16 bytes each, and we have a file which contains 48 bytes.
This means we need 3 blocks to store this file, plus the parity block.

Here are our blocks (the parity block is zero until it’s calculated)

Blocks
Block 1     | Block 2     | Block 3     | Parity Block
0  1  2  3  | 0  1  2  3  | 0  1  2  3  | 0  1  2  3
73 65 67 72 | 69 64 20 62 | 67 20 77 69 | 00 00 00 00
61 73 20 73 | 6C 6F 63 6B | 74 68 20 70 | 00 00 00 00
69 6D 70 6C | 20 73 70 61 | 61 72 69 74 | 00 00 00 00
65 20 72 61 | 6E 6E 69 6E | 79 00 00 00 | 00 00 00 00

Building Parity

To generate our parity block, we XOR a byte from column x, row y from Block1, Block2 and Block3 together,
For Example,
XORing column 0, row 0 of each block together: 0x73 ^ 0x69 ^ 0x67 = 0x7D
XORing column 1, row 0 of each block together: 0x65 ^ 0x64 ^ 0x20 = 0x21

Some PHP was put together to calculate this for us,

The result from executing the code is the parity block,

Blocks and Parity
Block 1     | Block 2     | Block 3     | Parity Block
0  1  2  3  | 0  1  2  3  | 0  1  2  3  | 0  1  2  3
73 65 67 72 | 69 64 20 62 | 67 20 77 69 | 7D 21 30 79
61 73 20 73 | 6C 6F 63 6B | 74 68 20 70 | 79 74 63 68
69 6D 70 6C | 20 73 70 61 | 61 72 69 74 | 28 6C 69 79
65 20 72 61 | 6E 6E 69 6E | 79 00 00 00 | 72 4E 1B 0F
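The same calculation can be sketched in C++ (rather than the PHP used originally) with the example data from the table. The type alias Block and the function xorBlocks are names chosen for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pretend 16 byte sector, stored row by row as in the table.
using Block = std::array<std::uint8_t, 16>;

// XOR the same indexed byte of every block together.
Block xorBlocks(const std::vector<Block>& blocks) {
    Block result{};  // starts out as all zeroes
    for (const Block& block : blocks)
        for (std::size_t i = 0; i < result.size(); ++i)
            result[i] ^= block[i];
    return result;
}

// The three data blocks from the table.
const Block kBlock1 = {0x73, 0x65, 0x67, 0x72, 0x61, 0x73, 0x20, 0x73,
                       0x69, 0x6D, 0x70, 0x6C, 0x65, 0x20, 0x72, 0x61};
const Block kBlock2 = {0x69, 0x64, 0x20, 0x62, 0x6C, 0x6F, 0x63, 0x6B,
                       0x20, 0x73, 0x70, 0x61, 0x6E, 0x6E, 0x69, 0x6E};
const Block kBlock3 = {0x67, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x70,
                       0x61, 0x72, 0x69, 0x74, 0x79, 0x00, 0x00, 0x00};
```

Passing the three data blocks to xorBlocks produces the parity block shown in the last column of the table.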

Verification of Blocks

Now we execute the same code, with the extra parity block added to the initial array
For Example,
XORing column 0, row 0 of each block together: 0x73 ^ 0x69 ^ 0x67 ^ 0x7D = 0x00

The result is that all columns are zero

Verification
0 1 2 3
00 00 00 00
00 00 00 00
00 00 00 00
00 00 00 00

One Drive Fails

But what happens in the event of a drive failure?

Let’s wipe Block 2 (zero it, or just completely remove it), then XOR the remaining blocks and the parity block together.

Failed Block 2
Block 1     | Block 2     | Block 3     | Parity Block
0  1  2  3  | 0  1  2  3  | 0  1  2  3  | 0  1  2  3
73 65 67 72 | 00 00 00 00 | 67 20 77 69 | 7D 21 30 79
61 73 20 73 | 00 00 00 00 | 74 68 20 70 | 79 74 63 68
69 6D 70 6C | 00 00 00 00 | 61 72 69 74 | 28 6C 69 79
65 20 72 61 | 00 00 00 00 | 79 00 00 00 | 72 4E 1B 0F

The result… The missing Block!

Block 2
0 1 2 3
69 64 20 62
6C 6F 63 6B
20 73 70 61
6E 6E 69 6E
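Recovery is the same XOR again, this time over the surviving blocks and the parity block. A minimal C++ sketch using the example data follows; the names Block and rebuildMissing are chosen for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<std::uint8_t, 16>;

// XOR every surviving block (data and parity) together; what is left
// over is the content of the single missing block.
Block rebuildMissing(const std::vector<Block>& survivors) {
    Block missing{};
    for (const Block& block : survivors)
        for (std::size_t i = 0; i < missing.size(); ++i)
            missing[i] ^= block[i];
    return missing;
}

// Block 1, Block 3 and the parity block from the tables (Block 2 is lost).
const Block kBlock1 = {0x73, 0x65, 0x67, 0x72, 0x61, 0x73, 0x20, 0x73,
                       0x69, 0x6D, 0x70, 0x6C, 0x65, 0x20, 0x72, 0x61};
const Block kBlock3 = {0x67, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x70,
                       0x61, 0x72, 0x69, 0x74, 0x79, 0x00, 0x00, 0x00};
const Block kParity = {0x7D, 0x21, 0x30, 0x79, 0x79, 0x74, 0x63, 0x68,
                       0x28, 0x6C, 0x69, 0x79, 0x72, 0x4E, 0x1B, 0x0F};
```

This only works when exactly one block is missing; with two blocks gone, a single parity block no longer holds enough information.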

 

 

While digging through boxes of Commodore 64 (C64) goodies recently, I stumbled across a few old cartridges… most of them were easily found online in cartridge (CRT) format, but a couple of others turned out to be a little harder to find.

Freeze Machine
By A Ager of Softcell

Freeze Machine Cartridge

Graphics Utility
By Cockroach Software

Graphics Utility Cartridge

It might seem pointless to have the CRTs of applications designed to rip games/pictures from the memory of a C64, but for me it’s all about the code inside those cartridges.

Dumping

The first step was to find out what’s required to rip a cartridge. It turns out you need a C64, a disk drive, a 1541 to USB cable and a blank disk or two. It just so happens all these items were inside my box of goodies.

Unfortunately, upon hooking up all 3 of my C64s… they presented a range of symptoms, from black screens to garbage.


Without spares, and not wanting to purchase another C64 just yet, I decided to try something a little more fun and adventurous….

Arduino

An Arduino is a simple programmable input/output device, with a range of digital/analog inputs/outputs (depending on the model), possibly a perfect match for my small project. Now all I needed to do was find out how many pins would be required.

Step 1: Ordering the bits

Locating the pin-out for the expansion port was a simple job for Google, leading me to the C64 Expansion Port Pin-out (this site is full of information about multiple Commodore computers). It turns out the Expansion Port is an edge connector, with 44 contacts and a pitch of 2.54mm.

This information made for an easy search of eBay, leading to a supplier who sold in groups of 5 (Industrial Card Edge Slot Socket Connector 22x2P 44P 2.54mm 0.1″ 3A)

According to the pin-out, I only really needed 24 of those pins, so an Arduino Mega 2560 would do me just fine (with its 54 digital input/output pins, giving me space for future projects). Also purchased were a beginners pack and a pack of jumper wires.

Arduino Beginners Pack
Arduino Jumper Wires
Arduino Mega 2560

Step 2: Putting it together

A week later, all packages had arrived… Unfortunately, the two sides of the connector weren’t far enough apart to fit into my breadboard (by about 3mm!).. this resulted in a good hour worth of fiddly soldering and wire stripping…

Wiring up the edge connector socket

Step 3: Wiring the breadboard

With the connector wired, and all wires ready to go… the breadboard was wired up, writing down which pin was in which row/col as I went.

Wiring the bread board

Step 4: Wiring the Arduino

The result was a touchy mess of colored wires, but on first run, it appeared to work. After dumping an entire cart, something looked wrong… and it turns out one of the wires had come loose from the breadboard, causing an address pin to always be 0, resulting in duplicated data every 64 bytes (address line 6).
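To see why a stuck address line produces repeats, the sketch below computes the address the cartridge actually sees when one line is held at 0. The function name is hypothetical.

```cpp
#include <cstdint>

// Address seen by the cartridge when address line `line` is stuck at 0.
inline std::uint16_t withLineStuckLow(std::uint16_t address, unsigned line) {
    return static_cast<std::uint16_t>(address & ~(1u << line));
}
```

Address line 6 carries the value 64, so with it stuck low a request for 0x40 to 0x7F reads the same bytes as 0x00 to 0x3F, and the same 64 byte repeat shows up throughout the dump.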

Arduino To C64 Cart

Step 5: Coding the Arduino

A combination of bit shifting and pin on/offs was used to read the cartridges on the Arduino side, with a simple serial dumper running on Windows (which started the dump by sending a simple command to the Arduino).

Two arrays were used on the Arduino side, mapping its in/out pins to the Address/Data pins in use by the connector. This allowed easy shifting of the address and turning the specific address line pin on/off.

Setting an Address to read

 

Data was handled the same way, setting a bit to ON if the data pin was HIGH, by shifting a 1 left by the number of the pin being read.
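The two bit operations can be sketched in plain C++ as below. On the Arduino each element would be passed to digitalWrite() or filled from digitalRead() using the pin mapping arrays; here they are plain arrays of booleans so the logic can be run anywhere, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Element n says whether address line n should be driven HIGH.
std::array<bool, 16> addressToLines(std::uint16_t address) {
    std::array<bool, 16> lines{};
    for (unsigned bit = 0; bit < 16; ++bit)
        lines[bit] = ((address >> bit) & 1u) != 0;
    return lines;
}

// Element n is the state read from data line n (true = HIGH).
std::uint8_t linesToData(const std::array<bool, 8>& lines) {
    std::uint8_t value = 0;
    for (unsigned bit = 0; bit < 8; ++bit)
        if (lines[bit])
            value = static_cast<std::uint8_t>(value | (1u << bit));
    return value;
}
```

Dumping a cartridge is then a loop over every address: set the sixteen lines, read the eight data lines, and send the resulting byte over serial.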

Reading the Data

Full Source: https://github.com/segrax/Arduino-C64-Cartridge-Dumper

 

Step 6: Verification

Dumping cartridges won’t do anyone a lot of good if you can’t be certain it’s working correctly. So a cartridge which was available for download was dumped and compared (Visible Solar System).

A perfect match! (the CRT header was removed first)

Visible Solar System Comparison

Next up was the Epyx Fastload Cartridge…

Hmm, a word (two bytes) here is different… but why…

Fastload

Conclusion

Both the Freeze Machine and the Graphics Utility cartridge were ripped, but without opening the actual cartridges up, it’s hard to tell their exact memory configuration from the dump (C64s are very flexible in terms of expansion carts). Both carts have buttons, which leads me to believe they will be firing interrupts…

Further analysis on the differences between the two versions of the Epyx Fast Load Cartridge is required; keep an eye out for a post about it in the future.

My first real experience with reverse engineering was similar to jumping into the deep end of a swimming pool, having no knowledge on how to swim and no-one around to help.

That was the DOS version of Dune II, and while many years later results were achieved… an easier approach would have been to start with smaller executables and slowly work towards the big ones.

That’s where “Hello World” comes in. As a simple program which almost all beginners are taught, it makes sense that your first steps into software reverse engineering should begin with the same program. This allows us to get a feel for what happens when code is compiled, and an idea of what to expect on bigger projects.

Required Tools

Compiling

We begin by creating an empty console project, and compiling this small piece of code.

 

Quick Analysis

Now we load the compiled executable into IDA. An auto-analysis will begin, and once it completes we should be located at the ‘main’ function declared above.

Almost instantly we will be able to notice:

  • Main has been identified
  • Strings have been resolved and given names ‘aHelloWorld’ (showing the referenced string in full next to its use)
  • Included Library functions have been resolved to names ‘__imp_printf’

Dis-assembly of 'Hello World' in IDA

We can easily map this disassembly back to the C code from above.
C Mapped

In-depth Analysis

You’ve probably noticed that most of the instructions seem to be completely unrelated to the C code above. This is because the compiler takes care of handling the stack and parameters for us. You may also notice that a pointer to the ‘Hello World’ string is passed to printf, and not the string itself.

Lines  2- 4: Store the stack from the calling function, and prepare the stack for this function
Lines  6- 8: Store registers which will be ‘smashed’ during the execution of ‘printf’
Lines 10-15: Push parameters for ‘printf’ to the stack, call ‘printf’, then restore the stack
Lines 17-19: Restore the registers which were saved prior to the call to ‘printf’
Lines 21-23: Remove our stack changes, and restore the calling function stack