A brain dump on the subject of disassembling Vectrex code

gtoal
Armor Attacked

Posts: 186


Post by gtoal on Oct 23, 2019 22:15:41 GMT -5

Traditional static disassemblers generally don't do a very good job of labeling data areas.

Some of them have trouble even determining what is data and what is code, though a semi-intelligent disassembler will do a tree-walk of code areas from known entry points to find the areas of a program that it knows for sure are code. However, indirect jumps and other tricks can mean that some of the code areas don't get discovered. Data areas are generally opaque, but for a good disassembly we need to know at least when a data item is a two-byte value and whether it represents a code or data address.
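To make the tree-walk concrete, here is a minimal sketch of the idea in Python. It is not taken from any particular disassembler: the decode callback and the toy program table are made up for illustration, and a real 6809 decoder would have to supply the instruction length, the statically known branch targets and whether execution can fall through.

```python
def find_code(decode, entry_points):
    """Walk code from known entry points.

    decode(addr) is a hypothetical callback returning
    (length, static_targets, falls_through) for the instruction at addr.
    Returns the set of opcode addresses that are definitely code.
    """
    code = set()
    work = list(entry_points)
    while work:
        addr = work.pop()
        if addr in code:
            continue  # already visited
        length, targets, falls_through = decode(addr)
        code.add(addr)
        work.extend(targets)  # bra/jmp/jsr destinations known statically
        if falls_through:
            work.append(addr + length)
    return code


# Toy program: address -> (length, static targets, falls through?)
toy = {
    0: (2, [], True),     # ordinary instruction
    2: (3, [10], False),  # jmp 10
    5: (1, [], False),    # only ever reached through an indirect jump
    10: (1, [], False),   # rts
}

print(sorted(find_code(toy.__getitem__, [0])))
```

The instruction at address 5 is never found, which is exactly the gap that indirect jumps leave in a purely static walk.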

Some of these issues are important if you're building a static binary translator (or decompiler); others are more important for a disassembly, especially if you need to modify the disassembly and re-assemble the program with changes. Inserting a NOP at the start of the code or an extra byte of data at the start of the data area can completely break a re-assembly if done naively.

On the 6809 it's not actually possible to statically make a complete labelled symbolic disassembly because of the DP register: a direct-page operand supplies only the low byte of the address, so the same operand can refer to different areas depending on the value of DP. In a binary translator that's not so much of an issue, as you're just accessing a numeric offset, but in a disassembler you want the name of the location so that if you move the data around and reassemble, everything still works.
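A small Python illustration of the DP problem. The function name is my own; the two DP values are the ones the Vectrex BIOS normally uses ($C8 for RAM, $D0 for the VIA).

```python
def direct_address(dp: int, offset: int) -> int:
    """Effective address of a 6809 direct-page access: DP supplies the high byte."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)


# The same operand byte names a different location under each DP value,
# so a static disassembler cannot pick one label for it.
for dp in (0xC8, 0xD0):
    print(f"DP=${dp:02X}: operand <$00 refers to ${direct_address(dp, 0x00):04X}")
```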

However there is a way to get all the information that a disassembler or translator needs - something that was not possible a few years ago when computer memory was at a premium.

We can use an emulator to dynamically profile the execution.

What we do is have the emulator keep arrays of information, with an array element for each byte of the ROM (or we might as well do it for each byte of the machine's address space to keep the code simple at the expense of using a lot more RAM - of which we have gigabytes available nowadays so who cares?)
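As a rough sketch of what "an array element for each byte" could look like, assuming Python and hook names of my own invention (on_execute and on_read are not functions of any existing emulator):

```python
ADDRESS_SPACE = 0x10000  # cover the whole 64 KiB 6809 address space


class Profile:
    """One counter per address; the emulator core calls the hooks."""

    def __init__(self) -> None:
        self.exec_count = [0] * ADDRESS_SPACE  # opcode fetches at this address
        self.read_count = [0] * ADDRESS_SPACE  # data reads from this address

    def on_execute(self, pc: int) -> None:
        self.exec_count[pc & 0xFFFF] += 1

    def on_read(self, addr: int) -> None:
        self.read_count[addr & 0xFFFF] += 1
```

Indexing by the full address rather than by ROM offset wastes memory but means the hooks never need to know where the cartridge is mapped.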

This info is kept in a database for each program being profiled. It is dumped at the end of a run (or incrementally during a run, for safety in case of emulator crashes or forced exits) and the last dump is loaded the next time you run the same program, so that the data accumulates across runs. If possible the data collection should be done by all users of the emulator and shared. If done efficiently, this data collection will not affect the speed of the emulation and should allow users to run emulated games, programs etc as they normally do, so that there can be a widespread collection of data by a user community, not just a one-off by a programmer creating a disassembly.
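One possible shape for the dump/reload cycle, sketched in Python. The JSON file layout and the function names are assumptions for illustration only; the point is that flag words from different runs (or different users) can simply be OR-ed together, so facts only ever accumulate.

```python
import json
from pathlib import Path


def merge(old: dict, new: dict) -> dict:
    """Combine two runs of {address: flag bits}; bits are OR-ed, never cleared."""
    merged = dict(old)
    for addr, bits in new.items():
        merged[addr] = merged.get(addr, 0) | bits
    return merged


def load(path: Path) -> dict:
    """Load the last dump, or start empty the first time a program is run."""
    if not path.exists():
        return {}
    return {int(k): v for k, v in json.loads(path.read_text()).items()}


def save(path: Path, profile: dict) -> None:
    """Write to a temporary file first so a crash mid-dump keeps the old data."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps({str(k): v for k, v in profile.items()}))
    tmp.replace(path)
```

At the end of a run the emulator would call something like save(path, merge(load(path), this_run)).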

The information that needs to be captured on an emulation run includes at least these details: (There may be more needed)

This is an instruction that was executed. (record *all* values of DP seen when executing at this address.)

This is a parameter byte of an instruction that was executed (don't care too much which byte of parameter since easy to rediscover)

This instruction was explicitly jumped to (bra, jmp, jsr etc)

This instruction was indirectly jumped to (jmp (blah), jmp blah,x)

This instruction was indirectly jumped to (rts) (always needed for binary translator, needed by disassembler if return address was modified)

This instruction was indirectly jumped to (rti) (for modified return address, not for return from real interrupt)

This address was loaded from directly as a single byte

This address was loaded from directly as a double byte

This address was loaded from indirectly as a single byte

This address was loaded from indirectly as a double byte

This address was used as the base of a single byte indexed fetch - save lowest and highest offsets, useful in reconstructing data tables

This address was used as the base of a double byte indexed fetch - save lowest and highest offset

This address and the following byte contained the values of an indirect jump target address

This address and the following byte contained the values of an indirect data pointer address
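The list above maps naturally onto one bit per fact, plus a couple of side tables for the things that are not a yes/no answer. A sketch in Python, with all names invented for illustration:

```python
from enum import IntFlag, auto


class Seen(IntFlag):
    OPCODE = auto()              # an instruction was executed here
    OPERAND = auto()             # parameter byte of an executed instruction
    JUMP_DIRECT = auto()         # target of bra, jmp, jsr etc
    JUMP_INDIRECT = auto()       # target of an indirect or indexed jump
    JUMP_RTS = auto()            # reached by rts
    JUMP_RTI = auto()            # reached by rti with a modified return address
    LOAD_DIRECT_8 = auto()
    LOAD_DIRECT_16 = auto()
    LOAD_INDIRECT_8 = auto()
    LOAD_INDIRECT_16 = auto()
    INDEX_BASE_8 = auto()        # base of a single byte indexed fetch
    INDEX_BASE_16 = auto()       # base of a double byte indexed fetch
    HOLDS_CODE_POINTER = auto()  # this byte and the next held a jump target
    HOLDS_DATA_POINTER = auto()  # this byte and the next held a data pointer


flags = [Seen(0)] * 0x10000
dp_seen = {}      # opcode address -> set of every DP value seen there
index_range = {}  # base address -> (lowest offset, highest offset)


def mark(addr: int, what: Seen) -> None:
    flags[addr & 0xFFFF] |= what


def record_instruction(pc: int, length: int, dp: int) -> None:
    mark(pc, Seen.OPCODE)
    for i in range(1, length):
        mark(pc + i, Seen.OPERAND)
    dp_seen.setdefault(pc, set()).add(dp)


def record_indexed_fetch(base: int, offset: int, width: int) -> None:
    mark(base, Seen.INDEX_BASE_8 if width == 1 else Seen.INDEX_BASE_16)
    lo, hi = index_range.get(base, (offset, offset))
    index_range[base] = (min(lo, offset), max(hi, offset))
```

The lowest and highest offsets recorded per base address give a lower bound on the extent of a data table, which is what the back end needs to emit it as one labelled block.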

Profiling information such as the number of times a code or data address was fetched could be useful too.

For each instruction address (opcode byte), record the values of each register (DP is handled differently, and the CC bits are recorded separately):

- initialise the cache to "FEEDBEEF"

- when executed, if the cache is "FEEDBEEF", set it to the value of the register

- if the cache is not "FEEDBEEF" and not the same as the value, set it to "DEADBEEF"

After execution, the cache will show whether a register is always constant at that point in the code (we might as well use 4 byte integers in our arrays - faster, easier to work with, and we can afford the RAM)
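The register cache rule is small enough to show in full. This is a Python sketch with names of my own choosing; it relies on the two marker values being 32-bit numbers that can never equal a real 8 or 16 bit register value.

```python
UNSEEN = 0xFEEDBEEF  # this address has not been executed yet
VARIES = 0xDEADBEEF  # the register held different values on different visits

REGISTERS = ("A", "B", "X", "Y", "U", "S")  # DP and CC are tracked elsewhere
cache = {reg: [UNSEEN] * 0x10000 for reg in REGISTERS}


def update(cached: int, value: int) -> int:
    """Apply the rule: first value wins, any later difference means VARIES."""
    if cached == UNSEEN:
        return value
    if cached != value:
        return VARIES  # also keeps VARIES sticky, since no register equals it
    return cached


def record(pc: int, regs: dict) -> None:
    """Call once per executed instruction with the register values at entry."""
    for name, value in regs.items():
        cache[name][pc] = update(cache[name][pc], value)
```

After a run, any entry that is neither UNSEEN nor VARIES is a register that was constant at that instruction, which a disassembler can print as a comment.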

The register contents info isn't critical to disassembly and not strictly needed in a binary translation (although it *can* be used for optimisation), but if known, it can help a lot with commenting the disassembly.

I'm not asking any of our emulator authors (Vide, VecX, Vectrexy, Mame etc) to add these facilities but if you find yourselves at some point adding some subset of these features for some need of your own, do consider the bigger picture and think about adding all of them at the same time. (I've added this to my own ever-increasing job queue but it is *way* *way* down that stack...)

I'm pretty sure that with an instrumented emulator like this and suitable back-end code, we could do really good disassemblies of all the old Vectrex code in a way that would make it reassemblable, not to mention decompiling to C etc for retargeting to other architectures.

(Of course I'd like to see something similar in general purpose emulators such as Mame to make retargeting of arcade games simpler too)

Graham

Malban
Cinematronic

Posts: 514


Post by Malban on Oct 24, 2019 1:19:49 GMT -5

Just to be the kind of "show off" I am...

Vide has some of this already.

In configuration->Debug, there are 3 Checkboxes:

Codescan in Vecxi
enableProfiler
vector information collection active

If enabled, different kinds of information are collected while running the game, which dissi can use to disassemble "better".

The data collected must be manually stored using dissi's "save cnt". In order for it to show, you also have to "reset dissi".

That is a bit cumbersome, but I never really thought it all the way thru - and never thought anyone but me would ever use it :-).

(I'm not sure if I even documented this anywhere...)

Cheers

Malban

Vide: vide.malban.de/

D-Type
Space War(ped)

Posts: 366


Post by D-Type on Oct 24, 2019 1:24:09 GMT -5

I was checking out disassemblers for some 6502 code and it seems that everyone's still using IDA; otherwise there are a couple of others out there that may have 8-bit capability.

Your method reminds me of how a basic Google Maps might work, privacy issues aside; just track where everyone goes by GPS co-ords and that becomes a suggested route when someone searches for directions.

P*h*i*l*l*i*p EEaattoonn in real life

Malban
Cinematronic

Posts: 514

Post by Malban on Oct 24, 2019 1:38:43 GMT -5

Found it :-): vectrex.malban.de/preliminary/ae7f11e6.html

Vide: vide.malban.de/

The Doctor
Guest


Post by The Doctor on Oct 6, 2020 18:07:08 GMT -5

How feasible would it be to edit a Vectrex game to change the Z values on particular vectors? I'm thinking of a colour mod, where the Z value sets the colour. It would be great if original games could be modified to set the desired colours.

gtoal
Armor Attacked

Posts: 186


Post by gtoal on Oct 7, 2020 13:16:13 GMT -5

Oct 6, 2020 18:07:08 GMT -5 The Doctor said:

How feasible would it be to edit a Vectrex game to change the Z values on particular vectors? I'm thinking of a colour mod, where the Z value sets the colour. It would be great if original games could be modified to set the desired colours.

It's actually been done (in hardware) by Arcade Jason, and implemented in software by the guys doing the Android-based emulator.

D-Type
Space War(ped)

Posts: 366


Post by D-Type on Oct 12, 2022 6:47:30 GMT -5

gtoal Not sure if I mentioned this elsewhere, but MAME has a debugger option to highlight the code that has been previously run.

It's not perfect because you have to then manually copy out the addresses, but, for maybe a 1 hour effort, it's well worth it for a single binary.

I did this for a 6502 game I'm reverse engineering. I fed the output into the dasmfw disassembler via a data/code "info" file and got a clean source code file out of it. (Dasmfw started life as a 6809 tool and was extended to 6502.)

MAME debugger also has an option to log the code executed. I haven't played with this much and the output file gets pretty big pretty quick, but from memory I think it put line counts or something like that in the log to make it practical. Anyway, maybe worth consideration.

Example dasmfw info file:
github.com/phillipeaton/JETPAC_VIC-20_disassembly/blob/main/nfo_code_data.nfo

P*h*i*l*l*i*p EEaattoonn in real life
