VCP200 reverse engineering

Some years ago I got a few VCP200 speech recognition chips. Even though they are very limited in what they can do, I was motivated by the simple interface to try to find a use for them. I eventually built the circuit published in the April 1991 edition of Radio Electronics (with minor modifications) to make use of the chip. After finding some soldering faults and understanding more of what the amplification and clipping sections were doing, I actually got it to work! My next plan for the three chips I have now is to draw up a design for the driver circuit using either through-hole or surface-mount components, so I end up with some slightly more robust modules than the one I hand-soldered together.

But this post is not about that. When I watched the EEVblog episode on this chip, it interested me greatly that it was not an ASIC, but probably a microcontroller with mask ROM inside. Having done a lot of work with MAME and dumping whatever I can get my hands on, this seemed like an interesting challenge. After asking around, it seemed that it may be possible to electrically dump the contents of the chip using a diagnostic mode Motorola included. Now comes the challenge of building the circuit to get those modes working and eventually figuring out if the data I want comes out somewhere undocumented.

That’s the problem: the documentation says that this chip, the MC6804J2, has four modes of operation, and none of the documentation for any of them indicates that it can be used to read out the internal program or data memory. Based on some previous reverse engineering efforts on similar Motorola parts we started by looking at Non-User Mode, but that ended up being a dead end (for now). For starters, the documentation says that ports A, B, and C are used to interface to external instructions, but on the J2 chip we use there’s only port B and half of port A available at all! In addition, the Non-User Mode interface is not described in any document I can find for this chip, but we may eventually be able to infer what the interface is based on other similar Motorola chips with more surviving documentation.

Based on the schematics for these test circuits I built something similar. What I made used the reset circuit and PA6/PA7 drive circuit from the functional test (which is self-check from above), but changed the crystal to an oscillator and made a whole bunch of the pin configuration jumper-settable. Before we move on to my design, take a look at the circuit for the ROM verify mode. This chip has a hardware CRC circuit inside and the ROMs, when properly authored, will be able to pass a CRC check using this circuit. That means that at the very least the ROM data is coming out on one or more of those pins, XORed or otherwise garbled, and that at least one pin is clocking all those bits into this circuit, which can verify that the CRC passes or fails.

As you can see, I have adopted my usual use of massive piles of 0.1″ header and jumpers to configure things. Here I have pin headers that can be connected to a logic analyzer (port B only) as well as jumpers to enable pull-ups on any available port pin. For the non-port pins: /IRQ is floating, as is EXTAL; XTAL goes to the output of my 1.000 MHz oscillator can; MDS and TIMER are both pulled up with 1K resistors. The reset circuit pins are pulled up with 4.7K resistors, as are all of port B. Port A pins 6 and 7 are pulled up with 10K resistors, and pins 4 and 5 are pulled up with 1K resistors. Now for the taking of the data.

For this we used sigrok (the PulseView GUI). I had previously built a clone logic analyzer, but really the only important part about it is that it’s compatible with this open source software stack that’s pretty easy to use. For this dump we connected to pins PB0 (D7), PB2 (D6), PB3 (D4), PB5 (D2), PB6 (D1), and PB7 (D0). The other pins didn’t seem to do anything interesting and I was having an issue with some channels of the logic analyzer (soon to be overhauled). You can see that PB7 is the master bit clock, and that makes sense because in the ROM verify CRC circuit it is clocking the latches. PB6 is not used in that circuit but seems to be a pulse happening once every 12 clock cycles: our frame clock. PB5 seems to be the address of the memory location being represented in those 12 bits of data (each address is sent twice before moving on). PB3 and PB2 look random, probably the CRC values. PB0 looks like our serial program data! Here is the explanation for the above captured data:

one frame clock every 12 bits
one cycle of data every 24 bits
we get the address twice
we get 0xff for the first data byte (not real)
we get the real data for the second data byte
also, the data is delayed by 24 bits (1 address cycle)
Example as seen in the capture pic:
1 00000000000 D1, PB6 frame clock
1 00000111010 D2, PB5 reverse is (010 1110 0000) (0x2e0 address)
1 10000000011 D7, PB0 reverse is (11 00000000 1) (0x00 data from 0x2df)
1 00000000000 (frame clock)
1 10000111010 reverse is (010 1110 0001) (0x2e1 address)
1 11111111111 reverse is (11 11111111 1) (0xff data not real)
1 00000000000 (frame clock)
1 10000111010 reverse is (010 1110 0001) (0x2e1 address)
1 11010011011 reverse is (11 01100101 1) (0x65 data from 0x2e0)
1 00000000000 (frame clock)
1 01000111010 reverse is (010 1110 0010) (0x2e2 address)
1 11111111111 reverse is (11 11111111 1) (0xff data not real)
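
As a sanity check on the listing above, here is a minimal Python sketch (mine, not the actual parsing script) that decodes one 12-bit frame written as a string in capture order. It assumes, as the grouping in the listing suggests, that the first bit of each group is not part of the value, that the remaining 11 bits arrive least significant bit first, and that a data frame reverses to two leading bits, eight data bits, and one trailing bit. The function names are made up for illustration.

```python
def _payload_msb_first(frame):
    """Drop the first captured bit and reverse the remaining 11 bits."""
    if len(frame) != 12 or set(frame) - {"0", "1"}:
        raise ValueError("expected a string of exactly 12 '0'/'1' characters")
    return frame[1:][::-1]


def decode_address(frame):
    """Address frame (PB5): all 11 reversed bits form the address."""
    return int(_payload_msb_first(frame), 2)


def decode_data(frame):
    """Data frame (PB0): reversed layout is 2 bits, 8 data bits, 1 bit."""
    return int(_payload_msb_first(frame)[2:10], 2)


# Frames copied from the listing, with the space removed.
print(hex(decode_address("100000111010")))  # 0x2e0
print(hex(decode_address("110000111010")))  # 0x2e1
print(hex(decode_data("111010011011")))     # 0x65
print(hex(decode_data("111111111111")))     # 0xff
```

The meaning of the three non-data bits in a data frame is not known to me; the sketch simply ignores them.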

The Python code that does all this parsing is based around the raw binary logic data that PulseView can export. This keeps the file sizes nice and small while still being pretty easy to parse. That Python code was able to generate a binary file representing the entirety of the program memory on the chip.
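
To give a feel for what such a parser involves, here is a rough sketch under several assumptions; it is not the actual script. It assumes the export holds one byte per sample with bit n carrying channel Dn, that the signals are valid on the rising edge of the bit clock (D0), and that the frame pulse (D1) coincides with the first bit of each 12-bit frame. The channel constants and function names are hypothetical.

```python
CLK, FRAME, ADDR, DATA = 0, 1, 2, 7  # D0=PB7, D1=PB6, D2=PB5, D7=PB0


def sample_on_clock(raw):
    """Yield (frame, addr, data) bits at each rising edge of the bit clock."""
    prev = 0
    for s in raw:
        clk = (s >> CLK) & 1
        if clk and not prev:
            yield (s >> FRAME) & 1, (s >> ADDR) & 1, (s >> DATA) & 1
        prev = clk


def split_frames(samples):
    """Group clocked bits into 12-bit frames, one per frame pulse."""
    frames, cur = [], None
    for f, a, d in samples:
        if f:  # frame pulse marks the first bit of a new frame
            cur = ([], [])
            frames.append(cur)
        if cur is not None and len(cur[0]) < 12:
            cur[0].append(a)
            cur[1].append(d)
    return [fr for fr in frames if len(fr[0]) == 12]


def lsb_first(bits):
    """Combine a list of bits, least significant bit first."""
    return sum(b << i for i, b in enumerate(bits))


def build_rom(frames):
    """Map address -> byte, using only the second frame of each address."""
    rom, prev_addr = {}, None
    for a_bits, d_bits in frames:
        addr = lsb_first(a_bits[1:])                # skip first bit
        data = (lsb_first(d_bits[1:]) >> 1) & 0xFF  # middle 8 of 11 bits
        if addr == prev_addr:                       # repeated address: real data,
            rom[addr - 1] = data                    # delayed by one address
        prev_addr = addr
    return rom
```

Something like `build_rom(split_frames(sample_on_clock(open("capture.bin", "rb").read())))` would then give a dictionary that can be written out as a flat binary; `capture.bin` is a placeholder file name. If the real capture needs falling-edge sampling or a different channel order, only the constants and the edge test should need to change.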

Now that we have a big block of everything in program memory, it needs to be disassembled. You can do this by hand, but that’s infinitely tedious, and since there was no disassembler for the 6804 readily available we had to write one. The first pass was my super easy method of decoding every byte. I just checked one table to see what instruction it decoded to and printed the contents of that table entry. Then I checked another table to see if the instruction was more than one byte long; if so, I grabbed another byte and printed it verbatim. Then I checked that table again to see if it was three bytes long; if so, I grabbed the last byte and printed it, otherwise I moved on. There’s no re-checking or anything, so if you get off by one byte somewhere you may never get lined up again, and you’re reading data as opcodes and everything is all screwy. Frank had an improvement to my disassembler that made it a little smarter in handling each opcode and formatting them in a way that better matches standard disassembly syntax.
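
The two-table approach can be sketched in a few lines of Python. This is only an illustration of the technique: the opcode values, mnemonics, and lengths below are placeholders I made up, not the real MC6804 instruction encoding, and the function name is hypothetical.

```python
# Placeholder tables: NOT the real 6804 opcode map.
MNEMONIC = {0x01: "OP_A", 0x02: "OP_B", 0x03: "OP_C"}
LENGTH = {0x01: 1, 0x02: 2, 0x03: 3}


def disassemble(rom, base=0):
    """Linear sweep: look up name and length, print operands verbatim."""
    lines = []
    pc = 0
    while pc < len(rom):
        op = rom[pc]
        if op in MNEMONIC:
            name, length = MNEMONIC[op], LENGTH[op]
        else:
            name, length = "DB %02X" % op, 1    # unknown byte, emit as data
        operands = rom[pc + 1:pc + length]      # may be short at end of ROM
        text = " ".join([name] + ["%02X" % b for b in operands])
        lines.append("%03X: %s" % (base + pc, text))
        pc += length                            # no re-synchronisation
    return lines


for line in disassemble(bytes([0x01, 0x02, 0xAA, 0x03, 0x10, 0x20]), base=0x2E0):
    print(line)
```

Because the sweep never re-checks alignment, starting it one byte late (for example `disassemble(rom[1:])`) will happily decode operand bytes as opcodes, which is exactly the failure mode described above.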

Inside the ROM we see key areas described in the datasheet memory map. The User Restart Vector and the User /IRQ Vector are the same because in the VCP200 datasheet the /IRQ line of the microcontroller is tied to Vcc. The self-test ROM Reset Vector goes to AF6, which is interesting since the region starts at AE0. It turns out that AE0 is used for a subroutine that gets called later. The self-test /IRQ vector goes to some sort of subroutine in the middle of the self-test region, which makes sense: whenever that line is triggered, the program can verify it happened based on where it is in the code. The bottom of the self-test ROM actually contains some ASCII data saying “MC6804J2 – ATX”, which must mean that this is the self-test code specific to that processor variant, confirming what we suspected (except now we know it’s probably the HMOS variant). In the preliminary reverse engineering of the program memory it looks like the VCP200 code looks at three bytes of the data ROM that we don’t have. It also doesn’t look like the self-test code looks at it at all. I am unclear what part of this processor checks the data ROM for integrity, if any.

It’s possible the data ROM is somewhere in the data we’ve already captured. It’s also possible it’s accessible in Non-User Mode, or through some part of the functional test program that we’ve started reverse engineering. If we emulate the processor, then send it known streams of pulses representing speech data and compare what the real chip does against the emulated chip, we may even be able to reverse engineer those three bytes it reads out of the data ROM. It should be a tedious but simple task of designing test sequences that cause the program to read those data bytes and analyzing what it does afterwards to determine what those bytes were. That is beyond my current level of ambition though.

This entry was posted on November 26, 2020 at 4:30 am and is filed under Uncategorized.

Evan's Techie-Blog