Tuesday, 6 September 2016

What are .hex files?

You must have come across .hex files by now. Every time you compile a program on the micro:bit website you end up with one. You copy it to your device and the program runs. Every time we post a project on this blog we include a .hex file. Every time you hit the Flash button in Mu it silently and secretly creates a .hex file and automatically copies it to your micro:bit. But what are .hex files, how do they work and why should we care about them?

The Intel HEX Format

The micro:bit's .hex format is actually not all that special. In fact, the format was created by Intel way back in the 1970s (the specification document that usually gets cited is a 1988 revision). Intel set out to define a simple format that they could use to load binary programs and data into their processors. They wanted it to be ASCII (human readable) rather than the hard-to-read binary that computers needed, so they devised a simple way to encode the binary data as ASCII characters. They also defined the format to be able to specify where in the target device's memory the data should be stored and where the device should start executing the code from. Because the format was so simple it was easy for them to write code for their processors that would load and run one of these files. The format soon became a favourite and is now supported for everything from microprocessors to EPROM programmers to.. your micro:bit.

How Does It Work?

The format is straightforward. A sample line looks like this:
:1047D0001EE0019B201C013303990193FFF71AFB94
Every character of the line (apart from the very first colon) represents one hexadecimal digit. Two hexadecimal digits make a single byte of data. Each line of the file follows the exact same format:
  • Start code. This is the : symbol at the start of the line.
  • Byte count. Two hex digits (one byte) to indicate how many bytes there will be in the data field.
  • Address. Four hex digits (two bytes) giving the 16-bit address to store the data at.
  • Record type. Two hex digits (one byte) saying what type of data is in this line.
  • Data. The actual data.
  • Checksum. A simple checksum that is used to ensure the integrity of the data.
We won't go into more detail than this here - if you want to know more then take a look at the Wikipedia entry or, if you're having trouble sleeping, dive into Intel's original specification document.
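
If you'd like to poke at a line yourself, here is a minimal sketch (assuming Python 3) that splits a record into the fields listed above and checks the checksum. The function name parse_record and the dictionary keys are our own invention for illustration. The checksum rule used is the standard one for Intel HEX: all the bytes on the line, including the checksum itself, add up to zero modulo 256.

```python
def parse_record(line):
    """Split one Intel HEX line into its fields and check the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])        # two hex digits make one byte
    count = raw[0]                       # number of bytes in the data field
    address = (raw[1] << 8) | raw[2]     # 16-bit address, high byte first
    record_type = raw[3]
    data = raw[4:4 + count]
    checksum = raw[4 + count]
    # Every byte on the line, checksum included, sums to 0 modulo 256
    checksum_ok = sum(raw[:5 + count]) % 256 == 0
    return {
        "count": count,
        "address": address,
        "type": record_type,
        "data": data,
        "checksum": checksum,
        "checksum_ok": checksum_ok,
    }


record = parse_record(":1047D0001EE0019B201C013303990193FFF71AFB94")
print(record)
```

For the sample line this reports 16 data bytes at address 0x47D0, record type 0 (data) and a checksum that passes.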

What Does A MicroPython .hex File Look Like?

Let's write a simple Python script and compile it into a hex file. Here's the script:

from microbit import *

display.show('Hello World')

And here's the resulting .hex file.
simple_script_1.hex

The raw .hex file is not very exciting or readable, so here's an annotated version which shows what each line means:
simple_script_1.dump

Although there are lots of possible record types in the HEX format you'll only see four types used here:
  • ELA. Short for Extended Linear Address. This tells the micro:bit where in memory to store the data by setting the upper 16 bits of the address for the data lines that follow.
  • DAT. This is the actual data. You can see it in hexadecimal format as well as ASCII.
  • SLA. Start Linear Address. Tells the micro:bit where to start executing the code from.
  • EOF. A marker for end-of-file, meaning that there is no more data to follow.
Take a look through the file and you'll see lots of unreadable junk in there (this is generally the actual machine code that the processor runs) as well as some readable text. The majority of the file is the Python interpreter. If you look closely you'll see all the function and module names, help text and error messages in there. But then.. right at the very bottom of the file you'll see what looks like the text for our script, and that's exactly what it is!
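
To give a feel for how an annotated dump like this can be produced, here is a rough sketch, assuming Python 3. It is a guess at the approach rather than the tool we actually used, and the function name annotate is made up for the example. The record type numbers (00 data, 01 end-of-file, 04 extended linear address, 05 start linear address) come from the Intel HEX format itself. Checksums are ignored to keep it short.

```python
def annotate(hex_lines):
    """Turn raw Intel HEX lines into ELA/DAT/SLA/EOF style dump lines."""
    upper = 0  # upper 16 bits of the address, set by the latest ELA record
    out = []
    for line in hex_lines:
        line = line.strip()
        if not line.startswith(":"):
            continue
        raw = bytes.fromhex(line[1:])
        count = raw[0]
        offset = (raw[1] << 8) | raw[2]
        record_type = raw[3]
        data = raw[4:4 + count]
        if record_type == 0x04:    # Extended Linear Address
            upper = int.from_bytes(data, "big") << 16
            out.append("ELA: 0x%08x" % upper)
        elif record_type == 0x00:  # Data: show hex plus printable ASCII
            text = "".join(chr(b) if 32 <= b < 127 else "." for b in data)
            out.append("DAT: 0x%08x %s %s" % (upper + offset, data.hex(), text))
        elif record_type == 0x05:  # Start Linear Address
            out.append("SLA: 0x%08x" % int.from_bytes(data, "big"))
        elif record_type == 0x01:  # End of file
            out.append("EOF:")
    return out


sample = [
    ":020000040003F7",
    ":10E000004D50340066726F6D206D6963726F626986",
    ":00000001FF",
]
for dump_line in annotate(sample):
    print(dump_line)
```

Notice how the full address of a data line is the ELA value plus the 16-bit address on the line itself: 0x00030000 plus 0xE000 gives 0x0003e000.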

Now, what happens if we write a different script?

# A slightly longer script this time..

from microbit import *

display.show('Goodbye World')

simple_script_2.hex

simple_script_2.dump

Notice anything interesting in the dump file? Well, almost all of the file is exactly the same. In fact there are only a few lines of difference and those few lines are the ones that we already identified as containing the data for our script.

Here's the end of the first script:
ELA: 0x00030000
DAT: 0x0003e000 4d50340066726f6d206d6963726f6269 MP4.from microbi
DAT: 0x0003e010 7420696d706f7274202a0a0a64697370 t import *..disp
DAT: 0x0003e020 6c61792e73686f77282748656c6c6f20 lay.show('Hello 
DAT: 0x0003e030 576f726c6427290a0000000000000000 World').........
SLA: 0x00013a85
EOF:
And here's the second:
ELA: 0x00030000
DAT: 0x0003e000 4d505e0023204120736c696768746c79 MP^.# A slightly
DAT: 0x0003e010 206c6f6e676572207363726970742074  longer script t
DAT: 0x0003e020 6869732074696d652e2e0a0a66726f6d his time....from
DAT: 0x0003e030 206d6963726f62697420696d706f7274  microbit import
DAT: 0x0003e040 202a0a0a646973706c61792e73686f77  *..display.show
DAT: 0x0003e050 2827476f6f6462796520576f726c6427 ('Goodbye World'
DAT: 0x0003e060 290a0000000000000000000000000000 )...............
SLA: 0x00013a85
EOF:

From this we can infer that the .hex for a micro:bit MicroPython script looks roughly like this:
  1. Code for the MicroPython interpreter
  2. Instruction for where to store the script
  3. The actual script data
  4. Instruction for where to start running the code from
  5. An end-of-file record
..and the only part that differs is part 3. What would happen if we copied parts 1, 2, 4 and 5 from a healthy donor .hex file and injected our own part 3? First we need to understand part 3 a little more.

How Does Our Script Become Hex?

Take a look at the part of the .hex file that contains our script:
DAT: 0x0003e000 4d50340066726f6d206d6963726f6269 MP4.from microbi
DAT: 0x0003e010 7420696d706f7274202a0a0a64697370 t import *..disp
DAT: 0x0003e020 6c61792e73686f77282748656c6c6f20 lay.show('Hello 
DAT: 0x0003e030 576f726c6427290a0000000000000000 World').........
You can see that ahead of the script text are four mysterious-looking bytes. What are they? Actually they're not all that scary. The first two bytes are the signature "MP" which, presumably, means MicroPython. The next two bytes turn out to be the length of the script that follows, stored low byte first: 34 00 is 0x0034, or 52 bytes, which is exactly the length of our script. These four bytes are checked by MicroPython when it compiles the script on the micro:bit and they must be exactly right. At the end of the script section you can also see that the data is padded with zeros. This probably isn't necessary - it's likely that only the first zero is actually needed.

Now we know that to make our own part 3 we only need to prefix the script with "MP" and the two length bytes and zero-terminate the script text.
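
The recipe above can be sketched in a few lines of Python 3. The function name build_script_block and the pad_to parameter are our own choices for illustration. Padding the block out to a multiple of 16 bytes is only there to match what we saw in the dump. As noted, it may well not be required.

```python
import struct


def build_script_block(script_text, pad_to=16):
    """Wrap a script as: 'MP' + 2-byte length + script text + zero padding."""
    script = script_text.encode("utf-8")
    # "<H" packs the length as an unsigned 16-bit value, low byte first
    header = b"MP" + struct.pack("<H", len(script))
    block = header + script + b"\x00"   # zero terminator
    remainder = len(block) % pad_to
    if remainder:
        block += b"\x00" * (pad_to - remainder)
    return block


script = "from microbit import *\n\ndisplay.show('Hello World')\n"
block = build_script_block(script)
for i in range(0, len(block), 16):
    print(block[i:i + 16].hex())
```

Running this on our first script prints the same four rows of hex that appear in the DAT lines above.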

So, Can We Make .hex Files Ourselves?

Now that we understand all parts of the .hex file it's pretty easy for us to create them ourselves. Copying parts from a donor file is straightforward but it still might seem a little daunting to actually synthesise our own part 3 - we will need to fill in the record types, calculate checksums etc. for each line. But don't worry, because we've written a simple Python script to automate the process.
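
To show that synthesising the lines really isn't scary, here is a small Python 3 sketch of the record-building step. This is not make_hex.py itself, just an illustration of the idea. The names make_data_record and script_records are hypothetical, and the default start address of 0xE000 with 16 bytes per line is simply taken from the dumps earlier in this post.

```python
def make_data_record(address, data):
    """Build one Intel HEX data line (record type 00) for a 16-bit address."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, 0x00])
    body += bytes(data)
    # The checksum makes the sum of every byte on the line 0 modulo 256
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def script_records(block, start=0xE000, row=16):
    """Split a block of bytes into data lines of 'row' bytes each."""
    return [
        make_data_record(start + i, block[i:i + row])
        for i in range(0, len(block), row)
    ]


# Rebuild the sample line from earlier in the post
data = bytes.fromhex("1EE0019B201C013303990193FFF71AFB")
print(make_data_record(0x47D0, data))
```

Feeding in the 16 data bytes and address from the sample line near the top of this post reproduces that line exactly, checksum included.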

Because we'll be using Python on our Windows/Mac/Linux/Whatever computer we'll need to download and install it first. We'll be using Python 2.7, so head over to https://www.python.org/downloads/ and click on "Download Python 2.7.12". The last digit "12" may change over time as they release updates, but it's correct at the time of writing.

Once you have Python installed you'll need to download the following two files into a new directory somewhere on your computer:
make_hex.py
python_base.hex < This is the "donor" file containing the MicroPython code

Then create a Python script file in the same directory (call it "test.py") and paste the following code in it:

from microbit import *

display.show('Hooray it works!')

Now use the make_hex tool to make the .hex file. From the command line type:
make_hex -i python_base.hex -s test.py -o test.hex
And, finally, copy the "test.hex" file onto your micro:bit. From the command line you can type:
copy test.hex g:
(substitute "g:" with wherever your micro:bit is located)

You should see the yellow light flash for a short time and then the script should run and scroll the text message across the display.

Wrapping Up

In this post we learned what the .hex format is, where it came from and a little bit about how it works internally. We also learned how MicroPython uses .hex files. Armed with this knowledge we figured out how to create our own .hex files from raw Python scripts.

Use this knowledge how you want. From now on we will probably be editing our code in a standard text editor and then creating and copying the .hex file to our micro:bit using our new tool. Up until now we've been using Mu. We do like Mu, and its editor is pretty nice, but it's not a patch on other, more fully featured code editors.
