Assembly language facts for kids

An assembly language is a special programming language that lets you tell a computer exactly what to do. Think of it as a very direct way to talk to the computer's brain. It's very similar to the computer's own secret language, called machine code. The main difference is that assembly language uses short words instead of long numbers.

Computers can't understand assembly language directly. But don't worry! A special program called an assembler quickly changes the assembly language words into the numbers a computer understands. These numbers are the machine code.

Programs written in assembly language are made of many tiny steps called instructions. Each instruction tells the computer to do one small task. The part of the computer that follows these instructions is the processor.

Assembly language is known as a low-level language. This means it only handles the very basic tasks a computer can do. To make the computer do something complex, you have to break it down into many, many simple steps. For example, to show a sentence on the screen, an assembly program must tell the computer every tiny action needed. This makes assembly programs very long and hard for people to read.

On the other hand, a high-level programming language is much easier. It might have one simple command, like PRINT "Hello, world!". This single command tells the computer to do all the small steps for you.

How Assembly Language Was Developed

When the first computers were built, computer scientists had to program them using only machine code. This was a series of numbers that told the computer what to do. Writing programs in machine code was very difficult and took a lot of time.

Because of these difficulties, assembly language was created. Programs written in assembly language were much easier for humans to read and write. However, assembly language was still much harder to use than the high-level programming languages we have today. High-level languages try to be more like human languages.

Programming with Machine Code

To program in machine code, a programmer needed to know what each instruction looked like. These instructions were in binary (using only 0s and 1s) or hexadecimal (a number system using 0-9 and A-F). While computers understand this quickly, it's very hard for people.
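
To see how the same number looks in these number systems, here is a small sketch in Python (a high-level language chosen just for illustration, not something from the original article). The names `binary_text` and `hex_text` are made up for this example.

```python
# The same number written three ways.
number = 42

binary_text = format(number, "08b")  # 8 binary digits (only 0s and 1s)
hex_text = format(number, "02X")     # 2 hexadecimal digits (0-9 and A-F)

print(number, binary_text, hex_text)  # prints: 42 00101010 2A
```

The hexadecimal form is much shorter than the binary form, which is why programmers often prefer it when reading machine code.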

Each instruction could look like a bunch of numbers. If a programmer made a mistake, they would only find out when the computer did something wrong. Fixing the mistake was hard because machine code is almost impossible for people to understand just by looking at it.

Here is an example of what machine code looks like:

05 2A 00

This hexadecimal code tells a 16-bit x86 processor to add the number 42 (written as 2A in hexadecimal) to a storage spot inside the processor called the "accumulator." It's very hard for a person to read and understand, even if they know machine code.

Using Assembly Language Instead

With assembly language, each instruction is written as a short word. This word is called a mnemonic. It's followed by other information, like numbers or other short words. The mnemonic helps programmers remember what the instruction does. They don't have to remember exact numbers.

For example, common mnemonics include add (to add data) and mov (to move data). The words and numbers after the mnemonic give more details. For instance, after add, you might specify what two things to add. After mov, you'd say what to move and where to put it.

The machine code example from before (05 2A 00) looks much simpler in assembly language:

 add ax,42

Assembly language also makes it easier to write the actual data your program uses. It supports numbers and text directly. In machine code, you'd have to turn every number (positive, negative, decimal) into binary yourself. Text would also need to be defined letter by letter as numbers.
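
Here is a rough idea of the work an assembler saves you, sketched in Python. It assumes text is stored as ASCII codes and that negative numbers are stored in 16 bits using a scheme called two's complement, which is what x86 processors use. The function name `to_two_byte_pattern` is invented for this example.

```python
def to_two_byte_pattern(value):
    """Return the 16-bit pattern used to store a whole number.

    Works for values from -32768 to 65535. Negative numbers
    wrap around, which is how two's complement stores them.
    """
    return value & 0xFFFF

# Text has to be stored letter by letter as numbers.
text_bytes = list("Hi!".encode("ascii"))
print(text_bytes)  # prints: [72, 105, 33]

# A negative number becomes a pattern of bits.
print(format(to_two_byte_pattern(-1), "016b"))  # prints: 1111111111111111
```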

Assembly language gives you a simpler way to work with machine code. It's like a helpful translator. When you use assembly, you don't need to know all the tiny details of what numbers mean to the computer. The assembler program figures that out for you.
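
To make the translating idea concrete, here is a toy "assembler" written in Python. It is only a sketch: it understands a single instruction, `add ax, number`, for a 16-bit x86 processor, and the function name `assemble_add_ax` is made up for this example. A real assembler knows hundreds of instructions.

```python
def assemble_add_ax(number):
    """Turn 'add ax, number' into its three bytes of machine code."""
    opcode = 0x05                 # the byte that means "add a number to ax"
    low = number & 0xFF           # the low byte of the number comes first
    high = (number >> 8) & 0xFF   # then the high byte
    return bytes([opcode, low, high])

machine_code = assemble_add_ax(42)
print(machine_code.hex(" ").upper())  # prints: 05 2A 00
```

The output matches the machine code example from earlier in the article.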

Assembly language lets programmers use all the features of the processor, just like with machine code. But it's much easier to use. This is why machine code is almost never used for programming anymore.

Disassembly and Debugging

When programs are finished, they are changed into machine code so the computer can run them. Sometimes, a program might have a bug (a mistake). When this happens, programmers might want to see what each part of the machine code is doing.

Disassemblers are programs that help with this. They turn the machine code back into assembly language. This makes it much easier for programmers to understand. So, disassemblers do the opposite of assemblers. Assemblers turn assembly language into machine code, and disassemblers turn machine code back into assembly language.
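
Here is a toy "disassembler" sketched in Python. Like a real one, it reads machine code and writes assembly language, but this sketch only recognizes one 16-bit x86 instruction. The function name `disassemble` is invented for this example.

```python
def disassemble(code):
    """Turn three bytes of machine code back into assembly text."""
    if len(code) == 3 and code[0] == 0x05:
        # The number is stored low byte first, then high byte.
        number = code[1] + code[2] * 256
        return f"add ax,{number}"
    return "unknown"

print(disassemble(bytes.fromhex("05 2A 00")))  # prints: add ax,42
```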

How Computers Are Organized

To understand how assembly language works, it helps to know how computers are set up. At a basic level, computers have three main parts:

  • Main memory (also called RAM): This holds data and instructions.
  • A processor: This part processes data by following instructions.
  • Input and output (I/O): This allows the computer to talk to the outside world. It also stores data outside of main memory so it can be used later.

Main Memory

In most computers, memory is divided into small chunks called bytes. Each byte holds 8 bits (a bit is a single 0 or 1). Every byte in memory also has an address. This address is a number that tells the computer exactly where that byte is located. The first byte has address 0, the next has address 1, and so on.

Even though an address points to one byte, processors can use several bytes in a row. This is often done to store numbers. For example, 2 or 4 bytes together can represent a larger number, like an integer. A single byte can only hold 256 different values. Using 2 bytes allows for 65,536 values, and 4 bytes allows for over 4 billion values!
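
You can check these numbers yourself. The Python sketch below counts the values that fit in 1, 2, and 4 bytes, then joins two bytes into one number. It assumes the low byte is stored first, which is how x86 processors do it. The variable names are made up for this example.

```python
# Each byte has 256 possible values, so n bytes have 256 ** n.
for size in (1, 2, 4):
    print(size, "byte(s):", 256 ** size, "values")
# prints:
# 1 byte(s): 256 values
# 2 byte(s): 65536 values
# 4 byte(s): 4294967296 values

# Two bytes in a row can act as one bigger number.
two_bytes = bytes([0x2A, 0x00])
value = int.from_bytes(two_bytes, "little")  # "little" means low byte first
print(value)  # prints: 42
```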

When a program uses a byte or several bytes to represent something (like a letter or a number), these bytes are called an object. Even though all objects are stored in the same kind of memory bytes, they are treated differently based on their 'type'. The type tells the computer how to understand the bytes. For example, is it a number, a character, or an instruction?

The idea of a 'type' is very important. It tells the computer what can and cannot be done with that object. For instance, you can't store a negative number in a space meant only for positive numbers. An assembly language program needs to keep track of which memory addresses hold which objects and how big they are.
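
The Python sketch below shows why the type matters: the very same bytes can mean different things. It assumes 16-bit numbers stored low byte first and letters stored as ASCII codes. The variable names are invented for this example.

```python
raw = bytes([0xFF, 0xFF])  # two bytes with every bit set to 1

# Read as a number that can only be positive:
as_unsigned = int.from_bytes(raw, "little", signed=False)

# Read as a number that can be negative:
as_signed = int.from_bytes(raw, "little", signed=True)

print(as_unsigned, as_signed)  # prints: 65535 -1

# One byte read as a character instead of a number:
letter = bytes([0x41]).decode("ascii")
print(letter)  # prints: A
```

The bytes never change. Only the way the program chooses to read them changes.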

The Processor

The processor runs (executes) the instructions. These instructions are stored as machine code in the main memory. Besides memory, most processors also have a few small, very fast storage areas called registers. These registers hold objects that the processor is currently working with.

Processors usually run three main types of instructions:

Instructions that read or write memory

These instructions move data between memory and the processor's registers. For example, this instruction reads a 2-byte object from memory address 4096 (written as `1000h` in hexadecimal) and puts it into a 16-bit register named 'ax':

        mov ax, [1000h]

The square brackets around `1000h` mean "go to this address in memory and get the data." This is called indirection.

Without the square brackets, the instruction puts the actual number directly into the register:

        mov bx, 20

Here, the number 20 itself is loaded into the 'bx' register.

If the parts of the instruction are swapped, it writes data from a register to memory:

        mov [1000h], ax

This instruction takes the value from the 'ax' register and stores it at memory address `1000h`.
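
To picture what these `mov` instructions do, here is a pretend computer sketched in Python. The memory size, the starting value 1234, and the helper names `load_word` and `store_word` are all made up for this example. It assumes 2-byte numbers are stored low byte first, as on x86.

```python
# A pretend computer: 8192 bytes of memory and two 16-bit registers.
memory = bytearray(8192)
registers = {"ax": 0, "bx": 0}

def load_word(address):
    """Read a 2-byte number from memory, low byte first."""
    return memory[address] + memory[address + 1] * 256

def store_word(address, value):
    """Write a 2-byte number to memory, low byte first."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF

store_word(0x1000, 1234)             # pretend memory already holds 1234

registers["ax"] = load_word(0x1000)  # like: mov ax, [1000h]
first_value = registers["ax"]
print(first_value)                   # prints: 1234

registers["bx"] = 20                 # like: mov bx, 20

registers["ax"] = 500                # put a new value in ax
store_word(0x1000, registers["ax"])  # like: mov [1000h], ax
print(load_word(0x1000))             # prints: 500
```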

Instructions that do math or logic

These instructions perform calculations or logical operations. The machine code example we saw earlier (`05 2A 00`) would be this in assembly language:

        add ax, 42

This adds 42 to the value in 'ax' and stores the result back in 'ax'.

You can even combine memory access with math:

        add ax, [1000h]

This adds the value stored at memory address `1000h` to 'ax' and puts the answer in 'ax'.

Here's an example of a logical operation:

        or ax, bx

This instruction performs an "OR" operation on the values in registers 'ax' and 'bx', one pair of bits at a time. Each bit of the result is 1 if that bit is 1 in 'ax', in 'bx', or in both. The result is stored back in 'ax'.
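
Here is a small Python sketch of the same math and logic. The helper `add16` is made up for this example. It keeps only 16 bits of the answer, because a 16-bit register like 'ax' has no room for more.

```python
def add16(a, b):
    """Add two numbers and keep only 16 bits, like a 16-bit register."""
    return (a + b) & 0xFFFF

ax = 100
ax = add16(ax, 42)        # like: add ax, 42
print(ax)                 # prints: 142

ax = 0b0101
bx = 0b0011
ax = ax | bx              # like: or ax, bx
print(format(ax, "04b"))  # prints: 0111
```

If the answer of an addition is too big for 16 bits, it wraps around. For example, `add16(65535, 1)` gives 0.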

Instructions that decide the next step

Normally, instructions are run one after another, in the order they appear. But to do complex things, computers need to make choices. They need to run different instructions based on the data they have. This ability to make choices is called branching. Instructions that decide the next step are called branch instructions.

Let's imagine you want to calculate how much paint you need for a square. But the paint store won't sell you paint for anything smaller than a 100x100 square.

Here are the steps to figure out the paint needed:

  • Subtract 100 from the side length.
  • If the answer is less than zero, change the side length to 100.
  • Multiply the side length by itself.

This can be written in assembly code (where 'ax' holds the side length):

        mov bx, ax
        sub bx, 100
        jge continue
        mov ax, 100
continue:
        mul ax

The first two lines copy the side length to 'bx' and then subtract 100 from 'bx'.

A new idea here is a label. A label is a name (like 'continue' in this example) that marks a specific instruction. The assembler understands this label as the memory address of that instruction.

Another new idea is flags. Many instructions set special 'flags' in the processor. These flags tell the next instruction what happened. For example, after `sub bx, 100`, a flag might be set if the result was less than zero.

The `jge` instruction means "Jump if Greater than or Equal to." It's a branch instruction. If the flags show the result was greater than or equal to zero, the processor will jump to the instruction at the 'continue' label (`mul ax`). If not, it will just go to the next line (`mov ax, 100`).

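A more common way to write this is using a `cmp` (compare) instruction. It does the same subtraction as `sub` but throws the answer away, so it sets the flags without changing any values. This also means the copy into 'bx' is no longer needed: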

        cmp ax, 100
        jge continue
        mov ax, 100
continue:
        mul ax

Now, `ax` is not changed by the `cmp` instruction. The flags are still set correctly, and the jump happens in the same way.
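
For comparison, here is the same paint calculation sketched in Python, a high-level language. The function name `paint_area` is made up for this example. The `if` line does the job of the compare and the branch instruction together.

```python
def paint_area(side):
    """Return the area to buy paint for, with a minimum 100x100 square."""
    if side - 100 < 0:    # same test as: cmp ax, 100 followed by jge
        side = 100        # same as: mov ax, 100
    return side * side    # same as: mul ax

print(paint_area(50))   # prints: 10000
print(paint_area(150))  # prints: 22500
```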

Input and Output

Input and output (I/O) are how a computer talks to the outside world. This includes things like reading from a keyboard or showing text on a screen. How I/O works in assembly language can be different depending on the computer and its operating system.

Assembly language can do anything a computer is capable of. However, there are no single assembly instructions that always do I/O. This is because I/O is not built into how the processor works. Instead, it relies on other parts of the computer system.

Assembly Languages and Portability

Even though assembly language isn't run directly by the processor (machine code is), it's very closely linked to it. Different types of processors have different features, instructions, and rules. Because of this, each type of processor needs its own specific assembly language.

This means assembly language lacks portability. Something portable can be easily moved from one type of computer to another. While many high-level programming languages are portable, assembly language generally is not. If you write an assembly program for one type of computer, it usually won't work on another type without big changes.

Assembly Language vs. High-Level Languages

Today, assembly language is not often used for big software projects. Here are some reasons why:

  • It takes a lot of effort to write even a simple program in assembly.
  • Assembly language doesn't protect much against mistakes. It's easy to make errors that can crash the program.
  • It doesn't encourage good programming habits, like breaking code into smaller, easier-to-manage parts.
  • It's hard to understand what a programmer intended when looking at assembly code.

Because of these challenges, most projects use high-level languages like Pascal, C, and C++. These languages let programmers express their ideas more directly. They don't have to worry about telling the processor every single tiny step. They are called "high-level" because you can express more complex ideas with less code.

Programmers who use high-level languages use a program called a compiler. A compiler changes their high-level code into machine code, often by turning it into assembly language first. Compilers are much harder to write than assemblers.

High-level languages also don't always let programmers use every single feature of the processor. This is because high-level languages are designed to work on many different types of processors. Unlike assembly languages, high-level languages are portable.

Over many years, compilers have become very good. Now, there's usually no need to use assembly language for most projects. Compilers can often turn high-level code into assembly language just as well, or even better, than a human programmer could.

Example Programs

Here's a "Hello World" program written in x86 assembly for the DOS operating system. This program simply prints "Hello, World!" on the screen.

dosseg
.model small
.stack 100h

.data
hello_message db 'Hello, World!',0dh,0ah,'$'

.code
main  proc
      mov    ax,@data
      mov    ds,ax

      mov    ah,9
      mov    dx,offset hello_message
      int    21h

      mov    ax,4C00h
      int    21h
main  endp
end   main

This next example is a function that prints a number to the screen. It uses special calls to the computer's BIOS (Basic Input/Output System). In assembly language, anything after a semicolon (`;`) on a line is a comment. Comments are ignored by the assembler, but they are very important for humans to understand the code!

printn:
        push    bp
        mov     bp, sp
        push    ax
        push    bx
        push    cx
        push    dx
        push    si

        mov     si, 0
        mov     ax, [bp + 4]    ; number to print
        mov     cx, [bp + 6]    ; base (like base 10 for regular numbers)

gloop:  inc     si              ; count how many digits
        mov     dx, 0           ; clear dx
        div     cx              ; divide by the base
        cmp     dx, 10          ; is the remainder 10 or more? (for hex numbers)
        jge     num
        add     dx, '0'         ; convert digit to its character (e.g., 5 becomes '5')
        jmp     anum
num:    add     dx, ('A'- 10)   ; convert digit to hex character (e.g., 10 becomes 'A')
anum:   push    dx              ; put the character onto a temporary storage area (stack)
        cmp     ax, 0           ; is the number now zero?
        jne     gloop           ; if not zero, keep looping

        mov     bx, 7h          ; for the interrupt (color attribute)
tloop:  pop     ax              ; get a character from the stack
        mov     ah, 0eh         ; for the interrupt (teletype output)
        int     10h             ; write the character to the screen
        dec     si              ; decrease digit count
        jnz     tloop           ; if more digits, keep looping
        
        pop     si      
        pop     dx
        pop     cx
        pop     bx
        pop     ax
        pop     bp
        ret     4               ; return and remove the 4 bytes of arguments from the stack
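
The idea behind this function can be sketched in a few lines of Python. This is only an illustration of the same steps (divide by the base, save each digit on a stack, then take the digits off in reverse order). It does not call the BIOS, and the name `number_to_text` is made up for this example.

```python
def number_to_text(number, base):
    """Turn a whole number (0 or more) into text in the given base (2 to 36)."""
    stack = []
    while True:
        number, remainder = divmod(number, base)   # like: div cx
        if remainder >= 10:
            char = chr(remainder + ord("A") - 10)  # 10 becomes 'A'
        else:
            char = chr(remainder + ord("0"))       # 5 becomes '5'
        stack.append(char)                         # like: push dx
        if number == 0:
            break

    text = ""
    while stack:
        text += stack.pop()                        # like: pop ax
    return text

print(number_to_text(1234, 10))  # prints: 1234
print(number_to_text(255, 16))   # prints: FF
```

The digits come out of the division backwards (last digit first), which is why both versions save them on a stack and then take them off in the opposite order.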

Software

  • MenuetOS - Operating System written entirely in 64-bit assembly language