Why Learn Assembly

Learning assembly language programming helps you:

Program Efficiency and Speed

Assembly language's advantage in producing fast, compact code is well documented. Assembly code is well suited to time-critical tasks that must be completed in real time. It is equally well suited to systems where application code must be compact, such as portable computers, phones, appliances, and the onboard systems of aircraft and spacecraft.

Access to System Hardware

Time-critical applications often require direct control over system hardware, which programmers are insulated from when using traditional languages like BASIC, COBOL, FORTRAN, Go, Java, and Python. Example applications include operating systems, assemblers, compilers, linkers, device drivers, and network interfaces. Assembly language programming is the only way to go when low-level access is required.

Language Limitations

Sometimes programmers find that their high-level programming language has serious limitations that prevent them from exploiting certain kinds of microprocessor capabilities. Examples include text and bit manipulation, MMX technologies, streaming SIMD extensions (SSE), Advanced Vector Extensions (AVX), and multi-core concurrent operations. Typically, only assembly language gives you access to these machine-level capabilities.

Programming Skills

Assembly language is central to computer science. Learning assembly language has both practical and educational purposes. A strong foundation in assembly language programming can help improve your awareness of why high-level languages are structured the way they are. Certainly, knowledge of assembly language improves your understanding of the underlying computer system.

Personal Satisfaction

Although learning assembly language programming is more difficult than learning a high-level language, there is a certain personal satisfaction that comes with learning something new and complex. The insights that assembly language programming gives you make the time spent learning it well worth your while.


Ada/PM
Inline Assembly Language Programming

Exploit the x86-64 Microprocessor

This publication explores 64-bit mixed-mode programming techniques. The primary focus is on combining low-level assembly language instructions with the high-level source code of the Ada/PM programming language. Learning how to write mixed-mode programs is essential for writing a compiler and reusable component libraries. Excerpts in the following paragraphs are taken from the book, Ada/PM Inline Assembly Language Programming, First Edition.

NOTE The Ada/PM Inline Assembly Language Programming book and Web Help file are currently undergoing technical and editorial reviews and are not yet available. Thank you for your patience.

Overview

To understand how to design and develop a compiler, you need a good appreciation for the underlying architecture of the target processor and must be able to program it at the machine level. This requires familiarity with the microprocessor and its assembly language instruction set. In the case of the Intel/AMD x86 processor family, understanding its assembly mnemonics and how they operate is paramount.

Authoritative assembly language instruction details can be found in the current Intel 64 and IA-32 Architectures Software Developer's Manuals. They are offered as a set of four freely available PDF files. While the Intel manuals provide comprehensive coverage, the AMD counterpart manuals appear to be easier reading.

Pure Assembly Language Coding

Be aware that if you are new to assembly language, it will take a significant amount of hard work and persistent practice to become even moderately familiar with pure assembly coding techniques. In my opinion, one way to learn assembly language is not to try to learn every last instruction or every detailed nuance of the language, but instead to learn enough to be conversant in how data are stored in memory, passed to subroutines in registers or on the stack, and manipulated in the general-purpose and SIMD registers available in modern multi-core processors. This approach will shorten your learning curve considerably. Let's look at a pure assembly language program.

Example of a pure assembly language program:

;---------------------------------------------------------------------------------
; Pure assembly language program
; Assemble: nasm -f win64 -o file.obj file.asm
; Link: golink /console /entry main file.obj
;---------------------------------------------------------------------------------
default rel      ; Use RIP-relative addressing for memory operands
segment .data    ; Define a data memory segment to hold variables
  x dq 15        ; Declare a qword integer variable and assign it a value of 15
  
segment .bss     ; Define a section for storing variables with unassigned values
  total resq 1   ; Set aside storage space for one qword
  
segment .text    ; Define a code segment to hold instructions
  global main    ; Export main so the linker can use it as the entry point
main:
  mov rax, 1000      ; Move 1000 into rax
  add [x], rax       ; Add 1000 to the value stored at x (x becomes 1015)
  push qword [x]     ; Push the result onto the stack (operand size is required)
  pop qword [total]  ; Pop the result from the stack into total for later access
  xor eax, eax       ; Zero rax register to inform caller of success
  ret                ; Return to calling routine
;----------------------------------------------------------------------------------

As you can see, unlike high-level languages such as Ada, BASIC, C/C++, or Go, assembly language programming requires a thorough understanding of the underlying architectural aspects of the target microprocessor. It also requires knowledge of the assembler and linker software tools before any useful code can be written.
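For comparison, here is a rough C sketch of the same computation. It is not taken from the book. The function name example_main is a hypothetical stand-in for the assembly routine, and the comments map each statement to the instruction it approximates. The compiler and its runtime take care of the segment layout and stack handling that the assembly version spells out by hand.

```c
#include <assert.h>
#include <stdint.h>

int64_t x = 15;   /* initialized data: plays the role of "x dq 15" in .data */
int64_t total;    /* zero-initialized storage: plays the role of "total resq 1" in .bss */

/* Hypothetical stand-in for the assembly routine "main". */
int example_main(void)
{
    assert(sizeof x == 8);  /* a qword occupies 8 bytes */
    int64_t rax = 1000;     /* mov rax, 1000 */
    x += rax;               /* add [x], rax */
    total = x;              /* push/pop pair: copies x to total by way of the stack */
    return 0;               /* xor eax, eax followed by ret */
}
```

After one call to example_main, both x and total hold 1015.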

Inline Assembly Coding

Another helpful approach to learning assembly language is to learn how to embed assembly instructions right in the source code of a high-level language. Embedded instructions are called inline assembly code. The benefit of using inline assembly code versus linked assembly code is that inline assembly instructions reduce your overall workload. The reason for this is that many mixed-mode capable high-level languages allow assembly instructions to take advantage of functions and included libraries that contain various standardized routines. These functions and libraries are automatically included when a program is compiled. Most importantly, high-level languages set up the data and text segments, create stack and heap memory, and perform other chores to facilitate your program.

Additionally, you don't have to set up the program and pass it through a separate assembler and linker. This aspect makes inline assembly extremely convenient, powerful, and readily available. It also provides access to numerous support libraries and system functions.
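To illustrate the idea in a widely available toolchain, the sketch below embeds a single ADD instruction in C using the GCC/Clang extended asm syntax. This is a stand-in, not Ada/PM code: Ada/PM's own asm/end asm syntax differs, and the names add_qwords and self_test are hypothetical. The inline path is compiled only for x86-64 with a GCC-compatible compiler. Other targets fall back to plain C so the example still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds two 64-bit integers with one inline ADD instruction. */
int64_t add_qwords(int64_t a, int64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* AT&T operand order: "addq source, destination".
       "+r"(a) makes a a read-write register operand; "cc" marks the flags as modified. */
    __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback for compilers or targets without this inline asm syntax. */
    return a + b;
#endif
}

/* Usage: the compiler supplies the prologue, epilogue, and register allocation. */
void self_test(void)
{
    assert(add_qwords(15, 1000) == 1015);
}
```

No segments are declared and no separate assembler or linker step is invoked. The compiler chooses the registers and sets up the surrounding function.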

Assembly language instructions are used to accomplish the following tasks:

In my view, the true test of a compiler is that it should be written exclusively in its own language. This is because a compiler written completely in its own language is immeasurably easier to understand. It also helps facilitate the detection of potential shortcomings in the language's grammar and syntax. However, before you can begin building your own compiler, you must start with a genesis programming language, preferably a 64-bit language accompanied by a full set of support libraries.

Mixed-Mode Programming

Mixed-mode programming is the process of writing programs in which the source code is written in two or more programming languages. For our purposes, mixed-mode programming means combining a high-level language's source code with inline assembly instructions and accessing Win64 API and C Runtime (CRT) functions located in multiple external libraries. Although mixed-mode programming presents additional challenges for the programmer, learning mixed-mode programming techniques is worthwhile because it enables high-level languages to be extended, provides access to computer systems at the hardware level, and improves program size and performance.

There are numerous programming languages capable of incorporating or accessing assembly instructions in one form or another. Most fifth-generation programming languages come with steep learning curves, and if you don't already program in one of them, it may take many additional hours to develop an adequate understanding of how to employ it effectively. In terms of this project, I have a personal preference against using visually oriented programming languages such as Delphi, Smalltalk, Visual Studio, and VRML, and against interpreted languages such as JavaScript, Perl, Python, or VBScript. For logistical reasons, I limit programming to the Windows environment where most of my clients perform their work.

Inline Assembly Language Programming

In choosing a first-rate compiler, I looked at several currently available and affordable fifth-generation languages that natively compile to object code. Significantly, many of these languages require Linux, UNIX, BSD, or Apple/Mac operating systems. Many also include visually oriented IDEs or contain object-oriented instruction sets as found in C#, C++, Java, and VB. For now, we will steer clear of Visual Studio .NET languages since they require a completely different knowledge base.

In the end, it made sense to reform a grammar taken from existing high-level languages and to use a freely available 64-bit Windows compiler, assembler, and linker to compile and debug my programs.

The Ada/PM compiler, Netwide assembler, GoLink linker, and Notepad++ editor will be used together to design and develop programs that include assembly language instructions in the source code. Let's turn our attention to an example mixed-mode program using Ada/PM syntax.

Here is an example Ada/PM program that incorporates inline assembly code:

-------------------------------------------
-- Simple program: Greetings.ada
-- Last updated: June 20, 2019
-------------------------------------------
with io;
package Greetings is
   constant str : string = "Hello, world.";

begin
   asm
      lea rcx, [str]
      call put
      ret
   end asm;
end Greetings;
-------------------------------------------

Notice that I did not have to specify segments or perform other setup chores. All of this was done by the Ada/PM compiler. The compiler also automatically takes care of assembling and linking the source code. The point is that mixed-mode programming provides the best of both worlds when it comes to developing applications where low-level program code is necessary.

Assemblers and Linkers

Speaking of assemblers, the latest version of the Netwide Assembler (NASM) should be located and downloaded from the web. Having NASM on hand helps in the development and testing of stand-alone assembly programs. The latest copy of NASM is used to update the assembler used in Ada/PM. NASM is preferred because it is well-supported and frequently updated with the latest instructions offered by Intel and AMD on their next generation processors.

Keep in mind that when writing stand-alone assembly language programs using NASM, a linker might be needed. One linker, ALink (Anthony's linker), is written as a companion to NASM. It is free, but currently works only with 32-bit code. Another highly regarded linker is Jeremy Gordon's GoLink. It links both 32-bit and 64-bit programs. In my opinion, GoLink contains comprehensive features not found in ALink; therefore, I prefer GoLink. It is frequently updated and well-supported.

As an aside, the YASM project is a complete rewrite of the NASM assembler. It currently supports x86-64 instruction sets, accepts NASM and GAS assembler syntax, outputs to several binary formats, and generates source debugging information for many debugging tools. YASM is easily integrated into Ada/PM for assembling NASM instructions into Win64 object file formats.

In testing YASM, I found that if the main executable is renamed to nasm.exe and placed in the /bin directory of the Ada/PM compiler, it becomes a one-to-one replacement for NASM. And, if you are into GAS syntax, YASM will process GAS instructions contained inside Ada/PM's asm/end asm clauses. The downside to YASM is that it is not maintained or updated nearly as often as NASM (it was last updated in August 2014). Also, YASM seems to have an issue with reserving .bss memory. For now, I recommend sticking with NASM.

Organization of the Book

There are six parts to this book: Part I introduces the major architectural structures and data representations of Intel 64- and IA-32-based processors. Chapter 1 discusses the general-purpose registers, segment registers, flag register, instruction pointer, floating-point registers, XMM, YMM, and ZMM registers, and memory. Chapter 2 covers how data are represented in assembly and compares NASM data types with Ada/PM data types.

Part II covers inline assembly language organization. Chapter 3 introduces assembly statements including directives, labels, mnemonics, operands, and comments. Chapter 4 discusses the code and data segments associated with assembly code blocks. Chapters 5 through 9 explain the stack segment, stack operations, variable access, procedure calls, C runtime library functions, and macros.

Part III includes Chapters 10 through 15. These chapters provide examples of the most commonly used 64-bit assembly instructions. Coverage includes data transfer, integer math, branching and comparison, and the logic, string, bit, flag, and array instruction sets.

Part IV introduces Intel 64 and IA-32 microprocessor performance technologies. Chapter 16 focuses on the CPUID instruction, which helps determine which performance features are available on the CPU. Chapter 17 covers FPU operations. Chapters 18 through 20 provide an overview of the MMX, SSE, AVX, and AVX2 technologies.

Part V provides an overview of important advanced topics. Chapter 21 discusses composite data structures including arrays, records, stacks, queues, lists, dictionaries, and binary trees. Chapter 22 covers control structures including FOR-NEXT, WHILE-LOOP, LOOP-UNTIL, IF-ELSE, and CASE constructs.

The Appendices make up the final part of the publication. Appendix A is a reference to NASM reserved words, expressions, identifiers, operators, and macro directives. Appendix B is a listing of selected resources used in this book as well as appropriate web sites. Appendix C covers information on key C runtime library functions. Appendix D contains supporting reference tables. Appendix E contains tables and flags for key x86-64 instructions used in the book. Appendix F contains tables for FPU instructions. Appendix G contains tables for the MMX instruction set. Appendix H contains tables of the SSE instruction set. Appendix I contains tables of the AVX instruction set. Appendix J includes macro library code examples. Appendix K provides answers to chapter review exercises.

NOTICE Chapters and topics related to the Ada/PM Inline Assembly Language Programming manual are subject to change as the project matures and advanced technologies are incorporated.

Copyright © PMzone Software. All rights reserved. Terms of Use