| title | Generalized assembler 'axx General Assembler' |
|---|---|
| tags | Terminal Python general assembler |
| author | fygar256 |
| slide | false |
GENERAL ASSEMBLER 'axx.py'
FreeBSD terminal
Qiita: https://qiita.com/fygar256/items/1d06fb757ac422796e31
GitHub: https://github.com/fygar256/axx_relocatable_elf_generation
A C version, named Caxx, is also available. Caxx is much faster than Paxx.
Paxx is the newest version. Because Paxx is upgraded first, updates to Caxx may be delayed.
# install
```
git clone https://github.com/fygar256/axx.git
cd axx
chmod +x axx.py
sudo cp axx.py /usr/local/bin/paxx
```
# execution(assemble)
```
paxx patternfile.axx [source.s] [-b outfile.bin] [-e expfile.tsv] [-i impfile.tsv] [-o object.o]
```
patternfile.axx --- pattern file
source.s --- assembly source
outfile.bin --- raw binary output file
expfile.tsv --- section and label information export file
impfile.tsv --- section and label information import file
object.o --- ELF relocatable object file
```
% paxx hello.axx hello.s -o hello.o # assemble
% ld hello.o -o hello # link
% ./hello
hello, world
```
To build and run Caxx, the C version:
```
cc -o caxx caxx.c -lm -O2
caxx z80.axx z80.s [ option ] # execution
```
Because Caxx is written in C, I hope someone will incorporate it into their toolchain.
The Python version is nicknamed Paxx because it is written in Python.
axx stands for 'Arbitrary eXtended X (cross) assembler'. The name also combines the A of 'ASM' with the unknown X that represents the CPU, giving 'AXX'.
As early as 1986, the original idea for axx, the name 'AXX', and a prototype written in C already existed at Tokyo Electronics Design, where I worked part-time during my university days. However, it was not until 2024, 38 years later, that I published working code like today's. The project lay dormant for 30 years. The instruction field of an axx pattern file is a metalanguage for all imperative assembly languages. It is a DSL, but it has no fixed grammar. It is a pattern language in which you build the grammar yourself by combining string literals, symbols, and expressions.
All imperative assembly languages other than EPIC, whose machine code has meta-level complexity, essentially reduce to the simple structure instruction :: error_patterns :: binary_list. Omitting error checking simplifies this further to instruction :: binary_list. For practical purposes, axx's binary_list also supports complex expression calculations, alignment, and the ; prefix modifier (which suppresses output when the value is 0), but the minimal model does not need them. An instruction is a combination of string literals, symbols that can be replaced with integer values, integer expressions, and integer factors. Floating-point numbers are also replaced with integers through their bit patterns. This allows any imperative assembly language to be processed. However, the binary generation function is not omnipotent, which limits the compatible processors: axx can process any processor whose instructions map one-to-one to machine code. Later extensions also let axx handle Itanium-style EPIC and VLIW processors.
In other words, axx is:
- an extraction of the essential commonalities of the von Neumann architecture,
- an Instruction Set Architecture (ISA) metamodel,
- a formalization using pattern matching.
axx.py is a general assembler that generalizes assembly language, and it can process almost any processor architecture. To process a specific architecture, it needs a corresponding pattern file (processor description file). You can define arbitrary instructions, but if you write a pattern file that follows the target processor's assembly language, axx can process that language, albeit with slightly different syntax. In essence, axx is about instruction grammar rules and binary generation based on them. It targets not only virtual CPUs but also "abstracted real CPUs": converting a real processor's specification into a pattern file allows direct assembly. In that sense, creating pattern files for large ISAs is better suited to AI than to human labor. It is time-consuming, but once a pattern file is written, the ISA description is complete and reusable.
axx is not a "general-purpose assembler" in the sense of being "widely applicable". It is a "general assembler" in the sense of being "common to all." binary_list has only five control constructs: assignment, the ternary operator, the ; modifier, alignment, and @@[]. Typical general assemblers define mnemonics and operands. axx pattern definitions instead take the form instruction :: error_patterns :: binary_list, which allows flexible instruction patterns. Notations such as r1 = r2 + r3 are therefore possible, so axx can serve as a general binary generator and not only as an assembler. Pattern files are Turing-incomplete, so they are not suitable for processors with extremely convoluted architectures, and processor architectures can become arbitrarily complex. The pattern language could be extended to be Turing-complete, but axx.py is Turing-incomplete and is therefore not a "universal assembler." It is kept Turing-incomplete because a Turing-complete DSL would become a "program."
axx cannot handle highly specialized processors. For example, it cannot describe the ISAs of the following non-general processors:

| Processor | Reason |
|---|---|
| Mill CPU | Belt architecture |
| ZISC | No instructions |
| Thinking Machines | Massively parallel |
axx is also independent of any specific execution platform. It ignores chr(13) at the end of lines in DOS files, and it should work on any system that runs Python.
This version contains only the core of the assembler. It therefore lacks the practical features of dedicated assemblers, such as optimization and high-level macros that convert structured or functional assembly into imperative assembly. Use a preprocessor for macros. Optimization is not supported. I think the basic functionality is there, so please make use of it, although the current version is not yet very practical.
Because the pattern file and the source file are separate, you can generate machine code for a different processor from source written for one instruction set, provided the coding effort is not a problem. You can also generate machine code for several processors from a common language. Writing the codes of several instructions in the binary_list of one pattern works like a macro, although it is not very elegant. This lets you write simple compilers.
axx reads pattern data from the pattern file given as the first argument and assembles the source file given as the second argument. Each assembly line is matched against the pattern data from the top, one pattern line at a time, and the binary_list of the first matching pattern is output. If the second argument is omitted, source code is read from the terminal (standard input).
The result is written as text to standard output. If a file is given with the -b option, a raw binary file is also written, and the -o option writes an ELF relocatable object file. The -e option writes the labels specified with .export, together with their section/segment information, to a file in TSV format.
In axx, the lines of an assembly source file and the lines read from standard input are called assembly lines.
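The matching rule above can be sketched in Python: patterns are tried from the top, and the binary_list of the first match is emitted. This is a hypothetical illustration and not axx.py's code. The regular expressions stand in for axx instruction fields. The two Z80 patterns and the helper names `assemble_line` and `_num` are assumptions made for the example.

```python
import re
from typing import List, Optional


def _num(text: str) -> int:
    """Parse a 0x-prefixed hexadecimal or plain decimal number."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


# Hypothetical pattern table: (regex standing in for an instruction field,
# function that builds the binary_list from the captured operands).
PATTERNS = [
    # Specific pattern first: LD A,(HL)
    (re.compile(r"ld\s+a\s*,\s*\(\s*hl\s*\)", re.IGNORECASE),
     lambda m: [0x7E]),
    # General pattern last: LD A,<number>
    (re.compile(r"ld\s+a\s*,\s*(0x[0-9a-f]+|[0-9]+)", re.IGNORECASE),
     lambda m: [0x3E, _num(m.group(1)) & 0xFF]),
]


def assemble_line(line: str) -> Optional[List[int]]:
    """Return the binary_list of the first pattern matching the whole line."""
    for regex, emit in PATTERNS:
        match = regex.fullmatch(line.strip())
        if match:
            return emit(match)  # first match wins; later patterns are skipped
    return None  # no pattern matched


for src in ["ld a,(hl)", "LD A,0x12", "nop"]:
    print(src, assemble_line(src))
```

Because the loop returns on the first hit, the order of PATTERNS plays the same role as the top-to-bottom order of a pattern file.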
For now, use a separate program that manages files and label (symbol) files as the linker/loader. axx is not an IDE, so use an external debugger.
Pattern files are user-defined processor description files, one for each target processor. They are a kind of metalanguage for machine code and assembly language.
If a pattern is difficult to define, you can pass only the minimum number of operands to expression evaluation and write the rest as string literals.
Pattern data in a pattern file is arranged as follows:
```
instruction :: error_patterns :: binary_list
instruction :: error_patterns :: binary_list
instruction :: error_patterns :: binary_list
:
:
```
instruction and binary_list are required. error_patterns is optional.
Separate instruction, error_patterns, and binary_list with ::.
For example (x86_64):
```
RET :: 0xc3
```
If you write /* in a pattern file, the rest of that line becomes a comment. A comment cannot currently be closed with */. It always extends to the end of the line.
Case Sensitivity and Variables
Uppercase letters in the instruction field of a pattern file are treated as character constants, and lowercase letters are treated as single-character variables. A lowercase letter receives the value of the symbol at that position on the assembly line. The prefixed forms work as follows:
- !lowercase receives the value of the integer expression at that position.
- !!lowercase receives the value of the integer factor at that position.
- !Flowercase receives the integer bit pattern of the 32-bit floating-point expression at that position.
- !Dlowercase receives the integer bit pattern of the 64-bit floating-point expression at that position.

These values are then referenced from error_patterns and binary_list. No ! prefix is needed there, and every variable is referenced the same way. Unassigned variables have the initial value 0.
| Notation in instruction | Meaning |
|---|---|
| Uppercase letter, other symbol, or escaped character | Character constant |
| Lowercase letter | Value of the symbol at that position |
| !lowercase letter | Value of an integer expression |
| !!lowercase letter | Value of an integer factor |
| !Flowercase letter | Value of a 32-bit floating-point expression |
| !Dlowercase letter | Value of a 64-bit floating-point expression |
Lowercase variables are all initialized to 0 for each line in the pattern file.
On assembly lines, uppercase and lowercase letters are treated the same, except in labels and section names.
The special variable $$ represents the current location counter.
The escape character \ can be used within the instruction.
error_patterns uses variables and comparison operators to specify the conditions under which an error occurs.
Multiple error patterns can be specified, separated by ','. For example, as follows.
a>3;4,b>7;5
In this example, if a>3, error code 4 is returned, and if b>7, error code 5 is returned.
binary_list specifies the codes to output, separated by ','. For example, 0x03,d outputs 0x03 followed by the value of d.
Take the 8048 as an example. Suppose the pattern file contains:
```
ADD A,R!n :: n>7;5 :: n|0x68
```
For the assembly line add a,rn, error code 5 (Register out of range) is returned if n>7, and add a,r1 generates the binary 0x69.
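The following sketch makes the evaluation order concrete with a hypothetical hand translation of the 8048 pattern into Python. It assumes that error_patterns are checked left to right before binary_list is evaluated, and that each binary_list element is output as one byte. `add_a_rn` and `ERROR_MESSAGES` are illustrative names and are not part of axx.

```python
ERROR_MESSAGES = {5: "Register out of range"}  # code 5 as described above


def add_a_rn(n: int) -> list:
    """Hand translation of  ADD A,R!n :: n>7;5 :: n|0x68  (illustrative only)."""
    # error_patterns: (condition, error code) pairs, checked in order
    error_patterns = [(n > 7, 5)]
    for condition, code in error_patterns:
        if condition:
            raise ValueError(f"error {code}: {ERROR_MESSAGES.get(code, 'unknown error')}")
    # binary_list: a single element, output as one byte
    return [(n | 0x68) & 0xFF]


print([hex(b) for b in add_a_rn(1)])  # add a,r1 -> ['0x69']
```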
An empty element in binary_list performs alignment. If the list starts with ',', or contains an empty element as in 0x12,,0x13, the empty position is padded up to the aligned address.
If an element of binary_list is prefixed with ;, that element is not output when its value is 0.
In a binary_list, @@[n,expr] expands expr n times. Inside it, %% is an index that starts at 0 and is not reset between groups. For this reason, @@[3,%%],@@[4,0x10+%%] expands to 0x00,0x01,0x02,0x13,0x14,0x15,0x16. To restart the index for the second group, write @@[3,%%],%0@@[4,0x10+%%], which expands to 0x00,0x01,0x02,0x10,0x11,0x12,0x13.
%0 resets the index %% to 0.
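Here is a minimal sketch of the @@[n,expr] expansion. It assumes that %% is a single counter shared by the whole binary_list, that the counter advances once per repetition, and that %0 resets it. The `Expander` class is hypothetical. It only reproduces the two expansions described above.

```python
from typing import Callable, List


class Expander:
    """Sketch of @@[n,expr] with a %% counter shared across one binary_list."""

    def __init__(self) -> None:
        self.index = 0  # current value of %%

    def reset(self) -> None:
        """%0: set %% back to 0."""
        self.index = 0

    def repeat(self, n: int, expr: Callable[[int], int]) -> List[int]:
        """@@[n,expr]: evaluate expr n times, advancing %% after each one."""
        out = []
        for _ in range(n):
            out.append(expr(self.index) & 0xFF)
            self.index += 1
        return out


# @@[3,%%],@@[4,0x10+%%]
e = Expander()
no_reset = e.repeat(3, lambda i: i) + e.repeat(4, lambda i: 0x10 + i)

# @@[3,%%],%0@@[4,0x10+%%]
e = Expander()
first = e.repeat(3, lambda i: i)
e.reset()
with_reset = first + e.repeat(4, lambda i: 0x10 + i)

print([hex(b) for b in no_reset])
print([hex(b) for b in with_reset])
```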
.setsym :: symbol :: n
This defines symbol with the value n.
A symbol is a string of letters, digits, and certain other characters.
To define symbol2 in terms of symbol1, write:
```
.setsym ::symbol1 ::1
.setsym ::symbol2 ::#symbol1
```
Here is an example of Z80 symbol definitions. Suppose you write the following in a pattern file:
```
.setsym ::B ::0
.setsym ::C ::1
.setsym ::D ::2
.setsym ::E ::3
.setsym ::H ::4
.setsym ::L ::5
.setsym ::A ::7
.setsym ::BC ::0x00
.setsym ::DE ::0x10
.setsym ::HL ::0x20
.setsym ::SP ::0x30
```
The symbols B, C, D, E, H, L, A, BC, DE, HL, and SP will be defined as 0, 1, 2, 3, 4, 5, 7, 0x00, 0x10, 0x20, and 0x30, respectively. Symbols are not case sensitive.
If the same symbol is defined more than once in a pattern file, the newer definition replaces the older one from that point on. For example:
```
.setsym ::B::0
.setsym ::C::1
ADD A,s
.setsym ::NZ::0
.setsym ::Z::1
.setsym ::NC::2
.setsym ::C ::3
RET s
```
In this case, the C in ADD A,C is 1, and the C in RET C is 3.
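You can model the redefinition rule by walking the pattern file from top to bottom and recording the symbol table that each pattern line sees. This sketch assumes that a pattern uses the definitions in effect at its own position, which matches the ADD A,C / RET C example. The parsing is deliberately naive and handles only .setsym. The variable names are illustrative.

```python
pattern_file = [
    ".setsym ::B::0",
    ".setsym ::C::1",
    "ADD A,s",
    ".setsym ::NZ::0",
    ".setsym ::Z::1",
    ".setsym ::NC::2",
    ".setsym ::C ::3",
    "RET s",
]

symbols = {}  # current symbol table; names stored uppercase (case-insensitive)
seen_by = {}  # pattern line -> copy of the symbol table at that point

for line in pattern_file:
    if line.startswith(".setsym"):
        _, name, value = (part.strip() for part in line.split("::"))
        symbols[name.upper()] = int(value, 0)  # a later definition replaces an earlier one
    else:
        seen_by[line] = dict(symbols)  # snapshot for this pattern

print(seen_by["ADD A,s"]["C"], seen_by["RET s"]["C"])  # 1 3
```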
・Example of a symbol that contains a mixture of symbols, numbers, and letters
.setsym ::$s5:: 21
To clear a symbol, use .clearsym.
.clearsym::ax
The above example undefines the symbol ax.
To clear all symbols, omit the argument.
.clearsym
You can determine the character set to use for symbols from within the pattern file.
.symbolc::<characters>
You can specify characters other than numbers and uppercase and lowercase letters in <characters>.
The default is letters + numbers + '_%$-~&|'.
Pattern files are evaluated from top to bottom, so the pattern placed first takes precedence. Place specific patterns first and general patterns last, as below:
```
LD A,(HL)
LD A,e
```
Optional items in the instruction can be enclosed in double square brackets [[ ]]. The following shows the Z80 inc (ix) instruction:
```
INC (IX[[+!d]]) :: 0xdd,0x34,d
```
Lowercase variables start at 0. As a result, inc (ix+0x12) is output as 0xdd,0x34,0x12, and inc (ix), with the optional part omitted, is output as 0xdd,0x34,0x00.
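One way to think about [[ ]] is as shorthand for several patterns: one that contains the optional text and one that does not. The hypothetical `expand_optional` below performs that expansion, assuming that groups are not nested. `encode_inc_ix` shows why an omitted +!d yields 0x00: the variable keeps its initial value 0. This illustrates the idea only and does not describe how axx.py implements optional items.

```python
import re


def expand_optional(pattern: str) -> list:
    """Expand each [[...]] group into a 'with' and a 'without' alternative.

    Sketch only: assumes [[...]] groups are not nested. Alternatives that keep
    the optional text come first.
    """
    m = re.search(r"\[\[(.*?)\]\]", pattern)
    if m is None:
        return [pattern]
    with_part = pattern[:m.start()] + m.group(1) + pattern[m.end():]
    without_part = pattern[:m.start()] + pattern[m.end():]
    return expand_optional(with_part) + expand_optional(without_part)


def encode_inc_ix(d: int = 0) -> list:
    """INC (IX[[+!d]]) :: 0xdd,0x34,d  (d keeps its initial value 0 if omitted)."""
    return [0xDD, 0x34, d & 0xFF]


print(expand_optional("INC (IX[[+!d]])"))  # ['INC (IX+!d)', 'INC (IX)']
print(encode_inc_ix(0x12), encode_inc_ix())
```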
If you add
.padding::0x12
to the pattern file, the padding bytecode will be 0x12. The default is 0x00.
If you specify
.bits::12
in the pattern file, you can handle 12-bit processors. The default is 8 bits.
Use this directive to assemble processors whose words are not 8 bits wide, such as bit-slice processors or processors whose machine-language words are not byte-sized. axx outputs in 8-bit units. A 4-bit processor's words are therefore output one per byte (the lower 4 bits). An 11-bit processor's words are output as (lower 8 bits, upper 3 bits) or as (upper 3 bits, lower 8 bits), depending on the specified byte order. Unused bits within each byte are set to 0.
When .bits is specified, addresses count words rather than bytes. x86_64, for example, is a 64-bit processor but is byte-addressed, so it does not need the .bits directive.
Specify the byte order as follows:
.bits::big::12
big arranges bytes in big-endian order, and little arranges them in little-endian order. If no byte order is specified, little is used.
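You can sketch how a word that is not 8 bits wide is split into output bytes with `int.to_bytes`. The sketch assumes that a word is masked to the .bits width and emitted as ceil(bits/8) bytes in the chosen byte order, which reproduces the 4-bit and 11-bit cases described above. `word_to_bytes` is a hypothetical helper.

```python
def word_to_bytes(word: int, bits: int, order: str = "little") -> list:
    """Split one machine word of `bits` bits into 8-bit output units.

    The word is masked to `bits` bits, so unused high bits of the last
    byte are 0. `order` is "little" or "big", as in .bits::big::12.
    """
    nbytes = (bits + 7) // 8
    word &= (1 << bits) - 1
    return list(word.to_bytes(nbytes, order))


print([hex(b) for b in word_to_bytes(0x7FF, 11)])         # lower 8 bits, upper 3 bits
print([hex(b) for b in word_to_bytes(0x7FF, 11, "big")])  # upper 3 bits, lower 8 bits
```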
This allows you to include a file.
.include "file.axx"
An expression stops evaluating when it encounters the escape character '\'. The escaped character is kept for later matching against the rest of the pattern.
LEAQ r, [ s + t * !h \+ !i ] :: 0x48,0x8d,0x04,((@h)-1)<<6|t<<3|s,i
This example processes an x86_64 assembly line like leaq rax,[rax+rbx*2+0x40].
LEAQ r,(s+t*!!h+!!i) :: 0x48,0x8d,0x04,((@h)-1)<<6|t<<3|s,i
This example would be used in a case like leaq rax,(rax+rbx*(2+2)+0x40).
.vliw::128::41::5::00
This will handle an EPIC processor with a bundle bit count of 128, instruction bit count of 41, template bit count of 5, and NOP code of 0x00 (Itanium example).
On Itanium, for example, a bundle holds three 41-bit instructions (41 * 3 = 123 bits) plus 5 template bits, for a total of 128 bits. For non-EPIC processors, specify 0 template bits.
For EPIC processors, the pattern file is written as follows:
```
/* VLIW
.setsym::R1::1
.setsym::R2::2
.setsym::R3::3
.setsym::R4::4
.vliw::128::41::5::00
EPIC::1,2::0x8|!!!!
EPIC::1::0x01
AD a,b,c:: ::0x01,0,0,a,b,c::1
LOD d,[!e]:: :: 0x00,0x01,0,d,e,e>>8::2
```
In this file, !!!! represents the stop bit. EPIC::1,2::0x8|!!!! defines a template: a bundle that contains the instructions with index codes 1 and 2 uses the template code 0x8 ORed with the stop bit.
The pattern AD a,b,c:: ::0x01,0,0,a,b,c::1 handles an add such as ad r1,r2,r3. It outputs 0x01,0,0,a,b,c without error checking and has index code 1. LOD d,[!e]:: :: 0x00,0x01,0,d,e,e>>8::2 handles a load such as lod r4,[0x1234]. It outputs 0x00,0x01,0,d, then the lower 8 bits of e, then the upper 8 bits of e, without error checking, and has index code 2. This sample is for testing only, and its bytecode differs from real Itanium code.
If the template bit count is positive, the template bits are placed at the right end of the bundle. If it is negative, they are placed at the left end. The number of template bits is the absolute value of the count. Specifying big as the byte order in the .bits directive reverses the output byte order relative to the default little.
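The bundle layout can be sketched as integer packing. This is an illustration that rests on three assumptions. First, instruction slot 0 sits next to the template when the template is on the right. Second, unused slots are filled with the NOP code from .vliw. Third, the stop bit is left out. `pack_bundle` is a hypothetical helper, not axx's implementation.

```python
def pack_bundle(instrs, template, bundle_bits=128, instr_bits=41,
                template_bits=5, nop=0x00):
    """Pack up to N instructions plus a template into one bundle integer.

    Positive template_bits: template at the right (least significant) end.
    Negative template_bits: template at the left (most significant) end.
    """
    width = abs(template_bits)
    slots = (bundle_bits - width) // instr_bits  # 3 slots for 128/41/5
    if len(instrs) > slots:
        raise ValueError("too many instructions for one bundle")
    padded = list(instrs) + [nop] * (slots - len(instrs))  # fill empty slots with NOP
    mask = (1 << instr_bits) - 1
    field = 0
    for i, ins in enumerate(padded):
        field |= (ins & mask) << (i * instr_bits)  # slot 0 is least significant
    template &= (1 << width) - 1
    if template_bits >= 0:
        return (field << width) | template  # template at the right end
    return (template << (bundle_bits - width)) | field  # template at the left end


bundle = pack_bundle([0x1, 0x2], template=0x8)
print(bundle.to_bytes(16, "little").hex())  # .bits default: little endian
```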
In EPIC pattern lines, an omitted error_patterns field must be written explicitly as :: ::.
For non-EPIC VLIW processors, the pattern file is written as follows:
```
/* VLIW
.setsym::R1::1
.setsym::R2::2
.setsym::R3::3
.setsym::R4::4
.vliw::128::32::0::0x00
AD a,b,c::0x01,a,b,c
LOD d,[!e]::0x02,d,e,e>>8
JMP !a ::0x03,a,a>>8,0
```
To bundle multiple VLIW instructions into a single bundle, use !! to connect them as shown below.
ad r1,r2,r3 !! lod r4,[0x1234]
If a pattern file's binary_list contains !!!, it represents the number of instructions concatenated with !!.
If the concatenation ends with '!!!!', it sets a stop bit.
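The following sketch shows one way to split an assembly line into bundled instructions. The function `split_vliw_line` is hypothetical. It assumes that the stop-bit marker '!!!!' appears at the end of the line and that '!!' never occurs inside an operand. The length of the returned list corresponds to what !!! evaluates to.

```python
def split_vliw_line(line: str):
    """Split a line on '!!'; report whether it ends with the '!!!!' stop marker."""
    text = line.strip()
    stop = text.endswith("!!!!")
    if stop:
        text = text[:-4]
    parts = [p.strip() for p in text.split("!!") if p.strip()]
    return parts, stop


parts, stop = split_vliw_line("ad r1,r2,r3 !! lod r4,[0x1234]")
print(parts, len(parts), stop)
```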
Labels can be defined on assembly lines as follows:
```
label1:
label2: .equ 0x10
label3: nop
```
Labels defined with .equ lose their relocation information and are treated as constants.
A label is a string of letters, digits, and some symbols. It starts with a period, a letter, or one of those symbols, and never with a digit.
To define a label with a label, do the following:
label4: .equ label1
You can determine the character set to use for labels from within the pattern file.
.labelc::<characters>
You can specify characters other than numbers and uppercase and lowercase letters in <characters>.
The default is letters + numbers + underscore + period.
ORG is specified from the assembly line as
.org 0x800
or
.org 0x800,p
.org changes the value of the location counter. If ,p is added and the current location counter is smaller than the value given to .org, the gap is padded up to that value.
If you enter .align 16 from an assembly line,
.align 16
it will align to 16 (pad with the bytecode specified by .padding up to an address that is a multiple of 16). If you omit the argument, it will align to the value specified by the previous .align or the default value.
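The padding behavior of .org ,p and .align can be sketched with a location counter and an output buffer. The helpers `org` and `align` are hypothetical. Filling .org gaps with the .padding byte is an assumption.

```python
def org(out: bytearray, lc: int, target: int, pad: bool = False,
        padding: int = 0x00) -> int:
    """.org target[,p]: move the location counter; with ,p fill a forward gap."""
    if pad and lc < target:
        out.extend(bytes([padding]) * (target - lc))
    return target


def align(out: bytearray, lc: int, boundary: int, padding: int = 0x00) -> int:
    """.align boundary: pad with the padding byte up to the next multiple."""
    target = -(-lc // boundary) * boundary  # round lc up to a multiple of boundary
    return org(out, lc, target, pad=True, padding=padding)


out = bytearray()
lc = align(out, 0x13, 16, padding=0x12)
print(hex(lc), len(out))
```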
For example, suppose we have a processor (such as ARM64) that accepts floating-point numbers as operands. VMOV.F32 S0, #3.14 loads the float (32-bit) value 3.14 into the S0 register, with its opcode 0x80. In this case, the pattern data would be:
(This is a test case. The bytecode differs from the real one.)
```
VMOV.F32 S!n,#!Fd ::0x80|n,d>>24,d>>16,d>>8,d
```
If we pass vmov.f32 s0,#3.14 to the assembly line, 3.14 becomes the single-precision bit pattern 0x4048f5c3, and the binary output is 0x80,0x40,0x48,0xf5,0xc3. Replacing !F with !D gives a double-precision floating-point number, and !Q gives a 128-bit floating-point number.
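The float-to-integer conversion performed by !F and !D amounts to reinterpreting the IEEE 754 bits, which Python's `struct` module can do. The sketch below evaluates the binary_list exactly as written, with each element masked to 8 bits, for vmov.f32 s0,#3.14. `float32_bits` and `float64_bits` are illustrative names.

```python
import struct


def float32_bits(x: float) -> int:
    """Integer holding the IEEE 754 single-precision bits of x (like !F)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float64_bits(x: float) -> int:
    """Integer holding the IEEE 754 double-precision bits of x (like !D)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# VMOV.F32 S!n,#!Fd ::0x80|n,d>>24,d>>16,d>>8,d   with   vmov.f32 s0,#3.14
n = 0
d = float32_bits(3.14)
binary = [v & 0xFF for v in (0x80 | n, d >> 24, d >> 16, d >> 8, d)]
print(hex(d), [hex(b) for b in binary])  # 0x4048f5c3 ['0x80', '0x40', '0x48', '0xf5', '0xc3']
```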
Please prefix binary numbers with '0b'.
Please prefix hexadecimal numbers with '0x'.
.ascii outputs the bytecode of a string, and .asciz outputs the bytecode of a string with 0x00 at the end.
.ascii "sample1"
.asciz "sample2"
.zero <expression> outputs the number of 0x00 bytes given by <expression>.
.zero 65536
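The byte output of these three directives can be sketched as follows, assuming ASCII strings without escape sequences. The helper names are hypothetical.

```python
def ascii_bytes(s: str) -> bytes:
    """.ascii "...": the bytes of the string."""
    return s.encode("ascii")


def asciz_bytes(s: str) -> bytes:
    """.asciz "...": the bytes of the string followed by 0x00."""
    return s.encode("ascii") + b"\x00"


def zero_bytes(count: int) -> bytes:
    """.zero n: n bytes of 0x00."""
    return bytes(count)


print(ascii_bytes("sample1"), asciz_bytes("sample2"), len(zero_bytes(65536)))
```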
The following command exports a label along with its section/segment information. Only labels specified with .export are exported.
.export label
.global passes a label to external sources:
.global label
.extern declares an external label to be loaded:
.extern label
.global and .extern are processed by the ELF relocatable object file output feature.
The following command allows you to specify a section/segment.
.section .text
or
.segment .text
Currently, .section and .segment have the same meaning.
For example:
```
.section .text
ld a,9
.section .data
.asciz "test1"
.section .text
ld b,9
.section .data
db 0x12
```
If you write this, the code is output in source order, so the sections are interleaved. Use section sort (https://qiita.com/fygar256/items/fd590cab2078a4e8b866) to group it by section:
```
.section .text
ld a,9
ld b,9
.section .data
.asciz "test1"
db 0x12
```
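A minimal section sort can be sketched as a stable grouping. Sections are emitted in the order in which they first appear, and lines keep their original order within each section. This is a hypothetical illustration, and the section sort tool linked above may differ in details. For example, this sketch writes .segment directives back out as .section, which the text treats as equivalent.

```python
def section_sort(lines):
    """Group lines by section: sections in first-seen order, lines stable."""
    groups = {}  # section name (None before any directive) -> list of lines
    current = None
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0] in (".section", ".segment"):
            current = words[1]
            groups.setdefault(current, [])
        else:
            groups.setdefault(current, []).append(line)
    result = []
    for name, body in groups.items():
        if name is not None:
            result.append(f".section {name}")
        result.extend(body)
    return result


source = [".section .text", "ld a,9", ".section .data", '.asciz "test1"',
          ".section .text", "ld b,9", ".section .data", "db 0x12"]
print("\n".join(section_sort(source)))
```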
An assembly source file can also include another file:
.include "file.s"
Comments on assembly lines start with ;.
One special factor is !!!, which represents the number of instructions connected by !!.
The special variable $$ represents the current location counter.
%% returns a running index that starts from 0 and counts how many times it has been evaluated. %0 resets it to 0.
Assembly-line expressions and pattern-data expressions are evaluated by the same functions, so they behave almost identically. However, lowercase variables cannot be referenced from assembly lines.
Operators and their precedence are based on Python. They are listed below in order of precedence, highest first:
- `(expression)`: an expression enclosed in parentheses
- `#`: returns the value of a symbol
- `*(x,y)`: the y-th byte of x, counting from the least significant byte (y >= 0)
- `-`, `~`: negation, bitwise NOT
- `'c'`: character code of c
- `@`: unary operator that returns the bit position, counted from the right, of the most significant bit of the value that follows
- `:=`: assignment
- `**`: power
- `*`, `/`, `//`: multiplication, division, integer division
- `+`, `-`: addition, subtraction
- `<<`, `>>`: left shift, right shift
- `&`: bitwise AND
- `|`: bitwise OR
- `^`: bitwise XOR
- `'`: sign extension
- `<=`, `<`, `>`, `>=`, `!=`, `==`: comparison
- `not(x)`: logical NOT
- `&&`: logical AND
- `||`: logical OR
- `x?a:b`: ternary operator
There is an assignment operator :=. If you enter d:=24, 24 will be assigned to the variable d. The value of the assignment operator is the assigned value.
The prefix operator # takes the value of the symbol that follows.
The prefix operator @ returns the bit position from the right of the most significant bit of the value that follows. We'll call this the Hebimarumatta operator.
The binary operator ', for example a'24, will make the 24th bit of a the sign bit and perform sign extension (Sign EXtend). We'll call this the SEX operator.
The binary operator ** is exponentiation.
The ternary operator ?:, for example x?a:b, will return a if x is true and b if it is false.
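Three of the less familiar operators can be written as small Python functions. The semantics of @ are inferred from the LEAQ patterns, where ((@h)-1)<<6 must produce the SIB scale field, so @1 is 1 and @2 is 2. The text does not say how bits are counted, so treating the sign-extension bit index as 0-based is an assumption. The function names are hypothetical.

```python
def hebimarumatta(x: int) -> int:
    """@x: position, counted from 1 at the right, of the most significant set bit."""
    return x.bit_length()


def sign_extend(a: int, sign_bit: int) -> int:
    """a'n: treat bit n of a as the sign bit (n assumed 0-based here)."""
    width = sign_bit + 1
    a &= (1 << width) - 1  # keep only bits 0..sign_bit
    if (a >> sign_bit) & 1:  # sign bit set: the value is negative
        return a - (1 << width)
    return a


def byte_of(x: int, y: int) -> int:
    """*(x,y): the y-th byte of x, counting from the least significant byte."""
    return (x >> (8 * y)) & 0xFF


# SIB byte from the LEAQ example: scale 2, index rcx (1), base rbx (3)
sib = ((hebimarumatta(2) - 1) << 6) | (1 << 3) | 3
print(hex(sib))  # 0x4b
```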
When input is read from the keyboard (at the >>> prompt), you can enter the label display command ?.
For example, suppose the pattern file contains the following Z80 pattern data:
```
.setsym:: BC:: 0x00
.setsym:: DE:: 0x10
.setsym:: HL:: 0x20
LD s,!d:: (s&0xf!=0)||(s>>4)>3;9 :: s|0x01,d&0xff,d>>8
```
Then, ld bc,0x1234, ld de,0x1234, ld hl,0x1234 output 0x01,0x34,0x12, 0x11,0x34,0x12, 0x21,0x34,0x12, respectively.
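The following hypothetical Python translation of the LD pattern checks the outputs listed above. It reads the error pattern as rejecting any value whose low nibble is nonzero or whose high nibble exceeds 3, with & binding more tightly than comparisons, as in Python. `ld_rr_nn` and `SYMBOLS` are illustrative names.

```python
SYMBOLS = {"BC": 0x00, "DE": 0x10, "HL": 0x20}


def ld_rr_nn(reg: str, d: int) -> list:
    """Hand translation of  LD s,!d :: (s&0xf!=0)||(s>>4)>3;9 :: s|0x01,d&0xff,d>>8"""
    s = SYMBOLS[reg.upper()]  # symbols are case-insensitive
    # error_patterns: low nibble must be 0 and high nibble at most 3 (else error 9)
    if (s & 0xF) != 0 or (s >> 4) > 3:
        raise ValueError("error 9")
    # binary_list, each element output as one byte
    return [(s | 0x01) & 0xFF, d & 0xFF, (d >> 8) & 0xFF]


for reg in ("bc", "de", "hl"):
    print(reg, [hex(b) for b in ld_rr_nn(reg, 0x1234)])
```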
Registers of different widths that share the same mnemonic are handled as follows (8086 example):
```
.setsym::SI::0
.setsym::BX::0
/*********************************************************************/
/* If AX or AL appears at this point, neither will match the pattern */
/* Define AL. At this point, AL matches the pattern
.setsym::AL::0xb0
MOV s,!a :: 0xb0,a
.clearsym::AL /* Clear symbol AL
/* Define AX. At this point, AX matches the pattern.
.setsym::AX::0xb8
MOV s,!a::0xb8,a,a>>8
.clearsym::AX /* Clear symbol AX
/********************************************************************/
MOV BYTE [e + f + !c],!d::0xc6,c>=0x100?0x80:0x40,c,;c>>8,d
MOV BYTE [e + f],!g :: 0xc6,0,g
MOV BYTE [!a],!b :: 0xc6,0x6,a,a>>8,b
MOV WORD [e + f + !a],!b::0xc7,a>=0x100?0x80:0x40,a,;a>>8,b,b>>8
MOV WORD [e + f],!a :: 0xc7,0,a,a>>8
MOV WORD [!a],!b::0xc7,0x06,a,a>>8,b,b>>8
```
```
mov byte [bx+si],0x12
mov byte [0x3412],0x56
mov byte [bx+si+0x12],0x34
mov byte [bx+si+0x3412],0x56
mov al,0x12
mov word [bx+si],0x3412
mov word [0x3412],0x7856
mov word [bx+si+0x12],0x5634
mov word [bx+si+0x3412],0x7856
mov ax,0x3412
```
```
$ axx 8086.axx 8086.s
0000000000000000 8086.s 1 mov byte [bx+si],0x12 0xc6 0x00 0x12
0000000000000003 8086.s 2 mov byte [0x3412],0x56 0xc6 0x06 0x12 0x34 0x56
0000000000000008 8086.s 3 mov byte [bx+si+0x12],0x34 0xc6 0x40 0x12 0x34
000000000000000c 8086.s 4 mov byte [bx+si+0x3412],0x56 0xc6 0x80 0x12 0x34 0x56
0000000000000011 8086.s 5 mov al,0x12 0xb0 0x12
0000000000000013 8086.s 6 mov word [bx+si],0x3412 0xc7 0x00 0x12 0x34
0000000000000017 8086.s 7 mov word [0x3412],0x7856 0xc7 0x06 0x12 0x34 0x56 0x78
000000000000001d 8086.s 8 mov word [bx+si+0x12],0x5634 0xc7 0x40 0x12 0x34 0x56
0000000000000022 8086.s 9 mov word [bx+si+0x3412],0x7856 0xc7 0x80 0x12 0x34 0x56 0x78
0000000000000028 8086.s 10 mov ax,0x3412 0xb8 0x12 0x34
$
```
Because this is a test, the binary is different from the actual code.
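The MOV BYTE pattern with a displacement combines the ternary operator and the ; modifier. This hypothetical Python sketch evaluates that binary_list and reproduces lines 3 and 4 of the output above. The `emit` helper and its (value, skip_if_zero) pairs are an assumed representation, not axx's.

```python
def emit(elements):
    """Evaluate a binary_list given as (value, skip_if_zero) pairs.

    skip_if_zero marks elements prefixed with ';', which are dropped when
    their value is 0. Every remaining value is output as one byte.
    """
    return [v & 0xFF for v, skip_if_zero in elements if not (skip_if_zero and v == 0)]


def mov_byte_disp(c: int, d: int) -> list:
    # MOV BYTE [e + f + !c],!d::0xc6,c>=0x100?0x80:0x40,c,;c>>8,d
    return emit([(0xC6, False),
                 (0x80 if c >= 0x100 else 0x40, False),
                 (c, False),
                 (c >> 8, True),
                 (d, False)])


print([hex(b) for b in mov_byte_disp(0x12, 0x34)])    # mov byte [bx+si+0x12],0x34
print([hex(b) for b in mov_byte_disp(0x3412, 0x56)])  # mov byte [bx+si+0x3412],0x56
```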
```
/* test
.setsym ::a:: 7
.setsym ::b:: 1
LDQ A,!Qx :: 0x1,@@[16,*(x,%%)]
/* ARM64
.setsym ::r1 :: 2
.setsym ::r2 :: 3
.setsym ::r3 :: 4
.setsym ::lsl:: 6
ADD w, x, y z #!d :: 0x88,d
ADD x, y, !e :: 0x91,x,y,e
/* A64FX
.setsym ::v0 :: 0
.setsym ::x0 :: 1
ST1 {x.4S},[y] :: 0x01,x,y,0
/* MIPS
.setsym ::$s5 ::21
.setsym ::$v0 ::2
.setsym ::$a0 ::4
ADDI x,y,!d :: (e:=(0x20000000|(y<<21)|(x<<16)|d&0xffff))>>24,e>>16,e>>8,e
/* x86_64
.setsym ::rax:: 0
.setsym ::rbx:: 3
.setsym ::rcx ::1
.setsym ::rep ::1
MMX A,B :: ,0x12,0x13
LEAQ r,[s,t,!d,!e] :: 0x48,0x8d,0x04,((@d)-1)<<6|t<<3|s,e
LEAQ r, [ s + t * !h \+ !i ] :: 0x48,0x8d,0x04,((@h)-1)<<6|t<<3|s,i
[[u]]MOVSB :: ;u?0xf3:0,0xa4
TEST !a:: a==3?0xc0:4,0x12,0x13
/* ookakko test
LD (IX[[+!d]]),(IX[[+!e]]):: 0xfd,0x04,d,e
NOP :: 0x01
```
For x86_64 address expressions such as LEAQ r,[s+t*h+i], write the pattern as LEAQ r,[s+t*!!h+!!i]. Suppose you write !h instead of !!h. While matching leaq rax,[rbx+rcx*2+0x40], the assembly-line expression evaluator then reads everything from 2 onward, 2+0x40, as one expression and assigns it to h. The remaining +!!i can no longer be matched, which causes a syntax analysis error. !!h matches a factor, whereas !h matches a whole expression, and nothing on the assembly line tells the expression where to stop. Alternatively, escape the character after !h, as in LEAQ r, [ s + t * !h \+ !i ].
```
leaq rax , [ rbx , rcx , 2 , 0x40]
leaq rax , [ rbx + rcx * 2 + 0x40]
addi $v0,$a0,5
st1 {v0.4s},[x0]
add r1, r2, r3 lsl #20
rep movsb
movsb
```
Execution example:
```
0000000000000000 test.s 1 leaq rax , [ rbx , rcx , 2 , 0x40] 0x48 0x8d 0x04 0x4b 0x40
0000000000000005 test.s 2 leaq rax , [ rbx + rcx * 2 + 0x40] 0x48 0x8d 0x04 0x4b 0x40
000000000000000a test.s 3 addi $v0,$a0,5 0x20 0x82 0x00 0x05
000000000000000e test.s 4 st1 {v0.4s},[x0] 0x01 0x00 0x01 0x00
0000000000000012 test.s 5 add r1, r2, r3 lsl #20 0x88 0x14
0000000000000014 test.s 6 rep movsb 0xf3 0xa4
0000000000000016 test.s 7 movsb 0xa4
```
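The MIPS ADDI pattern uses := to compute the instruction word once and reuse it in later binary_list elements. Python's assignment expression behaves the same way in this hypothetical translation, which reproduces the addi $v0,$a0,5 line of the execution example. `addi` and `SYMBOLS` are illustrative names.

```python
SYMBOLS = {"$v0": 2, "$a0": 4, "$s5": 21}


def addi(x: str, y: str, d: int) -> list:
    # ADDI x,y,!d :: (e:=(0x20000000|(y<<21)|(x<<16)|d&0xffff))>>24,e>>16,e>>8,e
    # The first element assigns the 32-bit word to e; the later elements reuse e.
    rx, ry = SYMBOLS[x], SYMBOLS[y]
    elements = [(e := (0x20000000 | (ry << 21) | (rx << 16) | (d & 0xFFFF))) >> 24,
                e >> 16, e >> 8, e]
    return [v & 0xFF for v in elements]  # each element output as one byte


print([hex(b) for b in addi("$v0", "$a0", 5)])  # ['0x20', '0x82', '0x0', '0x5']
```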
・If a label overlaps with a symbol in the pattern file, an "is a pattern file symbol" error occurs.
・If the same label is defined more than once, a "label already defined" error occurs.
・If syntax analysis fails, a "syntax error" occurs.
・If an undefined label is referenced, a "Label undefined" error occurs.
・If the syntax is incorrect, an "Illegal syntax in assembler line" or "pattern line" error occurs.
・If no template is set for a combination of EPIC instructions, a "No template set." error occurs.
・If any condition in error_patterns is met, an error occurs. Error codes 1, 2, 3, 5, and 6 display Invalid syntax, Address out of range, Value out of range, Register out of range, and Port number out of range, respectively. If you need more error types, add error messages to the source.
- Sorry for the idiosyncratic notation.
- I know it's a ridiculous request, but quantum computers and LISP machines are not supported. The language of quantum computers is called quantum assembly, but it is not assembly language, and LISP machine programs are not assembly language either.
- From homemade processors to supercomputers, please feel free to use axx.py.
- Please evaluate, extend, and modify axx. The structure is hard to understand, but because it is written in Python, it is easy to extend.
- For now, only constants can be used for quadruple-precision floating-point numbers, because Python 3 has no native quadruple-precision type. It would be nice if a future Python (python4) could handle them.
- Use a preprocessor for macro functionality. To cover all assembly languages, a high-performance macro processor is needed to translate functional, structured, and other high-level assembly languages into imperative assembly language.
- For a linker/loader, for now use the -i option, which imports labels from a TSV file, and the -e option, which exports the labels specified with .export to a TSV file together with the section/segment each belongs to.
- Creating axx pattern files for large ISAs is difficult, but since the specifications are fixed, I wonder whether AI could do it. Assemblers were originally created to make machine code easier for humans to understand, but in today's world, where AI writes code, I think it would be good to have general assemblers for both assembly language and computers.
- Make it generally compatible with linkers.
・The evaluation order of pattern files is hard to manage.
・Make it possible to take an equation for x in qad(x).
・Now that the core is complete, I think preparing pattern files for axx and adding high-performance macros and optimization would make it a complete system. I would be happy if it were put to practical use.
axx2 (next-generation axx) concept: pattern files (processor description files). This feature is not available yet.
- A more descriptive metalanguage for pattern files would improve readability, remove the dependence on evaluation order, make control statements easier to write, and make processor description files easier to debug, although pattern data is more intuitive. Other extensions would go further: generalizing the metalanguage, adding string literals, string operations, and numeric operations to binary_list, and adding control statements. Together, these would enable the generation of intermediate languages and of converters between assembly languages. In that case, binary_list would be renamed object_list and the pattern file would be renamed processor_specification_file. The metalanguage would also become a multi-line description language rather than pattern data. This is feasible, and apparently someone is already working on it based on axx. Even in pattern files, macros could be written neatly by assigning instructions (strings) to character variables, for example a='MOV b,c', and writing those variables in binary_list. Character variables are currently only lowercase letters, but they could be extended to what we normally call symbols. Allowing loops would make debugging difficult if an infinite loop occurred inside axx.py. Restricting evaluation to pattern files, however, keeps debugging simple while still allowing loop and branch structures. Turing-completeness would allow any processor architecture to be processed, and even Lisp machines would be possible in principle, but self-reference checks would be required. expand(a) would perform expansion: for example, if a='b ; c', b='MOV AX,d', and c='JMPC e', the result would be 'MOV AX,d ; JMPC e'. expression(a) would evaluate an expression, and label: would define a label. Keeping the labels of the processor description file separate from those of the assembly file would remove the need to worry about the same label appearing in both. Meta-processing such as EPIC would be solved by enumerating variables. Turning pattern files into a descriptive metalanguage would require a drastic rewrite.
If the assembler's processor characteristic description file becomes complex, it becomes difficult to make the file compatible with General Disassembler.
If you find a bug, I would appreciate it if you could tell me how axx fails.
I would like to express my gratitude to my mentor, Junichi Hamada, and Tokyo Denshi Sekkei, who gave me the problems and hints, the University of Electro-Communications, the computer scientists and engineers, Qiita, Google, IEEE, The Alan Turing Institute and some unforgettable people. I received a passing grade from Professor Kameda of the Information Processing Society of Japan. Thank you very much.
