Retrieve access information of instruction operands

1. Get access info of registers

Available in the next branch on GitHub, Capstone provides a new API named cs_regs_access(). This function retrieves the list of all registers read or modified, either implicitly or explicitly, by an instruction.


The C sample code below demonstrates how to use cs_regs_access on X86 input.

#include <stdio.h>

#include <capstone/capstone.h>

#define CODE "\x8d\x4c\x32\x08\x01\xd8"

int main(void)
{
  csh handle;
  cs_insn *insn;
  size_t count, j;
  cs_regs regs_read, regs_write;      // registers read / modified
  uint8_t read_count, write_count, i; // number of valid entries in each array

  if (cs_open(CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK)
    return -1;

  // DETAIL mode is required to get register access information.
  cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);

  count = cs_disasm(handle, (const uint8_t *)CODE, sizeof(CODE)-1, 0x1000, 0, &insn);
  if (count > 0) {
    for (j = 0; j < count; j++) {
      // Print assembly
      printf("%s\t%s\n", insn[j].mnemonic, insn[j].op_str);

      // Print all registers accessed by this instruction.
      if (cs_regs_access(handle, &insn[j],
            regs_read, &read_count,
            regs_write, &write_count) == CS_ERR_OK) {
        if (read_count > 0) {
          printf("\n\tRegisters read:");
          for (i = 0; i < read_count; i++) {
            printf(" %s", cs_reg_name(handle, regs_read[i]));
          }
          printf("\n");
        }

        if (write_count > 0) {
          printf("\n\tRegisters modified:");
          for (i = 0; i < write_count; i++) {
            printf(" %s", cs_reg_name(handle, regs_write[i]));
          }
          printf("\n");
        }
      }
    }

    cs_free(insn, count);
  } else {
    printf("ERROR: Failed to disassemble given code!\n");
  }

  cs_close(&handle);

  return 0;
}


Compiling and running this sample produces the following output.

lea    ecx, [edx + esi + 8]

    Registers read: edx esi
    Registers modified: ecx

add eax, ebx

    Registers read: eax ebx
    Registers modified: eflags eax


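Below is an explanation of the important parts of the above C sample.

The variables regs_read and regs_write, of type cs_regs, receive the registers read and modified by an instruction, while read_count and write_count, of type uint8_t, receive the number of entries stored in each array. The call to cs_option() with CS_OPT_DETAIL turns on DETAIL mode, which is required to get access information. For every disassembled instruction, cs_regs_access() fills in the two arrays and their counts, and cs_reg_name() converts each register ID to a printable name.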


For readers more familiar with Python, the code below does the same thing as the C sample above.

from capstone import *

CODE = b"\x8d\x4c\x32\x08\x01\xd8"

md = Cs(CS_ARCH_X86, CS_MODE_32)
md.detail = True

for insn in md.disasm(CODE, 0x1000):
    print("%s\t%s" % (insn.mnemonic, insn.op_str))

    (regs_read, regs_write) = insn.regs_access()

    if len(regs_read) > 0:
        print("\n\tRegisters read:", end="")
        for r in regs_read:
            print(" %s" % (insn.reg_name(r)), end="")
        print()

    if len(regs_write) > 0:
        print("\n\tRegisters modified:", end="")
        for r in regs_write:
            print(" %s" % (insn.reg_name(r)), end="")
        print()


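Below is an explanation of the important parts of this Python sample.

Setting md.detail = True turns on DETAIL mode, which is required to get access information. For every disassembled instruction, insn.regs_access() returns a pair of lists holding the registers read and the registers modified, and insn.reg_name() converts each register ID to a printable name.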


2. Get access info of operands

For instruction operands, besides information such as size and type, we can now also retrieve the access information. This is possible thanks to the new field cs_x86_op.access in x86.h.


With the help of cs_x86_op.access, we can find out how each instruction operand is accessed, as shown below.

lea    ecx, [edx + esi + 8]
    Number of operands: 2
        operands[0].type: REG = ecx
        operands[0].access: WRITE

add eax, ebx
    Number of operands: 2
        operands[0].type: REG = eax
        operands[0].access: READ | WRITE

        operands[1].type: REG = ebx
        operands[1].access: READ


Note that the LEA instruction does not actually access its second operand, hence this operand is ignored.
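
A listing like the one above can be produced with a few lines of Python. The following is a minimal sketch, not the code from the Capstone test suite. The helper names access_to_str() and print_operand_access() are made up for illustration, and the two CS_AC_* values are copied from capstone.h so that the formatting helper can be used without the bindings installed.

```python
# Bit values of the operand access field, as defined in capstone.h.
# The Python bindings export the same names from the capstone module;
# they are repeated here only to keep access_to_str() standalone.
CS_AC_READ = 1 << 0
CS_AC_WRITE = 1 << 1


def access_to_str(access):
    """Render an access bitmask as text (hypothetical helper)."""
    parts = []
    if access & CS_AC_READ:
        parts.append("READ")
    if access & CS_AC_WRITE:
        parts.append("WRITE")
    return " | ".join(parts) if parts else "NONE"


def print_operand_access(code, address=0x1000):
    """Disassemble 32-bit X86 code and print how each operand is accessed."""
    # Imported lazily so that access_to_str() works without the bindings.
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32
    from capstone.x86 import X86_OP_REG

    md = Cs(CS_ARCH_X86, CS_MODE_32)
    md.detail = True  # operand details require DETAIL mode

    for insn in md.disasm(code, address):
        print("%s\t%s" % (insn.mnemonic, insn.op_str))
        print("\tNumber of operands: %u" % len(insn.operands))
        for i, op in enumerate(insn.operands):
            if op.type == X86_OP_REG:
                print("\t\toperands[%u].type: REG = %s"
                      % (i, insn.reg_name(op.reg)))
            print("\t\toperands[%u].access: %s"
                  % (i, access_to_str(op.access)))
```

Calling print_operand_access(b"\x8d\x4c\x32\x08\x01\xd8") runs the sketch on the same two instructions used earlier. Unlike the listing above, it prints an access line for every operand, so an operand that is neither read nor written shows up as NONE.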


3. Status register update

Arithmetic instructions might update status flags. On X86, this is the EFLAGS register. Capstone not only tells you that EFLAGS is modified, but can also provide details on the individual bits inside EFLAGS, such as the CF, ZF, OF and SF flags.

On X86, this information is available in the field cs_x86.eflags, which is a bitwise OR of X86_EFLAGS_* values. Again, this requires the engine to be configured in DETAIL mode.
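
Decoding this field amounts to testing each flag bit in turn. The Python sketch below illustrates the idea under two assumptions: that the Python bindings expose the field as insn.eflags, and that the X86_EFLAGS_* constants live in the capstone.x86_const module. The helper names decode_flags() and print_eflags_updates() are hypothetical and not part of Capstone.

```python
def decode_flags(eflags, table):
    """Return the names in `table` whose bit is set in `eflags`, sorted.

    `table` maps a flag name to its bit value.
    """
    return sorted(name for name, bit in table.items() if eflags & bit)


def print_eflags_updates(code, address=0x1000):
    """Disassemble 32-bit X86 code and print the EFLAGS details per instruction."""
    # Imported lazily so that decode_flags() works without the bindings.
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32, x86_const

    # Collect every X86_EFLAGS_* constant instead of listing them by hand.
    table = {name: value for name, value in vars(x86_const).items()
             if name.startswith("X86_EFLAGS_")}

    md = Cs(CS_ARCH_X86, CS_MODE_32)
    md.detail = True  # eflags is only filled in when DETAIL mode is on

    for insn in md.disasm(code, address):
        print("%s\t%s" % (insn.mnemonic, insn.op_str))
        names = decode_flags(insn.eflags, table)
        if names:
            print("\tEFLAGS: " + " ".join(names))
```

Building the table by scanning the constants module keeps the sketch short and avoids hard-coding flag names that may differ between Capstone versions.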


See the screenshot below for what this feature can provide.


4. More examples

The full samples showing how to retrieve operand access information can be found in the source files test_x86.c and test_x86.py.