C#•8mo ago

Optimizing the pipeline on my Z80 emulator

Hello! I've come to ask you all a question about optimization of lambda expressions. Currently, my pipeline looks like this:

private readonly Action[] instructionTable = new Action[256];

// in ctor, for example:
instructionTable[0x48] = () => LD_R_R(C, B); // LD C, B
instructionTable[0x49] = () => LD_R_R(C, C); // LD C, C
instructionTable[0x4A] = () => LD_R_R(C, D); // LD C, D

private void LD_R_R(byte dest, byte source)
{
    Registers.RegisterSet[dest] = Registers.RegisterSet[source];
    LogInstructionExec($"0x{_currentInstruction:X2}: LD {Registers.RegisterName(dest)}, {Registers.RegisterName(source)}:0x{Registers.RegisterSet[source]:X2}");
}

// calling:
_currentInstruction = Fetch();
switch (_currentInstruction)
{
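    case 0xDD:
        _currentInstruction = Fetch(); // prefix byte: the real opcode is the next byte
        DDInstructionTable[_currentInstruction](); break;
    case 0xFD:
        _currentInstruction = Fetch();
        FDInstructionTable[_currentInstruction](); break;
    case 0xED:
        _currentInstruction = Fetch();
        EDInstructionTable[_currentInstruction](); break;
    case 0xCB:
        _currentInstruction = Fetch();
        CBInstructionTable[_currentInstruction](); break;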

    default:
        instructionTable[_currentInstruction](); break;
}

This is nice because it's very reusable, but the lambda indirection needed to turn LD_R_R(C, B) into an anonymous call takes time, and in this case the lambdas seem pretty slow. How should I go about optimizing this while still keeping it reusable?

6 Replies

reflectronic•8mo ago

you would need to write a separate method with a separate switch statement for all of the cases in instructionTable that calls LD_R_R(C, B); etc. directly

PdawgOP•8mo ago

what, so like

private void LD_C_B() => LD_R_R(C, B);

instructionTable[0x48] = LD_C_B;

reflectronic•8mo ago

no, like:

switch (_currentInstruction)
{
    case 0x48: LD_R_R(C, B); break;
    // etc.
}

i do not think it is that much more onerous to write

case 0x48: LD_R_R(C, B);

than

_instructionTable[0x48] = () => LD_R_R(C, B);

PdawgOP•8mo ago

hmmm yeah i guess. only thing is id like to separate the switch blocks out for each "table". should i just

_currentInstruction = Fetch();

switch (_currentInstruction)
{
    case 0xDD:
        //DDInstructionTable[_currentInstruction](); break;
        DDSwitchTable(); break;
        //etc
}

like move each switch out to another method

reflectronic•8mo ago

yes that would be fine i think
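
For reference, the layout agreed on above could look roughly like the sketch below. It is only an illustration: the class name Z80Sketch, the methods Step, ExecuteMain and ExecuteDD, the register indices, and the flat byte-array memory are all assumptions, not code from the emulator in question. It also assumes that after a prefix byte such as 0xDD the real opcode is the next byte fetched. The idea is that each case calls LD_R_R directly, so no delegate sits between the dispatch and the handler, and a dense switch over byte values is typically compiled to a jump table.

```csharp
using System;

// Sketch only: Z80Sketch, Step, ExecuteMain and ExecuteDD are hypothetical names,
// and memory/registers are simplified to plain byte arrays.
public sealed class Z80Sketch
{
    // Register indices (assumed layout; the real Registers class may differ).
    public const byte B = 0, C = 1, D = 2;

    public readonly byte[] RegisterSet = new byte[8];
    private readonly byte[] _memory;
    private ushort _pc;
    private byte _currentInstruction;

    public Z80Sketch(byte[] program) => _memory = program;

    private byte Fetch() => _memory[_pc++];

    public void Step()
    {
        _currentInstruction = Fetch();
        switch (_currentInstruction)
        {
            case 0xDD:
                // Prefix byte: fetch the real opcode, then dispatch in the DD switch.
                _currentInstruction = Fetch();
                ExecuteDD();
                break;
            // 0xFD, 0xED and 0xCB would get their own methods the same way.
            default:
                ExecuteMain();
                break;
        }
    }

    private void ExecuteMain()
    {
        switch (_currentInstruction)
        {
            case 0x48: LD_R_R(C, B); break; // LD C, B
            case 0x49: LD_R_R(C, C); break; // LD C, C
            case 0x4A: LD_R_R(C, D); break; // LD C, D
            default:
                throw new NotImplementedException($"Opcode 0x{_currentInstruction:X2}");
        }
    }

    private void ExecuteDD()
    {
        switch (_currentInstruction)
        {
            // DD-prefixed opcodes would be listed here.
            default:
                throw new NotImplementedException($"Opcode 0xDD 0x{_currentInstruction:X2}");
        }
    }

    // Copies one 8-bit register into another (logging omitted in this sketch).
    private void LD_R_R(byte dest, byte source)
    {
        RegisterSet[dest] = RegisterSet[source];
    }
}
```

Keeping one method per table preserves the "one table per prefix" structure of the original design while replacing the Action arrays with direct calls.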

PdawgOP•8mo ago

mk, just really trying to optimize it while keeping it readable. ive seen some horribly unreadable emulators that are fast, and some emulators that are all pretty but eat 20% of your cpu trying to emulate a 4MHz z80
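
Since the goal is speed without giving up readability, it may be worth measuring both dispatch styles before committing to one. The sketch below is a rough timing harness, not a rigorous benchmark: DispatchTiming, RunDelegates, RunSwitch and Time are hypothetical names, the register file is a bare byte array, and only the three LD opcodes from the thread are handled. A dedicated tool such as BenchmarkDotNet would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

// Rough timing sketch, not a rigorous benchmark. All names here are hypothetical.
public static class DispatchTiming
{
    private static readonly byte[] Regs = new byte[8];
    private static readonly Action[] Table = new Action[256];

    static DispatchTiming()
    {
        // Fill every slot so any opcode can be dispatched without a null check.
        for (int i = 0; i < Table.Length; i++) Table[i] = () => { };
        Table[0x48] = () => LD_R_R(1, 0); // LD C, B
        Table[0x49] = () => LD_R_R(1, 1); // LD C, C
        Table[0x4A] = () => LD_R_R(1, 2); // LD C, D
    }

    private static void LD_R_R(byte dest, byte source) => Regs[dest] = Regs[source];

    // Dispatch through the delegate table, as in the original design.
    public static void RunDelegates(byte[] opcodes, int passes)
    {
        for (int p = 0; p < passes; p++)
            foreach (byte op in opcodes)
                Table[op]();
    }

    // Dispatch through a switch that calls the handler directly.
    public static void RunSwitch(byte[] opcodes, int passes)
    {
        for (int p = 0; p < passes; p++)
            foreach (byte op in opcodes)
            {
                switch (op)
                {
                    case 0x48: LD_R_R(1, 0); break;
                    case 0x49: LD_R_R(1, 1); break;
                    case 0x4A: LD_R_R(1, 2); break;
                }
            }
    }

    // Runs the work once as a warm-up, then times a second run.
    public static TimeSpan Time(Action work)
    {
        work(); // warm-up so JIT compilation is not part of the measurement
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling DispatchTiming.Time(() => DispatchTiming.RunSwitch(opcodes, passes)) and the same with RunDelegates, in a Release build, gives two durations to compare. The result depends on the runtime and hardware, so no particular outcome is assumed here.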
