Introduction

There seems to be a shortage of game hacking content that explains how to get started with reversing games made on the Unity engine. Most guides out there are either

  • made for C# (disgusting…)
  • short, and provide the what but not the why (and sometimes how)

I hope to turn these guides into YouTube videos soon, so if you have any constructive criticism about the information provided, please don't hesitate to let me know so I can update it accordingly.

Prerequisites

This post is aimed at late-beginner/intermediate game hackers who are moving on from starter games (such as CS:GO, AssaultCube, etc). To get the most out of this post, you should be familiar with the following:

  • x86/64 memory layout
  • class inheritance
  • basics of Unity

Background

Although this post focuses on Il2Cpp, most of the information covered also applies to Mono because the two are similar. At times I will reference both the Il2Cpp sources and the Mono sources. Without further ado, let’s hop right into what Il2Cpp is and how it is structured.

What is Il2Cpp?

IL2CPP (Intermediate Language to C++) is a Unity-developed scripting backend that offers several advantages over Mono. Instead of JIT-compiling the IL produced from your C# code at runtime like Mono does, IL2CPP converts that IL into C++ code (important), which is then compiled into a native executable for the target platform.

Some key things to note about Il2Cpp:

  • it uses AOT (Ahead-Of-Time) compilation instead of JIT (Just-In-Time) compilation, which affects function calls and hooking (outside the scope of this tutorial)
  • converting C# to C++ means reverse engineering functions will be “harder”

Il2Cpp Structure

Il2CppDomain

Information about the Il2CppDomain isn’t too important. All you really have to know is that the Il2CppDomain is where all the information about assemblies is stored. It acts as a sandbox for executing managed code, isolating the code and memory of the assemblies loaded into it.

Taking a look at the API, you can see that internally, the Il2CppDomain is used to retrieve assemblies (explained in a moment). This is important as it gives us a hint as to where to look for assemblies externally.


DO_API(Il2CppDomain*, il2cpp_domain_get, ());
DO_API(const Il2CppAssembly*, il2cpp_domain_assembly_open, (Il2CppDomain * domain, const char* name));
DO_API(const Il2CppAssembly**, il2cpp_domain_get_assemblies, (const Il2CppDomain * domain, size_t * size));
il2cpp_domain_get_assemblies returns an array of pointers to assemblies (note the Il2CppAssembly** return type) and writes the number of entries to size, which we’ll verify later in this tutorial.
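
To make that concrete, here is a minimal sketch of how an internal could list every loaded assembly through these two exports. It assumes the functions have already been resolved from GameAssembly.dll (for example with GetProcAddress) and takes them as parameters so the logic can be tested on its own. The stripped-down structs and the list_assembly_images name are placeholders for illustration, not part of Il2Cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-ins: only the first field of each real structure is declared,
// because that is all this sketch reads.
struct Il2CppDomain;                           // opaque, only used through pointers
struct Il2CppImage { const char* name; };
struct Il2CppAssembly { Il2CppImage* image; };

// Function-pointer types matching the DO_API declarations above.
using il2cpp_domain_get_t = Il2CppDomain* (*)();
using il2cpp_domain_get_assemblies_t =
    const Il2CppAssembly** (*)(const Il2CppDomain* domain, size_t* size);

// Returns the image name of every loaded assembly. In an internal, both
// functions would be resolved from GameAssembly.dll's exports; here they are
// parameters so the logic can be exercised without a game.
std::vector<std::string> list_assembly_images(il2cpp_domain_get_t domain_get,
                                              il2cpp_domain_get_assemblies_t get_assemblies)
{
    assert(domain_get != nullptr && get_assemblies != nullptr);

    size_t count = 0;                                   // filled in by the API
    const Il2CppAssembly** assemblies = get_assemblies(domain_get(), &count);

    std::vector<std::string> names;
    for (size_t i = 0; i < count; ++i)
        names.emplace_back(assemblies[i]->image->name); // array of pointers, not structs
    return names;
}
```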

Il2CppAssembly

An Il2CppAssembly is the structure Il2Cpp uses to represent an assembly from the managed code it converted to C++. In C#, an assembly is the unit code is compiled into (usually a DLL); the Il2CppAssembly structure mostly holds a pointer to its image (the DLL being referenced, in this case) and a little more.


typedef struct Il2CppAssembly
{
    Il2CppImage* image;
    uint32_t token;
    int32_t referencedAssemblyStart;
    int32_t referencedAssemblyCount;
    Il2CppAssemblyName aname;
} Il2CppAssembly;
Below is an example of an Il2CppAssembly. From what I’ve seen, only the first assembly contains a valid referencedAssemblyCount (the number of assemblies), which can be used as a counter when iterating the assembly list.
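
From an external you can't call the API, so you walk the array yourself. The sketch below assumes you already have the address of the assembly pointer array in the game's memory, and uses a hypothetical read_memory helper that reads from its own process (in a real external it would wrap something like ReadProcessMemory). It relies on the first-assembly referencedAssemblyCount behavior described above, so treat that as an observation to verify for your game rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Leading fields of Il2CppAssembly as shown above; aname is left out because
// nothing past referencedAssemblyCount is read.
struct Il2CppAssemblyHeader
{
    uintptr_t image;                  // Il2CppImage* in the game's address space
    uint32_t token;
    int32_t referencedAssemblyStart;
    int32_t referencedAssemblyCount;
};

// Walks the array of assembly pointers at `assemblies`, taking the entry
// count from the first assembly's referencedAssemblyCount (the behavior
// observed above; verify it before relying on it).
std::vector<uintptr_t> collect_assemblies(uintptr_t assemblies)
{
    assert(assemblies != 0);
    const auto first = read_memory<Il2CppAssemblyHeader>(read_memory<uintptr_t>(assemblies));

    std::vector<uintptr_t> result;
    for (int32_t i = 0; i < first.referencedAssemblyCount; ++i)
        result.push_back(read_memory<uintptr_t>(assemblies + i * sizeof(uintptr_t)));
    return result;
}
```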

Il2CppImage

After retrieving an Il2CppAssembly, you’ll eventually find yourself looking at an Il2CppImage. This structure is pretty short and contains the name of the image, a pointer to the assembly it belongs to, and the start index and count of the types (classes) it contains.

It is defined as so:


typedef struct Il2CppImage
{
    const char* name;
    const char *nameNoExt;
    Il2CppAssembly* assembly;

    TypeDefinitionIndex typeStart;
    uint32_t typeCount;

    TypeDefinitionIndex exportedTypeStart;
    uint32_t exportedTypeCount;

    CustomAttributeIndex customAttributeStart;
    uint32_t customAttributeCount;

    MethodIndex entryPointIndex;

#ifdef __cplusplus
    mutable
#endif
    Il2CppNameToTypeDefinitionIndexHashTable * nameToClassHashTable;

    uint32_t token;
    uint8_t dynamic;
} Il2CppImage;
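
Following the pointers from an assembly to its image name is a good first test for an external. Here is a rough sketch, assuming the layouts above (image and name are each the first field of their structure, so every hop reads at offset 0) and a hypothetical read_memory helper standing in for your real memory reader. The string reader is capped so a bad pointer can't loop forever.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Reads a NUL-terminated string one byte at a time, capped at max_len so a
// bad pointer cannot turn into an endless loop. A null address yields "".
std::string read_c_string(uintptr_t address, size_t max_len = 256)
{
    std::string out;
    if (address == 0)
        return out;
    for (size_t i = 0; i < max_len; ++i)
    {
        const char c = read_memory<char>(address + i);
        if (c == '\0')
            break;
        out.push_back(c);
    }
    return out;
}

// Il2CppAssembly::image and Il2CppImage::name are both the first field of
// their structures, so each hop reads a pointer at offset 0.
std::string image_name_of_assembly(uintptr_t assembly)
{
    assert(assembly != 0);
    const auto image = read_memory<uintptr_t>(assembly);   // Il2CppAssembly::image
    const auto name = read_memory<uintptr_t>(image);       // Il2CppImage::name
    return read_c_string(name);
}
```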

Below is an example image. One thing I have noticed is that, more often than not, the typeCount field (the number of classes defined in this image) will be empty. I’ve found it more reliable to get the class count by dividing the typeStart field by sizeof(uintptr_t).

Why do we need to divide the offset by 4/8 (the size of a pointer on x86/x64)?

The class list, as we’ll see soon, stores its classes linearly, grouped by assembly in the same order (index) as the assemblies list. To iterate through this list without the typeCount field, we use typeStart as a counter (and offset) and add to it after every iteration.

Let me prove this theory.

As you can see, we are looking at a class in the class list. Take note of the index, as it is currently 1 (less than our class_offset).

If you check the image name, you’ll see it is a part of Assembly-CSharp. Keep going past the class_offset until you find a valid class, and you can see we are now in a new assembly.

Repeat the process (adding the offsets of the assemblies we’ve already parsed) and you will encounter a new class in a different assembly.

This predictable grouping of classes matches the order in which their assemblies are defined in the assembly list.
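
The pattern above can be turned into a loop. This is a sketch under a few assumptions: class_list is the address of the class pointer array (however you located it), the start index comes from dividing a byte offset by the pointer size, and each Il2CppClass begins with its image pointer (as in the Il2CppClass definition in the next section). read_memory, offset_to_index and classes_of_image are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// The class list is a flat array of Il2CppClass pointers, so a byte offset
// into it becomes an element index once divided by the pointer size
// (4 on x86, 8 on x64).
constexpr size_t offset_to_index(size_t byte_offset)
{
    return byte_offset / sizeof(uintptr_t);
}

// Collects the classes belonging to `image`, starting at `start_index` and
// stopping at the first populated entry whose Il2CppClass::image (offset 0)
// is a different image. Empty slots are skipped; `limit` bounds the scan.
std::vector<uintptr_t> classes_of_image(uintptr_t class_list, size_t start_index,
                                        uintptr_t image, size_t limit)
{
    assert(class_list != 0);
    std::vector<uintptr_t> result;
    for (size_t i = start_index; i < limit; ++i)
    {
        const auto klass = read_memory<uintptr_t>(class_list + i * sizeof(uintptr_t));
        if (klass == 0)
            continue;                                 // slot not populated
        if (read_memory<uintptr_t>(klass) != image)
            break;                                    // next image's classes begin here
        result.push_back(klass);
    }
    return result;
}
```

Stopping at the first class from a different image, instead of trusting a count, mirrors the manual process described above.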

Il2CppClass

In C#, (almost) everything is a class (besides primitive types like integers and booleans). When the IL to C++ conversion takes place, each of these types/classes is represented by an Il2CppClass structure.


typedef struct Il2CppClass
{
    // The following fields are always valid for a Il2CppClass structure
    const Il2CppImage* image;
    void* gc_desc;
    const char* name;
    const char* namespaze;
    Il2CppType byval_arg;
    Il2CppType this_arg;
    Il2CppClass* element_class;
    Il2CppClass* castClass;
    Il2CppClass* declaringType;
    Il2CppClass* parent;
    Il2CppGenericClass *generic_class;
    const Il2CppTypeDefinition* typeDefinition; // non-NULL for Il2CppClass's constructed from type defintions
    const Il2CppInteropData* interopData;
    Il2CppClass* klass; // hack to pretend we are a MonoVTable. Points to ourself
    // End always valid fields

    // The following fields need initialized before access. This can be done per field or as an aggregate via a call to Class::Init
    FieldInfo* fields; // Initialized in SetupFields
    const EventInfo* events; // Initialized in SetupEvents
    const PropertyInfo* properties; // Initialized in SetupProperties
    const MethodInfo** methods; // Initialized in SetupMethods
    Il2CppClass** nestedTypes; // Initialized in SetupNestedTypes
    Il2CppClass** implementedInterfaces; // Initialized in SetupInterfaces
    Il2CppRuntimeInterfaceOffsetPair* interfaceOffsets; // Initialized in Init
    void* static_fields; // Initialized in Init
    const Il2CppRGCTXData* rgctx_data; // Initialized in Init
    // used for fast parent checks
    Il2CppClass** typeHierarchy; // Initialized in SetupTypeHierachy
    // End initialization required fields

    uint32_t initializationExceptionGCHandle;

    uint32_t cctor_started;
    uint32_t cctor_finished;
    ALIGN_TYPE(8) uint64_t cctor_thread;

    // Remaining fields are always valid except where noted
    GenericContainerIndex genericContainerIndex;
    uint32_t instance_size;
    uint32_t actualSize;
    uint32_t element_size;
    int32_t native_size;
    uint32_t static_fields_size;
    uint32_t thread_static_fields_size;
    int32_t thread_static_fields_offset;
    uint32_t flags;
    uint32_t token;

    uint16_t method_count; // lazily calculated for arrays, i.e. when rank > 0
    uint16_t property_count;
    uint16_t field_count;
    uint16_t event_count;
    uint16_t nested_type_count;
    uint16_t vtable_count; // lazily calculated for arrays, i.e. when rank > 0
    uint16_t interfaces_count;
    uint16_t interface_offsets_count; // lazily calculated for arrays, i.e. when rank > 0

    uint8_t typeHierarchyDepth; // Initialized in SetupTypeHierachy
    uint8_t genericRecursionDepth;
    uint8_t rank;
    uint8_t minimumAlignment; // Alignment of this type
    uint8_t naturalAligment; // Alignment of this type without accounting for packing
    uint8_t packingSize;

    // this is critical for performance of Class::InitFromCodegen. Equals to initialized && !has_initialization_error at all times.
    // Use Class::UpdateInitializedAndNoError to update
    uint8_t initialized_and_no_error : 1;

    uint8_t valuetype : 1;
    uint8_t initialized : 1;
    uint8_t enumtype : 1;
    uint8_t is_generic : 1;
    uint8_t has_references : 1;
    uint8_t init_pending : 1;
    uint8_t size_inited : 1;
    uint8_t has_finalize : 1;
    uint8_t has_cctor : 1;
    uint8_t is_blittable : 1;
    uint8_t is_import_or_windows_runtime : 1;
    uint8_t is_vtable_initialized : 1;
    uint8_t has_initialization_error : 1;
    VirtualInvokeData vtable[IL2CPP_ZERO_LEN_ARRAY];
} Il2CppClass;

Here is what BaseNetworkable looks like when you view it as an Il2CppClass:

Most of the fields aren’t important, so here are only the ones that will be useful at first for an external.


typedef struct Il2CppClass
{
    // The following fields are always valid for a Il2CppClass structure
    const Il2CppImage* image;
    const char* name;
    const char* namespaze;
    ...
    
    // The following fields need initialized before access. This can be done per field or as an aggregate via a call to Class::Init
    FieldInfo* fields; // Initialized in SetupFields
    ...
    void* static_fields; // Initialized in Init
    ...
    uint16_t field_count;
    ...
} Il2CppClass;
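
Since name and namespaze sit at fixed offsets, finding a class by name is just a linear search over the class list. Below is a hypothetical sketch. It computes the offsets from the full Il2CppClass definition (image and gc_desc come first), because the filtered listing skips gc_desc and would give the wrong offsets. The names class_list, find_class and read_memory are placeholders for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Reads a NUL-terminated string, capped at max_len; a null address yields "".
std::string read_c_string(uintptr_t address, size_t max_len = 256)
{
    std::string out;
    for (size_t i = 0; address != 0 && i < max_len; ++i)
    {
        const char c = read_memory<char>(address + i);
        if (c == '\0')
            break;
        out.push_back(c);
    }
    return out;
}

// From the full layout: image, gc_desc, then name and namespaze.
constexpr uintptr_t kClassNameOffset = 2 * sizeof(uintptr_t);
constexpr uintptr_t kClassNamespaceOffset = 3 * sizeof(uintptr_t);

// Linear search of `count` class pointers for namespaze.name; 0 if not found.
uintptr_t find_class(uintptr_t class_list, size_t count,
                     const std::string& namespaze, const std::string& name)
{
    assert(class_list != 0);
    for (size_t i = 0; i < count; ++i)
    {
        const auto klass = read_memory<uintptr_t>(class_list + i * sizeof(uintptr_t));
        if (klass == 0)
            continue;                                   // unpopulated slot
        const auto class_name = read_c_string(read_memory<uintptr_t>(klass + kClassNameOffset));
        if (class_name != name)
            continue;
        const auto class_namespace =
            read_c_string(read_memory<uintptr_t>(klass + kClassNamespaceOffset));
        if (class_namespace == namespaze)
            return klass;
    }
    return 0;
}
```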
Fields are the “members” of the class. Each one is described by a FieldInfo structure, initialized at runtime:

typedef struct FieldInfo
{
    const char* name;
    const Il2CppType* type;
    Il2CppClass *parent;
    int32_t offset; // If offset is -1, then it's thread static
    uint32_t token;
} FieldInfo;
A FieldInfo doesn’t contain the actual data member though, only its offset. Add this offset to the address of an instance of that class and read from the result to get the value.
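
Here is a minimal sketch of both steps: looking up a field's offset by name in the FieldInfo array, then reading the value from an instance. It assumes you already have Il2CppClass::fields and field_count for the class (their offsets depend on the layout and version), that the game has the same bitness as your tool, and a hypothetical read_memory helper in place of a real remote read.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Reads a NUL-terminated string, capped at max_len; a null address yields "".
std::string read_c_string(uintptr_t address, size_t max_len = 256)
{
    std::string out;
    for (size_t i = 0; address != 0 && i < max_len; ++i)
    {
        const char c = read_memory<char>(address + i);
        if (c == '\0')
            break;
        out.push_back(c);
    }
    return out;
}

// FieldInfo as defined above, with pointers stored as addresses in the game.
struct FieldInfo
{
    uintptr_t name;    // const char*
    uintptr_t type;    // const Il2CppType*
    uintptr_t parent;  // Il2CppClass*
    int32_t offset;    // -1 for thread-static fields
    uint32_t token;
};

// Searches the `count` FieldInfo entries at `fields` for a field called `name`.
std::optional<int32_t> find_field_offset(uintptr_t fields, uint16_t count, const std::string& name)
{
    if (fields == 0)
        return std::nullopt;              // fields not set up yet (SetupFields)
    for (uint16_t i = 0; i < count; ++i)
    {
        const auto field = read_memory<FieldInfo>(fields + i * sizeof(FieldInfo));
        if (read_c_string(field.name) == name)
            return field.offset;
    }
    return std::nullopt;
}

// Reads a field's value: instance address + FieldInfo::offset.
template <typename T>
T read_field(uintptr_t instance, int32_t offset)
{
    assert(offset >= 0);                  // thread-static fields (-1) live elsewhere
    return read_memory<T>(instance + static_cast<uintptr_t>(offset));
}
```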

Here is a snippet from an external SDK for the game Rust that I’ve been working on, using my il2cpp external wrapper (blog post in the future), which utilizes this concept.


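PlayerInventory* get_inventory( )
{
	// Resolved once and cached: the class and the field offset don't change while the game runs
	static auto klass = il2cpp::find_class( "Assembly-CSharp", "BasePlayer" );
	static auto offset = klass->get_field_offset( "inventory" );

	// this points at a BasePlayer object in the game's memory; the field holds
	// a pointer to its PlayerInventory, so read it as an address
	return reinterpret_cast<PlayerInventory*>(
		memory::read<uintptr_t>(uintptr_t( this ) + offset)
	);
}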
Assuming the this pointer points to a valid BasePlayer object in the game’s memory (an instance of the class, not its Il2CppClass), the function retrieves the offset of BasePlayer::inventory once, adds it to the pointer and dereferences the result. Basic stuff.

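Just like C++ classes can have static members, C# classes can have static fields, and Il2Cpp has an equivalent. Static fields aren’t stored in any instance; their values live in a separate block of memory pointed to by Il2CppClass::static_fields (0xB8 on x64 for the layout above), and a static field’s FieldInfo offset is relative to that block.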

For example, this is what BaseNetworkable’s static fields look like. If you check the class in dnSpy, there is only one static field, clientEntities, and it is located at offset 0x0.
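
Reading a static field is the same idea with one extra hop through static_fields. The sketch below assumes the 0xB8 offset from the x64 layout above (verify it for your game and Unity version), a hypothetical read_memory helper in place of a real remote read, and that you already have the class address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Offset of Il2CppClass::static_fields for the x64 layout shown earlier.
// Other Unity versions or x86 builds will differ.
constexpr uintptr_t kStaticFieldsOffset = 0xB8;

// Follows Il2CppClass::static_fields, then adds the field's FieldInfo offset.
// Returns nullopt if the block isn't set yet (static_fields is set in Init).
template <typename T>
std::optional<T> read_static_field(uintptr_t klass, int32_t field_offset)
{
    assert(klass != 0 && field_offset >= 0);
    const auto static_fields = read_memory<uintptr_t>(klass + kStaticFieldsOffset);
    if (static_fields == 0)
        return std::nullopt;
    return read_memory<T>(static_fields + static_cast<uintptr_t>(field_offset));
}
```

For BaseNetworkable, something like read_static_field<uintptr_t>(base_networkable, 0x0) would return the clientEntities pointer, where base_networkable is the class address you located (a hypothetical variable).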

Where can I find these classes without parsing the class list?

Il2Cpp keeps data about these classes in its metadata, where each class is represented as a TypeInfo object. If you use a tool like Il2CppDumper or Il2CppAssemblyUnhollower, it will read the metadata and give you an address where you can retrieve a static instance of the class.

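The address may be given in decimal (Il2CppDumper’s script.json, for example, stores plain numbers), so convert it to hex and add it to the base address of GameAssembly to get the static instance.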
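
The arithmetic is simple, but here it is as a sketch. It assumes the value stored at GameAssembly base + address is the class pointer (it may not be valid until the game has used the class), and read_memory is again a hypothetical stand-in for your memory reader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for an external's memory reader (e.g. a wrapper around
// ReadProcessMemory). It copies from this process so the sketch runs anywhere.
template <typename T>
T read_memory(uintptr_t address)
{
    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Formats a decimal address from the dump as hex.
std::string to_hex(uintptr_t value)
{
    std::ostringstream out;
    out << "0x" << std::hex << value;
    return out.str();
}

// The dumped address is relative to GameAssembly's base; the slot at
// base + rva holds a pointer to the class.
uintptr_t resolve_class(uintptr_t game_assembly_base, uintptr_t rva)
{
    assert(game_assembly_base != 0);
    return read_memory<uintptr_t>(game_assembly_base + rva);
}
```

For example, an address of 48879 in the dump is 0xbeef, so you would read a pointer at GameAssembly + 0xbeef.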

Conclusion

I hope this post serves as a decent resource for newcomers who are just starting on Unity Il2Cpp games but have no idea what the hell is going on. Although this is aimed at those looking to create an external, similar concepts can be applied to the Mono backend.

I highly suggest you spend some time digging into the backend yourself, first practicing on a dummy Unity Il2Cpp game (so you can get PDB files and look at the source), and then on actual compiled games.

Resources

  • Il2Cpp Class Structures
  • Il2Cpp Functions
  • Il2Cpp Source
  • Example of an Il2Cpp External API Wrapper
  • Posts about Il2Cpp

Contact

If you’d like to leave criticism, suggest a change or even ask questions, you can find me on UC (@absceptual) or Discord (absceptual#4435)