Beginner’s Guide to Reversing Il2Cpp Games: Structures
Introduction
There seems to be a shortage of game hacking content that explains how to get started with reversing games made on the Unity engine. Most guides out there are either
- made for C# (disgusting…)
- short, and provide the what but not the why (and sometimes not the how)
I hope to turn these guides into YouTube videos soon, so if you have any constructive criticism about the information provided, please do not hesitate to let me know so I can update it accordingly.
Prerequisites
This post is aimed at late-beginner/intermediate game hackers who are moving on from starter games (such as CS:GO, AssaultCube, etc). To get the most out of this blog post, it is important that you are familiar with the following:
- x86/64 memory layout
- class inheritance
- basics of Unity
Background
Although this post focuses on Il2Cpp, most of the information covered can be applied to Mono because of how similar the two are. Sometimes I will actually be referencing both Il2Cpp sources and Mono sources. Without further discussion, let’s hop right into what Il2Cpp is and how it is structured.
What is Il2Cpp?
IL2CPP (Intermediate Language to C++) is a Unity-developed scripting backend that offers several advantages over Mono. Instead of running the compiled C# (IL) in a managed runtime, IL2CPP converts the IL produced from the C# code into C++ code (important), which is then compiled into a native executable for the target platform.
Some key things to note about Il2Cpp:
- it uses AOT (Ahead Of Time) compilation rather than Mono’s JIT (Just In Time) compilation, which affects function calls and hooking (outside the scope of this tutorial)
- C# to C++ means reverse engineering functions will be “harder”
Il2cpp Structure
Il2CppDomain
The Il2CppDomain itself isn’t too important. All you really have to know is that the Il2CppDomain is where all the information about assemblies is stored. It acts as a sandbox for executing managed code, isolating and encapsulating memory for each assembly.
Taking a look at the API, you can see that internally, the Il2CppDomain is used to retrieve assemblies (explained in a moment). This is important as it gives us a hint as to where to look for assemblies externally.
DO_API(Il2CppDomain*, il2cpp_domain_get, ());
DO_API(const Il2CppAssembly*, il2cpp_domain_assembly_open, (Il2CppDomain * domain, const char* name));
DO_API(const Il2CppAssembly**, il2cpp_domain_get_assemblies, (const Il2CppDomain * domain, size_t * size));
il2cpp_domain_get_assemblies actually returns an array of assemblies, which we’ll verify later in this tutorial.
Il2CppAssembly
An Il2CppAssembly is an assembly generated by Il2Cpp when converting from managed code to C++ code. In C#, assemblies are structures that contain data regarding the current image (the current DLL being referenced in this case) and a little more.
typedef struct Il2CppAssembly
{
    Il2CppImage* image;
    uint32_t token;
    int32_t referencedAssemblyStart;
    int32_t referencedAssemblyCount;
    Il2CppAssemblyName aname;
} Il2CppAssembly;
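To make the array idea concrete, here is a minimal sketch of walking a list shaped like the one il2cpp_domain_get_assemblies returns (a pointer to assembly pointers plus a count) and collecting each image name. The structs are trimmed-down stand-ins of my own that only keep the fields used here, and list_assembly_names is a hypothetical helper, not part of the Il2Cpp API. The image name field itself is covered in the next section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trimmed-down stand-ins (assumption): the real structures have more fields.
struct Il2CppImage    { const char* name; };
struct Il2CppAssembly { Il2CppImage* image; };

// Hypothetical helper: collects the image name of every assembly in the array.
std::vector<std::string> list_assembly_names(const Il2CppAssembly** assemblies, std::size_t count)
{
    assert(assemblies != nullptr || count == 0);

    std::vector<std::string> names;
    for (std::size_t i = 0; i < count; ++i)
    {
        const Il2CppAssembly* assembly = assemblies[i];

        // Skip bad entries instead of crashing on them.
        if (assembly == nullptr || assembly->image == nullptr || assembly->image->name == nullptr)
            continue;

        names.emplace_back(assembly->image->name);
    }
    return names;
}
```

In an external, every pointer dereference above would be replaced by a read from the game's memory, but the walk itself stays the same.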
Il2CppImage
After retrieving an Il2CppAssembly you’ll eventually find yourself looking at an Il2CppImage. This structure is pretty short and contains the name of the image, a pointer to the assembly it is assigned to, and the start index and count of the types (classes) it contains.
It is defined like so:
typedef struct Il2CppImage
{
    const char* name;
    const char *nameNoExt;
    Il2CppAssembly* assembly;
    TypeDefinitionIndex typeStart;
    uint32_t typeCount;
    TypeDefinitionIndex exportedTypeStart;
    uint32_t exportedTypeCount;
    CustomAttributeIndex customAttributeStart;
    uint32_t customAttributeCount;
    MethodIndex entryPointIndex;
#ifdef __cplusplus
    mutable
#endif
    Il2CppNameToTypeDefinitionIndexHashTable * nameToClassHashTable;
    uint32_t token;
    uint8_t dynamic;
} Il2CppImage;
Below is an example image. One thing I have noticed is that, more often than not, the typeCount field (the number of classes defined inside this image) will be empty. I’ve found it more reliable to get the class count by dividing the typeStart field by sizeof(uintptr_t).
Why do we need to divide the offset by 4/8?
The class list, as we’ll see soon, arranges its classes linearly, in the order their assemblies appear (by index) in the assemblies list. To iterate through this list without the typeCount field, we need to use the typeStart as a counter (and offset) and add to it after every iteration.
Let me prove this theory.
As you can see, we are looking at a class in the class list. Take note of the index, as it is currently 1 (less than our class_offset).
If you check the image name, you’ll see it is a part of Assembly-CSharp.
Keep going past the class_offset until you find a valid class, and here you can see we are in a new assembly.
Repeat the process (adding the offsets of assemblies we’ve already parsed) and we will encounter a new class in a different assembly.
This predictable pattern of classes in assemblies matches the order they are defined in the assembly list.
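The walk above can be sketched in code. This is only a sketch under my own assumptions: the class list is treated as a flat array of Il2CppClass pointers where unused slots are null, the structs are trimmed down to the fields needed, and count_classes_in_image is a hypothetical helper name. It counts the classes that share an image and reports where the next assembly starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Trimmed-down stand-ins (assumption): only the fields this walk needs.
struct Il2CppImage { const char* name; };
struct Il2CppClass { const Il2CppImage* image; };

// Starting at `start`, counts the entries that belong to the same image as the
// first valid class found. Null entries are treated as holes and skipped.
// On return, `next` is the index where a different image begins (or `size`).
std::size_t count_classes_in_image(Il2CppClass* const* table, std::size_t size,
                                   std::size_t start, std::size_t& next)
{
    assert(table != nullptr);

    const char* current = nullptr; // image name of the run being counted
    std::size_t count = 0;
    std::size_t i = start;

    for (; i < size; ++i)
    {
        const Il2CppClass* klass = table[i];
        if (klass == nullptr || klass->image == nullptr || klass->image->name == nullptr)
            continue;

        if (current == nullptr)
            current = klass->image->name;
        else if (std::strcmp(current, klass->image->name) != 0)
            break; // first class of the next assembly

        ++count;
    }

    next = i;
    return count;
}
```

Calling it again with the returned next index as the new start moves on to the following assembly, which is the "repeat the process" step from above.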
Il2CppClass
In C#, (almost) everything is a class (besides primitive types like integers and booleans). When the IL to C++ conversion takes place, these types/classes are represented by Il2CppClass structures.
typedef struct Il2CppClass
{
    // The following fields are always valid for a Il2CppClass structure
    const Il2CppImage* image;
    void* gc_desc;
    const char* name;
    const char* namespaze;
    Il2CppType byval_arg;
    Il2CppType this_arg;
    Il2CppClass* element_class;
    Il2CppClass* castClass;
    Il2CppClass* declaringType;
    Il2CppClass* parent;
    Il2CppGenericClass *generic_class;
    const Il2CppTypeDefinition* typeDefinition; // non-NULL for Il2CppClass's constructed from type defintions
    const Il2CppInteropData* interopData;
    Il2CppClass* klass; // hack to pretend we are a MonoVTable. Points to ourself
    // End always valid fields
    // The following fields need initialized before access. This can be done per field or as an aggregate via a call to Class::Init
    FieldInfo* fields; // Initialized in SetupFields
    const EventInfo* events; // Initialized in SetupEvents
    const PropertyInfo* properties; // Initialized in SetupProperties
    const MethodInfo** methods; // Initialized in SetupMethods
    Il2CppClass** nestedTypes; // Initialized in SetupNestedTypes
    Il2CppClass** implementedInterfaces; // Initialized in SetupInterfaces
    Il2CppRuntimeInterfaceOffsetPair* interfaceOffsets; // Initialized in Init
    void* static_fields; // Initialized in Init
    const Il2CppRGCTXData* rgctx_data; // Initialized in Init
    // used for fast parent checks
    Il2CppClass** typeHierarchy; // Initialized in SetupTypeHierachy
    // End initialization required fields
    uint32_t initializationExceptionGCHandle;
    uint32_t cctor_started;
    uint32_t cctor_finished;
    ALIGN_TYPE(8) uint64_t cctor_thread;
    // Remaining fields are always valid except where noted
    GenericContainerIndex genericContainerIndex;
    uint32_t instance_size;
    uint32_t actualSize;
    uint32_t element_size;
    int32_t native_size;
    uint32_t static_fields_size;
    uint32_t thread_static_fields_size;
    int32_t thread_static_fields_offset;
    uint32_t flags;
    uint32_t token;
    uint16_t method_count; // lazily calculated for arrays, i.e. when rank > 0
    uint16_t property_count;
    uint16_t field_count;
    uint16_t event_count;
    uint16_t nested_type_count;
    uint16_t vtable_count; // lazily calculated for arrays, i.e. when rank > 0
    uint16_t interfaces_count;
    uint16_t interface_offsets_count; // lazily calculated for arrays, i.e. when rank > 0
    uint8_t typeHierarchyDepth; // Initialized in SetupTypeHierachy
    uint8_t genericRecursionDepth;
    uint8_t rank;
    uint8_t minimumAlignment; // Alignment of this type
    uint8_t naturalAligment; // Alignment of this type without accounting for packing
    uint8_t packingSize;
    // this is critical for performance of Class::InitFromCodegen. Equals to initialized && !has_initialization_error at all times.
    // Use Class::UpdateInitializedAndNoError to update
    uint8_t initialized_and_no_error : 1;
    uint8_t valuetype : 1;
    uint8_t initialized : 1;
    uint8_t enumtype : 1;
    uint8_t is_generic : 1;
    uint8_t has_references : 1;
    uint8_t init_pending : 1;
    uint8_t size_inited : 1;
    uint8_t has_finalize : 1;
    uint8_t has_cctor : 1;
    uint8_t is_blittable : 1;
    uint8_t is_import_or_windows_runtime : 1;
    uint8_t is_vtable_initialized : 1;
    uint8_t has_initialization_error : 1;
    VirtualInvokeData vtable[IL2CPP_ZERO_LEN_ARRAY];
} Il2CppClass;
Here is what BaseNetworkable looks like when you view it as an Il2CppClass:
Most of the fields aren’t important, so let me filter out the ones that will be of use initially for externals.
typedef struct Il2CppClass
{
    // The following fields are always valid for a Il2CppClass structure
    const Il2CppImage* image;
    const char* name;
    const char* namespaze;
    ...
    // The following fields need initialized before access. This can be done per field or as an aggregate via a call to Class::Init
    FieldInfo* fields; // Initialized in SetupFields
    ...
    void* static_fields; // Initialized in Init
    ...
    uint16_t field_count;
    ...
} Il2CppClass;
typedef struct FieldInfo
{
    const char* name;
    const Il2CppType* type;
    Il2CppClass *parent;
    int32_t offset; // If offset is -1, then it's thread static
    uint32_t token;
} FieldInfo;
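With fields and field_count in hand, finding a field's offset by name is just a linear search. Below is a minimal sketch of that lookup. The structs are trimmed-down stand-ins of my own (the real ones have more members, so the real layout must be used when reading a game), and this get_field_offset is a hypothetical free function, not the one from my wrapper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Trimmed-down stand-ins (assumption): real layouts contain more members.
struct FieldInfo
{
    const char*  name;
    std::int32_t offset; // -1 means the field is thread static
};

struct Il2CppClass
{
    const char*   name;
    FieldInfo*    fields;
    std::uint16_t field_count;
};

// Hypothetical helper: writes the offset of `field_name` to `out_offset` and
// returns true, or returns false if the class has no field with that name.
bool get_field_offset(const Il2CppClass* klass, const char* field_name, std::int32_t& out_offset)
{
    assert(klass != nullptr && field_name != nullptr);

    for (std::uint16_t i = 0; i < klass->field_count; ++i)
    {
        const FieldInfo& field = klass->fields[i];
        if (field.name != nullptr && std::strcmp(field.name, field_name) == 0)
        {
            out_offset = field.offset;
            return true;
        }
    }
    return false;
}
```

A separate success flag is used here because -1 is already a meaningful offset (thread static), so it can't double as a "not found" value.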
Here is a snippet from an external Rust SDK I’ve been working on, with my il2cpp external wrapper (blog post in the future) that utilizes this concept.
PlayerInventory* get_inventory( )
{
    static auto klass = il2cpp::find_class( "Assembly-CSharp", "BasePlayer" );
    static auto offset = klass->get_field_offset( "inventory" );
    return reinterpret_cast<PlayerInventory*>(
        memory::read<uintptr_t>(uintptr_t( this ) + offset)
    );
}
Here, the this pointer points to a valid BasePlayer instance. Once the function retrieves the offset for BasePlayer::inventory, it adds it to the pointer and dereferences. Basic stuff.
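If the "add the offset and dereference" step feels abstract, here is a minimal sketch of it that runs in-process. read_memory is a hypothetical stand-in for whatever your external uses to read the game's memory, and read_pointer_field is a made-up helper name. The only point is the arithmetic: instance address plus field offset, then read a pointer-sized value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical in-process stand-in for an external memory read.
template <typename T>
T read_memory(std::uintptr_t address)
{
    assert(address != 0);

    T value{};
    std::memcpy(&value, reinterpret_cast<const void*>(address), sizeof(T));
    return value;
}

// Reads the pointer-sized field stored `offset` bytes into the object at `instance`.
std::uintptr_t read_pointer_field(std::uintptr_t instance, std::int32_t offset)
{
    assert(offset >= 0); // a negative offset would mean a thread static field
    return read_memory<std::uintptr_t>(instance + static_cast<std::uintptr_t>(offset));
}
```

In a real external the body of read_memory would call into the target process instead of using memcpy, while read_pointer_field would stay exactly as it is.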
Just like how C++ classes can contain static fields, there is an Il2Cpp equivalent. These static fields are located at Il2CppClass::static_fields (0xB8), which is just a pointer to a block of memory holding the class’s static fields.
For example, this is what BaseNetworkable’s static fields look like:
If you check the class in dnSpy, there is only one field, clientEntities, and it is located at offset 0x0.
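Reading a static field works the same way as reading an instance field, except the base is the static_fields pointer instead of an object. Here is a minimal in-process sketch. The Il2CppClass below is a one-field stand-in of my own, and read_static_field is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Trimmed-down stand-in (assumption): the real structure has many more members.
struct Il2CppClass { void* static_fields; };

// Hypothetical helper: reads a value of type T stored `offset` bytes into the
// static field block of `klass`.
template <typename T>
T read_static_field(const Il2CppClass* klass, std::uint32_t offset)
{
    assert(klass != nullptr && klass->static_fields != nullptr);

    T value{};
    const unsigned char* base = static_cast<const unsigned char*>(klass->static_fields);
    std::memcpy(&value, base + offset, sizeof(T));
    return value;
}
```

For BaseNetworkable, this would mean reading a pointer at offset 0x0 of the block to get clientEntities.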
Where can I find these classes without parsing the class list?
Il2Cpp keeps data about these classes in its metadata, where each one is represented as a Typeinfo object. If you use a tool like Il2CppDumper or Il2cppAssemblyUnhollower, it’ll read the metadata, from which you can retrieve a static instance of the class.
Convert the address to hex and add it to the base address of GameAssembly to get the static instance.
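That last step is plain arithmetic. Here is a minimal sketch, assuming the tool gave you the address as decimal text (which is why it needs converting) and that you already have the module base of GameAssembly. resolve_type_info is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: adds an address taken from dumper output (assumed to be
// decimal text) to the base address of GameAssembly.
std::uintptr_t resolve_type_info(std::uintptr_t game_assembly_base, const std::string& decimal_address)
{
    assert(!decimal_address.empty());

    // Base 10 parse; the result is the same number you would see in hex.
    const unsigned long long relative = std::stoull(decimal_address, nullptr, 10);
    return game_assembly_base + static_cast<std::uintptr_t>(relative);
}
```

For example, a dumped value of "4096" is 0x1000, so with a (made-up) base of 0x10000000 the result is 0x10001000.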
Conclusion
I hope this post serves as a decent resource for newcomers who are just starting out with Unity Il2cpp games but have no idea what the hell is going on. Although this is aimed at those looking to create an external, similar concepts can be applied to the Mono backend.
I highly suggest you spend some time digging into the backend yourself, first practicing on a dummy Unity il2cpp game (so you can get PDB files and look at the source), and then on actual compiled games.
Resources
- Il2Cpp Class Structures
- Il2Cpp Functions
- Il2Cpp Source
- Example of an Il2Cpp External API Wrapper
- Posts about Il2Cpp
Contact
If you’d like to leave criticism, suggest a change, or even ask questions, you can find me on UC (@absceptual) or Discord (absceptual#4435).