About ORM design
If you work with PHP, you've probably used tools like Eloquent or Doctrine ad nauseam. They are fantastic, saving our lives every day. But have you ever wondered what exactly happens "under the hood" when you write User::find(1)?
Often we use these frameworks as black boxes, trusting blindly. But for us developers, understanding how things work is fundamental (and also fun). In this series of articles, I want to take you behind the scenes of Concrete, an ORM I wrote from scratch.
Let's try to analyze together the why of certain choices, the compromises I had to accept, and how the concepts we read in books (SOLID, Design Patterns) apply (or are ignored!) in the real world.
The problem
The first wall we hit when writing an app is always the same: Impedance Mismatch. It's that technical term indicating the paradigm difference between the two worlds: the object world (made of classes, inheritance, and properties) and the data world (made of tables, rows, and constraints). Indeed, on one side we have our PHP objects, on the other relational tables in the database. They speak two different languages, and the ORM must play the role of simultaneous translator.
For this project, I set myself three simple goals:
- Don't wed any database: Today I use MySQL, tomorrow I want to switch to SQLite for tests without rewriting half a line of code.
- Active Record: Objects must be "smart". For example, an instance of the
Userclass must know how to save itself or delete itself from the database, without having to pass through external managers every time. - But above all: It must be easy! If I have to read the documentation to do a simple query, I have failed.
The architecture
To not tie my hands and feet to a single database, let's create levels of separation. These are abstractions that somehow isolate our code from specific database details. Imagine it like this:
- Connection Layer: It's the physical contact with the DB.
- Driver Layer: It's the "translator" that knows how that specific database speaks. So, ideally, there will be multiple drivers, one for each database.
- ORM Layer: It's the interface we see and use in our code.
The driver level
For the driver level (the "translator"), we will use the Strategy Pattern. Simply put, this pattern allows us to define a family of algorithms, encapsulate them, and make them interchangeable. In our case, the "algorithms" are the different ways of interacting with databases (MySQL, SQLite, etc.), allowing the rest of the application to ignore which technology is actually running under the hood.
In fact, each database has its own characteristics: MySQL uses backticks ` for column names, PostgreSQL uses double quotes, and SQL functions vary slightly.
In practice, we will define a contract, a DriverInterface.
Warning though: here is where many amateur ORMs fall into the SQL Injection trap. If our driver merely returned a SQL string with values concatenated inside (e.g., "... WHERE id = " . $id), it would be enough for a malicious user to insert ' OR '1'='1 to breach our database.
For this reason, our Driver must not return only the SQL, but must separate the query structure from the data. We will use Prepared Statements (or Parameter Binding). The driver will generate a SQL string with "placeholders" (like ? or :value), and return the real values separately. It will then be PDO's job to join them safely.
interface DriverInterface
{
public function connect(string $host, string $db, string $user, string $pass): \PDO;
/**
* Compiles a Select and returns a structure containing SQL and Bindings.
* @return array{0: string, 1: array} Example: ['SELECT * FROM users WHERE id = ?', [1]]
*/
public function compileSelect(string $table, array $columns, array $wheres): array;
public function compileUpdate(string $table, array $values, array $wheres): array;
}
Using this approach, if tomorrow I want to add support for SQL Server, I simply create a SqlServerDriver. He will know that for limits and offsets he must use a different syntax than MySQL, but in the end, he will always return a clean array: [SQL string, Parameters array]. The ORM core will remain safe and unaware of everything.
The connection level
To handle the connection (Database.php), we will use a Singleton (or almost), accessible statically.
class Database
{
private static ?\PDO $pdo = null;
public static function init(DriverInterface $driver, /* ... */)
{
// ... initialization ...
}
public static function getConnection(): \PDO
{
return self::$pdo;
}
}
This implementation might make architecture purists turn up their noses. Indeed, books say global static state is absolute evil. It makes tests harder and hides dependencies. But then why are we doing it?
In reality, we are implementing a pattern called Facade. Imagine the Facade as the reception of a large hotel: you (the client code) speak only with the receptionist (the static class) to get the room key, without worrying about how the electrical, plumbing, or booking system of the hotel behind it works.
Technically, in an Active Record ORM (like the one we are building), we want to be able to write $user->save(). If we didn't have static access to the DB, we would have to initialize every single model passing it the connection: new User($dbConnection). A nightmare to write every time.
Underneath all this, we will use PDO because it already does half the dirty work of abstraction for us. It offers a unified interface to prepare and execute queries, regardless of the underlying driver. Reinventing the wheel here would have been useless.
The ORM level
After establishing the connection and translation architecture, we now focus on the heart of the library. So let's talk about the two key concepts for using the ORM in our projects, namely:
- The Query Builder, responsible for the programmatic construction of SQL queries.
- The Active Record, which maps database rows into objects.
The query builder
The goal of the Query Builder is to provide an object-oriented abstraction for SQL syntax. Instead of manipulating strings, the developer manipulates objects. This means that, rather than writing $builder->query('select * from users where name like %foo%'), we can ideally write $builder->select('users')->where('name', 'like', '%foo%').
This design pattern is called Fluent Interface and allows writing more readable and understandable code. We will see now, but it is usually implemented by having each method return the object itself, allowing method chaining.
Let's analyze the implementation of a method (where) of the query builder:
public function where(string $column, string $operator, $value): self
{
$this->wheres[] = ['type' => 'basic', 'column' => $column, 'operator' => $operator, 'value' => $value];
return $this; // Return the instance itself to allow chaining
}
Each method mutates the internal state of the object and as you can see, returns $this. This allows writing:
$builder->where('age', '>', 18)->orderBy('name')->take(5);
From the State Machine point of view, the
Builderobject accumulates state transitions until a terminal method is invoked (likeget()orfirst()), which triggers compilation and execution.
The use of Traits
When designing a very complex class, such as a Builder can be, one often encounters a problem: the number of lines and the complexity of the class itself. It is easy indeed to reach a point where one can no longer understand which method does what and what dependencies exist between methods. In fact, a SQL builder must handle SELECT, UPDATE, DELETE, JOIN, WHERE, ORDER BY, GROUP BY, etc. and putting everything in a single class would lead to creating an immense and unmanageable black box.
In these cases PHP comes to our aid with Traits, which instead of making us create a complex inheritance hierarchy (as object-oriented programming would want), allows us to compose classes horizontally simply by adding functionalities.
class Builder
{
use Filterable; // WHERE logic
use Orderable; // ORDER BY logic
use Joinable; // JOIN logic
// ...
}
This approach respects the Interface Segregation principle (the "I" of SOLID) at the implementation level. Each trait encapsulates a specific responsibility, making the code more modular and testable. If we wanted to add support for HAVING clauses, we would create a Havingable trait (not the best name, I know) without "polluting" the existing class logic.
Active Record
The Active Record pattern (described by Martin Fowler in Patterns of Enterprise Application Architecture) is an approach where an object "wraps" a row in a database table, encapsulating data access and adding domain logic on that data. By doing so, the model becomes a true "domain object" capable of autonomously performing operations on its own properties, such as validation, event handling, business logic, etc.
Object hydration
Hydration is the process by which the ORM transforms the "raw" results of a SQL query (usually associative arrays returned by the database driver) into meaningful objects. It is one of the most delicate and computationally expensive phases because it is not limited to a simple passing of values, but involves several operations:
- Type Mapping: Converts strings returned by the database into correct types (e.g., strings into
DateTimeobjects, integers, or booleans). - Property Assignment: Maps table columns (e.g.,
created_at) to corresponding class properties (e.g.,$createdAt), often handlingprotectedorprivatevisibility via Reflection. - State Management: The object is marked as "existing" (no longer dirty), allowing the ORM to know if a subsequent
save()should generate anINSERTor anUPDATE.
Without this step, we would work with simple arrays; thanks to hydration, every database row becomes an entity capable of exposing methods and business logic.
// Loop all rows coming from the database and create objects in $results
while ($row = $stmt->fetch()) {
$obj = new $this->modelClass();
$obj->hydrate($row);
$results[] = $obj;
}
In the hydrate method, we will have to populate an $attributes array with the current data coming from the database, and we will also keep a second copy of this data in $original to be able to compare them later.
Ok, said like this it seems like a waste of memory, but in reality, it is an optimization that greatly reduces the database workload.
Dirty checking and query optimization
An error I have seen analyzing some amateur ORMs is executing the UPDATE of all columns every time an object is saved, even if only one field has changed.
Imagine a table with 50 columns and heavy indices. Doing a full update when only the email changed is a waste of I/O and computational resources for the database.
To solve this, we implement Dirty Checking. When the record's save() method is called, the ORM compares the current state ($attributes) with the initial state ($original) and generates an UPDATE query that modifies only the effectively changed columns.
public function getDirty(): array
{
$dirty = [];
foreach ($this->attributes as $key => $value) {
// Check if key exists in original and if value is different
if (!array_key_exists($key, $this->original) || $this->original[$key] !== $value) {
$dirty[$key] = $value;
}
}
return $dirty;
}
If $dirty is empty, the save() method returns immediately without executing any SQL query. If it is not empty, it generates an UPDATE query that modifies only the changed columns. This reduces the database parsing load and, in some engines, reduces row locking or index fragmentation.
Primary key management
A robust system cannot assume that the Primary Key (PK) of a table is always a single auto-incrementing integer. In real contexts, especially in legacy databases or enterprise architectures, one often clashes with natural keys or composite keys (composed of multiple columns).
An ORM, even if basic, must be flexible: it must allow defining which column (or columns) uniquely identify the record and handle the search polymorphically. If we pass a single value, the ORM will query the standard primary column; if we pass an array, it must be able to dynamically construct a multi-column WHERE clause.
In our Model we could define something similar to:
public const PRIMARY_KEY = 'id'; // Default, but overrideable
In the find($id) method, therefore, we will execute a Type Inspection on the input. If $id is an array, the system will deduce that it is a search on a composite key and dynamically adapt the WHERE clause.
if (is_array($id)) {
foreach ($pks as $pk) {
$query->where(static::col($pk), '=', $id[$pk]);
}
}
Conclusions
We have just scratched the surface of what it means to build an ORM from scratch. There are dozens of other aspects we haven't touched: relationships (HasOne, HasMany), Eager Loading to avoid the N+1 problem, transaction management, and model lifecycle events.
I invite you to take a look at the Concrete repository on GitHub, there you can find the complete code of the ORM we tried to design in this article and the complete documentation.
The code is open source and any contribution, be it a Pull Request for a new driver, a bug fix, or even just feedback on the documentation, is absolutely welcome. Building tools from scratch is the best way to learn, but improving them together is what makes us progress.