Data-Oriented Programming

Separate Code from Data!


  • 2022.07.24: first draft completed


Keep the following four rules in mind:

  1. Separate code from data
  2. Represent data with generic data structures
  3. Data is immutable
  4. Separate data schema from data representation

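Following these rules reduces unnecessary complexity and raises the level of abstraction. Data-oriented programming also pairs especially well with APIs: it has no system dependencies, and it simplifies the code of data-centric applications.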


Complexity of object-oriented programming

  • OOP has a tendency to create complex systems
  • Complexity in the context of this book means hard to understand

OOP has the following four characteristics:

  1. Code and data are mixed
    1. Classes tend to be involved in many relations
  2. Objects are mutable
    1. Extra thinking is needed when reading code
    2. Explicit synchronization is required in multi-threaded environments
  3. Data is locked in objects as members
    1. Data serialization is not trivial
  4. Code is locked into classes as methods
    1. Class hierarchies are complex
  • DOP is compatible both with OOP and FP
  • In OOP, code and data are mixed together in classes: data as members and code as methods.
  • Data immutability brings serenity to DOP developers’ minds
  • DOP reduces complexity by rethinking data

Separation between code and data

Separate code from data such that the code resides in functions whose behavior doesn’t depend on data that is somehow encapsulated in the function’s context.


  1. Code modules -> Stateless functions
  2. Data entities -> Only members


  • The system is simple and easy to understand
  • The system is flexible and extensible


  • DOP is against data encapsulation.
  • DOP principles are language-agnostic.
  • The only kind of relation between code modules is the usage relation.
  • The only kinds of relation between data entities are the association and the composition relation.
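
As a minimal sketch of this separation, the snippet below keeps an author as a plain map with only members, and puts the behavior in stateless functions that receive the data as an explicit argument. The names `create_author`, `full_name`, and `is_prolific`, and the field names, are hypothetical and chosen only for illustration.

```python
# Data entity: a plain dictionary that has only members, no methods.
def create_author(first_name, last_name, books):
    return {"firstName": first_name, "lastName": last_name, "books": books}


# Code module: stateless functions. Everything they need arrives as an argument.
def full_name(data):
    return data["firstName"] + " " + data["lastName"]


def is_prolific(author):
    return author["books"] > 100


author = create_author("Isaac", "Asimov", 500)

# full_name is reusable in another context: any map with the two name fields works.
user = {"firstName": "Jane", "lastName": "Doe", "email": "jane@example.com"}

print(full_name(author), is_prolific(author), full_name(user))
```

Because `full_name` depends only on two fields of its argument, it can be tested in isolation and reused for both authors and users without any class hierarchy.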

Basic data manipulation

  • The weak dependency between code and data makes it easier to adapt to changing requirements.
  • In DOP, you can retrieve every piece of information via an information path and a generic function.
  • In DOP, many parts of our code base tend to be just about data manipulation with no abstractions.
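
The idea of an information path plus a generic function can be sketched as follows. The helper `get_in` and the sample `library` data are hypothetical; the notes do not prescribe a specific function name or data layout.

```python
def get_in(data, path, default=None):
    """Follow an information path (a list of keys or indexes) through nested data."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # The path does not exist in this piece of data.
            return default
    return current


# Hypothetical sample data: nested maps and lists, no custom classes.
library = {
    "catalog": {
        "booksByIsbn": {
            "isbn-1": {"title": "Watchmen", "authorIds": ["alan-moore", "dave-gibbons"]}
        }
    }
}

title = get_in(library, ["catalog", "booksByIsbn", "isbn-1", "title"])
second_author = get_in(library, ["catalog", "booksByIsbn", "isbn-1", "authorIds", 1])
missing = get_in(library, ["catalog", "booksByIsbn", "isbn-2", "title"])
```

The same function reads any field at any depth, which is why so much DOP code is plain data manipulation rather than custom accessors.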

State management

  • A mutation is an operation that changes the state of the system.
  • In a multi-version approach to state management, mutations are split into calculation and commit phases.
  • All data manipulation must be done via immutable functions. It is forbidden to use the native hash map setter.
  • Structural sharing creates a new version of the data by recursively sharing the parts that don’t need to change.
  • A function is said to be immutable when, instead of mutating the data, it creates a new version of the data without changing the data it receives.
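
The points above can be sketched with a small immutable `set_in` function and a state holder that separates calculation from commit. Both `set_in` and `SystemState` are hypothetical names, and plain Python dicts stand in for the data; a real system would more likely use a persistent data structure library.

```python
def set_in(data, path, value):
    """Return a new version of nested dicts with `value` stored at `path`.

    Only the dicts along the path are copied. Every other branch is shared
    between the old and the new version (structural sharing).
    """
    if not path:
        return value
    key, rest = path[0], path[1:]
    new_data = dict(data)  # shallow copy of this level only
    # The setter is applied to the fresh copy, never to the data we received.
    new_data[key] = set_in(data.get(key, {}), rest, value)
    return new_data


class SystemState:
    """Holds a reference to the current version of the system data."""

    def __init__(self, data):
        self._current = data

    def get(self):
        return self._current

    def commit(self, next_data):
        self._current = next_data


v1 = {
    "catalog": {"booksByIsbn": {"isbn-1": {"title": "Watchmen"}}},
    "users": {"u1": {"email": "old@example.com"}},
}
state = SystemState(v1)

# Calculation phase: compute the next version without touching v1.
v2 = set_in(state.get(), ["users", "u1", "email"], "new@example.com")

# Commit phase: move the state reference to the new version.
state.commit(v2)
```

After the commit, `v1` is still intact and `v2["catalog"]` is the very same object as `v1["catalog"]`, since that branch did not need to change.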

Basic concurrency control

  • Optimistic concurrency control is lock-free.
  • Managing concurrent mutations of our system state with optimistic concurrency control allows our system to support a high throughput of reads and writes.
  • Before updating the state, we need to reconcile the conflicts between possible concurrent mutations.
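
One possible shape for the reconciliation step is a three-way merge between the version a mutation started from (`previous`), the version it produced (`next_`), and the version the system holds at commit time (`current`). The sketch below is an assumption-laden simplification: the helpers `diff`, `paths`, `merge`, and `reconcile` are hypothetical, they handle nested dicts only, and they ignore deletions.

```python
def diff(old, new):
    """Return the entries of `new` that were added or changed relative to `old`."""
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return new
    changes = {}
    for key, value in new.items():
        if key not in old:
            changes[key] = value
        elif old[key] != value:
            changes[key] = diff(old[key], value)
    return changes


def paths(changes, prefix=()):
    """List the information paths (tuples of keys) touched by a diff."""
    result = []
    for key, value in changes.items():
        if isinstance(value, dict) and value:
            result.extend(paths(value, prefix + (key,)))
        else:
            result.append(prefix + (key,))
    return result


def overlap(p, q):
    """Two paths conflict when one is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]


def merge(base, changes):
    """Apply a diff to `base`, returning a new dict and leaving `base` intact."""
    merged = dict(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merged[key] = merge(base[key], value)
        else:
            merged[key] = value
    return merged


def reconcile(current, previous, next_):
    """Decide what to commit when the state may have moved since `previous`."""
    if current is previous:
        return next_  # nobody else committed: fast-forward
    theirs = paths(diff(previous, current))
    ours = paths(diff(previous, next_))
    if any(overlap(p, q) for p in theirs for q in ours):
        raise ValueError("Conflicting concurrent mutations")
    return merge(current, diff(previous, next_))


previous = {"users": {"u1": {"email": "a@example.com"}}, "books": {"b1": {"copies": 1}}}
current = {"users": {"u1": {"email": "c@example.com"}}, "books": {"b1": {"copies": 1}}}
next_ = {"users": {"u1": {"email": "a@example.com"}}, "books": {"b1": {"copies": 2}}}

reconciled = reconcile(current, previous, next_)
```

No lock is taken while the mutation is calculated. A conflict is detected only at commit time, and the caller can then retry or abort.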

Unit tests

  • Most of the code in a data-oriented system deals with data manipulation.
  • It’s straightforward to write unit tests for code that deals with data manipulation.
  • We avoid using string comparison in unit tests for functions that deal with data.
  • Remember to include negative test cases in your unit tests.
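
A minimal sketch of such a test, using Python's standard `unittest` module. The function `search_books_by_title` and the sample catalog are hypothetical. The assertions compare data structures directly instead of comparing serialized strings, and one test covers the negative case.

```python
import unittest


def search_books_by_title(catalog, query):
    """Return book info maps whose title contains `query` (case-insensitive)."""
    return [
        {"isbn": book["isbn"], "title": book["title"]}
        for book in catalog["booksByIsbn"].values()
        if query.lower() in book["title"].lower()
    ]


# Hypothetical test data: building it takes one literal, no mocks or setup code.
catalog = {
    "booksByIsbn": {
        "isbn-1": {"isbn": "isbn-1", "title": "Watchmen", "publicationYear": 1987},
        "isbn-2": {"isbn": "isbn-2", "title": "Dune", "publicationYear": 1965},
    }
}


class SearchBooksTest(unittest.TestCase):
    def test_matching_title(self):
        # Compare data with data, not with a JSON string.
        self.assertEqual(
            search_books_by_title(catalog, "watch"),
            [{"isbn": "isbn-1", "title": "Watchmen"}],
        )

    def test_no_match(self):
        # Negative test case: a query that matches nothing.
        self.assertEqual(search_books_by_title(catalog, "missing"), [])
```

Saved in a file, the tests can be run with `python -m unittest`. Comparing maps rather than strings keeps the test independent of key order and whitespace in the serialized form.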

Basic data validation

  • The boundaries of a system are defined to be the areas where the system exchanges data.
  • Data validation in DOP means checking whether a piece of data conforms to a schema.
  • When a piece of data is not valid, we get information about the validation failures and send this information back to the client in a human readable format.
  • JSON Schema is a language that allows us to separate data validation from data representation.
  • JSON Schema syntax is a bit verbose.
  • The expressive power of JSON Schema is high.
  • It’s good practice to be strict regarding data that you send and to be flexible regarding data that you receive.
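
To make the idea concrete, here is a sketch in which the schema is itself a piece of data, written in JSON Schema style, and a tiny hand-rolled `validate` function checks data against it. This is not a JSON Schema implementation: it is a hypothetical helper that supports only the `type`, `required`, and `properties` keywords, and the request schema is made up for illustration. A real system would use an existing JSON Schema validator.

```python
def validate(schema, data, path="$"):
    """Return a list of human-readable validation failures (empty when valid)."""
    type_checks = {
        "object": lambda d: isinstance(d, dict),
        "array": lambda d: isinstance(d, list),
        "string": lambda d: isinstance(d, str),
        "number": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    }
    expected = schema.get("type")
    if expected is not None and not type_checks[expected](data):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(data, dict):
        for field in schema.get("required", []):
            if field not in data:
                errors.append(f"{path}: missing required field '{field}'")
        for field, field_schema in schema.get("properties", {}).items():
            if field in data:
                errors.extend(validate(field_schema, data[field], f"{path}.{field}"))
    return errors


# Hypothetical schema for a search request arriving at the system boundary.
search_request_schema = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string"},
        "maxResults": {"type": "number"},
    },
}

ok = validate(search_request_schema, {"title": "Watchmen", "extra": True})
failures = validate(search_request_schema, {"maxResults": "ten"})
```

The unknown field `extra` is accepted, which matches the advice to be flexible about received data, while `failures` holds messages that can be returned to the client.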

Advanced concurrency control

  • Managing concurrency with atoms is much simpler than managing concurrency with locks because we don’t have to deal with the risk of deadlocks.
  • Cloning data to avoid read locks doesn’t scale.
  • When data is immutable, reads are always safe.
  • With atoms, deadlocks never happen.
  • It’s quite common to represent an in-memory cache as a string map.
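
The following sketch shows how an atom can guard an in-memory cache represented as a string map. Python's standard library has no atomic reference type, so the hypothetical `Atom` class below emulates compare-and-swap with a small internal guard; languages such as Java provide this natively (for example `AtomicReference`). The guard is held only for the comparison and assignment, never while user code runs, so callers cannot deadlock on it.

```python
import threading


class Atom:
    """A reference to immutable data that is updated with compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # emulates an atomic compare-and-set

    def get(self):
        # Reads need no coordination: the referenced data is never mutated.
        return self._value

    def compare_and_set(self, expected, new_value):
        with self._guard:
            if self._value is expected:
                self._value = new_value
                return True
            return False

    def swap(self, update):
        """Apply the pure function `update`, retrying if another thread got in first."""
        while True:
            old = self._value
            new = update(old)
            if self.compare_and_set(old, new):
                return new


cache = Atom({})  # in-memory cache represented as a string map


def cache_put(key, value):
    # Build a new map instead of mutating the one currently in the atom.
    return cache.swap(lambda m: {**m, key: value})


threads = [threading.Thread(target=cache_put, args=(f"k{i}", i)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every writer either commits its new map or recomputes it from the latest version, so no update is lost and readers never see a half-updated cache.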

Persistent data structures

  • In practice, manipulation of persistent data structures is efficient even for collections with 10 billion entries!
  • Persistent lists can be manipulated in near constant time.
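
As an illustration only, the simplest persistent list is a singly linked list whose cells are never modified: adding an element in front creates one new cell and shares the entire old list as its tail. The `Node`, `prepend`, and `to_python_list` names are hypothetical, and this is not necessarily the structure the notes have in mind; production libraries typically use wider tree-based structures to get fast access at any index.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a persistent singly linked list."""

    head: object
    tail: Optional["Node"]


def prepend(lst, value):
    """Return a new list with `value` in front; the old list becomes the shared tail."""
    return Node(value, lst)


def to_python_list(lst):
    """Copy the cells into a regular Python list, for display only."""
    items = []
    while lst is not None:
        items.append(lst.head)
        lst = lst.tail
    return items


numbers = prepend(prepend(None, 3), 2)  # the list (2 3)
more = prepend(numbers, 1)              # the list (1 2 3), one new cell allocated
```

Here `more.tail` is the same object as `numbers`, so creating the new version costs a single allocation regardless of the list's length, and the old version remains valid.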

Database operations

  • Representing data from the database as data reduces system complexity because we don’t need design patterns or complex class hierarchies to do it.
  • The best way to manipulate data is to represent data as data.
  • We represent data from the database with generic data collections, and we manipulate it with generic functions.
  • Flexibility is increased as many parts of the system are free to manipulate data without dealing with concrete types.
  • In DOP, field names are just strings. This allows us to write generic functions to manipulate lists of maps representing data fetched from the database.
  • In DOP, fields are first-class citizens.
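
A short sketch of what this looks like in practice. The rows below stand in for the result of a database query, and the helpers `rename_keys` and `group_by` are hypothetical generic functions; neither knows anything about books or authors, because field names are passed in as ordinary strings.

```python
def rename_keys(row, renames):
    """Return a copy of `row` with field names replaced according to `renames`."""
    return {renames.get(key, key): value for key, value in row.items()}


def group_by(rows, field):
    """Group a list of maps by the value of one field."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row)
    return groups


# Hypothetical rows, shaped like the output of a join between books and authors.
rows = [
    {"isbn": "isbn-1", "title": "Watchmen", "author_name": "Alan Moore"},
    {"isbn": "isbn-1", "title": "Watchmen", "author_name": "Dave Gibbons"},
    {"isbn": "isbn-2", "title": "Dune", "author_name": "Frank Herbert"},
]

renamed = [rename_keys(row, {"author_name": "authorName"}) for row in rows]
authors_by_isbn = {
    isbn: [row["authorName"] for row in group]
    for isbn, group in group_by(renamed, "isbn").items()
}
```

The same two functions work unchanged on rows from any other table, which is the sense in which fields are first-class citizens.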

Web Services

  • We build the insides of our systems like we build the outsides.
  • In DOP, the inner components of a program are loosely coupled.

Advanced data validation

  • We define data schemas using a language like JSON Schema for function arguments and return values.
  • We visualize a data schema by generating a data model diagram out of a JSON Schema.
  • Data validation inside the system should be disabled in production.
  • We treat data validation like unit tests.

Polymorphism


  • The main benefit of polymorphism is extensibility.
  • A multimethod is made of a dispatch function and multiple methods.
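
A multimethod can be sketched in a few lines. The `defmulti` helper below is a hypothetical, minimal implementation written for these notes, not the API of any particular library: it takes a dispatch function that computes a dispatch value from the arguments, and methods are registered per dispatch value.

```python
def defmulti(dispatch):
    """Create a multimethod from a dispatch function."""
    methods = {}

    def multi(*args):
        key = dispatch(*args)
        if key not in methods:
            raise ValueError(f"No method for dispatch value {key!r}")
        return methods[key](*args)

    def register(key):
        def decorator(fn):
            methods[key] = fn
            return fn
        return decorator

    multi.register = register
    return multi


# The dispatch function inspects the data, not the class of an object.
greet = defmulti(lambda animal: animal["type"])


@greet.register("dog")
def greet_dog(animal):
    return f"Woof woof! My name is {animal['name']}"


@greet.register("cat")
def greet_cat(animal):
    return f"Meow! I am {animal['name']}"


# Extensibility: a new case is added without modifying any existing code.
@greet.register("cow")
def greet_cow(animal):
    return f"Moo! Call me {animal['name']}"
```

Calling `greet({"type": "dog", "name": "Fido"})` runs the dispatch function, finds the method registered under `"dog"`, and applies it to the same plain map.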

Advanced data manipulation

  • Maintain a clear separation between the code that deals with business logic and the implementation of the data manipulation.
  • We design and implement custom data manipulation functions in a four-step process:
    • Discover the function signature by using it before it is implemented.
    • Write a unit test for the function.
    • Formulate the behavior of the function in plain English.
    • Implement the function.
  • The best way to find the signature of a custom data manipulation function is to think about the most convenient way to use it.


  • In DOP, a function context is made only of data.
  • In modules that deal with immutable data, function behavior is deterministic - the same arguments always lead to the same return values.


P1: Separate code from data

  • Benefits
    • Code can be reused in different contexts.
    • Code can be tested in isolation.
    • Systems tend to be less complex.
  • Costs
    • No control over which code accesses which data.
    • No packaging.
    • More entities.

P2: Represent data with generic data structures

  • Benefits:
    • Using generic functions that are not limited to our specific use case
    • A flexible data model
  • Costs:
    • A slight performance hit.
    • No data schema.
    • No compile-time check that the data is valid.
    • In some statically-typed languages, explicit type casting is needed.

P3: Data is immutable

  • Benefits:
    • Data can be handed to any code with confidence
    • Predictable code behavior
    • Fast equality checks
    • Concurrency safety for free
  • Costs:
    • A performance hit
    • A library for persistent data structures is required

P4: Separate data schema from data representation

  • Benefits:
    • Freedom to choose what data should be validated
    • Optional fields
    • Advanced data validation conditions
    • Automatic generation of data model visualization
  • Costs:
    • Weak connection between data and its schema
    • A small performance hit