Do you have any good resources on this subject? I agree and would like to see wh...

ebonnafoux · 2025-11-07T15:33:48 1762529628

The very classic parse don't valide if you haven't already read it : https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

strongly-typed · 2025-11-07T19:15:08 1762542908

Here's my take. It helps you enforce properties about your data. Didn't mean to make this response so long, but alas.

  (* 
    Quick note on notation: I will use "double quotes" when referring to _values_ and `backticks` when referring to _types_.
  *)

  (* 
    Think of this as an interface. It defines the shape of a module. Notice that the interface describes a module that defines a type called `t`, and two values: "of_string", and "to_string", and they are functions with types: `string -> t`, and `t -> string`. 
  *)
  module type ID = sig
    type t
    val of_string : string -> t
    val to_string : t -> string
  end
  
  (* 
    Below this comment is a module named "Id" that _is of type_ (in other words: it _implements the interface called_) `ID`. Due to the explicit type annotation (Id : ID), now from the perspective of anywhere else in the code, the exported interface of the module "Id" is `ID`.
  
    Modules only contain two things: `type declarations`, and "values". Values are your primitives such as 1, '<', "hello", but also composite such as (fun x -> x + 1), (Some x), f x, { foo = "bar"; baz = 42 }, and even (module Id) (yes! modules can be values too!). Type declarations tell the compiler . Anything which is a value _always_ has a type that can _usually_ be inferred.
  
    No type annotation is necessary when the compiler correctly deduces the type of your value through static analysis. For instance, in the module below, "of_string" is deduced to be of type ('a -> 'a). The ' on the symbol 'a signifies a "type variable", and it means that it can be filled in with any type. For instance (t -> t) and (string -> string), but not (t -> string) or (string -> t). For those it would have to be of type ('a -> 'b). We cannot deduce this type, however, because our implementations do nothing with their inputs besides return them. Since nothing is changed, it's always the same type. 

    Now, can you spot the pink elephant? Notice how the "ID" interface from above defines "of_string" to be of type (string -> t). How can this be possible? It's because we gave the compiler a hint when we said `type t = string`. This says that a "value" of type `t` is backed by a value of type `string`. If something type checks as `t`, it also type checks as `string`. 

    So, we could reason through and say ('a -> 'a) can be instantiated to (t -> t), but `t` is also equal to `string`, so we can mentally imagine a hypothetical intermediate type... something like ({t,string} -> {t,string}). This type and type equality is visible _inside_ the module. But when the `ID` interface was applied over the `Id` module as in (Id : ID), this has the effect of hiding the type equality (the fact that `type t = string`) because in the `ID` interface we define `t` without an equals sign: `type t`. This forces us to _choose_ a concrete type to expose externally, even though the type is less general than what the implementation sees.
  
    NOTE: OCaml doesn't use parens for function definition or application. Compare this OCaml code against its Python equivalent.
  
    > let hello_world h w = (h, w)
    > let h, w = hello_world 1 2
  
    vs.
  
    > def hello_world(h, w):
    >   return (h, w)
    > h, w = hello_world(1, 2)
  *)
  module Id : ID = struct
    type t = string
    let of_string s = s
    let to_string s = s
  end
  
  let main () =
    let s = "abc123" in
    let id = Id.of_string s in
  
    (* NOTE(type error): because the built-in "print_endline" function is of type (string -> unit) and not (Id.t -> unit) *)
    (* NOTE: if an expression returns unit, you don't need to create a let binding for it. You can simply tack a semicolon to the end of it if you need sequence another expression to follow it. *)
    print_endline id;
  
    (* okay *)
    (* STDOUT: abc123 *)
    print_endline (Id.to_string id)
  ;;
  
  main ()

You could imagine implementing this pattern of defining parsers such as "of_string", "of_bytes", "of_json", "of_int", "of_db_row", "of_request", for any piece of input data. You can think of all of these functions as static constructors in OOP... you take in some data, and produce some output value: e.g. "of_string" takes in a `string` and produces a `t`.

Now, if you have a bunch of "values" of type `t`, you know that they _only_ could have been produced by the `of_string` function, because `of_string` might be the _only_ function that ends with `-> t`. Therefore, all the values maintain the same properties enforced by the `of_string` function (similar to class constructors in OOP). With this, you can create types such as `Nonnegative.t`, `Percent.t`, `Currency.t`, `Image.t`, `ProfilePicture.t`, and parsers from another type to the newly minted type.

The compiler can help you enforce these properties by providing guardrails in the form of static compiler checks (these checks are run _before_ your code can even be compiled). If I have a value of type `Nonnegative.t`, then not only do I not need to validate that it's not negative, I also don't have to validate that it's not negative everywhere else that values of that type are used -- the validation logic is baked into the constructor. Parse, don't validate.*