Lies I was told about collaborative editing, Part 2: Why we don't use Yjs
by antics
I'm actually in the middle of rewriting the y-prosemirror binding with Kevin Jahns as we speak, we hope to address a number of the fundamental design choices that were made 6 years ago. I did a presentation on this at FOSDEM this year if anyone is interested in some specifics to the approach we are taking for this: https://fosdem.org/2026/schedule/event/8VKQXR-blocknote-yjs-...
I feel like this post is overly hyperbolic about the choices that an Open Source maintainer made years ago, and no one seemed to care enough to pay him to rewrite it.
The y-prosemirror rewrite is super exciting!
To speak on yjs: We use yjs over at LegendKeeper. We're not a huge app, but our users do worldbuilding for D&D, and have amassed over 30 million collaborative documents, ranging from rich text to fantasy maps to fantasy timelines. Is yjs technically overkill when you have a central tie-breaker? Sure, but the DX is fantastic, and personally I love the idea of my application being truly local-first, even if our core value prop is not necessarily tied to being offline. It also gives me a legacy support plan for our users in case I ever get hit by a bus. :)
On the tech side, you save a lot of cognitive overhead when you can just do:
applyUpdate(docA, update1)
applyUpdate(docB, update1)
and now docA and docB are in the same state, no matter what the context. For centralization, adding in a "well, they'll converge once we add a third party" absolutely increases the cognitive complexity of reasoning about your code, and limits your ability to write clean tests. Centralization buys you a lot, too. I don't think one is correct over the other.
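The convergence property described above can be sketched with a toy state-based CRDT. This is NOT the real Yjs algorithm, just a minimal last-writer-wins map whose `applyUpdate` is commutative and idempotent, so two docs that see the same updates end up in the same state regardless of delivery order:

```javascript
// Toy illustration of order-independent merging -- not Yjs internals.
function createDoc() {
  return { data: new Map() }; // key -> { value, clock, client }
}

function applyUpdate(doc, update) {
  const current = doc.data.get(update.key);
  // Deterministic tie-break: higher clock wins, then higher client id.
  if (
    !current ||
    update.clock > current.clock ||
    (update.clock === current.clock && update.client > current.client)
  ) {
    doc.data.set(update.key, { ...update });
  }
}

const docA = createDoc();
const docB = createDoc();
const u1 = { key: "title", value: "Draft", clock: 1, client: "a" };
const u2 = { key: "title", value: "Final", clock: 2, client: "b" };

applyUpdate(docA, u1); applyUpdate(docA, u2); // order: u1, u2
applyUpdate(docB, u2); applyUpdate(docB, u1); // order: u2, u1
applyUpdate(docB, u2);                        // duplicate delivery is harmless
console.log(docA.data.get("title").value === docB.data.get("title").value);
```

Because the merge rule is deterministic and doesn't depend on arrival order, tests can assert convergence without simulating any particular network.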
There are tradeoffs. There's a memory and CPU cost, and yes, sometimes the "technically merged state" of a yjs-prosemirror document is not what's expected. Over seven years and 150,000+ users, we've never had a single person complain about it.
Hi, the author of Yjs here. Thanks, Nick, for chiming in!
As this article is blowing up now, I want to address a few points.
I, too, feel the need for simplicity over overly complex solutions - and I found it in CRDTs. They beautifully allow me to reason about conflicts - so that my users don't have to. Very few people can design a custom conflict resolution algorithm for an application. Yjs is a general-purpose framework that enables you to make EVERYTHING collaborative. That's the goal.
It's fine if you want to explore different solutions. I don't understand the need to put down one framework in favor of another. It doesn't have to be "OT vs CRDT". Hey, if you found something that works for you - great! But let me tell you that neither solution magically makes everything simpler. There is still a lot to learn.
Different solutions to conflict resolution have different tradeoffs. It's unfortunate that the author of the article attributes all complexity to Yjs. It's just that collaborative editing is a very complex problem and requires a lot of attention to detail. In many regards, Yjs has done very well for the larger ecosystem. In other regards there is room for improvement.
The only thing I acknowledge from the article is the criticism about y-prosemirror "replacing the whole document". Unfortunately, the author extrapolated some false assumptions. This is not a performance issue. y-prosemirror runs at 60 fps even on large documents. It's like arguing React is slow because it replaces the whole document with every edit. We leverage ProseMirror's behavior to do identity checks on the nodes before updating the DOM. However, it's true that this breaks positions for some plugins (e.g. a comment plugin).
Instead of Prosemirror positions, we encourage plugins to use Yjs-based positions, which are more accurate in case of conflicts. Marijn talks about this as well [1]. The collab implementation in Prosemirror does not guarantee that positions always converge. That means, comments could end up in different places for different collaborators. This works in most cases, but in some it doesn't - which is one of the reasons why I prefer CRDTs as a framework to think about conflicts.
But as Nick said, we are currently working on a new y-prosemirror binding that works better with existing plugins.
I'm curious about the section "CRDTs are much, much harder to debug" which ironically talks about how hard prosemirror-collab is to debug. You won't find any such bugs in Yjs. The conflict-resolution algorithm is quite simple and has been battle tested. Before every release, Yjs undergoes extensive fuzz testing for hours in simulated scenarios. I'm very happy to show anyone how to debug a CRDT. It requires some background information, but it ultimately is easier.
To address another unfounded claim by the author: I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios. He won't be able to reproduce the issues he is talking about.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
> I don't understand the need to put down one framework in favor of another.
I didn't take the article as "putting down Yjs", just suggesting it's not the best solution for ProseMirror-backed product use cases patterned after their own.
> y-prosemirror runs at 60 fps even on large documents
Did the OP claim Yjs was slow? Have you created a ProseMirror-backed product of the complexity of Confluence's editor with 16ms frame time targets? The challenge isn't the collab algorithm as much as the CPU time of the plugins, "smart" nodes, and other downstream work triggered by updates. It's incredibly useful to have control over the granularity of updates, and that is IMHO easier when dealing more closely with ProseMirror steps and transactions.
> I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios.
Did the OP claim the Yjs algorithm is incorrect?
> You won't find any such bugs in Yjs. The conflict-resolution algorithm..
I didn't take the debugging section to indicate an issue with Yjs convergence. Nor do most people encounter "bugs" with prosemirror-collab; last update 3 years ago? The debugging challenge is typically around "how did we arrive at this document state" (what steps got us here, where did they come from) and how steps/updates interact with plugins. IMHO discarding the original steps and dealing with a different unit of change complicates that greatly. Especially when dealing with a product in production and a customer's broken document that needs to be fixed and root caused.
Am I correctly understanding that you (Moment) have chosen to use ProseMirror, and that using Yjs with it was the hard part? Or did you mean to say in the article that you used Yjs directly? It would be less prone to misunderstanding if it read "why we don't use y-prosemirror", though you would lose a lot of potential audience for the post.
I tried to understand what was wrong with Yjs, as I'm using it myself, but your issue doesn't really seem to be with Yjs itself but with how it interacts with ProseMirror in your use case. I can see why you're bringing up your points against Yjs, but I'm having a hard time understanding why you don't consider alternatives to ProseMirror directly. Put another way: "because this integration was bad, the source system must also be bad." I do not condone this part of your article. It seems like a sunk cost fallacy to me, and reasoning about it at another's expense, but perhaps not. Hoping to hear back from you.
So, we are basically making two points.
First, the fact that Yjs bindings, by design (for ~6 years), replace the entire document does, in my opinion, indicate a fundamental misunderstanding of what rich text editors need to perform well in any circumstance, not just collaborative ones. As I say in the article... I hope to be able to write another article saying that this has changed and they now "get" it, but for now I do not think it is appropriate to trust Yjs with this task for production-grade editors. I'm sorry to write this, but I do think it's true! I'm not trying to bag on anyone!
Second, and more material: to deploy Yjs to production in the centralized case, I think you are very much swimming against the current of its architecture. Just one example is permissions. There is no established way to determine which peers in a truly-p2p architecture have permissions to add comments vs edit, so you will end up using a centralized server for that. But that's not free: CRDTs are mechanically much more complicated! For example, you have to figure out how to restrict a user to mark-only edits if they have "commenter" access, but allow editing the whole doc for "editor" access. This is trivial in `prosemirror-collab` (say) but it's very hard in Yjs because you have to map it "through" their XML transformations model.
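For a sense of why this is "trivial" with a step-based protocol, here is a hypothetical sketch of a server-side check: inspect each incoming step and reject content edits from users with commenter access. The step shapes below are simplified stand-ins, not real ProseMirror step classes:

```javascript
// Hypothetical permission filter for a step-based collab server.
// Step shapes are illustrative, not actual prosemirror-transform types.
function isAllowed(step, role) {
  if (role === "editor") return true; // editors may submit any step
  if (role === "commenter") {
    // Commenters may only add/remove comment marks, never change content.
    return (
      (step.type === "addMark" || step.type === "removeMark") &&
      step.mark === "comment"
    );
  }
  return false; // viewers may not submit steps at all
}

// Accept a batch only if every step in it passes the check.
function acceptSteps(steps, role) {
  return steps.every((s) => isAllowed(s, role)) ? steps : null;
}

const commentStep = { type: "addMark", mark: "comment", from: 3, to: 9 };
const editStep = { type: "replace", from: 3, to: 9, slice: "new text" };

console.log(acceptSteps([commentStep], "commenter") !== null); // accepted
console.log(acceptSteps([editStep], "commenter") === null);    // rejected
```

The point is that the unit of change arriving at the server is already a semantically meaningful edit, so the policy is a plain predicate over it.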
I'm happy to talk more about this if it's helpful. But yes, we are trying to say some stuff about Yjs specifically, and some stuff about CRDTs generally.
You misunderstand how the "document replacement" in y-prosemirror works. It's like arguing that React is bad because it performs a complete document replacement on every change. The diffing part makes it fast.
That said, it's not without problems - I acknowledge that. But it's not as bad as you make it sound. You didn't list one concrete case where you had issues with it.
I'm very happy that Nick and I finally found funding to make a rewrite happen. It really did take 6 years to make this happen, because it's hard to find funding for open source projects.
FWIW, I'm literally working on rewriting the y-prosemirror binding today with Kevin Jahns, the creator of Y.js, who wrote the initial binding. Yes, the current binding has its flaws, but we hope to flush out the most egregious of them with a completely different design, which I presented at FOSDEM this year: https://fosdem.org/2026/schedule/event/8VKQXR-blocknote-yjs-...
https://github.com/disarticulate/y-webrtc/blob/master/src/y-... has a validMessage function passed into the room. This allows you to validate any update and reject it. It might be "costly", but it lets you inspect the next object. Since Yjs doesn't care about the order of operations, it doesn't really matter how long validation takes.
Not sure what the error conditions look like, but you could probably bootstrap message hashes in a metadata array in the object, along with encryption signatures to prevent unwanted updates to objects.
Just use OT like normal people, it’s been proven to work. No tombstones, no infinite storage requirements or forced “compaction”, fairly easy to debug, algorithm is moderate to complex but there are reference open source implementations to cross check against. You need a server for OT but you’re always going to have a server anyway, one extra websocket won’t hurt you. We regularly have 30-50k websockets connected at a time. CRDTs are a meme and are not for serious applications.
Author here, I did not specifically mention OT in the article, since our main focus was to help people understand the downsides of the currently-most-popular system, which is built on CRDTs.
BUT, since you mention it, I'll say a bit here. It sounds like you have your own experience, and we'd love to hear about that. But OUR experience was: (1) we found (contrary to popular belief) that OT actually does not require a centralized server, (2) we found it harder to implement OT exactly right vs CRDTs, and (3) we found that many (though not all) of the problems CRDTs have are also problems in practice for OT—although in fairness to OT, we think the problems CRDTs have are in general vastly worse for the end-user experience.
If there's interest I'm happy to write a similar article entirely dedicated to OT. But, for (3), as intuition, we found a lot of the problems that both CRDTs and OT have seem to arise from a fundamental impedance mismatch between the in-memory representation of the state of a modern editor, and the representation that is actually synchronized. That is, when you apply an op (CRDT) or a transform (OT), you have to transform the change into a (to use ProseMirror as an example) valid `Transaction` on an `EditorState`. This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
With all of that said, OT is definitely much closer to what modern editors need, in my opinion at least. The less-well-known algorithm we ended up recommending here (which I will call "Marijn Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.
> (1) we found (contrary to popular belief) that OT actually does not require a centralized server
In theory, yes, but in practice, any OT that operates without a central server (or master peer) essentially ends up being a CRDT. A CRDT is a subset of OT, specifically one that adds the requirement of P2P support.
> (2) we found it to be harder to implement OT exactly right vs CRDTs
I would say that each has its own complexity in different areas. CRDT's complexity lies in its data structure and algorithm, while OT's lies in its sync engine (since it must handle race conditions and guarantee deterministic ordering). In my opinion, OT is simpler overall. Hopefully DocNode and DocSync will make OT even easier.
> (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT
Oh, definitely not! OT has many benefits[1]. I think the misconception stems from the common belief that OT should be positional, rather than id-based. In the first case, operations are transformed on other operations. In the second case, operations can also be transformed on the current document (O(1)), eliminating the problems commonly associated with OT. This is the approach I use in DocNode.
> the problems CRDTs have in general are vastly worse to the end-user experience.
This is 100% correct.
____
I feel we were quite lucky back in 2015 when we started rewriting CKEditor with RTC as a core requirement. At the time, OT seemed like the only viable option, so the choice was simple :)
What definitely helped too was having a very specific use case (rich-text editing) which guided many of our decisions. We focused heavily on getting the user experience right for common editing scenarios. And I fully agree that it's not just about conflict resolution, but also things like preserving position mappings. All these mechanisms need to work together for the experience to make sense to the end user.
This is an older piece (from 2018), but we shared more details about our approach here: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri...
One clear issue with OT that we still face today is its complexity. It's been nearly 8 years since we launched it, and we still occasionally run into bugs in OT – even though it sits at the very core of our engine. I remember seeing a similar comment from the Google Docs team :D
I always mentally slotted prosemirror-collab/your recommended solution in the OT category. What’s the difference between the “rebase” step and the “transformation” step you’re saying it doesn’t need?
Great question. Matt has a comment about this here, and he has an actual PhD on the subject! So rather than doing a worse job explaining I will leave it to him to explain: https://news.ycombinator.com/user?id=mweidner
Having a central server is not necessary, but we have one anyway and we use it, especially since we have a permissions system. It lets us use the "Google Wave" algorithm, which vastly simplifies things.
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/...
> This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
Maintaining text editor state is normal. Yes you do need to convert the OT messages into whatever diff format your editor requires (and back), but that's the standard glue code.
The nice thing about OT is that you can just feed the positions of marks into the OT algorithm to get the new positional value. Worst case, you just have the server send the server side position when sending the OT event and the client just displays the server side position.
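Feeding a mark position through the operations can be sketched concretely. This is a minimal, assumption-laden version for insert/delete ops on a flat string (real implementations, e.g. ProseMirror's `StepMap`, handle much richer cases):

```javascript
// Map a position (e.g. a comment mark) through a single operation.
// Operation shapes here are illustrative, not from any real library.
function mapPosition(pos, op) {
  if (op.type === "insert") {
    // Text inserted at or before the position shifts it right.
    return op.at <= pos ? pos + op.text.length : pos;
  }
  if (op.type === "delete") {
    const end = op.at + op.length;
    if (end <= pos) return pos - op.length; // deletion entirely before
    if (op.at >= pos) return pos;           // deletion entirely after
    return op.at;                           // position was inside the deletion
  }
  return pos;
}

// A mark at position 10 moves to 13 after "abc" is inserted at 2,
// and to 6 after 4 characters are deleted starting at 2.
console.log(mapPosition(10, { type: "insert", at: 2, text: "abc" })); // 13
console.log(mapPosition(10, { type: "delete", at: 2, length: 4 }));   // 6
```

Folding a position through the whole op stream (`ops.reduce(mapPosition, pos)` with the arguments flipped) is exactly the "feed the positions into the algorithm" idea described above.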
Josh eloquently explains how Google Wave's DACP (Distributed Application Canceling Protocol) works:
One way to minimize impedance mismatch is to work with DOM-like or JSON-like structures mostly immune to transient bugs, which I am doing currently in the librdx project. It has full-CRDT RDX format[1] and essentially-JSON BASON[2] format. It does not solve all the problems, more like the set of problems is different. On the good side, it is really difficult to break. On the bad side, it lacks some of the rigor (esp BASON) that mature CRDT models have. But, those models are way more complex and, most likely, will have mismatching bugs in different implementations. No free lunch.
[1]: https://github.com/gritzko/librdx/tree/master/rdx [2]: https://github.com/gritzko/librdx/tree/master/json
> CRDTs are a meme and are not for serious applications.
You don't think Figma is a serious application?
By all means, use OT. I worked on OT software for many years - and my work on OT types, ShareJS and ShareDB is still in production all over the place. But I don't think there's anything you can do with OT that you can't do just as well with CRDTs.
The only real benefit of OT is that its simpler to reason about. Maybe that's enough.
> You don't think Figma is a serious application?
I don't know where this popular belief came from. The Figma blog literally says "Figma isn't using true CRDTs"[1].
> The only real benefit of OT is that its simpler to reason about.
That's incorrect. When you free yourself from the P2P restriction that CRDTs are subject to, there's a huge amount of metadata you can get rid of, just to mention one benefit.
[1] https://www.figma.com/blog/how-figmas-multiplayer-technology...
> I don't know where this popular belief came from.
It may be worth reading the whole paragraph of the blog you referenced...
> Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be. There is some unavoidable performance and memory overhead with doing this. Since Figma is centralized (our server is the central authority), we can simplify our system by removing this extra overhead and benefit from a faster and leaner implementation.
> It’s also worth noting that Figma's data structure isn't a single CRDT. Instead it's inspired by multiple separate CRDTs and uses them in combination to create the final data structure that represents a Figma document (described below).
So, it's multiple CRDTs, not just one. And they've made some optimizations on top, but that doesn't make it not a CRDT?
Nowhere does it say it's multiple CRDTs. It says "isn't a single CRDT" and that "it's inspired by multiple separate CRDTs." A bit confusing, I agree.
By the way, I work at Figma.
> When you free yourself from the P2P restriction that CRDTs are subject to...
CRDT stands for "Conflict-free replicating data type". They're just, any data type with an idempotent, commutative merge function defined on the entire range of input. Technically, CRDTs have nothing to do with networks at all.
The simplest "real" CRDT is integers and the MAX function. Eg, you think of an integer. I think of an integer. We merge by taking the max of our integers. This is a CRDT. Would it have eventual consistency? Yes! Would it work in a server-to-client setup? Of course! Does it need any metadata at all? Nope! It doesn't even need versions.
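The MAX-register example above fits in a few lines, and shows why no versions or metadata are needed: `Math.max` is commutative, associative, and idempotent, so replicas converge no matter how (or how often) states are exchanged:

```javascript
// The simplest "real" CRDT: an integer register merged with MAX.
const merge = (a, b) => Math.max(a, b);

let mine = 7;
let yours = 42;

// Any exchange order, including duplicate deliveries, reaches the same state.
mine = merge(mine, yours);   // 42
yours = merge(yours, 7);     // 42
yours = merge(yours, yours); // idempotent: still 42
console.log(mine === yours); // replicas have converged
```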
There's no such thing as a "P2P restriction". Anything that works p2p also works server-to-client. You can always treat the server and client as peers.
> there's a huge amount of metadata you can get rid of, just to mention one benefit.
Can you give some examples of this metadata? In my experience (10-15 years with this stuff now), I've found you can get the same or better performance out of CRDTs and OT based systems if you're willing to make the same set of tradeoffs.
For example, OT has a tradeoff where you can discard old operations. The cost of doing so is that you can no longer merge old changes. But, you can do exactly the same thing in CRDTs, with the same cost and same benefits. Yjs calls this "garbage collection".
In my own eg-walker algorithm, there's 3 different ways you can do this. You can throw away old operations like in OT based systems - with the same cost and benefit. You can keep old metadata but throw away old data. This lets you still merge but you can't see old versions. Or you can keep old edits on the server and only lazy-load them on the client. Clients are small and fast, and you have full change history.
CRDTs generally give you more options. But more options = more complexity.
I'm no p2p idealist. Central servers definitely make some things easier, like access control. But CRDTs still work great in a centralised context.
Ok, replace "P2P restriction" with "idempotent, commutative restriction".
> For example, OT has a tradeoff where you can discard old operations. The cost of doing so is that you can no longer merge old changes.
Why wouldn’t you be able to? My server receives operations, applies them to the document, and discards them. It can receive operations as old as it wants.
___
> Can you give some examples of this metadata?
Yes, it depends on the CRDT, but if we're talking about lists or tree structures with insert and delete operations, this metadata can come in the form of tombstones, or operation logs, or originRight/originLeft, or a DAG. Even with a garbage collector, the CRDT needs to retain some of this metadata.
Yes, you can optimize by not bringing it into memory when it’s not needed. But they’re still there, even though they could be avoided entirely if you assume a central server that guarantees a deterministic ordering of operations.
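The central-server simplification being described can be sketched as follows. This is a deliberately naive model (synchronous broadcast, single process): the server stamps every operation with a global sequence number and clients apply strictly in that order, so a delete really deletes and no per-op metadata is retained:

```javascript
// Naive sketch: a server that totally orders ops, and clients that
// apply them in that order. No tombstones or origin metadata survive.
function makeServer() {
  let seq = 0;
  const listeners = [];
  return {
    submit(op) {
      const stamped = { seq: seq++, op };
      listeners.forEach((fn) => fn(stamped)); // broadcast in global order
    },
    subscribe(fn) { listeners.push(fn); },
  };
}

function makeClient(server) {
  const doc = []; // array of characters
  server.subscribe(({ op }) => {
    if (op.type === "insert") doc.splice(op.at, 0, op.ch);
    if (op.type === "delete") doc.splice(op.at, 1); // nothing is kept behind
  });
  return { doc };
}

const hub = makeServer();
const c1 = makeClient(hub);
const c2 = makeClient(hub);
hub.submit({ type: "insert", at: 0, ch: "a" });
hub.submit({ type: "insert", at: 1, ch: "b" });
hub.submit({ type: "delete", at: 0 });
console.log(c1.doc.join("") === "b" && c2.doc.join("") === "b"); // converged
```

The hard parts of a real sync engine (offline clients, races, retries) live around this core, but the document state itself carries no CRDT bookkeeping.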
Are there any major libraries for OT? I've been looking into this recently for a project at work, and OT would be completely sufficient for our use case, and does look simpler overall, but from what I could tell, we'd need to write a lot of stuff ourselves. The only vaguely active-looking project in JS at least seems to be DocNode (https://www.docukit.dev/docnode), and that looks very cool but also very early days.
Author here. I think it depends what you're doing! OT is a true distributed systems algorithm and to my knowledge there are no projects that implement true, distributed OT with strong support for modern rich text editor SDKs like ProseMirror. ShareJS, for example, is abandoned, and predates most modern editors.
If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.
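The rebase-and-resubmit loop can be sketched in miniature. This is a toy model (insert-only ops on a string, no real transport), not the prosemirror-collab API: the server accepts a batch only if the client is at the current version; otherwise the client fetches the ops it missed, re-maps its own op over them, and retries:

```javascript
// Toy rebase-and-resubmit loop; op and rebase shapes are illustrative.
const central = { version: 0, log: [] };

function serverReceive(version, op) {
  if (version !== central.version) return null; // client is behind: rejected
  central.log.push(op);
  central.version++;
  return central.version;
}

// Rebase a local insert over a concurrent remote insert (shift position).
function rebase(localOp, remoteOp) {
  return remoteOp.at <= localOp.at
    ? { ...localOp, at: localOp.at + remoteOp.text.length }
    : localOp;
}

// Clients A and B both start at version 0.
serverReceive(0, { at: 0, text: "Hi " });        // A commits first
let bOp = { at: 0, text: "yo" };
console.log(serverReceive(0, bOp));               // null: B was behind
const missed = central.log.slice(0);              // B fetches what it missed
bOp = missed.reduce(rebase, bOp);                 // re-map its position
console.log(serverReceive(central.version, bOp)); // accepted on retry
```

The starvation issue mentioned above falls out of this loop: a poorly connected client can keep losing the race and retrying, which is what server-side rebasing avoids.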
If you just want something pedagogically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See the author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
[2]: https://codemirror.net/docs/ref/#collab
[3]: https://codemirror.net/examples/collab/
[4]: https://github.com/ProseMirror/prosemirror-collab
[5]: https://github.com/stepwisehq/prosemirror-collab-commit
When I was starting my research into collaborative editing as a PhD student 20+ years ago, rebase-and-resubmit was well known. It was used in one Microsoft team collab product (I forgot the name). It is 100% legit algo except intermittently-connected clients may face challenges (screw them then).
Unless you have to support some complicated scenarios, it will work. I believe Google Docs initially used something of the sort (diff-match-patch based). It annoyed users with alerts "lets rebase your changes", esp on bad WiFi. So they borrowed proper OT from Google Wave and lived happily since (not really).
One way to think about it: how many users will your product have and how strange your data races / corner cases can get. At Google's scale, 0.1% users complaining is a huge shit storm. For others, that is one crazy guy in the channel, no biggie. It all depends.
TLDR: people invented OT/CRDT for a reason.
First of all, thanks for chiming in! I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."
Second of all, I actually think we're more aligned than it seems here. What we're really advocating for is being super clear about what your end-user goals are, and deriving technology decisions from them, instead of the reverse. Our goals for this technology are: (1) users should be able to predict what happens to their data, (2) the editor always runs at 60fps, and (3) we are reasonably tolerant of transient periods of disconnection (up to, say, 30s-1m).
Because of (1) in particular, a lot of our evaluation was focused on understanding which situations users would be unable to predict what was going to happen to their data. This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data. So, the name of the game is to decrease the likelihood of direct editing conflicts, e.g. presence carets in the live-collab case. In particular, we did not notice a meaningful difference between how users view reconciliations of OT and CRDT implementations.
Since our users could not tell the difference, and in fact viewed all options as equally bad ("disastrous" as one user said), this freed us up to consider a much broader class of algorithms, including prosemirror-collab and prosemirror-collab-commit.
I know there is a rich history of why OT is OT, but our final determination was made pretty simple by the fact that, in our view, the majority of race conditions come from the difficulty of integrating CRDTs and OT directly into modern editing stacks, like ProseMirror. As far as I am aware, prosemirror-collab-commit behaves as well as or better than, say, an OTTypes implementation would on every dimension... and mostly that is because it is native to the expressive `Transaction` model of the modern editor. If we had to do interop I think we would have shipped something noticeably worse, and much slower.
If you have a different experience I would love to hear about it, as we are perennially in the market for new ideas here.
> I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."
I'd be very happy to contribute to this if someone wanted to do some storytelling.
I also interviewed Kevin Jahns, the author of Yjs several years ago to get his take on how Yjs works and how he thinks about it all [1]. The conversation was long but I very much enjoyed it.
> This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data.
That's not been my experience. Have edits been visible in realtime? I think of it as essentially 2 use cases where collaborative editing shows up:
1. Realtime collab editing. So long as both users' cursors are visible on screen at the same time, users are often hesitant to type at the same place & same time anyway. And if any problems happen, they will simply fix them.
2. Offline (async) collab editing. Eg, editing a project in git. In this case, I think we really want conflict markers & conflict ranges. I've been saying this for years hoping someone implements conflicts within a CRDT, but as far as I know, nobody has done it yet. CRDTs have strictly more information than git does about what has changed. It would be very doable for a CRDT to support the sort of conflict merging that git can do. But, nobody has implemented it yet.
In our case, we're not using a text editor, but instead building a spreadsheet, so a lot of these collab-built-into-an-editor solutions are, like you say, pedagogically useful but less helpful as direct building blocks that we can just pull in and use. But the advice is very useful, thank you!
Interesting! I am building a spreadsheet and the next few months will be building the collaborative side of it. I think many of the things that work for text don't necessarily translate for spreadsheets.
We made a spreadsheet on top of OT several years ago. Most OT related documentation doesn't talk about how to do this. But it worked pretty well for us.
Cheers for plugging prosemirror-collab-commit! Nice to see it's getting used more.
Author of DocNode here. Yes, it’s still early days. But it’s a very robust library that I don’t expect will go through many breaking changes. It has been developed privately for over 2 years and has 100% test coverage. Additionally, each test uses a wrapper to validate things like operation reversibility, consistency across different replicas, etc.
DocSync, which is the sync engine mainly designed with DocNode in mind, I would say is a bit less mature.
I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.
I've looked through the site, and right now it's probably the thing I'd try out first, but my main concern is the missing documentation, particularly the more cookbook-y kinds of documentation — how you might achieve such-and-such effect, etc. For example, the sync example is very terse, although I can understand why you'd like to encourage people to use the more robust, paid-for solution! Also just general advice on how to use DocNode effectively from your experience would be useful, things like schema design or notes about how each operation works and when to prefer one kind of operation or structure over another.
All that said, I feel like the documentation has improved since the last time I looked, and I suspect a lot of the finer details come with community and experience.
Thanks! I've recently made some improvements to the documentation. I agree the synchronization section could be improved more. I'll keep your feedback in mind. If you'd like to try the library, feel free to ask me anything on Discord and I'll help you.
Agreed. In my limited experience, conflict resolution rules are very domain specific, whereas CRDTs encourage a lazy attitude that "if it's associative and commutative it must be correct".
What is OT?
Operational Transformation: https://en.wikipedia.org/wiki/Operational_transformation
What does OT stand for so I can learn more?
Operational Transformation
"CRDTs are a meme and are not for serious applications."
That is one hot take!
Let's balance the discussion a bit.
I remember reading Part 1 back in the day, and this is also an excellent article.
I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.
DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:
It’s local-first, which means you have to handle race conditions.
Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.
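That multi-tab case can be sketched roughly like this (illustrative names, not DocSync's actual API; the guard against applying a tab's own echoed messages is one of the races such a layer has to control):

```typescript
// Sketch of offline-capable multi-tab sync over BroadcastChannel.
// Each tab tags outgoing updates with its own id so it can ignore
// its own echoes; applyUpdate is whatever merges an update into the
// local document (e.g. a Yjs or DocNode apply call).
function setupTabSync(
  docName: string,
  applyUpdate: (update: Uint8Array) => void,
) {
  const tabId = Math.random().toString(36).slice(2); // this tab's identity
  const channel = new BroadcastChannel(`doc:${docName}`);
  channel.onmessage = (ev: any) => {
    // Only apply updates that came from a sibling tab.
    if (ev.data.from !== tabId) applyUpdate(ev.data.update);
  };
  return {
    // Call after every local change to notify sibling tabs.
    send: (update: Uint8Array) => channel.postMessage({ from: tabId, update }),
    close: () => channel.close(),
  };
}
```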
DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.
I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.
Hello again Germán! Since the product we make is, basically, a local-first markdown file editor, I would humbly suggest that the less-well-known algorithm we recommend is thus also local-first. But, I fully believe that you do a ton of stuff that we don't, and if we had known about it at the time, we very definitely would have taken a close look! We did not set out to do this ourselves, it just kind of ended up that way.
Cool! We also build client-server sync for our local-first CMS: https://github.com/valbuild/val Just like your DocSync, it has to both guarantee order and sync to multiple types of servers (your own computer for local dev, a cloud service in prod). The base format is RFC 6902 JSON Patches. Read the spec sheet and it is very similar :)
Looks really cool; I would love to use it in my DollarDeploy project. The documentation could still be a bit better: it's not clear whether the content is pure markdown or TypeScript files, or which GitHub repo it synchronizes to. I prefer a monorepo approach.
Awesome feedback! Will update the docs! The content is TS files. You can choose which GitHub repo you want to synchronize to, and a monorepo also works!
Should add: you can read more docs here: https://val.build/docs/create
Tiny fail at undo: insert a 1 before E, hit Ctrl+Z, then move left/right: the left editor moves around E, while the right editor moves around the now-nonexistent 1.
And for real "action" there should be a delay/pause button to simulate conflicts like the ones described in the blog
Yes, the undo issue is a known bug in the website demo because it's messing with Lexical's undo functionality. It's not actually a DocNode bug. I'll fix it soon.
The feedback about the delay/pause button is also good, thanks!
Back around 2000 or 2001, I got the idea for a collaborative editor that also would have had some UI fanciness in it. I abandoned it when I couldn't find a GUI toolkit that had an acceptable level of quality for that UI fanciness, without itself becoming a multi-year project. So I never even got to the point of playing with the actual collaborative editing aspects.
Having watched that space now for the last nearly 25 years... of all the projects I've abandoned over the years, that is the one that I am most grateful I gave up on. The gulf between "hey what if we could collaboratively edit live" and what it takes to actually implement it is one of the largest mismatches between intuition and reality I know of. I had no idea.
And let's not forget that the official paper on Yjs is just plain wrong, the "proofs" it contains are circular. They look nice, but they are wrong.
This was my impression as well. If you ignore the paper and just look at the source code - and carefully study Seph Gentle's Yjs-like RGA implementation [1] - I believe you find that it is equivalent to an RGA-style tree, but with a different rule for sorting insertions that have the same left origin. That rule is hard to describe, but with some effort one can prove that concurrent insertions commute; I'm hoping to include this in a paper someday.
Yes, I think it would be a good paper.
I made a tiny self contained implementation of this algorithm here if anyone is curious:
https://github.com/josephg/crdt-from-scratch/blob/master/crd...
FugueMax (or Yjs) fits in a few hundred lines of code. This approach also performs well (better than a tree-based structure). And there's a laundry list of ways this code can be optimised if you want better performance.
If anyone is interested in how this code works, I programmed it live on camera in a couple hours:
https://www.youtube.com/watch?v=_lQ2Q4Kzi1I
This implementation approach comes from Yjs. The YATA (Yjs) academic paper has several problems, but Yjs's actual implementation is very clever and I'm quite confident it's correct.
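To make the shape concrete, here is a toy RGA-flavored integrate (not Seph's code, and not Yjs's actual tie-breaking rule): concurrent inserts that share a left origin are ordered by item id, which is enough to see why two replicas converge, though it deliberately ignores the nested cases the real algorithms handle.

```typescript
// Each item remembers the id of the item to its left at insertion time
// (its "left origin"). Concurrent siblings with the same origin are
// ordered by item id so every replica picks the same slot.
type Id = string; // e.g. "agent:seq"
interface Item { id: Id; origin: Id | null; value: string; }

function integrate(doc: Item[], item: Item): Item[] {
  // Find the origin's index (-1 means "start of document").
  const originIdx = item.origin === null
    ? -1
    : doc.findIndex((it) => it.id === item.origin);
  // Scan right past concurrent siblings with the same origin that
  // sort before this item.
  let idx = originIdx + 1;
  while (
    idx < doc.length &&
    doc[idx].origin === item.origin &&
    doc[idx].id < item.id
  ) idx++;
  return [...doc.slice(0, idx), item, ...doc.slice(idx)];
}
```

Applying two concurrent inserts at the document start in either order yields the same final document, which is the commutativity property the proofs are about.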
Could you elaborate on that or share a source? It sounds like it'd be not just interesting but important to learn.
https://dl.acm.org/doi/epdf/10.1145/2957276.2957310
Try to understand 3.1-3.4 in this paper, and you'll find that the correctness proof doesn't prove anything.
In particular, when they define <_c, they do this in terms of rule1, rule2, and rule3, but these are defined in terms of <_c, so this is just a circular definition, and therefore actually not a definition at all, but just wishful thinking. They then prove that <_c is a total order, but that proof doesn't matter, because <_c does not exist with the given properties in the first place.
The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.
We're the nerdiest bunch in the world, absolutely willing to learn and adapt the most arcane stuff if it gives us a real or perceived advantage, yet the fact that Google Docs-style CRDTs have completely eluded the profession speaks volumes about their actual usefulness.
> The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.
Hmm -- this seems a bit apples and oranges to me: collaborative editing is sync; git branches, PRs, etc. are all async. This is by design! You want someone's eyes on a merge, that's the whole rationale behind PRs. Collab editing tries to make merges invisible.
Totally different use case, no?
Collaborative coding is a niche but possibly interesting use case. I’m thinking of notebook cells with reactive inputs and outputs. Actually not dissimilar to a spreadsheet in many ways.
Normal documents don't have broken builds when lines are incomplete. It's a completely different situation and makes sense why manually controlling it in chunks is better.
The biggest evidence for collaborative editing is the immense popularity of Google Docs, Notion and Figma.
Just because programming code isn't a good use case for automated conflict resolution doesn't mean everything else isn't.
Just imagine non-technical people using git to collaborate on a report, essay, or blog post. It's never going to happen.
> The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.
But git branches are collaborative editing! They're just asynchronous collaborative editing.
It would be possible to build a git clone on top of CRDTs which had the same merge conflict behaviour. The advantage of a system like that would be that you could use the same system for both kinds of collab editing: realtime collab editing and offline/async collab editing. It's just that nobody has built it yet.
> the fact that Google Docs style CRDTs have completely elided the profession speaks volumes about their actual usefulness.
Software engineers still rely on POSIX files for local cross-app interoperability. Eg, I save a file in my text editor, then my compiler reads it back and emits another file, and I run that. IMO the real problem is that this form of IPC is kind of crappy. There's no good way to get a high fidelity change feed from a POSIX file. To really use CRDTs we'd need a different filesystem API. And we'd need to rewrite all our software to use it.
That isn't happening. So we're stuck with hacks like git, which have to detect and reconstruct all your editing changes using diffs every time you run it. This is why we don't have nice things.
> There's no good way to get a high fidelity change feed from a POSIX file.
Personally, my main point of frustration with git is the lack of a well-supported AST-based diff. Most text in programming is actually just a representation of a graph. I'm sure there is a good reason why it hasn't caught on, but I find that line-based diffs diverging from what could be a useful semantic diff is the main reason merge conflicts happen, and the main reason why I need to stare hard at a pull request to figure out what actually changed.
>Google Docs style CRDTs
Google Docs is OT though.
Also note that our use case is much simpler. The programming language tells you whether your merge created a valid document.
I've never seen that information actually being used in any merge tool, with the notable exception of Visual Studio/C# (where you get symbol resolution for the merged doc, but even there the autogenerated result is a bit hit and miss)
I think the reason is that the algorithms want to be content-agnostic.
But it's of course weird — as a user — to see a conflict resolution tool confidently return something that's not even syntactically valid.
It's disingenuous to suggest that "Yjs will completely destroy and re-create the entire document on every single keystroke" and that this is "by design" of Yjs. This is a design limitation of the official y-prosemirror bindings, which integrate two distinct (and complex) projects. The post implies that this is a flaw in the core Yjs library and an issue with CRDTs as a whole. This is not the case.
It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge: the merged state is "correct" as a structure, but may not match your schema. You have to deal with that in your application (ProseMirror does this for you, if you want it and can live with the invalid nodes being removed).
You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.
One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.
There is some good content in this post, but it's leaning a little too far towards drama creation for my taste.
You can split CRDT libs and compose them however you want, but most teams never get past the blessed bindings, because stitching two moving targets together by hand is miserable even if you know both codebases. Then you're chasing a perf cliff and weird state glitches every time one side revs.
In theory you can write better bindings yourself. In practice, if the official path falls over under normal editing, telling people to just do more integration work sounds a lot like moving the goalposts.
Author here, sorry if this was not clear: that specific point was not supposed to be an indictment of all CRDTs; it was supposed to be much narrower. Specifically, the Yjs authors clearly state that they purposefully designed its interface to ProseMirror to delete and recreate the entire document on every collab keystroke, and the fact that this stayed open for 6 YEARS before they started to try to fix it does, in my opinion, indicate a fundamental misunderstanding of what modern text editors need to behave well in any situation. Not even a collaborative one. Just any situation at all.
I think it's defensible to say that this point in particular is not indicting CRDTs in general, because I do say the authors are trying to fix it, and then I link to the (unpublicized) first PR in that chain of work (which very few people know about!), and I specifically spend a whole paragraph saying I hope I am forced to write an article in a year about how they figured it all out! If I was trying to be disingenuous, why do any of that?
> sorry if this was not clear
It's easy to make that mistake reading your post because of sentences like
> I want to convince you that all of these things (except true master-less p2p architecture) are easily doable without CRDTs
> But what if you’re using CRDTs? Well, all these problems are 100x harder, and none of these mitigations are available to you.
It sure sounds a lot like you're calling CRDTs in general needlessly complex, not just the yjs-prosemirror integration.
To be clear, we ARE arguing that CRDTs are needlessly complex for the centralized-server use case. What I am describing in the "delete and replace all on every keystroke" problem is the point at which it became clear to me that the project did not understand what modern text editors need to perform well in any circumstance, let alone a collaborative one.
I think this is still reasonable to say, because the final paragraph in that section is 100% about how they might fix the delete-all problem, and I hope they do, so that I can write about that, too. But note also that the rest of the article is about how you have to swim upstream against their architecture to accomplish things that are either table stakes or trivial in other solutions.
> To be clear, we ARE arguing CRDTs needlessly complex for the centralized server use case.
I've been working in the OT / CRDT space for ~15 years or so at this point. I go back and forth on this. I don't think its as clear cut as you're making it out to be.
- I agree that OT based systems are simpler to program and usually simpler to reason about.
- Naive OT algorithms perform better "out of the box". CRDTs often need more optimisation work to achieve the same performance.
- But with some optimisation work, CRDTs perform better than OT based systems.
- CRDTs can be used in a client/server model or p2p. OT based systems generally only work well in a centralised context. Because of this, CRDTs let you scale your backend. OT (usually) requires server affinity. CRDT based systems are way more flexible. Personally I'd rather complex code and simpler networking than the other way around.
- Operation based CRDTs can do a lot more with timelines - eg, replaying time, rebasing, merging, conflicts, branches, etc. OT is much more limited. As a result, CRDT based systems can be used for both realtime editing and for offline asynchronous editing. OT only really works for online (realtime) editing.
(For anyone who's read the papers, I'm conflating OT == the old Jupiter based OT algorithm that's popular in google docs and others.)
CRDTs are more complex but more capable. They can be used everywhere, and they can do everything OT based systems can do - at a cost of more code.
You can also combine them. Use a CRDT between servers and use OT client-to-server. I made a prototype of this. It works great. But given you can make even the most complex text based CRDT in a few hundred lines anyway[1], I don't think there's any point.
> But with some optimisation work, CRDTs perform better than OT based systems.
I read your paper and I think this is a mistake. You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations. This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.
> CRDTs let you scale your backend. OT (usually) requires server affinity. CRDT based systems are way more flexible. Personally I'd rather complex code and simpler networking than the other way around.
All productivity apps that use these tools in any way shard by workspace or user, so OT can scale very well.
If you don't scale CRDT that way, by the way, you'd be relying too much on "eventual consistency" instead of "consistency as quickly as possible."
> (For anyone who's read the papers, I'm conflating OT == the old Jupiter-based OT algorithm that's popular in Google Docs and others.)
Similar to what I said before. I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.
> I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.
I haven't kept up with the OT literature after a string of papers "proved correctness" for systems that later turned out to have bugs. And so many of these algorithms have abysmally bad performance. I think I implemented an O(n^4) algorithm once to see if it was correct, but it was so slow that I couldn't even fuzz test it properly.
> You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations.
If you go down that road, we can make systems which are both OT and CRDT based at the same time. Arguably my eg-walker algorithm is exactly this. In eg-walker, we transform operations just like you say - using an in memory document model. And we get most of the benefits of OT - including being able to separately store unadorned document snapshots and historical operation logs.
Eg-walker is only a CRDT in the sense that it uses a grow-only CRDT of operations, shared between peers, to get the full set of operations. The real work is an OT system, that gets run on each peer to materialise the actual document.
> This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.
Citation needed. I've published plenty of benchmarks over the years from real experiments. If you think I'm wrong, do the work and show data.
My contention is that the parts of a CRDT which make them correct in P2P settings don't cost performance. What actually matters for performance is using the right data structures and algorithms.
> Citation needed
It seems to me the burden of proof is on you. You were the one who claimed that “CRDTs perform better than OT-based systems.” I’m simply denying it. My reasoning is that CRDTs require idempotence and commutativity, while OTs do not. What requirement does OT have that CRDT does not? Because if there isn’t one, then by definition your claim can’t be correct. And if there is one, that would be new to me, although I suspect you might be using a very particular definition of OT.
> the fact that it stayed open for 6 YEARS before they started to try to fix it...
This is all opensource software, provided for free by volunteers. If you want better bindings, go write them. Or pay someone else to do so.
Fantastic article. I was particularly interested because WordPress has been working to add collaborative editing and the implementation is based on yjs. I hope that won't end up being an issue...
It would have been nice if the article compared yjs with automerge and others. Jsonjoy, in particular, appears very impressive. https://jsonjoy.com/
The transport for collaborative editing in WordPress 7.0 is HTTP polling. Once per second, even if no one else is editing. It jumps to 4 requests/sec if just two people are editing. And it's enabled by default on all sites, though that might not be the case when it leaves beta.
The transport is a completely different concern... (though there's also a websocket implementation).
They use Yjs: https://make.wordpress.org/core/2026/03/10/real-time-collabo...
The PowerSync folks and I worked on a different approach to ProseMirror collaboration here: https://www.powersync.com/blog/collaborative-text-editing-ov... It is neither CRDT nor OT, but does use per-character IDs (like CRDTs) and an authoritative server order of changes (like OT).
The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.
This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo features to "map" the remote change through the rollback & re-apply steps.
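The rollback-and-reapply loop described above can be sketched on a toy character-offset model (this is not PowerSync's actual code; `rebase` here is a hypothetical stand-in for whatever maps a pending local op over a remote one):

```typescript
// A pending op is an insert at a character offset.
interface Op { pos: number; text: string; }

function apply(doc: string, op: Op): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Shift a pending local insert right if a remote insert landed before it.
function rebase(local: Op, remote: Op): Op {
  return remote.pos <= local.pos
    ? { ...local, pos: local.pos + remote.text.length }
    : local;
}

function onRemoteChange(
  lastServerDoc: string, // last confirmed server state
  pending: Op[],         // local ops not yet confirmed by the server
  remote: Op,
): { doc: string; pending: Op[] } {
  // 1. Roll back to the confirmed state, 2. apply the remote op,
  // 3. re-apply pending local ops, rebased over the remote one.
  let doc = apply(lastServerDoc, remote);
  const rebased = pending.map((op) => rebase(op, remote));
  for (const op of rebased) doc = apply(doc, op);
  return { doc, pending: rebased };
}
```

The "implementation shortcut" is then that the resulting `doc` replaces the editor state wholesale, rather than being diffed against the previous state to produce a minimal change for ProseMirror.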
Hi Matt! Good to see you here. For those who don't know, Matt also wrote a blog about how to do ProseMirror sync without CRDTs or OT here: https://mattweidner.com/2025/05/21/text-without-crdts.html and I will say I mostly cosign everything here. Our solution is not 100% overlap with theirs, but if it had existed when we started we might not have gone down this road at all.
Your part 1 post was one of the inspirations for that :)
Specifically, it inspired the question: how can one let programmers customize the way edits are processed, to avoid e.g. the "colour" -> "u" anomaly*, without violating CRDT/OTs' strict algebraic requirements? To which the answer is: find a way to get rid of those requirements.
*This is not just common behavior, but also features in a formal specification [1] of how collaborative text-editing algorithms should behave! "[The current text] contains exactly the [characters] that have been inserted, but not deleted."
[1] http://www.cs.ox.ac.uk/people/hongseok.yang/paper/podc16-ful...
Great article - you mentioned the "two most popular families of collab editing [...] OT and CRDT". One thing you should look into is the work of https://braid.org - it's combining CRDT with OT. Work inspired by that forms the basis of Loro, which allows pruning history (helping with the tombstone issue you mentioned).
Yjs works perfectly. I've used it for years on PlayCode. But you are talking about the specific plugin for ProseMirror.
Yes, here I agree: the Yjs core is well written, while the plugins are "nice to have".
Alternatively, a much simpler CRDT solution is to flatten our tree and build an LWW map underneath it. This makes it easy to debug, save, and delete the history: { "id:1": { "parent_id": "root", "type": "p" }, "id:2": { "parent_id": "id:1", "type": "text", "content": "text", "position": 1 } }
Or internally: [ [HLC, "id:2", "parent_id", "id:1"], [HLC, "id:2", "type", "text"], ... ]
Merging is easy, and it allows for atomic modifications without rebuilding the entire tree, as well as easy conflict resolution. We add an HLC (clock, peer id). If the time difference between the two clocks is significant, we create a new field: [HLC, id, "conflict:" + key, old_value]
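A minimal sketch of that merge rule (types and names are illustrative, not any particular library): each (id, key) field keeps the write with the highest HLC, breaking ties on peer id, so merging two op logs in either order gives the same result.

```typescript
// A hybrid logical clock paired with a peer id for tie-breaking.
type HLC = [clock: number, peer: string];
// One flattened write: [hlc, node id, field key, value].
type Write = [hlc: HLC, id: string, key: string, value: string];

function newer(a: HLC, b: HLC): boolean {
  return a[0] !== b[0] ? a[0] > b[0] : a[1] > b[1];
}

// Last-writer-wins merge over the flattened field writes.
function merge(a: Write[], b: Write[]): Map<string, Write> {
  const out = new Map<string, Write>();
  for (const w of [...a, ...b]) {
    const k = `${w[1]}/${w[2]}`; // one slot per (node id, field)
    const prev = out.get(k);
    if (!prev || newer(w[0], prev[0])) out.set(k, w);
  }
  return out;
}
```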
Couldn't agree more with the gist of the argument, especially in the context of ProseMirror.
That's why I created prosemirror-collab-commit.
Hi folks, author here. I thought this was dead! I'm here to answer questions if you have them.
EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!
Hi Alex, I'm the author of prosemirror-collab. I agree with your point that CRDTs are not the solution they often claim to be, and that CRDT editor integrations (at least the ones for Yjs and Automerge) are often shockingly sloppy.
But, seeing how I've had several people who read your article write me asking about this miraculous collab implementation, I want to push back on the framing that ProseMirror's algorithm is 'simple' or '40 lines of code'. The whole document and change model in ProseMirror was designed to make something like prosemirror-collab possible, so there is a lot of complexity there. And it has a bunch of caveats itself—the server needs to hold on to changes as long as there may be clients to need to catch up, and if you have a huge amount of concurrent edits, the quadratic complexity of the step mapping can become costly, for example. It was designed to support rich text, document schemas, and at least a little bit of keeping intentions in mind when merging (it handles the example in the first post of your series better, for example), but it's not a silver bullet, and I'd hate for people to read this and go from thinking 'CRDT will solve my problems' to 'oh I need to switch to ProseMirror to solve my problems'.
Just wanted to say thanks! This is a great write up and resonates with issues I encountered when trying to productionise a yjs backed feature.
I think Y.js 14 and the new y-prosemirror binding fix a lot of the encountered issues
It might fix the replace-everything bug. It definitely does not fix any of the other issues I mentioned. Even just taking the permissions problem: Yjs is built for a truly p2p topology, so as a baseline you will have a very hard time establishing which peers are and aren't allowed to make which edits. You can adopt a central server, but then the machinery that makes Yjs amenable to p2p is uselessly complicated. And if you cross that bridge, you'll still have to figure out how to let some clients make mark-only edits to the document for things like comments, while others can edit the whole text. That can be done, but it's not at all straightforward, because position mapping is very complicated in the Yjs world.
It should be noted that this is about text editing specifically; for other use cases Yjs uses other code paths/algorithms, but you have to be careful how you design your data structure for atomic updates.
I'm curious how these approaches compare with MRDTs implemented in Irmin
Collaborative editing looks deceptively simple until you deal with real-world concurrency and network issues. Operational transforms and CRDTs both introduce their own tradeoffs.
I read both parts. Well written, I agree with a lot of stuff.
I am long-time CKEditor dev, I was responsible for implementing real-time collaboration in the editor and the OT implementation.
Regarding the first part of your article. Guess what - CKEditor would output "" :). And even better, if the user who deleted all does undo, you'd get "u" where it was typed originally.
However, I fully agree that for every algorithm you will be able to find a scenario where it fails to resolve a conflict in the way the user expects. But we cannot ask the user to resolve a conflict manually every time it happens.
Offline editing, as you correctly observed, is more difficult, because the conflicts pile up, and multiple wrong decisions can result in a horrifying final result. I fully agree that this is not only an algorithmic problem but also a UX problem. Add to this that in many apps you will also have other (meta)data that has to be synced (besides document data).
CKEditor is, in theory, ready for offline editing. From the algorithm's POV, offline is no different from a very, very, very slow connection (*). In the end, you receive a set of operations to transform against another set of operations. However, currently we put the editor in a read-only state when the connection breaks. We are aware that even if all transformations resolve as expected, the end result may still be "weird". And even if the end result is actually as expected, the amount of changes may be overwhelming to a person who just got their connection back, so it still may be good to provide some UI/UX to help them understand what happened.
(*) - that is, unless the editing session on the server ended already, and, simply saying, you don't have anything to connect to (to pull operations from).
Regarding OT: I have a feeling that one mistake most people make is that they take OT as it is described in some paper or article and don't want to iterate on the idea. To me, this is not just one algorithm, but rather an idea of how to think about and manage changes happening to the data.
For CKEditor, from the very beginning, we were forced to innovate over typical OT implementations. First of all, we focused on user intentions. Second, we needed to adapt it to a tree data structure. These challenges shaped my way of thinking: OT is "an idea", and you need to adapt it to your project. Someone here asked if there's a library for OT, because they want to use it for spreadsheets. I'll say: write it on your own and adapt it to spreadsheets. You'll discover that maybe you don't need some operations, or maybe you need new operations dedicated to spreadsheets. This is what we ended up doing. @Reinmar already posted this link here, but we describe our approach at https://ckeditor.com/blog/lessons-learned-from-creating-a-ri....
Circling back to your example with typing while a whole sentence is removed. This is how you innovate over OT. To us, such a deletion is not deleting N singular characters starting from position P. The intention is to remove a continuous range of text. If someone writes inside the range, that just changes the boundary of the stuff to remove; we surely don't want to show some random letters after the deletion happens. We account for that and make modifications in our OT implementation.
Similarly with positions in the document. In CKEditor, you can use LivePositions and LiveRanges, which are basically paths in the tree data structure. Every position is transformed by operations too. Many of our features are based on that.
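The core idea of a position that survives edits can be sketched on plain character offsets (CKEditor's real LivePositions work on tree paths, so this is just the flavor):

```typescript
// Shift a stored position through an insert or delete operation so it
// keeps pointing at the "same" place in the document.
type InsertOp = { type: "insert"; pos: number; length: number };
type DeleteOp = { type: "delete"; pos: number; length: number };

function mapPosition(pos: number, op: InsertOp | DeleteOp): number {
  if (op.type === "insert") {
    // Inserts at or before the position push it right.
    return op.pos <= pos ? pos + op.length : pos;
  }
  // Deletions entirely before the position pull it left.
  if (op.pos + op.length <= pos) return pos - op.length;
  // Positions inside the deleted range collapse to its start.
  if (op.pos <= pos) return op.pos;
  return pos;
}
```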
So, my take here is: don't bash OT because you based your experience on some simple implementations. And possibly the same goes for Yjs. Don't bash CRDTs because Yjs is doing something badly?
And some final words regarding the second part.
We also follow the same pattern as your diagram shows in the "How the simple thing works" section. As I was reading through the article and looking at the provided examples, it was hard for me not to think that what's happening is some kind of OT variant, maybe simplified, or maybe adapted to some specific cases. There are strong similarities between what you described and CKEditor 5, and we use OT. Looking at this from a top-level view, I could say, "well, we do the same". We have the same loop with conflict resolution; we just call "rebase" a "transformation", and instead of "steps" we have "operations".
Also, you say it is 40 LOC, but how much magic happens in `step.apply()`? How much of the architecture was built to make it possible? Even Marijn makes this comment here: https://news.ycombinator.com/item?id=47409647.
For comparison, this is CKEditor's file that includes the OT functions to transform operations: https://github.com/ckeditor/ckeditor5/blob/master/packages/c.... It's 2600 LOC (!), but at least most of it is comments :). Again, the basic idea of OT is very simple (and this implementation could be simpler; we also learned a lot in the process). It's up to you how deeply you want to delve into solving "user intention" issues.
> Also, you say it is 40LOCs, but how much magic happens in `step.apply()`?
Right, but if you are already using ProseMirror, that infrastructure is in place whether you are taking advantage of it directly or bolting Yjs on top.
we're about to implement collaborative editing at Mintlify and were considering yjs so this couldn't have come at a better time
Author here, my personal mission is for people implementing this to have clear, actionable advice. Which is something we did not when we started. If you want to chat about it I'm happy to help, just email me: clemmer.alexander@gmail.com
Replacing CRDT with 40 lines of code. Amazing.
It appears Moment is producing "high-performance, collaborative, truly-offline-capable, fully-programmable document editor" - https://www.moment.dev/blog
There seems to be a conflict of interest in describing Yjs's performance, since Moment basically does the same thing as Yjs and Automerge.
Author here. To be clear, we do not in ANY WAY compete with Yjs! We are a potential customer of Yjs. This article explains why we chose not to be a customer of Yjs, and why we don't think most people building real-time collaborative text editors should be, either.
You have an amazing tagline. This is the first time I read a tagline and thought: this is exactly what I was looking for.
But the product seems much narrower than an actual tool to run the whole business in markdown. I was hoping to see Logseq on steroids, and it feels like a tool builder primarily. I love the tool building aspect, but the fundamentals of simply organizing docs (docs, presentations, assets etc., the basics of a business) are either not part of the core offering or not presented well at all.
I love the idea of building custom tools on top of MD and it's part of my wishlist, but I feel a little deceived by your tagline, so I wanted to share that :)
This is great feedback, thank you. I will say that IS our goal... but we only really launched last week and are still figuring out what resonates with people and what they really want! It sounds like you're saying that the organization aspects are not there, which is very helpful to know... I am not quite sure whether you also think the tool building is lacking?
If you are open to it, I'd love the opportunity to hear more. Here or email (alex@moment.dev) or our Discord (bottom right of our website) or Twitter/X... or whatever you prefer.
No, the tool building looks very sophisticated and powerful and I love that it hinges very much on the new era of building your own custom tools with the help of agents. The live collaboration on top of md files is also exactly what I was looking for!
If you're saying that Logseq on steroids is what you're aiming for, then my immediate feedback would be to emphasize more:
- the writing experience: at the end of the day, writing and taking notes will be the most common activity
- the file organisation: tags, templates, media files; does it do the basics?
- the sharing and access mechanism: can I easily share a doc with a partner / client?
Those are the basics of daily business tasks for my consultancy, and so the first thing I'm looking for. I really wish to get off Google drive, but those points need to be solved for that to sound feasible.
As for the tool building it looks very powerful, but the first example you presented (on-call dashboard), was a bit too much from the get go to wrap my head around the building blocks of your system. I've been building custom tools/wrappers of varied complexity on top of markdown for my team, from a custom revealJS skill that follows our design guide, to a form builder to a project/client DB that wraps duckdb (for yaml frontmatter parsing) with a semantic layer. I've watched your intro video but I'm still not sure whether your service would help me more closely integrate those tools to my company's knowledge base or not.
But once again, if your vision matches your tagline, then I'm really looking forward to hearing more from you.
That doesn't make sense. Being a customer implies you pay for it; people can be users of Yjs, which is free and open-source, but not customers.
The logic that makes sense is that you are using your own framing (Moment.dev will later be paid and people will be customers) to interpret Yjs.
Moreover, the 'social proof' posted later on by 'auggierose' and 'skeptrune': - https://news.ycombinator.com/item?id=47396154 - https://news.ycombinator.com/item?id=47396139
appears, to me, to be manufactured. I've noticed a degree of consolidation in this 'SF/Bay Area tech cult' (though I am unsure if others are aware of it): members try to help each other at the expense of quality, growing network wealth through favoritism rather than adherence to quality, which runs counter to the interests of users who want high-quality software without capture.
While you may not like me describing this, it is not in your own interest to do this because it catabolizes the base layer that would sustain you. Social media catabolizes actual social networks, as AI catabolizes those who write information online. Behavior like this ruins the public commons over time.
I'm not sure I fully understand, but to be clear, we actually do voluntarily pay for the Free and OSS software we use. For example, we support `react-prosemirror` directly with monetary compensation. And if we used Yjs, we would have paid for that too. So in that sense, I do think of us as customers!
It's hard to tell, but I think you also might be saying that criticizing the FOSS foundations of our product actually hurts the ecosystem. I actually am very open to that, and it's why we took so much time writing it since part 1 came out. But the Yjs-alternative technology we use is all also F/OSS, and we also do directly support it, with actual money from our actual bank account. All I'm recommending here is that others do the same. Sorry if that was not clear.
The rest of your reply, I'm not sure I grok. I think you might be suggesting that we are sock-puppeting `auggierose` or `skeptrune`, and that we are part of some (as you put it) "cult" of the Bay Area! Let me be clear that neither of these things is true. I don't know anyone at Mintlify personally, and in any event we are from Seattle, not the Bay!
No, you're not sock-puppeting it yourself. But you all are probably friends and cross-promoting. It's a common business strategy these days, but it seems somewhat underhanded compared to more straightforward approaches.
Anyhow, we just have different norms of being. I still stand by my above statements and observations, which you reject, though your rejection has plausible deniability, so we'll just leave it as is.
Reminds me a bit of google-mobwrite. I wonder why that fell out of favour.
I just read part 1 as well as part 2, for me it raises an interesting question that wasn't addressed. I correctly guessed the question posed about the result of the conflict, and while it's true that's not the end result I'd probably want, it's also important because it gives me visibility of the other user's change. Both users know exactly what the other did - one deleted everything, the other added a u. If you end up with an empty document, the deleting user doesn't know about the spelling correction that may need to be re-applied elsewhere. Perhaps they just cut and pasted that section elsewhere in the document.
But there's another issue that the author hasn't even considered, and possibly it's the root cause of why prosemirror (which I'd never heard of before, btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and independently both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, and so a merge would blindly accept both, resulting in "请走 means 'please come'", which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could end up in a back-and-forth of both repeatedly changing it, so the merged document always ended up incorrect, even though individually both authors had made valid corrections.
That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.
This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.
I see someone has downvoted my actually relevant post. Not sure why, but anyway.
I also tried out the behaviour of their example. Slowing the sync time down to 3 seconds, and then typing "Why not" and then waiting for it to sync before adding " do this?" on client A and " joke?" on client B. The result was "Why not do this? joke?" when I'd have hoped that this would have been flagged as a conflict. Similarly, starting with "Why not?" and adding both " do this" and " joke" in the different clients produced "Why not do this joke?" even though to me, that should have been a conflict - both were inserting different content between "t" and "?".
Finally, changing "do" to "say" in client A and THEN changing "do" to "read" in client B before it updated, actually resulted in a conflict in the log window and the resultant merge was "Why not rayead this joke?" Clearly this merge strategy isn't that great here, as it doesn't seem to be renumbering the version numbers based on the losing side (or I've misunderstood what they're actually doing).
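For what it's worth, the "Why not do this? joke?" result above is what any position-based merge produces when it shifts one concurrent insert past the other instead of flagging a conflict. A minimal sketch of that behavior (not Yjs's actual algorithm; `merge` and the tie-break are illustrative):

```javascript
// Naive position-based merge of two concurrent inserts into the same spot.
// Both clients insert after "Why not" (position 7); the merge applies one,
// then shifts the other past it, rather than reporting a conflict.
function merge(base, opA, opB) {
  const afterA = base.slice(0, opA.pos) + opA.text + base.slice(opA.pos);
  // Shift opB right if opA inserted at or before its position
  // (ties broken here by simply letting A go first).
  const pos = opA.pos <= opB.pos ? opB.pos + opA.text.length : opB.pos;
  return afterA.slice(0, pos) + opB.text + afterA.slice(pos);
}

const merged = merge(
  "Why not",
  { pos: 7, text: " do this?" }, // client A
  { pos: 7, text: " joke?" }     // client B
);
console.log(merged); // "Why not do this? joke?"
```

Convergence is guaranteed, but "is this actually a conflict the user should see?" is a policy question the merge rule alone cannot answer.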
Component library page in the docs gives 404
From the "40 line CRDT replacement":
const result = step.apply(this.doc);
if (result.failed) return false;
I suspect this doesn't work.

Author here. I'll actually defend this. Most of the subtlety of this part is actually in document schema version mismatches, and you'd handle that at client connect, generally, since we want the server to dictate the schema version you're using.
In general, the client implementation of collab is pretty simple. Nearly all of the subtlety lies in the server. But it, too, is generally not a lot of code, see for example the author's implementation: https://github.com/ProseMirror/website/tree/master/src/colla...
(Xpost from my lobsters comment since the Author's active over here):
I really disagree with this article - despite protestation, I feel like their issue is with Yjs, not CRDTs in general.
Namely, their proposed solution:
1. For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
2. A client submits some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
(3a) If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
This is 80% of the way to a CRDT all by itself! Step 3 there, "rebase its own changes on top", is doing a lot of work and is essentially the core merge function of a CRDT. Also, the steps needed to get the rest of the way to a full CRDT are the solution to their logging woes: tracking every change and its causal history, which is exactly what is needed to exactly re-run any failing trace and debug it.

Here's a modified version of the steps of their proposed solution:
1. For each document, every participating member holds the document, applied steps, and the current version.
2. A client submits (to the "server" or p2p) some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the "server"/peer’s version, the client must fetch recent changes(lastSeenVersion). The server still accepts the changes. Both the client and the "server" rebase the changes of one on top of the other. Which one gets rebased on top of the other can be determined by change depth, author id, real-world timestamp, "server" timestamp, whatever. If it's by server timestamp, you get the exact behavior from the article's solution.
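The deterministic "which one gets rebased on top" choice in step 3 can be sketched as a total order over concurrent changes, e.g. by (timestamp, author id). This is an illustration of the idea, not diamond-types' or Yjs's actual ordering:

```javascript
// Deterministic tie-break for concurrent changes: order by timestamp,
// then by author id, so every peer picks the same winner with no
// central authority involved.
function compareChanges(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp - b.timestamp;
  return a.authorId < b.authorId ? -1 : a.authorId > b.authorId ? 1 : 0;
}

const concurrent = [
  { authorId: "bob",   timestamp: 100, step: "B1" },
  { authorId: "alice", timestamp: 100, step: "A1" },
];
concurrent.sort(compareChanges);
console.log(concurrent.map(c => c.step)); // ["A1", "B1"] on every peer
```

Substituting a server-assigned sequence number for `timestamp` recovers exactly the article's centralized behavior, which is why the two designs feel so close.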
If you store the causal history of each change, you can also replay the history of the document and how every client sees the document change, exactly as it happened. This is the perfect debugging tool!

CRDTs can store this causal history very efficiently using run-length encoding: diamond-types has done really good work here, with an explanation of their internals here: https://github.com/josephg/diamond-types/blob/master/INTERNA...
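The run-length idea can be sketched roughly: consecutive single-character inserts by the same author at adjacent positions collapse into one span, so a long typing run is stored as one history entry. This is a simplified illustration of the principle, not diamond-types' actual encoding:

```javascript
// Collapse a stream of single-character insert ops into runs: an op
// extends the previous run if it has the same author and its position
// is exactly at the end of that run.
function compressHistory(ops) {
  const runs = [];
  for (const op of ops) {
    const last = runs[runs.length - 1];
    if (last && last.author === op.author &&
        op.pos === last.pos + last.text.length) {
      last.text += op.text; // extend the current run
    } else {
      runs.push({ ...op }); // start a new run
    }
  }
  return runs;
}

const typed = [
  { author: "a", pos: 0, text: "h" },
  { author: "a", pos: 1, text: "i" },
  { author: "a", pos: 2, text: "!" },
  { author: "b", pos: 3, text: "?" },
];
console.log(compressHistory(typed));
// [{ author: "a", pos: 0, text: "hi!" }, { author: "b", pos: 3, text: "?" }]
```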
In conclusion, the article seems to be really down on CRDTs in general, whereas I would argue that they're really down on Yjs and have written 80+% of a CRDT without meaning to, and would be happier if they finished the remaining 20%. You can still have the exact behavior they have now by using server timestamps when available and falling back to local timestamps that always sort after server timestamps when offline. A 100% causal-history CRDT would also give them much better debugging, since they could replay whatever view of history they want over and over. The only downside is extra storage, which I think diamond-types has shown can be very reasonable.
I know it seems that way, but it's actually not 80% of the way to a CRDT because rich text CRDTs are an open research problem. Yjs instead models the document as an XML tree and then attempts to recreate the underlying rich text transaction. This is much, much harder than it looks, and it's inherently lossy, and this fundamental impedance mismatch is one of the core complaints of this article. Some progress is being made on rich text CRDTS, e.g., Peritext[1]. But that only happened a few years ago.
Another important thing is that CRDTs by themselves cannot give you a causal ordering (by which I mean this[2]), because definitionally causal ordering requires a central authority. Funnily enough, `prosemirror-collab` and `prosemirror-collab-commit` do give you this, because they depend on an authority with a monotonically increasing clock. They also are MUCH better at representing the user intent, because they express the entirety of the rich text transaction model. This is very emphatically NOT the case with CRDTs, which have to pipe your transaction model through something vastly weaker and less expressive (XML transforms), and force you to completely reconstruct the `Transaction` from scratch.
Lastly, for the algorithm you propose: that is sort of what `prosemirror-collab-commit` is doing.
[1]: https://www.inkandswitch.com/peritext/
[2]: https://www.scattered-thoughts.net/writing/causal-ordering/
The actual point of the post: Y.js is slow and buggy.
Sorry that I am too stupid to understand what Moment is.
It is a collaborative markdown file that also renders very fast. So far so good.
And then... it somehow adds Javascript? And React? And somehow AI is involved? I truly don't understand what it is, and I am (I think) the end customer...
edit: I tried it and I just get "Loading..." forever. So, anyway, next time.
Hey karel-3d, I'm one of the engineers working on Moment and would love to help figure out the issue you're running into. Would you mind reaching out via our Discord or email (trey@moment.dev)?
I would like to know if you plan to open source anything, and how much. https://github.com/orgs/moment-eng/ looks a bit empty
unfortunately I cannot reproduce the "Loading..." issue. Now everything works. (I still don't fully understand Moment. But reading Agents.md ironically helps me understand it a bit.)
OK I will be happy to help. I didn't mean to be dismissive! Will ping you tomorrow
Very likely AI slop, very hard to read. Too many telltale signs. HN should have another rule: explicitly mention if an article was written (primarily) by AI.
I'm the author. Literally 0% of this was written with AI. Not an outline, not the arguments, not a single word in any paragraph. We agonized over every aspect of this article: the wording, the structure, and in particular, about whether we were being fair to Yjs. We moved the second and third section around constantly. About a dozen people reviewed it and gave feedback.
EDIT: I will say I'm not against AI writing tools or anything like that. But, for better or worse, that's just not what happened here.
Apologies. Was it at all edited by an AI?
It doesn’t strike me as AI. The writing is reasonably information-dense and specific, logically coherent, a bit emotional. Rarely overconfident or vague. If it is AI then there was a lot more human effort put into refining it than most AI writing I’ve read.
Funnily enough I had 2 HN tabs open, this one and https://news.ycombinator.com/item?id=47394004