
How and why we built our expense tracker with CRDTs


We're Tender - an inbox for your personal finances. We connect with your bank accounts and help you get to inbox-zero and stay on top of your spending. Check out the demo to learn more!

I started building Tender a year ago to explore an alternative path for building a personal finance app - one that prioritizes privacy and runs locally on the user's device. There are countless personal finance tools out there, but by and large they operate in the cloud, exposing user data across a broader attack surface that has to be secured. I believe that a local-first app not only strengthens our privacy posture, but also makes for a better user experience.


Tender uses Automerge, a CRDT (conflict-free replicated data type), to store user data. A CRDT supports making edits across multiple devices in a way that can be merged. CRDTs are common in collaborative apps like Figma or Apple Notes, where you might have several users working on the same document, making edits that need to be combined.

Although Tender doesn't (yet!) utilize the multiplayer capabilities that CRDTs provide, Automerge gives us the foundations to build a secure, local-first app. I wanted to share some of the reasoning behind Tender's CRDT-based architecture, as well as some lessons learned around building local-first.

Thinking in local-first

Automerge lets us store user data as opaquely as possible. Instead of every piece of functionality in the app touching a server-side database full of user data, we treat it all as opaque binary files rather than a complex dataset that needs to be carefully managed server-side. The cloud becomes our backup solution, not the place where all of our application logic lives.

This architecture has greatly simplified how we build the product. When we want to build new features, we don't have to deal with the boilerplate of writing out a database layer, then a REST or GraphQL API, then finally the interface on top of it. Everything is locally available, already in the browser where we keep the business logic and application UI side-by-side.

Interacting with data

Since data is mostly handled on the user's device, our backend takes on fewer responsibilities. For instance, when users sync their data from our data partners like Plaid, our server simply acts as a proxy to Plaid on behalf of the user. The backend doesn't handle any user data on its own when a user's device isn't requesting it.

As a plus, a simple backend means our infrastructure costs are quite low. We run two small virtual machines for all of our traffic, and the second is just there as a hot standby.

Working in local-first also naturally sets us up to add end-to-end encryption (e2ee) in the future, so that our servers would hold only encrypted data that can't be read even if an attacker broke in. In contrast, grafting e2ee onto a cloud-based app after the fact might be difficult or impossible.

Skip the internet connection

Having user data on-device eliminates the threat of internet latency. In traditional cloud apps, interactions might take multiple round trips to a server to fetch, process, and refetch data. In Tender, changes happen virtually instantly since all of the necessary data is already on hand.

Not only does this architecture help with times when the user might have a spotty connection, but it means we can trivially support offline mode. Changes are persisted locally, then synced back to the cloud when the device comes back online.
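A minimal sketch of that offline pattern (the class and method names here are ours, not Tender's actual implementation): changes are appended to a local queue immediately, and the queue is drained whenever the device is online.

```typescript
type Change = Uint8Array;

class SyncQueue {
  private pending: Change[] = [];
  private flushing = false;

  constructor(private upload: (change: Change) => Promise<void>) {}

  // Record a change locally; kick off a sync attempt if we appear to be online.
  push(change: Change): void {
    this.pending.push(change);
    if (typeof navigator === "undefined" || navigator.onLine) void this.flush();
  }

  // Drain the queue in order, leaving failed uploads in place for a retry.
  async flush(): Promise<void> {
    if (this.flushing) return;
    this.flushing = true;
    try {
      while (this.pending.length > 0) {
        await this.upload(this.pending[0]);
        this.pending.shift();
      }
    } finally {
      this.flushing = false;
    }
  }
}

// In the browser, a listener re-drains the queue when connectivity returns:
// window.addEventListener("online", () => queue.flush());
```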

Indexing on the bleeding edge

Local-first architectures are still relatively nascent, and we've had to spend a few of our innovation tokens to get here.

For one, the Automerge CRDT is structured as a JSON document. As a personal finance tool, Tender has to be able to run queries against its data in ways that don't quite mesh with such a structure. For instance, how would we run a full-text search to match descriptions? Or calculate how much you spent on sparkling water this month? There isn't an off-the-shelf way to do these things.

In Tender, we built an indexing system that takes data out of our CRDT and builds indices in a SQLite database in the browser. Essentially, we take records like:

```ts
{
  "description": "Uber Eats",
  "amount": 1950,
  "date": 1712029191841,
  // ...
}
```

and turn them into something that can be queried with SQL:

```sql
-- dates are stored as epoch milliseconds, so compare against unixepoch()
select sum(amount) from transactions
  where date > unixepoch('now', '-1 month') * 1000;
```

Supporting both reading from the document structure and querying with SQLite gives us a lot of flexibility when developing new features.
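The transformation step can be sketched like this (the table name and schema are illustrative, not Tender's actual index layout): flatten each record from the document into parameter rows for a bulk insert that an in-browser SQLite driver (e.g. sql.js or wa-sqlite) can execute.

```typescript
// Shape of a transaction record as stored in the document
// (fields mirror the example above; real records carry more).
interface Txn {
  description: string;
  amount: number; // cents
  date: number;   // epoch milliseconds
}

// Turn document records into an INSERT statement plus parameter rows.
function toIndexRows(txns: Txn[]): { sql: string; rows: (string | number)[][] } {
  return {
    sql: "insert into transactions (description, amount, date) values (?, ?, ?)",
    rows: txns.map((t) => [t.description, t.amount, t.date]),
  };
}
```

Re-running this over the document whenever it changes keeps the SQLite index in sync with the CRDT, which remains the source of truth.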

We're getting some of this work ready to open source (please send me a note if you're interested!) in case other folks might find it useful as well.

Proxying connections

Earlier, I mentioned that for operations involving third party providers like Plaid and Splitwise, we treat the server as a simple proxy for operations kicked off by the client-side application.

We want as little of our users' data as possible, so ideally Tender would talk to these providers directly, cutting us out as the middleman. In practice, however, we can't build the system to work this way - we still need to check access control, deal with CORS, rate limit clients, and at the very least, attach our app-level access tokens to the outgoing requests.

So we do the next best thing - we run the Node-based API bindings, intended for a server, in the browser instead. We intercept the requests those libraries make at the browser level and reroute them through our server. This scheme lets us handle access control and the rest before making the outgoing request.
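One way to sketch that rerouting (the hostnames and paths here are hypothetical): a custom fetch rewrites each provider request to point at our server, which can then authenticate, rate limit, and attach app-level tokens before forwarding.

```typescript
// Rewrite a provider URL so the request goes through our proxy endpoint, e.g.
//   https://production.plaid.com/transactions/sync
//     -> https://api.example.com/proxy/production.plaid.com/transactions/sync
function proxiedUrl(original: string, proxyBase: string): string {
  const u = new URL(original);
  return `${proxyBase}/proxy/${u.host}${u.pathname}`;
}

// A drop-in replacement for fetch that the provider SDK can be pointed at.
const proxiedFetch: typeof fetch = (input, init) =>
  fetch(proxiedUrl(String(input), "https://api.example.com"), init);
```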

As an aside, here's an idea for how API providers can make local-first apps more private: providers should support encrypting user data with public/private keys that only the end user's device holds. Our backend could facilitate getting data to users' devices without actually having access to it. Unfortunately, the market is likely too small for anyone to undertake such an architecture.

Scaling the CRDT

We expect a user's data to grow as they spend more time with Tender, eventually reaching sizes that get unwieldy in the browser - especially since Automerge's history-based CRDT keeps every edit ever made in the document.

Right now, Tender uses a single monolithic document to store its data, but we'll likely need to look into splitting this document up to scale to years and years of transaction data.

Currently, Tender also does most of its data manipulation in the main JS thread. This has worked perfectly fine for a while just because of how fast the local data structures really are, but we'll need to background more work as the application grows in size.

The local-first future

Automerge has provided a strong foundation for us to build a private and secure application that has all of the benefits of the modern web with fewer of the drawbacks. We're also thinking up more ways we can take advantage of the CRDT (multiplayer expenses, anyone?) going forward. I'm really excited that we get to be part of the local-first movement.

Discuss on HN.