Mapeo: Lessons from building a decentralized app

By Gregor MacLennan
July 13, 2023

Mapeo Blog Series: a look into the process of co-developing innovative offline digital tools for social justice.

The series will highlight what’s next for Mapeo, its exciting new features, and our learnings and challenges in co-designing a tool that responds to earth defenders’ mapping needs in diverse territories.

Mapeo is an easy-to-learn tool that was built for remote communities around the world to map and document the world around them. It is built on a radical new approach to technology, called "decentralized" or "peer-to-peer", that allows users to collaborate to collect data without needing a server or any internet connection. We developed Mapeo over 8 years through a co-design process with local partners, and have learned a huge amount about the challenges and opportunities of peer-to-peer technology along the way. This post shares some technical details about these challenges and how the solutions are guiding our work on "Mapeo Next".

This blog post is part of a series about "Mapeo Next", the major upgrade to the Mapeo Platform coming in late fall:

  1. Mapeo: The next chapter
  2. Mapeo Co-Design: A Powerful Learning Journey
  3. Mapeo: Lessons from building a decentralized app (this post)
  4. How peer-to-peer works and why it is important
  5. Better security through projects and invites
  6. Under-the-hood: controlling access in a decentralized system
  7. The new Mapeo: features and how to get started.

Opportunities

Mapeo does not use a centralized server to store data. There is no server or database to maintain, and no need to host that server on the internet or in an office. All the data collected with Mapeo is stored in an embedded database on the device itself, and it is synchronized across all devices that are part of a Mapeo project. This means that not only does Mapeo work offline for collecting data, but a group of people can collaborate to collect data and share it amongst themselves without needing the internet or a server. All the data remains on their devices, and is owned and controlled by them. If a device fails or is lost, every other device acts as a backup, and the device can easily be replaced and synchronized with any other device in the project to restore all the data.

We call this "peer-to-peer". It makes Mapeo more resilient to unreliable internet and to censorship or blocking of the internet. It keeps data in the hands of users, without them needing to entrust it to servers around the world, and it guarantees that data will always remain accessible to them: there are no subscription or hosting fees because all the data is local to their device. This doesn't mean that peer-to-peer apps like Mapeo need to remain isolated offline, though. In fact, they can make more efficient use of the internet when it is available: two devices can make a direct connection to each other without a long round trip via a central server that is often on the other side of the world. Since no server is involved, the data can be encrypted "end-to-end", i.e. only the devices can see the data being shared and it is impossible for any intermediary to read it. It also opens the possibility of fully encrypted online backups: Mapeo data could be replicated to a server on the internet, but the data would not be readable even by someone with physical access to the server hard drives and all the server keys.

Encrypted synchronization with another device over the internet, and end-to-end encrypted backup to a remote server, are two key features that will be enabled by the new platform powering "Mapeo Next".

Challenges

Many of the challenges we faced building Mapeo stem from the lack of precedents for how to build a fully decentralized peer-to-peer app and for the user experience of synchronizing data. We built the database that powers Mapeo from scratch with limited resources and time, and it serves users' needs well. But because it has been patched and adapted in response to users' needs over time, it has become increasingly difficult and time-consuming to add new features. Also, some bugs have proven hard to fix due to design decisions made years ago.

"Mapeo Next" will include a new database that addresses some key challenges:

  • Improved reliability of synchronization between devices
  • Easy-to-create, secure, encrypted projects
  • Control and better visibility about who is part of a project
  • Synchronization over the internet (Mapeo will continue to support offline in-person synchronization)
  • Intelligently managing device storage
  • New features like GPS tracks and audio recordings

Improved reliability of synchronization between devices

The Mapeo database is built on something we call "immutable logs". "Immutable" means that nothing is ever deleted, and any attempt to change the log can be detected by validating "signatures" for each entry. Every time a user adds, edits, or deletes a data point, the action is written as an entry to their log. For example, a log might be:

  1. Added point A
  2. Added point B
  3. Edited point A
  4. Deleted point B

In reality, each of these log entries would contain the full details about each point or what has changed. We use an immutable log called Hypercore, which manages the signing of each entry and validating the signature. Hypercore also manages synchronizing these logs between devices, efficiently sharing the log messages between a group of connected users so that they all end up with replicated copies of the logs. Hypercore was very new and largely experimental when we started using it, and has since matured and improved. Mapeo still uses an older version, which is the source of some bugs.
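To make the idea of a signed, tamper-evident log concrete, here is a minimal sketch in Python. It is not Hypercore's actual format or API: the `SignedLog` class is hypothetical, and it uses an HMAC with a shared secret as a stand-in for the public-key signatures a real system would use, so that the example runs with the standard library alone. Each entry is chained to the one before it, so editing any past entry breaks verification.

```python
import hashlib
import hmac
import json


class SignedLog:
    """Toy append-only log (hypothetical; not Hypercore's real data format)."""

    def __init__(self, secret_key: bytes):
        # Assumption: a shared secret stands in for a real signing key pair.
        self._key = secret_key
        self.entries = []  # each entry: {"seq", "prev", "payload", "sig"}

    def _sign(self, seq: int, prev: str, payload: str) -> str:
        body = json.dumps({"seq": seq, "prev": prev, "payload": payload}, sort_keys=True)
        return hmac.new(self._key, body.encode("utf-8"), hashlib.sha256).hexdigest()

    def append(self, payload: str) -> None:
        seq = len(self.entries)
        # Chain to the previous entry's signature (empty string for the first entry).
        prev = self.entries[-1]["sig"] if self.entries else ""
        sig = self._sign(seq, prev, payload)
        self.entries.append({"seq": seq, "prev": prev, "payload": payload, "sig": sig})

    def verify(self) -> bool:
        """Return True only if every entry is in order, chained, and correctly signed."""
        prev = ""
        for seq, entry in enumerate(self.entries):
            if entry["seq"] != seq or entry["prev"] != prev:
                return False
            expected = self._sign(seq, prev, entry["payload"])
            if not hmac.compare_digest(expected, entry["sig"]):
                return False
            prev = entry["sig"]
        return True


log = SignedLog(b"demo-key")
for action in ["Added point A", "Added point B", "Edited point A", "Deleted point B"]:
    log.append(action)
print(log.verify())  # True

log.entries[0]["payload"] = "Added point Z"  # simulate tampering with history
print(log.verify())  # False
```

Note that "Deleted point B" is itself a new entry: the earlier "Added point B" entry stays in the log, which is what makes the history immutable.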

"Observations" (data points recorded in Mapeo) are stored in these logs, but we have stored and synchronized photos separately. This helped us get something working quickly, but we have found our photo sync to be inefficient: large photos can "block" the connection between devices, slowing down the sync of other data, and when more than two devices connect to share data, often photos are shared simultaneously from multiple devices at the same time, resulting in duplicated transfers which increase the time it takes to sync.

One early assumption, which we didn't initially realize we had made, was that users would always "complete" sync. Of course, in reality, failing connections between devices, phones running out of battery, and people simply walking out of range meant that sync often did not complete. In "Mapeo Next" we handle these scenarios better, so that users can easily see when sync is not complete but can continue using Mapeo without issue until the sync is done.

In summary, in "Mapeo Next" sync will be faster and more reliable, due to:

  1. An upgrade to Hypercore (the immutable log we use for storage) from version 6 to version 10. This requires that we migrate user data from the old format to the new one.
  2. Moving photo storage into Hypercore, which allows end-to-end encrypted sync of photos and makes photo sync much more efficient.
  3. Full support for "partial sync" and recovery when devices next connect, with better user feedback about when sync is done - no more issues with sync getting "stuck" and never completing.
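One way to picture "partial sync" and recovery is the sketch below. It is a simplification under stated assumptions, not Mapeo's implementation: each device is modeled as a plain dictionary mapping a log name to the list of entries it holds, logs only ever grow by appending, and the hypothetical `budget` parameter simulates a connection that drops after a certain number of entries. Because the logs are append-only, a device only needs to compare lengths to know what it is still missing, so an interrupted sync can pick up where it left off.

```python
def missing_ranges(local: dict, remote: dict) -> dict:
    """For each remote log, return the (start, end) range of entries local still lacks."""
    wanted = {}
    for log_id, remote_entries in remote.items():
        have = len(local.get(log_id, []))
        if have < len(remote_entries):
            wanted[log_id] = (have, len(remote_entries))
    return wanted


def pull(local: dict, remote: dict, budget: int) -> bool:
    """Copy at most `budget` entries from remote to local.

    Returns True when local has everything remote has, False if the
    (simulated) connection ran out before sync completed.
    """
    for log_id, (start, end) in missing_ranges(local, remote).items():
        for i in range(start, end):
            if budget == 0:
                return False  # connection dropped; progress so far is kept
            local.setdefault(log_id, []).append(remote[log_id][i])
            budget -= 1
    return not missing_ranges(local, remote)


phone = {"alice": ["a1"]}
laptop = {"alice": ["a1", "a2", "a3"], "bob": ["b1", "b2"]}

print(pull(phone, laptop, budget=3))   # False: interrupted part-way through
print(missing_ranges(phone, laptop))   # {'bob': (1, 2)}
print(pull(phone, laptop, budget=10))  # True: the next connection finishes the job
```

The result of `missing_ranges` is also the kind of information an app could surface to users to show that sync is not yet complete.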

Easy-to-create, secure, encrypted projects

In the current version of Mapeo, we secure a project by creating a random project key and manually installing that key on each device. For users who just download Mapeo and start using it without a project key, data is encrypted, but the default key is visible in the Mapeo source code to a determined attacker. This is not a significant threat given that the current Mapeo only syncs over local wifi, not the internet: any attacker would need to be physically present on the same local network.

A core philosophy for Mapeo development has been reducing dependency on outside technical support, and so for "Mapeo Next" we are switching to make all projects secure by default. The project key is securely and randomly generated inside the app and is shared with others via secure invites, used to add other users to a Mapeo project. We have been working hard to do this in a way that is intuitive and easy to use, avoiding complicated "key management". The project key is never shared unencrypted outside the app, greatly reducing the chance of someone compromising a project.

Control and better visibility about who is part of a project

One key challenge with existing Mapeo projects has been visibility of who is part of a project and who can see the data, and how to remove a device from a project if the device is lost, the user moves on, or if it is compromised in some way. In "Mapeo Next" it is not enough to have the project key: there must also be a signed record in the Mapeo database giving a user permission to participate, and similarly a signed record saying a user has been removed from a project will ensure that they can no longer sync data from a project. A future blog post will go into the technical details about how this has been solved in "Mapeo Next".
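The rule described here can be sketched as a small check. This is purely illustrative and not the actual "Mapeo Next" design, which a later post in the series covers: the record layout, the `can_sync` function, and the choice to treat removal as final are all assumptions, and the sketch takes for granted that each record's signature has already been verified.

```python
def can_sync(device_id: str, has_project_key: bool, records: list) -> bool:
    """Decide whether a device may sync, given already-verified membership records.

    Assumptions (not from the Mapeo source): each record is a dict with
    "type" ("add" or "remove") and "device", and a removal is permanent.
    """
    if not has_project_key:
        return False  # the project key is necessary...
    added = any(r["type"] == "add" and r["device"] == device_id for r in records)
    removed = any(r["type"] == "remove" and r["device"] == device_id for r in records)
    return added and not removed  # ...but not sufficient on its own


records = [
    {"type": "add", "device": "phone-1"},
    {"type": "add", "device": "phone-2"},
    {"type": "remove", "device": "phone-2"},  # e.g. the phone was lost
]

print(can_sync("phone-1", True, records))   # True
print(can_sync("phone-2", True, records))   # False: removed from the project
print(can_sync("phone-3", True, records))   # False: has the key but was never added
print(can_sync("phone-1", False, records))  # False: no project key
```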

All this will enable users to clearly see who is part of their project, and give users full control over who they add to a project and who they remove, rather than relying on an outside technical support team to manage device permissions.

Synchronization over the internet 

Mapeo was intentionally designed to synchronize without using the internet. Devices connect to each other over a local wifi network that can be completely offline. The peer-to-peer design of the database means that participants in a Mapeo project can meet and synchronize, and eventually data passes from device to device until they are all in sync. This has been a critical feature for communities using Mapeo in remote areas with little or no internet connectivity. It removes the need to travel to a location with internet in order to share data. However, there are circumstances where internet is available and meeting up in person is hard, and the recent COVID-19 pandemic has driven home the need to also support internet sync.
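The way data "passes from device to device" can be illustrated with a toy simulation. It is only a sketch: each device is modeled as a Python set of record names, and the hypothetical `sync_pair` function stands in for two people meeting and syncing. The point is that a device can receive data from someone it has never met directly.

```python
def sync_pair(a: set, b: set) -> None:
    """Two devices meet: afterwards both hold the union of their records."""
    merged = a | b
    a.update(merged)
    b.update(merged)


ana = {"ana-1"}
ben = {"ben-1"}
cai = {"cai-1"}

sync_pair(ana, ben)  # Ana and Ben meet
sync_pair(ben, cai)  # later, Ben and Cai meet

print(sorted(cai))   # ['ana-1', 'ben-1', 'cai-1']: Cai has Ana's data via Ben
print(sorted(ana))   # ['ana-1', 'ben-1']: Ana is not yet up to date

sync_pair(ana, cai)  # one more meeting brings everyone in sync
print(ana == ben == cai)  # True
```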

Our main challenge with Mapeo is enabling sync over the internet in a way that is fully secure, so that users can sync with another device over the internet knowing that it is impossible for anyone to intercept or read the data on the way. Importantly, we needed to do this in a way that was simple and accessible to users, without requiring managing user accounts, passwords, and special access keys.

"Mapeo Next" will enable fully secure end-to-end encrypted sync of data over the internet between devices on a project. This is enabled by many of the features listed below:

  1. Project keys generated on device and never shared unencrypted are used to encrypt all network traffic.
  2. Moving photo storage to Hypercore allows us to encrypt _all_ data that is shared.
  3. Partial sync support enables just the most important data to be synced over a potentially slow internet connection.

Intelligently managing device storage

Storing all data locally in an embedded database on each device makes Mapeo extremely resilient. Every device becomes a backup, and data is not stored on a server that could become inaccessible and is outside users' control. However, every device storing _all_ data in a project means that disk space is eventually going to become an issue.

This isn't actually a problem for the textual data, e.g. observations with GPS locations, descriptions, forms and metadata. Each record only uses a few kilobytes, so it would take millions of records to fill the storage on even the phones with the least storage. Photos, however, are a different story. We limit this problem in the current version of Mapeo by only synchronizing "photo previews" between mobile devices with lower capacity. These previews are only 100-200 KB each, so you can store at least 5,000 in each GB of phone storage. This does mean that every mobile device needs to synchronize with a laptop with larger storage in order to ensure that its full-quality original images are shared.
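The arithmetic behind these figures is easy to check. The sketch below treats the sizes as decimal kilobytes and gigabytes; the 2 KB record size and the 8 GB of free storage are illustrative assumptions, not measurements from Mapeo.

```python
KB = 1_000
GB = 1_000_000_000


def items_per_gb(item_size_bytes: int) -> int:
    """How many items of a given size fit in one gigabyte."""
    return GB // item_size_bytes


# Photo previews of 100-200 KB each.
print(items_per_gb(200 * KB))  # 5000
print(items_per_gb(100 * KB))  # 10000

# Text records: assumed 2 KB each on a phone with an assumed 8 GB free.
ASSUMED_RECORD_BYTES = 2 * KB
ASSUMED_FREE_BYTES = 8 * GB
print(ASSUMED_FREE_BYTES // ASSUMED_RECORD_BYTES)  # 4000000
```

So roughly 5,000 previews per gigabyte is the conservative end of the range, and even a modest amount of free storage holds millions of text records.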

We need to address this in Mapeo Next because, as projects grow, devices will eventually run out of disk space, especially as we add much-requested new features such as video and audio recording, which take up even more disk space.

The switch to Hypercore storage for photos and other media in "Mapeo Next" will allow us to intelligently manage disk space and avoid out-of-space errors. Devices will be able to sync data based on capacity, and track when a photo or video has reached a "backup machine", e.g. a laptop with a large hard drive or a backup server on the internet. When data is on a backup machine devices can selectively remove it from their local storage when they run short of disk space.
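A rough sketch of this kind of storage management follows. It is not Mapeo's actual policy: the `MediaFile` type, the `choose_evictions` function, and the "largest first" ordering are assumptions chosen for illustration. The one rule taken from the text is that only files already known to be on a backup machine are ever candidates for local removal.

```python
from dataclasses import dataclass


@dataclass
class MediaFile:
    name: str
    size_bytes: int
    backed_up: bool  # True once the file has reached a backup machine


def choose_evictions(files: list, free_bytes: int, min_free_bytes: int) -> list:
    """Pick backed-up files to remove locally until enough space is free.

    Assumption: larger files are removed first so that fewer files are affected.
    Files that are not backed up are never selected.
    """
    to_evict = []
    for f in sorted(files, key=lambda f: f.size_bytes, reverse=True):
        if free_bytes >= min_free_bytes:
            break
        if f.backed_up:
            to_evict.append(f)
            free_bytes += f.size_bytes
    return to_evict


MB = 1_000_000
files = [
    MediaFile("video.mp4", 5 * MB, backed_up=True),
    MediaFile("photo-new.jpg", 3 * MB, backed_up=False),
    MediaFile("photo-old.jpg", 2 * MB, backed_up=True),
]

evicted = choose_evictions(files, free_bytes=1 * MB, min_free_bytes=7 * MB)
print([f.name for f in evicted])  # ['video.mp4', 'photo-old.jpg']
```

The photo that has not yet been backed up stays on the device even though removing it would free space.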

This new feature will be coming in early 2024 after the initial launch of Mapeo Next. It is enabled by the new database and change in photo storage in the updated platform.

New features like GPS tracks and audio recordings

As Mapeo has grown organically from small beginnings, the tangle of new and old code has become harder to manage, and adding new features has taken much longer than we expected. Another key limitation has been how we read data from the immutable logs where Mapeo data is stored: we have been writing our own custom "indexes" for each data type (these "indexes" allow us to read just, for example, "observation" record types). Adding an additional data type, like a GPS track, requires lots of modifications to complicated code.

In "Mapeo Next", on each device, we make a copy of the data in our immutable logs in a mature existing database called SQLite. SQLite is open source, deployed on billions of devices worldwide, and familiar to many developers. It comes with a powerful language for reading different subsets of data (called "querying"), so it enables new features in Mapeo like sorting observations by date or distance, or filtering to only show a subset of data.
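The general pattern of copying log entries into SQLite and then querying them can be sketched with Python's built-in `sqlite3` module. The table layout, the entry fields, and the function names here are assumptions made for illustration, not Mapeo's actual schema. Replaying the log in order leaves one row per current document, and a new data type such as a GPS track needs no new indexing code.

```python
import json
import sqlite3


def index_log(conn: sqlite3.Connection, log_entries: list) -> None:
    """Replay log entries, in order, into a queryable table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id TEXT PRIMARY KEY, type TEXT NOT NULL, "
        "created_at TEXT NOT NULL, body TEXT NOT NULL)"
    )
    for entry in log_entries:
        if entry["op"] == "delete":
            conn.execute("DELETE FROM docs WHERE id = ?", (entry["id"],))
        else:  # "put" covers both adding and editing a document
            conn.execute(
                "INSERT OR REPLACE INTO docs (id, type, created_at, body) "
                "VALUES (?, ?, ?, ?)",
                (entry["id"], entry["type"], entry["created_at"], json.dumps(entry["body"])),
            )
    conn.commit()


def list_by_type(conn: sqlite3.Connection, doc_type: str) -> list:
    """Return document ids of one type, newest first."""
    rows = conn.execute(
        "SELECT id FROM docs WHERE type = ? ORDER BY created_at DESC", (doc_type,)
    )
    return [row[0] for row in rows]


entries = [
    {"op": "put", "id": "A", "type": "observation", "created_at": "2023-01-10", "body": {"note": "river"}},
    {"op": "put", "id": "B", "type": "observation", "created_at": "2023-02-05", "body": {"note": "camp"}},
    {"op": "put", "id": "T", "type": "track", "created_at": "2023-02-20", "body": {"points": 120}},
    {"op": "put", "id": "C", "type": "observation", "created_at": "2023-03-01", "body": {"note": "road"}},
    {"op": "delete", "id": "B"},
]

conn = sqlite3.connect(":memory:")
index_log(conn, entries)
print(list_by_type(conn, "observation"))  # ['C', 'A']
print(list_by_type(conn, "track"))        # ['T']
```

The log itself remains the source of truth in this sketch; the SQLite table is a derived copy that could be rebuilt at any time by replaying the log.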

Rewriting the Mapeo database engine to use SQLite will clean up a lot of old code that was causing bugs and slowing us down. It will also make it much easier and faster to add support for new types of data, and to query data in new ways to support new sorting and filtering options in Mapeo.

Next steps

If you already use Mapeo, you don’t have to do anything — the current version of the app will continue working, and your data is yours, on your device, and is not going away. However, if you are looking forward to some of the new features and improvements in “Mapeo Next”, keep an eye out for the next blog posts in this series.

________________

Email us at support@mapeo.app

You can also find us on Discord!

Read more about Mapeo and download it at https://mapeo.app

Mapeo Mobile for Android is available on the Google Play Store 

Mapeo Mobile is also available for install by downloading the Android APK
