PLAN-0003: Remove JSONB Storage
- Author
- David Lutterkort
- Implements
- No RFC - no user visible changes
- Engineering Plan pull request
- https://github.com/graphprotocol/rfcs/pull/7
- Date of submission
- 2019-12-18
- Date of approval
- 2019-12-20
- Approved by
- Jess Ngo, Jannis Pohlmann
Summary
Remove JSONB storage from graph-node
. That means that we want to remove
the old storage scheme, and only use relational storage going
forward. At a high level, removal has to touch the following areas:
- user subgraphs in the hosted service
- user subgraphs in self-hosted
graph-node
instances - subgraph metadata in
subgraphs.entities
(see this issue) - the
graph-node
code base
Because it touches so many areas and different things, JSONB storage removal will need to happen in several steps, the last being actual removal of JSONB code. The first three steps above are independent of each other and can be done in parallel.
Implementation
User Subgraphs in the Hosted Service
We will need to communicate to users that they need to update their subgraphs if they still use JSONB storage. Currently, there are ~ 580 subgraphs (list) belonging to 220 different organizations using JSONB storage. It is quite likely that the vast majority of them is not needed anymore and simply left over from somebody trying something out.
We should contact users and tell them that we will delete their subgraph after a certain date (say 2020-02-01) unless they deploy a new version of the subgraph (with an explanation why etc. of course) Redeploying their subgraph is all that is needed for those updates.
Self-hosted User Subgraphs
We will need to tell users that the 'old' JSONB storage is deprecated and support for it will be removed as of some target date, and that they need to redeploy their subgraph.
Users will need some documentation/tooling to help them understand
- which of their deployed subgraphs still use JSONB storage
- how to remove old subgraphs
- how to remove old deployments
Subgraph Metadata in subgraphs.entities
We can treat the subgraphs
schema like a normal subgraph, with the
exception that some entities must not be versioned. For that, we will need
to adopt code that makes it possible to write entities to the store without
recording their version (or, more generally, so that there will only be one
version of the entity, tagged with a block range [0,)
)
We will manually create the DDL for the subgraphs.graphql
schema and run
that as part of a database migration. In that migration, we will also copy
the existing metadata from subgraphs.entities
and
subgraphs.entity_history
into their new tables.
The Code Base
Delete all code handling JSONB storage. This will mostly affect
entities.rs
and jsonb_queries.rs
in graph-store-postgres
, but there
are also smaller things like that we do not need the annotations on
Entity
to serialize them to the JSON format that JSONB uses.
Tests
Most of the code-level changes are covered by the existing test suite. The major exception is that the migration of subgraph metadata needs to be tested and checked manually, using a recent dump of the production database.
Migration
See above on migrating data in the subgraphs
schema.
Documentation
No user-facing documentation is needed.
Implementation Plan
No estimates yet as we should first agree on this general course of action
- Notify hosted users to update their subgraph or have it deleted by date X
- Mark JSONB storage as deprecated and announce when it will be removed
- Provide tool to ship with
graph-node
to delete unused deployments and unneeded subgraphs - Add affordance to not version entities to relational storage code
- Write SQL migrations to create new subgraph metadata schema and copy existing data
- Delete old JSONB code
- On start of
graph-node
, add check for any deployments that still use JSONB storage and log warning messages telling users to redeploy (once the JSONB code has been deleted, this data can not be accessed any more)
Open Questions
None