CG production data is a graph: a movie breaks down into sequences, shots, scenes, and assets all bound together by dependency relationships. Change the mesh on a prop, and you need to know exactly what rebuilds downstream: the rig, the animation keys, the final image sequence. That chain of impact is a graph.
And yet, most studios still use relational databases by default.
This article makes the case for graph databases in CG pipelines, walks through the three most viable open-source options with working Python code, and helps you decide which one to reach for first.
Why Graph Databases Belong in Your Pipeline
A traditional relational database forces you to flatten graph-shaped data into tables. Traversing dependencies then requires recursive queries or chains of self-joins that are slow to write and slower to run.
Graph databases store your data natively as nodes and edges. This unlocks three concrete wins for pipeline TDs:
- Instant impact analysis. When a director asks "what breaks if I change this asset?", you can answer in milliseconds instead of writing a new SQL query.
- Dependency-aware build ordering. A directed acyclic graph can always be topologically sorted, so the correct sequence of operations to rebuild any element is implicit in the structure.
- Faster iteration. When a change cascades through your production, you can respond and re-queue only the affected work rather than rebuilding everything.
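To make the build-ordering point concrete, here is a minimal sketch using Python's standard-library graphlib module (Python 3.9+). The short element names and the DEPENDS_ON mapping are illustrative assumptions loosely based on the prop example used later in this article; a graph database would not hand you the data in exactly this form.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the elements it is built from.
DEPENDS_ON = {
    "texture": {"concept"},
    "mesh": {"concept"},
    "model": {"mesh", "texture"},
    "rig": {"mesh"},
    "keys": {"mesh", "rig"},
    "shot": {"model", "keys"},
}


def build_order(depends_on):
    """Return one valid rebuild order: every element comes after its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

Several orders can be valid for the same graph; the only guarantee is that an element never appears before something it depends on.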
Kitsu tip: Kitsu already tracks the task graph across your production: assets, shots, sequences, and their task statuses. If you want to layer a custom graph database on top for deeper dependency analysis, Kitsu's open REST API makes it straightforward to sync production entities into your graph store of choice.
The Test Setup
To compare databases practically, we model the dependency graph for a single prop going through a CG pipeline (concept → texture/mesh → model/rig → animation keys → final image sequence) and then run one representative query 10,000 times:
"Which elements are impacted if Props 1 Mesh changes?"
All benchmarks run on an i7-6700 CPU @ 3.40GHz and include the Python client overhead, since that's what you'll actually be running in production.
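As a reference for what each database should return, the test query can be answered in plain Python with a breadth-first walk over the edge list. This is only a sketch for checking results: the EDGES list mirrors the relations created in the sections below, and the impacted helper is a hypothetical name, not part of any of the clients.

```python
from collections import deque

# Test graph as (upstream, downstream) pairs.
EDGES = [
    ("Props 1 concept", "Props 1 texture"),
    ("Props 1 concept", "Props 1 mesh"),
    ("Props 1 mesh", "Props 1 model"),
    ("Props 1 texture", "Props 1 model"),
    ("Props 1 mesh", "Props 1 rig"),
    ("Props 1 mesh", "Props 1 keys"),
    ("Props 1 rig", "Props 1 keys"),
    ("Props 1 model", "Shot 1"),
    ("Props 1 keys", "Shot 1"),
]


def impacted(start, edges):
    """Return every element reachable downstream of `start`."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


result = impacted("Props 1 mesh", EDGES)
print(sorted(result))
```

Whatever the backend, a correct answer to the benchmark query contains the model, the rig, the keys, and the shot, and nothing upstream of the mesh.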
Option 1: Neo4j
Best for: Production-critical pipelines where robustness and query power matter most.
Neo4j is the most mature graph database available. It has a commercial enterprise tier (with monitoring, backup, and HA clustering) and a solid community edition that's free for most studio use cases.
Getting Started
Spin up the community edition via Docker:
docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data \
    neo4j
Install the Python driver:
pip install neo4j
Populating the Graph
Neo4j uses Cypher, a purpose-built graph query language that reads almost like English. The MERGE command acts as "create if not exists", which keeps your setup scripts idempotent:
from neo4j import GraphDatabase, basic_auth
# Connect over Bolt and keep one session open for all the queries below
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=basic_auth("neo4j", "tests")
)
session = driver.session()
def create_asset(name):
    session.run("MERGE (a:Asset { name: $name })", name=name)
def create_shot(name):
    session.run("MERGE (a:Shot { name: $name })", name=name)
def create_relation(asset1, asset2):
    # Link two existing assets: asset1 is an input of asset2
    session.run(
        "MATCH (a:Asset { name: $asset1 }), (b:Asset { name: $asset2 }) "
        "MERGE (a)-[r:ELEMENT_OF]->(b)",
        asset1=asset1, asset2=asset2
    )
def create_casting(asset, shot):
    # Link an existing asset to the shot it appears in
    session.run(
        "MATCH (a:Asset { name: $asset }), (b:Shot { name: $shot }) "
        "MERGE (a)-[r:CASTED_IN]->(b)",
        asset=asset, shot=shot
    )
# Nodes
create_asset("Props 1 concept")
create_asset("Props 1 mesh")
create_asset("Props 1 texture")
create_asset("Props 1 rig")
create_asset("Props 1 model")
create_asset("Props 1 keys")
create_shot("Shot 1")
# Edges
create_relation("Props 1 concept", "Props 1 texture")
create_relation("Props 1 concept", "Props 1 mesh")
create_relation("Props 1 mesh", "Props 1 model")
create_relation("Props 1 texture", "Props 1 model")
create_relation("Props 1 mesh", "Props 1 rig")
create_relation("Props 1 mesh", "Props 1 keys")
create_relation("Props 1 rig", "Props 1 keys")
create_casting("Props 1 model", "Shot 1")
create_casting("Props 1 keys", "Shot 1")
Querying for Impact
The [*] variable-length pattern follows outgoing relationships across any number of hops in a single query, so there is no recursion to manage manually:
result = session.run(
    "MATCH (:Asset { name: 'Props 1 mesh' })-[*]->(out) "
    "RETURN out.name as name"
)
for record in result:
    print(record["name"])
session.close()
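One thing to watch with this query: an element reachable through several paths (the keys depend on the mesh both directly and through the rig) can be returned more than once. The sketch below builds a variant that deduplicates with DISTINCT and optionally caps the number of hops. The impact_query helper is a hypothetical name; the hop bound is formatted into the query text because Cypher does not accept a parameter in that position.

```python
def impact_query(max_hops=None):
    """Build a Cypher impact query; max_hops=None means unbounded depth."""
    if max_hops is not None and max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    hops = "*" if max_hops is None else f"*1..{int(max_hops)}"
    return (
        "MATCH (:Asset { name: $name })-[" + hops + "]->(out) "
        "RETURN DISTINCT out.name AS name"
    )


query = impact_query()
bounded = impact_query(max_hops=2)
print(query)
print(bounded)

# With an open session and a running Neo4j server, this would be used as:
# session.run(query, name="Props 1 mesh")
```

Passing the asset name as $name rather than embedding it in the string also lets the same query text be reused for any asset.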
Performance
10,000 queries: 3.5 seconds (keep the session open; reopening it each time costs ~17 seconds).
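The article does not show its timing harness, so the following is only one plausible shape for it: a small function that calls a query callable a fixed number of times and reports wall-clock time. The names time_queries and fake_query are assumptions, and the stand-in query exists only so the sketch runs without a database.

```python
import time


def time_queries(run_query, iterations=10_000):
    """Call run_query() `iterations` times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_query()
    return time.perf_counter() - start


# Stand-in for a real database call (hypothetical).
calls = []


def fake_query():
    calls.append(1)


elapsed = time_queries(fake_query, iterations=1000)
print(f"{len(calls)} queries in {elapsed:.4f}s")
```

To reproduce the "session kept open" figure, run_query would be a closure over a single session created once outside the loop, and it should consume the result (for example with list(...)) so the full round trip is measured.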
Verdict
Neo4j is the fastest option in this test and has the most expressive query language. If you're on a hard-deadline production with SLAs, the enterprise tier's monitoring and backup features are worth the cost. For most studios, the community edition is plenty. There's also a community ORM client that makes the Python integration more ergonomic.
Option 2: ArangoDB
Best for: Studios that want to experiment quickly and may also need document storage alongside graph data.
ArangoDB is a multi-model database that handles documents, key-value, and graph storage in one engine. This flexibility means you can store rich asset metadata as JSON documents and model the relationships between them as a graph without running two separate databases.
Getting Started
docker run -p 8529:8529 -e ARANGO_ROOT_PASSWORD=openSesame arangodb/arangodb:3.2.1
pip install python-arango
Setting Up the Graph Schema
ArangoDB requires you to define vertex collections and edge definitions explicitly. Edges are always directed:
from arango.client import ArangoClient
client = ArangoClient(username='root', password='openSesame')
db = client.create_database('cgproduction')
dependencies = db.create_graph('dependencies')
shots = dependencies.create_vertex_collection('shots')
assets = dependencies.create_vertex_collection('assets')
casting = dependencies.create_edge_definition(
    name='casting',
    from_collections=['assets'],
    to_collections=['shots']
)
elements = dependencies.create_edge_definition(
    name='element',
    from_collections=['assets'],
    to_collections=['assets']
)
Note: ArangoDB raises an exception if you try to create something that already exists. You'll need to wrap creation calls in your own "get or create" helpers for idempotent setup scripts.
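A "get or create" helper can be kept independent of the client by passing it two callables. The sketch below is written that way on purpose, since the exact accessor and exception names depend on the python-arango version you have installed. Everything here (get_or_create, FakeStore) is hypothetical and uses an in-memory stand-in so it runs without a database.

```python
def get_or_create(create, fetch, already_exists=Exception):
    """Try create(); if it raises `already_exists`, fall back to fetch()."""
    try:
        return create()
    except already_exists:
        return fetch()


class FakeStore:
    """In-memory stand-in that refuses duplicate creation, like ArangoDB."""

    def __init__(self):
        self.items = {}

    def create(self, name):
        if name in self.items:
            raise KeyError(name)
        self.items[name] = {"name": name}
        return self.items[name]

    def get(self, name):
        return self.items[name]


store = FakeStore()
first = get_or_create(
    lambda: store.create("assets"), lambda: store.get("assets"), KeyError
)
second = get_or_create(
    lambda: store.create("assets"), lambda: store.get("assets"), KeyError
)
print(first is second)
```

With a real client you would pass the creation call, the matching lookup call, and the narrowest exception class the client raises for duplicates; catching a broad exception risks hiding unrelated failures such as a bad connection.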
Inserting Data
# Vertices
assets.insert({'_key': 'props1-concept', 'name': 'Props 1 Concept'})
assets.insert({'_key': 'props1-texture', 'name': 'Props 1 Texture'})
assets.insert({'_key': 'props1-mesh', 'name': 'Props 1 Mesh'})
assets.insert({'_key': 'props1-rig', 'name': 'Props 1 Rig'})
assets.insert({'_key': 'props1-model', 'name': 'Props 1 Model'})
assets.insert({'_key': 'props1-keys', 'name': 'Props 1 Keys'})
shots.insert({'_key': 'shot1', 'name': 'Shot 1 Image Sequence'})
# Edges
elements.insert({'_from': 'assets/props1-concept', '_to': 'assets/props1-texture'})
elements.insert({'_from': 'assets/props1-concept', '_to': 'assets/props1-mesh'})
elements.insert({'_from': 'assets/props1-texture', '_to': 'assets/props1-model'})
elements.insert({'_from': 'assets/props1-mesh', '_to': 'assets/props1-rig'})
elements.insert({'_from': 'assets/props1-mesh', '_to': 'assets/props1-model'})
elements.insert({'_from': 'assets/props1-mesh', '_to': 'assets/props1-keys'})
elements.insert({'_from': 'assets/props1-rig', '_to': 'assets/props1-keys'})
casting.insert({'_from': 'assets/props1-model', '_to': 'shots/shot1'})
casting.insert({'_from': 'assets/props1-keys', '_to': 'shots/shot1'})
Querying for Impact
traversal_results = dependencies.traverse(
    start_vertex='assets/props1-mesh',
    direction='outbound'
)
for result in traversal_results["vertices"]:
    print(result["name"])
The traversal API also exposes depth-first and breadth-first strategies, shortest-path finding, and path-length retrieval, all of which are useful for more advanced pipeline analysis.
Performance
10,000 queries: 26 seconds. Slower than Neo4j, but still perfectly acceptable for most pipeline tooling that won't run tens of thousands of queries per session.
Verdict
ArangoDB is the most developer-friendly of the bunch. It's well documented, the Python client is clean, and the web UI makes it easy to visualise and debug your graph as you build it. The document storage model maps naturally to how pipeline TDs already think about asset data.
Because ArangoDB stores vertices as JSON documents, you can directly mirror Kitsu's asset and shot entities (fetched via the Kitsu API or gazu, the Python client) into ArangoDB with minimal transformation. This makes ArangoDB a natural companion to Kitsu if you want to add dependency tracking to your Kitsu-managed production.
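As a rough illustration of that "minimal transformation", the sketch below turns Kitsu-style entity dictionaries into vertex documents by reusing the entity id as the document key. The to_vertex helper and the sample entities are hypothetical, and the field names ("id", "name", "type") are an assumption about the shape of the dictionaries returned by gazu; check them against your own data.

```python
def to_vertex(entity):
    """Map a Kitsu-style entity dict to an ArangoDB vertex document."""
    doc = dict(entity)          # copy so the source entity is left untouched
    doc["_key"] = entity["id"]  # reuse the tracker id as the document key
    return doc


# Made-up sample entities standing in for data fetched from Kitsu.
kitsu_assets = [
    {"id": "a1b2c3d4-0001", "name": "Props 1", "type": "Asset"},
    {"id": "a1b2c3d4-0002", "name": "Props 2", "type": "Asset"},
]

docs = [to_vertex(entity) for entity in kitsu_assets]
print(docs[0])

# With the `assets` vertex collection from the setup above:
# for doc in docs:
#     assets.insert(doc)
```

Keying documents on the tracker id means a re-sync can find the existing vertex for an entity instead of creating a duplicate.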
Option 3: Cayley
Best for: Complementing an existing relational database with lightweight graph traversal, if you're comfortable with an experimental tool.
Cayley is an open-source graph database written in Go that began as a project by a Google engineer. Its defining feature is that it's a layer on top of other storage backends (Bolt, PostgreSQL, etc.), which means you may be able to add graph capabilities without replacing your existing database.
Limitations to Know Upfront
- Documentation is thin.
- The Python client (pyley) is incomplete: quad creation must be done via raw HTTP requests.
- The visualisation UI is buggy.
- Recursive traversal isn't yet available in the Python client.
Getting Started
Download the Cayley binary, initialise the database, and start the HTTP server:
./cayley init -db bolt -dbpath /tmp/testdb
./cayley http --dbpath=/tmp/testdb --host 0.0.0.0 --port 64210
pip install pyley requests
Inserting Quads
Cayley models everything as quads: subject → predicate → object (+ optional label). Since the Python client doesn't support quad creation, use the REST API directly:
import requests
def create_quad(quad):
    # The write endpoint expects a JSON list of quads
    return requests.post(
        "http://localhost:64210/api/v1/write",
        json=[quad]
    )
quads = [
{"subject": "props1-concept", "predicate": "dependencyof", "object": "props1-texture"},
{"subject": "props1-concept", "predicate": "dependencyof", "object": "props1-mesh"},
{"subject": "props1-texture", "predicate": "dependencyof", "object": "props1-model"},
{"subject": "props1-mesh", "predicate": "dependencyof", "object": "props1-model"},
{"subject": "props1-mesh", "predicate": "dependencyof", "object": "props1-rig"},
{"subject": "props1-mesh", "predicate": "dependencyof", "object": "props1-keys"},
{"subject": "props1-rig", "predicate": "dependencyof", "object": "props1-keys"},
{"subject": "props1-model", "predicate": "dependencyof", "object": "shot1-image-sequence"},
{"subject": "props1-keys", "predicate": "dependencyof", "object": "shot1-image-sequence"},
]
for quad in quads:
    create_quad(quad)
Re-inserting identical quads is a no-op.
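If your dependencies already exist as simple (upstream, downstream) pairs, the quad dictionaries can be generated rather than typed out. This is a small sketch under that assumption; edges_to_quads is a hypothetical helper, and the "label" key for the optional fourth element is assumed to match the JSON shape used above.

```python
def edges_to_quads(edges, predicate="dependencyof", label=None):
    """Turn (subject, object) pairs into quad dicts with a shared predicate."""
    quads = []
    for subject, obj in edges:
        quad = {"subject": subject, "predicate": predicate, "object": obj}
        if label is not None:
            quad["label"] = label  # optional fourth element of the quad
        quads.append(quad)
    return quads


edges = [
    ("props1-mesh", "props1-rig"),
    ("props1-rig", "props1-keys"),
]
generated = edges_to_quads(edges)
labelled = edges_to_quads(edges, label="props1")
print(generated)
```

Because the write request above already sends a JSON list, a generated list like this could likely be posted in a single request instead of one request per quad, though that is worth verifying against your Cayley version.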
Querying for Impact
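from pyley import CayleyClient, GraphObject
client = CayleyClient("http://localhost:64210", "v1")
graph = GraphObject()
# One hop only: the direct dependents of the mesh
query = graph.V("props1-mesh").Out().All()
# Send the query to the server and print the JSON result
response = client.Send(query)
print(response.result)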
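Since the client only gives you one hop at a time, full impact analysis has to be assembled on the Python side by repeating the one-hop lookup. The sketch below shows that loop with the lookup passed in as a callable. Both transitive_out and fake_one_hop are hypothetical; with pyley, the callable would send a V(node).Out().All() query and extract the node ids from the response, whose exact shape you would need to check.

```python
def transitive_out(start, one_hop):
    """Collect everything reachable from `start` via repeated one-hop lookups."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in one_hop(node):
            if nxt not in seen:  # skip nodes already visited via another path
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Stand-in for the one-hop database query, using the quads inserted above.
OUT = {
    "props1-concept": ["props1-texture", "props1-mesh"],
    "props1-texture": ["props1-model"],
    "props1-mesh": ["props1-model", "props1-rig", "props1-keys"],
    "props1-rig": ["props1-keys"],
    "props1-model": ["shot1-image-sequence"],
    "props1-keys": ["shot1-image-sequence"],
}


def fake_one_hop(node):
    return OUT.get(node, [])


impacted_nodes = transitive_out("props1-mesh", fake_one_hop)
print(sorted(impacted_nodes))
```

This costs one round trip per visited node, which is worth keeping in mind when reading the performance figure below.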
Performance
10,000 queries: 50 seconds. The slowest of the group.
Verdict
Cayley has a genuinely elegant design and the concept of a graph layer over an existing backend is compelling. But it's not production-ready for most studios today: documentation is sparse, the Python client is incomplete, and performance lags behind. Watch this project, but don't ship on it yet.
Quick Comparison
| | Neo4j | ArangoDB | Cayley |
|---|---|---|---|
| Performance (10k queries) | 3.5s | 26s | 50s |
| Query language | Cypher (expressive) | AQL + traversal API | Gremlin / MQL |
| Python client quality | Good (+ ORM option) | Good | Incomplete |
| Documentation | Excellent | Good | Poor |
| Multi-model | No | Yes (doc + graph) | No |
| Web UI | Yes | Yes | Broken |
| Production-ready | ✅ | ✅ | ⚠️ |
| Best for | Speed & robustness | Flexibility & dev experience | Experimenting |
Alternatives Worth Knowing
If you're not ready to adopt a dedicated graph database, two approaches work well with tools you may already have:
PostgreSQL recursive queries (WITH RECURSIVE common table expressions) can handle straightforward dependency traversal without adding a new database to your stack. Query complexity grows quickly, but it's a valid starting point.
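For a sense of what the relational version looks like, here is a sketch of the same impact query over a plain edge table. It uses Python's built-in sqlite3 module as a stand-in so it runs without a server; PostgreSQL supports the same WITH RECURSIVE construct, though the placeholder syntax depends on your driver. The dependency table and its column names are assumptions made for this example.

```python
import sqlite3

EDGES = [
    ("props1-concept", "props1-texture"),
    ("props1-concept", "props1-mesh"),
    ("props1-texture", "props1-model"),
    ("props1-mesh", "props1-model"),
    ("props1-mesh", "props1-rig"),
    ("props1-mesh", "props1-keys"),
    ("props1-rig", "props1-keys"),
    ("props1-model", "shot1"),
    ("props1-keys", "shot1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dependency (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO dependency VALUES (?, ?)", EDGES)

# Seed with direct dependents, then keep joining until nothing new appears.
# UNION (not UNION ALL) drops duplicates reached through several paths.
IMPACT_SQL = """
WITH RECURSIVE impacted(name) AS (
    SELECT dst FROM dependency WHERE src = ?
    UNION
    SELECT d.dst FROM dependency AS d JOIN impacted AS i ON d.src = i.name
)
SELECT name FROM impacted ORDER BY name
"""

rows = [row[0] for row in conn.execute(IMPACT_SQL, ("props1-mesh",))]
print(rows)
conn.close()
```

Compared with the one-line Cypher pattern, the recursion has to be spelled out by hand, which is where the growth in query complexity comes from.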
Elasticsearch can store vertices and edges as JSON documents and supports graph-like queries. It adds the benefit of full-text and fuzzy search across your asset metadata, which is useful if you want to search and traverse in the same system.
Visualising Your Graph
Once your data is in a graph database, you'll eventually want to render it in your own tools. Good options by platform:
Qt (Python/C++):
- Nodz - Python, easy to integrate
- ZodiacGraph - C++, high performance
Web/Electron:
- Cytoscape.js - versatile and production-grade
- SigmaJS - fast, well documented
- D3.js - maximum flexibility, steeper learning curve
Kitsu's web interface already provides a visual breakdown of production structure: episodes, sequences, shots, and assets. For teams that want a pre-built production graph view without custom tooling, Kitsu gives you that out of the box. Custom graph visualisations make the most sense for deep technical dependency analysis (e.g. which render farm jobs to invalidate when an asset version changes).
Recommended Action Plan
- Start with ArangoDB if you're exploring graph databases for the first time or want to prototype quickly. Its document model and clean Python client have the lowest barrier to entry.
- Switch to Neo4j when performance becomes a constraint or when you need enterprise-grade reliability. The Cypher query language is worth learning — it pays off quickly for complex traversals.
- Use Kitsu + gazu to seed your graph database with production entities. Kitsu is your source of truth for assets, shots, and task statuses; your graph database adds the dependency and build-order layer on top.
- Skip Cayley for now. Check back in 12–18 months — the core design is sound, but it needs more documentation and a more complete Python client.
- Consider Postgres first if you have simple dependency needs and want to avoid adding a new technology to your stack.
Graph databases won't replace your production tracker, but they will make your pipeline smarter about what needs to rebuild when things change. If you're managing CG production with Kitsu, you already have the asset and shot graph; a dedicated graph database lets you extend that into full dependency tracking and build orchestration.