Declarative Graph Schemas
GFQL accepts public schema declarations through the stable
graphistry.schema import path. Use this when application code owns a graph
contract and wants Cypher preflight checks to fail before query execution.
The API is experimental in this release: the import path and core declaration
objects are intended to be stable, while inference, coercion, remote transport,
and planner use are still follow-on surfaces.
The schema is optional. When you provide one, PyGraphistry uses it as the
declared contract for local GFQL validation. When you do not provide one,
validation falls back to the columns already visible on the bound local
nodes and edges dataframes. If neither a public schema nor local
dataframes are available, Cypher validation still parses and compiles the query,
but it cannot reject unknown labels or properties because there is no schema to
check against.
import graphistry
import pandas as pd
import pyarrow as pa
from graphistry.schema import EdgeType, GraphSchema, NodeType

# Node contracts: Arrow schemas carry dtype and nullability.
Person = NodeType(
    "Person",
    pa.schema([
        pa.field("id", pa.int64(), nullable=False),
        pa.field("name", pa.large_string()),
    ]),
)
Company = NodeType(
    "Company",
    pa.schema([
        pa.field("id", pa.int64(), nullable=False),
        pa.field("name", pa.large_string()),
    ]),
)

# Edge contract: relationship type, topology, and edge properties.
WorksAt = EdgeType(
    "WORKS_AT",
    source=Person,
    destination=Company,
    properties=pa.schema([pa.field("since", pa.int32(), nullable=False)]),
)

# Group the contracts and name the id/source/destination columns.
schema = GraphSchema(
    node_types=[Person, Company],
    edge_types=[WorksAt],
    node_id_column="id",
    edge_source_column="src",
    edge_destination_column="dst",
)

# Local dataframes follow the label__<Label> column convention.
nodes_df = pd.DataFrame({
    "id": [1, 2],
    "name": ["Ada", "Graphistry"],
    "label__Person": [True, False],
    "label__Company": [False, True],
})
edges_df = pd.DataFrame({
    "src": [1],
    "dst": [2],
    "since": [2024],
    "label__WORKS_AT": [True],
})

# Bind the data and the declared schema, then run Cypher preflight validation.
g = (
    graphistry
    .edges(edges_df, "src", "dst")
    .nodes(nodes_df, "id")
    .bind(schema=schema)
)
g.gfql_validate("MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p.name")
Schema Objects
NodeType(name, properties, labels=None)
Declares a node contract. labels defaults to (name,) and maps to the existing
GFQL label-column convention label__<Label>. properties accepts a
pyarrow.Schema, a GFQL RowSchema, or a mapping shorthand such as
{"id": pa.int64(), "name": pa.large_string()} or {"id": int, "name": str}.
Arrow schemas are the preferred declaration path because they preserve dtype
and nullability.
EdgeType(name, source, destination, properties=None)
Declares an edge contract and topology. source and destination accept
NodeType objects, label strings, or label iterables. Edge properties use the
same Arrow-aligned schema inputs as node properties.
GraphSchema(node_types, edge_types, strict=True, ...)
Groups node/edge contracts and adapts them to the internal GraphSchemaCatalog
used by binder/preflight validation. strict=False makes schema-bound
g.gfql_validate(...) permissive by default; callers can still override per
call with g.gfql_validate(..., strict=True).
NodeType.to_arrow() and EdgeType.to_arrow()
Export declarations as pyarrow.Schema objects through GFQL’s row-schema
bridge. Label/type columns are included by default so exports line up with
the table columns used by binder/preflight validation.
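To make the label-column convention concrete, the standalone sketch below lists the table columns a node declaration would line up with. It does not call PyGraphistry: node_table_columns is a hypothetical helper, and the ordering it returns (properties first, then label columns) is an assumption made for illustration only.

```python
from typing import Iterable, List, Mapping, Optional


def node_table_columns(
    name: str,
    properties: Mapping[str, type],
    labels: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: column names implied by a node declaration.

    Mirrors two rules from the text: labels default to (name,), and each
    label maps to a column named label__<Label>.
    """
    resolved_labels = tuple(labels) if labels is not None else (name,)
    label_columns = [f"label__{label}" for label in resolved_labels]
    # Ordering is an assumption of this sketch, not documented behavior.
    return list(properties) + label_columns


# Mapping shorthand, default labels.
person_columns = node_table_columns("Person", {"id": int, "name": str})

# Explicit labels override the default of (name,).
employee_columns = node_table_columns(
    "Employee", {"id": int}, labels=["Person", "Employee"]
)

print(person_columns)
print(employee_columns)
```

The real export path is NodeType.to_arrow(), which also carries Arrow dtypes and nullability; the sketch only tracks column names.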
What Preflight Checks
When a schema is bound to a graph, Cypher preflight checks validate:
node labels against declared node types,
node and edge property names against declared properties,
relationship types against declared edge types, and
relationship source/destination labels against declared topology when the query provides enough label information.
Invalid queries raise GFQLValidationError with structured context.
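The four checks listed above can be pictured with a small standalone model. This is a sketch under stated assumptions, not the binder's implementation: ToySchema, ToyPattern, and preflight are hypothetical names, the pattern is pre-parsed by hand instead of coming from a Cypher parser, and errors are returned as plain strings rather than raised as GFQLValidationError.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ToySchema:
    """Toy stand-in for a declared schema (not the PyGraphistry classes)."""
    node_props: Dict[str, Set[str]]
    # relationship type -> (source label, destination label, properties)
    edges: Dict[str, Tuple[str, str, Set[str]]]


@dataclass
class ToyPattern:
    """One (src)-[rel]->(dst) pattern plus the properties the query reads."""
    src_label: Optional[str]
    rel_type: str
    dst_label: Optional[str]
    node_props_used: Dict[str, Set[str]] = field(default_factory=dict)
    edge_props_used: Set[str] = field(default_factory=set)


def preflight(schema: ToySchema, pattern: ToyPattern) -> List[str]:
    errors: List[str] = []
    # 1. Node labels must be declared node types.
    for label in (pattern.src_label, pattern.dst_label):
        if label is not None and label not in schema.node_props:
            errors.append(f"unknown node label: {label}")
    # 2. Node property names must be declared on that node type.
    for label, props in pattern.node_props_used.items():
        declared = schema.node_props.get(label, set())
        for prop in sorted(props - declared):
            errors.append(f"unknown property: {label}.{prop}")
    # 3. Relationship types must be declared edge types.
    edge = schema.edges.get(pattern.rel_type)
    if edge is None:
        errors.append(f"unknown relationship type: {pattern.rel_type}")
        return errors
    src, dst, edge_props = edge
    for prop in sorted(pattern.edge_props_used - edge_props):
        errors.append(f"unknown property: {pattern.rel_type}.{prop}")
    # 4. Topology is checked only when the query supplies the label.
    if pattern.src_label is not None and pattern.src_label != src:
        errors.append(f"{pattern.rel_type} source must be {src}")
    if pattern.dst_label is not None and pattern.dst_label != dst:
        errors.append(f"{pattern.rel_type} destination must be {dst}")
    return errors


toy = ToySchema(
    node_props={"Person": {"id", "name"}, "Company": {"id", "name"}},
    edges={"WORKS_AT": ("Person", "Company", {"since"})},
)
ok = preflight(toy, ToyPattern("Person", "WORKS_AT", "Company",
                               {"Person": {"name"}}))
reversed_errors = preflight(toy, ToyPattern("Company", "WORKS_AT", "Person",
                                            {"Company": {"salary"}}))
unlabeled_source = preflight(toy, ToyPattern(None, "WORKS_AT", "Company"))
print(ok, reversed_errors, unlabeled_source)
```

The last call shows the "enough label information" caveat: with no source label in the pattern, the sketch has nothing to compare against the declared topology and reports no error.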
This is a correctness and documentation surface first: applications can state what labels, relationship types, properties, and topology they expect, then validate user-authored or generated Cypher before running it. The same typed contract is also the foundation for later inference, coercion, remote transport, and planner/performance work, but this page covers the declared local contract.
Provided vs. Inferred Schema
In this release, schemas are provided, not inferred. You create
NodeType, EdgeType, and GraphSchema objects directly and attach them
with graphistry.bind(..., schema=schema) or g.bind(schema=schema).
Without an explicit GraphSchema:
g.gfql_validate(...) can still use local dataframe columns already bound on g._nodes and g._edges for schema-aware checks.
It does not infer node types, edge types, Arrow dtypes, nullability, or topology from data.
A remote-only graph such as graphistry.bind(dataset_id="...") has no local dataframe columns, so local validation is limited to syntax, compile, and structural checks unless you also bind a declared schema.
Schema inference from existing plottables is tracked separately from this declared-schema API.
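The fallback order described on this page can be summarized as a three-way decision. The function below is a hypothetical illustration, not part of PyGraphistry; its name and the returned strings are invented for the sketch.

```python
from typing import Iterable, Optional


def validation_source(
    declared_schema: Optional[object],
    node_columns: Optional[Iterable[str]],
    edge_columns: Optional[Iterable[str]],
) -> str:
    """Hypothetical summary of what local validation can check against."""
    if declared_schema is not None:
        # A bound GraphSchema is the declared contract.
        return "declared-schema"
    if node_columns is not None or edge_columns is not None:
        # Column names only: no node types, dtypes, nullability, or topology.
        return "local-columns"
    # Remote-only graph: syntax, compile, and structural checks only.
    return "syntax-only"


with_schema = validation_source(object(), ["id", "name"], ["src", "dst"])
columns_only = validation_source(None, ["id", "name"], ["src", "dst"])
remote_only = validation_source(None, None, None)
print(with_schema, columns_only, remote_only)
```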
Local vs. Remote GFQL
The public schema is consumed by local validation APIs, including:
g.gfql_validate("MATCH ...")
g.gfql(..., validate=True)
gfql_remote(...) is different. It compiles Cypher strings locally and sends
the resulting GFQL wire payload to the server, but this release does not
serialize a bound GraphSchema into remote GFQL requests. Remote execution
therefore still depends on the server-side dataset schema and GFQL support. If
you want declared schema checks before a remote call, run
g.gfql_validate(query) locally first, then call g.gfql_remote(query).
Remote schema transport is planned as a follow-on after the local schema contract and serialization boundary are stable.
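The validate-locally-then-run-remotely pattern can be wrapped in a small helper. This is a minimal sketch: validate_then_run_remote is a hypothetical name, and RecordingGraph is a stub that stands in for a schema-bound graph so the example runs without a server. The stub raises ValueError where a real graph would raise GFQLValidationError.

```python
from typing import Any, List


def validate_then_run_remote(g: Any, query: str) -> Any:
    """Hypothetical wrapper: local declared-schema check, then remote run."""
    # An invalid query raises here, so the remote call is never reached.
    g.gfql_validate(query)
    return g.gfql_remote(query)


class RecordingGraph:
    """Stub standing in for a schema-bound graph; records call order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def gfql_validate(self, query: str) -> None:
        self.calls.append("validate")
        if "Unknown" in query:
            # Stand-in for GFQLValidationError in this sketch.
            raise ValueError("unknown node label: Unknown")

    def gfql_remote(self, query: str) -> str:
        self.calls.append("remote")
        return "remote-result"


good = RecordingGraph()
result = validate_then_run_remote(good, "MATCH (p:Person) RETURN p.name")

bad = RecordingGraph()
try:
    validate_then_run_remote(bad, "MATCH (u:Unknown) RETURN u")
except ValueError:
    pass  # The failed validation stopped the remote call.

print(good.calls, bad.calls)
```

With a real graph, the same helper would accept the g built earlier on this page, since it only relies on g.gfql_validate(query) and g.gfql_remote(query).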
Compatibility Notes
The public import path is stable:
from graphistry.schema import NodeType, EdgeType, GraphSchema
Top-level imports are also available:
from graphistry import NodeType, EdgeType, GraphSchema
This release exposes declaration, Arrow row-schema export, and binder/preflight integration. Inference from existing plottables, Arrow import/coercion at plottable boundaries, schema effects for graph-growing calls, and remote schema transport remain separate follow-on surfaces.