Stores, scale, and strict import

TripleModel supports chunked import, on-disk pyoxigraph stores, predicate-map caching, and strict data-quality checks.

Predicate-map caching

Field-to-predicate IRI mappings are cached per model class when the default resolver is used. Custom resolvers passed via resolver= bypass the cache.

Strict import

Set these options on the model's inner class Rdf:

  • strict_import = True: raise ValueError when a subject has triples whose predicates are not mapped to any field on the model (rdf:type is exempt).

  • warn_unmapped_fields = True: emit a UserWarning instead of raising.

from triplemodel import TripleModel, rdf_field

class Person(TripleModel):
    class Rdf:
        namespace = "http://example.org/"
        type_uri = "http://example.org/Person"
        id_field = "slug"
        strict_import = True

    slug: str
    name: str = rdf_field("http://example.org/name")
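To make the two flags concrete, here is a minimal sketch of the check they describe. It is not TripleModel's code: check_unmapped, the plain (subject, predicate, object) tuple input, and the choice that strict_import wins when both flags are set are assumptions for illustration.

```python
import warnings
from typing import Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check_unmapped(
    subject: str,
    triples: Iterable[Tuple[str, str, str]],
    mapped_predicates: Set[str],
    strict_import: bool = False,
    warn_unmapped_fields: bool = False,
) -> Set[str]:
    """Return predicates on `subject` that no model field maps (rdf:type is exempt)."""
    unmapped = {
        p
        for s, p, _ in triples
        if s == subject and p != RDF_TYPE and p not in mapped_predicates
    }
    if unmapped:
        message = f"{subject} has unmapped predicates: {sorted(unmapped)}"
        if strict_import:
            # Assumption: strict mode takes precedence over warning mode.
            raise ValueError(message)
        if warn_unmapped_fields:
            warnings.warn(message, UserWarning)
    return unmapped
```

With neither flag set, the sketch just reports the unmapped predicates and lets the import continue.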

Chunked import

from triplemodel import iter_graph_to_models

for chunk in iter_graph_to_models(graph, Person, chunk_size=500):
    process(chunk)

graph_to_models(..., chunk_size=500) uses the same iterator internally.
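The batching pattern behind chunked import can be sketched with itertools.islice. This is a generic illustration, not the library's iterator: iter_chunks and collect_all are hypothetical names, and the assumption that each chunk is a list of at most chunk_size items is ours.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield lists of at most chunk_size items without materializing the whole input."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def collect_all(items: Iterable[T], chunk_size: int) -> List[T]:
    """Eager variant: drain the same chunk iterator into one list."""
    result: List[T] = []
    for chunk in iter_chunks(items, chunk_size):
        result.extend(chunk)
    return result
```

Only one chunk is held at a time by iter_chunks, while collect_all trades that memory bound for a single flat list.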

Streaming file load

For large N-Triples / N-Quads files, use load_models_streaming with an on-disk store:

from triplemodel import load_models_streaming

people = load_models_streaming(
    "huge.nt",
    Person,
    store="disk",
    store_identifier="/path/to/oxigraph-store",
    chunk_size=500,
)

Omit store to parse into memory. Turtle and TriG still require a full parse, so convert multi-GB inputs to N-Quads first.

The legacy values store="sqlalchemy" and store="berkeleydb" emit a DeprecationWarning and map to the disk store.
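N-Triples and N-Quads lend themselves to streaming because every statement sits on one line with fully written-out terms. Turtle and TriG, by contrast, allow @prefix declarations and statements spanning several lines, so a line-oriented reader cannot split them safely. The sketch below illustrates the line-at-a-time idea only: stream_ntriples and parse_ntriples_line are hypothetical helpers, and the parser is deliberately naive (single spaces between terms, no escape handling, object kept as raw text).

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

Triple = Tuple[str, str, str]


def parse_ntriples_line(line: str) -> Triple:
    """Naive parse of one statement: single spaces between terms, object kept raw."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"not an N-Triples statement: {line!r}")
    subject, predicate, obj = body[:-1].rstrip().split(" ", 2)
    return subject, predicate, obj


def stream_ntriples(stream: TextIO, chunk_size: int) -> Iterator[List[Triple]]:
    """Yield parsed triples in chunks; only chunk_size lines are held at a time."""
    statements = (
        line for line in stream
        if line.strip() and not line.lstrip().startswith("#")
    )
    while True:
        batch = list(islice(statements, chunk_size))
        if not batch:
            return
        yield [parse_ntriples_line(line) for line in batch]


if __name__ == "__main__":
    sample = io.StringIO(
        '<http://example.org/alice> <http://example.org/name> "Alice Smith" .\n'
        "# comments and blank lines are skipped\n"
        "\n"
        '<http://example.org/bob> <http://example.org/name> "Bob" .\n'
    )
    for chunk in stream_ntriples(sample, chunk_size=1):
        print(chunk)
```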

Store helpers

from triplemodel import open_graph, graph_store_session, store_commit

graph = open_graph("disk", "/path/to/oxigraph-store")
with graph_store_session(graph):
    Person.sync_to_graph(instance, graph)
    store_commit(graph)

See examples/stores/disk_store.py. When parse_into_store_graph creates a temporary directory, call graph.close() to remove it. When a remote SPARQL endpoint is the system of record, see SPARQL and remote endpoints and use SparqlModel.
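The temporary-directory rule follows a common ownership pattern: whoever creates the directory deletes it on close. The sketch below shows that pattern with a hypothetical TempStoreHandle class standing in for a graph. The distinction it draws (a caller-supplied directory is left alone) is our assumption, not documented TripleModel behavior.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class TempStoreHandle:
    """Hypothetical stand-in for a graph whose on-disk store may live in a temp dir."""

    def __init__(self, directory: Optional[Path] = None) -> None:
        # Assumption: only directories this object creates are deleted on close().
        self._owns_directory = directory is None
        if directory is None:
            self.directory = Path(tempfile.mkdtemp(prefix="store-"))
        else:
            self.directory = Path(directory)
        self.closed = False

    def close(self) -> None:
        """Idempotent close; removes the directory only if this object created it."""
        if self.closed:
            return
        self.closed = True
        if self._owns_directory:
            shutil.rmtree(self.directory, ignore_errors=True)

    def __enter__(self) -> "TempStoreHandle":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Using the handle as a context manager guarantees close() runs even if the body raises.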

Bulk load, backup, and optimize

For large files on a disk store, use bulk_load_into_graph (wraps Store.bulk_load) instead of parsing entirely into memory:

from triplemodel import bulk_load_into_graph, open_graph, optimize_store, backup_store, store_commit

graph = open_graph("disk", "/path/to/store")
bulk_load_into_graph(graph, "huge.nt", format="nt")
optimize_store(graph=graph)
backup_store("/path/to/backup-dir", graph=graph)
store_commit(graph)
graph.close()

dump_store and load_store export and import N-Quads snapshots. On supported on-disk stores, store_flush runs before Graph.close. See examples/stores/bulk_load_backup.py.
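For orientation, an N-Quads snapshot is one statement per line, with an optional fourth term naming the graph. The round trip below is a hypothetical, IRI-only sketch of that format (dump_nquads and load_nquads are illustrative names). It does not reproduce dump_store or load_store, which must also handle literals, blank nodes, and escaping.

```python
from typing import Iterable, List, Optional, Tuple

# (subject, predicate, object, graph); graph None means the default graph.
Quad = Tuple[str, str, str, Optional[str]]


def dump_nquads(quads: Iterable[Quad]) -> str:
    """Serialize IRI-only quads; default-graph quads are written without a fourth term."""
    lines = []
    for s, p, o, g in quads:
        terms = [f"<{s}>", f"<{p}>", f"<{o}>"]
        if g is not None:
            terms.append(f"<{g}>")
        lines.append(" ".join(terms) + " .")
    return "\n".join(lines) + ("\n" if lines else "")


def load_nquads(text: str) -> List[Quad]:
    """Parse the output of dump_nquads back into quads (IRI terms only)."""
    quads: List[Quad] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the trailing " ." and strip the angle brackets from each IRI.
        terms = [t[1:-1] for t in line[:-1].split()]
        s, p, o = terms[:3]
        g = terms[3] if len(terms) == 4 else None
        quads.append((s, p, o, g))
    return quads
```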

Named-graph helpers: list_named_graphs, ensure_named_graph, clear_named_graph, remove_named_graph. Integrators can use iter_quads_for_pattern for low-level quad scans.
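A quad scan "for a pattern" usually means None acts as a wildcard in any position, the convention pyoxigraph's Store.quads_for_pattern follows. The pure-Python sketch below shows that matching rule over an in-memory list. match_quads and named_graphs are hypothetical; whether iter_quads_for_pattern takes the same arguments is an assumption. Note that with None as the wildcard, this simple version cannot select "default graph only".

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]


def match_quads(
    quads: Iterable[Quad],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    graph: Optional[str] = None,
) -> Iterator[Quad]:
    """Yield quads whose terms equal every non-None position of the pattern."""
    pattern = (subject, predicate, obj, graph)
    for quad in quads:
        if all(want is None or want == have for want, have in zip(pattern, quad)):
            yield quad


def named_graphs(quads: Iterable[Quad]) -> List[str]:
    """List distinct named-graph IRIs, ignoring default-graph quads."""
    return sorted({g for _, _, _, g in quads if g is not None})
```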

Benchmark

examples/exit_criteria_08.py loads a FOAF-shaped graph with 100k people by default; set TRIPLEMODEL_BENCH_COUNT to a smaller count for CI smoke runs. Set TRIPLEMODEL_STORE=disk to exercise the on-disk streaming path.
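As a rough picture of what a "FOAF-shaped" input looks like, the sketch below reads the two documented environment variables and generates N-Triples for N people, each with a type, a name, and one foaf:knows link. bench_settings and foaf_people are hypothetical; the "memory" fallback for TRIPLEMODEL_STORE and the exact triples per person are our assumptions, not what exit_criteria_08.py does.

```python
import os
from typing import Iterator, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def bench_settings() -> Tuple[int, str]:
    """Read benchmark knobs; 100000 mirrors the documented default count."""
    count = int(os.environ.get("TRIPLEMODEL_BENCH_COUNT", "100000"))
    store = os.environ.get("TRIPLEMODEL_STORE", "memory")  # "memory" is assumed
    return count, store


def foaf_people(count: int) -> Iterator[str]:
    """Yield N-Triples lines: type, name, and a knows link to the next person."""
    for i in range(count):
        person = f"<http://example.org/person/{i}>"
        yield f"{person} <{RDF_TYPE}> <{FOAF}Person> ."
        yield f'{person} <{FOAF}name> "Person {i}" .'
        if count > 1:
            friend = f"<http://example.org/person/{(i + 1) % count}>"
            yield f"{person} <{FOAF}knows> {friend} ."
```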

Plugin hooks

The triplemodel.plugins module provides registration hooks for custom literals, resources, and predicate resolvers:

from triplemodel.plugins import (
    register_literal_type,
    register_predicate_resolver,
)

# MyResolver is your own resolver class, defined elsewhere
register_predicate_resolver(MyResolver)

register_parser, register_serializer, and register_store were removed in 0.10.0 (pyoxigraph has no plugin registry). See Plugin hooks and Migrating to TripleModel 0.10.0 (pyoxigraph).
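Registration hooks like these are typically backed by a name-keyed registry. The sketch below shows that general pattern; PluginRegistry, FoafResolver, and its resolve method are hypothetical and do not describe the internals or signatures of triplemodel.plugins.

```python
from typing import Callable, Dict, Generic, Iterator, TypeVar

T = TypeVar("T")


class PluginRegistry(Generic[T]):
    """Hypothetical name-keyed registry; not the triplemodel.plugins implementation."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, T] = {}

    def register(self, name: str) -> Callable[[T], T]:
        """Return a decorator that stores its argument under `name`."""
        def decorator(obj: T) -> T:
            if name in self._entries:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._entries[name] = obj
            return obj  # returning obj keeps the decorated class usable as-is
        return decorator

    def get(self, name: str) -> T:
        return self._entries[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)


predicate_resolvers: PluginRegistry[type] = PluginRegistry("predicate resolver")


@predicate_resolvers.register("foaf")
class FoafResolver:
    """Example resolver: map every field name into the FOAF namespace."""

    def resolve(self, field_name: str) -> str:
        return "http://xmlns.com/foaf/0.1/" + field_name
```

Rejecting duplicate names up front avoids two plugins silently overriding each other.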

Codegen (experimental)

triplemodel-codegen examples/codegen/sample.ttl -o models.py

OWL/RDFS classes, together with their datatype properties, become stub TripleModel subclasses. The output is best-effort only; see the limitations in Codegen (experimental).
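To illustrate the kind of mapping involved, the sketch below renders one stub class in the same shape as the Person example earlier on this page. It is hypothetical throughout: it takes an already-extracted dict instead of parsing Turtle, the XSD-to-Python table is a small assumed subset, and it does not reproduce triplemodel-codegen's real output format. It also assumes IRI local names are valid Python identifiers.

```python
import re
from typing import Dict

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed subset of XSD datatype -> Python type names.
XSD_TO_PY = {
    XSD + "string": "str",
    XSD + "integer": "int",
    XSD + "boolean": "bool",
    XSD + "double": "float",
}


def local_name(iri: str) -> str:
    """Take the fragment or last path segment of an IRI."""
    return re.split(r"[#/]", iri.rstrip("/#"))[-1]


def render_stub(class_iri: str, properties: Dict[str, str]) -> str:
    """Render one stub model; `properties` maps property IRI -> range datatype IRI."""
    lines = [
        f"class {local_name(class_iri)}(TripleModel):",
        "    class Rdf:",
        f'        type_uri = "{class_iri}"',
    ]
    for prop_iri, range_iri in properties.items():
        py_type = XSD_TO_PY.get(range_iri, "str")  # unknown ranges fall back to str
        lines.append(f'    {local_name(prop_iri)}: {py_type} = rdf_field("{prop_iri}")')
    return "\n".join(lines) + "\n"
```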