Stores, scale, and strict import

TripleModel supports chunked import, on-disk pyoxigraph stores, predicate-map caching, and strict data-quality checks.

Predicate-map caching

Field-to-predicate IRI mappings are cached per model class when the default resolver is used. A custom resolver passed via resolver= bypasses the cache.

Strict import

Set these options on the model's inner class Rdf:

strict_import = True: raise ValueError when the subject has triples whose predicates are not mapped on the model (rdf:type is exempt).
warn_unmapped_fields = True: emit a UserWarning for such unmapped predicates instead of raising.

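# Assumes TripleModel and rdf_field are exported from the package top level.
from triplemodel import TripleModel, rdf_field

class Person(TripleModel):
    class Rdf:
        namespace = "http://example.org/"
        type_uri = "http://example.org/Person"
        id_field = "slug"
        strict_import = True  # raise ValueError on unmapped predicates

    slug: str
    name: str = rdf_field("http://example.org/name")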
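
As a rough illustration of what the two flags check, the sketch below compares the predicates found on a subject with the predicates a model maps. It is standalone and does not reflect TripleModel's internals: check_unmapped is a hypothetical helper, and the precedence when both flags are set (raise first) is an assumption of this sketch.

```python
import warnings
from typing import Iterable, Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check_unmapped(
    subject_predicates: Iterable[str],
    mapped: Set[str],
    *,
    strict_import: bool = False,
    warn_unmapped_fields: bool = False,
) -> Set[str]:
    """Return predicates on the subject that the model does not map."""
    # rdf:type is always allowed, even though no field maps to it.
    unmapped = {p for p in subject_predicates if p not in mapped and p != RDF_TYPE}
    if unmapped and strict_import:
        # Assumption: strict mode wins if both flags are enabled.
        raise ValueError(f"unmapped predicates: {sorted(unmapped)}")
    if unmapped and warn_unmapped_fields:
        warnings.warn(f"unmapped predicates: {sorted(unmapped)}", UserWarning)
    return unmapped


mapped = {"http://example.org/name"}
seen = [RDF_TYPE, "http://example.org/name", "http://example.org/age"]
extra = check_unmapped(seen, mapped)  # neither flag set: just report
```

Here only http://example.org/age is reported, because rdf:type is exempt and the name predicate is mapped.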

Chunked import

from triplemodel import iter_graph_to_models

for chunk in iter_graph_to_models(graph, Person, chunk_size=500):
    process(chunk)

graph_to_models(..., chunk_size=500) uses the same iterator internally.
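
The relationship between the iterator and the eager call can be pictured with a generic chunking sketch. This is not the library's implementation; iter_chunks and collect_all are hypothetical names that show one common way to build an eager function on top of a chunked generator.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield lists of at most chunk_size items without materializing the input."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def collect_all(items: Iterable[T], chunk_size: int) -> List[T]:
    """Eager variant that reuses the chunked iterator internally."""
    result: List[T] = []
    for chunk in iter_chunks(items, chunk_size):
        result.extend(chunk)
    return result


chunks = list(iter_chunks(range(5), 2))
flat = collect_all(range(5), 2)
```

With this shape, peak memory for the iterator is bounded by one chunk, while the eager variant still ends up holding every item.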

Streaming file load

For large N-Triples / N-Quads files, use load_models_streaming with an on-disk store:

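# Assumes load_models_streaming is exported from the package top level,
# like the other helpers on this page.
from triplemodel import load_models_streaming

people = load_models_streaming(
    "huge.nt",
    Person,
    store="disk",
    store_identifier="/path/to/oxigraph-store",
    chunk_size=500,
)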

Omit store to parse into memory. Turtle and TriG still require a full parse; convert to N-Quads for multi-GB inputs.
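
N-Triples and N-Quads can be streamed because every line is a complete statement, whereas Turtle and TriG rely on prefixes and multi-line constructs. The sketch below shows the line-oriented idea with a deliberately simplified reader. It is not a conforming N-Triples parser and is unrelated to how TripleModel or pyoxigraph parse files; iter_statements is a hypothetical name.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Simplified pattern: IRI or blank-node subject, IRI predicate, and the rest of
# the line up to the final "." as the object. Not a full N-Triples grammar.
_LINE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def iter_statements(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) one line at a time."""
    for line in handle:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = _LINE.match(stripped)
        if match is None:
            raise ValueError(f"unsupported line: {stripped!r}")
        yield match.group(1), match.group(2), match.group(3)


sample = io.StringIO(
    "# people\n"
    '<http://example.org/alice> <http://example.org/name> "Alice" .\n'
    "\n"
    '<http://example.org/bob> <http://example.org/name> "B. Ob" .\n'
)
rows = list(iter_statements(sample))
```

Because the reader never needs to look beyond the current line, it can process a file of any size with constant memory.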

The legacy values store="sqlalchemy" and store="berkeleydb" emit a DeprecationWarning and are mapped to store="disk".
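
Python hides most DeprecationWarning output by default, so legacy store names are easy to miss. One way to catch them in tests is to escalate the warning to an error. In the sketch below, resolve_store is a hypothetical stand-in for the documented mapping, not the library's code; only the standard warnings module calls are real.

```python
import warnings

LEGACY_STORES = ("sqlalchemy", "berkeleydb")


def resolve_store(store):
    """Stand-in for the documented behavior: legacy names warn and map to disk."""
    if store in LEGACY_STORES:
        warnings.warn(
            f"store={store!r} is deprecated; using 'disk' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "disk"
    return store


def uses_legacy_store(store):
    """Return True if resolving this store name triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Turn the warning into an exception for the duration of the check.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            resolve_store(store)
        except DeprecationWarning:
            return True
    return False


legacy = uses_legacy_store("sqlalchemy")
current = uses_legacy_store("disk")
```

The same effect is available for a whole test run with the interpreter option -W error::DeprecationWarning.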

Store helpers

from triplemodel import open_graph, graph_store_session, store_commit

graph = open_graph("disk", "/path/to/oxigraph-store")
with graph_store_session(graph):
    Person.sync_to_graph(instance, graph)
    store_commit(graph)

See examples/stores/disk_store.py. When parse_into_store_graph creates a temporary directory, call graph.close() to remove it. For remote SPARQL as the system of record, see SPARQL and remote endpoints and SparqlModel.
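
To make sure close() runs even when processing fails, the standard contextlib.closing wrapper can be used with any object that has a close() method. The sketch below demonstrates the pattern with a hypothetical stand-in class, TempStoreGraph, that owns a temporary directory. It only imitates the cleanup behavior described above and is not a TripleModel type.

```python
import os
import shutil
import tempfile
from contextlib import closing


class TempStoreGraph:
    """Hypothetical stand-in: owns a temporary directory until close()."""

    def __init__(self):
        self.path = tempfile.mkdtemp(prefix="store-")

    def close(self):
        # Remove the temporary directory, mirroring the documented cleanup.
        shutil.rmtree(self.path, ignore_errors=True)


with closing(TempStoreGraph()) as graph:
    store_path = graph.path
    existed_inside = os.path.isdir(store_path)  # directory is live here

removed_after = not os.path.isdir(store_path)  # close() ran on exit
```

closing() calls close() when the with block exits, including when an exception propagates out of it.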

Bulk load, backup, and optimize

For large files on a disk store, use bulk_load_into_graph (wraps Store.bulk_load) instead of parsing entirely into memory:

from triplemodel import bulk_load_into_graph, open_graph, optimize_store, backup_store, store_commit

graph = open_graph("disk", "/path/to/store")
bulk_load_into_graph(graph, "huge.nt", format="nt")
optimize_store(graph=graph)
backup_store("/path/to/backup-dir", graph=graph)
store_commit(graph)
graph.close()

dump_store / load_store export and import N-Quads snapshots. store_flush runs before Graph.close on supported on-disk stores. See examples/stores/bulk_load_backup.py.

Named-graph helpers: list_named_graphs, ensure_named_graph, clear_named_graph, remove_named_graph. Integrators can use iter_quads_for_pattern for low-level quad scans.

Benchmark

examples/exit_criteria_08.py loads a FOAF-shaped graph (default 100k people; set TRIPLEMODEL_BENCH_COUNT for CI smoke runs). Set TRIPLEMODEL_STORE=disk to exercise the on-disk streaming path.
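
A smoke run can be launched from Python by setting the two environment variables for a child process. The sketch below is an illustration only: smoke_env and run_benchmark are hypothetical helpers, the count of 1000 is an arbitrary example, and the script is assumed to accept an integer string and to be run from the repository root.

```python
import os
import subprocess
import sys


def smoke_env(count, store=None, base=None):
    """Build a child-process environment with the benchmark variables set."""
    env = dict(os.environ if base is None else base)
    env["TRIPLEMODEL_BENCH_COUNT"] = str(count)  # assumed to be an integer string
    if store is not None:
        env["TRIPLEMODEL_STORE"] = store
    return env


def run_benchmark(count, store=None):
    """Run the benchmark script; assumes the current directory is the repo root."""
    return subprocess.run(
        [sys.executable, "examples/exit_criteria_08.py"],
        env=smoke_env(count, store),
        check=True,
    )


# Example: run_benchmark(1000, store="disk") for a small on-disk smoke run.
example_env = smoke_env(1000, store="disk", base={})
```

Passing the variables through env= keeps the parent process environment unchanged, which is convenient when several configurations run in one CI job.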

Plugin hooks

triplemodel.plugins registers custom literals, resources, and predicate resolvers:

from triplemodel.plugins import (
    register_literal_type,
    register_predicate_resolver,
)

register_predicate_resolver(MyResolver)

register_parser, register_serializer, and register_store were removed in 0.10.0 (pyoxigraph has no plugin registry). See Plugin hooks and Migrating to TripleModel 0.10.0 (pyoxigraph).

Codegen (experimental)

triplemodel-codegen examples/codegen/sample.ttl -o models.py

OWL/RDFS classes and datatype properties become stub TripleModel subclasses. Output is best-effort only — see limitations in Codegen (experimental).