v0.4.2 — Now with Python bindings + parallel compression

Every compressor treats JSON as text. This one doesn't.

DataCortex auto-infers your JSON schema, reorganizes the data into columns, applies type-specific encoding, then races multiple backend compressors and keeps the smallest output. No config, no schema files. Just better compression.

Get Started
cargo install datacortex-cli
+113% vs zstd -19 on k8s logs
40x compression on structured logs
381 tests passing, zero failures
0 configuration required

Benchmarks don't lie

DataCortex Fast mode vs the best general-purpose compressors. Higher ratio = better. Lossless, byte-exact roundtrip guaranteed.

File                              Size     DataCortex   zstd -19   brotli -11   vs Best
k8s structured logs (100K rows)   9.9 MB   ~40x         18.9x      --           +113%
nginx access logs (100K rows)     9.5 MB   ~28x         17.3x      --           +62%
NDJSON analytics (10K rows)       3.3 MB   27.8x        16.0x      16.4x        +70%
NDJSON events (200 rows)          107 KB   22.0x        15.6x      16.6x        +32%
Twitter API nested JSON           617 KB   19.7x        16.7x      18.9x        +4%
Event tickets (repetitive)        1.7 MB   221.7x       176.0x     190.0x       +17%
Chart: k8s structured logs — compression ratio: DataCortex 40x (+113%) vs zstd -19 at 18.9x (baseline).
Chart: NDJSON analytics (10K rows) — DataCortex 27.8x (+70%) vs zstd -19 at 16.0x and brotli -11 at 16.4x (baselines).

How it works

Four stages, fully automatic. DataCortex understands your data's structure and exploits it.

01 DETECT
Schema Inference
Auto-detects JSON/NDJSON format. Infers column types: integers, timestamps, booleans, UUIDs, enums, strings.
02 REORG
Columnar Layout
Reorganizes row-oriented data into columns. Similar values sit adjacent, creating massive compression opportunity.
03 ENCODE
Typed Encoding
Delta varints for integers, bitmaps for booleans, epoch encoding for timestamps, binary packing for UUIDs.
04 COMPRESS
Auto-Fallback
6 compression paths race in parallel (zstd + brotli at multiple levels). The smallest output wins automatically.
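The four stages can be sketched in miniature with stdlib Python. This is an illustrative toy, not DataCortex's actual implementation: the record fields, the single delta-varint codec, and the two-codec race are all stand-ins.

```python
import json
import lzma
import zlib

# Stages 1-2: infer columns from NDJSON-like rows and pivot to a columnar layout.
# (Toy records -- the field names are stand-ins, not DataCortex internals.)
rows = [{"ts": 1_700_000_000 + i, "level": "info", "msg": f"req {i % 3}"} for i in range(1000)]
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Stage 3: type-specific encoding -- delta varints for the sorted integer column.
def delta_varint(values):
    """LEB128-style varints over successive deltas (assumes nondecreasing values)."""
    out, prev = bytearray(), 0
    for v in values:
        d, prev = v - prev, v
        while True:
            out.append((d & 0x7F) | (0x80 if d > 0x7F else 0))
            d >>= 7
            if not d:
                break
    return bytes(out)

encoded = delta_varint(columns["ts"]) + "\n".join(columns["level"] + columns["msg"]).encode()

# Stage 4: race two codecs and keep the smaller output.
best = min((zlib.compress(encoded, 9), lzma.compress(encoded)), key=len)

# Row-oriented baseline: compress the raw JSON text directly.
row_oriented = zlib.compress(json.dumps(rows).encode(), 9)
print(f"columnar {len(best)} bytes vs row-oriented {len(row_oriented)} bytes")
```

Because similar values sit adjacent after the pivot, the columnar path compresses well below the row-oriented baseline even with this toy encoder.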

Up and running in seconds

Rust CLI, Python library, or build from source. Your choice.

terminal — CLI
# Install
$ cargo install datacortex-cli

# Compress (auto-detects format)
$ datacortex compress data.ndjson
→ data.dcx (27.8x, 0.29 bpb)

# Decompress
$ datacortex decompress data.dcx output.ndjson

# Stream from stdin
$ cat logs.ndjson | datacortex compress - -o out.dcx

# Benchmark vs zstd
$ datacortex bench corpus/ --compare
terminal — Python
# Install
$ pip install datacortex

# Use in Python
import datacortex

# Compress bytes
with open("logs.ndjson", "rb") as f:
    data = f.read()
compressed = datacortex.compress(data)

# Decompress
original = datacortex.decompress(compressed)

# File operations
datacortex.compress_file("in.json", "out.dcx")
datacortex.decompress_file("out.dcx", "restored.json")

Built for production

Everything you need for real-world JSON compression pipelines.

Parallel Compression

6 compression paths race concurrently via Rayon. 247% CPU utilization on multi-core machines.
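The smallest-output-wins race can be sketched with stdlib tooling. A simplified stand-in for the Rayon-based race: the codec lineup below is illustrative, not DataCortex's actual zstd/brotli paths.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative codec lineup -- a stand-in for the real zstd/brotli paths.
PATHS = {
    "zlib-6": lambda d: zlib.compress(d, 6),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "bz2-9": lambda d: bz2.compress(d, 9),
    "lzma": lzma.compress,
}

def race_compress(data: bytes) -> tuple[str, bytes]:
    """Run every path concurrently; the smallest output wins.

    The C codecs release the GIL while compressing, so the threads genuinely
    overlap on multi-core machines (hence >100% CPU utilization).
    """
    with ThreadPoolExecutor(max_workers=len(PATHS)) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in PATHS.items()}
    return min(((n, f.result()) for n, f in futures.items()), key=lambda nb: len(nb[1]))

winner, blob = race_compress(b'{"level":"info","msg":"ok"}\n' * 10_000)
```

Racing costs extra CPU but never hurts the ratio: the losing candidates are simply discarded.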

📡

Streaming Support

stdin/stdout pipes, --chunk-rows for bounded memory on huge NDJSON files. Multi-frame .dcx format.

🐍

Python Bindings

pip install datacortex. 6 functions: compress, decompress, compress_file, decompress_file, info, detect_format.

📖

Custom Dictionaries

Train compression dictionaries from sample data. Reuse across files with similar schemas for even better ratios.
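The general preset-dictionary idea can be sketched with zlib's `zdict` parameter. Illustrative only: DataCortex trains and stores its own dictionaries, and the `shared` sample below is hypothetical.

```python
import zlib

# Hypothetical dictionary seeded with byte strings shared by files of one schema.
# (DataCortex trains its own dictionaries; this shows the mechanism via zdict.)
shared = b'{"ts":,"level":"info","service":"api","msg":"'

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                         zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY, zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    return zlib.decompressobj(zlib.MAX_WBITS, zdict).decompress(blob)

doc = b'{"ts":1700000000,"level":"info","service":"api","msg":"ok"}'
plain = zlib.compress(doc, 9)
primed = compress_with_dict(doc, shared)
# Priming with schema strings shrinks short records the plain codec has no history for.
```

The same dictionary must be supplied on decompression, which is why it pays off only across files that really do share a schema.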

🔒

Byte-Exact Roundtrip

CRC-32 verified. Decompress always produces identical bytes. 381 tests, 36+ E2E scenarios, zero failures.
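The verification idea in miniature, using stdlib `zlib.crc32` (a sketch of the concept, not the actual .dcx frame format):

```python
import zlib

payload = b'{"id":1}\n{"id":2}\n'
checksum = zlib.crc32(payload)            # stored alongside the compressed frame

blob = zlib.compress(payload)             # stand-in for the actual pipeline
restored = zlib.decompress(blob)
assert zlib.crc32(restored) == checksum   # byte-exact roundtrip check
```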

⚙️

Zero Configuration

No schema files, no database, no setup. Auto-infers format, types, and optimal compression strategy.

Stop wasting storage on JSON

One command. Better ratios than general-purpose compressors on JSON. Try it now.

View on GitHub
pip install datacortex