v0.4.2 — Now with Python bindings + parallel compression

Every compressor treats JSON as text. This one doesn't.

DataCortex auto-infers your JSON schema, reorganizes the data into columns, applies type-specific encoding, then races multiple backend compressors and keeps the smallest output. No config, no schema files. Just better compression.

Get Started
cargo install datacortex-cli
+113% vs zstd -19 on k8s logs
40x compression on structured logs
381 tests passing, zero failures
0 configuration required

Benchmarks don't lie

DataCortex Fast mode vs the best general-purpose compressors. Higher ratio = better. Lossless, byte-exact roundtrip guaranteed.

File                              Size     DataCortex   zstd -19   brotli -11   vs Best
k8s structured logs (100K rows)   9.9 MB   ~40x         18.9x      --           +113%
nginx access logs (100K rows)     9.5 MB   ~28x         17.3x      --           +62%
NDJSON analytics (10K rows)       3.3 MB   27.8x        16.0x      16.4x        +70%
NDJSON events (200 rows)          107 KB   22.0x        15.6x      16.6x        +32%
Twitter API nested JSON           617 KB   19.7x        16.7x      18.9x        +4%
Event tickets (repetitive)        1.7 MB   221.7x       176.0x     190.0x       +17%
Chart: k8s structured logs — compression ratio: DataCortex 40x (+113%) vs zstd -19 at 18.9x (baseline).
Chart: NDJSON analytics (10K rows) — DataCortex 27.8x (+70%) vs zstd -19 at 16.0x and brotli -11 at 16.4x (baselines).

How it works

Four stages, fully automatic. DataCortex understands your data's structure and exploits it.

01 DETECT
Schema Inference
Auto-detects JSON/NDJSON format. Infers column types: integers, timestamps, booleans, UUIDs, enums, strings.
02 REORG
Columnar Layout
Reorganizes row-oriented data into columns. Similar values sit adjacent, creating massive compression opportunity.
03 ENCODE
Typed Encoding
Delta varints for integers, bitmaps for booleans, epoch encoding for timestamps, binary packing for UUIDs.
04 COMPRESS
Auto-Fallback
6 compression paths race in parallel (zstd + brotli at multiple levels). The smallest output wins automatically.
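The four stages can be sketched in miniature with stdlib Python. This is an illustrative toy, not DataCortex's actual implementation: the record fields, the single delta-varint codec, and the two-codec race are all stand-ins.

```python
import json
import lzma
import zlib

# Stages 1-2: infer columns from NDJSON-like rows and pivot to a columnar layout.
# (Toy records -- the field names are stand-ins, not DataCortex internals.)
rows = [{"ts": 1_700_000_000 + i, "level": "info", "msg": f"req {i % 3}"} for i in range(1000)]
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Stage 3: type-specific encoding -- delta varints for the sorted integer column.
def delta_varint(values):
    """LEB128-style varints over successive deltas (assumes nondecreasing values)."""
    out, prev = bytearray(), 0
    for v in values:
        d, prev = v - prev, v
        while True:
            out.append((d & 0x7F) | (0x80 if d > 0x7F else 0))
            d >>= 7
            if not d:
                break
    return bytes(out)

encoded = delta_varint(columns["ts"]) + "\n".join(columns["level"] + columns["msg"]).encode()

# Stage 4: race two codecs and keep the smaller output.
best = min((zlib.compress(encoded, 9), lzma.compress(encoded)), key=len)

# Row-oriented baseline: compress the raw JSON text directly.
row_oriented = zlib.compress(json.dumps(rows).encode(), 9)
print(f"columnar {len(best)} bytes vs row-oriented {len(row_oriented)} bytes")
```

Because similar values sit adjacent after the pivot, the columnar path compresses well below the row-oriented baseline even with this toy encoder.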

Up and running in seconds

Rust CLI, Python library, or build from source. Your choice.

terminal — CLI
# Install
$ cargo install datacortex-cli

# Compress (auto-detects format)
$ datacortex compress data.ndjson
→ data.dcx (27.8x, 0.29 bpb)

# Decompress
$ datacortex decompress data.dcx output.ndjson

# Stream from stdin
$ cat logs.ndjson | datacortex compress - -o out.dcx

# Benchmark vs zstd
$ datacortex bench corpus/ --compare
terminal — Python
# Install
$ pip install datacortex

# Use in Python
import datacortex

# Compress bytes
with open("logs.ndjson", "rb") as f:
    data = f.read()
compressed = datacortex.compress(data)

# Decompress
original = datacortex.decompress(compressed)

# File operations
datacortex.compress_file("in.json", "out.dcx")
datacortex.decompress_file("out.dcx", "restored.json")

Built for production

Everything you need for real-world JSON compression pipelines.

Parallel Compression

6 compression paths race concurrently via Rayon. 247% CPU utilization on multi-core machines.
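The smallest-output-wins race can be sketched with stdlib tooling. A simplified stand-in for the Rayon-based race: the codec lineup below is illustrative, not DataCortex's actual zstd/brotli paths.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative codec lineup -- a stand-in for the real zstd/brotli paths.
PATHS = {
    "zlib-6": lambda d: zlib.compress(d, 6),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "bz2-9": lambda d: bz2.compress(d, 9),
    "lzma": lzma.compress,
}

def race_compress(data: bytes) -> tuple[str, bytes]:
    """Run every path concurrently; the smallest output wins.

    The C codecs release the GIL while compressing, so the threads genuinely
    overlap on multi-core machines (hence >100% CPU utilization).
    """
    with ThreadPoolExecutor(max_workers=len(PATHS)) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in PATHS.items()}
    return min(((n, f.result()) for n, f in futures.items()), key=lambda nb: len(nb[1]))

winner, blob = race_compress(b'{"level":"info","msg":"ok"}\n' * 10_000)
```

Racing costs extra CPU but never hurts the ratio: the losing candidates are simply discarded.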

📡

Streaming Support

stdin/stdout pipes, --chunk-rows for bounded memory on huge NDJSON files. Multi-frame .dcx format.

🐍

Python Bindings

pip install datacortex. 6 functions: compress, decompress, compress_file, decompress_file, info, detect_format.

📖

Custom Dictionaries

Train compression dictionaries from sample data. Reuse across files with similar schemas for even better ratios.
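The general preset-dictionary idea can be sketched with zlib's `zdict` parameter. Illustrative only: DataCortex trains and stores its own dictionaries, and the `shared` sample below is hypothetical.

```python
import zlib

# Hypothetical dictionary seeded with byte strings shared by files of one schema.
# (DataCortex trains its own dictionaries; this shows the mechanism via zdict.)
shared = b'{"ts":,"level":"info","service":"api","msg":"'

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                         zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY, zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    return zlib.decompressobj(zlib.MAX_WBITS, zdict).decompress(blob)

doc = b'{"ts":1700000000,"level":"info","service":"api","msg":"ok"}'
plain = zlib.compress(doc, 9)
primed = compress_with_dict(doc, shared)
# Priming with schema strings shrinks short records the plain codec has no history for.
```

The same dictionary must be supplied on decompression, which is why it pays off only across files that really do share a schema.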

🔒

Byte-Exact Roundtrip

CRC-32 verified. Decompress always produces identical bytes. 381 tests, 36+ E2E scenarios, zero failures.
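The verification idea in miniature, using stdlib `zlib.crc32` (a sketch of the concept, not the actual .dcx frame format):

```python
import zlib

payload = b'{"id":1}\n{"id":2}\n'
checksum = zlib.crc32(payload)            # stored alongside the compressed frame

blob = zlib.compress(payload)             # stand-in for the actual pipeline
restored = zlib.decompress(blob)
assert zlib.crc32(restored) == checksum   # byte-exact roundtrip check
```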

⚙️

Zero Configuration

No schema files, no database, no setup. Auto-infers format, types, and optimal compression strategy.

Stop wasting storage on JSON

One command. Better ratios than general-purpose compressors on JSON. Try it now.

View on GitHub
pip install datacortex