- Introduction
- Migrating from Ruby CSV
- Ruby CSV Pitfalls
- Parsing Strategy
- The Basic Read API
- The Basic Write API
- Batch Processing
- Configuration Options
- Row and Column Separators
- Header Transformations
- Header Validations
- Column Selection
- Data Transformations
- Value Converters
- Bad Row Quarantine
- Instrumentation Hooks
- Examples
- Real-World CSV Files
- SmarterCSV over the Years
- Release Notes
smarter_csv is a Ruby gem for fast & convenient importing and exporting of CSV files. It has intelligent defaults and auto-discovery of column and row separators. Importing returns Rails-ready hashes — suitable for direct use with ActiveRecord, Sidekiq, parallel processing, or S3 workflows. Exporting takes hashes or arrays of hashes and writes properly formatted CSV.
Inconvenient. Ruby's built-in csv library returns arrays of arrays, which means your application code must handle column indexing, header normalization, type conversion, and whitespace stripping manually. It also has no built-in support for chunked or parallel processing of large files.
Hidden failure modes. CSV.read has 10 ways to silently corrupt or lose data — no exception, no warning, no log line. Duplicate headers, blank header cells, extra columns, BOMs, whitespace, inconsistent empty-field representation, runaway quoted fields, and encoding issues all fail silently. See Ruby CSV Pitfalls for reproducible examples and the SmarterCSV fix for each.
Slow. On top of these problems, Ruby's csv library is up to 129× slower than SmarterCSV for equivalent end-to-end work.
SmarterCSV was created to solve exactly these problems: nightly imports of large datasets that needed to be upserted into a database, processed in parallel, and remain robust against real-world variations in input data.
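The first two problems can be reproduced with nothing but Ruby's standard csv library. The sketch below is only an illustration: the sample data and the helper names `normalize_with_stdlib` and `fields_vs_hash_size` are made up for this example. It shows the clean-up code a caller has to write by hand, and how a duplicated header collapses into a single hash key without any error.

```ruby
require 'csv'

# Hypothetical sample input: padded headers and values, plus a duplicated "Email" header.
SAMPLE = "First Name , Age ,Email,Email\n Ann , 31 ,ann@example.com,ann@work.example\n"

# Clean-up that plain CSV.parse leaves to the caller:
# strip whitespace, normalize headers to symbols, convert integer strings.
def normalize_with_stdlib(text)
  header, *records = CSV.parse(text) # array of arrays of strings
  keys = header.map { |h| h.to_s.strip.downcase.gsub(/\s+/, '_').to_sym }
  records.map do |fields|
    values = fields.map do |v|
      v = v.to_s.strip
      v.match?(/\A-?\d+\z/) ? v.to_i : v
    end
    keys.zip(values).to_h # duplicate keys collapse here: the last value wins
  end
end

# With headers: true, CSV::Row#to_h also collapses duplicate headers
# without raising, so the hash has fewer entries than the row has fields.
def fields_vs_hash_size(text)
  row = CSV.parse(text, headers: true)[0]
  [row.fields.size, row.to_h.size]
end

p normalize_with_stdlib(SAMPLE)
p fields_vs_hash_size(SAMPLE) # four fields in the row, three keys in the hash
```

In both helpers one of the two email values disappears without an exception or a warning, which is the kind of silent loss described above.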
- Performance: SmarterCSV's C extension accelerates the full ingestion pipeline (parsing, hash construction, and value conversions), not just tokenization. Real-world benchmarks against `CSV.table` (the closest equivalent) show 7×–129× faster end-to-end throughput.
- Rails-ready output: Each CSV row is returned as a Ruby hash with symbol keys, numeric conversion, and whitespace stripping applied automatically. No post-processing boilerplate is needed: records can be passed directly to `ActiveRecord`, `insert_all`, Sidekiq, message queues, or JSON serializers.
- Intelligent defaults and robustness: SmarterCSV auto-detects row and column separators, handles BOMs, strips extra whitespace, and tolerates common real-world inconsistencies, all without manual configuration. This makes imports robust against data you don't fully control, such as user-uploaded files or third-party exports.
- Flexible header and value transformations: Headers are automatically downcased, symbolized, and normalized. You can remap or drop columns with `key_mapping`, override headers entirely with `user_provided_headers`, and apply per-field value converters for custom type coercion (dates, booleans, currency, etc.).
- Batch and streaming processing: `chunk_size` enables memory-efficient batch processing of arbitrarily large files. Each chunk is an array of hashes ready for `insert_all`, Sidekiq, or other data sinks. The `Reader#each` enumerator includes `Enumerable`, giving you lazy evaluation, `each_slice`, `select`, `map`, and more.
- Bad row quarantine: Malformed rows can be collected or skipped instead of crashing the entire import. `on_bad_row: :collect` lets you inspect and log bad rows after processing completes.
- Header validation: Use `required_keys` to raise an error before any data rows are processed if expected columns are missing. Works with post-transformation key names, so it's safe to combine with `key_mapping`. See Header Validations.
- Instrumentation hooks: `on_start`, `on_chunk`, and `on_complete` callbacks give you visibility into import progress, which is useful for logging, progress bars, and alerting in long-running jobs. See Instrumentation Hooks.
- Resumable imports: The `chunk_index` parameter pairs naturally with Rails 8.1's `ActiveJob::Continuable` for jobs that can pause and resume mid-import without reprocessing already-completed chunks. See Examples.
- CSV writing: `SmarterCSV.generate` writes arrays of hashes to CSV, with support for header renaming and value converters on output. See The Basic Write API.
