File I/O

Introduction

This section is dedicated to all packages that read or write files on your disk. Common use cases include reading and writing data or configurations to disk.

There are many file formats, and they are good for different things. Some formats are human-readable, such as CSV files. Others are compressed to minimize the file size, such as Parquet. Others allow efficient reading and writing, such as Arrow. The package FileIO.jl aims to detect the file format automatically from the file extension, and internally uses many of the packages listed here as appropriate. A list of all supported formats is available in the FileIO.jl documentation, and the list is rather long.

Overview

Because most packages handle different file types, they are not in direct competition, and comparing all of them against each other does not make much sense. It is more meaningful to compare options that operate on the same file type or serve the same purpose. With that said, here is a comparison of all the packages on this page:

Packages

There are three types of subsections under "Packages":

  1. Sections for a single package. Their headings end with ".jl".

  2. Sections for packages that work with a specific file format.

  3. Sections for packages that serve a specific purpose. Currently, the only such section is "Saving Arbitrary Julia Objects".

The table of contents at the top of the page can likely help you find what you are looking for.

FileIO.jl

A meta-package for loading all sorts of files. See the documentation for a complete list of supported file types, and details about which library is used internally. This package is actively developed and well established in the Julia community. The only downside is that it inherits any issues or bugs from the underlying packages, so it is likely not perfect. The upside is that you can use a single package with a well-defined API for any file type.
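As a minimal sketch of the single-API idea, the example below saves and loads a dictionary through FileIO's generic `save` and `load`. It assumes both FileIO.jl and JLD2.jl are installed, since FileIO hands `.jld2` files to JLD2.jl behind the scenes; the file name and dictionary contents are illustrative.

```julia
using FileIO

# Illustrative path in a temporary directory
path = joinpath(mktempdir(), "example.jld2")

# FileIO infers the format from the ".jld2" extension and delegates to JLD2.jl
save(path, Dict("greeting" => "hello", "numbers" => [1, 2, 3]))

# The same generic `load` reads it back as a Dict keyed by the saved names
data = load(path)
println(data["numbers"])
```

Swapping the extension (and having the matching backend installed) is, in principle, all it takes to target a different format with the same two calls.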

Images

Images can be loaded with FileIO.jl. However, there are two other alternatives:

  1. ImageIO.jl - See the README in the GitHub repository for a table with supported formats.

  2. Packages specific to the relevant file format. Such packages are currently not listed on this page.

ImageIO.jl

Saving Arbitrary Julia Objects (Serialization)

It is often useful to save variables stored in your Julia session and to be able to restore them in a new Julia session, for example when one of the variables is the result of a long-running computation. There are several packages that are good for this specific use case. The general recommendation is JLD2.jl.

All options listed in this subsection support saving and loading just about anything you throw at them: numbers, arrays, functions, even user-defined structs. This is generally done by saving a dictionary, where the keys are usually the variable names and the values are the objects being saved:

julia> using JLD2

julia> my_func(x) = x^2
my_func (generic function with 1 method)

julia> save("test.jld2", Dict("my_func" => my_func))

julia> loaded_func = load("test.jld2")["my_func"]; loaded_func(4)
16  # Computed 4^2

See also FileIO.jl, which can also save arbitrary Julia objects by calling the packages listed here internally.

JLD2.jl

At first glance, the difference between JLD2 and JLD comes down to the fact that JLD2 works "without any dependency on the HDF5 C library".

JLD2 allows the user to save all variables in the current module's global scope using the syntax @save filename.
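When only a few variables should be saved, JLD2 also offers the function `jldsave`, which takes the variables as keyword arguments. The sketch below assumes a JLD2 version recent enough to include `jldsave`; the variable names `results` and `label` are hypothetical stand-ins for the output of a long-running computation.

```julia
using JLD2

# Hypothetical results from a long-running computation
results = [0.5, 1.5, 2.5]
label = "run 1"

path = joinpath(mktempdir(), "session.jld2")

# Save only these two variables; the keyword names become the keys in the file
jldsave(path; results, label)

# Load a single variable back by its key
r = load(path, "results")

# Or open the file read-only and index it like a dictionary
l = jldopen(path, "r") do f
    f["label"]
end
```

`jldsave` is used here because it is an ordinary function; the `@save` and `@load` macros are the macro-based way to do the same thing.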

JLD.jl

The original package for saving arbitrary Julia objects. New users should likely prefer JLD2, as this appears to be mostly a legacy package.

JLSO.jl

Julia Serialized Object (JLSO) file format for storing checkpoint data.

julia> using JLSO, Dates

julia> JLSO.save("breakfast.jlso", :food => "☕️🥓🍳", :cost => 11.95, :time => Time(9, 0))

julia> loaded = JLSO.load("breakfast.jlso")
Dict{Symbol,Any} with 3 entries:
  :cost => 11.95
  :food => "☕️🥓🍳"
  :time => 09:00:00

Serde.jl

Serde is a Julia library for (de)serializing data to/from various formats. The library offers a simple and concise API for defining custom (de)serialization behavior for user-defined types.

Inspired by the serde.rs Rust library, it supports (de)serialization of the following data formats: JSON, TOML, XML, YAML, CSV, Query. Support for MsgPack and BSON is planned.

CSV and other delimited files

CSV stands for comma-separated values, and the comma is the most common delimiter in delimited files. Other common options include tab and semicolon. All delimited files are human-readable and use plain-text encoding, which makes them especially easy to write, and to read directly as plain text. The main drawback is that delimited files are neither the fastest nor the smallest option for working with data.

Packages that support CSV often also support other delimiters, and packages that support delimited files in general automatically support a comma delimiter, and therefore CSV files.

The most starred file I/O package of all is CSV.jl. It is well established and tested, and is the generally recommended package for reading delimited files.
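A minimal sketch of writing and reading with CSV.jl, assuming only CSV.jl is installed. A `NamedTuple` of equal-length vectors serves as the table so that no DataFrame package is needed; the file name and column names are illustrative. The second half shows the same reader handling a semicolon delimiter through the `delim` keyword.

```julia
using CSV

path = joinpath(mktempdir(), "data.csv")

# A NamedTuple of equal-length vectors is a valid Tables.jl column table
table = (name = ["a", "b", "c"], value = [1.5, 2.5, 3.5])

# Write a comma-delimited file
CSV.write(path, table)

# Read it back; columns are available as properties of the CSV.File
f = CSV.File(path)
println(f.value)

# Other delimiters work through the `delim` keyword, here a semicolon
CSV.write(path, table; delim = ';')
g = CSV.File(path; delim = ';')
println(g.name)
```

If DataFrames.jl is installed, `CSV.read(path, DataFrame)` materializes the file directly into a DataFrame instead.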

See also FileIO.jl, which can also read delimited files.

CSV.jl

DelimitedFiles.jl

As a former Julia standard library, this package can be expected to meet a certain quality bar. It does one thing and does it well, as evidenced by its number of dependents.
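To illustrate the "does one thing" scope, here is a minimal round trip of a numeric matrix with `writedlm` and `readdlm`; the matrix and file name are illustrative.

```julia
using DelimitedFiles

A = [1 2 3; 4 5 6]
path = joinpath(mktempdir(), "matrix.txt")

# Write the matrix with tabs between columns and newlines between rows
writedlm(path, A, '\t')

# Read it back, passing the delimiter and the element type to parse into
B = readdlm(path, '\t', Int)
```

Passing `','` instead of `'\t'` to both calls produces and reads a CSV file.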

CSVFiles.jl

This is the CSV reader used by FileIO, as it is part of the queryverse organization.

ChunkedCSV.jl

See also TableReader.jl, which has a chunkbits keyword argument, so the two packages may overlap in functionality.

TableReader.jl

While this package has a very impressive release announcement, its latest development was in 2019. CSV.jl seems to have improved considerably since that post, and is generally preferable to this package today.

DLMReader.jl

An efficient multi-threaded package for reading and writing delimited files. It is designed as a file parser for InMemoryDatasets.jl.

ReadWriteDlm2.jl

The ReadWriteDlm2 functions readdlm2(), writedlm2(), readcsv2() and writecsv2() are similar to those of DelimitedFiles.jl, but with additional support for Dates formats, Complex, Rational and Missing types, and special decimal marks. ReadWriteDlm2 supports the Tables.jl interface.

uCSV.jl

A Julia package for reading and writing delimited-text; µ in size, ∞ in flexibility

Arrow.jl

Parquet.jl

MAT.jl

Excel

The main package for working with Excel spreadsheets in Julia is XLSX.jl.

You might also be interested in ClipData to "Copy/paste to/from Excel, Google Sheets, and other tabular data sources into interactive Julia sessions. Interactive Julia sessions include the REPL, Pluto notebooks, Jupyter notebooks, and more!".

XLSX.jl

Excellent package. This is the generally recommended package for reading Excel files.
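A minimal sketch, assuming XLSX.jl is installed, that writes two cells to a new workbook with `XLSX.openxlsx` in write mode and reads them back with `XLSX.readxlsx`. The file name, cell references and values are illustrative, and the sheet is addressed by position (`xf[1]`) rather than by name.

```julia
using XLSX

path = joinpath(mktempdir(), "example.xlsx")

# Create a new workbook and write two cells in its first sheet
XLSX.openxlsx(path, mode = "w") do xf
    sheet = xf[1]
    sheet["A1"] = "answer"
    sheet["B1"] = 42
end

# Read the workbook back and look up the cells by reference
xf = XLSX.readxlsx(path)
sheet = xf[1]
println(sheet["A1"], " = ", sheet["B1"])
```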

ExcelFiles.jl

ExcelFiles is used internally by FileIO.jl, as it is part of the queryverse organization. It can also be used as a standalone package. However, the last tagged version was in 2019, indicating that development is not particularly active (despite the latest commit being more recent).

ExcelReaders.jl

ExcelReaders.jl has "removed support for modern Excel files", and now only supports legacy xls files. It should therefore not be chosen over the alternatives.

Taro.jl

Taro is a utility belt of functions to work with document files in Julia. It uses Apache Tika, Apache POI and Apache FOP (via JavaCall) to work with Word, Excel and PDF files.

JSON

Overview:

  • Use JSON3.jl or JSON.jl for most cases.

    • JSON3.jl has faster implementation.

    • JSON.jl has a long history. If you need to load JSON on old Julia versions (e.g. v1.0), JSON.jl is suitable.

  • Use BSON.jl for Binary JSON.

  • Use JSONRPC.jl for JSON-RPC 2.0.
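For the two general-purpose options in the list above, a minimal round trip looks similar. This sketch assumes both JSON3.jl and JSON.jl are installed; the dictionary contents are illustrative.

```julia
using JSON3, JSON

obj = Dict("name" => "Julia", "versions" => [1, 2])

# JSON3: write to a string, then read back into a JSON3.Object
s3 = JSON3.write(obj)
parsed3 = JSON3.read(s3)
println(parsed3.name)          # object keys are accessible as properties

# JSON.jl: the long-standing json/parse pair
s = JSON.json(obj)
parsed = JSON.parse(s)
println(parsed["versions"])
```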

A quote from the JSON3.jl release announcement helps explain why there are so many JSON packages:

Let’s cut right to the chase and answer the elephant questions in the proverbial discourse room: why do we need another JSON package in Julia? what does it offer distinct from what JSON.jl, JSON2.jl, or LazyJSON.jl offer? why spend time and effort developing something that’s “already solved”? JSON3.jl was born from the spark of three separate ideas, and a vision that they could come together to make the best, most performant, simple, yet powerful JSON integration for Julia possible. It also exists as a way to “prove out” these ideas before trying to potentially upstream improvements into a more canonically named package like JSON.jl. I fully believe the package is ready for full-time use and reliance, but similar to JSON2.jl, it exists as a way to try out a different JSON integration API to potentially make things better, faster, easier.

JSON.jl

JSON2.jl

This package is not maintained. Use JSON3.jl instead.

JSON3.jl

From its README:

Yet another JSON package for Julia; this one is for speed and slick struct mapping
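The "struct mapping" part refers to reading JSON directly into user-defined types. A minimal sketch, assuming JSON3.jl and StructTypes.jl are installed; `Point` is a hypothetical type used only for illustration.

```julia
using JSON3, StructTypes

# A hypothetical user-defined type
struct Point
    x::Float64
    y::Float64
end

# Declare that Point maps to a JSON object; Struct() expects the keys in field order
StructTypes.StructType(::Type{Point}) = StructTypes.Struct()

# Read JSON directly into a Point ...
p = JSON3.read("""{"x": 1.0, "y": 2.5}""", Point)

# ... and write it back out as a JSON string
s = JSON3.write(p)
println(s)
```

If the incoming keys may arrive in any order, StructTypes also provides `StructTypes.UnorderedStruct()` as an alternative declaration.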

JSONBase.jl

quinnj (the creator of JSON3.jl) also provides JSONBase.jl, but it is not registered yet.

LazyJSON.jl

BSON.jl

BSON is an established package. However, based on this discussion, it seems to have some downsides that users should keep in mind. Other, more specialized packages may be better suited for any given task.
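For reference, a minimal round trip in the style of the BSON.jl README, assuming BSON.jl is installed; the keys and values are illustrative.

```julia
using BSON

path = joinpath(mktempdir(), "test.bson")

# Save a Dict with Symbol keys; Julia-specific values such as complex arrays are supported
bson(path, Dict(:a => [1 + 2im, 3 + 4im], :b => "Hello, World!"))

# Load it back as a Dict with Symbol keys
d = BSON.load(path)
println(d[:b])
```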

LightBSON.jl

JSONRPC.jl

From its README:

An implementation for JSON RPC 2.0. See the specification for details.

Currently, only JSON RPC 2.0 is supported. This package can act as both a client & a server.

JDF.jl

JDF is a DataFrames serialization format with the following goals:

  • Fast save and load times

  • Compressed storage on disk

  • Enable disk-based data manipulation (not yet achieved)

  • Supports machine learning workloads, e.g. mini-batch, sampling (not yet achieved)

JDF.jl is the Julia package for all things related to JDF.

Disclaimer

This section could use some love. If you have used or developed Julia packages in this domain, we would love your help! Please visit the "Contributing" section of the repository that hosts this website for information on how to contribute.

This website is a community effort covering a lot of ever-changing information. It will therefore never be complete or without error. If you see something wrong, or have something to contribute, please see the "Contributing" section in the GitHub repository.

Last modified: May 03, 2024. Built with Franklin.jl