arrayloaders.io.create_store_from_h5ads

arrayloaders.io.create_store_from_h5ads(adata_paths, output_path, var_subset=None, chunk_size=4096, shard_size=65536, zarr_compressor=(BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0), ), h5ad_compressor='gzip', buffer_size=1048576, shuffle=True, *, should_denseify=True, output_format='zarr')

Create a Zarr store (or, with output_format='h5ad', an h5ad store) from multiple h5ad files.

Parameters:

adata_paths (Iterable[PathLike[str]] | Iterable[str]) – Paths to the h5ad files used to create the zarr store.
output_path (PathLike[str] | str) – Path to the output store (Zarr or h5ad, depending on output_format).
var_subset (Iterable[str] | None, default: None) – Subset of gene names to include in the store. If None, all genes are included. Genes are subset based on the var_names attribute of the concatenated AnnData object.
chunk_size (int, default: 4096) – Size of the chunks to use for the data in the zarr store.
shard_size (int, default: 65536) – Size of the shards to use for the data in the zarr store.
zarr_compressor (Iterable[BytesBytesCodec], default: (BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0),)) – Compressors used to compress the data in the Zarr store.
h5ad_compressor (Literal['gzip', 'lzf'] | None, default: 'gzip') – Compressor used to compress the data in the h5ad store. See anndata.write_h5ad.
buffer_size (int, default: 1048576) – Number of observations loaded into memory at once for shuffling and pre-processing. A larger value uses more memory but gives better shuffling. This value also corresponds to the size of the shards created.
shuffle (bool, default: True) – Whether to shuffle the data before writing it to the store.
should_denseify (bool, default: True) – Whether to write the data to disk as a dense array.
output_format (Literal['h5ad', 'zarr'], default: 'zarr') – Format of the output store. Can be either “zarr” or “h5ad”.
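The size parameters interact: assuming chunk_size and shard_size map onto Zarr v3 inner chunks and shards, each shard must hold a whole number of chunks, and buffer_size determines how many in-memory passes a dataset needs. The sketch below works through that arithmetic with a hypothetical helper, describe_layout, which is not part of arrayloaders. It treats all three sizes as counts along a single axis, an assumption, since the docstring does not state the unit of chunk_size or shard_size.

```python
import math


def describe_layout(
    n_obs: int,
    chunk_size: int = 4096,
    shard_size: int = 65536,
    buffer_size: int = 1_048_576,
) -> dict:
    """Hypothetical helper (not part of arrayloaders).

    Treats chunk_size, shard_size and buffer_size as counts along one axis;
    the docstring above does not state the unit of the first two.
    """
    # Zarr v3 sharding stores a whole number of inner chunks per shard,
    # so the shard size must divide evenly by the chunk size.
    if shard_size % chunk_size != 0:
        raise ValueError("shard_size must be a multiple of chunk_size")
    return {
        # Inner chunks per shard (16 with the defaults).
        "chunks_per_shard": shard_size // chunk_size,
        # In-memory buffers needed to cover n_obs observations.
        "n_buffers": math.ceil(n_obs / buffer_size),
    }


print(describe_layout(2_500_000))
```

With the defaults, each shard holds 65536 / 4096 = 16 chunks, and a dataset of 2,500,000 observations would be processed in 3 buffers of at most 1,048,576 observations each.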

Examples

>>> from arrayloaders.io.store_creation import create_store_from_h5ads
>>> datasets = [
...     "path/to/first_adata.h5ad",
...     "path/to/second_adata.h5ad",
...     "path/to/third_adata.h5ad",
... ]
>>> create_store_from_h5ads(datasets, "path/to/output/zarr_store")
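The call above uses the defaults shuffle=True and buffer_size=1048576. The buffer_size description says a larger buffer uses more memory but shuffles better. A toy model makes that trade-off concrete: if observations are shuffled only within consecutive buffers, no observation can leave its own buffer, and a buffer at least as large as the dataset gives a full shuffle. This is a hypothetical illustration (the name buffered_shuffle is invented here), not the library's implementation.

```python
import random
from typing import Iterator, Optional, Sequence


def buffered_shuffle(
    indices: Sequence[int], buffer_size: int, seed: Optional[int] = None
) -> Iterator[list]:
    """Hypothetical toy model: shuffle only within consecutive buffers.

    Not the arrayloaders implementation; it only illustrates the
    memory/shuffle-quality trade-off described for buffer_size.
    """
    rng = random.Random(seed)
    for start in range(0, len(indices), buffer_size):
        # At most buffer_size observations are held in memory at a time.
        buffer = list(indices[start:start + buffer_size])
        # Observations are permuted within the buffer but never leave it.
        rng.shuffle(buffer)
        yield buffer


# Small buffers keep memory low but only mix nearby observations;
# a buffer as large as the dataset gives a full shuffle.
for buf in buffered_shuffle(range(10), buffer_size=4, seed=0):
    print(buf)
```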