arrayloaders.io.create_store_from_h5ads
- arrayloaders.io.create_store_from_h5ads(adata_paths, output_path, var_subset=None, chunk_size=4096, shard_size=65536, zarr_compressor=(BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0), ), h5ad_compressor='gzip', buffer_size=1048576, shuffle=True, *, should_denseify=True, output_format='zarr')
Create a Zarr store from multiple h5ad files.
- Parameters:
  - adata_paths (Iterable[PathLike[str]] | Iterable[str]) – Paths to the h5ad files used to create the zarr store.
  - output_path (PathLike[str] | str) – Path to the output zarr store.
  - var_subset (Iterable[str] | None, default: None) – Subset of gene names to include in the store. If None, all genes are included. Genes are subset based on the var_names attribute of the concatenated AnnData object.
  - chunk_size (int, default: 4096) – Size of the chunks to use for the data in the zarr store.
  - shard_size (int, default: 65536) – Size of the shards to use for the data in the zarr store.
  - zarr_compressor (Iterable[BytesBytesCodec], default: (BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0),)) – Compressors to use to compress the data in the zarr store.
  - h5ad_compressor (Literal['gzip', 'lzf'] | None, default: 'gzip') – Compressor to use to compress the data in the h5ad store. See anndata.write_h5ad.
  - buffer_size (int, default: 1048576) – Number of observations to load into memory at once for shuffling / pre-processing. Higher values use more memory but shuffle the data more thoroughly. This corresponds to the size of the shards created.
  - shuffle (bool, default: True) – Whether to shuffle the data before writing it to the store.
  - should_denseify (bool, default: True) – Whether or not to write as dense on disk.
  - output_format (Literal['h5ad', 'zarr'], default: 'zarr') – Format of the output store. Can be either “zarr” or “h5ad”.
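The relationship between chunk_size and shard_size can be checked with a short sketch. It assumes both values are counted along the same axis and that, as in Zarr v3 sharding generally, a shard must hold a whole number of chunks. The helper name chunks_per_shard is hypothetical and is not part of arrayloaders.

```python
def chunks_per_shard(chunk_size: int = 4096, shard_size: int = 65536) -> int:
    """Return how many chunks fit into one shard.

    Hypothetical helper for illustration only; the defaults mirror the
    documented defaults of create_store_from_h5ads.
    """
    if chunk_size <= 0 or shard_size <= 0:
        raise ValueError("chunk_size and shard_size must be positive")
    # Assumption: a shard must contain a whole number of chunks.
    if shard_size % chunk_size != 0:
        raise ValueError(
            f"shard_size ({shard_size}) must be a multiple of chunk_size ({chunk_size})"
        )
    return shard_size // chunk_size


# With the documented defaults, each shard holds 65536 / 4096 = 16 chunks.
print(chunks_per_shard())
```

Running a check like this before creating a store is a cheap way to catch incompatible sizes early, rather than part-way through a long write.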
Examples
>>> from arrayloaders.io.store_creation import create_store_from_h5ads
>>> datasets = [
...     "path/to/first_adata.h5ad",
...     "path/to/second_adata.h5ad",
...     "path/to/third_adata.h5ad",
... ]
>>> create_store_from_h5ads(datasets, "path/to/output/zarr_store")
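A call with non-default options might look like the following sketch. The parameter names and allowed values come from the signature above; the helper names build_store_kwargs and create_subset_store, the gene names, and the paths are hypothetical placeholders. The library import happens inside the function, so the keyword-building helper can be used without arrayloaders installed.

```python
from pathlib import Path


def build_store_kwargs(genes, output_format="h5ad"):
    """Collect keyword arguments for create_store_from_h5ads.

    Hypothetical helper; the keys are parameter names from the documented signature.
    """
    if output_format not in ("zarr", "h5ad"):
        raise ValueError("output_format must be 'zarr' or 'h5ad'")
    return {
        "var_subset": list(genes),       # keep only these var_names
        "h5ad_compressor": "lzf",        # documented alternatives: 'gzip', 'lzf', None
        "shuffle": True,                 # shuffle before writing (the default)
        "should_denseify": True,         # keyword-only: write as dense on disk
        "output_format": output_format,  # keyword-only: 'zarr' or 'h5ad'
    }


def create_subset_store(datasets, output_path, genes):
    """Write a gene-subset store; requires arrayloaders to be installed."""
    # Imported lazily so build_store_kwargs works without the library.
    from arrayloaders.io.store_creation import create_store_from_h5ads

    create_store_from_h5ads(datasets, output_path, **build_store_kwargs(genes))


# Placeholder inputs; replace with real paths and gene names before calling
# create_subset_store(datasets, "path/to/output/h5ad_store", genes).
datasets = [Path("path/to/first_adata.h5ad"), Path("path/to/second_adata.h5ad")]
genes = ["GENE_A", "GENE_B"]
```

Because should_denseify and output_format follow the bare * in the signature, they must be passed by keyword, which the dictionary unpacking above does.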