arrayloaders.io.create_store_from_h5ads
- arrayloaders.io.create_store_from_h5ads(adata_paths, output_path, var_subset=None, chunk_size=4096, shard_size=65536, zarr_compressor=(BloscCodec(typesize=None, cname='lz4', clevel=3, shuffle='shuffle', blocksize=0),), h5ad_compressor='gzip', buffer_size=1048576, shuffle=True, *, should_denseify=True, output_format='zarr')
Create a Zarr store from multiple h5ad files.
- Parameters:
  - adata_paths (Iterable[PathLike[str]] | Iterable[str]) – Paths to the h5ad files used to create the zarr store.
  - output_path (PathLike[str] | str) – Path to the output zarr store.
  - var_subset (Iterable[str] | None, default: None) – Subset of gene names to include in the store. If None, all genes are included. Genes are subset based on the var_names attribute of the concatenated AnnData object.
  - chunk_size (int, default: 4096) – Size of the chunks used for the data in the zarr store.
  - shard_size (int, default: 65536) – Size of the shards used for the data in the zarr store.
  - zarr_compressor (Iterable[BytesBytesCodec], default: (BloscCodec(typesize=None, cname='lz4', clevel=3, shuffle='shuffle', blocksize=0),)) – Compressors used to compress the data in the zarr store.
  - h5ad_compressor (Literal['gzip', 'lzf'] | None, default: 'gzip') – Compressor used to compress the data in the h5ad store. See anndata.write_h5ad.
  - buffer_size (int, default: 1048576) – Number of observations to load into memory at once for shuffling / pre-processing. The higher this number, the more memory is used, but the better the shuffling. This corresponds to the size of the shards created.
  - shuffle (bool, default: True) – Whether to shuffle the data before writing it to the store.
  - should_denseify (bool, default: True) – Whether to write the data as dense on disk.
  - output_format (Literal['h5ad', 'zarr'], default: 'zarr') – Format of the output store. Can be either "zarr" or "h5ad".
Examples
>>> from arrayloaders.io.store_creation import create_store_from_h5ads
>>> datasets = [
...     "path/to/first_adata.h5ad",
...     "path/to/second_adata.h5ad",
...     "path/to/third_adata.h5ad",
... ]
>>> create_store_from_h5ads(datasets, "path/to/output/zarr_store")
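A minimal sketch of a call that overrides the documented defaults is shown below. The file paths, gene IDs, and tuning values are illustrative assumptions rather than recommendations, and the custom compressor simply swaps the default BloscCodec settings for zstd via zarr's codec API.

>>> from zarr.codecs import BloscCodec
>>> from arrayloaders.io.store_creation import create_store_from_h5ads
>>> genes = ["ENSG00000139618", "ENSG00000141510"]  # hypothetical var_names subset
>>> create_store_from_h5ads(
...     ["path/to/first_adata.h5ad", "path/to/second_adata.h5ad"],
...     "path/to/output/zarr_store",
...     var_subset=genes,                # keep only these genes in the store
...     chunk_size=2048,                 # smaller chunks in the zarr store (illustrative)
...     shard_size=32768,                # shard size in the zarr store (illustrative)
...     zarr_compressor=(BloscCodec(cname="zstd", clevel=5),),  # assumed zstd codec choice
...     shuffle=True,                    # shuffle observations before writing
...     should_denseify=True,            # write dense arrays on disk
...     output_format="zarr",            # could also be "h5ad"
... )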