arrayloaders.io.create_store_from_h5ads

arrayloaders.io.create_store_from_h5ads(adata_paths, output_path, var_subset=None, chunk_size=4096, shard_size=65536, zarr_compressor=(BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0), ), h5ad_compressor='gzip', buffer_size=1048576, shuffle=True, *, should_denseify=True, output_format='zarr')

Create a store from multiple h5ad files, written as Zarr by default.

Parameters:
  • adata_paths (Iterable[PathLike[str]] | Iterable[str]) – Paths to the h5ad files used to create the zarr store.

  • output_path (PathLike[str] | str) – Path to the output zarr store.

  • var_subset (Iterable[str] | None, default: None) – Subset of gene names to include in the store. If None, all genes are included. Genes are subset based on the var_names attribute of the concatenated AnnData object.

  • chunk_size (int, default: 4096) – Size of the chunks to use for the data in the zarr store.

  • shard_size (int, default: 65536) – Size of the shards to use for the data in the zarr store.

  • zarr_compressor (Iterable[BytesBytesCodec], default: (BloscCodec(typesize=None, cname=<BloscCname.lz4: 'lz4'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0),)) – Compressors to use to compress the data in the zarr store.

  • h5ad_compressor (Literal['gzip', 'lzf'] | None, default: 'gzip') – Compressor to use for the data when the output is written as h5ad. See anndata.write_h5ad.

  • buffer_size (int, default: 1048576) – Number of observations to load into memory at once for shuffling / pre-processing. The higher this number, the more memory is used, but the better the shuffling. This corresponds to the size of the shards created.

  • shuffle (bool, default: True) – Whether to shuffle the data before writing it to the store.

  • should_denseify (bool, default: True) – Whether to write the data in dense form on disk.

  • output_format (Literal['h5ad', 'zarr'], default: 'zarr') – Format of the output store. Can be either “zarr” or “h5ad”.
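
The sizing parameters above can be sanity-checked with some quick arithmetic before a long conversion is started. The sketch below is only illustrative and is not part of arrayloaders: both helper names are hypothetical, the memory figure assumes the buffer is held as a dense float32 array (sparse input may need far less), and the chunk count assumes chunk_size and shard_size are measured in the same units, with Zarr sharding requiring the shard to be an exact multiple of the chunk.

```python
def estimate_dense_buffer_bytes(buffer_size: int, n_vars: int, itemsize: int = 4) -> int:
    """Bytes needed to hold `buffer_size` observations as a dense array.

    `itemsize=4` assumes float32 values; this is an assumption, not
    something the documentation above states.
    """
    return buffer_size * n_vars * itemsize


def chunks_per_shard(chunk_size: int, shard_size: int) -> int:
    """Number of chunks that fit in one shard (must divide evenly)."""
    if shard_size % chunk_size != 0:
        raise ValueError("shard_size must be a multiple of chunk_size")
    return shard_size // chunk_size


if __name__ == "__main__":
    n_vars = 2000  # hypothetical number of genes kept via var_subset
    gib = estimate_dense_buffer_bytes(1_048_576, n_vars) / 2**30
    print(f"default buffer_size with {n_vars} genes: {gib:.2f} GiB if dense")
    print(f"default chunk/shard sizes: {chunks_per_shard(4096, 65_536)} chunks per shard")
```

If the estimate exceeds the available memory, lowering buffer_size or narrowing var_subset are the two levers the parameter list offers, at the cost of shuffle quality in the first case.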

Examples

>>> from arrayloaders.io.store_creation import create_store_from_h5ads
>>> datasets = [
...     "path/to/first_adata.h5ad",
...     "path/to/second_adata.h5ad",
...     "path/to/third_adata.h5ad",
... ]
>>> create_store_from_h5ads(datasets, "path/to/output/zarr_store")
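
A variation with non-default options may also be useful. The sketch below restricts the store to a gene list and writes h5ad output with 'lzf' compression, using only keyword arguments from the signature above. The helper names and the one-gene-per-line file format are assumptions made for illustration, and the call itself has not been run against real data here.

```python
from pathlib import Path


def read_gene_list(path):
    """Return unique, non-empty gene names from a one-per-line text file.

    The file format is a hypothetical convention for this example.
    Order of first appearance is preserved.
    """
    seen = {}
    for line in Path(path).read_text().splitlines():
        name = line.strip()
        if name:
            seen.setdefault(name, None)
    return list(seen)


def write_subset_h5ad_store(adata_paths, output_path, gene_list_path):
    """Write an h5ad-format store containing only the listed genes."""
    # Imported lazily so read_gene_list can be used without arrayloaders installed.
    from arrayloaders.io.store_creation import create_store_from_h5ads

    create_store_from_h5ads(
        adata_paths,
        output_path,
        var_subset=read_gene_list(gene_list_path),
        h5ad_compressor="lzf",
        output_format="h5ad",
    )
```

Because var_subset is matched against var_names of the concatenated AnnData object, the names in the file need to use the same identifiers as the input files.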