Data Preparation Functions

Load User Files

Validate meta DataFrame

cellphonedb.src.core.preprocessors.method_preprocessors.meta_preprocessor(meta_raw: DataFrame) → DataFrame

Reformats meta_raw, if necessary, so that the expected columns and index are present.

Parameters:

meta_raw (pd.DataFrame) – A DataFrame containing a mapping between cells and cell types.

Returns:

The meta DataFrame, with the columns and index expected by the analysis methods

Return type:

pd.DataFrame
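
To make the expected shape concrete, here is a minimal pandas-only sketch of this kind of normalisation. It is not the library's implementation: the function name normalize_meta, the column names cell and cell_type, and the case-insensitive column matching are all assumptions made for illustration.

```python
import pandas as pd


def normalize_meta(meta_raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: return a meta table indexed by cell with one cell_type column."""
    meta = meta_raw.copy()
    # Assumption: column names are matched case-insensitively.
    meta.columns = [str(c).lower() for c in meta.columns]
    if {"cell", "cell_type"}.issubset(meta.columns):
        # Move the cell identifiers from a column into the index.
        meta = meta.set_index("cell")[["cell_type"]]
    elif "cell_type" in meta.columns:
        # Assumption: the cell identifiers are already in the index.
        meta = meta[["cell_type"]].rename_axis("cell")
    else:
        raise ValueError("meta needs a 'cell_type' column")
    # Cell identifiers are kept as strings so they can be compared with counts labels.
    meta.index = meta.index.astype(str)
    return meta


meta_raw = pd.DataFrame({"Cell": ["c1", "c2", "c3"], "Cell_Type": ["T", "B", "T"]})
meta = normalize_meta(meta_raw)
print(meta)
```

The result has one row per cell, the cell identifier as the index, and a single cell_type column.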

Validate counts DataFrame

cellphonedb.src.core.preprocessors.counts_preprocessors.counts_preprocessor(counts: DataFrame, meta: DataFrame) → DataFrame

Ensures that counts values are of type float32 and that all cells in meta exist in counts.

Parameters:
  • counts (pd.DataFrame) – Counts data

  • meta (pd.DataFrame) – Meta data (a mapping between cells and cell types)

Returns:

counts DataFrame in which counts values are of type float32 and all cells in meta are present

Return type:

pd.DataFrame
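
The two checks described above can be sketched with pandas and NumPy alone. This is an illustration, not the library's code: check_counts is a hypothetical name, and the layout (cells as the columns of counts and as the index of meta) is an assumption.

```python
import numpy as np
import pandas as pd


def check_counts(counts: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the two checks described above."""
    # Cast every value to float32 (raises if a column is not numeric).
    counts = counts.astype(np.float32)
    # Assumption: cells are the columns of counts and the index of meta.
    missing = meta.index.astype(str).difference(counts.columns.astype(str))
    if len(missing) > 0:
        raise ValueError(f"cells in meta but not in counts: {list(missing)}")
    return counts


counts = pd.DataFrame(
    {"c1": [1, 0], "c2": [3, 2], "c3": [0, 5]},
    index=["GENE_A", "GENE_B"],
)
meta = pd.DataFrame({"cell_type": ["T", "B"]}, index=["c1", "c2"])

checked = check_counts(counts, meta)
print(checked.dtypes.unique())
```

A meta table that names a cell absent from counts makes the sketch raise a ValueError instead of returning a partial result.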

Subsample the counts data

cellphonedb.src.core.utils.subsampler.Subsampler.__init__(self, log: bool, num_pc: int = 100, num_cells: int | None = None, verbose: bool | None = None, debug_seed: int | None = None)

Parameters:
  • log (bool) – If true, each element of counts array will be converted to its natural logarithm before subsampling

  • num_pc (int) – Number of principal components to be used during sub-sampling (for more information see: https://github.com/brianhie/geosketch)

  • num_cells (int, optional) – Number of samples to obtain from counts

  • verbose (bool)

  • debug_seed (int, optional) – Random seed; set it to a fixed integer to obtain the same sub-sampling across different runs

cellphonedb.src.core.utils.subsampler.Subsampler.subsample(self, counts: DataFrame) → DataFrame

Parameters:

counts (pd.DataFrame) – counts DataFrame to be sub-sampled

Returns:

Sub-sampled counts using the parameters passed in __init__ method

Return type:

pd.DataFrame
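
The sketch below illustrates only the input/output contract and the role of a fixed seed. It deliberately swaps the geometric sketching referenced above (geosketch) for plain uniform random sampling, so it is not equivalent to Subsampler. The name uniform_subsample and the assumption that cells are the columns of counts are hypothetical.

```python
import numpy as np
import pandas as pd


def uniform_subsample(counts: pd.DataFrame, num_cells: int, seed=None) -> pd.DataFrame:
    """Hypothetical stand-in: uniform random sampling, NOT geometric sketching."""
    rng = np.random.default_rng(seed)
    # Pick num_cells distinct column positions and keep them in original order.
    picked = np.sort(rng.choice(counts.shape[1], size=num_cells, replace=False))
    # Assumption: cells are columns, so sub-sampling selects columns.
    return counts.iloc[:, picked]


counts = pd.DataFrame(
    np.arange(20, dtype=np.float32).reshape(2, 10),
    index=["GENE_A", "GENE_B"],
    columns=[f"c{i}" for i in range(10)],
)

first = uniform_subsample(counts, num_cells=4, seed=0)
second = uniform_subsample(counts, num_cells=4, seed=0)
print(first.shape)
```

With the same seed the two calls select the same cells, which is the behaviour debug_seed is documented to provide for the real sub-sampler.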