Data Preparation Functions
Load User Files
Validate meta DataFrame
- cellphonedb.src.core.preprocessors.method_preprocessors.meta_preprocessor(meta_raw: DataFrame) → DataFrame
Reformats meta_raw where necessary so that the expected columns and index are present
- Parameters:
meta_raw (pd.DataFrame) – A DataFrame containing a mapping between cells and cell types.
- Returns:
meta DataFrame containing columns and indexes as expected by the analysis methods
- Return type:
pd.DataFrame
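The kind of reshaping described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `normalise_meta`, the column names `cell` and `cell_type`, and the case-insensitive matching are all assumptions made for the example.

```python
import pandas as pd


def normalise_meta(meta_raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: index by cell, keep a single cell_type column."""
    meta = meta_raw.copy()
    # Assumption: column names are matched case-insensitively.
    meta.columns = [str(c).lower() for c in meta.columns]
    if "cell_type" not in meta.columns:
        raise ValueError("meta needs a 'cell_type' column")
    if "cell" in meta.columns:
        # Move the cell identifiers from a column into the index.
        meta = meta.set_index("cell")
    meta.index = meta.index.astype(str)
    meta.index.name = "cell"
    return meta[["cell_type"]]


meta_raw = pd.DataFrame({"Cell": ["c1", "c2", "c3"], "cell_type": ["T", "B", "T"]})
meta = normalise_meta(meta_raw)
```

Here `meta` ends up indexed by cell identifier with one `cell_type` column, which is the general shape a cell-to-cell-type mapping would need for lookups by cell.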
Validate counts DataFrame
- cellphonedb.src.core.preprocessors.counts_preprocessors.counts_preprocessor(counts: DataFrame, meta: DataFrame) → DataFrame
Ensure that counts values are of type float32, and that all cells in meta exist in counts
- Parameters:
counts (pd.DataFrame) – Counts data
meta (pd.DataFrame) – Meta data (a mapping between cells and cell types)
- Returns:
counts DataFrame in which counts values are of type float32 and all cells in meta are present
- Return type:
pd.DataFrame
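The two documented guarantees (float32 values, and every cell in meta present in counts) can be sketched as follows. The helper name `check_counts` is hypothetical, the layout (cells as columns of counts and as the index of meta) is an assumption, and raising `ValueError` is a choice made for this example; this section does not state which error the library raises.

```python
import numpy as np
import pandas as pd


def check_counts(counts: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the two documented guarantees."""
    # Guarantee 1: every value is stored as float32.
    counts = counts.astype(np.float32)
    # Guarantee 2: every cell listed in meta has a column in counts.
    # Assumption: cells are the columns of counts and the index of meta.
    missing = meta.index.difference(counts.columns)
    if len(missing) > 0:
        raise ValueError(f"cells in meta but not in counts: {list(missing)}")
    return counts


counts = pd.DataFrame({"c1": [1, 0], "c2": [3, 2]}, index=["GENE_A", "GENE_B"])
meta = pd.DataFrame({"cell_type": ["T", "B"]}, index=["c1", "c2"])
checked = check_counts(counts, meta)
```

Passing a counts table that lacks one of the cells in `meta`, for example `counts[["c1"]]`, makes this sketch raise instead of silently continuing.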
Subsample the counts data
- cellphonedb.src.core.utils.subsampler.Subsampler.__init__(self, log: bool, num_pc: int = 100, num_cells: int | None = None, verbose: bool | None = None, debug_seed: int | None = None)
- Parameters:
log (bool) – If True, each element of the counts array is converted to its natural logarithm before sub-sampling
num_pc (int) – Number of principal components to be used during sub-sampling (for more information see: https://github.com/brianhie/geosketch)
num_cells (int) – Number of samples to obtain from counts
verbose (bool)
debug_seed (int, optional) – Set to a fixed integer to obtain the same sub-sampling across different runs
- cellphonedb.src.core.utils.subsampler.Subsampler.subsample(self, counts: DataFrame) DataFrame
- Parameters:
counts (pd.DataFrame) – counts DataFrame to be sub-sampled
- Returns:
Counts sub-sampled using the parameters passed to the __init__ method
- Return type:
pd.DataFrame
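The roles of num_cells and debug_seed can be illustrated with a much simpler stand-in. The sketch below uses uniform random sampling of columns rather than the PCA-based geometric sketching that geosketch performs, so it only shows how a fixed seed makes the selection repeatable. The helper name `sample_cells` and the cells-as-columns layout are assumptions.

```python
import numpy as np
import pandas as pd


def sample_cells(counts: pd.DataFrame, num_cells: int, seed=None) -> pd.DataFrame:
    """Hypothetical stand-in: uniform sampling of cell columns, not geosketch."""
    rng = np.random.default_rng(seed)
    n = min(num_cells, counts.shape[1])
    # Pick n distinct column positions; sort to keep the original cell order.
    picked = np.sort(rng.choice(counts.shape[1], size=n, replace=False))
    return counts.iloc[:, picked]


counts = pd.DataFrame(
    np.arange(12, dtype=np.float32).reshape(2, 6),
    index=["GENE_A", "GENE_B"],
    columns=[f"c{i}" for i in range(6)],
)
first = sample_cells(counts, num_cells=3, seed=0)
second = sample_cells(counts, num_cells=3, seed=0)
```

With the same seed, `first` and `second` contain the same three cells; leaving the seed as None would let the selection vary between runs.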