{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Importing/exporting data from/to Scanpy\n",
"\n",
"## Overview\n",
"\n",
"This notebook demonstrates how to use Monet to import/export data from/to [Scanpy](https://scanpy.readthedocs.io/en/stable/).\n",
"\n",
"*Note: This functionality requires Monet >= 0.2.2, please run `pip install 'monet>=0.2.2'` to upgrade if necessary.*\n",
"\n",
"*Note: This assumes that you have [scanpy installed](https://scanpy.readthedocs.io/en/stable/installation.html) (it's not automatically installed with Monet).*\n",
"\n",
"Scanpy represents expression data using `AnnData` objects, which can hold the expression matrix as well as gene/cell annotation data. Please see the [Scanpy manual](https://scanpy.readthedocs.io/en/stable/usage-principles.html) for more details. In contrast, Monet represents expression data using `ExpMatrix` objects, which only contain the expression matrix (including the gene and cell names). The `ExpMatrix` class is a simple wrapper (subclass) of the pandas `DataFrame`, and can be used in identical fashion. Rows of the data frame correspond to genes, and columns correspond to cells."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up notebook"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# change notebook width and font\n",
"from IPython.core.display import HTML, display\n",
"display(HTML(\"\"\"\"\"\"))\n",
"\n",
"from monet import util\n",
"_LOGGER = util.configure_logger()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import data from Scanpy by converting `AnnData` objects to `ExpMatrix` objects\n",
"\n",
"Here, we use the `ExpMatrix.from_anndata()` function to convert an `AnnData` object from Scanpy into an `ExpMatrix` object from Monet."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2020-06-22 11:01:16] (numexpr.utils) INFO: Note: NumExpr detected 12 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
"[2020-06-22 11:01:16] (numexpr.utils) INFO: NumExpr defaulting to 8 threads.\n",
"[2020-06-22 11:01:16] (get_version) INFO: dirname: Trying to get version of get_version from dirname /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: dirname: Failed; Does not match re.compile('get[_-]version-([\\\\d.]+?)(?:\\\\.dev(\\\\d+))?(?:[_+-]([0-9a-zA-Z.]+))?$')\n",
"[2020-06-22 11:01:16] (get_version) INFO: git: Trying to get version from git in directory /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: git: Failed; directory is not managed by git\n",
"[2020-06-22 11:01:16] (get_version) INFO: metadata: Trying to get version for get_version in dir /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: metadata: Succeeded\n",
"[2020-06-22 11:01:16] (get_version) INFO: dirname: Trying to get version of legacy_api_wrap from dirname /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: dirname: Failed; Does not match re.compile('legacy[_-]api[_-]wrap-([\\\\d.]+?)(?:\\\\.dev(\\\\d+))?(?:[_+-]([0-9a-zA-Z.]+))?$')\n",
"[2020-06-22 11:01:16] (get_version) INFO: git: Trying to get version from git in directory /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: git: Failed; directory is not managed by git\n",
"[2020-06-22 11:01:16] (get_version) INFO: metadata: Trying to get version for legacy_api_wrap in dir /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
"[2020-06-22 11:01:16] (get_version) INFO: metadata: Succeeded\n",
"AnnData object with n_obs × n_vars = 2700 × 32738\n",
" var: 'gene_ids'\n"
]
}
],
"source": [
"# first, we load a dataset with Scanpy\n",
"from scanpy import datasets\n",
"\n",
"adata = datasets.pbmc3k()\n",
"print(adata)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/plain": [
"66"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import gc\n",
"from monet import ExpMatrix\n",
"\n",
"matrix = ExpMatrix.from_anndata(adata)\n",
"print(matrix)\n",
"\n",
"# free up memory\n",
"del adata; gc.collect()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export data to Scanpy by converting `ExpMatrix` objects to `AnnData` objects\n",
"\n",
"Here, we use the `ExpMatrix.to_anndata()` function to convert an `ExpMatrix` object from Monet into an `AnnData` object from Scanpy. We're also showing that the exporting/importing cycle accurately preserves the expression data, by comparing the `hash` value of the resulting `ExpMatrix` object to the original `ExpMatrix` object."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AnnData object with n_obs × n_vars = 2700 × 32738\n"
]
}
],
"source": [
"# export data to AnnData object\n",
"adata = matrix.to_anndata()\n",
"print(adata)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original hash: dc9636573cc717aa76f07b07c936457d\n",
"New hash: dc9636573cc717aa76f07b07c936457d\n",
"Identical? True\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# now check accuracy\n",
"original_hash = matrix.hash\n",
"del matrix; gc.collect()\n",
"\n",
"matrix = ExpMatrix.from_anndata(adata)\n",
"new_hash = matrix.hash\n",
"\n",
"print('Original hash:', original_hash)\n",
"print('New hash: ', new_hash)\n",
"print('Identical?', original_hash == new_hash)\n",
"\n",
"# free up memory\n",
"del matrix; gc.collect()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}