{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing/exporting data from/to Scanpy\n",
    "\n",
    "## Overview\n",
    "\n",
    "This notebook demonstrates how to use Monet to import/export data from/to [Scanpy](https://scanpy.readthedocs.io/en/stable/).\n",
    "\n",
    "*Note: This functionality requires Monet >= 0.2.2, please run `pip install 'monet>=0.2.2'` to upgrade if necessary.*\n",
    "\n",
    "*Note: This assumes that you have [scanpy installed](https://scanpy.readthedocs.io/en/stable/installation.html) (it's not automatically installed with Monet).*\n",
    "\n",
    "Scanpy represents expression data using `AnnData` objects, which can hold the expression matrix as well as gene/cell annotation data. Please see the [Scanpy manual](https://scanpy.readthedocs.io/en/stable/usage-principles.html) for more details. In contrast, Monet represents expression data using `ExpMatrix` objects, which only contain the expression matrix (including the gene and cell names). The `ExpMatrix` class is a simple wrapper (subclass) of the pandas `DataFrame`, and can be used in identical fashion. Rows of the data frame correspond to genes, and columns correspond to cells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    /* source: http://stackoverflow.com/a/24207353 */\n",
       "    /* .container { width:95% !important; }    */\n",
       "    div.prompt, div.CodeMirror, div.output_area { font-family:'Hack', monospace; font-size: 10.5pt; }\n",
       "    </style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# change notebook width and font\n",
    "from IPython.core.display import HTML, display\n",
    "display(HTML(\"\"\"<style>\n",
    "    /* source: http://stackoverflow.com/a/24207353 */\n",
    "    /* .container { width:95% !important; }    */\n",
    "    div.prompt, div.CodeMirror, div.output_area { font-family:'Hack', monospace; font-size: 10.5pt; }\n",
    "    </style>\"\"\"))\n",
    "\n",
    "from monet import util\n",
    "_LOGGER = util.configure_logger()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import data from Scanpy by converting `AnnData` objects to `ExpMatrix` objects\n",
    "\n",
    "Here, we use the `ExpMatrix.from_anndata()` function to convert an `AnnData` object from Scanpy into an `ExpMatrix` object from Monet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2020-06-22 11:01:16] (numexpr.utils) INFO: Note: NumExpr detected 12 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "[2020-06-22 11:01:16] (numexpr.utils) INFO: NumExpr defaulting to 8 threads.\n",
      "[2020-06-22 11:01:16] (get_version) INFO: dirname: Trying to get version of get_version from dirname /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: dirname: Failed; Does not match re.compile('get[_-]version-([\\\\d.]+?)(?:\\\\.dev(\\\\d+))?(?:[_+-]([0-9a-zA-Z.]+))?$')\n",
      "[2020-06-22 11:01:16] (get_version) INFO: git: Trying to get version from git in directory /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: git: Failed; directory is not managed by git\n",
      "[2020-06-22 11:01:16] (get_version) INFO: metadata: Trying to get version for get_version in dir /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: metadata: Succeeded\n",
      "[2020-06-22 11:01:16] (get_version) INFO: dirname: Trying to get version of legacy_api_wrap from dirname /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: dirname: Failed; Does not match re.compile('legacy[_-]api[_-]wrap-([\\\\d.]+?)(?:\\\\.dev(\\\\d+))?(?:[_+-]([0-9a-zA-Z.]+))?$')\n",
      "[2020-06-22 11:01:16] (get_version) INFO: git: Trying to get version from git in directory /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: git: Failed; directory is not managed by git\n",
      "[2020-06-22 11:01:16] (get_version) INFO: metadata: Trying to get version for legacy_api_wrap in dir /home/flo/miniconda3/envs/scanpy/lib/python3.8/site-packages\n",
      "[2020-06-22 11:01:16] (get_version) INFO: metadata: Succeeded\n",
      "AnnData object with n_obs × n_vars = 2700 × 32738\n",
      "    var: 'gene_ids'\n"
     ]
    }
   ],
   "source": [
    "# first, we load a dataset with Scanpy\n",
    "from scanpy import datasets\n",
    "\n",
    "adata = datasets.pbmc3k()\n",
    "print(adata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<ExpMatrix instance with 2700 cells and 32738 genes>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "66"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import gc\n",
    "from monet import ExpMatrix\n",
    "\n",
    "matrix = ExpMatrix.from_anndata(adata)\n",
    "print(matrix)\n",
    "\n",
    "# free up memory\n",
    "del adata; gc.collect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export data to Scanpy by converting `ExpMatrix` objects to `AnnData` objects\n",
    "\n",
    "Here, we use the `ExpMatrix.to_anndata()` function to convert an `ExpMatrix` object from Monet into an `AnnData` object from Scanpy. We're also showing that the exporting/importing cycle accurately preserves the expression data, by comparing the `hash` value of the resulting `ExpMatrix` object to the original `ExpMatrix` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AnnData object with n_obs × n_vars = 2700 × 32738\n"
     ]
    }
   ],
   "source": [
    "# export data to AnnData object\n",
    "adata = matrix.to_anndata()\n",
    "print(adata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original hash: dc9636573cc717aa76f07b07c936457d\n",
      "New hash:      dc9636573cc717aa76f07b07c936457d\n",
      "Identical? True\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# now check accuracy\n",
    "original_hash = matrix.hash\n",
    "del matrix; gc.collect()\n",
    "\n",
    "matrix = ExpMatrix.from_anndata(adata)\n",
    "new_hash = matrix.hash\n",
    "\n",
    "print('Original hash:', original_hash)\n",
    "print('New hash:     ', new_hash)\n",
    "print('Identical?', original_hash == new_hash)\n",
    "\n",
    "# free up memory\n",
    "del matrix; gc.collect()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}