{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Reading and Writing Data Files with Python" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "To plot or fit data with Python, you first have to get the data into the program. If a program makes calculations using data, it can also be useful to write the results to a file." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## 1. Reading Data Files" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "In Python, it is often useful for data to be in arrays. Data can be entered directly into a program using the **`array`** function from the numpy library. For instance, the following lines assign arrays of numbers to `x`, `y`, and `yerr`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "from numpy import *\n", "\n", "x = array([0.0, 2.0, 4.0, 6.0, 8.0])\n", "y = array([1.1, 1.9, 3.2, 4.0, 5.9])\n", "yerr = array([0.1, 0.2, 0.1, 0.3, 0.3])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "However, this is not a good way to handle large data sets. It is better to store the data in a separate file and have the program read that file. You could use a text editor (*IDLE* works well, or you can create and edit a file in *CoCalc*) to enter the data above in the form shown below. The values of `x`, `y`, and `yerr` (the uncertainty in `y`) for a single data point are entered on the same line, separated by spaces or tabs.\n", "
    \n", "0.0 1.1 0.1
    \n", "2.0 1.9 0.2
    \n", "4.0 3.2 0.1
    \n", "6.0 4.0 0.3
    \n", "8.0 5.9 0.3\n", "
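If you would rather not type the file by hand, the sketch below shows one way to create it from Python instead (an assumption about your workflow, not a required step). It writes the same five rows to the file name used in the next paragraph, using the **`savetxt`** and **`column_stack`** functions described in Section 2; `fmt='%.1f'` is an illustrative choice that reproduces the one-decimal layout shown above.

```python
# Sketch (assumes numpy is installed): write the five data rows shown above
# to input.dat instead of typing them into a text editor.
from numpy import array, column_stack, savetxt

x = array([0.0, 2.0, 4.0, 6.0, 8.0])
y = array([1.1, 1.9, 3.2, 4.0, 5.9])
yerr = array([0.1, 0.2, 0.1, 0.3, 0.3])

# column_stack places x, y and yerr side by side as three columns, so each
# row holds one data point; fmt='%.1f' writes one digit after the decimal point.
# Careful: this overwrites input.dat if it already exists.
savetxt('input.dat', column_stack((x, y, yerr)), fmt='%.1f')
```

Each line of the resulting file holds one data point (the first line is `0.0 1.1 0.1`), with the columns separated by single spaces, which is the **`savetxt`** default.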
\n", "\n", "Suppose that the file is saved as plain text and given the name “`input.dat`”. The **`loadtxt`** function\n", "from the numpy library can be used to read data from the text file. The following example shows how to read the data into an array called `DataIn`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.1 0.1]\n", " [ 2. 1.9 0.2]\n", " [ 4. 3.2 0.1]\n", " [ 6. 4. 0.3]\n", " [ 8. 5.9 0.3]]\n" ] } ], "source": [ "from numpy import *\n", "DataIn = loadtxt('input.dat')\n", "print(DataIn)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Notice that `DataIn` is a single 2-dimensional array, rather than three 1-dimensional arrays. \n", "\n", "If you add a line that starts with a number sign (`#`) to the data file, it will be ignored as a comment when the file is read. (Blank lines are also ignored.) It is a good idea to put explanatory comments at the\n", "beginning of data files because you will quickly forget what the numbers mean. Giving\n", "files descriptive names and keeping good notes about them are also helpful." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "# This line is a comment, even in a data file" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "In most cases (plotting, for example), each variable should be in a separate 1-dimensional\n", "array. Setting the **`unpack`** argument to **`True`** and providing a variable for each column\n", "accomplishes this." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 4. 6. 8.]\n", "[ 1.1 1.9 3.2 4. 5.9]\n", "[ 0.1 0.2 0.1 0.3 0.3]\n" ] } ], "source": [ "from numpy import *\n", "x, y, yerr = loadtxt('input.dat', unpack=True)\n", "print(x)\n", "print(y)\n", "print(yerr)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "If you want to read in only some columns, you can use the **`usecols`** argument to specify\n", "which ones. Indices in Python start from zero, not one. The line below will read only the\n", "first and second columns of data, so only two variable names are provided. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "x, y = loadtxt('input.dat', unpack=True, usecols=[0,1])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Sometimes you will get a file with data separated by commas, instead of spaces. For example, suppose that the file “`input2.dat`” contains the following time and voltage data from a pressure sensor. \n", "
    \n", "0.0, 1.1
    \n", "2.0, 1.9
    \n", "4.0, 3.2
    \n", "6.0, 4.0
    \n", "8.0, 5.9\n", "
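It can help to see why an extra argument is needed. The sketch below is illustrative and assumes you create `input2.dat` from Python (with **`savetxt`** and **`column_stack`**, described in Section 2) rather than by hand; it writes the same values that the next code cell prints, then reads the file with the default settings. Because the default is to split each line on whitespace, a field such as `0.0,` (a number with a trailing comma) cannot be converted to a float, and **`loadtxt`** raises a `ValueError`. The exact error message depends on your numpy version.

```python
# Sketch (assumes numpy is installed): recreate the comma-separated time and
# voltage data as input2.dat, then show that default whitespace splitting fails.
from numpy import array, column_stack, loadtxt, savetxt

t = array([0.0, 2.0, 4.0, 6.0, 8.0])
v = array([1.1, 1.9, 3.2, 4.0, 5.9])

# delimiter=', ' reproduces the comma-plus-space layout of the listing.
# Careful: this overwrites input2.dat if it already exists.
savetxt('input2.dat', column_stack((t, v)), fmt='%.1f', delimiter=', ')

try:
    # No delimiter given, so each line is split on whitespace and the
    # first field is the string '0.0,' with its trailing comma.
    loadtxt('input2.dat')
except ValueError as err:
    print('Reading without a delimiter failed: ' + str(err))
```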
\n", "\n", "The **`delimiter`** argument can be used to make the **`loadtxt`** function recognize commas as the separators. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 2. 4. 6. 8.]\n", "[ 1.1 1.9 3.2 4. 5.9]\n" ] } ], "source": [ "t, v = loadtxt('input2.dat', delimiter=',', unpack=True)\n", "print(t)\n", "print(v)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## 2. Writing Data Files" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The **`savetxt`** function from the numpy library can be used to write data to a text file.\n", "Suppose that you’ve read two columns of data into the arrays `t` for time and `v` for the\n", "voltage from a pressure sensor. Also, suppose that the manual for the sensor gives the\n", "following equation to find the pressure in atmospheres from the voltage reading." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "p = 0.15 + v/10.0" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Recall that this single Python command will calculate an array `p` with the same length as\n", "the array `v`. Once you’ve calculated the pressures, you might want to write the times and\n", "pressures to a text file for later use. The following command will write `t` and `p` to the file\n", "“`output.dat`”. The file will be saved in the current working directory, which is normally the directory that contains the program or notebook. **If you give\n", "the name of an existing file, it will be overwritten, so be careful!** " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', (t,p))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Unfortunately, each of the arrays will appear in a different row, which is inconvenient for large data sets. The **`column_stack`** function can be used to put each array into a separate\n", "column. Its argument should be a sequence of arrays, such as a tuple (the inner pair of parentheses makes it a tuple),\n", "in the order that you want them to appear. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', column_stack((t,p)))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "The default is to write the data out separated by spaces, but you can use the optional **`delimiter`** argument to specify something else. For example, the following writes comma-separated data." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', column_stack((t,p)), delimiter=',')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "By default, the numbers will be written in scientific notation. The **`fmt`** argument can be\n", "used to specify the formatting. If one format is supplied, it will be used for all of the\n", "numbers. The form of the formatting string is “`%(width).(precision)(specifier)`”, where `width` specifies the minimum number of characters written (shorter values are padded with spaces), `precision` specifies the number of digits after the decimal point, and the possibilities for `specifier` are shown below. Both `width` and `precision` are optional; for integer formatting, the precision is normally left out.\n", "\n", "|Specifier|Meaning|Example Format|Output for -34.5678|\n", "|-|-|-|-|\n", "|i|signed integer|%5i|-34|\n", "|e|scientific notation|%5.4e|-3.4568e+01|\n", "|f|floating point|%5.2f|-34.57|\n", "\n", "A format can also be provided for each column (two in this case) as follows." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', column_stack((t,p)), fmt=('%3i', '%4.3f'))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "It is a good idea to add comments at the top of data files that you create to remind you\n", "of what they contain. The optional **`header`** argument writes a string at the top of the text file, and the **`comments`** argument sets the string that precedes the header text (the default is `'# '`). If you want the header to be treated as a comment\n", "when the file is read by the **`loadtxt`** function, the `comments` string should start with a number sign (`#`). An example is shown below." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', column_stack((t,p)), comments='# ', header='t (s) p (atm)')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "If you want a multiple-line header, you can include “`\\n`” to force a newline." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "savetxt('output.dat', column_stack((t,p)), comments='# ', header='First line\\nSecond line')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Remember to be very careful about overwriting existing files with the `savetxt` function!**" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Additional Documentation" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "More information is available at
\n", "http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
\n", "http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
\n", "http://docs.scipy.org/doc/numpy/reference/generated/numpy.column_stack.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2 (SageMath)", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.14" } }, "nbformat": 4, "nbformat_minor": 0 }