Python/Miscellaneous Snippets and Tips

From PHASTA Wiki
Revision as of 08:27, 10 August 2021 by Jrwrigh (talk | contribs) (Add Matlab migration section)

Here are miscellaneous tips and tricks when working with Python files.

Migrating from MATLAB

NumPy (the de facto numerical array library in Python) has a handy guide for migrating from MATLAB to NumPy.

Here are some tips for general Python:

Indexing

Python indices start at 0 instead of 1.

This has several cascading effects in the language, such as:

>>> test = list(range(3))
>>> print(test) # Doesn't include 3, but creates 3 integers
[0, 1, 2]
>>> test[0:2] # Excludes the last index, so only 2 integers are returned
[0, 1]
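As a minimal sketch of how zero-based, end-exclusive indexing maps onto common MATLAB idioms (the list name values is purely illustrative):

```python
values = [10, 20, 30, 40]

first = values[0]        # MATLAB: values(1)
last = values[-1]        # MATLAB: values(end)
first_two = values[:2]   # MATLAB: values(1:2); the stop index 2 is excluded
rest = values[2:]        # MATLAB: values(3:end)

# Because the stop index is excluded, a slice has length stop - start,
# and adjacent slices split a list with no overlap.
assert len(values[1:3]) == 3 - 1
assert first_two + rest == values

print(first, last, first_two, rest)
```

One practical consequence is that values[:k] and values[k:] always partition the list, with no +1 or -1 bookkeeping.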

[...] vs (...)

Getting a value from a Python object is done using [...]. (...) is used almost exclusively for declaring arguments to a function (like range(3)).

The primary exception is when declaring tuples, which are essentially immutable (not changeable) lists in Python:

>>> test = (0, 1)
>>> test[0]
0
>>> test[0] = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
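The three roles of brackets and parentheses can be seen side by side in a short sketch. The function scale and the variable names are hypothetical, chosen only for illustration:

```python
def scale(values, factor):
    """Return a new list with every entry multiplied by factor."""
    return [factor * v for v in values]

point = (1.0, 2.0)          # (...) around comma-separated values: builds a tuple
doubled = scale(point, 2)   # (...) after a name: calls the function
x = doubled[0]              # [...]: reads an item

as_list = list(point)       # copy the tuple into a mutable list
as_list[0] = 3.0            # item assignment works on lists

print(doubled, x, as_list, point)
```

If a tuple's contents need to change, converting it with list(...) as above is one straightforward option; the original tuple is left untouched.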

Write Data to Text File (e.g. CSV)

Given data in some Python object (most likely a NumPy-derived array, but possibly just a normal Python list), how do you write it out to a file? Use numpy.savetxt (or more likely np.savetxt, given the conventional import numpy as np).

Example: Given an array, A, of shape (n, m), simply use

np.savetxt('path/file.dat', A)

which creates a file with n rows and m columns.

Numpy's documentation has information on other useful arguments to change numerical formats, separators, and adding headers to the file.
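As a rough sketch of those extra arguments, the following writes a small CSV with a header line and reads it back. The array contents, the header text, and the temporary file path are assumptions made for the example:

```python
import os
import tempfile

import numpy as np

A = np.array([[1.0, 2.5], [3.0, 4.25]])
path = os.path.join(tempfile.mkdtemp(), "file.csv")  # throwaway location

# fmt sets the number format, delimiter the column separator, and header the
# first line. comments="" stops savetxt from prefixing the header with "# ".
np.savetxt(path, A, fmt="%.3e", delimiter=",", header="x,y", comments="")

with open(path) as f:
    text = f.read()
print(text)

# Read the file back, skipping the header line.
B = np.loadtxt(path, delimiter=",", skiprows=1)
```

Note that fmt="%.3e" keeps only four significant digits, so choose a format with enough precision if the file must round-trip exactly.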

Write multiple 1D arrays as columns

To do this, use numpy.column_stack to create an array with the columns "stacked" together.

Example: Given two 1D arrays, a and b, of the same size, use:

np.savetxt('path/file.dat', np.column_stack((a,b)) )

Two things to note here:

  1. np.column_stack takes a single list or tuple of arrays as its argument, hence the doubled parentheses ((...)).
  2. np.column_stack creates an entirely new array and copies the given data into it. As such, it doubles the total amount of memory used: once for the original 1D arrays, and again for the brand new array storing a copy of the original data.
    • If the data format is flexible, consider writing in rows instead of columns, as it is faster (~20%) and skips the explicit np.column_stack call.
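A minimal sketch of the column layout and of reading it back. The array values and the temporary path are assumptions for the example:

```python
import os
import tempfile

import numpy as np

a = np.arange(3)   # [0, 1, 2]
b = a ** 2         # [0, 1, 4]
path = os.path.join(tempfile.mkdtemp(), "file.dat")  # throwaway location

stacked = np.column_stack((a, b))   # new array of shape (3, 2)
copied = not np.shares_memory(stacked, a)  # True: the data was copied

np.savetxt(path, stacked, fmt="%d")  # one line per element pair

# unpack=True transposes the result so each column comes back as its own array.
a_back, b_back = np.loadtxt(path, unpack=True)
```

np.loadtxt returns floating-point arrays by default, so a_back holds 0.0, 1.0, 2.0 rather than integers.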

Write multiple 1D arrays as rows

np.savetxt also accepts any 2D array-like input. This means you can pass a list/tuple of equal-length arrays and it will write each array as a row.

Example: Given two 1D arrays, a and b, of the same size, use:

np.savetxt('path/file.dat', (a,b) )

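Note we do not need to invoke np.column_stack ourselves. Be aware that np.savetxt still converts the tuple to a single 2D array internally, so the data is copied once during the call; the saving is in the simpler call and in formatting a few long rows rather than many short ones.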
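A short sketch of the row layout, including how the file reads back. The array values and the temporary path are assumptions for the example:

```python
import os
import tempfile

import numpy as np

a = np.linspace(0.0, 1.0, 5)
b = a ** 2
path = os.path.join(tempfile.mkdtemp(), "file.dat")  # throwaway location

np.savetxt(path, (a, b))   # writes 2 lines with 5 values each

rows = np.loadtxt(path)    # shape (2, 5): one row per original array
a_back, b_back = rows      # unpacking along the first axis recovers a and b

with open(path) as f:
    n_lines = len(f.read().splitlines())
```

With this layout no unpack=True is needed when loading, since each original array is already a row of the result.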

Running Python Files in Terminal

There are a few ways to run Python files (and Python code more generally): via a Unix shebang, with the python interpreter, inside an IPython instance, or through an IDE like Spyder. We'll go over how to run a script, script.py, in the terminal:

~$ cat script.py
print('Hello World')

Note: I'll be assuming usage of Python 3.X. If using Python 2.7, use python instead of python3.

Unix Shebang

Using a Unix shebang, you can execute the script directly in a Linux terminal (if the file has execute permission). This is done by adding #!/usr/bin/env python3 as the first line of the file. This line is not interpreted by Python (# is the comment character), but by the (Linux) kernel, which uses the executable specified in the line to run the file. Then you can execute the file directly.

~$ cat script.py
#!/usr/bin/env python3
print('Hello World')

~$ ./script.py
Hello World

Adding a shebang does not prevent you from running the script with any of the other methods listed below; Python will simply ignore the line since it is a comment.
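The execute permission mentioned above is normally set from the shell with chmod. To stay in Python, a rough equivalent can be sketched with the standard library; the temporary path is an assumption for the example:

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "script.py")  # throwaway location
with open(path, "w") as f:
    f.write("#!/usr/bin/env python3\nprint('Hello World')\n")

# Add the owner-execute bit to the existing mode,
# the same effect as running `chmod u+x script.py` in the shell.
old_mode = os.stat(path).st_mode
os.chmod(path, old_mode | stat.S_IXUSR)

new_mode = os.stat(path).st_mode
print(oct(stat.S_IMODE(new_mode)))
```

Without this bit set, running ./script.py fails with a "Permission denied" error even when the shebang line is correct.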

Default Python Interpreter

You can execute script.py by passing the script as an argument to the python interpreter executable, much like you can do with bash, zsh, or perl:

~$ python3 script.py
Hello World

IPython Console

The IPython Console is a very powerful, interactive Python shell (i.e. console or terminal) that is built into many other applications, such as Spyder and JupyterLab. It offers a host of useful features, like tab completion, system shell commands (cd, ls, etc.), a debugging shell, and special "magic" commands. One of those magic commands is %run.

%run allows you to run a script and then interact with the variables that are created in the script, similarly to MATLAB's console. This is especially useful when debugging a script; if your script raises an error and stops, you can inspect the variable states right when the error occurred.

Note: IPython is not included in default Python installations. It is included in Anaconda installations and can also be installed quite easily via pip (pip install ipython) or conda (conda install ipython).

~$ cat script.py
#!/usr/bin/env python3
print('Hello World')
test = 1

~$ ipython
Python 3.9.6 (default, Jun 30 2021, 10:22:16) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %run script.py
Hello World

In [2]: test
Out[2]: 1

Note this can also be done with Python's default console, but it's a bit clunkier:

~$ python3
Python 3.9.6 (default, Jun 30 2021, 10:22:16) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exec(open('script.py').read())
Hello World
>>> test
1

However, Python's built-in console does not have the nice features of IPython, so IPython is generally preferred.
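When IPython is not available, the standard library's runpy module is a possible alternative to the exec(open(...).read()) idiom. The sketch below writes a stand-in script.py to a temporary directory (an assumption so the example is self-contained) and runs it:

```python
import os
import runpy
import tempfile

path = os.path.join(tempfile.mkdtemp(), "script.py")  # throwaway location
with open(path, "w") as f:
    f.write("print('Hello World')\ntest = 1\n")

# run_path executes the file and returns the script's global variables as a
# dict, instead of injecting them into the current namespace as exec does.
script_vars = runpy.run_path(path)

print(script_vars["test"])
```

Keeping the script's variables in a separate dictionary avoids accidentally overwriting names in the current session, at the cost of slightly more typing to reach them.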