1   Pleasure of Python Programming

author:Thava Alagu thavamuni@gmail.com
Version:0.8
Date:2016 October 07
status:Draft

1.1   Prologue

This is a gentle introduction to Python for anyone who is familiar with another programming language such as C, Java or PHP. You can use it to teach yourself or to teach others.

In over 20 years of developing software applications on different platforms and in different languages, I have found Python to be the most productive language, offering the least resistance on the way from concept to implementation.

This book is not about data structures and is not meant to be an API reference manual. Its goal is to provide a sound foundation for understanding the fundamentals through simple discussions, notes and exercises. You will mostly learn the advanced topics once you start solving actual problems.

The book grew mainly out of notes that I wrote down for myself. When I started learning Python, I had extensive experience in C/C++, Java and PHP, in that order. I had also developed some admin tools in Perl to automate a few things within a corporate intranet, but I found myself frequently looking up syntax rules, the API reference, and my own code before writing another little program in Perl. I have a poor memory, and Perl never really stuck with me.

When I first learned that Python uses indentation for blocks instead of explicit tokens like curly brackets, I was taken aback. I also (mistakenly) assumed that I would not be able to use my favorite editor (vim, in my case) to quickly jump between the beginning and end of a block. Jumping is important to me since I spend more time jumping than standing. :-) Much later, I found that vim has all sorts of plugins readily available to customize jumping however you want; I didn’t have to write one of my own just for this purpose. After this discovery my reluctance was gone and I started reading about Python, which unveiled the cosmic path to other unexplored galaxies for me.

Most people either love Python or hate it. No in-betweens. If you can resist the temptation to walk away from it prematurely, there are things waiting to be discovered that may permanently change your approach to problem solving.

I hope you enjoy reading this. Programming in Python is a pleasure. But there are rules for the game. When you learn the rules, you are in for the game!

What is Python suitable for? It is a strange beast. It is good for quick scripting as well as large applications! To summarize ...

Python is ...

  • A general purpose programming language.
  • Suitable for quick prototyping and scripting.
  • Suitable for building complex software systems made of many modules.
  • A language that supports both Object Oriented and Functional programming. (More about this later.)

1.2   History

Python was developed by Guido van Rossum in the early 1990s. He is the primary author and continues to play the lead role (see BDFL) in deciding its future direction.

Its module system was inspired by the Modula-3 language, and its overall design was influenced by the ABC programming language, among others.

This is a timeline of selected early and modern programming languages. Many languages were left out in the interest of brevity; the aim is mainly to show how Python fits into the history.

1.2.1   Programming Languages Timeline

Year    Language              Predecessors                                             Author/Comments
1950s   Fortran, Lisp, COBOL
1960s   ALGOL, Simula, BASIC
1970    Pascal                ALGOL 60, ALGOL W                                        Niklaus Wirth, Jensen
1972    Prolog                                                                         Alain Colmerauer
1972    SQL                   ALPHA, Quel (Ingres)                                     IBM
1972    C                     B, BCPL, ALGOL 68                                        Dennis Ritchie
1972    Smalltalk             Simula 67                                                Xerox PARC
1975    Scheme                Lisp                                                     Sussman, Steele
1975    ABC                   SETL                                                     CWI
1979    Modula-2              Modula, Mesa                                             Niklaus Wirth
1980    Ada                   Green                                                    OO, Concurrent
1983    C++                   C, Simula                                                Stroustrup
1984    Common Lisp           Lisp                                                     Lisp dialect & standard
1986    Objective-C           Smalltalk, C
1987    Perl                  C, sed, sh, awk                                          Larry Wall
1987    Erlang                Prolog                                                   Ericsson. Concurrent.
1989    Modula-3              Modula-2                                                 At DEC.
1990    Haskell               Miranda                                                  Open. Standardized.
1991    Visual Basic          QuickBASIC                                               Alan Cooper, sold to Microsoft
1991    Python                ABC, ALGOL 68, Icon, Modula-3                            Van Rossum
1995    Java                  C, Simula 67, C++, Smalltalk, Ada 83, Objective-C, Mesa  James Gosling, Sun
1995    JavaScript            Self, C, Scheme                                          Brendan Eich at Netscape
1995    PHP                   Perl                                                     Rasmus Lerdorf
1995    Ruby                  Smalltalk, Perl                                          Yukihiro Matsumoto
2000    C#                    C, C++, Java, Delphi, Modula-2                           Microsoft
2003    Scala                 Smalltalk, Java, Haskell, Standard ML, OCaml             Martin Odersky
2009    Go                    C, Oberon, Limbo                                         Google. Concurrent.
Source: Wikipedia

1.3   Installation

The first step is to download Python and install it on your computer. You can download Python from http://www.python.org/getit/. Most Linux platforms come with some version of Python pre-installed.

As of Jan 2013, the current production versions are 2.7.3 and 3.3.0. We will use version 2.7.3 since it is the most widely used version.

Usually the installation is simple: it just involves running the package installer (Windows) or locating the relevant package for your OS distribution and installing it. If you have any difficulties installing, see http://docs.python.org/2/using/index.html

If you are going to install multiple versions of Python on the same machine, it is recommended (not required) that you install the following:

  • virtualenv: Creates isolated Python virtual environments.
  • pip: The Python package installation tool.

In Ubuntu, these are available in the standard repositories as python-virtualenv and python-pip. If you are just learning Python, you need not use virtualenv and pip.

To install an additional third-party Python module from PyPI (the Python Package Index), the command to use is pip. For example, to install the blist package, you would simply run:

pip install blist

The pip installer replaces the legacy easy_install command, which has many limitations; for example, it does not support an ‘uninstall’ command.

The virtualenv tool lets you create independent sandbox directories based on Python 2 or Python 3 and work inside them. You can install as many third-party modules as you like and throw them away later, without cluttering the global installation. When you are serious about real development, using virtualenv is essential.

Python installation and package management have a long history of messy dependencies between the internal projects distutils, setuptools, distribute, distutils2 and so on, which are being sorted out in a major cleanup. We won’t go into all those details here, but as long as you stick to using virtualenv and pip, you are more likely to be a happy camper.

1.3.1   Choosing IDE

Python provides the interactive shell command python and also a graphical IDE called IDLE that ships with Python on all platforms. That is good enough to start learning without having to use any heavyweight IDE.

There are many more powerful IDEs to choose from. A good summary is available at the Python IDE wiki.

Since there are too many options, I will shortlist a few good editors. These are opinionated choices, but a good place to start.

IDLE: Default Python IDE with integrated debugger. Cross-platform. Free.
pyscripter: Free, Windows only. Arguably the best IDE on Windows.
Eclipse: With the PyDev plugin, supports integrated editing and debugging. Heavyweight. Free.
vim: Powerful general purpose editor with configurable Python support. Use the python-mode plugin for vim. See the Python Vim Configuration wiki.
Emacs: Powerful general purpose editor with configurable Python support. See the Python Emacs wiki.
spyder: Cross-platform, lightweight, with integrated debugger.
PyCharm: Powerful IDE. Good code completion and support for popular frameworks like Django. Also has integrated support for vim key mappings. Not free.

My personal preference is the vim editor with the python-mode plugin.

Another interesting collection of information about Python IDEs can be found in this Stack Overflow question.

To follow the examples in this book, you do not need any specific IDE. We will just use the command line python command and the ipython shell.

ipython provides an interactive shell similar to the python shell, with more powerful extended features. Go to the IPython site to download it.

1.4   Tutorial

1.4.1   Hello World

Start your python interactive shell:

$ python

Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> print 'Hello World!!!'
Hello World!!!
>>>

If you use ipython instead of python, it looks like below:

$ ipython

Python 2.7.3 (default, Aug  1 2012, 05:14:39)
Type "copyright", "credits" or "license" for more information.

IPython 0.12.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: print 'Hello World!!!'
Hello World!!!

Congratulations! That was really simple.

As you can see, print is a keyword that introduces the print statement in Python 2, so it was not necessary to type print('Hello World'). From Python 3.0 onwards, however, print is a function, not a keyword. It is better to start using print() now: with a single argument it works on both versions.

For the reasons behind the change, see PEP 3105 (PEP stands for Python Enhancement Proposal).

All changes to Python follow the PEP process, similar to the JSR process in the Java community.
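One caveat about print() is worth a small sketch. In Python 2, print('a', 'b') prints a tuple, because the parentheses are read as a tuple expression rather than a call. A simple way to stay portable, assuming you do not want any extra imports, is to build one string first and print that. The helper name say() below is made up for this illustration.

```python
import sys

def say(*parts):
    """Join the parts with spaces, then print them as ONE argument."""
    message = ' '.join(str(part) for part in parts)
    print(message)            # a single argument prints the same on 2 and 3
    return message

line = say('Python', sys.version_info[0], 'says hello')
```

Returning the message as well as printing it is only a convenience here; it makes the helper easy to check.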

1.4.2   Supported Programming Styles

In any (procedural) language, you can usually expect very basic support for:

  • Expressions involving operators
  • Conditional Statement (if, else, elif)
  • Looping (while, for)
  • Built-in Types (bool, int, float, string, etc)
  • Function definitions for reusable code

Python, of course, supports all of these, and it also supports both the Functional and the Object Oriented styles of programming. We will look at each of these aspects later, one after another.

The initial examples here focus on simple, common procedural-style programs so that you can readily map them to equivalent features in other languages you may already be familiar with.

After functional-style programming has been introduced, you will be encouraged to write programs in the functional style, which is more ‘pythonic’.

1.4.3   Indentation

The first thing you should be aware of is that indentation matters in Python. In languages like C and Java, you have braces {} to define blocks, but in Python you use indentation. Enforcing indentation makes programs more readable in general. For people who are already used to C-style braces, this can be very frustrating at first. But it is likely that, if you accept it, you will find it natural after some time. I am one example: I complained, whined and grudgingly started with Python, and I can vouch that it no longer bothers me. In fact, using indentation for blocks seems more natural to me now.

1.4.4   Example: Read and Print

Let us look at a simple exercise to do the following:

Write a program which prompts for your name and year of birth and prints your age. Do this in a loop until user inputs quit for the name. Type the following in a text editor and save it in name.py:

while True:
    name = raw_input('Enter Your Name ==> ')
    if name == 'quit':
        break
    year = raw_input('Enter your Year of Birth ==> ')
    age = 2013 - int(year)
    print('Hello, {}! You are {} years old!\n'.format(name, age))

Now run this program from shell:

$ python name.py

Enter Your Name ==> Johnny
Enter your Year of Birth ==> 1900
Hello, Johnny! You are 113 years old!

Enter Your Name ==> quit

There are several things to notice from this simple program:

  • There are no braces {} for blocks. The while block is defined by its indentation. By convention, we use 4 spaces for a single level of indentation. Technically, however, any number of spaces will do, as long as it is consistent within a block.

  • You don’t declare variables. They spring into existence when you assign a value to them. Languages that work this way are called dynamically typed languages. This provides convenience and generic programming capabilities (polymorphism) at the cost of compile-time type checking, a trade-off that is arguably worthwhile.

  • raw_input(str) is a built-in function which prints the prompt, reads a line from standard input and returns it.

  • True is a boolean value of type bool. String comparisons like name == 'quit' evaluate to a boolean result. Unlike C, you don’t need a special strcmp() function to compare strings.

  • You convert a string to an integer by using the int(year) function. Explicit type conversion is required. This is different from loosely typed languages like PHP, where a variable can act as either a string or an integer depending on context. Python is said to be strongly typed. This helps prevent silent errors arising from the ambiguous nature of loose typing.

  • The break statement terminates the while loop when the condition is satisfied.

  • The format() method of the string replaces each {} placeholder with the corresponding argument, in order.

  • C-style format specifiers like %s (for string) and %d (for integer) are also available through the ‘%’ string formatting operator, which is described in the Output Format section.

  • The raw_input() function is a compact way of doing the following:

    import sys
    sys.stdout.write('Enter your name :')
    name = sys.stdin.readline().rstrip('\n')   # raw_input() strips the newline
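
The point about explicit conversion can be seen in a small sketch. It assumes nothing beyond the built-ins already used above; add_one() is a made-up helper name. Mixing a string and an integer raises an error instead of guessing what you meant.

```python
def add_one(text):
    """Convert explicitly: int() turns the string into a number first."""
    return int(text) + 1

try:
    '1900' + 1                 # mixing str and int is an error, not a guess
    mixed_ok = True
except TypeError:
    mixed_ok = False

converted = add_one('1900')    # explicit conversion works as expected
```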
    

1.4.5   Output Format

Suppose you want to format the output so that it aligns at specific columns. Then you can use C-like format specifiers as below:

>>> print( 'Name:  %15s; Age: %5d' % ('Johnny', 93) )
    Name:           Johnny; Age:    93

Note that % is used here as the string formatting operator, applied to the string itself; the print() function is unaware of it. The % operator is also used as the modulo operator:

>>> 9 % 2
    1

>>> 9 % 3
    0

>>> 'Number %d' % 5
    'Number 5'

>>> 'Numbers %d and %d' % (10, 20)
    'Numbers 10 and 20'

Note that the % operator takes a tuple as its second operand when the input involves multiple components, like (10, 20) in the above example. The tuple is a built-in type in Python which is immutable (meaning you can’t change it once you create it).

When the output does not involve format specifiers, a more readable version would be:

>>> 'Number {} and {}'.format(10, 20)
    'Number 10 and 20'

With format(), you can reuse the same argument:

>>> 'Number {0:2d} (in decimal), {0:x}, {0:#x} (in hex) and {1}'.format(10, 20)
    'Number 10 (in decimal), a, 0xa (in hex) and 20'

Even better, you can use named parameters as below:

>>> 'Number {first} and {second}'.format(first=10, second=20)
    'Number 10 and 20'

Tip

For string output formatting you can use either of:
  • mystring.format(input1, input2)
  • mystring % (input1, input2).
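
The two styles can also produce identical column-aligned output. The sketch below uses the alignment flags of format() (">" for right-aligned, "<" for left-aligned); the variable names are chosen only for this illustration.

```python
# The same aligned row, built with each style.
row_percent = 'Name:  %15s; Age: %5d' % ('Johnny', 93)
row_format = 'Name:  {0:>15}; Age: {1:5d}'.format('Johnny', 93)

# '<' pads on the right instead, giving a left-aligned field of width 10.
left_aligned = '[{0:<10}]'.format('Johnny')
```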

1.4.6   Input Format

Suppose you want to read the name and age from single line user input. In C language, you would write:

printf("Enter Name and Age: ");
scanf("%s %d", name, &age);

In Python, you can do:

(name, age) = raw_input('Enter Name and Age: ').split()
age = int(age)

The raw_input() function returns the input line after stripping the newline at the end. The split() method splits the line into a list of words separated by whitespace. You can assign the list to a tuple of names as above, provided the number of words matches. Finally, we convert age from string to integer.

What if you want to read hexadecimal input with “%x” (and such)? There is no direct equivalent of C’s scanf() in Python. For simple cases, conversion functions such as int(text, 16) are enough; for more complicated input processing you will have to use regular expressions.
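
As a rough sketch of the hexadecimal case, the function below combines a regular expression with int(text, 16). The function name and the accepted input format (a name followed by a hex number with an optional 0x prefix) are assumptions made for this example, not a general scanf() replacement.

```python
import re

def parse_name_and_hex(line):
    """Parse input such as 'Johnny 0x1F' into ('Johnny', 31)."""
    match = re.match(r'\s*(\S+)\s+(?:0[xX])?([0-9a-fA-F]+)\s*$', line)
    if match is None:
        raise ValueError('expected: <name> <hex number>')
    name, digits = match.groups()
    return name, int(digits, 16)   # base 16 plays the role of scanf's %x

parsed = parse_name_and_hex('Johnny 0x1F')
```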

Note

  • In Python 3, the raw_input() function has been renamed to input().
  • The Python 2 version of input() can be simulated as eval(input()) in Python 3.0.
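
If a script has to run on both versions, one common approach is to pick the right function once at the top. This is only a sketch: read_line and ask_age are names invented here, and the reader parameter exists purely so that the function can be tried without typing at a prompt.

```python
try:
    read_line = raw_input      # Python 2: raw_input() returns the text as typed
except NameError:
    read_line = input          # Python 3: input() has the same behaviour

def ask_age(prompt='Enter your age ==> ', reader=None):
    """Prompt for a number; `reader` exists only to make testing easy."""
    reader = reader or read_line
    return int(reader(prompt))
```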

1.4.7   Example: Store List of Values

Let us say you need to store all the (name, age) pairs from the first example in memory for later processing. The input may have any number of pairs.

How do you do that in Python? The answer is: we use the built-in list data structure. A list can dynamically grow in size as more elements are added. This is a powerful concept.

Just to compare this with other languages: in C, you would write your own linked list implementation; in C++, you would probably use the vector template; in Java, you would probably use java.util.Vector. However, you will notice that Python’s style is much less verbose and easier to use:

nalist = []                          # Initialize empty list
while True:
    name = raw_input('Enter Your Name ==> ')
    if name == 'quit':
        break
    year = raw_input('Enter your Year of Birth ==> ')
    age = 2013 - int(year)
    print('Hello, {}! You are {} years old!\n'.format(name, age))
    nalist.append((name,age))       # Append (name, age) tuple.

print 'Input List is: ', nalist

We just added 3 lines to our first example: one at the beginning to create the empty list, one at the end of the loop to append each pair, and one after the loop to print the list!

Suppose you enter 3 pairs of names and years; an example output is:

Input List is:  [('john', 20), ('jack', 40), ('jill', 30)]
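
The "later processing" of such a list might look like the sketch below. It starts from the sample output above rather than from user input, and the statistics computed are just examples.

```python
nalist = [('john', 20), ('jack', 40), ('jill', 30)]   # sample data from above

names = []
total = 0
for name, age in nalist:        # each element unpacks into two variables
    names.append(name)
    total += age

average_age = total / float(len(nalist))
oldest_name, oldest_age = max(nalist, key=lambda pair: pair[1])
```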

To see complete documentation about list just use pydoc command:

$ pydoc list

1.4.8   Example: Simple Function

Let us look at a simple function which returns a string to indicate if a given number is negative, zero or positive:

def numdesc(no):
    """Given number, return 'negative', 'zero' or 'positive'"""
    if no < 0:
        return 'negative'
    elif no == 0:
        return 'zero'
    else:
        return 'positive'

>>> numdesc(100)
    'positive'
>>> numdesc(0)
    'zero'
>>> numdesc(-20)
    'negative'

>>> numdesc.__doc__
    "Given number, return 'negative', 'zero' or 'positive'"

The program is self-explanatory. The triple-quoted string right after the def line serves as the documentation for the function. Syntactically it is just an ordinary string expression that is not assigned to anything; because it is the first statement in the function body, Python stores it in the function’s __doc__ attribute, as shown above. It is the convention to use such strings for documentation, and documentation tools such as pydoc recognize this convention.

Every block is preceded by a ‘:’ at the end of the previous line and is indented one additional level.

1.4.9   Example: Using File and Process

The following example illustrates how to write into file and how to invoke external command from the script:

import subprocess

def writefile():
    """Example to illustrate file and process invoke operations"""

    # The with statement closes the file even if write() fails.
    with open('/tmp/hello.txt', 'wb') as outfp:
        outfp.write(b'Hello World!')    # b'...' is a plain str in Python 2.7
    subprocess.call(['/bin/cat', '/tmp/hello.txt'])

writefile()

The import subprocess statement at the top makes the functions exported by the subprocess module available under the subprocess. prefix. Then you open the file for writing (‘wb’ for write in binary mode). The ‘wb’ mode is the same as ‘w’ on Unix, but different on Windows (due to how newline characters are translated before writing). It is recommended to use ‘wb’ in the interest of portability.

The subprocess.call() function accepts the command line as a list of strings. If you prefer to pass a single space-separated command line, you can use either of the following:

* subprocess.call('/bin/cat  /tmp/hello.txt', shell=True)

* import os
  os.system('/bin/cat  /tmp/hello.txt')
  # Note: Use of os.system() is discouraged since the newer subprocess
  # module is more general and flexible.

See pydoc subprocess for more information about subprocess module.
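
subprocess.call() only returns the exit status. When you need the command's output as a value, subprocess.check_output() (available from Python 2.7) returns it instead. The sketch below runs a child Python process through sys.executable so that it does not depend on /bin/cat being present; that choice is an assumption made for portability.

```python
import subprocess
import sys

# Run a child Python process; sys.executable avoids hard-coding a path
# such as /bin/cat, which does not exist on every platform.
command = [sys.executable, '-c', 'print("Hello World!")']

status = subprocess.call(command)            # returns the exit status
output = subprocess.check_output(command)    # returns the captured stdout
text = output.decode('ascii').strip()        # bytes to text, newline removed
```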

1.4.10   Example: Your environment

Consider the following example which prints your os PATH directories and also python module search directories:

from pprint import pprint                                   # (1)
import os
import sys

def ospath():
    """Print the list of directories as indicated by os PATH environment"""
    print('Your OS PATH directories :')
    pathlist = os.getenv('PATH').split(os.pathsep)          # (2)
    pprint(pathlist)

def pythonpath():                                           # (3)
    """
    Print python module search path directories.
    Note that $PYTHONPATH directories are included in this list.
    """
    print('Your Python PATH module search directories :')
    pprint(sys.path)

if __name__ == '__main__':                                  # (4)
    ospath()
    pythonpath()

(#1) The pprint module provides the pprint() function to pretty-print list output.

(#2) os.getenv('PATH') returns the system path string. It is split into a list using the OS-dependent path separator character os.pathsep. Note that on Unix it is ‘:’ and on Windows it is ‘;’.

(#3) Each installation has a platform-specific list of Python module search paths. The user can add additional directories by setting the PYTHONPATH environment variable. sys.path reflects the final value of the module search path.

(#4) This program itself can be imported as a module in another program. In such cases, it is useful to run certain logic only when the file is invoked as the main program; the test __name__ == '__main__' is true only then.

1.4.11   Basic Introspection

The most effective way of introspecting any object is to use the dir() function at the interactive shell. Continuing from the earlier example:

>>> dir(nalist)
....   # All the supported function and property names of the nalist object
       # are displayed.

>>> type(nalist)
    <type 'list'>

>>> help(list)
       # Displays the python doc for list.

Tip

The dir() function is the most useful and frequently used tool for basic object introspection.
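
The same introspection can be done from a script. The sketch below filters the output of dir() and uses the built-in hasattr() and getattr() functions; the one-element list simply stands in for the nalist built earlier.

```python
nalist = [('john', 20)]

# Names without a leading underscore are the public methods and attributes.
public_names = [name for name in dir(nalist) if not name.startswith('_')]

has_append = hasattr(nalist, 'append')        # check for an attribute by name
append = getattr(nalist, 'append')            # fetch it by name
append(('jack', 40))                          # same as nalist.append(...)
type_name = type(nalist).__name__
```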

Now that you have had a little taste of the language, we will move forward with more fundamentals before looking at more complex examples.

1.4.12   Comments

Comments take the following forms:

# Lines starting with hash are comments.
i = 10      # You can use partial line comments like this too.
"This is a dummy string expression, but may serve like a comment"

def f():
    """The triple-quoted string is the standard for documenting
    functions and modules. It is a multi-line string expression,
    and is used for documenting the above function by convention.
    The Python documentation tools look for this convention to
    auto-generate the documentation"""
    ....

1.4.13   Zen of Python

Try this at your prompt:

>>> import this

1.5   Builtin Functions and Datatypes

1.5.1   Built-in functions

Note that built-in types (such as int, float, etc) and built-in functions (such as len(), isinstance()) are not keywords. This is different from many other languages, such as C where built-in types are keywords.

Python’s built-in functions include:

dir(),     id(),     len(),    isinstance(), issubclass(),  open(),
range(),   map(),    reduce(), filter(),     apply(),       locals(),
globals(), eval() ,  zip(),    enumerate(),  raw_input(),   and more ...

The built-in types are also functions since they are used as constructors:

int(), long(), float(), list(), tuple(), dict(), type(), and more ...

Since these are not keywords, you will be able to rebind them to different things:

>>> len = 10    # This essentially rebinds built-in function name 'len' to 10.
                # Not a good idea. Should avoid rebinding built-in function names.
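
To see why rebinding is a bad idea without damaging your interactive session, the effect can be confined to a function. This is a minimal sketch; shadow_demo is a made-up name.

```python
def shadow_demo(values):
    """Show how a local name can hide the built-in len()."""
    len = 10                        # hides the built-in inside this function
    try:
        len(values)                 # now fails: an int is not callable
    except TypeError:
        return 'len is shadowed'
    return 'len still works'

result = shadow_demo([1, 2, 3])
still_builtin = len([1, 2, 3])      # the built-in is untouched outside
```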

You can pass built-in types, like any other objects, as function parameters. This is a very powerful concept, useful for generic programming:

>>> def convert(val, sometype):    # This defines a simple function convert
...     return sometype(val)       # which just converts value to given type

>>> print(convert(10.5, int))      # Isn't that interesting ?
    10
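
Taking the idea one step further, a list of types can describe how to convert a whole record of strings. The function name and the sample record below are invented for this sketch.

```python
def convert_fields(fields, types):
    """Apply one type per field, e.g. int to the first, float to the next."""
    return [to_type(field) for field, to_type in zip(fields, types)]

record = convert_fields(['42', '3.5', 'Johnny'], [int, float, str])
```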

1.5.2   Built-in DataTypes

Python defines a core set of built-in types designed for ease of use. The following list includes most of them. A few have been left out intentionally since they are either rarely used or reserved for later, more advanced discussions. Here is a summary:

Numeric Types

int
    The integer type. It uses 32 or 64 bits depending on the platform:

    >>> import sys
    >>> sys.maxsize
        9223372036854775807      # on a 64-bit platform, which is (2**63 - 1)

float
    Uses double precision:

    >>> sys.float_info
        sys.float_info(max=1.79e+308, min=2.2e-308, ... )

long
    Unlimited precision. Note: In Python 3, int acts like long, and long is gone.

complex
    Complex number:

    >>> complex(1.2, 3.4)
        (1.2+3.4j)

bool
    Boolean type. True or False. Cannot be subclassed.
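
A few of these properties can be checked directly. The sketch below is written to run on both Python 2.7 and Python 3; the value of float_digits assumes the usual IEEE double precision float.

```python
import sys

big = 2 ** 100                       # integers grow beyond the machine word
exceeds_word = big > sys.maxsize

float_digits = sys.float_info.dig    # decimal digits a float holds reliably

z = complex(1.2, 3.4)
parts = (z.real, z.imag)

truth_as_int = True + True           # bool values behave like 1 and 0
```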

Sequence Types

Python has a set of built-in sequence types such as str, unicode, list, tuple, etc. A sequence type is any type which implements a strictly ordered collection of elements. In general this means that the type should be indexable, i.e. obj[i] gets the element at position i in the sequence. The type should implement the __getitem__() method for this to work.

Since every sequence type is also essentially a container, it should also support the iterator protocol, i.e. __iter__() should be defined and must return an iterator.
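
To make the protocol concrete, here is a minimal sketch of a user-defined sequence. The class Countdown is invented for this illustration; it implements only the methods named above plus __len__().

```python
class Countdown(object):
    """A tiny sequence: Countdown(3) behaves like [3, 2, 1]."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)          # out of range, like a list
        return self.start - index

    def __iter__(self):
        # Return an iterator over the elements, in index order.
        return iter([self[i] for i in range(len(self))])

seq = Countdown(3)
items = list(seq)
```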

The built-in sequence types are summarized below. Note that some types have been changed or renamed in Python 3 in order to clean up legacy behaviour.

str
    Immutable string type in Python 2.7. In Python 3, str behaves the way unicode does in Python 2.7:

    >>> s = 'Some ASCII string'

unicode
    Immutable sequence of Unicode characters (Python 2.7 only):

    >>> s = u'San José'

bytes
    New type in Python 3 for an immutable sequence of bytes. It behaves like str in Python 2.7:

    >>> s = b'Some ASCII string'

list
    Mutable sequence of any objects (possibly heterogeneous):

    >>> mylist = [1, 2, (3, 4), [5,6]]

tuple
    Immutable sequence of any objects (possibly heterogeneous):

    >>> mytuple = (1, 2, (3, 4), [5,6])

bytearray
    Mutable sequence of bytes. See also: bytes vs bytearray

buffer
    Provides an interface to internal data without copying (legacy). Use the new memoryview instead of buffer.

memoryview
    Interface to internal data without copying (output shown for Python 3):

    >>> buf = bytearray(1024*1024)
    >>> view = memoryview(buf)
    >>> view = view[10:]
    >>> view[:6] = bytes(b'Hello!')
    >>> print(bytes(view[:6]))
        b'Hello!'
    >>> print(bytes(buf[:16]))     # the first 10 bytes of buf are untouched
        b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Hello!'

xrange
    Immutable sequence generated from a specific range (Python 2.7 only). Unlike a list, it consumes the same amount of memory irrespective of the size of the range.

range
    Immutable sequence generated from a specific range. The range in Python 3 behaves like xrange in Python 2.7.

Other Built-in Types

There are various other built-in types available which are briefly summarized below:

dict
    The dictionary is a very powerful, core data structure. It is a mutable map of keys to associated values.

class
    Used to define new types, possibly extending other types. Supports the object oriented programming style.

function
    A function is an object of type function. It can be a built-in function (e.g. len()) or a user-defined one.

method
    Methods are functions that are called using attribute notation. This includes built-in methods (e.g. mylist.append) and class instance methods (e.g. myobj.my_method). The method object keeps track of the associated instance.

generator
    Similar to a function, but it yields values one after another, unlike a function, which returns a single result. It acts like a sequence generator and can be used like any other iterable object. Very useful for efficient iteration and concurrent programming.

module
    A module is an implementation unit which can be imported into other modules for use. It exports the symbol table of its component objects through its attributes.

None
    Special value which represents a state of ‘nothing’. It is a global singleton object whose type cannot be extended.

type
    Everything is regarded as an object, and every object is of some type. The built-in types are instances of the type type.

code
    Code objects represent compiled Python code, such as a function body. A function associates a code object with a context (locals, globals); a code object can, however, also be executed in a dynamically chosen scope.

set, frozenset
    There are 2 set types supported:

      • set: mutable set container
      • frozenset: immutable set container

    A set is different from a list: it does not allow duplicates, and it supports the following operations:

    x | y                  Set union
    x & y                  Set intersection
    x - y                  Set difference
    x ^ y                  Symmetric difference
    len(x)                 Number of elements in the set
    max(x)                 Maximum value
    min(x)                 Minimum value

    The elements must be hashable objects. A set can contain a frozenset, but a frozenset cannot contain a set. Why? Because a set is mutable and hence non-hashable.
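
The set operations and the hashability rule can be tried out with a short sketch. The values are arbitrary; set([...]) is used instead of the {...} literal only so that the code also runs on older Python 2 releases.

```python
x = set([1, 2, 3, 4])
y = set([3, 4, 5])

union = x | y
common = x & y
only_x = x - y
either = x ^ y

inner = frozenset([1, 2])
outer = set([inner])                 # fine: a frozenset is hashable
try:
    set([set([1, 2])])               # a mutable set cannot be an element
    nested_ok = True
except TypeError:
    nested_ok = False
```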

1.6   Strings

The following discussion applies to Python version 2.7, since the string specifications changed between Python 2.7 and 3.

1.6.1   String Literals

Strings are immutable. Adjacent string literals are automatically concatenated:

greet = 'Hello '   "World!"
greet = 'Hello ' + "World!"   # Result is same as above.

# Parentheses used to allow continuation:
greet = ('Hello ' +
             'This can be long greetings')

# Backslash can be used to continue long string.
greet = 'Hello  .... \
             This is long line ...!'

The best way to write multiline strings is to use triple quotes like below:

"""Triple quotes
   are good for multiline comments
   because you don't have to escape single quotes inside"""

'''You can also use triple quote using
   single quote character'''
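
One detail is easy to miss: with a backslash continuation inside a string, the indentation of the next line becomes part of the string. The sketch below compares the forms; the variable names are chosen only for this illustration.

```python
adjacent = 'Hello ' "World!"              # joined by the parser
explicit = 'Hello ' + "World!"            # joined at run time

continued = 'Hello \
    World!'                               # the 4 leading spaces are kept

multiline = """First line
Second line"""                            # the line break is kept as '\n'
```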

1.6.2   Common String Operations

Some common string operations:

>>> fruits = ['banana', 'apple', 'orange', 'tomato']

>>> ' '.join(fruits)
    'banana apple orange tomato'

>>> ' '.join(fruits).split(' ')
    ['banana', 'apple', 'orange', 'tomato']

>>> ',\n'.join(fruits)
    'banana,\napple,\norange,\ntomato'

>>> print ',\n'.join(fruits)
    banana,
    apple,
    orange,
    tomato

>>> s = 'orange'

>>> s[1:]
    'range'          # Slice from index 1 to the end

>>> s[0:3]
    'ora'

>>> s[1:3]    # Prints  string[start_index:end_index)
    'ra'      # Note: s[3] excluded.

>>> s[-1]     # Last character
    'e'

>>> s[2:-1]
    'ang'

>>> type(s)
    <type 'str'>

>>> type('e')            # Unlike C, single char type is also string
    <type 'str'>

>>> 'r'   in 'orange'    # Use  'in' Operator for substring check.
    True
>>> 'ang' in 'orange'
    True
>>> 'k'   in 'orange'
    False
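
Because strings are immutable, "changing" a string always means building a new one. A brief sketch, using the same sample string:

```python
s = 'orange'
try:
    s[0] = 'O'                         # strings do not allow item assignment
    changed_in_place = True
except TypeError:
    changed_in_place = False

capitalized = 'O' + s[1:]              # build a new string from slices
reversed_s = s[::-1]                   # a step of -1 walks backwards
position = s.find('ang')               # index of the substring, or -1
```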

1.6.3   bytes vs bytearray

In Python 3, the default string type str is Unicode capable. For storing raw bytes, there is an immutable type called bytes. There is also the bytearray type, which is (kind of) a mutable list of bytes. It is not exactly a mutable list of raw bytes, just similar but still different. The bytearray type has some unique characteristics:

>>> s = bytearray(b'The King !')
>>> s[4:9] = b'Queen'
>>> s
    bytearray(b'The Queen!')

>>> s[0]
    84

>>> type(s[0])
    <class 'int'>

>>> s[0] = b't'
    TypeError: an integer is required

>>> s[0] = ord(b't')
>>> s
    bytearray(b'the Queen!')

Hence it is more like a list of small-integers (with each element in range 0-255).
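
The list-like behaviour can be sketched as follows. The sample text is arbitrary, and the code sticks to operations that behave the same on Python 2.7 and Python 3.

```python
data = bytearray(b'the queen')
data[0] = ord('T')                   # assign one byte as an integer
data[4:5] = b'Q'                     # assign a slice from a bytes object
data.extend(b'!')                    # grow in place, like a list

frozen = bytes(data)                 # immutable snapshot of the contents
first_byte = data[0]
```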

1.7   Tuples

The tuple is one of the sequence types in Python. It is similar to a list, but immutable: you cannot reassign individual elements of a tuple. A tuple may, however, contain mutable data (such as a list) as one of its elements.

The constructor for tuples is the comma, not the parentheses:

>>> 1,2
    (1, 2)

Representing a tuple of 1 element poses a problem, which requires some special syntax to resolve:

>>> (1)
    1                 # This is not a tuple !!!

>>> 1,                # Looks strange, but solves the problem on hand!
    (1,)

Empty tuple is what you may expect it to be:

>>> ()                # This is empty tuple!
    ()

It is always recommended to use parentheses for tuples:

>>> (1,2)             # though use of parentheses is optional.
    (1, 2)

Swapping Values:

b, a = a, b

Tuple packing and unpacking:

point = 1, 2, 3          # This is tuple packing!
x, y, z = point          # This is tuple unpacking!

Unpacking also works for nested tuples:

(a, b, (c, d)) = (1, 2, (3, (4, 5))) # a=1; b=2; c=3; d=(4,5)

Unpacking works for any sequence on the RHS (of matching length), not just tuples:

x, y, z = 'abc'          # x = 'a' ; y = 'b' ; z = 'c'

A function may return a tuple to return multiple values:

x, y, z = get_location()
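
Since get_location() is not defined in the text, here is a hypothetical version that makes the line above runnable. The coordinates are made-up values.

```python
def get_location():
    """Hypothetical helper: return coordinates as an (x, y, z) tuple."""
    return 10, 20, 30                # the commas build the tuple

x, y, z = get_location()             # unpack into three names
location = get_location()            # or keep the tuple as one value

a, b = 1, 2
b, a = a, b                          # swap without a temporary variable
```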

1.8   Lists

We already used the list data structure in our second example (Store List of Values). Let us take a closer look at lists. To recap, a list is an ordered collection of values whose size can grow dynamically as we add more elements.

Lists are mutable, i.e. modifiable, and can contain elements of any type, including other lists:

>>> mylist = [ [1,2], (3, 4), {'one':1, 'two':2} ]

The above list contains another list, a tuple and a dictionary, which we will cover shortly.

The list supports the following operations:

a.append   a.extend   a.insert   a.remove   a.sort
a.count    a.index    a.pop      a.reverse

Inserting and removing from arbitrary positions is supported:

>>> a = [5, 10, 40, 20, 30]
>>> a.remove(40)            # Remove by value
>>> a
    [5, 10, 20, 30]
>>> a.insert(2, 40)         # insert at any position
>>> a
    [5, 10, 40, 20, 30]
>>> a.pop()                 # pop from last
    30
>>> a
    [5, 10, 40, 20]
>>> a.pop(1)                # pop from any position
    10
>>> a
    [5, 40, 20]
>>> a.extend([45, 35])      # merge lists
>>> a
    [5, 40, 20, 45, 35]

>>> b = a                   # a, b points to same object
>>> c = a[:]                # Makes a copy of a using slice
>>> a.sort()
>>> a
    [5, 20, 35, 40, 45]
>>> b
    [5, 20, 35, 40, 45]
>>> a is b                  # a, b are same
    True
>>> c                       # c retains old copy
    [5, 40, 20, 45, 35]

# Note: id(var) prints the object identifier. (similar to C pointer)
>>> print('id(a) = %s' % id(a) ) ;  print('id(b) = %s' % id(b) )
    id(a) = 45332168
    id(b) = 45332168

>>> print('id(c) = %s' % id(c) )
    id(c) = 41829512

>>> sorted(c)             # This built-in function leaves c unchanged.
    [5, 20, 35, 40, 45]
>>> c
    [5, 40, 20, 45, 35]
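One point worth adding to the session above: a slice copy is shallow. The sketch below (Python 3 assumed, variable names invented for the example) contrasts an alias, a slice copy and a deep copy made with the standard copy module:

```python
import copy

rows = [[1, 2], [3, 4]]
alias = rows                     # same list object
shallow = rows[:]                # new outer list, same inner lists
deep = copy.deepcopy(rows)       # fully independent copy

rows.append([5, 6])              # affects rows and alias only
rows[0][0] = 99                  # inner list is shared with shallow

print(alias is rows)             # True
print(len(shallow))              # 2
print(shallow[0][0])             # 99
print(deep[0][0])                # 1
```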

1.9   Dictionaries

A dictionary is a built-in datatype in Python. It is basically a mapping table that maps a set of keys to a set of values, similar to PHP’s associative array.

A dictionary is a collection of (key, value) pairs. Do not rely on any particular ordering in Python 2; since Python 3.7, a dictionary preserves insertion order.

The keys in a dictionary must be hashable, which in practice means immutable values. For example, integers and strings are OK as keys, but lists are not. Tuples are allowed as keys as long as they contain only hashable elements such as strings, numbers or other such tuples.
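The rule about keys can be checked directly. In this sketch (Python 3 assumed) try_key is a hypothetical helper written for the illustration, not a library function:

```python
grid = {}
grid[(0, 0)] = 'origin'          # tuple of ints: hashable, fine
grid[('a', (1, 2))] = 'nested'   # nested tuples of hashable items: fine

def try_key(key):
    """Return True if key can be used as a dictionary key."""
    try:
        {key: None}
        return True
    except TypeError:
        return False

print(try_key((1, 2)))           # True
print(try_key([1, 2]))           # False
print(try_key((1, [2, 3])))      # False
```

The last case shows that a tuple is only as hashable as the elements it contains.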

Some examples:

>>> capitals = { 'Spain'      : 'Madrid',
                 'Norway'     : 'Oslo',
                 'Latvia'     : 'Riga',
                 'Costa Rica' : 'San Jose'
                }

>>> capitals.keys()                    # Note: Order not guaranteed.
    ['Costa Rica', 'Latvia', 'Norway', 'Spain']

>>> capitals.values()
    ['San Jose', 'Riga', 'Oslo', 'Madrid']

>>> capitals.items()
    [('Costa Rica', 'San Jose'),
     ('Latvia', 'Riga'),
     ('Norway', 'Oslo'),
     ('Spain', 'Madrid')]

To iterate over keys, values or items in Python 2, use iterkeys(), itervalues() and iteritems(); they are more efficient and avoid building huge copies. In Python 3, keys(), values() and items() return lightweight view objects instead of lists, so the iter functions are no longer needed and are not available.
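A small sketch of how the Python 3 behaviour differs from a list copy. It assumes Python 3; the dictionary contents are sample data:

```python
capitals = {'Spain': 'Madrid', 'Norway': 'Oslo'}

keys = capitals.keys()           # a live view of the keys, not a list
capitals['Latvia'] = 'Riga'      # later changes show up in the view

print(len(keys))                 # 3
print('Latvia' in keys)          # True

snapshot = list(keys)            # take a real list when one is needed
snapshot.append('Peru')          # does not touch the dictionary
print(len(capitals))             # 3
```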

When the keys are strings, using keyword arguments looks better:

>>> capitals = dict(Peru='Lima', Ukraine='Kiev')
>>> capitals
    {'Peru': 'Lima', 'Ukraine': 'Kiev'}

Another way to construct a dictionary is to use a list of (key, value) pairs:

>>> entries = list(capitals.items())       # list() is required in Python 3
>>> entries.append(('Portugal', 'Lisbon'))
>>> entries
    [('Ukraine', 'Kiev'), ('Peru', 'Lima'), ('Portugal', 'Lisbon')]

>>> capitals = dict(entries)               # Constructor accepts list of pairs
>>> capitals
    {'Peru': 'Lima', 'Portugal': 'Lisbon', 'Ukraine': 'Kiev'}

To add a single key value pair of (‘India’, ‘Delhi’), you can do any one of the following:

>>> capitals['India'] = 'Delhi'                # 1
>>> capitals.update({ 'India': 'Delhi' })      # 2

You can construct another copy of a dictionary given one:

>>> caps2 = dict(capitals)
>>> id(caps2)
    45447840
>>> id(capitals)
    44940160

You can merge 2 dictionaries:

>>> capitals.update( { 'Canada' : 'Ottawa'} )

You can get a value by key, either with the subscript notation or with the get() method:

>>> capitals['Canada']
    'Ottawa'

>>> capitals.get('Canada')
    'Ottawa'

You can remove an element using the pop() method, which also returns the removed value:

>>> capitals.pop('Canada')   # del capitals['Canada'] also works, but returns nothing.
    'Ottawa'

>>> capitals.get('Canada',  'Missing Information')   # Get with default.
    'Missing Information'             # Prints default when entry missing.

You can iterate on the keys:

>>> for country in capitals:     # This is same as: for country in capitals.keys():
        print  country

     Costa Rica
     Latvia
     Norway
     Spain

If you want to iterate through values, use the capitals.values() instead in above loop.

You can list each element pair:

>>> for country in capitals:
        print  country, capitals[country]

Following looks better than the above but does the same thing:

>>> for country, capital in capitals.items():
        print  country, capital

1.10   Statements Overview

Following table summarizes the different statement types available:

Statement Example Description
Expression
  • f(a+b)+g(c+d)
  • All functions return None by default.
  • May raise TypeError (e.g. 5+"hello").
Assignment
  • a = b
  • a = b = c
  • (a, (b,c)) = (1, (2,3))
  • [(a, b), c] = ((1, 2), 3)
  • inst.x = inst.x + 1
  • a[5:7] = (1, 2)
  • a += 1
  • Assignment binds names to values.
  • Complex scope resolution rules apply; they are explained later.
  • inst.x = inst.x + 1 : This assignment may read class attribute x but always assigns to instance attribute x.
  • a += 1 need not be same as a = a+1. May read global and create local variable due to scoping rules.
assert
  • assert price > 0
  • assert price > 0, 'Invalid price input'
 
pass
  • class SomeClass: pass
  • def f(n): pass
Dummy statement used when a statement is required by syntax rules.
del
  • del a
  • del a, b[4], c[8:10]
  • del global_var, local_var
  • Deletes the name binding from the global or local namespace, as applicable.
  • The object itself is reclaimed only when no other references to it remain.
  • In Python 2 you cannot delete a variable that a nested block refers to.
print
  • print 'hello'
  • print('preferred form')
  • print(' this ' + ' that ')
  • print 'some ', ' more '
  • print >> f, 'do not use'
  • print has become a function in Python 3.
  • The print >> f form prints to a file-like object.
  • Do not use the legacy print statement forms; use the function form, as it is supported in both versions.
return
  • return
  • return 10
  • Default value of return is None
  • When returning from try clause, finally clause executed before returning from function.
yield
  • yield a+b
  • yield statement anywhere in function makes it a generator function. See Generators section.
raise
  • raise
  • raise instance
  • raise cls, tupleobj, tb
  • Bare raise statement re-raises the exception.
  • raise instance raises the specified exception.
  • raise cls, tupleobj form uses the tuple args to instantiate the class for exception.
  • Third arg, if specified must be traceback obj
  • See Exceptions section.
break
  • break
  • Breaks out of the for/while loop.
  • If loop has else clause, it breaks out of that too. (yes, loop has else clause in python! Explained Later)
  • If break from try clause, then execute finally before leaving.
continue
  • continue
  • Continues with the next cycle of the nearest enclosing loop.
  • Cannot be used inside a finally clause (a restriction lifted in Python 3.8).
  • When used inside a try clause, the finally clause is executed before leaving.
import
  • import os
  • import re, process
  • from pprint import pprint
  • import encodings as enc
  • from wsgiref import simple_server as srv
  • You can import a module, or a specific function or component from a module.
  • You can alias them as you import for convenience.
  • See Import section for more details.
global
  • global a, b
  • Declared inside a function, it makes assignments to the listed names rebind the module-level (global) variables. Scoping rules are explained in detail later.
exec
  • exec 'print(sum(range(3)))'
  • exec 'a=b;c=a+b' in globals(), locals()
  • exec code_obj
  • exec file_obj
  • Executes the specified statements in the given context of globals and locals.
  • Can execute a compiled code object directly.
  • True dynamic code execution.
  • Explained later.
if
if a>b :
   print 'a is greater'
elif a == b:
   print 'same'
else:
   print 'b is greater'
if x < y < z: print('yes')
  • Conditional statement.
  • Note the elif keyword used to mean else if
  • All compound statements headers end with colon (:).
while
while not finished:
   do_something()
   if problem:
       break
   some_more()
else:
   final_steps()
  • Note the break statement would break out of else too.
  • The else: clause of while is used to execute steps when while loop terminates normally (without break).
  • continue inside while loop will skip remaining block and go to next iteration.
for
for i in range(100):
   do_some_thing()
   if error_occured():
      break
   do_some_more()
else:
   finishing_touches()
  • Note the break statement will break out of else clause as well.
  • The else: clause is executed after normal termination of for loop without break being called.
  • The continue statement inside for loop skips the remaining block and goes to next iteration.
try
try:
  something()
except Exception as err:      # "as" form works in Python 2.6+ and Python 3
  process_it(err)
finally:
  do_cleanup()
  • See Exceptions section for more details.
with
with open("f.txt") as f:
    lines = f.readlines()
    process_it(lines)
  • Can be used with any class which implements context manager interface (__enter__() and __exit__())
  • Does auto cleanup of object
Function definition
def f(x):
  return 2*x

g = lambda x: 2*x
  • Functions are first class objects, can be passed as argument to other functions.
  • Anonymous functions of simple form supported as lambda function.
  • See Functions section for more details.
Class definition
class Myclass(object):
   def f(self, n):
       return n*n
  • Classes which derive from object are new-style classes.
  • New style classes are default in Python 3.
  • Explained in detail later.

1.11   Control Flow

1.11.1   if-elif-else Statement

The if-elif-else statement has following structure:

if a>b :
   print 'a is greater'
elif a == b:
   print 'same'
else:
   print 'b is greater'

if x < y < z:
   print(' y is between x and z')
if x != y and y != z and x != z :
   print('x, y, z are all different')

Note that you cannot write else if; it must be elif. Chained comparisons such as x < y < z are allowed and mean the same as x < y and y < z. Note that the logical and operator is spelled and, and the logical or operator is spelled or.

1.11.2   The for Loop

The for loop has following structure:

for i in range(100):
   if some_condition:
      continue              # Skip this iteration, go to next one.
   do_some_thing()
   if error_occured():
      break                 # Terminate loop, break the else too.
   do_some_more()
else:
   finishing_touches()

The i in range(100) is a common style for looping. The other common patterns are:

for i, v in enumerate(my_list):
   print(i, v)                    # If my_list == [10, 20, 30], it prints:
                                  #  (0, 10) (1, 20) (2, 30)

You can iterate through all the items in a dictionary as follows:

for key, val in list(locals().items()):   # list() takes a snapshot before looping
   print(' %10s = %r ' % (key, val))  # Prints all variables in locals() dictionary.

The continue construct is used to skip the rest of the current block and go to the next iteration, just like the continue construct in C.

The break statement is used to break out of the loop, just like in C. It also skips the else clause of the for loop.

The else block is executed if the for loop terminates normally, without break. It can be thought of as the ‘success path after the loop’. The break can be thought of as the error path that leaves the whole statement, including its else part.
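A minimal sketch of the for/else behaviour described above, assuming Python 3. The function name find_first_negative and its messages are invented for the illustration:

```python
def find_first_negative(values):
    """Return a message describing the first negative number, if any."""
    for v in values:
        if v < 0:
            result = 'found %d' % v
            break                # skips the else block
    else:
        result = 'no negative value'   # loop finished without break
    return result

print(find_first_negative([3, -7, 5]))
print(find_first_negative([3, 7, 5]))
```

Without the else clause, the same logic would need a separate flag variable to record whether the loop ended early.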

1.11.3   The while Loop

The while loop has following structure:

while not finished:
   do_something()
   if some_condition:
      continue               # Skip current iteration, go to next.
   if problem:
       break                 # Break the while loop and else clause.
   some_more()
   if all_done:
      finished = True        # Let the loop terminate gracefully.
else:
   after_success_steps()     # Execute this if ``break`` was not called.

The continue and break statements behave just like the C constructs of the same name. The else clause is unique to Python. To understand this clause, imagine that the while construct acts like an if on its final iteration:

if  (cond):             while (cond):
   ...                     ...
else:                   else:          # when cond is False ...
   ...                     ...
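The same idea in runnable form, as a sketch only (Python 3 assumed). The function countdown and its abort_at parameter are hypothetical, chosen to make both exits of the loop easy to trigger:

```python
def countdown(start, abort_at=None):
    """Count down to zero; abort_at is a hypothetical early-exit value."""
    n = start
    while n > 0:
        if n == abort_at:
            outcome = 'aborted at %d' % n
            break                # leaves the loop and skips else
        n -= 1
    else:
        outcome = 'completed'    # condition became False normally
    return outcome

print(countdown(3))
print(countdown(3, abort_at=2))
```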

1.12   Functions

1.12.1   Simple function

Simple global-level functions can be declared like this:

def  addxy(x, y):
   return x+y

def  mulxy(x, y):
   return x*y

Since functions are first class objects, they can be passed as argument:

def opxy(f, x, y):
    return f(x, y)

>>> opxy(addxy, 10, 5)
    15
>>> opxy(mulxy, 10, 5)
    50
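Because functions are ordinary values, they can also be stored in data structures. The sketch below (Python 3 assumed) keeps two functions in a dictionary; the names operations and apply_named are invented for the example:

```python
def addxy(x, y):
    return x + y

def mulxy(x, y):
    return x * y

# Functions are ordinary values, so they can be stored in a dictionary.
operations = {'add': addxy, 'mul': mulxy}

def apply_named(name, x, y):
    """Look up a function by name and call it (hypothetical helper)."""
    return operations[name](x, y)

print(apply_named('add', 10, 5))   # 15
print(apply_named('mul', 10, 5))   # 50
```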

1.12.2   Named arguments

Python allows you to pass parameters as named arguments when you call functions. This provides clarity when there are many parameters:

def subxy(x, y):
    return x - y

>>> subxy(y=10, x=20)        #  <===  Ok to change order of parameters.
10
>>> subxy(x=100)             #  <===  Not OK: Error: y is missing
...
TypeError: subxy() takes exactly 2 arguments (1 given)

We call x and y positional arguments: when a call gives no parameter names, the first value goes to x and the second to y. The same parameters become ‘named arguments’ when the call names them, as shown above.

1.12.3   Optional arguments

Optional arguments let the caller skip some arguments; default values are assumed for the missing parameters:

def subxy(x=0, y=0):
   return x-y

>>> subxy()
0
>>> subxy(10)
10
>>> subxy(10, 5)
5

Even optional arguments can be called with ‘named parameters’ style:

>>> subxy(y=10)        # x is assumed to be 0
-10

Note

Do not confuse named parameters with optional arguments. The y=0 in the above function definition declares an optional argument with a default value. The y=10 in the function call passes a named parameter.

1.12.4   Variable number of arguments

It is possible to support a variable number of arguments:

def p(*args):
     print(args)    # <== args is a tuple of all var args!

>>> p(10, 20, 30)
(10, 20, 30)

This is useful when you want to be flexible in accepting any number of arguments. Example:

# Following call may push any number of specified values into your stack.
>>> push_into_my_stack(10, 20, 30, 40)

1.12.5   Keyword Arguments

You can have keyword arguments. This allows passing arbitrary number of key, value pairs in named-parameter style:

def p(**kwargs):
     print(kwargs)        # <== kwargs is a dictionary!

>>> p(x=10, y=20, z=30)
{'y': 20, 'x': 10, 'z': 30}

Another alternative way of achieving the above would be:

def p(options={}):
   print(options)

>>> p({'x':10, 'y':20, 'z':30})
{'y': 20, 'x': 10, 'z': 30}

However, as you can see, calling the function with dictionary argument is clumsy and not as elegant as the named parameters.
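When the values already live in a dictionary or a list, the two styles can be bridged at the call site with the ** and * unpacking operators. A short sketch, assuming Python 3; the functions here return their arguments instead of printing them so the result is easy to inspect:

```python
def p(**kwargs):
    return kwargs                # kwargs is an ordinary dict

options = {'x': 10, 'y': 20}

direct = p(x=10, y=20)           # named-parameter style
unpacked = p(**options)          # ** spreads the dict into keywords

def q(*args):
    return args                  # args is a tuple

values = [1, 2, 3]
print(direct == unpacked)        # True
print(q(*values))                # (1, 2, 3)
```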

1.12.6   Combining Argument Styles

You can combine all the styles mentioned earlier like below. Some basic rules apply (as you may expect) to avoid ambiguity during function call:

  • All optional parameters should appear after mandatory parameters
  • Variable number of arguments *args should appear after any optional args.
  • The keyword args **kwargs should appear last, if present.

def p(msg, kind='INFO', *args, **kwargs):
    print(msg)
    print(kind)
    print(args)
    print(kwargs)

>>> p('Job Incomplete.', 'WARNING', 'Job100', 'Job200', batch=1, dest='remote')
Job Incomplete.
WARNING
('Job100', 'Job200')
{'dest': 'remote', 'batch': 1}

1.13   Scoping and Typing

1.13.1   Static Scoping

Python mostly uses static (lexical) scoping: the variables referenced from a function are looked up in the module in which the function is defined, not in the caller’s module context. This enables robust modular design.
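A small sketch of what static scoping means in practice, shown within a single module for brevity (Python 3 assumed, all names invented for the example). The name level inside show_level() is resolved where the function is defined, not where it is called:

```python
level = 'module'

def show_level():
    return level                 # resolved where show_level is defined

def caller():
    level = 'caller'             # local to caller; invisible to show_level
    return show_level()

print(caller())                  # module
```

Under dynamic scoping the call would have returned 'caller' instead.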

1.13.2   Dynamic and Strict Typing

Python is dynamically and strictly typed. It is dynamic because you create a variable just by assigning something to it, and you can rebind the same name to an object of a different type at any time:

age = 90                 # No need to predeclare the type.
age = 'it is 90 years'   # Reassigned to another object of another type!

It is strictly typed because every object behaves strictly according to its type, with no implicit conversion between unrelated types, and because you cannot refer to an undefined variable:

#!python
age = 50
age = '40'         # Reassignment of different type OK.
age =  age - 10    # Error! Type mismatch: Expected numeric, found string

This is unlike C. In C, variables are statically and strictly typed. In PHP, variables are dynamically and loosely typed:

#!php
$age = 50;
$age = '40';
$age = $age - 10;     // Perfectly fine. '40' becomes int 40.
echo $age;
// Prints 30

Also note that Python variables cannot be referenced before assignment; you will get a NameError. In PHP, such an access generates a warning but usually yields a default value of empty string. In that sense, Python is ‘more strictly’ typed.

Language      Static/Dynamic typing   Strict/Loose typing
C/C++/Java    Static                  Strict
Python        Dynamic                 Strict
PHP/Perl      Dynamic                 Loose

1.13.3   Accessing Global vs Local Variables

Module-level variables are referred to as global variables. A function definition introduces a new level of scoping, and so does a class declaration. Namespaces in Python are currently implemented using dictionaries, but that is an implementation detail.

The available namespaces can be summarised as below:

Name Space: Comments
Innermost: The current scope in the innermost function. print(locals()) prints the local names. This scope is searched first for read/write.
Enclosing functions: If the current scope is an inner function, each enclosing function has a separate namespace. These are searched next, from the innermost outwards. The variables in these scopes are called non-local. You can read non-local variables, but you cannot rebind them in Python 2.x; if the object is mutable, you can still modify it through the same reference. In Python 3 you can rebind non-local names by declaring them with the nonlocal statement.
Current module: All module-level names are searched next. These variables are referred to as globals.
Built-in names: All built-in names live in the __builtin__ module (named builtins in Python 3), which is searched last.

Python’s rules for accessing global variables from inside a function are not obvious, and often confuse Python beginners:

gvar = 10

# If you are only reading global variable, no need to declare them as global
def my_func():
   print('gvar is %d ' % gvar)       #  <====   Prints: gvar is 10     OK!

However, the moment you assign some value to a variable anywhere in the function, that variable becomes a local variable unless you declare it otherwise:

gvar = 10

def my_func():
   print('gvar is %d ' % gvar)
   gvar = 20

>>> my_func()
    UnboundLocalError: local variable 'gvar' referenced before assignment

To fix this error, declare gvar as global:

gvar = 10

def my_func():
   global gvar
   print('gvar is %d ' % gvar)
   gvar = 20

People usually expect global variables to be either always available inside a function, or never available unless explicitly declared. The rule ‘a global variable is available for reading, but not for writing’ is not obvious and can be confusing to Python beginners. However, once you understand this rule, there is no more confusion.

1.14   Modules And Packages

A Python program is organized into a collection of modules. Each module is nothing but a Python program file with a .py suffix (or a .pyc file, or a .so shared object file). In addition, a directory may be used to keep a collection of modules together as a package. The import statement is used to import a module before accessing the variables available in that module.

1.14.1   Various import forms

The import statement takes any one of these forms:

import                 mod1              # Lets you access mod1.f()
import                 mod1 as m         # Lets you access m.f() instead of mod1.f()
from mod1      import  my_var            # Lets you access my_var directly.
from mod1      import  v1, v2            # Lets you access v1, v2 directly.
from mod1      import  my_var as v       # Lets you access my_var under the alias v.
from mod1      import  *                 # Imports all variables in mod1. Not recommended.

import                 pkg1.mod1         # Lets you access pkg1.mod1.f()
import                 pkg1.pkg2.mod1    # Lets you access pkg1.pkg2.mod1.f()
from pkg1.mod1 import  my_var            # You can access my_var directly.

If you use a statement of the form import X.Y, then X must be a package and Y may be a subpackage or a module. X cannot be a plain module and Y cannot be just an object inside a module. In an import statement, the dots are reserved for packages:

import  mod1.my_var          # <==== This is not allowed!
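The rule can be tried out with the standard library, since mod1 and pkg1 above are placeholders. In this sketch (Python 3 assumed) json is a package, json.decoder is a module inside it, and json.loads is a function:

```python
import json.decoder              # json is a package, decoder is a module in it
from json import loads           # an object must be imported with "from"

try:
    import json.loads            # loads is a function, not a module
except ImportError as err:
    failure = err

print(json.decoder.JSONDecoder.__name__)
print(loads('[1, 2]'))
print(type(failure).__name__)
```

The failed statement raises a subclass of ImportError at run time; it is not a syntax error.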

1.14.2   Controlling Module Search Path

The sys module contains the path object, which holds the list of module search directories. To examine its value, we must import the sys module first:

>>> import sys
>>> from pprint import pprint
>>> pprint(sys.path)

 ['',
  '/home/user/myenv/bin',
  '/home/user/myenv/local/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg',
  '/home/user/myenv/local/lib/python2.7/site-packages/pip-1.1-py2.7.egg',
  '/home/user/myenv/lib/python2.7',
  '/home/user/myenv/lib/python2.7/plat-linux2',
  '/home/user/myenv/lib/python2.7/lib-tk',
  '/home/user/myenv/lib/python2.7/lib-old',
  '/home/user/myenv/lib/python2.7/lib-dynload',
  '/usr/lib/python2.7',
  '/usr/lib/python2.7/plat-linux2',
  '/usr/lib/python2.7/lib-tk',
  '/home/user/myenv/local/lib/python2.7/site-packages',
  '/home/user/myenv/local/lib/python2.7/site-packages/IPython/extensions']

The module search path (i.e. sys.path) includes certain platform dependent default directories. In addition, you can control the search path by one of the following methods:

  • Set the PYTHONPATH environment variable to include additional directories. They are placed in the module search path ahead of the default directories.

  • The program can dynamically modify the module search path. It can do the following first:

    >>> import sys
    >>> sys.path.insert(0, '/path/to/my/module')
    
  • If you are running virtualenv, then you can choose to place your module in the local site-packages directory without affecting other installations.

1.14.3   The package directory layout

A package is simply a directory containing a file named __init__.py. The file may initialize the package or may just be empty. Packages are organized like this:

pkg1                             # Top level package
    __init__.py                  # Initializes the top level package
    pkg2                         # pkg2 is under pkg1.
       __init__.py               # Initializes pkg2.
       mod1.py                   # This is module-1 in package-2.
       mod2.py                   # This is module-2 in package-2.

1.14.4   Restricting exported variables

After importing a module, you can access all global variables in that module through the module name. However, there are a couple of mechanisms you can use to restrict what a from module import * statement exports:

  • All variables starting with an underscore (_) are excluded from the export list.

  • A module can define a global variable __all__ like below, so that only the specified variables are exported. All other global variables will not be exported:

    __all__ = ['mysym1', 'mysym2', 'mysym3' ]
    

Similarly, when you import a package, all global variables in __init__.py file or the restricted symbols defined by __all__ are available for access.

See http://docs.python.org/ref/import.html for more information.

You can also import from the future! A future import lets you start using features that will become standard in a later release, which helps with gradual migration:

>>> from __future__ import print_function

Above statement enables the use of print() function in Python 2.7 instead of print statement. In Python 3, print() is available only as function. A future statement is special in the sense it is recognized and treated at compile time.

1.14.5   Relative Vs absolute imports

In Python 2.7, imports are relative by default, i.e. if a module imports another module, Python searches the same package first before searching the other sys.path directories. This carries the danger of hiding system modules when someone creates a local module with the same name as a system module.

X.py:

import  Y      # Looks into current pkg first in Python2.7, not in Python3

In Python 3, all imports are absolute by default. So, if module X in some package wants to import module Y from the same package, the following will work:

from . import Y   # Looks into current pkg first in both Python2 and Python 3

Note: The empty string (‘’) in sys.path stands for the current directory (same as ‘.’), which is included by default in the interactive shell.

1.14.6   Circular Imports

If module X uses functions defined in module Y and vice versa, then following case of mutual import is possible:

# mod_x.py
import mod_y    # From module X

# mod_y.py
import mod_x    # From module Y

In general, it is good to avoid designing modules which depend on each other – it is often a sign of poor design. However, there are cases where it is perfectly alright to do so. Python does necessary checks to prevent any infinite recursion as a result of mutual imports – either direct or indirect.

If two modules access each other’s objects at the global level initialization, try delaying the import as late as possible.

For example, let us consider the following files:

# mod_x.py

def f_x():
    mod_y.some_func()

def g_x(): pass

import mod_y                  # import of mod_y is delayed as much as possible.
mod_y.some_func()

The mod_y.py file is given below:

# mod_y.py

def some_func(): pass

import mod_x                  # import of mod_x is delayed as much as possible.
mod_x.g_x()

The delaying of import as mentioned above just works fine.

1.14.7   Importable, executable module

You can make a module importable and at the same time executable by including some code conditionally like below:

if __name__ == '__main__':
  import sys
  sys.exit(main(sys.argv))    # or run some unit tests.

1.15   Additional Datatypes

Apart from built-in types, there are many additional datatypes defined by the Python standard library. We will take a closer look at them below.

1.15.1   Collections Container Types

Type Comments
namedtuple

A good alternative to the structure type in C:

>>> import collections
>>> Point = collections.namedtuple('Point', 'x y')
>>> p = Point(x=10, y=20)
>>> p
    Point(x=10, y=20)
>>> p.y
    20
deque It is double ended queue, pronounced as deck. It is like built-in list type, but optimized for insert/delete operations at both front and back. Insertion at front of built-in list has very poor performance compared to deque.
Counter

A Counter is a dict subclass for counting hashable objects:

>>> from collections import Counter
>>> fruits = ['apple','apple','orange','grapes','grapes','apple']
>>> c = Counter(fruits)
>>> c
    Counter({'apple': 3, 'grapes': 2, 'orange': 1})
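OrderedDict It is like dict, but it remembers the order in which the entries were inserted. To get a particular ordering, insert the entries in sorted order, for example by building it from the output of sorted().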
defaultdict

It is a subclass of the built-in dict with one difference: if you access a missing key, it does not raise KeyError, but provides a ‘default’ value which you can choose:

>>> from collections import defaultdict
>>> grades = [('Ram', 'B'), ('John', 'A'), ('Jack', 'B')]
>>> d = defaultdict(list)           # Default is empty list.
>>> for name, rating in grades:
...    d[rating].append(name)
>>> d.items()
    [('A', ['John']), ('B', ['Ram', 'Jack'])]
collections ABC

Defines various collections abstract base classes including Container, Iterable, Iterator, etc. These are useful to check if an instance implements specific interface, for example:

isinstance(myvar, collections.Sequence)      # collections.abc.Sequence in Python 3

Also useful to define custom collections where you can leverage mixin methods implemented in the ABC classes.

For more information about other datatypes defined by Python Standard Library, See http://docs.python.org/2/library/datatypes.html
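The table above has no example for deque and OrderedDict, so here is a brief sketch of both, assuming Python 3. The variable names and sample data are invented for the illustration:

```python
from collections import deque, OrderedDict

queue = deque([2, 3])
queue.appendleft(1)              # cheap insert at the front
queue.append(4)                  # cheap insert at the back
first = queue.popleft()          # cheap removal from the front

scores = OrderedDict()
scores['zoe'] = 3                # entries keep their insertion order
scores['adam'] = 1
by_name = OrderedDict(sorted(scores.items()))   # rebuild in sorted key order

print(first, list(queue))        # 1 [2, 3, 4]
print(list(scores))              # ['zoe', 'adam']
print(list(by_name))             # ['adam', 'zoe']
```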

1.15.2   Iterator

You can iterate over any container objects such as list, set, etc. A container object may or may not be a sequence. A set is a container but not a sequence.

Iterator is an object which helps with efficient iteration over any container. It is somewhat similar to C language’s pointer pointing to an array element – in the sense that it consumes minimal resources and just remembers the position in the container.

How is this concept helpful? Once a container follows the iterator protocol, it becomes an iterable object. For example:

for e in my_obj:
   print('e is %r' % e)

As long as the my_obj follows iterator protocol, above will work. This essentially means the following:

  • Container must define __iter__() which returns a new instance of iterator.
  • The iterator must define __next__() (named next() in Python 2) which returns the next value.
  • In addition, the iterator also must define __iter__() which returns itself. This helps the iterator to mimic the container only in the context of iterating.

Note the following methods:

container.__iter__() : Return a new iterator object.

        >>> x = [10, 20]
        >>> t1 = x.__iter__()
        >>> t2 = x.__iter__()      # or: t2 = iter(x)
        >>> t1 is t2               # They must be 2 separate instances.
            False

iterator.__next__() : Should return next value.
                      On reaching end, should raise StopIteration Exception.

        >>> next(t1)               # Works in both versions; t1.next() is Python 2 only.
            10
        >>> next(t1)
            20
        >>> next(t1)
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
              StopIteration

 iterator.__iter__() : Should return itself.

        >>> t3 = t1.__iter__()
        >>> t3 is t1
            True

So, a container is iterable and can have multiple iterators pointing to it. The idea is that container does not preserve the state of iteration. If you want to iterate, get a new instance of iterator using the container, then do whatever you want with the iterator.

The iterator protocol is cool, because we can use for loop like below irrespective of whether obj is container object or iterator:

for e in obj:
  do_some_thing(e)

It calls __iter__() and __next__() underneath for you and stops the iteration once it gets the StopIteration exception. You can easily define your own container classes as long as you follow this protocol.
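The three rules of the protocol can be put together in a small pair of classes. This is a sketch under the assumption of Python 3 (where the method is spelled __next__); Countdown and CountdownIterator are hypothetical names:

```python
class Countdown:
    """Container: hands out a fresh iterator on every __iter__() call."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers the position and returns itself from __iter__()."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
print(list(c))                   # [3, 2, 1]
print(list(c))                   # [3, 2, 1] again: the container keeps no state
it = iter(c)
print(next(it), list(it))        # 3 [2, 1]
```

Keeping the position in a separate iterator object is what allows several loops over the same container to run independently.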

Note

The itertools module provides powerful functions such as islice, cycle, combinations and permutations (plus imap and izip in Python 2, where the built-in map and zip still return lists). They all return iterators, which enables efficient traversal of iterables by processing one item at a time.
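A few of these functions in action, as a sketch assuming Python 3. Note that count and cycle never end on their own, so they are always consumed through islice or next here:

```python
import itertools

evens = itertools.count(0, 2)                 # 0, 2, 4, ... without end
first_three = list(itertools.islice(evens, 3))

colours = itertools.cycle(['red', 'green'])   # repeats forever
four = [next(colours) for _ in range(4)]

pairs = list(itertools.combinations('abc', 2))
orders = list(itertools.permutations([1, 2]))

print(first_three)               # [0, 2, 4]
print(four)                      # ['red', 'green', 'red', 'green']
print(pairs)                     # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(orders)                    # [(1, 2), (2, 1)]
```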

1.16   Classes

1.16.1   Overview

Python’s classes are somewhat similar to those of C++ and Modula-3. If you are already familiar with C++: in Python, data member variables are always public and all methods are virtual.

The method functions must be declared explicitly with the first argument representing the object; the current object is named self by convention. This is similar to the this object of C++ or Java:

class Vehicle:
    """Generic Vehicle Class"""
    x = y = 0                        # Class data member variables
    def __init__(self, year):
        self.year = year

    def move(self, x, y):
        self.x = x
        self.y = y
        print('Vehicle moved to (%d, %d)' % (x, y))

To create a class instance, we use function call notation; there is no need for the new operator used in C++ or Java:

>>> v = Vehicle(2013)
>>> v.move(10,20)
Vehicle moved to (10, 20)

In above example, x, y are class variables which are inherited by the instances. In addition, you can create new instance variables anytime:

v.model = 'Honda'         # Created on assignment
...
del v.model               # The instance variable is undefined now.

Note that v.move(10, 20) is same as Vehicle.move(v, 10, 20).
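How class variables and instance variables interact can be seen with a trimmed-down Vehicle. This sketch assumes Python 3 and leaves out __init__ and the print call to keep the focus on attribute lookup:

```python
class Vehicle:
    x = y = 0                    # class variables, shared until shadowed

    def move(self, x, y):
        self.x = x               # creates instance variables on first call
        self.y = y

a = Vehicle()
b = Vehicle()
a.move(10, 20)                   # same as Vehicle.move(a, 10, 20)

print(a.x, b.x, Vehicle.x)       # 10 0 0
Vehicle.x = 5                    # changes the class variable
print(a.x, b.x)                  # 10 5
del a.x                          # removes the instance variable only
print(a.x)                       # 5
```

Reading an attribute falls back to the class when the instance has no attribute of that name; assigning through the instance never changes the class variable.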

1.16.2   Method Vs Static Method Vs Class Method

Method

A method combines the function object, class object and class instance.

In the above example, v.move is a bound method:

>>> v.move
    <bound method Vehicle.move of <__main__.Vehicle instance at 0x26f1cb0>>

However, Vehicle.move is an unbound method in Python 2– i.e. unbound to any instance. (Python 3 has no unbound methods; there, Vehicle.move is simply a function.)

>>> Vehicle.move
    <unbound method Vehicle.move>

The underlying function object of the method is available for examination:

>>> v.move.__func__
    <function move at 0x2746140>

Static Method

The static method concept is the same as in C++/Java. It is independent of the instance. This may be useful to group functions which are logically related:

class Test(object):

  @staticmethod                #  <=== Note the decorator!
  def setup_tests():
      print("Setup done for all tests")

>>> t = Test()
>>> t.setup_tests()
    Setup done for all tests

The staticmethod is a built-in which is used as a function decorator, written as @staticmethod just before the function definition. The effect of the decorator is the same as the following:

class Test(object):

  def setup_tests():
      print("Setup done for all tests")
  setup_tests = staticmethod(setup_tests)   # <== Same as using the decorator.

Class Method

In addition, there is the class method, which is not found in C++/Java. It receives the class (not an instance) as its first argument and is concerned with class-level information only. A classic use case is to provide alternate constructors:

class Test(object):

  def __init__(self):
      print("Common initialization steps done.")

  @classmethod
  def remote_test(cls):
      c = cls()
      print("Additional initialization for remote tests done.")
      return c

>>> t = Test()
    Common initialization steps done.

>>> t = Test.remote_test()
    Common initialization steps done.
    Additional initialization for remote tests done.

One of the ways a singleton may be implemented is by defining a getinstance class method which returns a single instance.
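A minimal sketch of that idea, assuming a hypothetical Config class (the class and attribute names are only for illustration):

```python
class Config(object):
    """Hypothetical class used only for illustration."""
    _instance = None

    @classmethod
    def getinstance(cls):
        if cls._instance is None:
            cls._instance = cls()     # Created lazily on first request.
        return cls._instance

a = Config.getinstance()
b = Config.getinstance()
print(a is b)
```

Note that this approach relies on convention: nothing stops a caller from invoking Config() directly and getting a second instance.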

1.16.3   New-style vs old-style classes

There is a built-in object called object which every top level new-style class must derive from. If the class extends any built-in type (such as list), then it already indirectly derives from object. The top level classes which do not derive from object are called old-style classes.

The new-style classes were introduced in Python 2.2 in order to unify built-in types and classes. In terms of declaration, they look like below:

class NewStyleClass1(object):
    pass

class NewStyleClass2(Othernewstyleclass):
    pass

class OldStyleClass:
    pass

In Python 3, all classes are new-style, i.e. there is no need to explicitly derive from object– they are all implicitly derived from object.

Only new-style classes can use new features like descriptors.

For more information, see http://www.python.org/doc/newstyle/

1.16.4   Constructors

The __init__()

The __init__() method defined in the class is the constructor. Note that the base class constructor is not automatically called– it must be called explicitly:

class  Parent(object):
   def __init__(self):
       print('Parent initialized')

class  Child(Parent):
   def __init__(self):
       print('Child initialized')

c = Child()  # Prints just: Child initialized

To ensure base class constructor is called, do the following:

class Child(Parent):
    def __init__(self):
       Parent.__init__(self)        # <== Works with even oldstyle classes.
       print('Child initialized')

With new style classes, you can use super() function instead of explicitly referring to the base class:

class Child(Parent):
    def __init__(self):
       super(Child, self).__init__()  # <== super available only with newstyle classes
       print('Child initialized')

Note the parameters: super(Child, self)– the Child class argument is not redundant given self, because the self object could in fact be an instance of a GrandChild class.

Python 3 supports the shorter form super() without having to explicitly specify the class and object. The missing parameters are automatically detected from the enclosing class and the current stack frame and used accordingly:

class Child(Parent):
    def __init__(self):
       super().__init__()          # <== short form super() being used.
       print('Child initialized')

The __new__()

In addition, the __new__() method is called to create a new instance first before the __init__() function initializes the value. User defined __new__() method is rarely used:

class A(object):

    def __new__(cls, x):
       print('A __new__() called')
       return  object.__new__(cls)

    def __init__(self, x):
       self.x = x
       print('A __init__() called')

>>> a = A(10)
    A __new__() called
    A __init__() called

This is useful only when you want to control new instance creation. For examples, see the Singleton and Extending immutable class sections under Additional Notes.

1.16.5   Destructor

You can do the cleanup of the instance from the destructor method __del__():

class Test(object):

    def __init__(self, outfilename):
        self.fout = open(outfilename, 'w')

    def __del__(self):
        print('Deleting Test object ...')
        self.fout.close()

>>> t = Test('/tmp/mytest.out')
>>> del t                            # Explicit Delete!
    Deleting Test object ...
>>> t                                # t is undefined after delete!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 't' is not defined
>>> t = Test('/tmp/mytest.out')
>>> t = 10                       # Destructor called by garbage collection!

The chaining of destructors and use of super() method is similar to that of constructors. However, you destruct the child resources first before calling the destructor of parent objects.
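The chaining can be sketched as below. The Parent and Child classes and the log list are made up for illustration, and the immediate call to __del__() on del assumes CPython's reference counting; other implementations may run the destructor later:

```python
log = []

class Parent(object):
    def __del__(self):
        log.append('Parent cleanup')

class Child(Parent):
    def __del__(self):
        log.append('Child cleanup')          # Release child resources first...
        super(Child, self).__del__()         # ...then let the parent clean up.

c = Child()
del c           # In CPython the last reference is gone, so __del__ runs now.
print(log)
```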

1.16.6   Multiple Inheritance

You can extend multiple classes if so desired – i.e. Multiple inheritance is supported:

>>> class  Car(Vehicle, Product):
        no_of_wheels = 4

>>> Car.mro()                  # Displays method-resolution-order
    [__main__.Car, __main__.Vehicle, __main__.Product, builtins.object]

Multiple inheritance should be avoided if possible as it further complicates the use of super(). The super() call really invokes the method of the next class in the MRO chain, which is not necessarily the direct base class. This may lead to surprises. Either super() should be used in all base classes, or it should not be used at all, favoring explicit invocation.
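The surprise can be seen in a small diamond-shaped hierarchy. The class names here are hypothetical and chosen only for illustration:

```python
class Base(object):
    def __init__(self):
        print('Base')

class Left(Base):
    def __init__(self):
        print('Left')
        super(Left, self).__init__()    # Next in the MRO, not necessarily Base.

class Right(Base):
    def __init__(self):
        print('Right')
        super(Right, self).__init__()

class Both(Left, Right):
    def __init__(self):
        print('Both')
        super(Both, self).__init__()

names = [cls.__name__ for cls in Both.mro()]
print(names)
b = Both()      # Prints Both, Left, Right, Base: Left's super() reaches Right.
```

When a Both instance is created, the super() call inside Left invokes Right.__init__(), even though Right is not a base class of Left.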

1.16.7   Everything is an Object

In Python, everything is an object. Even the fundamental built-in types are objects including int, float, etc. This is different from Java and C++ where primitive datatypes like int, float are not objects.

An int variable is bound to an integer object of type ‘int’. The most fundamental thing in the type hierarchy is ‘type’. Everything is directly or indirectly an instance of ‘type’. The ‘type’ itself is an object of type ‘type’. The buck stops there:

>>> a=10

>>> type(a)
    int

>>> type(int)
    type

>>> type(type)
    type

>>> type(list)
    type

Any class is also an object of some type – which is a metaclass. The ‘type’ type is a metaclass since an instance of ‘type’ is another type. You can also define your own metaclass which is another topic for discussion.
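As a small illustration of type acting as a metaclass, the sketch below builds a class by calling type() directly. The names Plain and Dynamic and the answer attribute are made up for this example:

```python
class Plain(object):
    pass

# type(name, bases, namespace) builds a class at run time,
# just as the class statement above does.
Dynamic = type('Dynamic', (object,), {'answer': 42})

print(type(Plain) is type, type(Dynamic) is type)
print(isinstance(Plain, type), isinstance(type, object))
print(Dynamic().answer)
```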

1.17   Exceptions

Certain operations or built-in functions may generate exceptions. You can generate exceptions or catch specific exceptions for error handling:

>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

>>> f = open('/tmp/non-existing-file', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/non-existing-file'

>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

1.17.1   Catching Exceptions

You can use a try-except block to catch the exceptions generated in the try block:

try:
  something()
except ExceptionClass as exception_obj:
  process_exception(exception_obj)
  raise  exception_obj              # You can re-raise the exception like this.

The legacy syntax alternative for except is as follows:

except ExceptionClass, exception_obj:      # Works with Python 2, not with 3.
    ...

Since the above syntax is confusing, it has been removed in Python 3 in favour of the below form:

except ExceptionClass as exception_obj:    # Works with Python 2.6+ and 3+
                                           # Python 3 allows only this syntax.

You can catch multiple exceptions as given below. The following code works with Python 3:

>>> while True:
      s = input('Enter expr > ')      # Use raw_input() in Python 2 instead.
      try:
          eval(s)
      except  ZeroDivisionError as err:              #    1/0
          print('Got ZeroDivisionError:', err)
      except  ValueError as err:                     #    int('something')
          print('Got ValueError:', err)
      except  NameError  as err:                     #    undefined_var
          print('Got NameError:', err)
      except  Exception as err:                      #   e.g. 'some' + 3 (TypeError)
          print('Got Exception:', err)

Enter expr > x
Got NameError: name 'x' is not defined

Enter expr > 1/0
Got ZeroDivisionError: division by zero

Enter expr > int('some')
Got ValueError: invalid literal for int() with base 10: 'some'

Enter expr > 'some' + 3
Got Exception: Can't convert 'int' object to str implicitly

Enter expr > a++
Got Exception: unexpected EOF while parsing (<string>, line 1)

1.17.2   Exception Hierarchy

To examine the exception hierarchy, use the mro() method:

>>> NameError.mro()
    [builtins.NameError,
     builtins.Exception,
     builtins.BaseException,
     builtins.object]

>>> KeyboardInterrupt.mro()
    [builtins.KeyboardInterrupt,
     builtins.BaseException,
     builtins.object]

The order of except clause is important. The specific exceptions should appear first and more general base exceptions should appear later in that order. If the except Exception as err: had appeared first, it would catch other exceptions such as NameError, ValueError since those exceptions are derived from Exception. It is in general a dangerous practice to catch basic exception classes like Exception unless you really mean to– because this can hide the real problem if you are not re-raising the exception that you really didn’t intend to handle.
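The effect of putting the general handler first can be sketched as below. The classify() helper is hypothetical and exists only to show which handler runs:

```python
def classify(expr):
    """Return the name of the handler that caught the error, if any."""
    try:
        eval(expr)
    except Exception:           # Too general, and listed first...
        return 'Exception'
    except NameError:           # ...so this handler can never run.
        return 'NameError'
    return 'no error'

print(classify('undefined_var'))
print(issubclass(NameError, Exception))
print(issubclass(KeyboardInterrupt, Exception))
```

The NameError is swallowed by the first clause. KeyboardInterrupt, on the other hand, derives directly from BaseException, so an except Exception clause does not catch it.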

1.17.3   User Defined Exception

You can define your own exceptions for finer level of error handling:

>>> class MyAppError(Exception):
        def __init__(self, reason):
            self.reason = reason
...
...
if  value > max_value :
     raise MyAppError('Got a value which is too large.')
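A runnable variation of the fragment above might look like this. The check() helper and its max_value default are assumptions added for illustration, and the call to the base class __init__() is one reasonable way to keep str(err) meaningful:

```python
class MyAppError(Exception):
    def __init__(self, reason):
        super(MyAppError, self).__init__(reason)   # Keeps str(err) useful.
        self.reason = reason

def check(value, max_value=100):    # Hypothetical helper and limit.
    if value > max_value:
        raise MyAppError('Got a value which is too large.')
    return value

try:
    check(500)
except MyAppError as err:
    caught = err.reason
    print('Handled:', err)
```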

1.17.4   The finally clause

The try-finally clause is used to make sure the cleanup action specified in finally block is always executed in any case:

>>> try:
      some_function()
    except MyAppError as err:
      ...
    finally:
      print('closing any open files ...')
      ...

The code in the finally block is always executed, irrespective of whether an exception was generated or whether it was handled. Even if there is a return statement in the try block, the finally clause is still executed. If the exception raised in the try block was not handled, it is re-raised after the finally block is executed.
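Both behaviours can be observed in a small sketch. The function names and the events list are made up for illustration:

```python
events = []

def read_value():
    try:
        events.append('try')
        return 'result'             # finally still runs before returning.
    finally:
        events.append('finally')

def fail():
    try:
        raise ValueError('not handled here')
    finally:
        events.append('cleanup')    # Runs, then the ValueError propagates.

value = read_value()
try:
    fail()
except ValueError:
    events.append('caller saw ValueError')

print(value, events)
```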

In versions prior to 2.5, try-except-finally was not supported– the try..except had to be nested inside try..finally to achieve the same result:

>>> try:
       try:
          some_function()
       except MyAppError as err:
          ...
    finally:
      print('closing any open files ...')
      ...

The finally can also appear in try-except-else-finally form where the else clause is meant to be executed if there were no exceptions generated in the try-block. However it is best to avoid the else clause, since the naming of this clause is often more confusing than helpful.
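For reference, the full try-except-else-finally form behaves as in this sketch, where parse() is a hypothetical helper that records which clauses ran:

```python
def parse(text):
    steps = []
    try:
        number = int(text)
    except ValueError:
        steps.append('except')
        number = None
    else:
        steps.append('else')        # Only when int() raised nothing.
    finally:
        steps.append('finally')     # Always.
    return number, steps

print(parse('42'))
print(parse('forty-two'))
```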

1.18   Functional Programming

Python supports functional programming style to a large extent. Specifically, functions are first class objects– they can be passed as arguments to other functions and manipulated like any other variable.

However, it is worth noting that Python is not a pure functional style language such as Lisp or Haskell. Pure functional languages tend to avoid state and mutable data as much as possible and try to arrive at the solution by applying a composition of functions on input data, as against the imperative style.

1.18.1   filter, map, reduce functions

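The built-in functions filter, map and reduce are useful functional programming tools that can be applied on collection types such as list and set. The following example is given for Python 2, where filter and map return lists; in Python 3 they return iterators, and reduce has moved to the functools module: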

>>> def my_filter(x):
        return (x % 3) != 0     # Use it to filter out 3's multiples.

>>> filter(my_filter,  range(20) )
    [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

>>> def mul2(x): return 2*x
>>> map(mul2, range(10))                    # Applies func on the sequence
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

>>> def sumxy(x, y): return x + y           # Function can take many args.
>>> map(sumxy, range(5), range(100,105))    # Pass as many lists as func args.
    [100, 102, 104, 106, 108]

>>> map(None, range(5), range(100,105))     # None acts like identity function.
    [(0, 100), (1, 101), (2, 102), (3, 103), (4, 104)]

>>> reduce(sumxy, range(5))                 # Does cumulative sum.
    10
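Under Python 3 the same examples need small changes. The sketch below is one way to write them: it wraps the lazy results in list() and uses zip() in place of map(None, ...), which Python 3 no longer accepts:

```python
from functools import reduce        # reduce is no longer a built-in.

def my_filter(x):
    return (x % 3) != 0

def sumxy(x, y):
    return x + y

not_multiples = list(filter(my_filter, range(20)))    # filter() is lazy now.
doubled = list(map(lambda x: 2 * x, range(10)))
pair_sums = list(map(sumxy, range(5), range(100, 105)))
pairs = list(zip(range(5), range(100, 105)))          # Replaces map(None, ...).
total = reduce(sumxy, range(5))

print(not_multiples, doubled, pair_sums, pairs, total)
```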

1.18.2   List comprehension

This feature is inspired by the Haskell language. List comprehension provides an easier way to construct lists without having to use map and filter. As in the previous example, to construct a list of integers excluding multiples of 3 in a specific range, we can do the following:

>>> [ i for i in range(20) if i % 3 != 0 ]
    [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

# You can also generate a list of pairs like below ...
>>> [ (i, 2*i) for i in range(6) if i % 3 != 0 ]
    [(1, 2), (2, 4), (4, 8), (5, 10)]

1.18.3   Dictionary comprehension

The dictionary comprehension is now available in Python 2.7+ and 3. It is just like list comprehension applied to dictionary. For example, to invert a mapping of a given dictionary, you can do:

>>> reverse_map = { val:key for key, val in my_map.items() }
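A self-contained sketch with sample data (the contents of my_map are made up for illustration):

```python
my_map = {'one': 1, 'two': 2, 'three': 3}     # Sample data, not from the text.

reverse_map = {val: key for key, val in my_map.items()}
squares = {n: n * n for n in range(5) if n % 2 == 0}

print(reverse_map)
print(squares)
```

Inverting a mapping this way only works cleanly when the values are hashable and unique; with duplicate values, later entries overwrite earlier ones.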

1.18.4   Anonymous Function Lambda

Lambda is used to create simple anonymous functions:

>>> f = lambda x: 2*x
>>> f(10)
    20
>>> map(f, range(4))
    [0, 2, 4, 6]

# lambda is mainly used for one-off use on-the-fly
>>> map(lambda x: 2*x, range(4))
    [0, 2, 4, 6]

A lambda function does not support multiple statements or an explicit return statement– it should just be used to calculate a simple expression. A lambda function also provides closure support, which essentially means that it remembers the variables of the lexically enclosing scope, which are available for reading inside the function. The closure is a very powerful mechanism which provides the ability to generate context specific dynamic functions.

A lambda is less powerful than the more general inner function, yet it is very useful since simple expression functions are so widely used.
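A minimal sketch of a lambda used as a closure, with a hypothetical make_multiplier() factory, followed by a typical one-off use as a sort key:

```python
def make_multiplier(factor):
    # The lambda reads 'factor' from the enclosing function's scope.
    return lambda x: factor * x

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(10), triple(10))

words = ['banana', 'fig', 'cherry']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
```

Each call to make_multiplier() produces a function that remembers its own factor.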

1.19   Abstract Base Classes

From Python’s documentation:

Abstract Base Classes (abbreviated ABCs) complement duck-typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy. Python comes with many builtin ABCs for data structures (in the collections module), numbers (in the numbers module), and streams (in the io module). You can create your own ABC with the abc module.

Python does not have interfaces as in Java. Abstract classes and interfaces are not the same thing, but there is significant overlap.

Consider:

class Vehicle():
   def __init__(self, model):
       self.model = model

class Car(Vehicle):
   def __init__(self, model):
       self.model = model

To define abstract method move(), you can do:

import abc

class Vehicle(object):
    __metaclass__ = abc.ABCMeta      # Python 2 syntax. In Python 3 write:
                                     #   class Vehicle(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def move(self, distance):
        """Implement your own move method. This is abstract method!"""
        return

>>> v = Vehicle()

TypeError: Can't instantiate abstract class Vehicle with abstract methods move

ABC is available only from version 2.6. There is also an independent optional module http://pypi.python.org/pypi/zope.interface which can be used for your contract programming style implementations.
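In Python 3 the __metaclass__ attribute is ignored, so the metaclass has to be given in the class header instead. A minimal Python 3 sketch of the same idea follows; the Car subclass and its return string are made up for illustration:

```python
import abc

class Vehicle(metaclass=abc.ABCMeta):        # Python 3 spelling.
    @abc.abstractmethod
    def move(self, distance):
        """Subclasses must implement this."""

class Car(Vehicle):
    def move(self, distance):
        return 'Car moved %d km' % distance

try:
    Vehicle()
    abstract_blocked = False
except TypeError:
    abstract_blocked = True      # Abstract classes cannot be instantiated.

print(abstract_blocked, Car().move(5))
```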

1.20   Internationalization

First, let us fix some basic terminologies that we will use for discussing internationalization support available in Python.

ASCII:
A basic 7-bit character set based on the English alphabet. It defines 128 characters, including some nonprintable characters. The ASCII code of a character is the numeric value assigned to that character. Implementations of this 7-bit character set consume 1 byte per character (with the 8th bit set to 0). So, if a string contains any byte whose value is > 127, it is not a valid ASCII string.
ISO-8859-1:
An 8-bit character set which defines 256 characters and is a superset of ASCII. The additional 128 characters include some western characters and graphical characters. The ISO-8859-1 encoding simply uses the sequence of bytes whose numeric values match the character codes. So, even a random garbage sequence of bytes is technically a valid ISO-8859-1 string.
ISO-8859-X:
Not all western and other languages were represented by ISO-8859-1 (aka Latin-1). A series of character sets ISO-8859-1, ISO-8859-2, etc. was defined to cover specific language(s). They are all incompatible with one another, because you can’t have everything in 1 byte.
Unicode:
This currently defines more than 100,000 characters and has room for more than 1 million. The first 256 code points are identical to ISO-8859-1. To store a unicode string in a file, we need an encoding mechanism, because the ‘single byte is 1 char’ idea no longer works.

A naive way of encoding unicode characters would be to say, ‘I will use 3 bytes per character’: which can essentially represent total 256*256*256 > 16 million different characters. However, this can be a real waste of time and space if most of the frequently used characters can be represented in single or 2 bytes. Hence different encoding mechanisms were invented for better optimization.

The most popular encoding for unicode is UTF-8. If a unicode string contains only ASCII characters, then UTF-8 encoding produces exactly the same bytes as ASCII. Other characters are encoded as multi-byte sequences. That is an oversimplification, but you get the idea. In the worst case, it may take four UTF-8 bytes to represent a single unicode character.

Another good characteristic of UTF-8 is that you won’t accidentally see an ASCII character in the encoded string which was not originally in the source string. i.e. if a non-ASCII unicode character is translated to 3 bytes of UTF-8, none of those bytes will look like ASCII (i.e. have a byte value < 128). This is by design. So, if you are searching for an ASCII word in a file which stores UTF-8 encoded unicode contents, you won’t get any false positives! This also means that your readline() function, which looks for the newline character, will work just fine with UTF-8 encoded contents.
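These properties can be checked with a short sketch. The sample strings are made up for illustration, and the code is written to run on both Python 2.7 and Python 3.3+:

```python
text = u'San Jos\u00e9 \u265a'          # e-acute and the king symbol
encoded = text.encode('utf-8')

# ASCII characters keep their single-byte values...
ascii_only = u'hello'.encode('utf-8') == b'hello'

# ...and every byte of a multi-byte sequence has its high bit set,
# so none of them can be mistaken for an ASCII character.
king_bytes = bytearray(u'\u265a'.encode('utf-8'))
all_high = all(b >= 0x80 for b in king_bytes)

print(ascii_only, len(king_bytes), all_high, len(text), len(encoded))
```

The 10-character string grows to 13 bytes: the e-acute takes 2 bytes and the king symbol takes 3.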

Following discussion applies to Python version 2.7. In Python 3, there are some changes which we will mention later.

By default, a string object of type str contains a sequence of raw bytes. It is up to you what you want to do with it.

unicode string is an abstract concept. Sequence of bytes is a concrete concept. You never write unicode string– you always write an (eg: utf-8) encoded unicode string. Similarly, you never read unicode string directly– you read bunch of bytes and decode them to create unicode string:

my_str.decode('utf-8')     --> my_unicode
my_unicode.encode('utf-8') --> my_str

[Figure: str --decode--> unicode; unicode --encode--> str]

Python supports unicode strings natively. Try this on your terminal:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> sys.stdout.isatty()
True
>>> u_king = u'\u265a \u2764 \u265b'
>>> print(u_king)
♚ ❤ ♛
>>> len(u_king)
5

[Figure: King Loves Queen Symbols: ♚ ('\u265a'), ❤ ('\u2764'), ♛ ('\u265b')]

Your terminal probably has been opened already in ‘UTF-8’ encoding (most popular). What this means is that, if you write unicode string into sys.stdout, then it will translate that to UTF-8 byte sequence and write it. If your terminal is capable of understanding this encoding, you will see nice ‘king loves queen’ symbols. Note that the string length is 5 which includes the 3 exotic symbols and 2 spaces.

Let us convert this unicode string to utf-8 encoded string, so that it can be safely written to a file:

>>> b_king = u_king.encode('utf-8')
>>> b_king
'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
>>> len(b_king)
11

As you can see, the ‘\u265a’ unicode king symbol was encoded into a 3-byte UTF-8 sequence. You can safely write this into a file in binary mode, read it back later and convert it to unicode using the same encoding:

>>> fout = open('out-bin', 'wb')
>>> fout.write(b_king)
>>> fout.close()
>>> fin = open('out-bin', 'rb')
>>> asc_in = fin.read()
>>> uni_in = asc_in.decode('utf-8')
>>> print(uni_in)
♚ ❤ ♛

You can automate this process of encoding and decoding if the input/output stream is aware of what you really want to do with the bytes. You can do this with codecs module:

>>> import codecs
>>> fin = codecs.open("out-bin", "r", "utf-8")
>>> s = fin.read()
>>> s
u'\u265a \u2764 \u265b'
>>> type(s)
unicode

You will notice a few interesting things:

  • The file is opened in ‘text’ mode (i.e. ‘r’ or ‘rt’ mode, not ‘rb’ mode.)
  • The fin.read() returned unicode object instead of str object.
  • The string was auto-converted to unicode on reading a file which has stored the unicode string in utf-8 encoding.

Similar translation happens if you open a file for writing as well– you can directly write the unicode string into output stream and it will auto-convert that to utf-8 encoded byte stream.

Tip

The codecs module can help you to automate encoding/decoding of unicode and other character sets between data streams.

Now let us take a look at the changes in string type in Python 3. The ‘unicode’ type has become the default string type, so the type name ‘unicode’ is gone in Python 3. The old Python 2 ‘str’ type is now called ‘bytes’ in Python 3.

[Figure: the Python 2 str type becomes the Python 3 bytes type; the Python 2 unicode type becomes the Python 3 default string type]

Some Python 3 examples dealing with strings:

>>> type('hello')
    <class 'str'>
>>> type(b'hello')
    <class 'bytes'>
>>> u'hello'    # Syntax error in Python 3.0-3.2; accepted again from Python 3.3.

In Python 3, you can also specify the encoding directly on file opening:

>>> fout = open('out-bin', 'wt', encoding='utf-8')
>>> u_raja =  '\u265a \u2764 \u265b'
>>> type(u_raja)
    builtins.str
>>> fout.write(u_raja)
    5
>>> fout.close()

Now open another terminal and examine file contents:

$ od -x out-bin
0000000 99e2 209a 9de2 20a4 99e2 009b
0000013

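As you can see, the od -x command displays the file as 2-byte hexadecimal words. On a little-endian machine the two bytes of each word appear swapped, so the first word 99e2 stands for the bytes 0xe2 0x99. The first 3 bytes are therefore 0xe2 0x99 0x9a, which is the UTF-8 encoding of the unicode king symbol \u265a. Now read the contents back: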

>>> fin = open('out-bin', 'rt', encoding='utf-8')
>>> raja_in = fin.read()
>>> type(raja_in)
    builtins.str
>>> print(raja_in)
    ♚ ❤ ♛

Let us directly encode the string to utf-8:

>>> b_str = raja_in.encode('utf-8')
>>> b_str
    b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
>>> type(b_str)
    builtins.bytes

Python allows you to embed unicode strings directly in your program. Reserved words are always ASCII, and identifiers are best kept in ASCII too (Python 2 requires it; Python 3 also accepts non-ASCII letters). If the source code contains non-ASCII characters, then explicitly mentioning the encoding is a good idea, like below:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
king1 = '\u265a \u2764 \u265b'
king2 = b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
king3 = '♚ ❤ ♛'
print(type(king1), king1)
print(type(king2), king2)
print(type(king3), king3)

The above program output prints:

<class 'str'> ♚ ❤ ♛
<class 'bytes'> b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
<class 'str'> ♚ ❤ ♛

Note that the only line which contains non-ASCII characters is the line which assigns to king3. The other statements are written fully in ASCII. Depending on the editor, the file may have been written out using UTF-8 (hopefully) or some other encoding. Python has to know which encoding was used to write the file so that it can decode it properly on reading. For example, if you replace UTF-8 with ASCII in the coding line, the above program will fail. Python 3 assumes UTF-8 as the default source encoding, so deleting the coding line above may still work, but it is always a good idea to specify it if your source code contains non-ASCII characters.

Following example illustrates how explicit encoding/decoding works in Python 3:

>>> s = 'San José'
>>> enc_s = s.encode('utf-8')
>>> type(enc_s)
    <class 'bytes'>
>>> enc_s
    b'San Jos\xc3\xa9'
>>> enc_s.decode('utf-8')
    'San José'

1.21   Things Unique to Python

1.21.1   Summary

There are a few things which are fairly unique to Python. They may not be unique among all languages which exist today, but they are fairly unusual among mainstream programming languages:

  • Using indentation for blocks. The idea was mainly inspired from ABC programming language.
  • The ‘else’ clause of for, while loops and try block. GvR has acknowledged the choice of the naming of ‘else’ was a bad idea as it may be confusing.
  • Scoping rules. By default, whether a variable referenced in a function is global or local is decided by whether it is (1) only read or (2) written to within the function. See Scoping section.
  • There is no goto.
  • lambda function: It is an anonymous function limited to a single expression. Multiple statements are not allowed. It is in no way more powerful than a nested named function. Still, it is widely used for its convenience despite its limitations.
  • Generators: Though the idea is inspired by prior existing languages, few mainstream languages had a similar feature when Python introduced it.
  • No automatic invocation of super class constructors.
  • It makes use of Magic Attributes of the form __attributename__ of the objects.
  • No new operator to instantiate class instance.

1.21.2   Magic Attributes

Python implementation adds various magic attributes of the form __attributename__ into the objects. They are used for special purposes. Following table includes some of those important ones.

Attribute Name     Comments
obj.__dict__       A dictionary of the object’s (in general writable) attributes.
inst.__class__     The class object of the instance.
class.__bases__    The tuple of base classes of a class object.
class.__name__     The name of the class or type.
class.__mro__      The tuple of classes (the class itself and its bases) considered during method resolution.
... etc ...
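The attributes in the table can be inspected directly. The Vehicle and Car classes in this sketch are cut-down stand-ins used only for illustration:

```python
class Vehicle(object):
    pass

class Car(Vehicle):
    def __init__(self, model):
        self.model = model

c = Car('Honda')
mro_names = [k.__name__ for k in Car.__mro__]

print(c.__dict__)                 # Instance attributes
print(c.__class__ is Car)
print(Car.__bases__ == (Vehicle,))
print(Car.__name__)
print(mro_names)
```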

1.22   Additional Notes

1.22.1   Deep Vs Shallow copies

By default assignment such as a = b makes both variables point to the same object. If you want to make a copy of the object, you can use copy module:

>>> import copy
>>> a = [1, [2, 3], 4]
>>> b = copy.deepcopy(a)
>>> a[1].append(5)
>>> a
    [1, [2, 3, 5], 4]
>>> b
    [1, [2, 3], 4]
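For contrast, a shallow copy made with copy.copy() duplicates only the outer list and keeps sharing the nested objects. A small sketch comparing the two:

```python
import copy

a = [1, [2, 3], 4]
shallow = copy.copy(a)        # New outer list, same inner list object.
deep = copy.deepcopy(a)       # Inner list is copied too.

a[1].append(5)                # Mutate the shared inner list.
a.append(6)                   # Mutate only the outer list of a.

print(a)
print(shallow)
print(deep)
```

The shallow copy sees the 5 appended to the shared inner list but not the 6 appended to the outer list; the deep copy sees neither.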

1.22.2   Linked List

The built-in lists are powerful enough for most cases. You can append or insert at an arbitrary position. However, inserting into a list is an O(N) operation. If that is a problem, consider using one of the optional packages found in PyPI, for example llist or blist.

1.22.3   Naming Conventions

  • Use lower_case_name for functions, methods, attributes

  • Use MyClassName for classes.

  • Avoid using camelCase

  • Module internal attributes: _mod_var

  • Class private attributes: __private_var.

    The class private variable form __private_var is internally translated to _classname__private_var to avoid name collisions with other classes in the inheritance hierarchy.
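The translation of class private names can be observed with a small sketch. The Account class and its attribute are hypothetical, chosen only for illustration:

```python
class Account(object):            # Hypothetical class for illustration.
    def __init__(self):
        self.__balance = 100      # Stored as _Account__balance.

acct = Account()
stored_names = sorted(acct.__dict__)
has_plain_name = hasattr(acct, '__balance')

print(stored_names)
print(acct._Account__balance)
print(has_plain_name)
```

The attribute is still reachable under its translated name, so this is a guard against accidental collisions rather than true privacy.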

1.22.4   Keywords

To find out Python’s list of keywords:

>>> import keyword

>>> keyword.kwlist
    ['and',   'as',      'assert',  'break',  'class', 'continue', 'def',
     'del',   'elif',    'else',    'except', 'exec',  'finally', 'for',
     'from',  'global',  'if',      'import', 'in',    'is',      'lambda',
     'not',   'or',      'pass',    'print',  'raise', 'return',  'try',
     'while', 'with',    'yield']

The first statement import keyword makes the module named keyword available. The second statement prints the value of keyword.kwlist, that is, the object kwlist that has been exported from the keyword module. Other well known modules include os, sys, re, socket, etc.

As you can see, Python has a minimalistic approach when it comes to keywords.

1.22.5   Operators

Following table summarizes the operators and their precedence as documented from official python documentation.

The order displayed is the lowest precedence appearing first to highest precedence operators appearing at the last.

Operator                                                                Description
lambda                                                                  Lambda expression
if – else                                                               Conditional expression
or                                                                      Boolean OR
and                                                                     Boolean AND
not x                                                                   Boolean NOT
in, not in, is, is not, <, <=, >, >=, <>, !=, ==                        Comparisons, including membership tests and identity tests
|                                                                       Bitwise OR
^                                                                       Bitwise XOR
&                                                                       Bitwise AND
<<, >>                                                                  Shifts
+, -                                                                    Addition and subtraction
*, /, //, %                                                             Multiplication, division, remainder
+x, -x, ~x                                                              Positive, negative, bitwise NOT
**                                                                      Exponentiation
x[index], x[index:index], x(arguments...), x.attribute                  Subscription, slicing, call, attribute reference
(expressions...), [expressions...], {key: value...}, `expressions...`   Binding or tuple display, list display, dictionary display, string conversion
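A few expressions whose results depend on the precedence order in the table (the variable names are arbitrary). The last line additionally relies on the fact that ** groups from right to left:

```python
a = 2 + 3 * 4                 # * before +         -> 2 + (3 * 4)
b = -2 ** 2                   # ** before unary -  -> -(2 ** 2)
c = not 1 == 2                # == before not      -> not (1 == 2)
d = 1 | 2 & 3                 # & before |         -> 1 | (2 & 3)
e = True or False and False   # and before or      -> True or (False and False)
f = 2 ** 3 ** 2               # ** groups right to left -> 2 ** (3 ** 2)

print(a, b, c, d, e, f)
```

When in doubt, adding explicit parentheses is usually clearer than relying on the table.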

1.22.6   Singleton

The best way to implement singleton is by intercepting at the object creation time by overriding the __new__() method:

class Singleton(object):
    """Illustrates how to implement Singleton object"""
    _inst = None
    def __new__(cls, *args, **kwargs):
        if cls._inst is None:
            cls._inst = super(Singleton, cls).__new__(cls)   # object.__new__() accepts no extra arguments.
        return cls._inst

Now instantiating the class any number of times will return the same instance:

>>> obj1 = Singleton()
>>> obj2 = Singleton()
>>> id(obj1) == id(obj2)
True

1.22.7   Extending immutable class

Extending an immutable class is a bit tricky since you can’t modify the underlying base data once the object is created. You need to override the __new__() method as illustrated in the following example:

class Mylink(str):
    """Illustrates how to extend an immutable class.

    It extends basic string class. It prefixes the given string
    with 'http://' if not already present during initialization.
    """

    def __new__(cls, s):
      if (not s.startswith('http://')):
          s = 'http://' + s;
      return super(Mylink, cls).__new__(cls, s)

Note: This can only be done from __new__() method and can not be done from __init__() method since the base class is immutable. Following won’t work since it is too late to modify base object:

class Mylink(str):
    def __init__(self, s):
        if (not s.startswith('http://')):
          s = 'http://' + s;
        str.__init__(self, s)      # <=== No effect! Does not work.

1.23   Exercises

  • Given a file passwd which is in /etc/passwd format, and another file group which is in /etc/group format, write functions for the following:

    • Return the numeric userid given a username: username2id(username)
    • Given a username, return the groups the user is part of: username2groups(username)

    The passwd file entry is of the following format:

    username:x:user_id:group_id:Description:/home/dir:/shell/path

    Examples:

    root:x:0:0:root:/root:/bin/bash
    lightdm:x:104:111:Light Display Manager:/var/lib/lightdm:/bin/false
    mysql:x:115:125:MySQL Server,,,:/nonexistent:/bin/false

    The group file format is as follows:

    group_name:x:group_id:user1,user2,...

    Example:

    mysql:x:125:www-data,thava

  • Return the list of usernames which match a given regular expression, e.g. my*

1.24   Resources

Here are some good resources for learning Python:

Fredrik Lundh’s effbot.org Site:
Contains useful simple code samples. He is the author of the O’Reilly book Python Standard Library.
Think Python:
Good O’Reilly book authored by Allen B. Downey. Free to read online.
Python Books List at Python wiki:
Good list to skim and find some interesting information.
Python 101-Introduction to Python:
Good material authored by Dave Kuhlman.

Python FAQs: Collection of Python FAQs

1.25   Source Document

This document has been written using RestructuredText and converted to HTML using rst2html command.

See the document source text

