Author: | Thava Alagu thavamuni@gmail.com |
---|---|
Version: | 0.8 |
Date: | 2013 August 16 |
This is a gentle introduction to Python for anyone who is familiar with another programming language such as C, Java or PHP. You can use it to teach yourself or to teach others.
Over my 20 years of experience developing software applications on different platforms and languages, I have found Python to be the most productive language, offering the least resistance from concept to implementation.
This book is not about data structures, and it is not meant to be an API reference manual. The goal of this book is to provide a sound foundation in the fundamentals through simple discussions, notes and exercises. You will pick up most of the advanced topics once you start solving actual problems.
The book was mainly developed from notes I wrote down for myself. When I started learning Python, I had extensive experience in C/C++, Java and PHP, in that order. I had also developed some admin tools in Perl to automate a few things within the corporate intranet, but I found myself frequently looking up syntax rules, the API reference and my own code before writing another little Perl program. I have a poor memory, and Perl never really stuck with me.
When I first learned that Python uses indentation for blocks instead of an explicit token like curly braces, I was taken aback. I also (mistakenly) assumed that I wouldn't be able to use my favorite editor (vim, in my case) to quickly jump between the beginning and end of a block. Jumping is important to me, since I spend more time jumping than standing. :-) Much later, I found that vim has all sorts of readily available plugins to customize jumping however you want, so I didn't have to write one on my own just for this purpose. After this discovery, my reluctance was gone and I started reading about Python, which unveiled the cosmic path to other unexplored galaxies for me.
Most people either love Python or hate it; there is no in-between. If you can resist the temptation to walk away from it prematurely, there are things waiting to be discovered that may permanently change your approach to problem solving.
I hope you enjoy reading this. Programming in Python is a pleasure, but there are rules to the game. Once you learn the rules, you are in the game!
What is Python suitable for? It is a strange beast: it is good for quick scripting as well as for large applications! To summarize...
Python is ...
Python was developed by Guido van Rossum in the early 1990s. He is the primary author and continues to play the lead role (see BDFL) in setting its future direction.
Its module system was inspired by the Modula-3 language, and the language as a whole was influenced by the ABC programming language, among others.
This is a timeline of selected early and modern programming languages. Many languages were left out in the interest of brevity. The main goal is to show where Python fits into the history.
Year | Language | Predecessors | Author/Comments |
---|---|---|---|
1950's | Fortran, Lisp, COBOL | ||
1960's | ALGOL, Simula, BASIC | ||
1970 | Pascal | ALGOL 60, ALGOL W | Niklaus Wirth, Jensen |
1972 | Prolog | | Alain Colmerauer |
1972 | SQL | ALPHA, QUEL (Ingres) | IBM |
1972 | C | B, BCPL, ALGOL 68 | Dennis Ritchie |
1972 | Smalltalk | Simula 67 | Xerox PARC |
1975 | Scheme | Lisp | Sussman, Steele |
1975 | ABC | SETL | CWI |
1979 | Modula-2 | Modula, Mesa | By Niklaus Wirth |
1980 | Ada | Green | OO, Concurrent |
1983 | C++ | C, Simula | Stroustrup |
1984 | Common Lisp | Lisp | Lisp Dialect & Std |
1986 | Objective-C | Smalltalk, C | |
1987 | Perl | C, sed, sh, awk | Larry Wall |
1987 | Erlang | Prolog | By Ericsson. Concurrent. |
1989 | Modula-3 | Modula-2 | At DEC. |
1990 | Haskell | Miranda | Open. Standardized. |
1991 | Visual Basic | QuickBASIC | Alan Cooper, sold to Microsoft |
1991 | Python | ABC, ALGOL 68, Icon, Modula-3 | Van Rossum |
1995 | Java | C, Simula 67, C++, Smalltalk, Ada 83, Objective-C, Mesa | James Gosling, Sun |
1995 | JavaScript | Self, C, Scheme | Brendan Eich at Netscape |
1995 | PHP | Perl | Rasmus Lerdorf |
1995 | Ruby | Smalltalk, Perl | Yukihiro Matsumoto |
2000 | C# | C, C++, Java, Delphi, Modula-2 | Microsoft |
2003 | Scala | Smalltalk, Java, Haskell, Standard ML, OCaml | Martin Odersky |
2009 | Go | C, Oberon, Limbo | Google, Concurrent. |
Source: | Wikipedia |
---|
The first step is to get Python and install it on your computer. You can download Python from http://www.python.org/getit/. Most Linux platforms come with some version of Python pre-installed.
As of Jan 2013, the current production versions are 2.7.3 and 3.3.0. We will use version 2.7.3 since it is the most widely used version.
Installation is usually simple. It just involves running the package installer (Windows) or locating the relevant package for your OS distribution and installing it. If you have any difficulties installing, see http://docs.python.org/2/using/index.html
If you are going to install multiple versions of Python on the same machine, it is recommended (though not required) that you install the following:
- virtualenv: Creates isolated Python virtual environments.
- pip: Installs Python packages.
In Ubuntu, the packages are available in the standard repositories as python-virtualenv and python-pip. If you are just learning Python, you need not use virtualenv and pip.
To install an additional third-party Python module from PyPI (the Python Package Index), use the pip command. For example, to install the blist package, you would simply run:
pip install blist
The pip installer replaces the legacy easy_install command, which has many limitations. For example, easy_install does not support uninstalling packages.
virtualenv lets you create independent sandbox directories based on Python 2 or Python 3 and work inside them. You can install as many third-party modules as you like and throw them away later, without cluttering the global installation. When you get serious about real development, using virtualenv is essential.
Python installation and packaging has a long history of messy dependencies between the projects distutils, setuptools, distribute, distutils2 and others. This is currently being sorted out in a major cleanup. We won't go into those details here, but as long as you stick to virtualenv and pip, you are likely to be a happy camper.
Python provides an interactive shell (the python command) and a graphical IDE called IDLE, which ships with Python on all platforms. That is good enough to start learning without a heavyweight IDE.
There are many options if you want a more powerful IDE. The Python IDE wiki has a good summary.
Since there are so many options, I will shortlist a few good editors. These are opinionated choices, but they are a good place to start.
IDLE: | Default Python IDE with integrated debugger. Cross-platform. Free. |
---|---|
pyscripter: | Free, Windows Only. Arguably the best IDE on Windows. |
Eclipse: | With the PyDev plugin, supports integrated editing and debugging. Heavyweight. Free. |
vim: | Powerful general purpose editor with configurable Python support. Use python-mode plugin for vim. See Python Vim Configuration wiki. |
Emacs: | Powerful general purpose editor with configurable Python support. See Python Emacs wiki. |
spyder: | Cross-platform, lightweight, with integrated debugger. |
PyCharm: | Powerful IDE. Good code completion and support for popular frameworks like Django. Also has integrated support for vim key mappings. Not free. |
My personal preference is vim editor with python-mode plugin.
Here is another interesting collection of information about Python IDEs, from this Stack Overflow question.
To follow the examples in this book, you don't need any specific IDE. We will just use the command-line python command and the ipython shell.
ipython provides an interactive shell similar to the python shell, with more powerful extended features. Go to the IPython site to download it.
Start your python interactive shell:
    $ python
    Python 2.7.3 (default, Aug  1 2012, 05:14:39)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> print 'Hello World!!!'
    Hello World!!!
    >>>
If you use ipython instead of python, it looks like this:
    $ ipython
    Python 2.7.3 (default, Aug  1 2012, 05:14:39)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.12.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.

    In [1]: print 'Hello World!!!'
    Hello World!!!
Congratulations! That was really simple.
In Python 2, print is a keyword that introduces the print statement, so it was not necessary to type print('Hello World'). From Python 3.0 onwards, however, print is a built-in function rather than a keyword. It is better to start using print() now, since a call with a single argument works in both versions.
For the reasons behind the change, see PEP 3105 (PEP stands for Python Enhancement Proposal).
All significant changes to Python go through the PEP process, which is similar to the JSR process in the Java community.
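If you want Python 2 code to use the Python 3 style of print() today, the standard `__future__` import does that. The following is a minimal sketch. The `greet()` helper is a hypothetical name used only for illustration. The example also shows the keyword arguments (`sep`, `end`, `file`) that only the function form accepts.

```python
# The __future__ import must be the first statement in a module. It is a
# no-op on Python 3 and turns print into a function on Python 2.6 and 2.7.
from __future__ import print_function

import sys


def greet(name, out=sys.stdout):
    """Hypothetical helper: print a greeting to 'out' and return it."""
    message = 'Hello, {}!'.format(name)
    print(message, file=out)              # keyword arguments need print()
    return message


greet('World')                            # prints: Hello, World!
greet('standard error', sys.stderr)       # same call, different stream
print('a', 'b', 'c', sep='-')             # prints: a-b-c
```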
In any (procedural) language, you can usually expect very basic support for:
Python, of course, supports all of these. It also supports both functional and object-oriented styles of programming. We will look at each of these aspects later, one after another.
The initial examples here focus on simple, common procedural-style programs, so that you can readily map them to equivalent features in other languages you may already be familiar with.
After the introduction of functional style programming, you will be encouraged to write programs in functional style which is more 'pythonic'.
The first thing you should be aware of is that indentation matters in Python. In languages like C and Java, you use braces {} to define blocks, but in Python you use indentation. Enforcing indentation generally makes programs more readable. For people who are used to C-style braces, this can be very frustrating at first. If you accept it, though, you will most likely find it natural after some time. I am one example: I complained, whined and grudgingly started with Python, and I can vouch that it no longer bothers me. In fact, using indentation for blocks now seems more natural to me.
Let us look at a simple exercise to do the following:
Write a program that prompts for your name and year of birth and prints your age. Repeat this in a loop until the user enters quit as the name. Type the following into a text editor and save it as name.py:
    while True:
        name = raw_input('Enter Your Name ==> ')
        if name == 'quit':
            break
        year = raw_input('Enter your Year of Birth ==> ')
        age = 2013 - int(year)
        print('Hello, {}! You are {} years old!\n'.format(name, age))
Now run this program from the shell:
    $ python name.py
    Enter Your Name ==> Johnny
    Enter your Year of Birth ==> 1900
    Hello, Johnny! You are 113 years old!

    Enter Your Name ==> quit
There are several things to notice from this simple program:
There are no braces {} around the block: the while block is defined by indentation. By convention, we use 4 spaces per indentation level. Technically, any number of spaces will do, as long as it is consistent within the block.
You don't declare variables. They spring into existence when you assign a value to them. Such languages are called dynamically typed languages. This provides convenience and generic programming capabilities (polymorphism) at the cost of compile-time type-checking safety. Python's approach is arguably superior.
raw_input(prompt) is a built-in function that prints the prompt, reads a line from standard input and returns it.
True is a boolean value of type bool. String comparisons like name == 'quit' evaluate to a boolean result. Unlike C, you don't need a special strcmp() function to compare strings.
You convert a string to an integer using the int(year) function. Explicit type conversion is required. This differs from loosely typed languages like PHP, where a variable can act as either a string or an integer depending on context. Python is said to be strongly typed. This helps prevent silent errors arising from the ambiguous nature of loose typing.
The break statement exits the while loop as soon as the user enters quit.
Python also supports C-style format specifiers such as %s (for string) and %d (for integer), which work just like in C.
These specifiers are applied with the % operator, which formats its input values into the string.
The raw_input() function is a compact way of doing the following:
    import sys
    sys.stdout.write('Enter your name: ')
    name = sys.stdin.readline().rstrip('\n')   # raw_input() also drops the trailing newline
Suppose you want to format the output to align at a specific column. Then you can use C-like format specifiers as below:
    >>> print( 'Name: %15s; Age: %5d' % ('Johnny', 93) )
    Name:          Johnny; Age:    93
Note that % is used here as the string formatting operator. The string is formatted first and print() only receives the result, so print() is unaware of the % operator. The % operator is also used as the modulo operator:
    >>> 9 % 2
    1
    >>> 9 % 3
    0
    >>> 'Number %d' % 5
    'Number 5'
    >>> 'Numbers %d and %d' % (10, 20)
    'Numbers 10 and 20'
Note that the % operator takes a tuple as its second operand when the input involves multiple values, like (10, 20) in the above example. The tuple is a built-in type in Python that is immutable (meaning you can't change it once you create it).
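A tuple on the right-hand side of % is always treated as the list of arguments. This creates a classic pitfall when the value you want to format is itself a tuple. The following is a small sketch of that behaviour; the variable names are illustrative only.

```python
point = (10, 20)

# A tuple operand supplies one value per specifier:
two_values = 'x=%d, y=%d' % point        # 'x=10, y=20'

# To format the tuple itself as a single value, wrap it in a 1-tuple:
one_value = 'point=%s' % (point,)        # 'point=(10, 20)'

# Without the wrapper, the two elements don't fit the single %s:
try:
    'point=%s' % point
except TypeError as exc:
    error = str(exc)                     # 'not all arguments converted ...'

print(two_values)
print(one_value)
print(error)
```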
When the output does not need explicit format specifiers, the format() method is more readable:
    >>> 'Number {} and {}'.format(10, 20)
    'Number 10 and 20'
With format(), you can reuse the same argument:
    >>> 'Number {0:2d} (in decimal), {0:x}, {0:#x} (in hex) and {1}'.format(10, 20)
    'Number 10 (in decimal), a, 0xa (in hex) and 20'
Even better, you can use named parameters as below:
    >>> 'Number {first} and {second}'.format(first=10, second=20)
    'Number 10 and 20'
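The column alignment shown earlier with %15s and %5d can also be expressed with format() specifications. The following sketch shows the equivalent of the earlier example, plus left, center and zero-padded alignment.

```python
name, age = 'Johnny', 93

old_style = 'Name: %15s; Age: %5d' % (name, age)
# format() left-aligns strings by default, so '>' is needed to right-align
# them the way %15s does; numbers are right-aligned by default.
new_style = 'Name: {:>15}; Age: {:5d}'.format(name, age)
print(old_style == new_style)            # True

left = 'Name: {:<15}|'.format(name)      # left-aligned in 15 columns
centered = '[{:^10}]'.format('hi')       # '[    hi    ]'
padded = '{:05d}'.format(42)             # '00042'
```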
Tip
Suppose you want to read the name and age from a single line of user input. In C, you would write:
    printf("Enter Name and Age: ");
    scanf("%s %d", name, &age);
In Python, you can do:
    (name, age) = raw_input('Enter Name and Age: ').split()
    age = int(age)
The raw_input() function returns the input line after stripping the trailing newline. The split() method splits the line into a list of words separated by whitespace. You can unpack that list into a tuple of variables as above; a ValueError is raised unless exactly two words are entered. Finally, we convert age from a string to an integer.
What if you want to read hexadecimal input, as with "%x" in C? There is no direct equivalent of C's scanf() in Python. For more complicated input processing like this, you would typically split the input yourself or use regular expressions (the re module). You then convert each field, for example with int(text, 16) for hexadecimal.
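The following is a minimal sketch of a scanf("%s %x")-style parser using the re module. It assumes the input is a single word followed by a hexadecimal number with an optional 0x prefix. `PATTERN` and `parse_name_hex` are hypothetical names used only for illustration.

```python
import re

# A word, whitespace, then hex digits with an optional 0x/0X prefix.
PATTERN = re.compile(r'^\s*(\S+)\s+(?:0[xX])?([0-9a-fA-F]+)\s*$')


def parse_name_hex(line):
    """Return (name, value) parsed from 'name hexnumber', or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    name, hexdigits = match.groups()
    return name, int(hexdigits, 16)      # int() with base 16 converts hex text


print(parse_name_hex('Johnny 0x1f'))     # ('Johnny', 31)
print(parse_name_hex('jill zz'))         # None: 'zz' is not hexadecimal
```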
See also:
Note
Let us say you need to store all the (name, age) pairs from the first example in memory for later processing. The input may contain any number of pairs.
How do you do that in Python? The answer is to use the built-in list data structure. A list grows dynamically as more elements are added. This is a powerful concept.
For comparison: in C, you would write your own linked-list implementation; in C++, you would probably use the vector template; in Java, you would probably use java.util.Vector. You will notice that Python's style is much less verbose and easier to use:
    nalist = []                         # Initialize empty list
    while True:
        name = raw_input('Enter Your Name ==> ')
        if name == 'quit':
            break
        year = raw_input('Enter your Year of Birth ==> ')
        age = 2013 - int(year)
        print('Hello, {}! You are {} years old!\n'.format(name, age))
        nalist.append((name, age))      # Append (name, age) tuple.
    print('Input List is: {}'.format(nalist))
We added just 3 lines to our first example: one at the beginning and two at the end!
If you enter 3 names, an example output is:
Input List is: [('john', 20), ('jack', 40), ('jill', 30)]
To see the complete documentation for list, use the pydoc command:
$ pydoc list
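Once the pairs are in a list, processing them is straightforward. The sketch below hard-codes the sample output shown above instead of reading from raw_input(), and computes a few summary values. The variable names are illustrative only.

```python
# Sample data in the same shape the loop above builds: (name, age) tuples.
nalist = [('john', 20), ('jack', 40), ('jill', 30)]

total = 0
for name, age in nalist:                 # each tuple unpacks into two names
    total += age
average = float(total) / len(nalist)     # float() keeps Python 2 from truncating

oldest = max(nalist, key=lambda pair: pair[1])    # compare by age
by_age = sorted(nalist, key=lambda pair: pair[1])

print('Average age: {:.1f}'.format(average))      # Average age: 30.0
print('Oldest: {} ({})'.format(*oldest))          # Oldest: jack (40)
```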
Let us look at a simple function which returns a string to indicate if a given number is negative, zero or positive:
    def numdesc(no):
        """Given number, return 'negative', 'zero' or 'positive'"""
        if no < 0:
            return 'negative'
        elif no == 0:
            return 'zero'
        else:
            return 'positive'

    >>> numdesc(100)
    'positive'
    >>> numdesc(0)
    'zero'
    >>> numdesc(-20)
    'negative'
    >>> numdesc.__doc__
    "Given number, return 'negative', 'zero' or 'positive'"
The program is self-explanatory. The triple-quoted string right after the def line serves as the documentation for the function (its docstring). A string literal that appears as the first statement of a function, class or module is stored in its __doc__ attribute, which is why numdesc.__doc__ returns it above. Anywhere else, such a bare string expression is simply evaluated and thrown away. Using triple quotes for documentation is a convention, and documentation tools such as pydoc rely on it.
Every block is introduced by a colon (:) at the end of the preceding line and is indented one additional level.
The following example illustrates how to write to a file and how to invoke an external command from the script:
    import subprocess

    def writefile():
        """Example to illustrate file and process invoke operations"""
        outfp = open('/tmp/hello.txt', 'wb')
        outfp.write('Hello World!')
        outfp.close()
        subprocess.call(['/bin/cat', '/tmp/hello.txt'])

    writefile()
The import subprocess statement makes the subprocess module available, so its functions can be used as subprocess.call() and so on. Then you open the file for writing ('wb' means write in binary mode). On Unix, 'wb' behaves the same as 'w'. On Windows they differ, because text mode translates newline characters when writing. It is recommended to use 'wb' in the interest of portability. (In Python 3, a file opened with 'wb' accepts only bytes, so you would write b'Hello World!' instead.)
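A common alternative to calling close() yourself is the with statement (Python 2.6+). It closes the file automatically, even if an exception is raised inside the block. This sketch makes two assumptions that differ from the example above. It uses text mode ('w') so the same code runs unchanged on Python 2 and 3. It also uses tempfile.gettempdir() instead of the hard-coded /tmp path. The helper name write_and_read is hypothetical.

```python
import os
import tempfile


def write_and_read(path, text):
    """Hypothetical helper: write text to path, then read it back."""
    with open(path, 'w') as outfp:       # closed automatically at block end
        outfp.write(text)
    with open(path) as infp:             # default mode is 'r' (read, text)
        return infp.read()


path = os.path.join(tempfile.gettempdir(), 'hello.txt')
print(write_and_read(path, 'Hello World!'))   # Hello World!
```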
subprocess.call() accepts the command line as a list of strings. If you prefer to pass a single space-separated command line, you can use either of the following:
    # Option 1: let the shell parse the command line
    subprocess.call('/bin/cat /tmp/hello.txt', shell=True)

    # Option 2: os.system()
    import os
    os.system('/bin/cat /tmp/hello.txt')
    # Note: the subprocess module is preferred over os.system(),
    # since it is more general and flexible.
See pydoc subprocess for more information about the subprocess module.
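subprocess.call() only returns the exit status. If you need the command's output as a value, subprocess.check_output() (added in Python 2.7) captures it. The following is a small sketch. It runs the current interpreter (sys.executable) instead of /bin/cat so that it does not depend on Unix tools.

```python
import subprocess
import sys

# check_output() runs the command, waits for it, and returns its stdout.
# It raises CalledProcessError if the command exits with a non-zero status.
cmd = [sys.executable, '-c', 'print("Hello from a child process")']
output = subprocess.check_output(cmd)

# On Python 3 the result is bytes, so decode it to get a str.
text = output.decode('ascii').strip()
print(text)                              # Hello from a child process

# call() just reports the exit status of the command:
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(status)                            # 3
```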
Consider the following example, which prints your OS PATH directories and the Python module search directories:
    from pprint import pprint                               # (1)
    import os
    import sys

    def ospath():
        """Print the list of directories as indicated by os PATH environment"""
        print('Your OS PATH directories :')
        pathlist = os.getenv('PATH').split(os.pathsep)      # (2)
        pprint(pathlist)

    def pythonpath():                                       # (3)
        """
        Print python module search path directories.
        Note that $PYTHONPATH directories are included in this list.
        """
        print('Your Python PATH module search directories :')
        pprint(sys.path)

    if __name__ == '__main__':                              # (4)
        ospath()
        pythonpath()
(#1) The pprint module provides the pprint() function to pretty-print lists and other data structures.
(#2) os.getenv('PATH') returns the system path string. It is split into a list using the OS-dependent path separator character os.pathsep, which is ':' on Unix and ';' on Windows.
(#3) Each installation has a platform-specific list of Python module search directories. Users can add more directories by setting the PYTHONPATH environment variable. sys.path reflects the final value of the module search path.
(#4) This program can itself be imported as a module by another program. The variable __name__ equals '__main__' only when the file is run directly. This check therefore runs ospath() and pythonpath() only when the script is invoked as the main program, not when it is imported.
The most effective way to introspect any object is to use the dir() built-in function at the interactive shell. Continuing from the earlier example:
    >>> dir(nalist)
    ....     # All the supported method and attribute names of the nalist
             # object are displayed.
    >>> type(nalist)
    <type 'list'>
    >>> help(list)     # Displays the Python documentation for list.
Tip
The dir() function is the most useful and most frequently used tool for basic object introspection.
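dir() returns a plain list of names, so you can filter it and combine it with the related built-ins hasattr() and getattr(). The following is a small sketch; public_names() is a hypothetical helper.

```python
def public_names(obj):
    """Hypothetical helper: names in dir(obj) that don't start with '_'."""
    return [name for name in dir(obj) if not name.startswith('_')]


nalist = [('john', 20), ('jack', 40)]
print(public_names(nalist))        # list methods such as append and sort
print(hasattr(nalist, 'append'))   # True: check for a single attribute
method = getattr(nalist, 'count')  # fetch an attribute by its name
print(method(('john', 20)))        # 1
```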
Now that you have had a little taste of the language, we will move on to more fundamentals before looking at more complex examples.
Comments take the following forms:
    # Lines starting with hash are comments.
    i = 10    # You can use partial line comments like this too.

    "This is dummy string expression, but may serve like a comment"

    def f():
        """Triple-quoted strings are the standard for documenting functions
        and modules. This is a multi-line string expression, used for
        documenting this function by convention. The Python documentation
        tools look for this convention to auto-generate the documentation"""
        ....
Note that built-in types (such as int and float) and built-in functions (such as len() and isinstance()) are not keywords. This is different from many other languages, such as C, where built-in types are keywords.
Python's built-in functions include:
dir(), id(), len(), isinstance(), issubclass(), open(), range(), map(), reduce(), filter(), apply(), locals(), globals(), eval(), zip(), enumerate(), raw_input(), and more... The built-in types can also be called like functions, since they serve as constructors: int(), long(), float(), list(), tuple(), dict(), type(), and more...
Since these are not keywords, you can rebind their names to different things:
    >>> len = 10
    # This rebinds the built-in function name 'len' to 10.
    # Not a good idea: avoid rebinding built-in function names.
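If you shadow a built-in name by accident, the original is not lost. It still lives in the built-ins module (`builtins` in Python 3, `__builtin__` in Python 2), and deleting your own binding makes lookups fall back to it. The following is a minimal sketch at module level.

```python
try:
    import builtins                    # Python 3 name of the module
except ImportError:
    import __builtin__ as builtins     # Python 2 name of the same module

len = 10                               # shadows the built-in len here
shadowed = len                         # 10: the module-level name wins
builtin_len = builtins.len('abc')      # the built-in is still reachable: 3

del len                                # remove the module-level binding...
restored = len('abc')                  # ...and lookup falls back to the built-in: 3
print(shadowed, builtin_len, restored)
```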
You can pass built-in types to functions just like any other objects. This is a very powerful concept, useful for generic programming:
    >>> def convert(val, sometype):     # This defines a simple function
    ...     return sometype(val)         # which converts a value to the given type
    ...
    >>> print(convert(10.5, int))        # Isn't that interesting?
    10
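Because types are ordinary objects, you can also store them in data structures. The following sketch uses a list of (field, type) pairs as a tiny schema for converting text fields. The names convert_all and schema are illustrative only.

```python
def convert_all(values, sometype):
    """Convert every element of values using sometype as the constructor."""
    return [sometype(v) for v in values]


raw = ['1', '2', '3']
ints = convert_all(raw, int)        # [1, 2, 3]
floats = convert_all(raw, float)    # [1.0, 2.0, 3.0]
as_tuple = tuple(ints)              # types are callables too: (1, 2, 3)

# A per-field "schema" built from types (dict comprehension: Python 2.7+):
schema = [('name', str), ('age', int)]
record = {field: kind(text)
          for (field, kind), text in zip(schema, ['Johnny', '93'])}
print(record['age'] + 1)            # 94
```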
Python defines a core set of built-in types designed for ease of use. The following list includes most of them. A few have been intentionally left out, since they are either rarely used or reserved for later, more advanced discussions. Here is a summary:
Numeric Type | Comments |
---|---|
int | The integer type. It uses 32 or 64 bits depending on the platform. In Python 2, sys.maxint gives the largest int value, e.g. on a 64-bit Linux build: >>> sys.maxint 9223372036854775807 # which is (2**63 - 1) |
float | Uses double precision: >>> sys.float_info sys.float_info(max=1.79e+308, min=2.2e-308, ... ) |
long | Unlimited precision. Note: in Python 3, int behaves like long, and the separate long type is gone. |
complex | Complex number: >>> complex(1.2, 3.4) (1.2+3.4j) |
bool | Boolean Type. True or False. Can not be subclassed. |
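The following short sketch illustrates the numeric types above: integers that grow without overflow, the division operators, and bool being a kind of int. The comments note where Python 2 and 3 differ.

```python
import sys

big = 2 ** 100                 # no overflow: Python 2 switches to long
print(big)                     # 1267650600228229401496703205376
print(type(big).__name__)      # long on Python 2, int on Python 3

floor_div = 7 // 2             # 3 on both versions (floor division)
true_div = 7.0 / 2             # 3.5 on both versions
print(7 / 2)                   # 3 on Python 2 (ints), 3.5 on Python 3

print(isinstance(True, int))   # True: bool is a subclass of int
print(True + True)             # 2
print(sys.float_info.max)      # largest float, about 1.8e308
```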
Python has a set of built-in sequence types such as str, unicode, list and tuple. A sequence type is any type that implements a strictly ordered collection of elements. In general, this means the type must be indexable, i.e. obj[i] gets the element at position i in the sequence. The type must implement the __getitem__() method for this to work.
Since every sequence type is also essentially a container, it should also support the iterator protocol, i.e. define __iter__() returning an iterator. (If __iter__() is missing, Python falls back to calling __getitem__() with 0, 1, 2, ... until IndexError is raised.)
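The following is a minimal sketch of a user-defined, read-only sequence. The class Squares is hypothetical. It defines only __len__() and __getitem__(). Iteration and the in operator still work, because Python falls back to calling __getitem__() with 0, 1, 2, ... until IndexError is raised.

```python
class Squares(object):
    """Hypothetical read-only sequence of the squares 0, 1, 4, ... (n items)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if i < 0:                        # allow negative indexes, like a list
            i += self.n
        if not 0 <= i < self.n:
            raise IndexError('Squares index out of range')
        return i * i


sq = Squares(5)
print(sq[2])        # 4
print(sq[-1])       # 16
print(list(sq))     # [0, 1, 4, 9, 16]: iteration works without __iter__()
print(9 in sq)      # True: 'in' also falls back to __getitem__()
```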
The built-in sequence types are summarized below. Note that some types were changed or renamed in Python 3 in order to clean up legacy behaviour.
Sequence Type | Comments |
---|---|
str | Immutable string type in Python2.7. In Python 3, str behaves like how unicode does in Python 2.7: >>> s = 'Some ASCII string' |
unicode | Immutable sequence of unicode characters. (Python 2.7 only): >>> s = u'San José' |
bytes | In Python 3, an immutable sequence of bytes. It behaves like str in Python 2.7 (where bytes is simply an alias for str): >>> s = b'Some ASCII string' |
list | Mutable sequence of any objects (possibly heterogeneous): >>> mylist = [1, 2, (3, 4), [5,6]] |
tuple | Immutable sequence of any objects (possibly heterogeneous): >>> mytuple = (1, 2, (3, 4), [5,6]) |
bytearray | Mutable sequence of bytes. See also: bytes vs bytearray |
buffer | Provides interface to internal data without copy. (legacy) Use new memoryview instead of buffer. |
memoryview | Interface to internal data without copy (Python 3 session shown): >>> buf = bytearray(1024*1024) >>> view = memoryview(buf) >>> view = view[10:] >>> view[:6] = bytes(b'Hello!') >>> print(bytes(view[:6])) b'Hello!' >>> print(bytes(buf[10:16])) b'Hello!' |
xrange | Immutable sequence generated from a specific range (Python 2.7 only). Unlike a list, it uses the same amount of memory regardless of the size of the range. |
range | Immutable sequence generated from specific range. The range in Python 3 behaves like xrange in Python 2.7 |
There are various other built-in types available which are briefly summarized below:
Type | Comments |
---|---|
dict | The dictionary type is a very powerful, core data structure. It is a mutable map of keys to associated values. |
class | Useful to define new types possibly extending other types. Supports object oriented programming style. |
function | A function is an object. User-defined functions are of type function; built-in functions such as len() are of type builtin_function_or_method. |
method | Methods are functions that are called using the attribute notation. This includes built-in methods (eg. mylist.append) or class instance method. (eg. myobj.my_method). The type keeps track of the associated instance. |
Generator | Similar to a function, but it yields values one after another instead of returning a single value. It acts like a sequence generator and can be used like any other iterable object. Very useful for efficient iteration and concurrent programming. |
module | A module supplies an implementation unit which can be imported into other modules for use. It exports the symbol table of its component objects through its attributes. |
None | Special object (of type NoneType) that represents a state of 'nothing'. It is a global singleton object and can't be extended. |
type | Everything is regarded as an object. Any object is of some type. The built-in types are the instances of type type. |
code | Code objects represent compiled Python code such as a function body. A function is associated with a code object and a context (locals, globals). A code object can also be executed in a dynamically supplied scope, e.g. with exec or eval(). |
set, frozenset | There are 2 set types: set (mutable) and frozenset (immutable).
A set is different from a list: it does not allow duplicates, and it supports the following operations: x | y Set union; x & y Set intersection; x - y Set difference; x ^ y Symmetric difference; len(x) Number of elements in the set; max(x) Maximum value; min(x) Minimum value. The elements must be hashable objects. A set can contain a frozenset, but neither a set nor a frozenset can contain a set. Why? Because a set is mutable and hence not hashable. |
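The following short sketch illustrates the set operations and the hashability rule described in the table. The values are arbitrary.

```python
a = set([1, 2, 3, 4])
b = {3, 4, 5}                 # set literal syntax (Python 2.7+)

union = a | b                 # {1, 2, 3, 4, 5}
common = a & b                # {3, 4}
only_a = a - b                # {1, 2}
either = a ^ b                # {1, 2, 5}: in exactly one of the sets

unique = sorted(set([3, 1, 3, 2, 1]))   # duplicates removed: [1, 2, 3]

frozen = frozenset([1, 2])
nested = {frozen}             # OK: a frozenset is hashable
try:
    {set([1, 2])}             # a set can't be an element: it is unhashable
except TypeError:
    nested_set_error = True
```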
The following discussion applies to Python 2.7, since string handling changed between Python 2.7 and 3.
Strings are immutable. Adjacent string literals are automatically concatenated:
    greet = 'Hello ' "World!"
    greet = 'Hello ' + "World!"       # Result is same as above.

    # Parentheses allow continuation across lines:
    greet = ('Hello ' +
             'This can be long greetings')

    # A backslash can be used to continue a long string.
    greet = 'Hello .... \
    This is long line ...!'
The best way to write multi-line strings is to use triple quotes, as below:
    """Triple quotes are good for multiline comments,
    because you don't have to escape single quotes inside"""

    '''You can also use triple quotes with the single quote character'''
Some common string operations:
    >>> fruits = ['banana', 'apple', 'orange', 'tomato']
    >>> ' '.join(fruits)
    'banana apple orange tomato'
    >>> ' '.join(fruits).split(' ')
    ['banana', 'apple', 'orange', 'tomato']
    >>> ',\n'.join(fruits)
    'banana,\napple,\norange,\ntomato'
    >>> print ',\n'.join(fruits)
    banana,
    apple,
    orange,
    tomato
    >>> s = 'orange'
    >>> s[1:]               # Slice from index 1 to the end
    'range'
    >>> s[0:3]
    'ora'
    >>> s[1:3]              # s[start_index:end_index], end index excluded
    'ra'                    # Note: s[3] excluded.
    >>> s[-1]               # Last character
    'e'
    >>> s[2:-1]
    'ang'
    >>> type(s)
    <type 'str'>
    >>> type('e')           # Unlike C, a single character is also a string
    <type 'str'>
    >>> 'r' in 'orange'     # Use the 'in' operator for substring checks.
    True
    >>> 'ang' in 'orange'
    True
    >>> 'k' in 'orange'
    False
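A few more everyday string methods are shown below. Because strings are immutable, each call returns a new string and leaves the original untouched. This is a small sketch, not an exhaustive list.

```python
s = '  Hello, World!  '

stripped = s.strip()                            # 'Hello, World!'
upper = stripped.upper()                        # 'HELLO, WORLD!'
replaced = stripped.replace('World', 'Python')  # 'Hello, Python!'
starts = stripped.startswith('Hello')           # True
pos = stripped.find('World')                    # 7 (find() returns -1 if absent)
backwards = stripped[::-1]                      # '!dlroW ,olleH'
parts = stripped.split(', ')                    # ['Hello', 'World!']

print(repr(s))   # the original is unchanged: strings are immutable
```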
In Python 3, the default string type str is Unicode-capable. For storing raw bytes there is an immutable type called bytes, and a mutable counterpart called bytearray. A bytearray is similar to a mutable list of raw bytes, but not exactly the same. It has some unique characteristics:
    >>> s = bytearray(b'The King !')
    >>> s[4:9] = b'Queen'
    >>> s
    bytearray(b'The Queen!')
    >>> s[0]
    84
    >>> type(s[0])
    <class 'int'>
    >>> s[0] = b't'
    TypeError: an integer is required
    >>> s[0] = ord(b't')
    >>> s
    bytearray(b'the Queen!')
Hence it is more like a list of small integers (each element in the range 0-255).
The tuple is one of the sequence types in Python. It is similar to a list, but immutable: you cannot reassign individual elements of a tuple. A tuple may, however, contain mutable data (such as a list) as one of its elements.
It is the comma, not the parentheses, that creates a tuple:
    >>> 1,2
    (1, 2)
A tuple with a single element needs special syntax:
    >>> (1)
    1        # This is not a tuple !!!
    >>> 1,   # Looks strange, but solves the problem at hand!
    (1,)
The empty tuple is what you may expect it to be:
    >>> ()   # This is an empty tuple!
    ()
Even though parentheses are optional, it is recommended to always use them for tuples:
    >>> (1, 2)
    (1, 2)
Swapping Values:
b, a = a, b
Tuple packing and unpacking:
    point = 1, 2, 3      # This is tuple packing!
    x, y, z = point      # This is tuple unpacking!
Proper unpacking happens even for nested tuples:
(a, b, (c, d)) = (1, 2, (3, (4, 5))) # a=1; b=2; c=3; d=(4,5)
In fact, unpacking works for any sequence of matching length on the right-hand side:
x, y, z = 'abc' # x = 'a' ; y = 'b' ; z = 'c'
A function may return a tuple to return multiple values:
x, y, z = get_location()
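get_location() is not defined in the text. The following sketch shows one hypothetical way such a function could return several values as a tuple, along with a second illustrative helper, min_max(). Neither function comes from a library.

```python
def get_location():
    """Hypothetical helper: return an (x, y, z) position as one tuple."""
    x, y, z = 1.5, -2.0, 10.0
    return x, y, z              # the commas build a tuple; no parentheses needed


x, y, z = get_location()        # unpack the returned tuple into three names
location = get_location()       # or keep it as a single tuple
print(location[2])              # 10.0


def min_max(values):
    """Hypothetical helper: return the smallest and largest element."""
    return min(values), max(values)


low, high = min_max([4, 8, 15, 16, 23, 42])   # low = 4, high = 42
```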
We already used the list data structure in our second example. Let us take a closer look at lists. To recap, a list is an ordered collection of values whose size grows dynamically as we add more elements.
Lists are mutable (i.e., modifiable) and can contain elements of any type, including other lists:
>>> mylist = [ [1,2], (3, 4), {'one':1, 'two':2} ]
The above list contains another list, tuple and a dictionary which we will cover shortly.
The list type supports the following methods:
a.append a.extend a.insert a.remove a.sort a.count a.index a.pop a.reverse
Inserting and removing from arbitrary positions is supported:
    >>> a = [5, 10, 40, 20, 30]
    >>> a.remove(40)          # Remove by value
    >>> a
    [5, 10, 20, 30]
    >>> a.insert(2, 40)       # Insert at any position
    >>> a
    [5, 10, 40, 20, 30]
    >>> a.pop()               # Pop from the end
    30
    >>> a
    [5, 10, 40, 20]
    >>> a.pop(1)              # Pop from any position
    10
    >>> a
    [5, 40, 20]
    >>> a.extend([45, 35])    # Merge lists
    >>> a
    [5, 40, 20, 45, 35]
    >>> b = a                 # a, b point to the same object
    >>> c = a[:]              # Makes a copy of a using slice
    >>> a.sort()
    >>> a
    [5, 20, 35, 40, 45]
    >>> b
    [5, 20, 35, 40, 45]
    >>> a is b                # a, b are the same object
    True
    >>> c                     # c retains the old copy
    [5, 40, 20, 45, 35]
    # Note: id(var) prints the object identifier (similar to a C pointer).
    >>> print('id(a) = %s' % id(a) ) ; print('id(b) = %s' % id(b) )
    id(a) = 45332168
    id(b) = 45332168
    >>> print('id(c) = %s' % id(c) )
    id(c) = 41829512
    >>> sorted(c)             # This built-in function leaves c unchanged.
    [5, 20, 35, 40, 45]
    >>> c
    [5, 40, 20, 45, 35]
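The remaining methods from the list above (count, index and reverse) work as shown in this small sketch. It also contrasts in-place reverse() with a reversed copy made by slicing.

```python
a = [5, 40, 20, 45, 35, 40]

print(a.count(40))      # 2: number of occurrences
print(a.index(40))      # 1: position of the first occurrence
print(45 in a)          # True: membership test

a.reverse()             # reverses in place and returns None
print(a)                # [40, 35, 45, 20, 40, 5]

b = a[::-1]             # a reversed *copy* via slicing; a is unchanged
print(b)                # [5, 40, 20, 45, 35, 40]

try:
    a.index(99)         # index() raises ValueError when the value is missing
except ValueError:
    print('99 not found')
```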
A dictionary is a built-in data type in Python that maps a set of keys to a set of values. It is similar to PHP's associative array.
A dictionary is an unordered collection of (key, value) pairs.
Dictionary keys must be hashable; for the built-in types, this means immutable values. For example, integers and strings are OK as keys, but lists are not. Tuples are allowed as keys as long as all their elements are hashable too (strings, numbers, or tuples that themselves contain only immutable elements).
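The following is a small sketch of the key rules just described. Tuples of strings and integers work as keys. A list, or a tuple that contains a list, is rejected with a TypeError. The dictionary contents are arbitrary.

```python
locations = {}
locations[('Oslo', 'Norway')] = 'capital'       # tuple of strings: hashable, OK
locations[42] = 'the answer'                    # integers are fine too

try:
    locations[['Riga', 'Latvia']] = 'capital'   # a list is mutable, so unhashable
except TypeError as exc:
    message = str(exc)                          # e.g. "unhashable type: 'list'"

try:
    locations[('Lima', ['Peru'])] = 'capital'   # tuple containing a list: rejected
except TypeError:
    nested_rejected = True

print(sorted(locations, key=str))
```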
Some examples:
    >>> capitals = { 'Spain' : 'Madrid', 'Norway' : 'Oslo',
    ...              'Latvia' : 'Riga', 'Costa Rica' : 'San Jose' }
    >>> capitals.keys()       # Note: Order not guaranteed.
    ['Costa Rica', 'Latvia', 'Norway', 'Spain']
    >>> capitals.values()
    ['San Jose', 'Riga', 'Oslo', 'Madrid']
    >>> capitals.items()
    [('Costa Rica', 'San Jose'), ('Latvia', 'Riga'), ('Norway', 'Oslo'), ('Spain', 'Madrid')]
To iterate over keys, values or items in Python 2, use iterkeys(), itervalues() and iteritems() for better efficiency, since they avoid building potentially huge lists. In Python 3, keys(), values() and items() return lightweight view objects instead of lists, so these iter methods are no longer available.
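If you want dictionary iteration that runs unchanged on both versions, items() is a safe common choice. The following sketch uses sorted() to get a predictable order and points out one visible difference between Python 3 views and Python 2 lists.

```python
capitals = {'Spain': 'Madrid', 'Norway': 'Oslo', 'Latvia': 'Riga'}

# items() exists in both versions (a list in Python 2, a view in Python 3).
# sorted() gives a predictable order, since dict order is not guaranteed.
for country, capital in sorted(capitals.items()):
    print('{}: {}'.format(country, capital))

keys = capitals.keys()
capitals['Peru'] = 'Lima'
# A Python 3 view sees the new key; a Python 2 list is a snapshot and does not.
print('Peru' in keys)

key_list = sorted(capitals)   # iterating a dict yields its keys; sorted() gives a list
```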
When the keys are strings, using keyword arguments looks better:
    >>> capitals = dict(Peru='Lima', Ukraine='Kiev')
    >>> capitals
    {'Peru': 'Lima', 'Ukraine': 'Kiev'}
Another way to construct a dictionary is to use a list of (key, value) pairs:

    >>> entries = list(capitals.items())   # list() also makes this work in Python 3
    >>> entries.append(('Portugal', 'Lisbon'))
    >>> entries
    [('Ukraine', 'Kiev'), ('Peru', 'Lima'), ('Portugal', 'Lisbon')]
    >>> capitals = dict(entries)           # Constructor accepts a list of pairs
    >>> capitals
    {'Peru': 'Lima', 'Portugal': 'Lisbon', 'Ukraine': 'Kiev'}
To add a single key value pair of ('India', 'Delhi'), you can do any one of the following:
    >>> capitals['India'] = 'Delhi'               # 1
    >>> capitals.update({ 'India': 'Delhi' })     # 2
You can construct a copy of an existing dictionary. The copy is a new, distinct object:

    >>> caps2 = dict(capitals)
    >>> id(caps2)
    45447840
    >>> id(capitals)
    44940160
You can merge another dictionary into an existing one with update():
>>> capitals.update( { 'Canada' : 'Ottawa'} )
You can get a value by its key, either with indexing or with the get() method:

    >>> capitals['Canada']
    'Ottawa'
    >>> capitals.get('Canada')
    'Ottawa'
You can remove an entry and get its value back with the pop() method. The del statement also removes an entry, but does not return anything:

    >>> capitals.pop('Canada')     # Or: del capitals['Canada'] (returns nothing)
    'Ottawa'
    >>> capitals.get('Canada', 'Missing Information')   # Get with a default.
    'Missing Information'
    # The default is returned when the entry is missing.
You can iterate over the keys:

    >>> for country in capitals:   # Same as: for country in capitals.keys():
    ...     print country
    ...
    Costa Rica
    Latvia
    Norway
    Spain
If you want to iterate through the values instead, use capitals.values() in the loop above.
You can print each key-value pair:

    >>> for country in capitals:
    ...     print country, capitals[country]

The following looks better than the above, but does the same thing:

    >>> for country, capital in capitals.items():
    ...     print country, capital
The following table summarizes the different statement types available:
Statement | Example | Description |
---|---|---|
Expression | f(a+b)+g(c+d) | |
Assignment | | |
assert | | |
pass | | Dummy statement used when a statement is required by syntax rules. |
del | | |
return | | |
yield | | |
raise | | |
break | | |
continue | | |
import | | |
global | | |
exec | | |
if | if a>b : print 'a is greater' elif a == b: print 'same' else: print 'b is greater' if x < y < z: print('yes') | |
while | while not finished: do_something() if problem: break some_more() else: final_steps() | |
for | for i in range(100): do_some_thing() if error_occured(): break do_some_more() else: finishing_touches() | |
try | try: something() except Exception as err: process_it(err) finally: do_cleanup() | |
with | with open("f.txt") as f: lines = f.readlines() process_it(lines) | |
Function definition | def f(x): return 2*x g = lambda x: 2*x | |
Class definition | class Myclass(object): def f(self, n): return n*n | |
The if-elif-else statement has the following structure:

    if a > b:
        print 'a is greater'
    elif a == b:
        print 'same'
    else:
        print 'b is greater'

    if x < y < z:
        print(' y is between x and z')

    if x != y and y != z and x != z:
        print('x, y, z are all different')
Note that you cannot write else if; it must be elif. Chained comparisons such as x < y < z are allowed, and mean the same as x < y and y < z. Note also that the logical and operator is spelled and, and the logical or operator is or.
The for loop has the following structure:

    for i in range(100):
        if some_condition:
            continue              # Skip this iteration, go to the next one.
        do_some_thing()
        if error_occured():
            break                 # Terminate the loop and skip the else clause too.
        do_some_more()
    else:
        finishing_touches()       # Runs only if the loop ended without break.
The for i in range(100) form is a common style for looping. Another common pattern uses enumerate():

    for i, v in enumerate(my_list):
        print(i, v)
    # If my_list == [10, 20, 30], Python 2 prints:
    # (0, 10)
    # (1, 20)
    # (2, 30)
    # (Python 3 prints 0 10, 1 20, 2 30.)
You can iterate through all key-value pairs of a dictionary as follows:

    # list() takes a snapshot, because the loop itself creates new names.
    for key, val in list(locals().items()):
        print(' %10s = %r ' % (key, val))    # Prints all variables in locals().
The continue statement skips the rest of the current iteration and goes to the next one, just like C's continue.
The break statement exits the loop, just like in C. It also skips the else clause of the for loop.
The else block is executed if the for loop terminates normally, without break. It can be thought of as the 'success path after the loop'. break can be thought of as the error path that exits the whole statement, including its else part.
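A typical use of the for-else form is a search loop. In this sketch, the function name `find_index` is hypothetical. The else clause supplies the 'not found' result, and it runs only when the loop finishes without break:

```python
def find_index(items, target):
    """Return the index of target in items, or -1 if it is absent."""
    for i, item in enumerate(items):
        if item == target:
            break               # Found it: skip the else clause.
    else:
        return -1               # Loop ran to completion without break.
    return i

print(find_index([10, 20, 30], 20))   # 1
print(find_index([10, 20, 30], 99))   # -1
```

Without the else clause, you would need an extra 'found' flag variable to tell the two exits apart.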
The while loop has the following structure:

    while not finished:
        do_something()
        if some_condition:
            continue              # Skip the current iteration, go to the next.
        if problem:
            break                 # Break out of the loop and skip the else clause.
        some_more()
        if all_done:
            finished = True       # Let the loop terminate gracefully.
    else:
        after_success_steps()     # Executed only if break was not called.
The continue and break statements behave just like their C namesakes. The else clause is unique to Python. To understand it, imagine that the while statement acts like an if statement the last time it tests its condition:
    if cond:                 while cond:
        ...                      ...
    else:                    else:        # when cond is False
        ...                      ...
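To see when the while-else clause runs, compare a loop that exhausts its condition with one that exits through break. This is a minimal sketch. The `countdown` function and its `stop_at` parameter are hypothetical:

```python
def countdown(n, stop_at=None):
    """Count down from n and record whether the loop ended normally."""
    seen = []
    while n > 0:
        if n == stop_at:
            break               # Leave the loop; the else clause is skipped.
        seen.append(n)
        n -= 1
    else:
        seen.append('done')     # Runs because the condition n > 0 became False.
    return seen

print(countdown(3))             # [3, 2, 1, 'done']
print(countdown(3, stop_at=2))  # [3]
```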
Simple global-level functions can be declared like below:

    def addxy(x, y):
        return x + y

    def mulxy(x, y):
        return x * y
Since functions are first class objects, they can be passed as argument:
    def opxy(f, x, y):
        return f(x, y)

    >>> opxy(addxy, 10, 5)
    15
    >>> opxy(mulxy, 10, 5)
    50
Python allows you to pass parameters as named arguments when you call functions. This provides clarity when there are many parameters:
    def subxy(x, y):
        return x - y

    >>> subxy(y=10, x=20)   # <=== OK to change the order of parameters.
    10
    >>> subxy(x=100)        # <=== Not OK: Error: y is missing
    ...
    TypeError: subxy() takes exactly 2 arguments (1 given)
We call x and y positional arguments: when the call does not name them, the first argument binds to x and the second to y. However, the same parameters become 'named arguments' when the call names them, as in the example above.
Optional arguments let you skip some arguments. Default values are then assumed for the missing parameters:

    def subxy(x=0, y=0):
        return x - y

    >>> subxy()
    0
    >>> subxy(10)
    10
    >>> subxy(10, 5)
    5
Even optional arguments can be called with 'named parameters' style:
    >>> subxy(y=10)    # x is assumed to be 0
    -10
Note
Do not confuse Named Parameter and Optional Arguments. The y=0 in the above function definition specifies optional argument. The y=0 in the function call specifies named parameter.
It is also possible to accept a variable number of arguments:
    def p(*args):
        print(args)      # <== args is a tuple of all the variable arguments!

    >>> p(10, 20, 30)
    (10, 20, 30)
This is useful when you want to be flexible in accepting any number of arguments. Example:
    # The following call may push any number of values onto your stack.
    >>> push_into_my_stack(10, 20, 30, 40)
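The text does not define push_into_my_stack(), so here is one possible sketch. It assumes the stack is a plain module-level list named `my_stack`. Both names are hypothetical:

```python
my_stack = []   # Hypothetical stack: a plain list, top of stack at the end

def push_into_my_stack(*values):
    """Push any number of values; they arrive together as a tuple."""
    for v in values:
        my_stack.append(v)
    return len(my_stack)        # Report the new stack depth

push_into_my_stack(10, 20, 30, 40)
push_into_my_stack()            # Zero arguments is fine too: values == ()
print(my_stack)                 # [10, 20, 30, 40]
```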
You can also accept keyword arguments. This allows passing an arbitrary number of key-value pairs in named-parameter style:

    def p(**kwargs):
        print(kwargs)    # <== kwargs is a dictionary!

    >>> p(x=10, y=20, z=30)
    {'y': 20, 'x': 10, 'z': 30}
Another way of achieving the above would be:

    def p(options={}):
        print(options)

    >>> p({'x':10, 'y':20, 'z':30})
    {'y': 20, 'x': 10, 'z': 30}
However, as you can see, calling the function with dictionary argument is clumsy and not as elegant as the named parameters.
You can combine all the styles mentioned earlier, as shown below. Some basic rules apply (as you may expect) to avoid ambiguity during a function call:

    def p(msg, kind='INFO', *args, **kwargs):
        print(msg)
        print(kind)
        print(args)
        print(kwargs)

    >>> p('Job Incomplete.', 'WARNING', 'Job100', 'Job200', batch=1, dest='remote')
    Job Incomplete.
    WARNING
    ('Job100', 'Job200')
    {'dest': 'remote', 'batch': 1}
Python mostly uses static (lexical) scoping. This means the variables referenced from a function are bound to the module in which the function is defined, not to the caller's module. This enables robust modular design.
Python is dynamically and strictly (strongly) typed. It is dynamic because you create a variable just by assigning something to it, and you can rebind it to an object of a different type at any time:

    age = 90                  # No need to predeclare the type.
    age = 'it is 90 years'    # Rebound to another object of another type!
It is strictly typed because every object behaves strictly according to its type, and you cannot refer to an undefined variable:

    age = 50
    age = '40'          # Rebinding to a different type is OK.
    age = age - 10      # Error! TypeError: unsupported operand type(s) for -: 'str' and 'int'
This is unlike C, where variables are statically and strictly typed. In PHP, variables are dynamically and loosely typed:

    $age = 50;
    $age = '40';
    $age = $age - 10;   // Perfectly fine. '40' becomes int 40.
    echo $age;          // Prints 30
Also note that Python variables can't be referenced before assignment; you will get a NameError. In PHP, such an access generates a notice or warning (depending on the PHP version) and evaluates to NULL, which prints as an empty string. In that sense, Python is 'more strictly' typed.
Language | Typing Static/Dynamic | Typing Strict/Loose | Comments |
---|---|---|---|
C/C++/Java | Static | Strict | |
Python | Dynamic | Strict | |
PHP/Perl | Dynamic | Loose | |
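The following Python 3 sketch reproduces both failures described above. It also shows the explicit conversion that Python expects instead of PHP's implicit one. The variable `undefined_age` is deliberately never assigned:

```python
age = '40'

try:
    age - 10                        # str minus int: no implicit conversion
except TypeError as err:
    type_error = str(err)           # unsupported operand type(s) for -: 'str' and 'int'

try:
    undefined_age + 1               # Never assigned, so the name does not exist
except NameError as err:
    name_error = str(err)           # name 'undefined_age' is not defined

age = int(age) - 10                 # Convert explicitly, then subtract
print(age)                          # 30
```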
Module-level variables are referred to as global variables. A function definition introduces a new level of scope, and so does a class definition. Namespaces in Python are currently implemented as dictionaries, but that is an implementation detail.
The available namespaces can be summarised as below:
Name Space | Comments |
---|---|
Innermost | The current scope in the innermost function. print(locals()) will print the local names. This is searched first for read/write. |
Enclosing Functions | If the current scope is an inner function, each enclosing function has a separate namespace. These are searched second, if applicable, from the innermost enclosing function outwards. The variables in these scopes are called non-local. In Python 2.x you can read non-local variables but cannot rebind them to new values. However, if the object is mutable, you can modify it in place through the same reference. In Python 3 you can rebind a non-local variable after declaring it with the new nonlocal statement. |
current module | All module-level names are searched next. These variables are referred to as globals. |
builtin names | All built-in names live in the __builtin__ module (named builtins in Python 3), which is searched last. |
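The lookup order in the table can be seen in a small Python 3 sketch. The names are illustrative. `make_counter` also shows the nonlocal declaration rebinding a variable of the enclosing function:

```python
x = 'global'

def outer():
    x = 'enclosing'
    def inner():
        return x                # Not local, so found in the enclosing scope
    return inner()

def make_counter():
    count = 0
    def increment():
        nonlocal count          # Python 3: rebind the enclosing variable
        count += 1
        return count
    return increment

print(outer())                  # enclosing
print(len('abc'))               # len is found last, in the built-in scope
counter = make_counter()
counter()
print(counter())                # 2
```

Without the nonlocal line, `count += 1` would make count local to increment() and raise UnboundLocalError. This is the same rule described for globals below.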
Python's rules for accessing global variables from inside a function are not obvious, and they often confuse Python beginners:

    gvar = 10

    # If you are only reading a global variable, there is no need to declare it global.
    def my_func():
        print('gvar is %d ' % gvar)    # <==== Prints: gvar is 10   OK!
However, the moment you assign a value to a variable anywhere in a function, that variable becomes local to the whole function unless you declare it otherwise:

    gvar = 10

    def my_func():
        print('gvar is %d ' % gvar)
        gvar = 20

    >>> my_func()
    UnboundLocalError: local variable 'gvar' referenced before assignment
To fix this error, declare gvar as global:
    gvar = 10

    def my_func():
        global gvar
        print('gvar is %d ' % gvar)
        gvar = 20
People usually expect global variables to be either always available inside a function, or never available unless explicitly declared. The rule 'a global variable is available for reading, but not for assignment' is not obvious and can confuse Python beginners. However, once you understand this rule, there is no more confusion.
A Python program is organized into a collection of modules. Each module is nothing but a Python file with a .py suffix (or a compiled .pyc file, or a .so shared object file). In addition, a directory may be used to keep a collection of modules together as a package. The import statement is used to import a module before accessing the names defined in it.
The import statement takes one of these forms:

    import mod1                    # Lets you access mod1.f()
    import mod1 as m               # Lets you access m.f() instead of mod1.f()
    from mod1 import my_var        # Lets you access my_var directly.
    from mod1 import v1, v2        # Lets you access v1, v2 directly.
    from mod1 import my_var as v   # Lets you access my_var under the alias v.
    from mod1 import *             # Imports all public names in mod1. Not recommended.
    import pkg1.mod1               # Lets you access pkg1.mod1.f()
    import pkg1.pkg2.mod1          # Lets you access pkg1.pkg2.mod1.f()
    from pkg1.mod1 import my_var   # Lets you access my_var directly.
If you use a statement of the form import X.Y, then X must be a package and Y must be a subpackage or a module. X cannot be a module, and Y cannot be an arbitrary object; in an import statement, the dotted form is reserved for packages:
import mod1.my_var # <==== This is not allowed!
The sys module contains the path list, which specifies the module search directories. To examine its value, import the sys module first:

    >>> import sys
    >>> from pprint import pprint
    >>> pprint(sys.path)
    ['',
     '/home/user/myenv/bin',
     '/home/user/myenv/local/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg',
     '/home/user/myenv/local/lib/python2.7/site-packages/pip-1.1-py2.7.egg',
     '/home/user/myenv/lib/python2.7',
     '/home/user/myenv/lib/python2.7/plat-linux2',
     '/home/user/myenv/lib/python2.7/lib-tk',
     '/home/user/myenv/lib/python2.7/lib-old',
     '/home/user/myenv/lib/python2.7/lib-dynload',
     '/usr/lib/python2.7',
     '/usr/lib/python2.7/plat-linux2',
     '/usr/lib/python2.7/lib-tk',
     '/home/user/myenv/local/lib/python2.7/site-packages',
     '/home/user/myenv/local/lib/python2.7/site-packages/IPython/extensions']
The module search path (i.e. sys.path) includes certain platform-dependent default directories. In addition, you can control the search path in any of the following ways:

- Set the PYTHONPATH environment variable to include additional directories. They are inserted into the module search path ahead of the default directories.
- Modify the module search path dynamically from within the program, before the import:

      >>> import sys
      >>> sys.path.insert(0, '/path/to/my/module')

- If you are running virtualenv, place your module in the environment's local site-packages directory, without affecting other installations.
Packages are nothing but directories containing a file named __init__.py. This file may initialize the package, or it may just be empty. Packages are organized like this:

    pkg1/                 # Top-level package
        __init__.py       # Initializes the top-level package
        pkg2/             # pkg2 is under pkg1.
            __init__.py   # Initializes pkg2.
            mod1.py       # This is module-1 in package-2.
            mod2.py       # This is module-2 in package-2.
After importing a module, you can access all of its global variables. However, there are a couple of mechanisms you can use to restrict which names are exported by from module import *:

- All names starting with an underscore (_) are excluded from the export list.
- A module can define a global variable __all__, like below, to export only the listed names. All other global names will not be exported:

      __all__ = ['mysym1', 'mysym2', 'mysym3' ]
Similarly, when you import a package, all global variables in __init__.py file or the restricted symbols defined by __all__ are available for access.
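Both export rules can be checked without creating any files. The sketch below assumes Python 3. The helper `star_import` and the module names `demo_plain` and `demo_all` are hypothetical. The helper builds a throwaway module with types.ModuleType, registers it in sys.modules, and reports what `from ... import *` brings in:

```python
import sys
import types

def star_import(name, source):
    """Create module `name` from `source`, then return what `import *` exports."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)       # Run the module body in its own namespace
    sys.modules[name] = module          # Let the import system find it
    namespace = {}
    exec('from %s import *' % name, namespace)
    return sorted(k for k in namespace if k != '__builtins__')

# Rule 1: names with a leading underscore are skipped.
print(star_import('demo_plain', 'public = 1\n_private = 2\n'))   # ['public']

# Rule 2: __all__ lists exactly what is exported.
print(star_import('demo_all', "__all__ = ['a']\na = 1\nb = 2\n"))  # ['a']
```

Note that both rules only affect `import *`. Names such as `demo_all.b` remain reachable as module attributes.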
See http://docs.python.org/ref/import.html for more information.
You can even import from the future! This helps with gradual migration: you can start using features that will only become standard in a later release:
>>> from __future__ import print_function
The above statement enables the print() function in Python 2.7 in place of the print statement. In Python 3, print is available only as a function. A future statement is special because it is recognized and handled at compile time, which is why it must appear near the top of the module.
In Python 2.7, imports are implicitly relative by default. If a module in a package imports another module, Python searches the importing module's own package first, before the other sys.path directories. This risks hiding standard modules if someone creates a local module with the same name as one of them.
X.py:
import Y # Looks into current pkg first in Python2.7, not in Python3
In Python 3, all imports are absolute by default. So, if module X in some package wants to import module Y from the same package, the following explicit relative import works:

    from . import Y     # Looks only in the current package; works in Python 2.5+ and Python 3
If module X uses functions defined in module Y and vice versa, the following mutual import is possible:

    # mod_x.py
    import mod_y      # From module X

    # mod_y.py
    import mod_x      # From module Y
In general, it is good to avoid designing modules that depend on each other, since this is often a sign of poor design. However, there are cases where it is perfectly alright to do so. Python prevents infinite recursion from mutual imports, whether direct or indirect. A module that is still being imported is already registered in sys.modules, so a second import of it simply returns the partially initialized module.
If two modules access each other's objects during module-level initialization, try delaying the import as late as possible.
For example, let us consider the following files:
    # mod_x.py
    def f_x():
        mod_y.some_func()

    def g_x():
        pass

    import mod_y          # The import of mod_y is delayed as much as possible.
    mod_y.some_func()
The mod_y.py file is given below:
    # mod_y.py
    def some_func():
        pass

    import mod_x          # The import of mod_x is delayed as much as possible.
    mod_x.g_x()
The delaying of import as mentioned above just works fine.
You can make a module both importable and executable by including some code conditionally, like below:

    if __name__ == '__main__':
        import sys
        sys.exit(main(sys.argv))    # main() is defined elsewhere in the module.
                                    # Or run some unit tests instead.
Apart from built-in types, there are many additional datatypes defined by the Python standard library. We will take a closer look at them below.
Type | Comments |
---|---|
namedtuple | Good alternative to the structure type in C: >>> import collections >>> Point = collections.namedtuple('Point', 'x y') >>> p = Point(x=10, y=20) >>> p Point(x=10, y=20) >>> p.y 20 |
deque | A double-ended queue, pronounced 'deck'. It is like the built-in list type, but optimized for insert/delete operations at both the front and the back. Inserting at the front of a built-in list takes time proportional to the list's length, while a deque does it in constant time. |
Counter | A Counter is a dict subclass for counting hashable objects: >>> from collections import Counter >>> fruits = ['apple','apple','orange','grapes','grapes','apple'] >>> c = Counter(fruits) >>> c Counter({'apple': 3, 'grapes': 2, 'orange': 1}) |
OrderedDict | Like dict, but it remembers the order in which entries were inserted. To keep the entries sorted by key, insert them in sorted order, e.g. OrderedDict(sorted(d.items())). |
defaultdict | A subclass of the built-in dict with one difference: if you access a missing key, it does not raise KeyError. Instead, it inserts and returns a 'default' value, created by a factory function you choose: >>> from collections import defaultdict >>> grades = [('Ram', 'B'), ('John', 'A'), ('Jack', 'B')] >>> d = defaultdict(list) # Default is an empty list. >>> for name, rating in grades: ... d[rating].append(name) >>> d.items() [('A', ['John']), ('B', ['Ram', 'Jack'])] |
collections ABC | Defines various collection abstract base classes, including Container, Iterable, Iterator, etc. These are useful to check whether an instance implements a specific interface, for example: isinstance(myvar, collections.Sequence) (in Python 3.3+, use collections.abc.Sequence). They are also useful for defining custom collections, where you can leverage the mixin methods implemented in the ABC classes. |
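The deque row above can be illustrated with a short Python 3 sketch, using illustrative queue contents. appendleft() and popleft() work at the front in constant time. By contrast, list.insert(0, x) and list.pop(0) have to shift every element:

```python
from collections import deque

q = deque([20, 30])
q.appendleft(10)        # Add at the front: deque([10, 20, 30])
q.append(40)            # Add at the back:  deque([10, 20, 30, 40])

first = q.popleft()     # Remove from the front -> 10
last = q.pop()          # Remove from the back  -> 40
print(first, last, list(q))   # 10 40 [20, 30]

recent = deque(maxlen=3)      # A bounded deque keeps only the newest items
for i in range(5):
    recent.append(i)
print(list(recent))           # [2, 3, 4]
```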
For more information about other datatypes defined by Python Standard Library, See http://docs.python.org/2/library/datatypes.html
You can iterate over any container objects such as list, set, etc. A container object may or may not be a sequence. A set is a container but not a sequence.
Iterator is an object which helps with efficient iteration over any container. It is somewhat similar to C language's pointer pointing to an array element -- in the sense that it consumes minimal resources and just remembers the position in the container.
How is this concept helpful? Once a container follows the iterator protocol, it becomes an iterable object. For example:

    for e in my_obj:
        print('e is %r' % e)

As long as my_obj follows the iterator protocol, the above will work. In essence, the following methods must be provided:

    container.__iter__() : Returns a new iterator object.
        >>> x = [10, 20]
        >>> t1 = x.__iter__()
        >>> t2 = x.__iter__()      # or: t2 = iter(x)
        >>> t1 is t2               # They must be 2 separate instances.
        False

    iterator.__next__() : Returns the next value (this method is named next() in Python 2).
                          On reaching the end, it raises the StopIteration exception.
        >>> next(t1)               # The built-in next() works in Python 2.6+ and 3.
        10
        >>> next(t1)
        20
        >>> next(t1)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        StopIteration

    iterator.__iter__() : Returns the iterator itself.
        >>> t3 = t1.__iter__()
        >>> t3 is t1
        True
So, a container is iterable and can have multiple iterators pointing to it. The idea is that container does not preserve the state of iteration. If you want to iterate, get a new instance of iterator using the container, then do whatever you want with the iterator.
The iterator protocol is cool, because we can use for loop like below irrespective of whether obj is container object or iterator:
    for e in obj:
        do_some_thing(e)

The for loop calls __iter__() and __next__() under the hood, and stops the iteration once it gets the StopIteration exception. You can easily define your own container classes, as long as you follow this protocol.
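Following the protocol above, here is a minimal sketch of a custom container. `Countdown` and `CountdownIterator` are hypothetical names. The container hands out a fresh iterator from __iter__(), and the iterator keeps track of the position. The `next = __next__` alias lets the same code run on Python 2:

```python
class Countdown(object):
    """Container: counts down from start to 1. Holds no iteration state."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)   # A new iterator every time


class CountdownIterator(object):
    """Iterator: remembers the current position."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self                             # Iterators return themselves

    def __next__(self):
        if self.current <= 0:
            raise StopIteration                 # Signals the end to for loops
        value = self.current
        self.current -= 1
        return value

    next = __next__                             # Python 2 spelling


c = Countdown(3)
print(list(c))             # [3, 2, 1]
print(list(c))             # [3, 2, 1] again: the container kept no state
it = iter(c)
print(next(it), next(it))  # 3 2
```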
Note
The itertools module provides powerful functions such as imap, islice, izip, cycle, combinations and permutations, all of which return iterators. (In Python 3, imap and izip are gone, because the built-in map and zip already return iterators.) This enables efficient traversal of iterables, processing one element at a time.
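Here are a few of these functions in action, as a Python 3 sketch. imap and izip are omitted because they exist only in Python 2:

```python
from itertools import combinations, count, cycle, islice, permutations

# count() is infinite; islice() lazily takes just the first 5 values.
first_five = list(islice(count(10), 5))
print(first_five)                           # [10, 11, 12, 13, 14]

# cycle() repeats forever, so again only a slice is consumed.
print(list(islice(cycle('AB'), 5)))         # ['A', 'B', 'A', 'B', 'A']

pairs = list(combinations([1, 2, 3], 2))
print(pairs)                                # [(1, 2), (1, 3), (2, 3)]
print(len(list(permutations([1, 2, 3]))))   # 6
```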
Python's classes are somewhat similar to those of C++ and Modula-3. If you are already familiar with C++, note that in Python all data members are public and all methods are virtual.
The method functions must be declared explicitly with the first argument representing the object-- the current object is often named as self by convention. This is similar to the this object of C++ or Java:
    class Vehicle:
        """Generic Vehicle Class"""
        x = y = 0                    # Class data member variables

        def __init__(self, year):
            self.year = year

        def move(self, x, y):
            self.x = x
            self.y = y
            print('Vehicle moved to (%d, %d)' % (x, y))
To create an instance of a class, use function notation. There is no need for the new operator used in C++ or Java:

    >>> v = Vehicle(2013)
    >>> v.move(10, 20)
    Vehicle moved to (10, 20)
In above example, x, y are class variables which are inherited by the instances. In addition, you can create new instance variables anytime:
    v.model = 'Honda'    # Created on assignment
    ...
    del v.model          # The instance variable is undefined now.
Note that v.move(10, 20) is same as Vehicle.move(v, 10, 20).
A method combines the function object, class object and class instance.
In the above example, v.move is a bound method:
>>> v.move <bound method Vehicle.move of <__main__.Vehicle instance at 0x26f1cb0>>
However, Vehicle.move is an unbound method, i.e. it is not bound to any instance. (In Python 3, Vehicle.move is simply a plain function.)
>>> Vehicle.move <unbound method Vehicle.move>
The underlying function object of the method is available for examination:
>>> v.move.__func__ <function move at 0x2746140>
The static method concept is the same as in C++/Java: a static method is independent of any instance. This may be useful to group some logically related functions:

    class Test(object):
        @staticmethod                 # <=== Note the decorator!
        def setup_tests():
            print("Setup done for all tests")

    >>> t = Test()
    >>> t.setup_tests()
    Setup done for all tests
staticmethod is a built-in function that is used as a function decorator, written as @staticmethod just before the function definition. The effect of the decorator is the same as the following:

    class Test(object):
        def setup_tests():
            print("Setup done for all tests")
        setup_tests = staticmethod(setup_tests)   # <== Same as using the decorator.
In addition, Python has class methods, which are not found in C++/Java. A class method receives the class itself, not an instance, as its first argument. A classic use case is providing alternate constructors:

    class Test(object):
        def __init__(self):
            print("Common initialization steps done.")

        @classmethod
        def remote_test(cls):
            c = cls()
            print("Additional initialization for remote tests done.")
            return c

    >>> t = Test()
    Common initialization steps done.
    >>> t = Test.remote_test()
    Common initialization steps done.
    Additional initialization for remote tests done.
One of the ways a singleton may be implemented is by defining a getinstance class method which returns a single instance.
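Here is one possible sketch of such a singleton. It assumes a hypothetical `Config` class that caches its instance in a class attribute named `_instance`. Nothing stops callers from calling Config() directly, so this is a convention rather than an enforced rule:

```python
class Config(object):
    _instance = None                 # Shared cache, stored on the class

    def __init__(self):
        self.settings = {}

    @classmethod
    def getinstance(cls):
        """Create the instance on first use, then always return the same one."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


a = Config.getinstance()
b = Config.getinstance()
a.settings['debug'] = True
print(a is b)                        # True
print(b.settings)                    # {'debug': True}
```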
There is a built-in class called object, from which every top-level class should derive. If a class extends any built-in type (such as list), it already indirectly derives from object. In Python 2, top-level classes that do not derive from object are called old-style classes.
New-style classes were introduced in Python 2.2 in order to unify built-in types and classes. Their declarations look like the first two below, compared with an old-style class:

    class NewStyleClass1(object):
        pass

    class NewStyleClass2(Othernewstyleclass):
        pass

    class OldStyleClass:
        pass
In Python 3, all classes are new-style. i.e. No need to explicitly derive from object-- they are all implicitly derived from object.
Only new-style classes can use new features like descriptors.
For more information, See http://www.python.org/doc/newstyle/
The __init__() method defined in a class is the constructor. Note that the base class constructor is not called automatically; it must be called explicitly:

    class Parent(object):
        def __init__(self):
            print('Parent initialized')

    class Child(Parent):
        def __init__(self):
            print('Child initialized')

    c = Child()       # Prints just: Child initialized
To ensure base class constructor is called, do the following:
    class Child(Parent):
        def __init__(self):
            Parent.__init__(self)        # <== Works even with old-style classes.
            print('Child initialized')
With new style classes, you can use super() function instead of explicitly referring to the base class:
    class Child(Parent):
        def __init__(self):
            super(Child, self).__init__()   # <== super() is available only with new-style classes
            print('Child initialized')

Note the arguments of super(Child, self). The Child argument is not redundant given self, because self could actually be an instance of a GrandChild class.
Python 3 supports a shorter form of super() that does not require you to specify the class and the object explicitly. The missing arguments are filled in automatically from the context of the method being executed:

    class Child(Parent):
        def __init__(self):
            super().__init__()           # <== Short form of super() being used.
            print('Child initialized')
In addition, the __new__() method is called first to create a new instance, before __init__() initializes it. A user-defined __new__() method is rarely needed:

    class A(object):
        def __new__(cls, x):
            print('A __new__() called')
            return object.__new__(cls)

        def __init__(self, x):
            self.x = x
            print('A __init__() called')

    >>> a = A(10)
    A __new__() called
    A __init__() called
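This is useful only when you want to control the creation of new instances, for example when subclassing an immutable built-in type such as int or str, whose value must be fixed at the moment the object is created.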
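A common case is subclassing an immutable type such as int. An int's value is fixed at creation, so a subclass that adjusts the value must do so in __new__(); by the time __init__() runs, it is too late. `PositiveInt` is a hypothetical name used only for this sketch:

```python
class PositiveInt(int):
    """An int that clamps negative values to zero at creation time."""
    def __new__(cls, value):
        value = max(0, int(value))
        # int.__new__ is where the integer value is actually set.
        return super(PositiveInt, cls).__new__(cls, value)


print(PositiveInt(5))            # 5
print(PositiveInt(-3))           # 0
print(PositiveInt(-3) + 2)       # 2: still behaves like an int
```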
You can clean up an instance in the destructor method __del__():

    class Test(object):
        def __init__(self, outfilename):
            self.fout = open(outfilename, 'w')

        def __del__(self):
            print('Deleting Test object ...')
            self.fout.close()

    >>> t = Test('/tmp/mytest.out')
    >>> del t            # Explicit delete of the only reference!
    Deleting Test object ...
    >>> t                # t is undefined after delete!
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 't' is not defined
    >>> t = Test('/tmp/mytest.out')
    >>> t = 10           # Destructor called once the last reference is gone!
    Deleting Test object ...
Chaining destructors and using super() work the same way as for constructors. However, you clean up the child's resources first, before calling the parent's destructor.
You can extend multiple classes if so desired -- i.e. Multiple inheritance is supported:
    >>> class Car(Vehicle, Product):
    ...     no_of_wheels = 4
    ...
    >>> Car.mro()      # Displays the method resolution order
    [__main__.Car, __main__.Vehicle, __main__.Product, builtins.object]
Multiple inheritance should be avoided if possible, as it complicates the use of super(). A super() call actually invokes the method of the next class in the MRO, which is not necessarily the base class written in the class definition. This may lead to surprises. Either use super() consistently in all the base classes, or do not use it at all and favor explicit invocation.
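The sketch below shows what 'next in the MRO' means. It uses hypothetical classes Base, Left, Right and Bottom that form a diamond, and assumes Python 3 syntax. The super() call inside Left reaches Right, a class that Left does not even inherit from:

```python
class Base(object):
    def hello(self):
        return ['Base']

class Left(Base):
    def hello(self):
        return ['Left'] + super().hello()

class Right(Base):
    def hello(self):
        return ['Right'] + super().hello()

class Bottom(Left, Right):
    def hello(self):
        return ['Bottom'] + super().hello()


print([cls.__name__ for cls in Bottom.mro()])
# ['Bottom', 'Left', 'Right', 'Base', 'object']
print(Bottom().hello())
# ['Bottom', 'Left', 'Right', 'Base']
print(Left().hello())
# ['Left', 'Base']: without Bottom, Left's super() goes straight to Base
```

Because every class calls super(), Base.hello() runs exactly once. If one class called its base explicitly instead, Right could be skipped or Base could run twice.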
In Python, everything is an object. Even the fundamental built-in types are objects including int, float, etc. This is different from Java and C++ where primitive datatypes like int, float are not objects.
An int variable is bound to an integer object of type 'int'. The most fundamental thing in the type hierarchy is 'type'. Everything is directly or indirectly an instance of 'type'. The 'type' itself is an object of type 'type'. The buck stops there:
    >>> a = 10
    >>> type(a)
    int
    >>> type(int)
    type
    >>> type(type)
    type
    >>> type(list)
    type
Any class is also an object of some type -- which is a metaclass. The 'type' type is a metaclass since an instance of 'type' is another type. You can also define your own metaclass which is another topic for discussion.
Certain operations or built-in functions may generate exceptions. You can generate exceptions or catch specific exceptions for error handling:
    >>> 1/0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ZeroDivisionError: integer division or modulo by zero
    >>> f = open('/tmp/non-existing-file', 'r')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: '/tmp/non-existing-file'
    >>> x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'x' is not defined
You can use try-except block to catch the exceptions generated in try-block:
    try:
        something()
    except ExceptionClass as exception_obj:
        process_exception(exception_obj)
        raise       # A bare raise re-raises the exception and keeps the original traceback.
The legacy syntax alternative for except is as follows:
    except ExceptionClass, exception_obj:    # Works with Python 2, not with 3.
        ...
Since the above syntax is confusing, it was removed in Python 3 in favour of the form below:

    except ExceptionClass as exception_obj:  # Works with Python 2.6+ and 3+.
        ...                                  # Python 3 allows only this syntax.
You can catch multiple exceptions as given below. The following code works with Python 3:
    >>> while True:
    ...     s = input('Enter expr > ')        # Use raw_input() in Python 2 instead.
    ...     try:
    ...         eval(s)
    ...     except ZeroDivisionError as err:  # 1/0
    ...         print('Got ZeroDivisionError:', err)
    ...     except ValueError as err:         # int('something')
    ...         print('Got ValueError:', err)
    ...     except NameError as err:          # undefined_var
    ...         print('Got NameError:', err)
    ...     except Exception as err:          # e.g. 'some' + 3 (TypeError)
    ...         print('Got Exception:', err)
    ...
    Enter expr > x
    Got NameError: name 'x' is not defined
    Enter expr > 1/0
    Got ZeroDivisionError: division by zero
    Enter expr > int('some')
    Got ValueError: invalid literal for int() with base 10: 'some'
    Enter expr > 'some' + 3
    Got Exception: Can't convert 'int' object to str implicitly
    Enter expr > a++
    Got Exception: unexpected EOF while parsing (<string>, line 1)
To examine the exception hierarchy, use the mro() method:
    >>> NameError.mro()
    [builtins.NameError, builtins.Exception, builtins.BaseException, builtins.object]
    >>> KeyboardInterrupt.mro()
    [builtins.KeyboardInterrupt, builtins.BaseException, builtins.object]
The order of except clause is important. The specific exceptions should appear first and more general base exceptions should appear later in that order. If the except Exception as err: had appeared first, it would catch other exceptions such as NameError, ValueError since those exceptions are derived from Exception. It is in general a dangerous practice to catch basic exception classes like Exception unless you really mean to-- because this can hide the real problem if you are not re-raising the exception that you really didn't intend to handle.
You can define your own exceptions for finer level of error handling:
    >>> class MyAppError(Exception):
    ...     def __init__(self, reason):
    ...         self.reason = reason
    ...
    # ... later, somewhere in the application code:
    if value > max_value:
        raise MyAppError('Got a value which is too large.')
The try-finally clause is used to make sure the cleanup action specified in finally block is always executed in any case:
    >>> try:
    ...     some_function()
    ... except MyAppError as err:
    ...     ...
    ... finally:
    ...     print('closing any open files ...')
    ...
The code in the finally block is always executed, whether or not an exception was raised and whether or not it was handled. Even if the try block executes a return statement, the finally clause is still executed. If an exception raised in the try block is not handled, it is re-raised after the finally block has executed.
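The following Python 3 sketch checks both claims. The finally block runs even when the try block returns, and an unhandled exception propagates only after finally has run. The function `risky` and the `log` list are hypothetical:

```python
log = []

def risky(fail):
    try:
        if fail:
            raise ValueError('bad input')
        return 'result'             # finally still runs before the return completes
    finally:
        log.append('cleanup')       # Always executed

print(risky(False))                 # result
try:
    risky(True)                     # ValueError is re-raised after finally
except ValueError as err:
    print('caught:', err)           # caught: bad input
print(log)                          # ['cleanup', 'cleanup']
```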
Before Python 2.5, a combined try-except-finally statement was not supported; instead, the try..except had to be nested inside a try..finally to achieve the same result:
    >>> try:
    ...     try:
    ...         some_function()
    ...     except MyAppError as err:    # Python < 2.6 spelled this: except MyAppError, err:
    ...         ...
    ... finally:
    ...     print('closing any open files ...')
    ...
The finally clause can also appear in the try-except-else-finally form, where the else clause is executed only if no exception was raised in the try block. However, it is best to avoid the else clause, since its name is often more confusing than helpful.
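Even if you choose to avoid else, it helps to know exactly when each clause runs. This hypothetical run() helper (a sketch, Python 3) traces the clauses for a division that either succeeds or fails:

```python
def run(divisor):
    """Hypothetical helper: record which clauses of the try statement run."""
    trace = []
    try:
        trace.append('try')
        1 / divisor
    except ZeroDivisionError:
        trace.append('except')
    else:
        trace.append('else')       # runs only if the try block raised nothing
    finally:
        trace.append('finally')    # runs in every case
    return trace


print(run(1))   # ['try', 'else', 'finally']
print(run(0))   # ['try', 'except', 'finally']
```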
Python supports the functional programming style to a large extent. In particular, functions are first-class objects: they can be passed as arguments to other functions and manipulated like any other variable.
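A brief sketch of functions being treated as ordinary values. shout(), whisper() and apply_twice() are hypothetical helpers made up for this example:

```python
def shout(text):
    return text.upper() + '!'


def whisper(text):
    return text.lower() + '...'


def apply_twice(func, value):
    """Takes a function as an ordinary argument and calls it twice."""
    return func(func(value))


styles = {'loud': shout, 'quiet': whisper}   # functions stored in a dict like any value
speak = styles['loud']                       # and bound to another name

print(speak('hello'))                # HELLO!
print(apply_twice(whisper, 'Hi'))    # hi......
```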
However, it is worth noting that Python is not a functional-first language like Lisp, let alone a pure functional language like Haskell. Pure functional languages avoid state and mutable data as much as possible, and arrive at the solution by applying a composition of functions to the input data rather than by following the imperative style.
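The built-in functions filter(), map() and reduce() are useful functional programming tools that can be applied to collection types such as list and set. The following example is for Python 2. In Python 3, map() and filter() return lazy iterators (wrap them in list() to see the values), reduce() has moved to the functools module, and map(None, ...) is no longer supported.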
    >>> def my_filter(x): return (x % 3) != 0     # Use it to filter out multiples of 3.
    >>> filter(my_filter, range(20))
    [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
    >>> def mul2(x): return 2*x
    >>> map(mul2, range(10))                    # Applies func on the sequence
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    >>> def sumxy(x, y): return x + y           # Function can take many args.
    >>> map(sumxy, range(5), range(100,105))    # Pass as many lists as func args.
    [100, 102, 104, 106, 108]
    >>> map(None, range(5), range(100,105))     # None acts like identity function.
    [(0, 100), (1, 101), (2, 102), (3, 103), (4, 104)]
    >>> reduce(sumxy, range(5))                 # Does cumulative sum.
    10
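For comparison, here is a sketch of the same session rewritten for Python 3. It assumes sequences of equal length, since zip(), which replaces map(None, ...), stops at the shortest input, while Python 2's map(None, ...) padded the shorter ones with None:

```python
from functools import reduce   # reduce() lives in functools in Python 3


def my_filter(x):
    return (x % 3) != 0          # keep values that are not multiples of 3


def mul2(x):
    return 2 * x


def sumxy(x, y):
    return x + y


not_mult3 = list(filter(my_filter, range(20)))              # list() forces the lazy iterator
doubled = list(map(mul2, range(10)))
pairwise_sums = list(map(sumxy, range(5), range(100, 105)))
pairs = list(zip(range(5), range(100, 105)))                # replaces map(None, ...)
total = reduce(sumxy, range(5))                              # ((((0+1)+2)+3)+4)

print(not_mult3)       # [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
print(pairwise_sums)   # [100, 102, 104, 106, 108]
print(total)           # 10
```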
List comprehensions, a feature inspired by Haskell, provide an easier way to construct lists without having to use map and filter. Reusing the previous example, to construct a list of the integers in a specific range excluding multiples of 3, we can write:
    >>> [i for i in range(20) if i % 3 != 0]
    [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
    >>> # You can also generate a list of pairs like below ...
    >>> [(i, 2*i) for i in range(6) if i % 3 != 0]
    [(1, 2), (2, 4), (4, 8), (5, 10)]
Dictionary comprehensions are available in Python 2.7+ and Python 3. They work just like list comprehensions, but build a dictionary. For example, to invert the mapping of a given dictionary, you can write:
>>> reverse_map = { val:key for key, val in my_map.items() }
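A runnable version with sample data (my_map and dup_map are made up for illustration; the printed order assumes Python 3.7+, where dicts keep insertion order). If two keys share a value, only one of them survives the inversion:

```python
my_map = {'one': 1, 'two': 2, 'three': 3}   # sample data for illustration

reverse_map = {val: key for key, val in my_map.items()}
print(reverse_map)   # {1: 'one', 2: 'two', 3: 'three'}

# Duplicate values collide: later items overwrite earlier ones.
dup_map = {'a': 1, 'b': 1}
print({val: key for key, val in dup_map.items()})   # {1: 'b'}
```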
A lambda expression creates a simple anonymous function:
    >>> f = lambda x: 2*x
    >>> f(10)
    20
    >>> list(map(f, range(4)))     # list() is needed in Python 3, where map() is lazy
    [0, 2, 4, 6]
    >>> # lambda is mainly used for one-off use on-the-fly
    >>> list(map(lambda x: 2*x, range(4)))
    [0, 2, 4, 6]
A lambda function cannot contain multiple statements or an explicit return statement; it should just compute a simple expression. Lambda functions also support closures: a lambda remembers the variables of its enclosing (lexical, compile-time) scope, which remain available for reading inside the function. Closures are a very powerful mechanism for generating context-specific functions dynamically.
A lambda is less powerful than a general inner function defined with def, yet it is very useful, since small single-expression functions are needed so often.
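A sketch of a closure built with lambda (Python 3; make_multiplier() is a hypothetical helper). The second half shows a detail worth knowing: a closure remembers the variable itself, not its value at the moment the lambda was created:

```python
def make_multiplier(factor):
    """Return a lambda that remembers 'factor' from this enclosing scope."""
    return lambda x: factor * x


double = make_multiplier(2)
triple = make_multiplier(3)
print(double(10), triple(10))   # 20 30

# All three lambdas share the same variable i, which ends up as 2.
late = [lambda x: i * x for i in range(3)]
print([f(10) for f in late])    # [20, 20, 20]

# A default argument captures the current value at definition time instead.
early = [lambda x, i=i: i * x for i in range(3)]
print([f(10) for f in early])   # [0, 10, 20]
```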
From Python's documentation:
Abstract Base Classes (abbreviated ABCs) complement duck-typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy. Python comes with many builtin ABCs for data structures (in the collections module), numbers (in the numbers module), and streams (in the io module). You can create your own ABC with the abc module.
Python does not have interfaces like Java does. Abstract classes and interfaces are not the same thing, but they overlap significantly.
Consider:
    class Vehicle():
        def __init__(self, model):
            self.model = model

    class Car(Vehicle):
        def __init__(self, model):
            self.model = model
To declare an abstract method move(), you can write:
    import abc

    class Vehicle(object):
        # Python 2 syntax. Python 3 ignores this attribute; there, write
        # class Vehicle(metaclass=abc.ABCMeta): instead.
        __metaclass__ = abc.ABCMeta

        @abc.abstractmethod
        def move(self, distance):
            """Implement your own move method. This is an abstract method!"""
            return

    >>> v = Vehicle()
    TypeError: Can't instantiate abstract class Vehicle with abstract methods move
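Because Python 3 ignores the __metaclass__ attribute, the class above would not be abstract there. A minimal Python 3 sketch of the same idea follows; the Car implementation and its move() return value are made up for illustration:

```python
import abc


class Vehicle(abc.ABC):            # Python 3.4+; equivalently metaclass=abc.ABCMeta
    def __init__(self, model):
        self.model = model

    @abc.abstractmethod
    def move(self, distance):
        """Implement your own move method. This is an abstract method!"""


class Car(Vehicle):
    def move(self, distance):      # implementing move() makes Car concrete
        return '%s drove %d km' % (self.model, distance)


try:
    Vehicle('generic')
except TypeError as err:
    print('TypeError:', err)       # abstract classes cannot be instantiated

print(Car('Mini').move(5))         # Mini drove 5 km
```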
The abc module is available only from Python 2.6 onward. There is also an independent, optional package, http://pypi.python.org/pypi/zope.interface, which you can use for contract-programming-style implementations.
First, let us fix some basic terminology that we will use when discussing Python's internationalization support.
Term | Comments |
---|---|
ASCII | A basic 7-bit character set based on the English alphabet. It defines 128 characters, including some non-printable control characters. The ASCII code of a character is the numeric value assigned to it. Implementations of 7-bit ASCII use 1 byte per character (with the 8th bit set to 0). So, if a string contains any byte whose value is > 127, it is not a valid ASCII string. |
ISO-8859-1 | An 8-bit character set that defines 256 characters and is a superset of ASCII. The additional 128 characters include accented Western European letters and various symbols. The ISO-8859-1 encoding simply stores each character as the single byte whose numeric value equals the character code. So, even a random sequence of garbage bytes is technically a valid ISO-8859-1 string. |
ISO-8859-X | ISO-8859-1 (aka Latin-1) did not cover all Western and other languages. A series of character sets, ISO-8859-1, ISO-8859-2, etc., was defined, each covering specific language(s). They all share ASCII in the lower 128 values but are mutually incompatible in the upper 128, because you can't fit everything into 1 byte. |
unicode | Currently defines more than 100,000 characters, and its code space has room for more than 1 million (1,114,112 code points). The first 256 code points are identical to ISO-8859-1. To store a unicode string in a file, we need an encoding mechanism, because the 'one byte is one character' idea no longer works. |
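A few of the table's claims can be checked directly in Python 3. This sketch relies only on the built-in 'iso-8859-1' and 'ascii' codecs:

```python
# Every byte value 0..255 decodes under ISO-8859-1 (Latin-1),
# so arbitrary bytes are always "valid" Latin-1 text.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode('iso-8859-1')

# Each byte maps to the unicode code point with the same number.
same_numbers = all(ord(ch) == b for ch, b in zip(latin1_text, all_bytes))

# ASCII only covers 0..127, so a byte above 127 is rejected.
try:
    b'caf\xe9'.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(latin1_text), same_numbers, ascii_ok)   # 256 True False
```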
A naive way of encoding unicode characters would be to say 'I will use 3 bytes per character', which can represent 256*256*256 (more than 16 million) different characters. However, this wastes both time and space when most of the frequently used characters could be represented in 1 or 2 bytes. Hence different encoding mechanisms were invented to optimize this.
The most popular encoding for unicode is UTF-8. If a unicode string contains only ASCII characters, its UTF-8 encoding is byte-for-byte identical to its ASCII encoding. Other characters are encoded as multi-byte sequences, a sort of escape sequence. That is an oversimplification, but you get the idea. In the worst case, it takes four UTF-8 bytes to represent a single unicode character.
Another good property of UTF-8 is that you will never see an ASCII byte in the encoded string that was not an ASCII character in the source string. That is, if a non-ASCII unicode character is encoded as, say, 3 bytes of UTF-8, none of those bytes will be ASCII (each has a byte value >= 128). This is by design. So, if you search for an ASCII word in a file that stores UTF-8 encoded unicode contents, you won't get any false positives! This also means that a readline() function, which looks for the newline character, works fine with UTF-8 encoded contents.
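A quick Python 3 sketch comparing UTF-8 with a fixed-width encoding. UTF-32 (4 bytes per character) stands in here for the naive fixed-size scheme; the sample characters are arbitrary:

```python
samples = ['A', '\u00e9', '\u265a', '\U0001F600']   # A, é, ♚, 😀

for ch in samples:
    utf8 = ch.encode('utf-8')
    utf32 = ch.encode('utf-32-be')     # fixed width: always 4 bytes per character
    print('U+%04X' % ord(ch), len(utf8), len(utf32))
# U+0041 1 4
# U+00E9 2 4
# U+265A 3 4
# U+1F600 4 4

text = 'hello world'
print(len(text.encode('utf-8')), len(text.encode('utf-32-be')))   # 11 44
print(text.encode('utf-8') == text.encode('ascii'))                # True
```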
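A small Python 3 sketch of this property; the sample text is arbitrary:

```python
text = 'na\u00efve caf\u00e9 \u265a\nnext line'
data = text.encode('utf-8')

# Every byte that belongs to a multi-byte sequence is >= 0x80.
multibyte = ''.join(ch for ch in text if ord(ch) > 127).encode('utf-8')
print(all(b >= 0x80 for b in multibyte))   # True

# So searching the raw bytes for ASCII finds only real ASCII characters,
# and splitting on b'\n' cuts exactly at the real line breaks.
print(data.find(b'caf'))                   # 7
lines = [line.decode('utf-8') for line in data.split(b'\n')]
print(len(lines))                          # 2
```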
The following discussion applies to Python 2.7. Python 3 changes some of this, as we will see later.
By default, a string object of type str contains a sequence of raw bytes. It is up to you how to interpret them.
A unicode string is an abstract concept; a sequence of bytes is a concrete one. You never write a unicode string as such: you always write an encoded (e.g. UTF-8) unicode string. Similarly, you never read a unicode string directly: you read a bunch of bytes and decode them to create a unicode string:
    my_str.decode('utf-8')      --> my_unicode
    my_unicode.encode('utf-8')  --> my_str
Python supports unicode strings natively. Try this on your terminal:
    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'
    >>> sys.stdout.isatty()
    True
    >>> u_king = u'\u265a \u2764 \u265b'
    >>> print(u_king)
    ♚ ❤ ♛
    >>> len(u_king)
    5
Your terminal has probably been opened in 'UTF-8' encoding already (the most popular choice). This means that if you write a unicode string to sys.stdout, it is translated to a UTF-8 byte sequence before being written. If your terminal understands this encoding, you will see the nice 'king loves queen' symbols. Note that the string length is 5: the 3 exotic symbols plus 2 spaces.
Let us convert this unicode string to a UTF-8 encoded string, so that it can be safely written to a file:
    >>> b_king = u_king.encode('utf-8')
    >>> b_king
    '\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
    >>> len(b_king)
    11
As you can see, the U+265A unicode king symbol was encoded as a 3-byte UTF-8 sequence. You can safely write this to a file in binary mode, read it back later, and convert it back to unicode using the same encoding:
    >>> fout = open('out-bin', 'wb')
    >>> fout.write(b_king)
    >>> fout.close()
    >>> fin = open('out-bin', 'rb')
    >>> asc_in = fin.read()
    >>> uni_in = asc_in.decode('utf-8')
    >>> print(uni_in)
    ♚ ❤ ♛
You can automate this encoding and decoding if the input/output stream knows what you really want to do with the bytes. The codecs module does this:
    >>> import codecs
    >>> fin = codecs.open("out-bin", "r", "utf-8")
    >>> s = fin.read()
    >>> s
    u'\u265a \u2764 \u265b'
    >>> type(s)
    unicode
Notice that read() returned a unicode object directly: the stream decoded the UTF-8 bytes for you, without any explicit decode() call.
A similar translation happens when you open a file for writing: you can write unicode strings directly to the output stream, and they are automatically converted to a UTF-8 encoded byte stream.
Tip
The codecs module can help you to automate encoding/decoding of unicode and other character sets between data streams.
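A Python 3 sketch of the same stream-level translation, using codecs.getwriter() and codecs.getreader() to wrap an in-memory io.BytesIO buffer instead of a real file:

```python
import codecs
import io

raw = io.BytesIO()                        # in-memory byte stream standing in for a file
writer = codecs.getwriter('utf-8')(raw)   # wraps it so text is encoded on write
writer.write('\u265a \u2764 \u265b')
encoded = raw.getvalue()                  # the UTF-8 bytes that reached the stream

reader = codecs.getreader('utf-8')(io.BytesIO(encoded))
decoded = reader.read()                   # bytes decoded back to a text string

print(encoded)              # b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
print(decoded == '\u265a \u2764 \u265b')   # True
```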
Now let us look at how the string types changed in Python 3. Unicode has become the default string type: the str type now holds unicode text, so the type name 'unicode' is gone in Python 3. The raw-bytes role of the old Python 2 'str' type is taken over by the 'bytes' type in Python 3.
Some Python 3 examples dealing with strings:
    >>> type('hello')
    <class 'str'>
    >>> type(b'hello')
    <class 'bytes'>
    >>> u'hello'      # SyntaxError in Python 3.0-3.2; accepted again since Python 3.3
In Python 3, you can also specify the encoding directly when opening a file:
    >>> fout = open('out-bin', 'wt', encoding='utf-8')
    >>> u_raja = '\u265a \u2764 \u265b'
    >>> type(u_raja)
    builtins.str
    >>> fout.write(u_raja)
    5
    >>> fout.close()
Now open another terminal and examine the file contents:
    $ od -x out-bin
    0000000 99e2 209a 9de2 20a4 99e2 009b
    0000013
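The od -x command displays the file as 2-byte hexadecimal words in the machine's native byte order, so on a little-endian machine (such as x86) each word shows its second byte first. Hence the first 3 bytes of the file are 0xe2 0x99 0x9a, which is the UTF-8 encoding of the unicode king symbol \u265a. (The final offset, 0000013, is in octal: the file is 11 bytes long.) Now read the contents back: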
    >>> fin = open('out-bin', 'rt', encoding='utf-8')
    >>> raja_in = fin.read()
    >>> type(raja_in)
    builtins.str
    >>> print(raja_in)
    ♚ ❤ ♛
Let us directly encode the string to utf-8:
    >>> b_str = raja_in.encode('utf-8')
    >>> b_str
    b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
    >>> type(b_str)
    builtins.bytes
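The whole round trip can be sketched as one self-contained Python 3 script. It writes into a temporary directory rather than the current one, and reads the file back in binary mode as a stand-in for the od -x check:

```python
import os
import tempfile

u_raja = '\u265a \u2764 \u265b'

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'out-bin')

    with open(path, 'wt', encoding='utf-8') as fout:   # text mode: str in, UTF-8 bytes out
        written = fout.write(u_raja)                   # returns the number of characters

    with open(path, 'rb') as fin:                       # binary mode: the raw bytes
        raw = fin.read()

    with open(path, 'rt', encoding='utf-8') as fin:    # text mode: bytes decoded to str
        raja_in = fin.read()

print(written, len(raw))   # 5 11
print(raw[:3])             # b'\xe2\x99\x9a', the UTF-8 bytes of U+265A
print(raja_in == u_raja)   # True
```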
Python lets you embed unicode strings directly in your program. However, keywords are always ASCII, and in Python 2 identifiers must be ASCII too (Python 3 also allows non-ASCII identifiers). If the source code contains non-ASCII characters, it is a good idea to declare the encoding explicitly, like below:
    #!/usr/bin/env python3
    # -*- coding: UTF-8 -*-
    king1 = '\u265a \u2764 \u265b'
    king2 = b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
    king3 = '♚ ❤ ♛'
    print(type(king1), king1)
    print(type(king2), king2)
    print(type(king3), king3)
The above program prints:
    <class 'str'> ♚ ❤ ♛
    <class 'bytes'> b'\xe2\x99\x9a \xe2\x9d\xa4 \xe2\x99\x9b'
    <class 'str'> ♚ ❤ ♛
Note that the only line containing non-ASCII characters is the assignment to king3; all other statements are written in plain ASCII. Depending on your editor, the file may have been saved as UTF-8 (hopefully) or in some other encoding. Python has to know which encoding was used to write the file so that it can decode it properly when reading it. For example, if you replace UTF-8 with ASCII in the coding line, the above program will fail. Nowadays (in Python 3) UTF-8 is the default source encoding, so deleting the coding line above would still work, but it is always a good idea to declare it if your source code contains non-ASCII characters.
The following example illustrates how explicit encoding and decoding work in Python 3:
    >>> s = 'San José'
    >>> enc_s = s.encode('utf-8')
    >>> type(enc_s)
    <class 'bytes'>
    >>> enc_s
    b'San Jos\xc3\xa9'
    >>> enc_s.decode('utf-8')
    'San José'
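The decoding side must use the same encoding that produced the bytes. A short Python 3 sketch of what happens otherwise:

```python
s = 'San Jos\u00e9'
enc_s = s.encode('utf-8')                 # b'San Jos\xc3\xa9'

# Decoding with a different codec need not fail: here it silently garbles.
print(enc_s.decode('iso-8859-1'))         # San JosÃ©

# A stricter codec refuses bytes it cannot map.
try:
    enc_s.decode('ascii')
except UnicodeDecodeError as err:
    print('UnicodeDecodeError:', err.reason)   # ordinal not in range(128)

# errors='replace' substitutes U+FFFD for each undecodable byte instead.
print(enc_s.decode('ascii', errors='replace') == 'San Jos\ufffd\ufffd')   # True
```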
There are a few things that are fairly distinctive to Python. They may not be unique among all languages in existence today, but they are unusual among mainstream programming languages:
The Python implementation adds various magic attributes of the form __attributename__ to objects. They serve special purposes. The following table lists some of the important ones.
Attribute Name | Comments |
---|---|
obj.__dict__ | A dictionary of the object's (usually writable) attributes. |
inst.__class__ | The class object of the instance. |
class.__bases__ | The tuple of base classes of a class object. |
class.__name__ | The name of the class or type. |
class.__mro__ | The tuple of classes (the class itself first, then its bases) searched during method resolution. |
... etc ... |
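A short sketch showing these attributes on a small class hierarchy (the Vehicle and Car classes here are illustrative only):

```python
class Vehicle(object):
    def __init__(self, model):
        self.model = model


class Car(Vehicle):
    pass


car = Car('T')
print(car.__dict__)                            # {'model': 'T'}
print(car.__class__ is Car)                    # True
print(Car.__bases__ == (Vehicle,))             # True
print(Car.__name__)                            # Car
print(Car.__mro__ == (Car, Vehicle, object))   # True

car.__dict__['color'] = 'black'   # writing to __dict__ adds an attribute
print(car.color)                  # black
```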
Assignment such as a = b never copies anything: it makes both variables refer to the same object. If you want a copy of the object, use the copy module:
    >>> import copy
    >>> a = [1, [2, 3], 4]
    >>> b = copy.deepcopy(a)
    >>> a[1].append(5)
    >>> a
    [1, [2, 3, 5], 4]
    >>> b
    [1, [2, 3], 4]
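To contrast plain assignment, a shallow copy (copy.copy) and a deep copy (copy.deepcopy) on the same nested list, here is a short sketch:

```python
import copy

a = [1, [2, 3], 4]
alias = a                   # plain assignment: another name for the same list
shallow = copy.copy(a)      # new outer list, but it shares the inner [2, 3] list
deep = copy.deepcopy(a)     # new outer list and a new inner list

a[0] = 100                  # replaces a slot in the outer list only
a[1].append(5)              # mutates the inner list, which shallow still shares

print(alias)                # [100, [2, 3, 5], 4]
print(shallow)              # [1, [2, 3, 5], 4]
print(deep)                 # [1, [2, 3], 4]
```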
The built-in lists are powerful enough for most cases. You can append to a list or insert at an arbitrary position. However, inserting into a list costs O(N) time, because the elements after the insertion point have to be shifted. If that is a problem, consider one of the optional packages found on PyPI, such as llist or blist.
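If your insertions only ever happen at the ends of the sequence, the standard library's collections.deque is another option: appending or popping at either end is O(1), although inserting into the middle of a deque is still O(N). A minimal sketch of the difference:

```python
from collections import deque

items = list(range(5))
items.insert(0, -1)         # list: every existing element shifts right, O(N)

dq = deque(range(5))
dq.appendleft(-1)           # deque: O(1) at either end

print(items)                # [-1, 0, 1, 2, 3, 4]
print(list(dq) == items)    # True
```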
Use lower_case_name for functions, methods and attributes
Use MyClassName for classes.
Avoid using camelCase
Module internal attributes: _mod_var
Class private attributes: __private_var.
The class private form __private_var is internally translated (name-mangled) to _ClassName__private_var, to avoid name collisions with other classes in the inheritance hierarchy.
To find out Python's list of keywords:
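A short sketch of name mangling in action; Account is a hypothetical class:

```python
class Account(object):
    def __init__(self):
        self.__balance = 100      # class private: stored as _Account__balance
        self._note = 'internal'   # single underscore: a naming convention only

    def balance(self):
        return self.__balance     # inside the class, the short name still works


acct = Account()
print(sorted(vars(acct)))           # ['_Account__balance', '_note']
print(hasattr(acct, '__balance'))   # False
print(acct._Account__balance)       # 100 (mangling avoids clashes; it is not access control)
```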
    >>> import keyword
    >>> keyword.kwlist          # Output from Python 2.7; Python 3's list differs.
    ['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']
The first statement, import keyword, makes the module named keyword available. The second statement displays the value of keyword.kwlist, that is, the object kwlist exported by the keyword module. Other well-known modules include os, sys, re, socket, etc.
As you can see, Python has a minimalistic approach when it comes to keywords.
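The keyword module can also test a single name with keyword.iskeyword(). This sketch assumes Python 3, whose keyword list differs from the Python 2.7 output shown above (for example, print and exec are ordinary names there, while None and nonlocal are keywords):

```python
import keyword

for name in ['lambda', 'print', 'exec', 'None', 'nonlocal', 'my_var']:
    print(name, keyword.iskeyword(name))
# lambda True
# print False
# exec False
# None True
# nonlocal True
# my_var False
```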
The following table summarizes the operators and their precedence, as documented in the official Python documentation.
Operators are listed from lowest precedence (first) to highest precedence (last).
Operator | Description |
---|---|
lambda | Lambda expression |
if – else | Conditional expression |
or | Boolean OR |
and | Boolean AND |
not x | Boolean NOT |
in, not in, is, is not, <, <=, >, >=, <>, !=, == | Comparisons, including membership tests and identity tests (<> is Python 2 only) |
\| | Bitwise OR |
^ | Bitwise XOR |
& | Bitwise AND |
<<, >> | Shifts |
+, - | Addition and subtraction |
*, /, //, % | Multiplication, division, floor division, remainder |
+x, -x, ~x | Positive, negative, bitwise NOT |
** | Exponentiation |
x[index], x[index:index], x(arguments...), x.attribute | Subscription, slicing, call, attribute reference |
(expressions...), [expressions...], {key: value...}, `` `expressions...` `` | Binding or tuple display, list display, dictionary display, string conversion (backquotes, Python 2 only) |
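A few expressions that make the table concrete (Python 3). The comments spell out how each one is grouped:

```python
print(not 1 == 2)     # True: == binds tighter than not, so this is not (1 == 2)
print(1 + 2 * 3)      # 7: * before +
print(1 | 2 ^ 3)      # 1: ^ before |, i.e. 1 | (2 ^ 3) == 1 | 1
print(1 << 2 + 1)     # 8: + before <<, i.e. 1 << 3
print(-2 ** 2)        # -4: ** before the unary minus on its left
print(2 ** -1)        # 0.5: a unary operator on the right of ** is applied first
```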
A common way to implement a singleton is to intercept object creation by overriding the __new__() method:
    class Singleton(object):
        """Illustrates how to implement a Singleton object."""
        _inst = None

        def __new__(cls, *args, **kwargs):
            if cls._inst is None:
                # object.__new__() must not be passed the extra arguments.
                cls._inst = super(Singleton, cls).__new__(cls)
            return cls._inst
Now instantiating the class any number of times will return the same instance:
    >>> obj1 = Singleton()
    >>> obj2 = Singleton()
    >>> id(obj1) == id(obj2)
    True
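One caveat of this approach: Python still calls __init__() every time the class is instantiated, even when __new__() returns the existing instance. The hypothetical Config class below is a sketch of that pitfall:

```python
class Config(object):
    """Hypothetical singleton that also defines __init__()."""
    _inst = None

    def __new__(cls, *args, **kwargs):
        if cls._inst is None:
            cls._inst = super(Config, cls).__new__(cls)
        return cls._inst

    def __init__(self, name):
        self.name = name   # runs again on every Config(...) call


c1 = Config('first')
c2 = Config('second')
print(c1 is c2)    # True: one shared instance
print(c1.name)     # second: __init__() re-ran and overwrote the state
```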
Extending an immutable class is a bit tricky, since you can't modify the underlying value once the object is created. You need to override the __new__() method, as illustrated in the following example:
    class Mylink(str):
        """Illustrates how to extend an immutable class.

        It extends the basic string class. It prefixes the given string
        with 'http://' during initialization, if not already present.
        """
        def __new__(cls, s):
            if not s.startswith('http://'):
                s = 'http://' + s
            return super(Mylink, cls).__new__(cls, s)
Note: this can only be done in the __new__() method, not in __init__(), because the base class is immutable. The following won't work, since by the time __init__() runs it is too late to modify the base object:
    class Mylink(str):
        def __init__(self, s):
            if not s.startswith('http://'):
                s = 'http://' + s
            str.__init__(self, s)   # <=== Does not work: no effect in Python 2, TypeError in Python 3.3+.
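A short Python 3 sketch contrasting the two approaches. Mylink repeats the __new__() version from above; BrokenLink is a hypothetical counter-example that only rebinds the argument in __init__() (without the str.__init__() call):

```python
class Mylink(str):
    """Prefix 'http://' at creation time, in __new__()."""
    def __new__(cls, s):
        if not s.startswith('http://'):
            s = 'http://' + s
        return super(Mylink, cls).__new__(cls, s)


class BrokenLink(str):
    """Hypothetical counter-example: the str value is fixed before __init__() runs."""
    def __init__(self, s):
        if not s.startswith('http://'):
            s = 'http://' + s     # only rebinds a local name; self is unchanged


print(Mylink('example.com'))                    # http://example.com
print(Mylink('http://example.com'))             # http://example.com
print(isinstance(Mylink('example.com'), str))   # True
print(BrokenLink('example.com'))                # example.com
```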
Given a file passwd in /etc/passwd format, and another file group in /etc/group format, write functions for the following:
username:x:user_id:group_id:Description:/home/dir:/shell/path
    root:x:0:0:root:/root:/bin/bash
    lightdm:x:104:111:Light Display Manager:/var/lib/lightdm:/bin/false
    mysql:x:115:125:MySQL Server,,,:/nonexistent:/bin/false
group_name:x:group_id:user1,user2,...
mysql:x:125:www-data,thava
Return the list of usernames that match a given regular expression, e.g. my*
Here are some good resources for learning Python:
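One possible sketch for the regular-expression part of this exercise (Python 3). The function names are hypothetical, the passwd contents are the sample lines above held in an in-memory file, and matching with re.match() (anchored at the start of the username) is an assumption about what 'satisfies' means:

```python
import io
import re

# Sample data copied from the passwd example above; a real solution would
# open the passwd file instead of this in-memory stand-in.
PASSWD_SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
lightdm:x:104:111:Light Display Manager:/var/lib/lightdm:/bin/false
mysql:x:115:125:MySQL Server,,,:/nonexistent:/bin/false
"""


def read_usernames(passwd_file):
    """Return the username (first ':'-separated field) of each entry."""
    names = []
    for line in passwd_file:
        line = line.strip()
        if line:
            names.append(line.split(':', 1)[0])
    return names


def find_users(passwd_file, pattern):
    """Return the usernames that re.match() the given regular expression."""
    regex = re.compile(pattern)
    return [name for name in read_usernames(passwd_file) if regex.match(name)]


print(find_users(io.StringIO(PASSWD_SAMPLE), 'my*'))      # ['mysql']
print(find_users(io.StringIO(PASSWD_SAMPLE), r'.*dm$'))   # ['lightdm']
```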
Python FAQs: Collection of Python FAQs
This document has been written using RestructuredText and converted to HTML using rst2html command.
See the document source text