Python Concurrency

   Thava Alagu
http://zatvia.com
    March 2013
Note: Use Right/Left Arrow keys to navigate.

Presenter Notes

Introduction - Thava

  • Open-source enthusiast
  • Former MySQL database developer
  • Freelancer now
  • Previously worked with GE, IBM, Informix, Sun Microsystems.
  • 18+ years of overall experience
  • Becoming a Python addict ...

Presenter Notes

Goals

  • Explore Concurrent Programming Idioms in Python
  • Understand the power and weaknesses
  • Get high level understanding - the big picture

Presenter Notes

Why Python ?

  • Runtime efficiency is not as good as compiled languages, e.g. C.
  • Most implementations (e.g. CPython) do not even run as fast as the JVM.
  • But ...
  • High Developer Productivity
  • Scientific Community Loves it. See http://scipy.org
  • -- Molecular Simulations, Vector calculations
  • Native extensions are easy to do
  • Possible to use JIT (See http://pypy.org)
  • Can compile to native code (e.g. use Cython with type hints)

Presenter Notes

Why Concurrency is important ...

  • Often used as a glue language
  • -- must deal with many moving parts of various speeds.
  • Concurrency is the center of focus for a framework
  • Everything revolves around it, depends on it!
  • -- Like planets around the Sun!

Presenter Notes

Concurrency Vs Parallelism

  • Concurrency: dealing with many things at a time.
  • Example: a webserver handling thousands of connections at a time
  • Parallelism implies concurrency, but not vice versa.
  • Multi-tasking is concurrent, but not necessarily parallel.

Presenter Notes

Some Basics

/img/conc/Threads.svg

Presenter Notes

Address Space

/img/conc/AddressSpace.svg

Presenter Notes

Process, OS Thread, User Thread

  • Processes are costly
  • OS threads are cheaper
  • User threads scale well, if managed properly

Presenter Notes

Threads

  • The pthread C library creates userspace threads
  • => Implementations usually map them to LWPs, i.e. OS threads
  • OS threads employ pre-emptive scheduling
  • A user-threads library may be used for cooperative multi-tasking
  • => Many databases use their own thread library implementation.

Presenter Notes

Life of a Task

  • Tasks often go to sleep while waiting for I/O
             -------------              ------------
====Run====> |  I/O-Wait | ======Run==> | I/O-Wait | ===Run==>
             -------------              ------------
  • Tasks get woken up by the OS after I/O completes
  • Context switches happen if other tasks are ready to run

Presenter Notes

Synchronization Mechanisms

  • Threads use mutex locks and condition variables to share memory
  • => Often a source of complexity and bugs
  • => Message-based communication often helps with conceptual simplicity.
  • Processes use Pipes, sockets, shared memory, etc.
  • Distributed systems use sockets

Presenter Notes

Python Threads ...

import time
from threading import Thread

def count(n):
    while n > 0: n -= 1

start = time.time()
t1 = Thread(target=count,args=(10000000,))  ; t1.start()
t2 = Thread(target=count,args=(10000000,))  ; t2.start()
t3 = Thread(target=count,args=(10000000,))  ; t3.start()
t1.join(); t2.join(); t3.join()

Presenter Notes

Python Threads (contd)

import time
import threading

class CountThread(threading.Thread):
  def __init__(self,count):
      threading.Thread.__init__(self)
      self.count = count

  def run(self):
      while self.count > 0:
          self.count -= 1
          time.sleep(1)
      return

t = CountThread(5)     # Optional: t.daemon = True
t.start()              # Note: start(), not run() -- calling run()
                       # would execute in the current thread, not a new one

Presenter Notes

Python Thread Synchronization

Threading module provides:

  • Lock
  • RLock
  • Semaphore
  • Event
  • Condition

Presenter Notes

Lock

import threading
lock = threading.Lock()
lock.acquire()
# .... critical section ....
lock.release()

# Better alternative: the with statement auto-releases the lock
# when exiting the code block
with lock:
   # ... critical section ...

Presenter Notes

RLock

  • Reentrant mutex lock
  • The same thread can acquire it many times, but must release it that many times
  • Imagine a recursive call that must be protected, yet re-enters the lock from the same thread
import threading
lock = threading.RLock()
lock.acquire()
# ..  At most 1 thread should own this critical section ....
lock.release()
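As a sketch of why re-entrancy matters (the function name is made up for illustration), here is a recursive call that re-acquires the same RLock on every level; with a plain Lock the nested acquire() would deadlock:

```python
import threading

rlock = threading.RLock()

def factorial(n):
    # Re-enters rlock on every recursive call; a plain Lock would
    # deadlock here, an RLock just counts the nested acquisitions.
    with rlock:
        if n <= 1:
            return 1
        return n * factorial(n - 1)

result = factorial(5)   # 120
```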

Presenter Notes

Counting Semaphore

import threading
sem = threading.Semaphore(n)   # Provides n resources
sem.acquire()                  # Consume  1 resource
sem.release()                  # Release  1 resource
  • sem.acquire() waits if the counter is 0
  • sem.release() increments the counter and wakes any waiting thread
  • acquire() and release() can be called in any order, by any thread.
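A small illustrative sketch (the worker and counter names are made up): five threads contend for a semaphore with 2 permits, so at most 2 are ever inside the guarded section at once:

```python
import threading

sem = threading.Semaphore(2)      # 2 permits
guard = threading.Lock()          # protects the counters below
active = 0
max_active = 0

def worker():
    global active, max_active
    with sem:                     # consume a permit (may wait)
        with guard:
            active += 1
            max_active = max(max_active, active)
        with guard:
            active -= 1
        # leaving the 'with sem' block releases the permit

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# max_active never exceeds 2
```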

Presenter Notes

Events

  • Can be used for broadcast notification
  • Easy waiting instead of a busy loop
import threading
e = threading.Event()
e.isSet()           # Check if event is true
e.set()             # Set event to true
e.clear()           # Set event to false
e.wait()            # wait until event becomes true
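A minimal sketch of the wait/set handshake (names are illustrative): the waiter thread blocks until the main thread sets the event:

```python
import threading

e = threading.Event()
results = []

def waiter():
    e.wait()                  # blocks until e.set() is called
    results.append('woke up')

t = threading.Thread(target=waiter)
t.start()
e.set()                       # wakes every thread waiting on e
t.join()
```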

Presenter Notes

Condition Variable

  • Combination of a lock and signaling
  • Mutual notification without missing a beat
cv = threading.Condition([lock])
cv.acquire()                     # Acquire the underlying lock
cv.release()                     # Release the underlying lock
cv.wait()                        # Wait for condition
cv.notify()                      # Notify state change
cv.notifyAll()                   # Signal all waiting threads
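A minimal producer/consumer sketch (names illustrative): the consumer waits on the condition, the producer appends an item under the lock and notifies:

```python
import threading

cv = threading.Condition()
items = []
consumed = []

def consumer():
    with cv:                   # acquire the underlying lock
        while not items:       # guard against spurious wakeups
            cv.wait()          # releases the lock while waiting
        consumed.append(items.pop())

t = threading.Thread(target=consumer)
t.start()

with cv:
    items.append(42)           # produce under the lock
    cv.notify()                # wake one waiting thread
t.join()
```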

Presenter Notes

Common Problems

  • Deadlocks are easy to create. Difficult to debug.
  • Non-deterministic scheduling
  • Often source of lost sleep

Presenter Notes

Queues

  • The Queue module is thread safe!
  • Much easier to handle thread communication using queues.
from Queue import Queue
q = Queue([maxsize])
q.put(item)
q.get()
q.empty()              # Is it empty ?
q.full()               # Is it full ?
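A sketch of thread communication through a queue, using a None sentinel to stop the worker (the import is guarded so it runs on Python 2 and 3):

```python
import threading
try:
    from queue import Queue    # Python 3
except ImportError:
    from Queue import Queue    # Python 2

q = Queue()
results = []

def worker():
    while True:
        item = q.get()         # blocks until an item is available
        if item is None:       # sentinel: time to stop
            break
        results.append(item * 2)

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    q.put(i)
q.put(None)                    # tell the worker to exit
t.join()
# results == [0, 2, 4]
```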

Presenter Notes

Python Threads Performance

  • Python threads perform poorly for CPU-bound tasks
  • I/O-bound tasks are OK.
  • The reason is GIL (Global Interpreter Lock)

Presenter Notes

CPU Bound Threads Example

  • Source: Dave Beazley's slides
import time
from threading import Thread

def count(n):
    while n > 0:
        n -= 1

start = time.time()
count(10000000); count(10000000); count(10000000)
duration = time.time() - start
print('Sequential took %6.4f seconds ' % duration)

Presenter Notes

Python CPU Bound Threads

  • The threaded version executes slower than the sequential version!!!
  • If you have more cores, it gets even slower
start = time.time()
t1 = Thread(target=count,args=(10000000,))  ; t1.start()
t2 = Thread(target=count,args=(10000000,))  ; t2.start()
t3 = Thread(target=count,args=(10000000,))  ; t3.start()
t1.join(); t2.join(); t3.join()
duration = time.time() - start
print('Thread Parallel execution took %6.4f seconds ' % duration)

Presenter Notes

Global Interpreter Lock

  • Python threads are native threads (C pthreads - OS LWPs)
  • Only one Python thread can execute in the interpreter at a time.
  • The GIL enforces this
  • Many system calls like open, write and socket calls release the GIL first.
  • So I/O-bound tasks do not suffer
  • This is an implementation limitation of CPython -- not a design limitation.

Presenter Notes

GIL and native code

  • The GIL makes native code integration simpler
  • Native code gets exclusive access to Python internals.
  • Release the GIL in your native code to increase parallelism
PyObject *pyfunc(PyObject *self, PyObject *args) {
...
Py_BEGIN_ALLOW_THREADS        // Release GIL
// Threaded C code
...
Py_END_ALLOW_THREADS          // Get GIL
...
}

Presenter Notes

Python Thread Scheduling

  • CPU bound threads are scheduled every 100 interpreter ticks (CPython)
  • Ticks don't have consistent execution times
  • Implementations may vary

Presenter Notes

Leverage All CPUs

  • Good idea to run multiple Python interpreters, one per CPU-bound task
  • The interpreters can communicate using messages: pipes or sockets
  • It depends on the problem
  • If you need to serialize data to send, you have pickle!

Presenter Notes

Python Serialization - Pickle

  • Pickle module provides serialization capabilities:

    import pickle
    ...
    pickle.dump(someobj,f)      # Serialize to f
    ....
    someobj = pickle.load(f)    # unserialize from f
    
  • You can serialize pretty much any object.

  • Use cPickle for better performance:

    import cPickle as pickle
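A round-trip sketch, using an in-memory buffer in place of a real file f:

```python
import io
import pickle

buf = io.BytesIO()                     # stands in for a real file
data = {'name': 'task', 'args': [1, 2, 3]}
pickle.dump(data, buf)                 # serialize to buf
buf.seek(0)
restored = pickle.load(buf)            # unserialize from buf
# restored == data, but is a distinct object
```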

Presenter Notes

Python IPC mechanisms

  • Pipes, Sockets, FIFOs
  • Libraries:
  • -- MPI (Message Passing Interface)
  • -- XML-RPC
  • -- etc

Presenter Notes

Process Handling

  • use subprocess module:

    import subprocess
    p = subprocess.Popen(['python','child.py'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    p.stdin.write(data) # Send data to subprocess
    p.stdout.read(size) # Read data from subprocess
    

Presenter Notes

multiprocessing Module

import time
import multiprocessing
class CountdownProcess(multiprocessing.Process):
  def __init__(self,count):
    multiprocessing.Process.__init__(self); self.count = count
  def run(self):
    while self.count > 0:
      print "Counting down", self.count ; self.count -= 1
      time.sleep(5)
    return

p1 = CountdownProcess(10)
p1.start()
p2 = CountdownProcess(20)
p2.start()
  • Use processes instead of threads for true parallelism

Presenter Notes

Treat Process Like Threads

  • You can treat processes with a thread-like interface:

    p = Process(target=somefunc)
    p.start()
    ...
    p.join()
    
    p = Process(target=somefunc)   # Making a daemonic process
    p.daemon = True
    p.start()
    
    p = Process(target=somefunc)
    ...
    p.terminate()                  # Terminate a process
    

Presenter Notes

Pipes

  • Looks like Unix Pipes, but they are not.

  • You send discrete messages, which are byte buffers or pickled Python objects

    (c1, c2) = multiprocessing.Pipe()  # Just like Unix pipe()
                                       #  pair of connection points
    c1.send(obj)                       # Send an object (either end works)
    c2.recv()                          # Receive an object
    c1.send_bytes(buffer)              # Send a buffer of bytes
    c2.recv_bytes([max])               # Receive a buffer of bytes
    c1.poll([timeout])                 # Check for data
    

Presenter Notes

Pipe Example

p1, p2 = multiprocessing.Pipe()
cons = multiprocessing.Process(target=consumer, args=(p1,p2))
cons.start()
p2.close()                # Close the unused end in the producer
sequence = xrange(100)
producer(sequence, p1)    # Produce data. e.g. p1.send(item)
p1.close()

Presenter Notes

Message Queues

  • An interprocess Queue is available, similar to the standard Queue
  • Queues are implemented on top of pipes
  • Putting an item on a queue returns immediately
  • q.put(), q.get()

Presenter Notes

Process Pools

  • Process pools distribute tasks among a group of processes!
p = multiprocessing.Pool([numprocesses])
results = p.map(somefunc, sequence)    # Parallel version of map()

Presenter Notes

Main Approaches

  • Event Driven, Asynchronous Programming
  • Cooperative multi-tasking: Co-routines

In addition, the Actor-based model:

  • Kind of asynchronous model
  • With focus on message-based interactions.

Presenter Notes

Asyncore

  • asyncore is a std library module
  • Lets you register callbacks for any I/O operation.
  • Provides a wrapper over sockets and does select/poll on events.
import asyncore
from asyncore import dispatcher

class MyApp(dispatcher):
    ....  # Provides all callbacks for accept, read, write, etc.

myapp = MyApp(socket.socket())  # needs: import socket
asyncore.loop()

Presenter Notes

Asyncore (contd)

  • Low level interface.
  • Typically you can't combine different asynchronous frameworks.
  • -- You must call the main event loop provided.
  • A bare-metal framework.

Presenter Notes

Python Coroutines

  • Like subroutines but at peer level
  • Provides cooperative multi-tasking using a single thread.
   Co-routine1                    Co-routine2
do_something() ...
  Hey you! start now  ------->
                                 do_something_here()
                                 ....
                                 do_xxx()
                       <-----    Hey give me some yyy
   Ok, let me work....

Presenter Notes

Co-routines/Generators

  • You can use Python generators as co-routines
  • Generators are primarily sequence-generating functions which preserve their state across calls.
  • Generators are just one use case of the co-routine concept.

Presenter Notes

Generator

  • Typically used as the iterator in for loops
def countdown(n):
    while n > 0:
        yield n
        n -= 1

for x in countdown(10):
    print x

>>> c = countdown(10)
>>> c.next()
10
>>> c.next()
9
>>>

Presenter Notes

Generator

  • When a generator yields a value, it also yields control.
  • It preserves the function state to come back to later.
  • Like jumping across functions with an inter-function goto !!!
def countdown(n):
  while n > 0:
    yield n
    n -= 1

Presenter Notes

Coroutine

def ping(v):
    while True:
       if v is None: v = 0
       print('ping got value ' + str(v))
       v = (yield (v+1))

p = ping(10)
>>> p.next()
ping got value 10
11
>>> p.next()          # p.next() == p.send(None)
ping got value 0
1
>>> p.send(100)
ping got value 100
101

Presenter Notes

Generator/Coroutine (contd)

  • Generators were implemented first
  • Coroutine support was added by making (yield x) an expression rather than a statement.
  • i.e. 2-way interaction with a generator turns it into a coroutine.
  • Very simple but powerful concept.
  • You can throw an exception into the generator
g.throw(RuntimeError,"Something bad happened!")

Presenter Notes

Power of Coroutines

  • Create a pipeline of tasks
  • Use like filters
  • Create a tree-like broadcast pipeline
  • Implement a state machine
  • Like OO but simpler
  • Deterministic Control Flow
  • Fairly lockless, easier synchronization
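The pipeline idea above can be sketched as follows (Python 3 syntax, names made up): a grep coroutine filters lines and forwards matches into a sink; each coroutine is primed by a small decorator so send() works immediately:

```python
def coroutine(func):
    # decorator: advance the generator to its first yield
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)
        return g
    return start

@coroutine
def grep(pattern, target):
    while True:
        line = (yield)            # receive a line
        if pattern in line:
            target.send(line)     # forward matches downstream

@coroutine
def sink(out):
    while True:
        out.append((yield))       # collect whatever arrives

out = []
pipe = grep('python', sink(out))
for line in ['hello world', 'python rocks', 'bye']:
    pipe.send(line)
# out == ['python rocks']
```

Stages can be chained freely, which is what makes coroutines behave like composable filters.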

Presenter Notes

Case Study: Async or Threads ?

  • Apache uses processes + Threads
  • nginx uses processes + asynchronous model
  • nginx wins hands down
  • yay to Asynchronous model !!!

Presenter Notes

C10K problem

  • C10k - the problem of handling ten thousand concurrent webserver connections.

Presenter Notes

Asynchronous Programming

  • Non-blocking with callbacks is the theme.
  • select, poll, kqueue, aio, and epoll
  • Squid, nginx
  • asyncore, Twisted, Tornado Web, eventlet, gevent

And then ....

  • Stackless Python!!!
A highly scalable implementation of Python.

Presenter Notes

Greenthreads

  • Terminology for user threads implemented by a VM.

Presenter Notes

Threads, LWPs

/img/conc/Threads.svg

Presenter Notes

Synchronous Tasks

/img/conc/synctasks.svg

Presenter Notes

Asynchronous Tasks

/img/conc/asynctasks.svg

Presenter Notes

Stackless

  • CPython uses the interpreter's C stack to evaluate Python calls
  • This makes scaling more difficult.
  • The JVM uses the native stack only for native calls.
  • There is an implementation of Python -- Stackless Python
  • Allows scaling to a huge number of green threads.
  • The stack need not be contiguous memory -- it can be a tree of stack frames.
  • Stackless provides you with tasklets and channels

Presenter Notes

CPython alternative for Stackless

  • A subset of Stackless Python was implemented for std CPython as greenlet
  • Has a little performance overhead compared to Stackless Python
  • Allows you to scale well

Presenter Notes

Best Python Async Framework Library

  • We have too many options, really!
Twisted (callbacks)      (De facto standard)
Tornado (callbacks)      (by Facebook)
Gevent (greenlet)        (libevent based, high performant)
Eventlet (greenlet)      (Predecessor to Gevent)
Concurrence (stackless)  (Highly scalable)
Circuits (async)
Dieselweb (generator)
Cogen (generator)        (Programmatically cleaner)

Presenter Notes

gevent

  • First, it is a high-performance network library based on libev
  • A flexible, minimal asynchronous framework
  • Based on co-routines
  • Uses greenlets for lightweight threads
  • Provides a WSGI-compliant HTTP server
  • Looks promising!

Presenter Notes

Twisted

  • An event-driven async framework. Comes with a WSGI server.
  • Supports Deferreds with callbacks
  • Most popular
  • Extensive tools and functionality

Presenter Notes

Tornado

  • Tornado is an asynchronous framework
  • A scalable non-blocking web server and tools written in Python.
  • Powers the FriendFeed website (acquired by Facebook)
  • A well-proven high-performance framework
  • One single-threaded frontend can serve 3353 requests per second
  • With nginx and four frontends: 8213 requests per second (on a simple 4-core machine).

Presenter Notes

Actor model based Python Framework

Akka

  • Concurrency toolkit for Scala (and the JVM).
  • Supports the Actor model
  • A complex, extensible framework for concurrent programming.

Pykka

  • Inspired by Akka, available for Python.
  • A simpler subset of Akka's features.

Presenter Notes

Other Asynchronous Technologies

Node.js

  • Node.js is server-side JavaScript
  • An asynchronous, event-driven framework.

Presenter Notes

Q & A

Presenter Notes