I recently had to handle a case where I needed to be able to refresh a Python module's imports on the fly. The use case is working with Twitter Streams: when new code is deployed that affects the scraping module, I want to restart the stream as fast as humanly possible in order to minimize the loss of tweets. Twitter suggests firing up a new stream first and then killing the old one to prevent any interruptions, but for all intents and purposes this is close enough.
Turns out this is fairly easy to do in Python using the signal module.
I recreated the scenario with two files: a library file (lib.py) and a scraper file (scrape.py) that effectively runs as a daemon.
# lib.py
def scrape_me_bro():
    print "Scraping is fun"
# scrape.py
import time
import signal

import lib

def scrape():
    # Assume we are hitting Streaming API
    # and doing something buzzwordy with it
    while True:
        lib.scrape_me_bro()
        time.sleep(2)

def reload_libs(signum, frame):
    print "Received Signal: %s at frame: %s" % (signum, frame)
    print "Executing a Lib Reload"
    reload(lib)

# Register reload_libs to be called on SIGHUP
signal.signal(signal.SIGHUP, reload_libs)

# Main
scrape()
The basic functionality: scrape.py imports a function from lib and calls it every two seconds. It also registers a handler to run when the process receives a HUP signal. Fairly straightforward.
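If you're on Python 3, note that the reload builtin is gone; it lives in importlib instead. Here's a minimal self-contained sketch of the same idea, using a hypothetical throwaway module (lib_demo, written to a temp directory) rather than the post's lib.py:

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so the reload always re-reads the source file.
sys.dont_write_bytecode = True

# Write a throwaway module to disk so there is something to reload.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "lib_demo.py")
with open(path, "w") as f:
    f.write("MESSAGE = 'Scraping is fun'\n")

sys.path.insert(0, tmpdir)
import lib_demo
print(lib_demo.MESSAGE)          # Scraping is fun

# Simulate a deploy: change the module source, then reload it.
with open(path, "w") as f:
    f.write("MESSAGE = 'scraping just got funner'\n")
importlib.reload(lib_demo)       # Python 3 replacement for the reload() builtin
print(lib_demo.MESSAGE)          # scraping just got funner
```

One thing to keep in mind: reload swaps the attributes on the module object, so code that had done `from lib import scrape_me_bro` would keep pointing at the old function. Calling through the module, as scrape.py does with `lib.scrape_me_bro()`, is what makes the hot reload take effect.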
To test this, just run scrape.py in a separate window, edit lib.py to print out something else, grep for scrape.py's PID, and then send it a hangup signal with kill -s HUP pid.
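You can also sanity-check signal delivery without a second window. A small Python 3 sketch (POSIX only, since SIGHUP doesn't exist on Windows) that sends the signal to its own process, the in-process equivalent of running kill -s HUP from a shell:

```python
import os
import signal
import time

received = []

def on_hup(signum, frame):
    # Record the signal instead of reloading, just to prove delivery.
    received.append(signum)

signal.signal(signal.SIGHUP, on_hup)

# Send SIGHUP to ourselves -- equivalent to `kill -s HUP <pid>` from a shell.
os.kill(os.getpid(), signal.SIGHUP)
time.sleep(0.1)  # give the interpreter a moment to run the handler

print("got HUP:", signal.SIGHUP in received)  # got HUP: True
```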
Output should look something like this:
$ python scrape.py
Scraping is fun
Scraping is fun
Scraping is fun
Scraping is fun
Scraping is fun
Scraping is fun
Received Signal: 1 at frame: <frame object at 0x2407530>
Executing a Lib Reload
scraping just got funner
scraping just got funner
scraping just got funner
scraping just got funner
scraping just got funner
scraping just got funner