Python: intersection, union, and difference of two lists

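In Python, an array is usually represented as a list. Given two lists, what is a convenient way to compute their intersection, union, and difference? The most obvious approach is two nested for loops over the lists. Most readers can already write that, and it involves little technique, so the example below uses more compact forms instead.

[python]
#!/usr/bin/env python
# coding: utf-8

def diff(listA, listB):
    # Intersection, computed two ways
    retA = [i for i in listA if i in listB]
    retB = list(set(listA).intersection(set(listB)))
    print "retA is: ", retA
    print "retB is: ", retB

    # Union
    retC = list(set(listA).union(set(listB)))
    print "retC1 is: ", retC

    # Difference: elements that are in B but not in A
    retD = list(set(listB).difference(set(listA)))
    print "retD is: ", retD
    retE = [i for i in listB if i not in listA]
    print "retE is: ", retE

def main():
    listA = [1, 2, 3, 4, 5]
    listB = [3, 4, 5, 6, 7]
    diff(listA, listB)

if __name__ == '__main__':
    main()
[/python]

Output:

retA is: [3, 4, 5]
retB is: [3, 4, 5]
retC1 is: [1, 2, 3, 4, 5, 6, 7]
retD is: [6, 7]
retE is: [6, 7]

Source: http://blog.csdn.net/bitcarmanlee/article/details/51622263

Judging from the code, there are two broad approaches:

1. Use a list comprehension. A list comprehension is generally faster than the equivalent explicit loop, and it is considered more Pythonic.
2. Convert the lists to sets, then use the set methods. Note that a set keeps neither the original order nor duplicate elements, so the two approaches can give different results on lists with repeated items.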
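The two approaches can also be combined. The following is a minimal sketch in Python 3 syntax (the original example is Python 2); the helper name `list_ops` is hypothetical. It uses sets only for membership tests, so the results keep the order of the input lists.

```python
def list_ops(list_a, list_b):
    """Return (intersection, union, difference) as ordered lists.

    "difference" follows the example above: elements of list_b
    that are not in list_a.
    """
    set_a, set_b = set(list_a), set(list_b)
    # Membership tests against a set are O(1) on average.
    intersection = [x for x in list_a if x in set_b]
    # dict.fromkeys keeps first-seen order and drops duplicates (Python 3.7+).
    union = list(dict.fromkeys(list_a + list_b))
    difference = [x for x in list_b if x not in set_a]
    return intersection, union, difference


if __name__ == "__main__":
    print(list_ops([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

For large lists this avoids the repeated linear scans that `i in listB` performs on a plain list.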

pymongo: aggregation queries with $group

Goal: find phone numbers that appear more than once. First insert some sample documents in the mongo shell:

db.weikephone.insert({"wid":100000185, "phone": 13818070900})
db.weikephone.insert({"wid":100000186, "phone": 13818070900})
db.weikephone.insert({"wid":100000187, "phone": 13818070901})
db.weikephone.insert({"wid":100000188, "phone": 13818070902})

[python]
import pymongo

client = pymongo.MongoClient('localhost', 27017)
db = client["test"]
# Stage 1: group by phone and count the documents in each group.
# Stage 2: keep only the groups that contain more than one document.
pipeline = [
    {"$group": {"_id": "$phone", "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}}
]
print list(db.weikephone.aggregate(pipeline))
[/python]

This is equivalent to the SQL query:

select phone, count(*) as count from t_wm_weikephone group by phone having count > 1
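To see what the two pipeline stages compute without a running MongoDB server, the same logic can be sketched in plain Python 3. This is only an illustration of the semantics, not how MongoDB executes the query; the function name `duplicated_phones` is hypothetical, and the sample documents are the four inserted above.

```python
from collections import Counter

docs = [
    {"wid": 100000185, "phone": 13818070900},
    {"wid": 100000186, "phone": 13818070900},
    {"wid": 100000187, "phone": 13818070901},
    {"wid": 100000188, "phone": 13818070902},
]


def duplicated_phones(documents):
    # Equivalent of {"$group": {"_id": "$phone", "count": {"$sum": 1}}}
    counts = Counter(doc["phone"] for doc in documents)
    # Equivalent of {"$match": {"count": {"$gt": 1}}}
    return [{"_id": phone, "count": n} for phone, n in counts.items() if n > 1]


if __name__ == "__main__":
    print(duplicated_phones(docs))
```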

Python: resolving InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information

Because of this SSL problem, urllib3 needs pyOpenSSL. The simplest fix is:

pip install pyopenssl ndg-httpsclient pyasn1

The development packages libffi-dev and libssl-dev must also be installed:

1. On Ubuntu
sudo apt-get install libffi-dev libssl-dev

2. On CentOS
yum install libffi-devel openssl-devel

Reference: http://www.virson.cn/4465.html
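The warning text says that a true SSLContext object is missing. A quick way to check whether the running interpreter provides one is sketched below. It only inspects the standard `ssl` module and does not reproduce urllib3's own detection logic; the helper name `has_true_sslcontext` is hypothetical.

```python
import ssl
import sys


def has_true_sslcontext():
    """Return True if the standard ssl module provides SSLContext."""
    return hasattr(ssl, "SSLContext")


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("ssl.SSLContext available:", has_true_sslcontext())
```

If this prints False, the interpreter is old enough that the pip packages above (or a Python upgrade) are likely needed.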

Managing processes with supervisord: a detailed guide

Supervisor is a process management tool written in Python for Linux. It monitors the processes running on a server, and it can raise alerts and restart processes automatically as soon as a problem is detected.

Supervisor is similar to monit. One major difference is that processes managed by Supervisor must be started by Supervisor, whereas monit can manage programs that are already running. Supervisor also requires managed programs to run in the foreground (non-daemon); supervisord takes care of running them as daemons. So, to manage nginx with Supervisor, you must add the line daemon off; to the nginx configuration file so that nginx starts in non-daemon mode.

I. Components of Supervisor

1. supervisord
The server piece of supervisor is named supervisord. It is responsible for starting child programs at its own invocation, responding to commands from clients, restarting crashed or exited subprocesses, logging its subprocess stdout and stderr output, and generating and handling “events” corresponding to points in subprocess lifetimes.
The server process uses a configuration file. This is typically located in /etc/supervisord.conf. This configuration file is a “Windows-INI” style config file. It is important to keep this file secure via proper filesystem permissions because it may contain unencrypted usernames and passwords.

2. supervisorctl
The command-line client piece of the supervisor is named supervisorctl. It provides a shell-like interface to the features provided by supervisord. From supervisorctl, a user can connect to different supervisord processes, get status on the subprocesses controlled by, stop and start subprocesses of, and get lists of running processes of a supervisord.
The command-line client talks to the server across a UNIX domain socket or an internet (TCP) socket. The server can assert that the user of a client should present authentication credentials before it allows him to perform commands. The client process typically uses the same configuration file as the server but any configuration file with a [supervisorctl] section in it will work.

3. Web Server
A (sparse) web user interface with functionality comparable to supervisorctl may be accessed via a browser if you start supervisord against an internet socket. Visit the server URL (e.g. http://localhost:9001/) to view and control process status through the web interface after activating the configuration file’s [inet_http_server] section.

4. XML-RPC Interface
The same HTTP server which serves the web UI serves up an XML-RPC interface that can be used to interrogate and control supervisor and the programs it runs. See XML-RPC API Documentation.

II. Installing Supervisor

A Python environment is required first. Linux ships with Python, but version 2.7.0 or later is recommended. Supervisor can be installed with

$ sudo easy_install supervisor

When the installation succeeds it prints "finished". To verify, start Python and enter "import supervisor"; if no error is reported, the installation succeeded. Alternatively, download the package from the Supervisor website and install it with setup.py install.

In my case the installation reported this error:

Installed /usr/local/python2.7.3/lib/python2.7/site-packages/supervisor-4.0.0_dev-py2.7.egg
Processing dependencies for supervisor==4.0.0-dev
Searching for meld3>=1.0.0
Reading https://pypi.python.org/simple/meld3/
Download error on https://pypi.python.org/simple/meld3/: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed -- Some packages may not be found!
Couldn't find index page for 'meld3' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
Download error on https://pypi.python.org/simple/: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed -- Some packages may not be found!
No local packages or download links found for meld3>=1.0.0
error: Could not find suitable distribution for Requirement.parse('meld3>=1.0.0')

Solution: a search online suggested that the curl CA certificate bundle was too old and that the latest one had to be downloaded.

Download the latest certificate file:
$ wget http://curl.haxx.se/ca/cacert.pem

Rename it to ca-bundle.crt and move it to the default directory:
$ mv cacert.pem ca-bundle.crt
$ mv ca-bundle.crt /etc/pki/tls/certs

After the certificate was installed the same error still appeared. Since certificate expiry depends on time, I checked the system time with the date command and found that the clock was set far in the past. After correcting it with date -s, easy_install worked normally.

III. Configuring Supervisor

Next, configure Supervisor. First generate a configuration file by entering this in a shell:

$ echo_supervisord_conf > /etc/supervisord.conf

The file can then be edited with a text editor:

$ vim /etc/supervisord.conf

Here is a sample configuration file:

;/etc/supervisord.conf
[unix_http_server]
file = /var/run/supervisor.sock
chmod = 0777
chown = root:root

[inet_http_server]
; Settings for the web management interface
port=9001
;username = admin
;password = yourpassword

[supervisorctl]
; Must match the setting in 'unix_http_server'
serverurl = unix:///var/run/supervisor.sock

[supervisord]
logfile=/var/log/supervisord/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=true ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
user=root ; (default is current user, required if root)
childlogdir=/var/log/supervisord/ ; ('AUTO' child log dir, default $TEMP)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

; Configuration of a single managed process; multiple program sections can be added
[program:chatdemon]
command=python /home/felinx/demos/chat/chatdemo.py
autostart = true
startsecs = 5
user = felinx
redirect_stderr = true
; Log settings for this program; logfile_maxbytes above configures the log of supervisord itself
stdout_logfile_maxbytes = 20MB
stdout_logfile_backups = 20
stdout_logfile = /var/log/supervisord/chatdemo.log

[program:app]
command=python app.py --port=61000
directory=/opt/test/supervisor ; change to this directory before running command; useful for programs that must run from their own directory
autostart=true ; start at supervisord start (default: true)
autorestart=unexpected ; whether/when to restart (default: unexpected)
startsecs=1 ; number of secs prog must stay running (def. 1)
user=root

; A group of processes; similar programs can be added this way instead of one by one
[program:groupworker]
command=python /home/felinx/demos/groupworker/worker.py
numprocs=24
process_name=%(program_name)s_%(process_num)02d
autostart = true
startsecs = 5
user = felinx
redirect_stderr = true
stdout_logfile = /var/log/supervisord/groupworker.log

; (For more configuration options see http://supervisord.org/configuration.html)

Save the file and exit. Start Supervisor with:

$ supervisord
$ supervisorctl

Use ps to check what is running; the applications are now started automatically.

IV. Managing supervisord

After installation two commands are available, supervisord and supervisorctl. They are used as follows:

  • supervisord: starts supervisord, which then starts and manages the processes defined in the configuration.
  • supervisorctl stop programxxx: stops one process (programxxx). programxxx is the name configured in [program:chatdemon]; in this example it is chatdemon.
  • supervisorctl start programxxx: starts one process.
  • supervisorctl restart programxxx: restarts one process.
  • supervisorctl stop groupworker: stops all processes that belong to the group named groupworker (start and restart work the same way).
  • supervisorctl stop all: stops all processes. Note: start, restart, and stop never load the latest configuration file.
  • supervisorctl reload: loads the latest configuration file, stops the existing processes, then starts and manages all processes according to the new configuration.
  • supervisorctl update: starts processes that are new or changed in the latest configuration file; processes whose configuration has not changed are not restarted.

Note: a process that was explicitly stopped with stop is not restarted automatically by reload or update.

V. Web management

Supervisor ships with a web server, so processes can be managed from a web page, provided the [inet_http_server] section of the configuration file is enabled.

If the server has a single network interface, edit the section like this:

[inet_http_server] ; inet (TCP) server disabled by default
port=127.0.0.1:51000 ; (ip_address:port specifier, *:port for all iface)
;username=user ; (default is no username (open server))
;password=123 ; (default is no password (open server))

If it has several network interfaces, specify one of them:

[inet_http_server] ; inet (TCP) server disabled by default
port=192.168.2.13:51000 ; (ip_address:port specifier, *:port for all iface)
;username=user ; (default is no username (open server))
;password=123 ; (default is no password (open server))

Enter http://192.168.2.13:51000 in the browser address bar to manage the processes from the web page.

Original article: http://blog.chinaunix.net/uid-26000296-id-4759916.html
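The sample configuration notes that the [supervisorctl] serverurl must match the socket file in [unix_http_server]. The sketch below checks that with Python 3's standard `configparser`. This is an assumption-laden illustration: it uses the standard library parser rather than Supervisor's own, and `SAMPLE_CONF` and `socket_settings_match` are hypothetical names.

```python
import configparser

# A cut-down, hypothetical configuration fragment for the check below.
SAMPLE_CONF = """
[unix_http_server]
file = /var/run/supervisor.sock   ; path of the UNIX socket

[supervisorctl]
serverurl = unix:///var/run/supervisor.sock
"""


def socket_settings_match(conf_text):
    """Return True if supervisorctl points at the unix_http_server socket."""
    parser = configparser.ConfigParser(
        interpolation=None,              # keep %(program_name)s style values literal
        inline_comment_prefixes=(";",),  # supervisord.conf uses ';' comments
    )
    parser.read_string(conf_text)
    socket_path = parser.get("unix_http_server", "file")
    server_url = parser.get("supervisorctl", "serverurl")
    return server_url == "unix://" + socket_path


if __name__ == "__main__":
    print(socket_settings_match(SAMPLE_CONF))
```

A mismatch here (for example supervisor.sock in one section and supervisord.sock in the other) is a common reason for supervisorctl failing to connect.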

How to compute a time difference in Python

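Q: How can I conveniently compute the difference between two times, for example how many days or hours apart they are?

A: The datetime module solves this easily. For example:

[python]
>>> import datetime
>>> d1 = datetime.datetime(2005, 2, 16)
>>> d2 = datetime.datetime(2004, 12, 31)
>>> (d1 - d2).days
47
[/python]

The example above computes the number of days between two dates.

[python]
import datetime
starttime = datetime.datetime.now()
# long running
endtime = datetime.datetime.now()
# total_seconds() covers the whole interval; .seconds alone omits the days part
print (endtime - starttime).total_seconds()
[/python]

The example above measures running time and displays it in seconds.

[python]
>>> d1 = datetime.datetime.now()
>>> d3 = d1 + datetime.timedelta(hours=10)
>>> d3.ctime()
[/python]

The example above computes the time 10 hours after the current time.

The two classes used most often are datetime and timedelta, and they can be added to and subtracted from each other. Each class has methods and attributes for reading specific values: datetime exposes the day (day), the hour (hour), the day of the week (weekday()), and so on; timedelta exposes the days (days), the seconds (seconds), and so on.

Original article: http://blog.csdn.net/aarchbishop/article/details/667491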
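A timedelta only stores days, seconds, and microseconds, so "how many hours apart" has to be derived. Here is a minimal sketch in Python 3 syntax that splits a non-negative timedelta into days, hours, minutes, and seconds; the helper name `split_timedelta` is hypothetical.

```python
import datetime


def split_timedelta(delta):
    """Split a non-negative timedelta into (days, hours, minutes, seconds)."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)   # 86400 seconds per day
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


if __name__ == "__main__":
    d1 = datetime.datetime(2005, 2, 16, 12, 30)
    d2 = datetime.datetime(2004, 12, 31)
    print(split_timedelta(d1 - d2))
```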

Simulating a website login with the cookielib and urllib2 modules

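1. The cookielib module

The main purpose of the cookielib module is to provide objects that can store cookies, so that it can be used together with urllib2 to access Internet resources. For example, an object of its CookieJar class can capture cookies and resend them on later requests. The main classes in cookielib are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar. Their relationship is as follows:

[Figure 2013_11_04_02: relationship between the cookie jar classes]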
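The relationship between these classes can also be inspected in code. The sketch below assumes Python 3, where cookielib was renamed to `http.cookiejar`; the helper name `jar_hierarchy` is hypothetical. It lists the base classes of each cookie jar class.

```python
import http.cookiejar as cookiejar


def jar_hierarchy():
    """Map each cookie jar class name to the names of its base classes."""
    classes = (
        cookiejar.CookieJar,
        cookiejar.FileCookieJar,
        cookiejar.MozillaCookieJar,
        cookiejar.LWPCookieJar,
    )
    # __mro__[1:-1] skips the class itself and the final `object`.
    return {cls.__name__: [b.__name__ for b in cls.__mro__[1:-1]] for cls in classes}


if __name__ == "__main__":
    for name, bases in jar_hierarchy().items():
        print(name, "<-", bases)
```

In short, FileCookieJar extends CookieJar with file storage, and MozillaCookieJar and LWPCookieJar are two file formats built on FileCookieJar.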

2. The urllib2 module

The most powerful part of urllib2 is its opener, the OpenerDirector class. It is a class that manages many handler classes, each of which handles a particular protocol or a special feature. The handler classes are:
  • BaseHandler
  • HTTPErrorProcessor
  • HTTPDefaultErrorHandler
  • HTTPRedirectHandler
  • ProxyHandler
  • AbstractBasicAuthHandler
  • HTTPBasicAuthHandler
  • ProxyBasicAuthHandler
  • AbstractDigestAuthHandler
  • ProxyDigestAuthHandler
  • AbstractHTTPHandler
  • HTTPHandler
  • HTTPCookieProcessor
  • UnknownHandler
  • FileHandler
  • FTPHandler
  • CacheFTPHandler
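cookielib is normally used together with urllib2, mainly by passing a cookie jar to urllib2.HTTPCookieProcessor() inside a call to urllib2.build_opener(). This makes it possible to simulate a website login in Python. First, a demo that creates a CookieJar instance: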
#!/usr/bin/env python
#-*-coding:utf-8-*- 
import urllib
import urllib2
import cookielib
# Create a CookieJar object (holds the cookies on the local machine)
cj = cookielib.CookieJar()
# Build a custom opener and bind it to the CookieJar object
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Install the opener; every later call to urlopen() uses the installed opener
urllib2.install_opener(opener) 
url = "http://www.baidu.com"   
urllib2.urlopen(url)
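For readers on Python 3, the same wiring can be sketched with `urllib.request` and `http.cookiejar`, the renamed successors of urllib2 and cookielib. The helper name `build_cookie_opener` is hypothetical, and no network request is made here.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener bound to a fresh CookieJar, plus the jar itself."""
    jar = http.cookiejar.CookieJar()
    processor = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(processor)
    return opener, jar


if __name__ == "__main__":
    opener, jar = build_cookie_opener()
    # The jar starts empty; it fills up as the opener receives Set-Cookie headers.
    print(len(jar))
```

Keeping a reference to the jar is what lets later code inspect or save the cookies the opener collected.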
Next, a way to access a site with the POST method (simulating a POST with urllib2):
#! /usr/bin/env python
#coding=utf-8

import urllib2
import urllib
import cookielib

def login():
    email = raw_input("Enter user name: ")
    pwd = raw_input("Enter password: ")
    data={"email":email,"password":pwd}  # login user name and password
    post_data=urllib.urlencode(data)   # encode the POST data as a form-encoded string
    cj=cookielib.CookieJar()   # create a CookieJar instance
    opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # Set a custom User-Agent (some sites reject clients that do not look like a browser)
    headers ={"User-agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    website = raw_input('Enter URL: ')
    req=urllib2.Request(website,post_data,headers)
    content=opener.open(req)
    print content.read()    # Linux terminals usually expect UTF-8, so GBK pages may print garbled

if __name__ == '__main__':
    login()
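The request built inside login() can be examined without touching the network. The sketch below uses Python 3's `urllib.parse` and `urllib.request`; the function name `build_login_request` and the example.com URL are hypothetical. It shows that passing form data to Request turns it into a POST.

```python
import urllib.parse
import urllib.request


def build_login_request(url, email, password):
    """Build (but do not send) a form-encoded POST request."""
    form = {"email": email, "password": password}
    # In Python 3 the request body must be bytes.
    post_data = urllib.parse.urlencode(form).encode("ascii")
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    return urllib.request.Request(url, post_data, headers)


if __name__ == "__main__":
    req = build_login_request("http://example.com/login", "abc@abc.com", "pass word")
    print(req.get_method())
    print(req.data)
```

Because a request with a body is sent as POST, a server that only accepts GET at the given URL can legitimately refuse it.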
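Note: in testing, this example only worked on sites such as Renren and Kaixin. Sites such as Alipay, Baidu Netdisk, and even our school's academic affairs system could not be logged into, and an error like the following is displayed: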
Traceback (most recent call last):
  File "login.py", line 23, in <module>
    login()
  File "login.py", line 19, in login
    content=opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed
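HTTP 405 means the server does not allow the request method (here POST) for that URL, so these sites probably do not accept this method at the address entered; I do not know the exact reason. The program also cannot get past sites that require a CAPTCHA, so treat it purely as a way to learn the principle. Here are several examples of simulating a login with Python (from http://www.nowamagic.net/academy/detail/1302882):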
#!/usr/bin/python
# -*- coding: utf-8 -*-

import urllib2
import urllib
import cookielib
import re

auth_url = 'http://www.nowamagic.net/'
home_url = 'http://www.nowamagic.net/'
# Login user name and password
data={
    "username":"nowamagic",
    "password":"pass"
}
# Encode the form data with urllib
post_data=urllib.urlencode(data)
# Request headers to send
headers ={
    "Host":"www.nowamagic.net",
    "Referer": "http://www.nowamagic.net"
}
# Create a CookieJar to handle cookies
cookieJar=cookielib.CookieJar()
# Create an opener bound to the CookieJar
opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Log in to obtain the cookie
req=urllib2.Request(auth_url,post_data,headers)
result = opener.open(req)
# Visit the home page; the cookie is sent automatically
result = opener.open(home_url)
# Show the result
print result.read()
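1. Access a site using an existing cookie file
import os
import cookielib, urllib2
# url, postdata and header are placeholders that the caller must define.
# Raw string, so that backslashes such as \t in the Windows path are not treated as escapes.
ckjar = cookielib.MozillaCookieJar(os.path.join(r'C:\Documents and Settings\tom\Application Data\Mozilla\Firefox\Profiles\h5m61j1i.default', 'cookies.txt'))
# Read the saved cookies from the file; the constructor alone does not load them.
ckjar.load(ignore_discard=True, ignore_expires=True)
req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent', \
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(ckjar) )
f = opener.open(req)
htm = f.read()
f.close()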
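2. Access a site, obtain its cookie, and save the cookie to a cookie file
import cookielib, urllib2
# url, postdata, header and filename are placeholders that the caller must define.
req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent', \
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')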
ckjar = cookielib.MozillaCookieJar(filename) 
ckproc = urllib2.HTTPCookieProcessor(ckjar)
opener = urllib2.build_opener(ckproc)
f = opener.open(req) 
htm = f.read() 
f.close()

ckjar.save(ignore_discard=True, ignore_expires=True)
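Examples 1 and 2 are two halves of one round trip: save() writes the jar to a file and load() reads it back. The sketch below shows that round trip in Python 3 with `http.cookiejar`, without any network access. The cookie values, the example.com domain, and the helper names `make_cookie` and `roundtrip` are made up for illustration.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie by hand (normally the server sets these)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def roundtrip(filename):
    """Save one cookie to a Mozilla-format file, then load it into a new jar."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    jar.set_cookie(make_cookie("session", "abc123", "example.com"))
    # ignore_discard=True is needed to keep session cookies (discard=True).
    jar.save(ignore_discard=True, ignore_expires=True)

    loaded = http.cookiejar.MozillaCookieJar(filename)
    loaded.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in loaded}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(os.path.join(tmp, "cookies.txt")))
```

Without ignore_discard=True, session cookies are skipped on both save and load, which is an easy way to end up with an apparently empty cookie file.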
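3. Obtain a cookie by submitting given parameters, then use that cookie to access the site
import cookielib, urllib, urllib2

# url is a placeholder for the login URL; the caller must define it.
cookiejar = cookielib.CookieJar()
urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
values = {'redirect':'', 'email':'abc@abc.com',
          'password':'password', 'rememberme':'', 'submit':'OK, Let Me In!'}
data = urllib.urlencode(values)

# First request: POST the form; the opener stores any cookies it receives.
request = urllib2.Request(url, data)
response = urlOpener.open(request)
print response.info()
page = response.read()

# Second request: the same opener sends the stored cookies automatically.
request = urllib2.Request(url)
response = urlOpener.open(request)
page = response.read()
print page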
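In addition, a note on the methods of a urllib2 response:

1. geturl(): returns the real URL that was fetched. This is useful because urlopen (or the opener object in use) may follow a redirect, so the URL finally fetched may differ from the URL requested.

URL redirection (also called URL forwarding) is a technique that sends a user who visits one address on to another address. It is often used to turn a very long site address into a shorter one, because long addresses are hard to remember and share. It is also useful when a site moves, for example to a different free hosting service, and its address has to change; without redirection, users who do not know about the move would think the site had closed. A redirect service solves this. The technique makes one web page reachable through several different uniform resource locators (URLs).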
>>> import urllib2
>>> url = "http://www.baidu.com"
>>> req = urllib2.Request(url)
>>> response = urllib2.urlopen(req)
>>> response.geturl()
'http://www.baidu.com'
>>> print response.info()
Date: Fri, 28 Mar 2014 03:30:01 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: Close
Vary: Accept-Encoding
Set-Cookie: BAIDUID=AF7C001FCA87716A52B353C500FC45DB:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: H_PS_PSSID=1466_5225_5288_5723_4261_4759_5659; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Expires: Fri, 28 Mar 2014 03:29:06 GMT
Cache-Control: private
Server: BWS/1.1
BDPAGETYPE: 1
BDQID: 0xea1372bf0001780d
BDUSERID: 0
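By default urllib2 automatically follows redirects for HTTP 3XX status codes; no manual configuration is needed. To detect whether a redirect took place, just check whether the URL of the response and the URL of the request are the same.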
import urllib2
my_url = 'http://www.google.cn'
response = urllib2.urlopen(my_url)
# A redirect happened if the final URL differs from the requested one
redirected = response.geturl() != my_url
print redirected
my_url = 'http://rrurl.cn/b1UZuP'
response = urllib2.urlopen(my_url)
redirected = response.geturl() != my_url
print redirected
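The same check can be tried without depending on external sites. The sketch below, in Python 3, starts a small local HTTP server whose /old path redirects to /new, then compares the requested and final URLs. The server, its paths, and the names `RedirectHandler`, `was_redirected`, and `demo` are all hypothetical; it assumes the loopback interface is available.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Test server: /old answers 302 to /new, every other path answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def was_redirected(opener, url):
    """Return True if the final URL differs from the requested URL."""
    response = opener.open(url, timeout=5)
    try:
        return response.geturl() != url
    finally:
        response.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        return was_redirected(opener, base + "/old"), was_redirected(opener, base + "/new")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Comparing strings exactly is a simplification; real sites may differ only by a trailing slash or by scheme, which this check would also report as a redirect.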
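Debug log: when using urllib2, the debug log can be turned on as shown below. The content of the packets sent and received is then printed to the screen, which helps with debugging and can sometimes save you from capturing packets.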
import urllib2
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com')
Source: http://www.cnblogs.com/sysu-blackbear/p/3629770.html

Customizing the Python interactive command line

Python's interactive command line can be configured through a startup file. When Python starts, it looks up the environment variable PYTHONSTARTUP and executes the code in the file that the variable names. The file can have any name and location. With the script below, pressing Tab completes names, and command history is kept between sessions. This is a real improvement to the command line; both features are built on the readline module (which requires the readline library).

Here is a simple startup script that adds tab completion and command history to the Python command line.

$ cat .pythonstartup
[python]
import readline
import rlcompleter
import atexit
import os

# tab completion
readline.parse_and_bind('tab: complete')

# history file
histfile = os.path.join(os.environ['HOME'], '.pythonhistory')
try:
    readline.read_history_file(histfile)
except IOError:
    pass
atexit.register(readline.write_history_file, histfile)
del os, histfile, readline, rlcompleter
[/python]

Set the environment variable:

$ cat .bash_profile|grep PYTHON
export PYTHONSTARTUP=/home/python/.pythonstartup

Verify tab completion and command history:

$ python
Python 2.7.5 (default, Oct 6 2013, 10:45:13)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import md5
>>> md5.
md5.__class__(         md5.__getattribute__(  md5.__reduce__(        md5.__subclasshook__(
md5.__delattr__(       md5.__hash__(          md5.__reduce_ex__(     md5.blocksize
md5.__dict__           md5.__init__(          md5.__repr__(          md5.digest_size
md5.__doc__            md5.__name__           md5.__setattr__(       md5.md5(
md5.__file__           md5.__new__(           md5.__sizeof__(        md5.new(
md5.__format__(        md5.__package__        md5.__str__(           md5.warnings
>>> import os
>>> import md5

Basic usage of threadpool, a Python multithreading module

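A thread pool in Python is usually written with the threading or thread module, but there is now a threadpool module that implements one.

English documentation: http://www.chrisarndt.de/projects/threadpool/
Chinese documentation: http://gashero.yeax.com/?p=44

Here is a simple example that uses the threadpool module to implement a thread pool:

[python]
#!/usr/bin/env python
import threadpool
import time, random

def hello(value):
    # Simulate a slow task, then return the argument unchanged
    time.sleep(2)
    return value

def print_result(request, result):
    # Callback invoked with the request object and the task's result
    print "the result is %s %r" % (request.requestID, result)

data = [random.randint(1, 10) for i in range(20)]

pool = threadpool.ThreadPool(5)  # pool with 5 worker threads
requests = threadpool.makeRequests(hello, data, print_result)
for req in requests:
    pool.putRequest(req)
pool.wait()  # block until all requests are processed
[/python]

Source: http://dgfpeak.blog.51cto.com/195468/861994/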
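For comparison, the same pattern can be sketched with `concurrent.futures.ThreadPoolExecutor` from the Python 3 standard library. This is a different API from the threadpool module described above, swapped in only because it needs no extra package; the function name `run_pool` and the shortened sleep are choices made for this sketch.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def hello(value):
    # Simulate a short task, then return the argument unchanged.
    time.sleep(0.01)
    return value


def run_pool(data, workers=5):
    """Run hello() on every item using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input.
        return list(pool.map(hello, data))


if __name__ == "__main__":
    data = [random.randint(1, 10) for _ in range(20)]
    for index, result in enumerate(run_pool(data)):
        print("the result is %s %r" % (index, result))
```

Leaving the with block waits for all submitted tasks, which plays the role of pool.wait() in the threadpool example.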