First, a disclaimer: this approach is not viable. Unlike importing into MySQL, Hive does not support record-level inserts; even in versions that do, insertion is extremely slow. Hive's strength is batch processing — the larger the dataset, the more its performance advantage shows; with small data, such as record-by-record inserts, it is unusable. So parsing JSON with Python and inserting the records one at a time is a dead end. The approach is recorded here only as a reference for connecting to and operating a Hive database from Python.
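Since Hive is built for bulk loads, the workable alternative is to flatten each JSON array line into delimited rows and load the whole file in one statement. A minimal sketch (the field subset and the table name `yemao_logpy` are taken from the scripts below, but treat the exact schema as an assumption):

```python
import json

# Illustrative subset of the log fields; the real table has many more columns.
FIELDS = ['id', 'time', 'url_from', 'url_current', 'url_to']

def json_lines_to_tsv(lines, fields=FIELDS, default='0'):
    """Flatten one-JSON-array-per-line input into tab-separated rows."""
    rows = []
    for line in lines:
        for record in json.loads(line):
            rows.append('\t'.join(str(record.get(f, default)) for f in fields))
    return rows

# The resulting file can then be bulk-loaded in a single HiveQL statement, e.g.:
#   LOAD DATA LOCAL INPATH '/tmp/yemao.tsv' INTO TABLE yemao_logpy;
```

This replaces thousands of per-record `INSERT` statements with one load, which is the access pattern Hive is designed for.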
I. Environment preparation
1. Install the Thrift dependency libraries

#yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel

2. Copy the library directory (this step can be skipped if the path is referenced from within the script instead)
cp -r $HIVE_PATH/lib/py /usr/local/lib/python2.7/site-packages

3. It is assumed that the Hadoop environment, the Python environment, Hive's Thrift service, and the JSON files are already in place
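Before running the scripts it helps to confirm that something is actually listening on the Thrift port (127.0.0.1:10000 in the scripts below). A quick stdlib-only check, with host and port as assumptions:

```python
import socket

def thrift_port_open(host='127.0.0.1', port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except OSError:
        return False
```

If this returns False, start the Hive Thrift service before debugging anything at the Python level.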

II. The data-insertion scripts
The simplified version is for testing; the full version is the script that would go into production if the approach were viable.
1. json2hive_python, simplified version
[spark@Master Py_logproc]$ pwd
/home/spark/opt/Log_Data/Py_logproc
[spark@Master Py_logproc]$ cat json2hive_python_recordasarray_basic.py 
#!/usr/bin/env python
# -*- encoding:utf-8 -*-
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
import json
import warnings
warnings.filterwarnings("ignore")

def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
    if len(sys.argv) == 1:
        print "need argv"
    else:
        print sys.argv
        log_date = int(sys.argv[1])
        for json_array in open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'):
            yemao_array = json.loads(json_array)
            for yemao in yemao_array:
                print yemao['time']
                if not yemao.has_key('_reason'):
                    id              = yemao['id']
                    time            = yemao['time']
                    url_from        = yemao['url_from']
                    url_current     = yemao['url_current']
                    url_to          = yemao['url_to']
                    options         = yemao['options']
                    ip              = yemao['ip']
                    uid             = yemao['uid']
                    new_visitor     = yemao['new_visitor']
                    province        = yemao['province']
                    city            = yemao['city']
                    site            = yemao['site']
                    device          = yemao['device']
                    browser         = yemao['browser']
                    phone           = yemao['phone']
                    token           = yemao['token']
                    dorm            = yemao['dorm']
                    order_phone     = yemao['order_phone']
                    order_dormitory = yemao['order_dormitory']
                    order_amount    = yemao['order_amount']
                    order_id        = yemao['order_id']
                    uname           = yemao['uname']
                    site_id         = yemao['site_id']
                    address         = yemao['address']
                    dorm_id         = yemao['dorm_id']
                    dormentry_id    = yemao['dormentry_id']
                    tag             = yemao['tag']
                    rid             = yemao['rid']
                    cart_quantity   = yemao['cart_quantity']
                    response        = yemao['response']
                    paytype         = yemao['paytype']
                    if yemao.has_key('data'):
                        data = yemao['data']
                    else:
                        data = '0'
                    data = '"' + str(data) + '"'
                    if yemao.has_key('info'):
                        info = yemao['info']
                    else:
                        info = '0'
                    if yemao.has_key('status'):
                        status = yemao['status']
                    else:
                        status = '0'
                    insert_sql = "insert into yemao_logpy(id,time,url_from,url_current,url_to,ip,dorm_id,browser,log_date) values ('%s','%s','%s','%s','%s','%s','%s','%s',%d)" % (id, time, url_from, url_current, url_to, ip, dorm_id, browser, log_date)
                    print insert_sql
                    hiveExe(insert_sql)
        print 'yemao_array_python2hive done'
[spark@Master Py_logproc]$ 
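One hazard in the script above: field values are interpolated into the SQL string verbatim, so a single quote in any field (a URL, a user name) would break the generated statement. A minimal escaping helper, assuming HiveQL's backslash-escaped single-quoted string literals:

```python
def escape_hql(value):
    """Escape backslashes and single quotes for use inside a '...' HiveQL literal."""
    return str(value).replace('\\', '\\\\').replace("'", "\\'")

# Usage sketch: "... values ('%s', ...)" % escape_hql(url_current)
```

Every value substituted into the `insert_sql` format string would need to pass through a helper like this.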

2. json2hive_python, full version
[spark@Master Py_logproc]$ pwd
/home/spark/opt/Log_Data/Py_logproc
[spark@Master Py_logproc]$ cat json2hive_python_recordasarray.py 
#!/usr/bin/env python
# -*- encoding:utf-8 -*-
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
#from db import getDB
import json
import warnings
warnings.filterwarnings("ignore")

def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
    if len(sys.argv) == 1:
        print "need argv"
    else:
        print sys.argv
        log_date = int(sys.argv[1])
        for json_array in open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'):
            yemao_array = json.loads(json_array)
            for yemao in yemao_array:
                print yemao['time']
                if not yemao.has_key('_reason'):
                    id              = yemao['id']
                    time            = yemao['time']
                    url_from        = yemao['url_from']
                    url_current     = yemao['url_current']
                    url_to          = yemao['url_to']
                    options         = yemao['options']
                    ip              = yemao['ip']
                    uid             = yemao['uid']
                    new_visitor     = yemao['new_visitor']
                    province        = yemao['province']
                    city            = yemao['city']
                    site            = yemao['site']
                    device          = yemao['device']
                    browser         = yemao['browser']
                    phone           = yemao['phone']
                    token           = yemao['token']
                    dorm            = yemao['dorm']
                    order_phone     = yemao['order_phone']
                    order_dormitory = yemao['order_dormitory']
                    order_amount    = yemao['order_amount']
                    order_id        = yemao['order_id']
                    uname           = yemao['uname']
                    site_id         = yemao['site_id']
                    address         = yemao['address']
                    dorm_id         = yemao['dorm_id']
                    dormentry_id    = yemao['dormentry_id']
                    tag             = yemao['tag']
                    rid             = yemao['rid']
                    cart_quantity   = yemao['cart_quantity']
                    response        = yemao['response']
                    paytype         = yemao['paytype']
                    if yemao.has_key('data'):
                        data = yemao['data']
                    else:
                        data = '0'
                    data = '"' + str(data) + '"'
                    if yemao.has_key('info'):
                        info = yemao['info']
                    else:
                        info = '0'
                    if yemao.has_key('status'):
                        status = yemao['status']
                    else:
                        status = '0'
                    insert_sql = "insert into yemao_logpy(id,time,url_from,url_current,url_to,options,ip,uid,new_visitor,province,city,site,device,browser,phone,token,dorm,order_phone,order_dormitory,order_amount,order_id,uname,site_id,address,dorm_id,dormentry_id,tag,rid,cart_quantity,response,paytype,data,info,status,log_date) values ('%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s',%s,'%s','%s',%d)" % (id, time, url_from, url_current, url_to, options, ip, uid, new_visitor, province, city, site, device, browser, phone, token, dorm, order_phone, order_dormitory, order_amount, order_id, uname, site_id, address, dorm_id, dormentry_id, tag, rid, cart_quantity, response, paytype, data, info, status, log_date)
                    hiveExe(insert_sql)
        print 'yemao_array_python2hive done'
[spark@Master Py_logproc]$ 
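The long `has_key` cascade in both versions can be collapsed: `dict.get` takes a default, so the per-field if/else blocks for `data`, `info`, and `status`, plus the 31 mandatory lookups, reduce to two field lists. A sketch with the required list abbreviated (the full list follows the script above):

```python
# Abbreviated; the real script extracts 31 required fields.
REQUIRED = ['id', 'time', 'url_from', 'url_current', 'url_to']
OPTIONAL = ['data', 'info', 'status']

def extract_fields(record, default='0'):
    """Pull required fields, then optional ones with a default for missing keys."""
    values = [record[f] for f in REQUIRED]
    values += [record.get(f, default) for f in OPTIONAL]
    return values
```

Driving the SQL generation from a field list also keeps the column list and the values tuple from drifting out of sync, which is easy to do when both are written out by hand.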

III. Other reference scripts
1. Script 1: connect to Hive and run SQL

[spark@Master Py_logproc]$ cat py2hive_pre1.py 
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive  
from hive_service.ttypes import HiveServerException  
from thrift import Thrift  
from thrift.transport import TSocket  
from thrift.transport import TTransport  
from thrift.protocol import TBinaryProtocol  
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == '__main__':
    hiveExe("show tables")
[spark@Master Py_logproc]$ 
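With the old HiveServer1 Thrift client used in these scripts, `fetchAll()` returns a list of strings, one per row, with the columns joined by the table's delimiter (tab by default). A small parser for working with those results, assuming tab-delimited output:

```python
def parse_rows(raw_rows, sep='\t'):
    """Split each raw result string from fetchAll() into a list of column values."""
    return [row.split(sep) for row in raw_rows]
```

For example, a two-column result would come back as `['a\tb', 'c\td']` and parse into `[['a', 'b'], ['c', 'd']]`.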

2. Script 2: connect to Hive and run SQL
[spark@Master Py_logproc]$ cat py2hive_pre2.py 
#!/usr/bin/env python
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == '__main__':
    hiveExe("select * from yemao1_log limit 10")
[spark@Master Py_logproc]$ 

3. Script 3: connect to Hive and run SQL
[spark@Master Py_logproc]$ cat 3333.py 
#!/usr/bin/env python
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == '__main__':
    select_sql = "select url_current from yemao1_log limit 100"
    hiveExe(select_sql)
[spark@Master Py_logproc]$

4. Data-preparation script
[spark@Master Py_logproc]$ cat pre_data.sh 
#!/bin/bash
export yesterday=`date -d last-day +%Y%m%d`
cd /home/spark/opt/Log_Data/Py_logproc
for tar in /home/spark/opt/Log_Data/yemao/yemao*$yesterday.tar.gz; 
do
tar zxvf $tar -C /home/spark/opt/Log_Data/Py_logproc/log_tmpdir;
grep -h "\[{.*}\]" /home/spark/opt/Log_Data/Py_logproc/log_tmpdir/*.log >> ./log_tmpdir/yemaopythonlog;
rm -rf /home/spark/opt/Log_Data/Py_logproc/log_tmpdir/*.log
done
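The grep in the script keeps only lines that look like a JSON array of objects, i.e. containing `[{...}]`. The same filter expressed in Python is handy for testing the pattern against sample log lines before relying on it (the regex simply mirrors the grep and is an assumption about the log format):

```python
import re

# Mirrors the shell pattern "\[{.*}\]": a line containing [{ ... }].
JSON_ARRAY_LINE = re.compile(r'\[\{.*\}\]')

def is_json_array_line(line):
    """True if the line contains a [{...}] JSON-array-of-objects fragment."""
    return JSON_ARRAY_LINE.search(line) is not None
```

Note that `.*` is greedy and the pattern is only a rough shape check, the same trade-off the original grep makes; lines that pass still need `json.loads` to confirm they are valid JSON.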