First, a caveat: this approach is not viable. Unlike importing into a MySQL database, Hive does not support record-level inserts; even in versions that do, insertion is extremely slow. Hive's strength is batch processing, and its performance advantage grows with data volume; for small workloads such as record-by-record inserts it is simply unusable. So parsing the JSON with Python and inserting the records one at a time is a dead end. The approach is recorded here as a reference for connecting to and operating a Hive database from Python.
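For contrast, the workable route is batch loading: flatten the JSON records into a delimited file, then hand the whole file to Hive with a single LOAD DATA statement. Below is a minimal sketch of that idea; the table name yemao_logpy_batch, the three-column subset, and the tab delimiter are illustrative assumptions, not part of the original setup.

#!/usr/bin/env python
# json2tsv_load.py -- batch-load sketch (illustrative; table name, column
# subset and delimiter are assumptions, not the original setup)
import json

src = '/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'
out = '/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemao.tsv'

with open(src) as fin, open(out, 'w') as fout:
    for line in fin:
        # each input line is a JSON array of log records
        for rec in json.loads(line):
            if not rec.has_key('_reason'):
                # one tab-delimited row per record (three columns for brevity)
                fout.write('%s\t%s\t%s\n' % (rec['id'], rec['time'], rec['ip']))

# a single statement then moves the whole file into Hive, instead of one
# INSERT per record; hiveExe is the helper defined in the scripts below
# hiveExe("load data local inpath '%s' into table yemao_logpy_batch" % out)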
I. Environment preparation
1. Install the Thrift build dependencies
#yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel
2. Copy the Hive Python library directory (this step can be skipped if the script appends the path via sys.path instead, as the scripts below do)
cp -r $HIVE_PATH/lib/py /usr/local/lib/python2.7/site-packages
3. It is assumed that the Hadoop environment, the Python environment, Hive's Thrift service, and the JSON files are already in place
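Before running the scripts below, it is worth verifying that the Thrift service is actually listening. A minimal reachability check using only the standard socket module (the host and port match the scripts below; this check is an illustrative addition, not part of the original toolchain):

#!/usr/bin/env python
# check_thrift_port.py -- verify the Thrift service is reachable before
# running the insert scripts (illustrative sketch)
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(3)
try:
    s.connect(('127.0.0.1', 10000))
    print 'thrift service reachable'
except socket.error, e:
    print 'cannot reach thrift service: %s' % e
finally:
    s.close()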
II. The data-insertion scripts
The simple version is for testing; the full version is the script that would have gone into production had the approach proven viable.
1. json2hive_python (simple version)
[spark@Master Py_logproc]$ pwd
/home/spark/opt/Log_Data/Py_logproc
[spark@Master Py_logproc]$ cat json2hive_python_recordasarray_basic.py
#!/usr/bin/env python
# -*- encoding:utf-8 -*-
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
import json
import warnings
warnings.filterwarnings("ignore")

def hiveExe(sql):
    # open a Thrift connection to HiveServer and run a single statement
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == "__main__":
    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")
    if len(sys.argv) == 1:
        print "need argv"
    else:
        print sys.argv
        # each line of the input file is a JSON array of log records
        for json_array in open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'):
            yemao_array = json.loads(json_array)
            for yemao in yemao_array:
                print yemao['time']
                if not yemao.has_key('_reason'):
                    id = yemao['id']
                    time = yemao['time']
                    url_from = yemao['url_from']
                    url_current = yemao['url_current']
                    url_to = yemao['url_to']
                    options = yemao['options']
                    ip = yemao['ip']
                    uid = yemao['uid']
                    new_visitor = yemao['new_visitor']
                    province = yemao['province']
                    city = yemao['city']
                    site = yemao['site']
                    device = yemao['device']
                    browser = yemao['browser']
                    phone = yemao['phone']
                    token = yemao['token']
                    dorm = yemao['dorm']
                    order_phone = yemao['order_phone']
                    order_dormitory = yemao['order_dormitory']
                    order_amount = yemao['order_amount']
                    order_id = yemao['order_id']
                    uname = yemao['uname']
                    site_id = yemao['site_id']
                    address = yemao['address']
                    dorm_id = yemao['dorm_id']
                    dormentry_id = yemao['dormentry_id']
                    tag = yemao['tag']
                    rid = yemao['rid']
                    cart_quantity = yemao['cart_quantity']
                    response = yemao['response']
                    paytype = yemao['paytype']
                    # optional fields default to '0'
                    if yemao.has_key('data'):
                        data = yemao['data']
                    else:
                        data = '0'
                    data = '"' + str(data) + '"'
                    if yemao.has_key('info'):
                        info = yemao['info']
                    else:
                        info = '0'
                    if yemao.has_key('status'):
                        status = yemao['status']
                    else:
                        status = '0'
                    log_date = int(sys.argv[1])
                    # keep the value tuple in the same order as the column list
                    insert_sql = "insert into yemao_logpy(id,time,url_from,url_to,url_current,ip,dorm_id,browser,log_date) values ('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', %d)" % (id, time, url_from, url_to, url_current, ip, dorm_id, browser, log_date)
                    print insert_sql
                    hiveExe(insert_sql)
        print 'yemao_array_python2hive done'
[spark@Master Py_logproc]$
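Judging from int(sys.argv[1]) and the log_date column, the script expects the log date as a single YYYYMMDD integer argument, so a test run would look something like this (the date value is illustrative):
[spark@Master Py_logproc]$ python json2hive_python_recordasarray_basic.py 20151224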
2. json2hive_python (full version)
[spark@Master Py_logproc]$ pwd
/home/spark/opt/Log_Data/Py_logproc
[spark@Master Py_logproc]$ cat json2hive_python_recordasarray.py
#!/usr/bin/env python
# -*- encoding:utf-8 -*-
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
#from db import getDB
import json
import warnings
warnings.filterwarnings("ignore")

def hiveExe(sql):
    # open a Thrift connection to HiveServer and run a single statement
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)

if __name__ == "__main__":
    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")
    if len(sys.argv) == 1:
        print "need argv"
    else:
        print sys.argv
        # each line of the input file is a JSON array of log records
        for json_array in open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'):
            yemao_array = json.loads(json_array)
            for yemao in yemao_array:
                print yemao['time']
                if not yemao.has_key('_reason'):
                    id = yemao['id']
                    time = yemao['time']
                    url_from = yemao['url_from']
                    url_current = yemao['url_current']
                    url_to = yemao['url_to']
                    options = yemao['options']
                    ip = yemao['ip']
                    uid = yemao['uid']
                    new_visitor = yemao['new_visitor']
                    province = yemao['province']
                    city = yemao['city']
                    site = yemao['site']
                    device = yemao['device']
                    browser = yemao['browser']
                    phone = yemao['phone']
                    token = yemao['token']
                    dorm = yemao['dorm']
                    order_phone = yemao['order_phone']
                    order_dormitory = yemao['order_dormitory']
                    order_amount = yemao['order_amount']
                    order_id = yemao['order_id']
                    uname = yemao['uname']
                    site_id = yemao['site_id']
                    address = yemao['address']
                    dorm_id = yemao['dorm_id']
                    dormentry_id = yemao['dormentry_id']
                    tag = yemao['tag']
                    rid = yemao['rid']
                    cart_quantity = yemao['cart_quantity']
                    response = yemao['response']
                    paytype = yemao['paytype']
                    # optional fields default to '0'; data is pre-quoted, so
                    # its placeholder below is a bare %s
                    if yemao.has_key('data'):
                        data = yemao['data']
                    else:
                        data = '0'
                    data = '"' + str(data) + '"'
                    if yemao.has_key('info'):
                        info = yemao['info']
                    else:
                        info = '0'
                    if yemao.has_key('status'):
                        status = yemao['status']
                    else:
                        status = '0'
                    log_date = int(sys.argv[1])
                    insert_sql = "insert into yemao_logpy(id,time,url_from,url_current,url_to,options,ip,uid,new_visitor,province,city,site,device,browser,phone,token,dorm,order_phone,order_dormitory,order_amount,order_id,uname,site_id,address,dorm_id,dormentry_id,tag,rid,cart_quantity,response,paytype,data,info,status,log_date) values ('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', %s, '%s', '%s', %d)" % (id, time, url_from, url_current, url_to, options, ip, uid, new_visitor, province, city, site, device, browser, phone, token, dorm, order_phone, order_dormitory, order_amount, order_id, uname, site_id, address, dorm_id, dormentry_id, tag, rid, cart_quantity, response, paytype, data, info, status, log_date)
                    hiveExe(insert_sql)
        print 'yemao_array_python2hive done'
[spark@Master Py_logproc]$
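A further caveat for anyone adapting these scripts: field values are interpolated into the SQL string verbatim, so a single quote inside any log field (a URL, a user name) breaks the statement. A minimal escaping helper of the kind that would be needed, assuming Hive's backslash-escaped string literals (illustrative, not in the original scripts):

def sql_quote(value):
    # backslash-escape backslashes and single quotes, then wrap in quotes
    s = str(value).replace('\\', '\\\\').replace("'", "\\'")
    return "'" + s + "'"

# e.g. sql_quote("O'Brien") yields 'O\'Brien', safe to splice into a VALUES list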
III. Other reference scripts
1. Connect to Hive and run SQL: script 1
[spark@Master Py_logproc]$ cat py2hive_pre1.py
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)
if __name__ == '__main__':
    hiveExe("show tables")
[spark@Master Py_logproc]$
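For reference when extending these scripts: with this HiveServer1 client, fetchAll() returns the result set as a list of plain strings, one per row, with columns tab-delimited (an assumption worth verifying against your Hive version). Splitting the rows inside hiveExe, right after client.execute(sql), would look roughly like:

        rows = client.fetchAll()      # one string per result row
        for row in rows:
            print row.split('\t')     # columns arrive tab-separated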
2. Connect to Hive and run SQL: script 2
[spark@Master Py_logproc]$ cat py2hive_pre2.py
#!/usr/bin/env python
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)
if __name__ == '__main__':
    hiveExe("select * from yemao1_log limit 10")
[spark@Master Py_logproc]$
3. Connect to Hive and run SQL: script 3
[spark@Master Py_logproc]$ cat 3333.py
#!/usr/bin/env python
import sys
sys.path.append('/home/spark/opt/hive-1.2.1/lib/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
def hiveExe(sql):
    try:
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute(sql)
        print "The return value is : "
        print client.fetchAll()
        print "............"
        transport.close()
    except Thrift.TException, tx:
        print '%s' % (tx.message)
if __name__ == '__main__':
    select_sql = "select url_current from yemao1_log limit 100"
    hiveExe(select_sql)
[spark@Master Py_logproc]$
4. Data-preparation script
[spark@Master Py_logproc]$ cat pre_data.sh
#!/bin/bash
export yesterday=`date -d last-day +%Y%m%d`
cd /home/spark/opt/Log_Data/Py_logproc
for tar in /home/spark/opt/Log_Data/yemao/yemao*$yesterday.tar.gz;
do
tar zxvf $tar -C /home/spark/opt/Log_Data/Py_logproc/log_tmpdir;
# collect every JSON-array line from the logs just extracted
grep -h "\[{.*}\]" /home/spark/opt/Log_Data/Py_logproc/log_tmpdir/*.log >> ./log_tmpdir/yemaopythonlog;
rm -rf /home/spark/opt/Log_Data/Py_logproc/log_tmpdir/*.log
done
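Since pre_data.sh concatenates whatever matches the JSON-array pattern, a quick sanity pass over yemaopythonlog before feeding it to the insert script can save a crash mid-run. A minimal sketch (the file path matches the scripts above; this check is an illustrative addition, not part of the original flow):

#!/usr/bin/env python
# validate_yemaopythonlog.py -- count parseable records vs. broken lines
# before the insert script runs (illustrative sketch)
import json

good = bad = 0
for line in open('/home/spark/opt/Log_Data/Py_logproc/log_tmpdir/yemaopythonlog'):
    try:
        records = json.loads(line)
        good += len(records)
    except ValueError:
        bad += 1
print '%d records parsed, %d unparseable lines' % (good, bad)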