+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+------------------+-------------------+-----------+
|l_returnflag|l_linestatus|    sum_qty|      sum_base_price|      sum_disc_price|          sum_charge|           avg_qty|         avg_price|           avg_disc|count_order|
+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+------------------+-------------------+-----------+
|           N|           O|7.5697385E7|1.135107538838699...|1.078345555027154...|1.121504616321447...|25.501957856643052|38241.036487881756|0.04999335309103123|    2968297|
+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+------------------+-------------------+-----------+
scala> sqlContext.sql("CREATE TEMPORARY VIEW item USING com.aliyun.oss " +
|"OPTIONS (" +
|"oss.bucket 'select-test-sz', " +
|"oss.prefix 'data', " +
|"oss.schema 'L_ORDERKEY long, L_PARTKEY long, L_SUPPKEY long, L_LINENUMBER int, L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_RETURNFLAG string, L_LINESTATUS string, L_SHIPDATE string, L_COMMITDATE string, L_RECEIPTDATE string, L_SHIPINSTRUCT string, L_SHIPMODE string, L_COMMENT string'," +
|"oss.data.format 'csv'," + // we only support csv now
|"oss.input.csv.header 'None'," +
|"oss.input.csv.recordDelimiter '\\n'," +
|"oss.input.csv.fieldDelimiter '|'," +
|"oss.input.csv.commentChar '#'," +
|"oss.input.csv.quoteChar '\"'," +
|"oss.output.csv.recordDelimiter '\\n'," +
|"oss.output.csv.fieldDelimiter ','," +
|"oss.output.csv.commentChar '#'," +
|"oss.output.csv.quoteChar '\"'," +
|"oss.endpoint 'oss-cn-shenzhen.aliyuncs.com', " +
|"oss.accessKeyId 'Your Access Key Id', " +
|"oss.accessKeySecret 'Your Access Key Secret')")
res2: org.apache.spark.sql.DataFrame = []
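The `oss.input.csv.*` options above tell OSS Select how to tokenize each record. As a rough, self-contained sketch of what `fieldDelimiter '|'` and `quoteChar '"'` mean (illustrative plain Scala only, not the connector's actual parser):

```scala
// Sketch: split one CSV record, honoring the field delimiter and quote character.
// Assumption: this mimics the option semantics for illustration; OSS Select's
// real parser lives server-side.
def splitCsvRecord(line: String,
                   fieldDelimiter: Char = '|',
                   quoteChar: Char = '"'): List[String] = {
  val fields = scala.collection.mutable.ListBuffer.empty[String]
  val cur = new StringBuilder
  var inQuotes = false
  for (c <- line) {
    if (c == quoteChar) inQuotes = !inQuotes            // toggle quoted state
    else if (c == fieldDelimiter && !inQuotes) {        // field boundary outside quotes
      fields += cur.toString; cur.clear()
    } else cur += c
  }
  fields += cur.toString                                // flush the last field
  fields.toList
}

// A lineitem-style row splits into its columns:
println(splitCsvRecord("1|155190|7706|1|17|21168.23"))
// prints List(1, 155190, 7706, 1, 17, 21168.23)
```

A delimiter inside a quoted field is kept as data, which is why `quoteChar` has to be escaped as `\"` in the Scala string above.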
scala> sqlContext.sql("select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from item where l_shipdate > '1997-09-16' group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus").show()
+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+-----------------+-------------------+-----------+
|l_returnflag|l_linestatus|    sum_qty|      sum_base_price|      sum_disc_price|          sum_charge|           avg_qty|        avg_price|           avg_disc|count_order|
+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+-----------------+-------------------+-----------+
|           N|           O|7.5697385E7|1.135107538838701E11|1.078345555027154...|1.121504616321447...|25.501957856643052|38241.03648788181|0.04999335309103024|    2968297|
+------------+------------+-----------+--------------------+--------------------+--------------------+------------------+-----------------+-------------------+-----------+

As the timing comparison shows, running this query through OSS Select in Spark SQL takes 38 s, while running it without OSS Select takes 2.5 min. OSS Select therefore speeds up the query substantially.
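The query itself is the TPC-H Q1 pattern: filter on `l_shipdate`, group by return flag and line status, then compute sums, averages, and a row count per group. Its logic can be sketched over plain Scala collections (the sample rows below are invented for illustration; the real data comes from the OSS-backed `item` view):

```scala
// Sketch of the aggregation performed by the SQL above, on in-memory sample data.
case class LineItem(returnFlag: String, lineStatus: String, quantity: Double,
                    extendedPrice: Double, discount: Double, tax: Double, shipDate: String)

val items = List(
  LineItem("N", "O", 17, 21168.23, 0.04, 0.02, "1998-01-01"),
  LineItem("N", "O", 36, 45983.16, 0.09, 0.06, "1997-12-02"),
  LineItem("R", "F",  8, 13309.60, 0.10, 0.02, "1994-02-02")   // filtered out below
)

val result = items
  .filter(_.shipDate > "1997-09-16")                  // where l_shipdate > '1997-09-16'
  .groupBy(i => (i.returnFlag, i.lineStatus))         // group by l_returnflag, l_linestatus
  .map { case (key, rows) =>
    val sumQty       = rows.map(_.quantity).sum
    val sumBasePrice = rows.map(_.extendedPrice).sum
    val sumDiscPrice = rows.map(r => r.extendedPrice * (1 - r.discount)).sum
    val sumCharge    = rows.map(r => r.extendedPrice * (1 - r.discount) * (1 + r.tax)).sum
    (key, sumQty, sumBasePrice, sumDiscPrice, sumCharge, rows.size)
  }
  .toList.sortBy(_._1)                                // order by l_returnflag, l_linestatus

result.foreach(println)
```

The speedup from OSS Select comes from pushing the `where` filter down to storage, so Spark only transfers and aggregates the matching rows instead of the full table.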