少年游

欲买桂花同载酒,终不似,少年游。

0%

hive创建不同类型的空map

hive创建空map

  • 众所周知,hive默认的空类型的map对应的类型是map<string, string>
    1
    2
    3
    4
    5
    6
    7
    beeline> select map() as empty_map;
    +------------+
    | empty_map |
    +------------+
    | {} |
    +------------+
    1 row selected (0.583 seconds)

如何才能创建各种类型的map呢? 比如map<string, bigint>, map<string, boolean>, map<string, array<string>>

创建任意类型的空map, 要啥有啥

  • 先创建一个带类型map,然后删除key。
1
2
3
4
5
6
7
8
9
10
add jar hdfs:///lib/brickhouse-0.7.1-SNAPSHOT.jar;
create temporary function map_remove_keys as 'brickhouse.udf.collect.MapRemoveKeysUDF';

set hivevar:empty_bool_map=map_remove_keys(map('', false), array(''));
set hivevar:empty_bigint_map=map_remove_keys(map('', bigint(-2)), array(''));
set hivevar:empty_double_map=map_remove_keys(map('', double(-2)), array(''));
set hivevar:empty_string_map=map_remove_keys(map('', ''), array(''));
set hivevar:empty_list_map= map_remove_keys(map('', array()), array(''));

select ${empty_bool_map}, ${empty_bigint_map}, ${empty_double_map}, ${empty_string_map}, ${empty_list_map};

通过spark程序创建

  • val spark_udf_function = spark.udf.register()
    1
    2
    3
    4
    5
    6
    spark.udf.register("boolean_empty_map", () => Map[String, Boolean]())
    spark.udf.register("bigint_empty_map", () => Map[String, Long]())
    spark.udf.register("double_empty_map", () => Map[String, Double]())
    spark.udf.register("string_empty_map", () => Map[String, String]())
    spark.udf.register("list_empty_map", () => Map[String, Array[String]]())
    spark.sql("SELECT boolean_empty_map(), bigint_empty_map(), double_empty_map(), string_empty_map(), list_empty_map()").printSchema()

map合并union_map

  • union_map能够相同类型的map

    1
    2
    3
    4
    add jar hdfs:///lib/brickhouse-0.7.1-SNAPSHOT.jar;
    CREATE TEMPORARY FUNCTION union_map AS 'brickhouse.udf.collect.UnionUDAF';

    select id, union_map(m) from (select 1 as id, map('a','1', 'b','2') as m union all select 1 as id, map('a', 0, 'c','3', 'd','4') as m) t group by id;
  • 小 bug, union_map 在合并 key 带有冲突的情况,由于并行原因,不能实现新加入的 key 覆盖。

    1
    2
    3
    4
    5
    6
    7
    8
    select id,union_map(m) from 
    (
    select 1 as id, map('a','1', 'b','2') as m
    union all
    select 1 as id, map('a', '1', 'c','3', 'd','4') as m
    union all
    select 1 as id, map('a', '0', 'c','4', 'd','5') as m
    ) t group by id;

参考

[1]. brickhouse udf