Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
388 views
in Technique[技术] by (71.8m points)

HiveQL: How to write a query to select and filter records based on nested JSON array values

In our logging database we store custom UI data as a serialized JSON string. I have been using lateral view json_tuple() to traverse the JSON object and extract nested values. However, I need to filter some of my query results based on whether an array of objects contains certain values or not. After doing some digging I think I need to use lateral view explode(), but I am not a HiveQL expert and I'm not sure exactly how to use this in the way I need.

EX: (simplified for clarity and brevity)

// ui_events table schema
eventDate, eventType, eventData

// serialized JSON string stored in eventData
{ foo: { bar: [{ x: 1, y: 0 }, { x: 0, y: 1 }] } }

// HiveQL query
select
  eventDate,
  x,
  y

from ui_events

lateral view json_tuple(eventData, 'foo') as foo

lateral view json_tuple(foo, 'bar') as bar

// <-- how to select only sub-item(s) in bar where x = 0 and y = 1

where
  eventType = 'custom'
  and // <-- how to only return records where at least 1 `bar` item was found above?

Any help would be greatly appreciated. Thanks!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Read comments in the code. You can filter the dataset as you want:

with
my_table as(
select stack(2, '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
                '{ "foo": { } }'
            ) as EventData
) 

select * from
(
select --get_json_object returns string, not array. 
       --remove outer [] 
       --and replace delimiter between },{ with ,,,
       --to be able to split array
       regexp_replace(regexp_replace(get_json_object(EventData, '$.foo.bar'),'^\[|\]$',''),
                      '\},\{', '},,,{'
                      )bar
  from my_table t
) s --explode array
    lateral view explode (split(s.bar,',,,')) b as bar_element
    --get struct elements
    lateral view json_tuple(b.bar_element, 'x','y') e as x, y

Result:

s.bar                           b.bar_element   e.x e.y
{"x":1,"y":0},,,{"x":0,"y":1}   {"x":1,"y":0}   1   0
{"x":1,"y":0},,,{"x":0,"y":1}   {"x":0,"y":1}   0   1

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...