The past few days I have been playing with Hive for some data analysis and I wanted to put down what I learned
a) Exporting data from hive to csv
If you are using hue, then it provides a convenient way to export to csv or excel format. But if not then you can use the following preamble before the select statement
INSERT OVERWRITE LOCAL DIRECTORY '/path/out.csv' ROW FORMATTED DELIMITED FIELDS TERMINATED BY ','
a) Exporting data from hive to csv
If you are using hue, then it provides a convenient way to export to csv or excel format. But if not then you can use the following preamble before the select statement
INSERT OVERWRITE LOCAL DIRECTORY '/path/out.csv' ROW FORMATTED DELIMITED FIELDS TERMINATED BY ','
b) Hive does not allow "select" statements in the "where" clause. for example
SELECT file_id, file_type
FROM file_metadatas
WHERE file_type IN (select types from allowed_file_types)
WILL NOT WORK in Hive, but will work in MySQL
Instead one can use a join
SELECT a.file_id, a.file_type
FROM file_metadatas a
JOIN allowed_file_types b
ON b.types = a.file_type
c) Date functions: Computing a date that is ~6 months behind the current date
DATE_SUB(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP( ))),180)
d) hive regex: Regular expressions can be used to extract parts of a text field. Here is an example
Extracting the file extension: regexp_extract(file_name,'\\.[0-9a-zA-Z]+$',0)) extracts the 0th (1st) match of file_name with the input string
e) string manipulation support: Strings can be manipulated in hive. One such example is the "lower" function that converts all characters to lower