Thursday, June 29, 2017

Sample Data Set for Hive Practice

1. Download the sample data set for this use case from the link below:

                    http://www.grouplens.org/system/files/ml-1m.zip

2. The archive contains data about movies, users, and ratings. Unzip it.

3. The archive contains the following 3 files:

  movies.dat, ratings.dat, users.dat

4. The fields in these files are delimited by '::'. For better readability (and as an example of handling a custom delimiter), change the delimiter to something else. You can also keep it as it is; I am changing it to '#'.

# -i edits each file in place (GNU sed syntax); without it, sed only prints the result
sed -i 's/::/#/g' movies.dat
sed -i 's/::/#/g' users.dat
sed -i 's/::/#/g' ratings.dat
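
Before touching the real files, the substitution can be tried on a couple of lines. The following is a minimal sketch, assuming a shell with mktemp available; the scratch directory and the file names movies_sample.dat and movies_sample.hash are made up for illustration. It also counts lines that already contain '#', since a '#' inside a field would be mistaken for a delimiter after the conversion.

```shell
# Scratch directory and sample file names are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/movies_sample.dat" <<'EOF'
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
EOF

# Count lines that already contain '#'; anything above 0 makes '#' a risky choice.
# grep -c exits non-zero when the count is 0, hence the '|| true'.
clashes=$(grep -c '#' "$tmpdir/movies_sample.dat" || true)
echo "lines already containing '#': $clashes"

# Write the converted text to a new file instead of overwriting the input.
sed 's/::/#/g' "$tmpdir/movies_sample.dat" > "$tmpdir/movies_sample.hash"
cat "$tmpdir/movies_sample.hash"
```

Running the same grep against the real movies.dat, users.dat and ratings.dat shows whether '#' is safe for the full data set.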

After the change, the contents of the files look like this:

movies:

structure: 
id#name#genre

sample data:
1#Toy Story (1995)#Animation|Children's|Comedy
2#Jumanji (1995)#Adventure|Children's|Fantasy
3#Grumpier Old Men (1995)#Comedy|Romance
4#Waiting to Exhale (1995)#Comedy|Drama

users:

structure:
id#gender#age#occupationid#zipcode

sample data:
1#F#1#10#48067
2#M#56#16#70072
3#M#25#15#55117
4#M#45#7#02460
5#M#25#20#55455


ratings:

structure:
userid#movieid#rating#tmstmp

sample data:
1#1193#5#978300760
1#661#3#978302109
1#914#3#978301968
1#3408#4#978300275
1#2355#5#978824291
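
A quick sanity check after the conversion is to confirm that every line has the expected number of '#'-separated fields (3 for movies, 5 for users, 4 for ratings, going by the structures above). Here is a minimal sketch with awk, run on a small made-up sample; ratings_sample.dat is a hypothetical file name, and its third line is deliberately broken.

```shell
# Scratch directory and sample file are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/ratings_sample.dat" <<'EOF'
1#1193#5#978300760
1#661#3#978302109
1#914#3
EOF

# Print the line number and content of every line that does not have exactly 4 fields.
bad=$(awk -F'#' -v want=4 'NF != want { print NR ": " $0 }' "$tmpdir/ratings_sample.dat")
echo "malformed lines: $bad"
```

For the real files, point awk at ratings.dat (want=4), users.dat (want=5) or movies.dat (want=3); an empty result means every line has the expected shape.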

To make the data more meaningful, create an occupation data set.

Create a file named occupation.dat with the data below:

vim occupation.dat

Copy and paste the lines below and save the file.

0#other/not specified
1#academic/educator
2#artist
3#clerical/admin
4#college/grad student
5#customer service
6#doctor/health care
7#executive/managerial
8#farmer
9#homemaker
10#K-12 student
11#lawyer
12#programmer
13#retired
14#sales/marketing
15#scientist
16#self-employed
17#technician/engineer
18#tradesman/craftsman
19#unemployed
20#writer
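
The occupation file acts as a lookup table for the occupationid column in users. Before loading anything into Hive, that relationship can be checked from the shell. The sketch below uses awk on two small sample files (the file names under the scratch directory are illustrative, not part of the data set) to replace each user's occupation id with its name.

```shell
# Scratch directory and sample file names are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/occupation_sample.dat" <<'EOF'
10#K-12 student
15#scientist
16#self-employed
EOF
cat > "$tmpdir/users_sample.dat" <<'EOF'
1#F#1#10#48067
2#M#56#16#70072
3#M#25#15#55117
EOF

# First file: build an id -> name map. Second file: swap field 4 for the name.
joined=$(awk -F'#' -v OFS='#' '
  NR == FNR { name[$1] = $2; next }
  { $4 = ($4 in name) ? name[$4] : "unknown"; print }
' "$tmpdir/occupation_sample.dat" "$tmpdir/users_sample.dat")
echo "$joined"
```

The same lookup is what a Hive join between the user and occupation tables would express once the files are loaded.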


Copy the above files into HDFS:

I have created 4 directories under /hive/data, named user, movie, rating and occupation.
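
The commands for creating those directories are not part of the steps as written. Here is a minimal sketch, assuming Hadoop 2 or later (where hadoop fs -mkdir accepts -p) and that the current user is allowed to write under /hive/data; the variable names are only for illustration.

```shell
base=/hive/data
dirs=""
for name in user movie rating occupation; do
  dirs="$dirs $base/$name"
done

if command -v hadoop >/dev/null 2>&1; then
  # $dirs is intentionally unquoted so that each path becomes its own argument.
  hadoop fs -mkdir -p $dirs
  hadoop fs -ls "$base"
else
  echo "hadoop not on PATH; would create:$dirs"
fi
```

Once the directories exist, the files can be uploaded: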

hadoop fs -put occupation.dat /hive/data/occupation
hadoop fs -put users.dat /hive/data/user
hadoop fs -put movies.dat /hive/data/movie
hadoop fs -put ratings.dat /hive/data/rating
