1. Download the sample data for this use case from the link below:
http://www.grouplens.org/system/files/ml-1m.zip
2. It contains data about movies, users, and ratings. Unzip it.
3. The archive contains these three data files:
movies.dat, ratings.dat, users.dat
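4. The files above are delimited by '::'. For better readability (and as an example of handling a custom delimiter), change the delimiter to something else; you can also keep '::' as is. I am changing it to '#'. Plain sed only prints the result to the terminal, so edit the files in place with -i (GNU sed; on macOS/BSD sed use -i ''):
sed -i 's/::/#/g' movies.dat
sed -i 's/::/#/g' users.dat
sed -i 's/::/#/g' ratings.dat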
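Replacing '::' with '#' is only safe if '#' does not already occur in the data; otherwise a line containing '#' would gain an extra field. Below is a minimal sketch, assuming bash plus standard grep and sed, of a hypothetical helper convert_delimiter that refuses to convert such a file and otherwise rewrites it through a temporary file, which behaves the same with GNU and BSD sed:

```bash
#!/usr/bin/env bash
# convert_delimiter FILE: hypothetical helper that replaces every '::' with '#'
# in FILE, in place. It refuses to touch a file that already contains '#',
# because '#' would then be ambiguous as a field delimiter.
convert_delimiter() {
  local file="$1"
  [ -f "$file" ] || { echo "convert_delimiter: no such file '$file'" >&2; return 1; }
  if grep -q '#' "$file"; then
    echo "convert_delimiter: '$file' already contains '#'; pick another delimiter" >&2
    return 1
  fi
  # Write to a temporary file and move it over the original; this avoids the
  # GNU/BSD differences in 'sed -i'.
  sed 's/::/#/g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage on the three MovieLens files:
# for f in movies.dat users.dat ratings.dat; do convert_delimiter "$f"; done
```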
After the conversion, the files look like this:
movies:
structure:
id#name#genre
sample data:
1#Toy Story (1995)#Animation|Children's|Comedy
2#Jumanji (1995)#Adventure|Children's|Fantasy
3#Grumpier Old Men (1995)#Comedy|Romance
4#Waiting to Exhale (1995)#Comedy|Drama
users:
structure:
id#gender#age#occupationid#zipcode
sample data:
1#F#1#10#48067
2#M#56#16#70072
3#M#25#15#55117
4#M#45#7#02460
5#M#25#20#55455
ratings:
structure:
userid#movieid#rating#tmstmp
sample data:
1#1193#5#978300760
1#661#3#978302109
1#914#3#978301968
1#3408#4#978300275
1#2355#5#978824291
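After the conversion, every line of a file should have the number of '#'-separated fields shown in its structure above: 3 for movies, 5 for users and 4 for ratings. A quick sanity check, sketched with awk under the assumption that the files were converted to '#' as in step 4 (check_fields is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# check_fields FILE EXPECTED: print every line of FILE whose number of
# '#'-separated fields differs from EXPECTED, and return non-zero if any do.
check_fields() {
  awk -F'#' -v n="$2" '
    NF != n + 0 { bad++; printf "%s:%d: %d fields\n", FILENAME, FNR, NF }
    END { exit (bad > 0) }
  ' "$1"
}

# Expected field counts, taken from the structures listed above:
# check_fields movies.dat 3    # id#name#genre
# check_fields users.dat 5     # id#gender#age#occupationid#zipcode
# check_fields ratings.dat 4   # userid#movieid#rating#tmstmp
```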
To make the data more meaningful, create an occupation data set that maps the occupationid in users.dat to an occupation name.
Create a file named occupation.dat with the data below:
vim occupation.dat
Copy and paste the following lines, then save the file.
0#other/not specified
1#academic/educator
2#artist
3#clerical/admin
4#college/grad student
5#customer service
6#doctor/health care
7#executive/managerial
8#farmer
9#homemaker
10#K-12 student
11#lawyer
12#programmer
13#retired
14#sales/marketing
15#scientist
16#self-employed
17#technician/engineer
18#tradesman/craftsman
19#unemployed
20#writer
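If you prefer not to paste into vim, the same file can be created non-interactively with a shell here-document; quoting 'EOF' stops the shell from expanding anything inside it. A sketch, assuming it is run in the directory that holds the other .dat files:

```bash
#!/usr/bin/env bash
# Write the occupation lookup table (occupationid#name) in one step.
cat > occupation.dat <<'EOF'
0#other/not specified
1#academic/educator
2#artist
3#clerical/admin
4#college/grad student
5#customer service
6#doctor/health care
7#executive/managerial
8#farmer
9#homemaker
10#K-12 student
11#lawyer
12#programmer
13#retired
14#sales/marketing
15#scientist
16#self-employed
17#technician/engineer
18#tradesman/craftsman
19#unemployed
20#writer
EOF

# Quick check: the file should have 21 lines, one per occupation id 0..20.
wc -l occupation.dat
```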
Copy the above files into HDFS.
I have created four directories under /hive/data: user, movie, rating, and occupation.
hadoop fs -put occupation.dat /hive/data/occupation
hadoop fs -put users.dat /hive/data/user
hadoop fs -put movies.dat /hive/data/movie
hadoop fs -put ratings.dat /hive/data/rating
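Creating the four directories and uploading the files can be scripted together. A minimal sketch, assuming Hadoop 2 or later (where hadoop fs -mkdir -p is available) and that the converted files are in the current directory; upload_to_hdfs and the HADOOP override are hypothetical names introduced here so the commands can be previewed with HADOOP=echo before running them for real:

```bash
#!/usr/bin/env bash
# upload_to_hdfs: create each target directory under /hive/data and copy the
# matching local file into it. HADOOP defaults to the real CLI; setting
# HADOOP=echo only prints the commands (dry run).
upload_to_hdfs() {
  local hadoop="${HADOOP:-hadoop}"
  local base="/hive/data"
  local pair file dir
  # Each entry is "local file:HDFS directory name".
  for pair in occupation.dat:occupation users.dat:user movies.dat:movie ratings.dat:rating; do
    file="${pair%%:*}"
    dir="${pair##*:}"
    "$hadoop" fs -mkdir -p "$base/$dir" || return 1
    "$hadoop" fs -put "$file" "$base/$dir" || return 1
  done
}

# HADOOP=echo upload_to_hdfs   # preview the commands
# upload_to_hdfs               # run them against HDFS
```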