Stevens Institute of Technology Hadoop Program Worksheet

Description

Write a program using Hadoop (map/reduce) and the language of your choice to find the distribution of bigrams in your dataset (digits only; ignore the decimal point). A bigram is 2 successive digits (or letters, etc.), taken in non-overlapping pairs. For example, the string 938193 has 3 bigrams (93, 81, 93), so its distribution is 93 – 2 and 81 – 1. Assume that the dataset is large enough that bigrams split across the boundaries between nodes are not significant. (Since this is a very small dataset, you will most likely have only 1 mapper anyway, so this won't be an issue.)

Your submission should be copied into MS Word and should include, in one file:
- Your “mapper” program
- The K/V pairs emitted by your mapper
- Your “reducer” program
- The K/V pairs emitted by your reducer
- The answers
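A minimal sketch, assuming Hadoop Streaming and Python 3 (the worksheet leaves the language open). One hypothetical script, `bigrams.py`, acts as the mapper or the reducer depending on its argument. The mapper keeps only the digits of each input line and emits `bigram<TAB>1`; the reducer relies on Hadoop having sorted its input by key and sums the counts for each bigram. Pairs are formed within each input line, so a leftover odd digit at the end of a line is dropped, which is the boundary effect the worksheet says to ignore. The input file is assumed to hold only the digit string, since any other digits (such as the "1000" in the data label) would be counted too.

```python
#!/usr/bin/env python3
"""Bigram distribution for Hadoop Streaming (sketch).

Mapper:  python3 bigrams.py map     (reads text, writes "bigram<TAB>1")
Reducer: python3 bigrams.py reduce  (reads sorted "bigram<TAB>count", writes totals)
"""
import sys
from itertools import groupby

# 2 = non-overlapping pairs, as in the 938193 example; 1 = overlapping pairs.
STEP = 2
DIGITS = "0123456789"


def map_bigrams(lines, step=STEP):
    """Yield (bigram, 1) for each digit pair, pairing digits within each line."""
    for line in lines:
        # Keep ASCII digits only: drops the decimal point, spaces and newline.
        digits = "".join(ch for ch in line if ch in DIGITS)
        # Stop before the last digit so a leftover odd digit is ignored.
        for i in range(0, len(digits) - 1, step):
            yield digits[i:i + 2], 1


def parse(lines):
    """Turn 'key<TAB>count' lines back into (key, int) tuples."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. from a local shell pipe
        key, _, count = line.rstrip("\n").partition("\t")
        yield key, int(count)


def reduce_counts(pairs):
    """Sum counts per key; pairs must arrive grouped by key (Hadoop sorts them)."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main(mode):
    if mode == "map":
        results = map_bigrams(sys.stdin)
    else:
        results = reduce_counts(parse(sys.stdin))
    for key, value in results:
        print(f"{key}\t{value}")


# Only act when called with a mode argument, e.g. "bigrams.py map".
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Locally, the shuffle can be imitated with `sort`: `python3 bigrams.py map < pi.txt | sort | python3 bigrams.py reduce` (the file name `pi.txt` is hypothetical). On a cluster the same script would be handed to the streaming jar, roughly `hadoop jar /path/to/hadoop-streaming.jar -files bigrams.py -mapper "python3 bigrams.py map" -reducer "python3 bigrams.py reduce" -input pi.txt -output bigrams_out`; the jar's location depends on the installation. The mapper's stdout gives the K/V pairs emitted by the mapper, and the reducer's stdout gives the distribution. Setting `STEP = 1` would count overlapping pairs instead, the other common reading of "successive digits".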
Data: First 1000 Digits of Pi:
3.14159265358979323846264338327950288419716939937510 58209749445923078164062862089986280348253421170679 82148086513282306647093844609550582231725359408128 48111745028410270193852110555964462294895493038196 44288109756659334461284756482337867831652712019091 45648566923460348610454326648213393607260249141273 72458700660631558817488152092096282925409171536436 78925903600113305305488204665213841469519415116094 33057270365759591953092186117381932611793105118548 07446237996274956735188575272489122793818301194912 98336733624406566430860213949463952247371907021798 60943702770539217176293176752384674818467669405132 00056812714526356082778577134275778960917363717872 14684409012249534301465495853710507922796892589235 42019956112129021960864034418159813629774771309960 51870721134999999837297804995105973173281609631859 50244594553469083026425223082533446850352619311881 71010003137838752886587533208381420617177669147303 59825349042875546873115956286388235378759375195778 1857780532171226806613001927876611195909216420198
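To check the reducer's output without Hadoop, the same counting rule can be applied to the whole file in one pass with `collections.Counter`. This is a sketch with a hypothetical script name (`check_bigrams.py`); it pairs digits across the whole file rather than per line, so it agrees with a per-line mapper only when the digits sit on a single line, as in the data above.

```python
import sys
from collections import Counter

DIGITS = "0123456789"


def bigram_distribution(text, step=2):
    """Count digit bigrams in text, ignoring the decimal point, spaces and newlines.

    step=2 pairs digits without overlap (the worksheet's 938193 example);
    step=1 counts every overlapping pair instead.
    """
    digits = "".join(ch for ch in text if ch in DIGITS)
    return Counter(digits[i:i + 2] for i in range(0, len(digits) - 1, step))


# Hypothetical usage: python3 check_bigrams.py pi.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        dist = bigram_distribution(f.read())
    for bigram, count in sorted(dist.items()):
        print(f"{bigram}\t{count}")
    print("total bigrams:", sum(dist.values()))
```

Printing the bigrams in sorted order makes a line-by-line comparison with the reducer output easy. With the decimal point removed, the data above has 1000 digits, so the non-overlapping counts should add up to 500.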

1 program

