University of Tampa Python Spark Project


Answer these questions using Spark code. Submit your code (in a py file) and the answers to the questions (in a text file). The answers should use the full dataset, not the small dataset. Start with the code shown below. (Hint: for any tasks that say max/largest, don’t use sortByKey, because that’s much slower than a better option.)

1. Which day had the largest number of installed drives, and what was this number?
2. How many distinct drives (by model+serial) are installed (i.e., that exist in the data) in each year?
3. What’s the max drive capacity per year?

Full dataset: change the file path to file:///ssd/data/backblaze.csv (146 million rows). My solution took 17 min.

Run Spark like this: spark-submit --master=local[5]

Or to hide log messages: spark-submit --master=local[5] 2> /dev/null

Look at /home/jeckroth/cinf201/2022-spring/spark/ for some more example code.

Starting code with some examples that you can remove:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Backblaze").getOrCreate()

schema = "day DATE, serial STRING, model STRING, capacity LONG, failure INTEGER"
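# read the small CSV into a DataFrame, applying the schema defined above
d = spark.read.load("file:///home/jeckroth/cinf201/spark/assignment/small-backblaze.csv", format="csv", sep=",", header="true", schema=schema)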
d = d.rdd

# print first 10 rows
print(d.take(10))

## How many failures occurred each year?

# make key (year) & value (failure 0/1)
d2 = d.map(lambda row: (row.day.year, row.failure))
# add up failures per year
failureCounts = d2.reduceByKey(lambda cnt, rowcnt: cnt + rowcnt)

## Which model (not serial number) has the most failures overall?

# grab model & failure from data, model is the key
d3 = d.map(lambda row: (row.model, row.failure))
# count failures for that model; result so far: [(modelX, 55), (modelY, 2100)]
d3 = d3.reduceByKey(lambda cnt, rowcnt: cnt + rowcnt)
# flip keys and values; result so far: [(55, modelX), (2100, modelY)]
d3 = d3.map(lambda pair: (pair[1], pair[0]))
# sort by failure count (now the key after the flip), largest first
d3 = d3.sortByKey(ascending=False) ### NOT EFFICIENT TECHNIQUE
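
The hint about max/largest tasks can be illustrated with a small sketch. Sorting every key just to read off the top entry does more work than needed; a pairwise reduce that keeps the larger of two pairs finds the same answer in a single pass. The names pick_larger and model_with_most_failures are hypothetical helpers, not part of the starting code, and the function assumes it receives an RDD of rows with model and failure fields like d above. This is one possible approach, not necessarily the one the instructor has in mind.

```python
from functools import reduce


def pick_larger(a, b):
    """Keep whichever (model, failures) pair has the higher failure count."""
    return a if a[1] >= b[1] else b


def model_with_most_failures(rows):
    """Hypothetical helper: rows is assumed to be an RDD of Row objects
    with .model and .failure fields, like `d` in the starting code."""
    counts = (rows.map(lambda row: (row.model, row.failure))
                  .reduceByKey(lambda cnt, rowcnt: cnt + rowcnt))
    # single pass over the per-model totals instead of sorting all of them
    return counts.reduce(pick_larger)


# Plain-Python check of the combiner, using the sample totals
# mentioned in the starting code's comments
sample = [("modelX", 55), ("modelY", 2100)]
best = reduce(pick_larger, sample)
print(best)  # ('modelY', 2100)
```

Under spark-submit, model_with_most_failures(d) would return a single (model, failures) pair. The RDD method max with a key function, such as counts.max(key=lambda pair: pair[1]), should give the same result without the keys and values having to be flipped first.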
