KSU Programming Data Wrangling Excel Project

Description

Prepare Data for Listings123 🛀Use the information in the email you received from your manager, Stevie, along with the full Listings123 data (DataExport_2020). You will be preparing your data and applying data cleaning best practices to the DataExport_2020 data. Make sure you fill out the Prompts sheet when completing each step.This project’s email thread is located below, in case you need to reference it.
You will prepare your data for analysis by applying the following steps:Explore the DataSummarize Data Using a Pivot TableClean Inconsistent, Invalid, Duplicate, and Missing DataClean Text Data I (Search and Count)Clean Text Data II (Split Text to Columns)Merge TablesSummarize Data WranglingIMPORTANT
Before submitting your project, make sure:
⎯⎯ All the column names AND sheet names are named exactly as stated in the instructions. Do not change any existing names in the spreadsheet unless instructed.
⎯⎯ The indicated columns are organized in the correct sheet.

⎯⎯ True and False values are stated as TRUE and FALSE, not 1/0, T/F, etc.
Otherwise, your project will be graded incorrectly.TIP
If you’d like more of a challenging experience or less guidance when completing this project do not open the HINT drop downs until after you’ve attempted the project step.STEP 1: Explore the DataBefore jumping into wrangling your data, it’s important to explore the data and address the requests mentioned in the email.INSTRUCTIONSGo to the Step 1 in the Prompts sheet in your spreadsheet and answer Questions 1-3. You will need to reference the DataExport_2020 sheet to answer these questions.Before answering Question 4 in the Prompts sheets, calculate the Number of Reviews Per Day column in the DataExport_2020 sheet.HINTThen proceed to answer Step 1, Question 4 in the Prompts sheet
NOTE
All of your work for Step 1 should be in the Prompts (Step 1) and DataExport_2020 sheets. STEP 2: Summarize Data Using A Pivot Table Listings123 is hoping to find specific traits within the listings data that might indicate the success of a host. Listings123 believes it might be different for each city. Summarize the data by city using a Pivot Table. INSTRUCTIONSCreate a Pivot Table in the Summary By City sheet using the data from the DataExport_2020 sheet. The pivot table should contain the following fields summarized by City.COUNT of Host ID
(Rename this field as “COUNT of Listings”)COUNTUNIQUE of Host IDAVERAGE of BathroomsAVERAGE of BedroomsAVERAGE of BedsSUM of Number of ReviewsAVERAGE of Days ListedAVERAGE of Review Scores LocationAVERAGE of Review Scores ValueHINTCreate a new field in column L of the Summary By City sheet that finds the percentage of listings with existing hosts. Name this column “% of Listings with Repeat Hosts”HINTFormat your pivot table to make it legible. Apply customization best practices that include your new column.HINTGo to the Step 2 in the Prompts sheet and answer Questions 1-3. You will need to reference the Summary By City sheet to answer these questions.NOTE
All of your work for Step 2 should be in the Prompts (Step 2) and Summary By City sheets.STEP 3: Clean Inconsistent, Invalid, Duplicate, and Missing DataMid-way through your project, you’re messaging David (Database Administrator) directly, for more information about the quality of the data, and you receive the following Slack message: Analysts do find bad listings in the data. We once found a listing with 852 bedrooms! I have been removing these listings from the data, and the engineering team has worked to implement validation checks to prevent this type of dirty data.
I’d recommend you clean DataExport_2020 before continuing your project.As an analyst, you should always check your data before starting your analysis whether you are cleaning your data for dirty data or to prepare it to fit the analysis.For this step you’ll be working in a copy of DataExport_2020 sheet called Copy of DataExport_2020.
IMPORTANT
You’ll notice there are new blank columns (J:Q) in the Copy of DataExport_2020 sheet. Do not remove them as part of your data cleaning process. Feel free to ignore them for now. You will come back to them later in the project.INSTRUCTIONSGo to the Copy of DataExport_2020 sheet in your spreadsheet and examine the data.

Use the following checklist to start your data cleaning process:HINTGo through the Copy of DataExport_2020 sheet and clean the inconsistent data types.HINTClean the invalid values in the Copy of DataExport_2020 sheet. Ignore columns I and X.HINTClean the duplicate rows in the Copy of DataExport_2020 sheet. HINTClean the missing values in the Copy of DataExport_2020 sheet. Ignore columns J:Q.
Based on information you gathered from David you are able to replace missing values, such as:
⎯⎯ If the Room Type is a “Private room” or “Shared room” replace the blank Bathrooms with 1, Beds with 1, or Bedrooms with 1.
⎯⎯ There are ~17 empty values in the Amenities column. If this is less than 10% of the data delete the rows with empty amenity values. Otherwise, replace accordingly.
You want to get rid of the #DIV/0! error message in the Number of Reviews Per Day column. (You originally calculated this in DataExport_2020.) You know this error is happening because there are values being divided by 0. To remove the error, you want to adjust your formula to account for specific conditions.
Go to the Number of Reviews Per Day (Clean) column in the Copy of DataExport_2020 sheet and adjust your formula to divide the Number of Reviews by the Days Listed, if the value for the days listed is greater than 0, otherwise, divide the Number of Reviews by 1.HINTGo to the Step 3 in the Prompts sheet and answer Questions 1. You will need to reference the Copy of DataExport_2020 sheet to answer these questions.TIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you start noting them down now.NOTE
All of your work for Step 3 should be in the Prompts (Step 3) and Copy of DataExport_2020 sheets.STEP 4 – Clean Text Data I (Search and Count)You know that amenities are a key factor for renters choosing between listings. This is a column you want to analyze in the future, but notice that right now it is not in a clean format.Continue using the following checklist in your data cleaning process:

HINTINSTRUCTIONSWhen pulling the data, there were some amenities that didn’t transfer correctly. Specifically, there are two non-descriptive amenities in the amenities columns. Find and replace the following amenities with “Other” in the Copy of DataExport_2020 sheet:
⎯⎯ “translation missing: en.hosting_amenity_49”
⎯⎯”translation missing: en.hosting_amenity_50″HINTIn the same sheet indicate whether each listing has the amenities listed below in its assigned column. Return True or False.Internet (column J)Kitchen (column K)24-Hour Check-In (column L)Free Parking (column M)Private Entrance (column N)Smoking Allowed (column O)Other (column P)HINTGo to the Step 4 in the Prompts sheet and answer Question 1. You will need to reference the Copy of DataExport_2020 sheet to answer this question.HINTTIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now.NOTE
All of your work for Step 4 should be in the Prompts (Step 4), Copy of DataExport_2020, and Scratch Work sheets.STEP 5 – Clean Text Data II (Split Text to Columns)Continue using the following checklist in your data cleaning process:INSTRUCTIONSCopy the Amenities column from the Copy of DataExport_2020 sheet and paste it in column A of the Scratch Work sheet.Split the text within the Amenities column into separate columns in the Scratch Work sheet.Calculate the Number of Amenities column in the Copy of DataExport_2020 sheet.HINT

Go to the Step 5 in the Prompts sheet and answer Questions 1-2. You will need to reference the Copy of DataExport_2020 sheet to answer these questions. HINTTIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now. NOTE
All of your work for Step 5 should be in the Prompts (Step 5), Copy of DataExport_2020, and Scratch Work sheets. STEP 6: Merge Tables While preparing your data your manager, Stevie sends another email message:

STEP 6: Merge Tables While preparing your data your manager, Stevie sends another email message:
Hi, great to meet you! I have attached the data below. Let me know if you have any questions. 📎 Click here to access Alexis’ data INSTRUCTIONSManually copy the data into your current spreadsheet in the sheet called Restaurants Hotels. Update the formatting if necessary.Fill in the missing values in the Number of full-service restaurants column in the Restaurants Hotels sheet by manually looking up the data in city-data.com.HINTIn the Merged City Data sheet paste the values from the Summary By City sheet (excluding the Grand Total row). Merge the Number of full-service restaurants and Fire-safe hotels and motels data with the data in the Restaurants Hotels sheet.HINTYou’ll notice there is no data for New Orleans, LA. Alexis has been using a computer program to automatically pull data from city-data.com, but due to an error, the program is not able to collect data for New Orleans, Look up Number of full-service restaurants and Fire-safe hotels and motels data from city-data.com for New Orleans, LA, and manually enter it into the Merged City Data sheet.Go to the Step 6 in the Prompts sheet and answer Questions 1-2. You will need to reference the Merged City Data sheet to answer these questions.TIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now.NOTE
All of your work for Step 6 should be in the Prompts (Step 6), Restaurants Hotels, and Merged City Data sheets.

STEP 7: Summarize Data WranglingSuppose a teammate on the analytics team is onboarding on the project and asked about your data wrangling steps thus far. The goal is for your teammate to have a smooth onboarding process given the clear organization and in-depth detail of your response.INSTRUCTIONSGo to the Step 7 in the Prompts sheet and answer Question 1. Use numbered steps to document your wrangling process.NOTE
All of your work for Step 7 should be in the Prompts (Step 7) sheet.That’s a wrap! 🌯Hooray!!! 🎉 You’ve completed Part 2 of the final project. That was a big lift, but very important work you will complete as an analyst.

Explanation & Answer:
1 Excel Project

User generated content is uploaded by users for the purposes of learning and should be used following Studypool’s honor code & terms of service.

Reviews, comments, and love from our customers and community:

Article Writing

Keep doing what you do, I am really impressed by the work done.

Alexender

Researcher

PowerPoint Presentation

I am speechless…WoW! Thank you so much!

Stacy V.

Part-time student

Dissertation & Thesis

This was a very well-written paper. Great work fast.

M.H.H. Tony

Student

Annotated Bibliography

I love working with this company. You always go above and beyond and exceed my expectations every time.

Francisca N.

Student

Book Report / Review

I received my order wayyyyyyy sooner than I expected. Couldn’t ask for more.

Mary J.

Student

Essay (Any Type)

On time, perfect paper

Prof. Kate (Ph.D)

Student

Case Study

Awesome! Great papers, and early!

Kaylin Green

Student

Proofreading & Editing

Thank you Dr. Rebecca for editing my essays! She completed my task literally in 3 hours. For sure will work with her again, she is great and follows all instructions

Rebecca L.

Researcher

Critical Thinking / Review

Extremely thorough summary, understanding and examples found for social science readings, with edits made as needed and on time. Transparent

Arnold W.

Customer

Coursework

Perfect!

Joshua W.

Student

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>