Chapter 2 Data sources

2.1 Glassdoor

One of the datasets we used comes from Glassdoor's Job Market Report. The website provides monthly data from October 2016 to November 2019.

Each monthly dataset mainly contains the number of job openings and the median base pay, nationally and for ten cities, broken down by job type, industry, and company size.

The dataset looks like this:

##      Metro Dimension.Type   Month          Dimension            Measure
## 1 National    Quick Facts 2019-03  U.S. Job Openings  U.S. Job Openings
## 2 National    Quick Facts 2019-03    U.S. Median Pay    U.S. Median Pay
## 3  Atlanta    Quick Facts 2019-03 Metro Job Openings Metro Job Openings
## 4  Atlanta    Quick Facts 2019-03  U.S. Job Openings  U.S. Job Openings
## 5  Atlanta    Quick Facts 2019-03   Metro Median Pay   Metro Median Pay
## 6  Atlanta    Quick Facts 2019-03    U.S. Median Pay    U.S. Median Pay
##       Value     YoY
## 1 3,906,967 -24.50%
## 2  $52,748    1.40%
## 3    79,074 -21.00%
## 4 3,906,967 -24.50%
## 5  $54,826    1.90%
## 6  $52,748    1.40%
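
Note that `Value` and `YoY` are printed as formatted text (thousands separators, dollar signs, percent signs), so they have to be converted before any arithmetic. The following is a minimal sketch of such a conversion in Python. The helper names `parse_value` and `parse_yoy` are hypothetical and are not taken from our analysis code; the rows are copied from the preview above.

```python
def parse_value(text):
    """Convert a formatted count or dollar amount ('3,906,967', '$52,748') to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)


def parse_yoy(text):
    """Convert a percentage string such as '-24.50%' to a fraction (-0.245)."""
    return float(text.strip().rstrip("%")) / 100.0


# Rows copied from the preview above: (Metro, Dimension, Value, YoY)
rows = [
    ("National", "U.S. Job Openings", "3,906,967", "-24.50%"),
    ("National", "U.S. Median Pay", "$52,748", "1.40%"),
    ("Atlanta", "Metro Job Openings", "79,074", "-21.00%"),
    ("Atlanta", "Metro Median Pay", "$54,826", "1.90%"),
]

# Replace the two text columns with numeric values.
parsed = [
    (metro, dimension, parse_value(value), parse_yoy(yoy))
    for metro, dimension, value, yoy in rows
]
for record in parsed:
    print(record)
```

Storing the year-over-year change as a fraction rather than a percentage is a design choice of this sketch; either works as long as it is applied consistently.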

We encountered a major problem with this data source: Glassdoor restructured and updated its datasets while we were carrying out the data analysis. Only the datasets from June 2017 to March 2019 remained available in the same format and structure, so we used only these datasets together with the October 2019 dataset, which we had saved earlier.
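
To make the selection of months explicit, the sketch below lists the month labels that fall in the retained range, using the `YYYY-MM` format of the `Month` column. It is an illustration only: the function name `month_range` is hypothetical, and the sketch assumes one dataset per month.

```python
def month_range(start, end):
    """Return 'YYYY-MM' labels from start to end, inclusive."""
    year, month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    labels = []
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            # Roll over to January of the next year.
            year, month = year + 1, 1
    return labels


# June 2017 to March 2019, plus the separately saved October 2019 dataset.
months_used = month_range("2017-06", "2019-03") + ["2019-10"]
print(len(months_used), months_used[0], months_used[-1])
```

Under the one-dataset-per-month assumption, this yields 23 month labels.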

2.2 Indeed

Since the Glassdoor dataset covers job openings in only 10 cities, and we were also interested in the words that appear frequently in the descriptions of different jobs, we needed more data.

Initially, we found that Indeed provides a job search API that gives users access to data about job openings, including job titles, recruiting companies, and locations. However, our application for API access was denied, so we decided to scrape a limited amount of data from Indeed instead. This is a sample of the search results.

When searching on Indeed, users can filter the results by keywords, job titles, locations, or companies. Our web scraping pipeline collected search results for job openings across the US with the titles “Data Scientist”, “Data Analyst”, “Financial Analyst”, and “Business Analyst”. In the first phase, the scraping results looked like this:

Only a preview of each job description was included, because retrieving the complete description requires an extra web request per posting, and we were concerned about the load this would place on Indeed's servers.
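
The search step described above can be sketched as follows. This is not our scraping code: the base URL, the query parameter names (`q`, `l`, `start`), the page size, and the function name `search_url` are all assumptions made for illustration. The sketch only builds URLs and sends no requests.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: the base URL, the query parameter
# names (q, l, start) and the page size are not taken from the original pipeline.
BASE_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10
JOB_TITLES = ["Data Scientist", "Data Analyst", "Financial Analyst", "Business Analyst"]


def search_url(title, page, location="United States"):
    """Build the URL of one search results page (page is zero-based)."""
    params = {"q": title, "l": location, "start": page * RESULTS_PER_PAGE}
    return f"{BASE_URL}?{urlencode(params)}"


# First two result pages for each title; no request is sent here.
urls = [search_url(title, page) for title in JOB_TITLES for page in range(2)]
for url in urls:
    print(url)
```

Each search results page costs one request, whereas fetching full descriptions adds one request per posting, which is why the first phase stopped at the previews.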

However, the analysis based on the previews was not satisfactory, so in the next phase we also scraped the complete job descriptions. The complete job description page looks like this: sample details page

The search results contained many duplicates, because Indeed repeatedly presented promoted postings (ads) from some companies.
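
One way to detect such duplicates is to compare the job key embedded in each result link (the `jk` query parameter, which matches the `job_key` column in the data preview). The sketch below illustrates the idea and is not necessarily how our pipeline removed duplicates. The function names are hypothetical, the first two links are taken from the data preview, and the third record is a made-up repeat added for the example.

```python
from urllib.parse import urlparse, parse_qs


def job_key_from_link(link):
    """Extract the 'jk' query parameter from a result link; None if absent."""
    values = parse_qs(urlparse(link).query).get("jk")
    return values[0] if values else None


def drop_duplicates(records):
    """Keep the first record seen for each job key, preserving order.

    Records whose link has no job key are always kept.
    """
    seen = set()
    unique = []
    for record in records:
        key = job_key_from_link(record["link"])
        if key is not None and key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


records = [
    {"title": "Intern Data Scientist",
     "link": "http://www.indeed.com/rc/clk?jk=e025426c7f56cdf9&fccid=e6a4afc5eacd4c1c&vjs=3"},
    {"title": "Data Scientist Intern",
     "link": "http://www.indeed.com/rc/clk?jk=00e3b7080d16a9fa&fccid=0b74c73a7d280485&vjs=3"},
    # Hypothetical repeat of the first posting, as a promoted result might appear.
    {"title": "Intern Data Scientist",
     "link": "http://www.indeed.com/rc/clk?jk=e025426c7f56cdf9&fccid=e6a4afc5eacd4c1c&vjs=3"},
]

unique_records = drop_duplicates(records)
print([job_key_from_link(record["link"]) for record in unique_records])
```

Keying on the job identifier rather than on the title avoids merging distinct postings that share a title, such as the two different "Data Scientist Intern" postings in the preview.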

In the end, the scraping results consist of 4 files, each containing the US search results for one job title, with duplicates removed. A preview of the data is shown below:

##                                                             title
## 1                                           Intern Data Scientist
## 2                                           Data Scientist Intern
## 3 2020 Summer Internship - MIS Data Analyst/Scientist - Phoeni...
## 4                  2020 Summer Data Scientist Internship (Dayton)
## 5 2020 PhD Data Scientist Internship - Policy, Research, & Eco...
## 6                                           Data Scientist Intern
##            company empn_rate posted_time
## 1 OGE Energy Corp.       1.0  4 days ago
## 2            Chubb       3.8  4 days ago
## 3 Freeport McMoRan       4.1  4 days ago
## 4         Centauri       4.5  6 days ago
## 5             Uber       3.5  6 days ago
## 6          Premise       3.7  7 days ago
##                                         location salary
## 1                              Oklahoma City, OK   <NA>
## 2                                Jersey City, NJ   <NA>
## 3          Phoenix, AZ 85004 (Central City area)   <NA>
## 4                                Beavercreek, OH   <NA>
## 5 San Francisco, CA 94103 (South Of Market area)   <NA>
## 6                                 Washington, DC   <NA>
##                                                                                                                                                          preview
## 1       Experience working with geospatial data.Ability to map business problems to analytical techniques and data.Performs planned supervised work assignments.
## 2  Data Science Intern / Data Engineering Intern.Algorithm design and development to refine data processing.Chubb's Information Technology group manages all of…
## 3 Analytics student who enjoys grappling with vast data sets (and has experience with SQL), tapped from our long-standing investment in collecting and cleaning…
## 4                                   Monitor data statistics to identify and interpret the trend in a set of complex data resulting in the findings of a pattern.
## 5 We have ongoing projects with academics, conduct our own data investigations, and support the public policy and communications teams with other public-facing…
## 6       This position will be involved in data collection, data cleaning, data integrity, general data munging/wrangling, and data integrity.What you get to do:
##                                                                            link
## 1 http://www.indeed.com/rc/clk?jk=e025426c7f56cdf9&fccid=e6a4afc5eacd4c1c&vjs=3
## 2 http://www.indeed.com/rc/clk?jk=00e3b7080d16a9fa&fccid=0b74c73a7d280485&vjs=3
## 3 http://www.indeed.com/rc/clk?jk=ff5312b3cdee9450&fccid=2e1d4d51c6f0e2bb&vjs=3
## 4 http://www.indeed.com/rc/clk?jk=6ad3af86ebc762b4&fccid=2540607df11515db&vjs=3
## 5 http://www.indeed.com/rc/clk?jk=b79f1e2c04dd2f9b&fccid=f766f8bfbc3effb7&vjs=3
## 6 http://www.indeed.com/rc/clk?jk=e78e987b66c7273c&fccid=950f3c78af807bb4&vjs=3
##            job_key empn_id   job_type
## 1 e025426c7f56cdf9      NA internship
## 2 00e3b7080d16a9fa      NA internship
## 3 ff5312b3cdee9450      NA internship
## 4 6ad3af86ebc762b4      NA internship
## 5 b79f1e2c04dd2f9b      NA internship
## 6 e78e987b66c7273c      NA internship
##   detail
## 1 Position Summary: |  | A student or recent graduate participating in on-the-job 
training. Providing opportunities to gain knowledge and experience in a field of study. Performs planned supervised work assignments. | Primary Duties: | Provide general operational and administrative support to the business unit or department. | Draft routine correspondence and create reports. | Answers and direct calls. | May schedule, plans, and coordinates meetings and events. | May audit system data for accuracy. | May coordinate with other departments on special projects. | Requirements: | A student currently working towards a degree in a Business discipline e.g. Business Management, Business Administration, etc., Human Resources, Marketing, Finance, Accounting, Public Relations, Corporate Communications, Information Technology, Supply Chain or other related discipline. |  | Preferred Qualifications: | Experience writing advanced SQL (or Python Pandas or Hadoop Hive) queries. | Ability to create compelling visualizations in software such as SAS Visual Analytics, Tableau or similar applications. | Ability to code in languages such as SAS, R or Python | Experience working with geospatial data | Ability to map business problems to analytical techniques and data. | Experience conducting financial analysis: NPV, Discounted Cash Flows, IRR, etc. | A high-degree of mathematical acumen; understanding of basic statistics. | Knowledge, Skills, and Abilities: | Intermediate skills and knowledge in use of personal computers and MS office products. | Strong listening skills; ability to take direction. | Abiility to follow established procedures to accomplish requirements of job. | Strong oral and written communicate skills. | Strong organizational skills; ability to manage multiple projects simultaneously and adhere to established timelines. | Proven time management skills. | Ability to develop and/or analyze reports. | Ability to conduct research through various methods. | Ability to maintain a high degree of confidentiality. 
| Ability to adapt quickly to a changing environment. | Strong analytical skills. | Ability to work effectively in a team environment. | Working Conditions: | Work is performed in an office environment utilizing office equipment including a computer, monitor, keyboard and mouse. | Work is often performed with short deadlines and may involve sensitive matters requiring objectivity and confidentiality. | May be required to work non-standard hours. | May travel occasionally to Company locations for meetings or events. | Special Safety Requirements: | All positions in which driving is an essential function of the job, regardless if the job code is marked safety sensitive or not, will also be included as safety sensitive. Individuals in positions in which driving is an essential function are subject to the terms and conditions set forth in OGE Energy Corp.'s Drug Testing Plan. | Required Skills | Required Experience
## 2 Data Scientist Intern(Job Number: 331344) | Description | Chubb's Information Technology group manages all of our critical business systems and data including, for example, information about customers, rates, premium and expenses. Our Information Technology team is dedicated to creating competitive advantages in products, customer services and business costs by driving digital transformation in our business with a modernized focus on agile methodology and data analytics. 
| The IT department consists of a number of disciplines where we will have internship opportunities including Information Management, Project Management, Business Analysis, Infrastructure, Architecture, Development, Risk Management and Security and Compliance. The IT summer internship will run from June through August and is offered in several locations. It will provide the opportunity to participate in one of the disciplines and gain an awareness of the systems, projects and interactions in a corporate IT organization. | Position Summary: Data Science Intern / Data Engineering Intern | This internship provides university students with practical experience in Big Data. During the program, interns will be exposed to technologies and tools such as Python, R, SQL, Microsoft Azure, GitHub, Spark, Power BI/Qlik Sense. Interns will also explore data engineering and machine learning. We are looking for students who are quantitative, curious, collegial, and effective communicators. In exchange, we will offer challenging assignments, opportunities for networking, and formal and informal training. 
Sample deliverables include: | Business Intelligence dashboard development in Power BI or Qlik | ETL pipeline configuration using the Azure Cloud | Algorithm design and development to refine data processing | Optimization of digital campaigns using machine learning | Demonstration of statistical analysis project or application in front of executivesQualifications | Pursuit of a Bachelor’s Degree Mathematics, Statistics, Economics, Computer Science, or related major with a cumulative GPA of at least 3.0 | Strong verbal and written communication skills | Ability to work both independently and as part of a team | Ability to learn and add value in the assigned discipline and department | Exposure to the following programming languages: Python, R, SQL, C#, Scala, or Java | Ability to communicate effectively with non-technical audiences |  | EEO Statement | At Chubb, we are committed to equal employment opportunity and compliance with all laws and regulations pertaining to it. Our policy is to provide employment, training, compensation, promotion, and other conditions or opportunities of employment, without regard to race, color, religious creed, sex, gender, gender identity, gender expression, sexual orientation, marital status, national origin, ancestry, mental and physical disability, medical condition, genetic information, military and veteran status, age, and pregnancy or any other characteristic protected by law. Performance and qualifications are the only basis upon which we hire, assign, promote, compensate, develop and retain employees. Chubb prohibits all unlawful discrimination, harassment and retaliation against any individual who reports discrimination or harassment.Work Locations - Jersey City Jersey City 07302 |  | Job - Information Technology |  | Travel - No |  | Job Posting - Dec 5, 2019, 10:35:12 AMTRUE
## 3 Freeport-McMoRan (FCX) is a leading international mining company with headquarters in Phoenix, Arizona. FCX operates large, long-lived, geographically diverse assets with significant proven and probable reserves of copper, gold and molybdenum. | Freeport-McMoRan’s internship program has been referred to as one of the top programs in the mining industry. By providing access to top minds and technology in mining today, our structured internship will provide you the skills and experience to help prepare you for a successful career. Our internship program is tailored to full-time students currently enrolled at an accredited four-year university and recent graduates in North America. Internships are temporary full-time paid positions and typically run from May through August. | A foreign national may be considered for H1B visa sponsorship upon completion of one (1) year of service with consistently exceeded performance. | A foreign national may be considered for employment-based permanent residency sponsorship upon completion of (3) years of service with consistently exceeded performance expectations. | Description | Under general supervision, performs MIS related duties assigned per department. As such responsibilities are carried out; unscheduled and scheduled overtime may be required. Also may be required to work outside of normal working hours. You will be an instrumental member of our vision for analytics-driven mining at Freeport. We have gathered vast data sets from across the mining ecosystem by deploying sensor technology and the data infrastructure to support it. 
Now, we are in a position to use the richest data pool in our history to solve complex and meaningful problems, so our team is growing and seeking bright individuals eager to be a part of the journey | Data Analyst/Scientist Tasks Include: | Assist with the deployment of ensemble machine learning methods, deep learning, and advanced optimization techniques to drill into our toughest challenges | Leverage sharp research skills to test hypotheses and draw insights to support our goal of industry-leading, resource-efficient copper mining | Actively support the build, implementation, and maintenance of practical, analytical assets that effectively solve business problems using statistical methodologies | Utilize modern cloud technologies, such as Microsoft Azure, to deliver innovative analytics solutions at scale and visualize their impact for leadership | This internship will be located in Phoenix, Arizona. | Qualifications | Minimum Qualifications | Full-time student at the sophomore level or above currently enrolled at an accredited four year university majoring in an appropriate MIS, Computer Technology, Analytics or closely related field; OR | Recent college graduate having graduated within 12 months prior to internship start date majoring in MIS, Computer Technology, Analytics or closely related field | Proficient in the use of Microsoft Office applications including Outlook, Word, Excel, and PowerPoint | Possesses strong data analysis and problem solving skills | Highly accurate and detail oriented | Possesses strong oral and written communication skills | Demonstrates initiative, organizational skills and ability to work well under pressure | Demonstrates ability to collaborate and work effectively in a team environment | Must be able to demonstrate our core values of Safety, Respect, Integrity, Excellence and Commitment | Preferred | Analytics student who enjoys grappling with vast data sets (and has experience with SQL), tapped from our long-standing 
investment in collecting and cleaning data | Quantitative enthusiast who can build complex predictive models and unlock their value through Python, R and / or other rigorous analytical environments | Researcher with a background in breaking down problem statements into their root causes and tailoring solutions based off of your findings | Criteria/Conditions | Ability to understand and apply verbal and written work and safety-related instructions and procedures given in English | Ability to communicate in English with respect to job assignments, job procedures, and applicable safety standards | Must be able to work in a potentially stressful environment | Position is in busy, non-smoking office located in Phoenix, AZ | Location requires mobility in an office environment; each floor is accessible by elevator and internal staircase | Work is in an office, mine, or manufacturing plant setting, which may include exposure to extremes in temperature and humidity, moving mechanical parts, risk of electrical shock, toxic chemicals, explosives, fumes or airborne particles | While performing the duties of this job, the employee is regularly required to stand, sit, demonstrate manual dexterity, climb stairs and ladders, work on elevated platforms, talk, hear and see | Occasionally may be required to lift moderately heavy objects up to thirty (30) pounds during the course of the workday | Personal protective equipment is required when performing work in a mine, outdoor, manufacturing or plant environment, including hard hat, hearing protection, safety glasses, safety footwear, and as needed, respirator, rubber steel-toe boots, protective clothing, gloves and any other protective equipment as required | Freeport-McMoRan promotes a drug/ alcohol free work environment through the use of mandatory pre-employment drug testing and on-going random drug testing as per applicable state laws | Freeport-McMoRan has reviewed the jobs at its various office and operating sites and 
determined that many of these jobs require employees to perform essential job functions that pose a direct threat to the safety or health of the employees performing these tasks or others. Accordingly, the Company has designated the following positions as safety-sensitive: | Site-based positions, or positions which require unescorted access to site-based operational areas, which are held by employees who are required to receive MSHA, OSHA, DOT, HAZWOPER and/or Hazard Recognition Training; or | Positions which are held by employees who operate equipment, machinery or motor vehicles in furtherance of performing the essential functions of their job duties, including operating motor vehicles while on Company business or travel (for this purpose “motor vehicles” includes Company owned or leased motor vehicles and personal motor vehicles used by employees in furtherance of Company business or while on Company travel); or | Positions which Freeport-McMoRan has designated as safety sensitive positions in the applicable job or position description and which upon further review continue to be designated as safety-sensitive based on an individualized assessment of the actual duties performed by a specifically identified employee. | Equal Opportunity Employer/Protected Veteran/Disability
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Centauri is seeking intelligent, creative self-starters who are majoring in Statistics, Geographic Information Systems (GIS; GIS focus must be analytical), Data Science, Applied Math or related disciplines for internship opportunities. The successful candidate will be provided with hands-on experience working as a consultant to members of one of our teams. These will be full-time paid internship opportunities awarded on a competitive basis. The summer intern assignments will begin in late May/early June following the completion of the school year and last through mid-August. 
| Position Responsibilities: | Use technical proficient abilities to help develop prototype software applications | Participate in implementation and testing of new systems | Ability to interpret complex information in a non-technical and engaging manner | Collect data and conduct a thorough analysis using data science techniques | Identify problematic areas in data and conduct the necessary research to provide a lasting solution | Monitor data statistics to identify and interpret the trend in a set of complex data resulting in the findings of a pattern | Position Requirements / Selection Criteria: | Knowledge of Python, R, Django/Flask, R Shiny, GIS | Must have knowledge of spatial analysis, geospatial technology and statistical methods | Emphasis on emerging Juniors and Seniors | Must be working towards a technical degree in STEM | Emphasis on students with coursework in the areas of: Statistics, Geographic Information Systems (GIS; GIS focus must be analytical), Data Science, or Applied Math | Must work well both in a team environment, as well as independently | Must have strong research, writing and communication skills | Must have excellent computer skills and experience formatting, graphs, diagrams and tables | Must be a full-time student at an accredited, U.S. college or university | Must have demonstrated a high level of academic achievement | Previous internship experience highly desirable | Local candidates preferred | Security Requirements: Must meet eligibility requirements for a TS/SCI government security clearance.TRUE
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      At Uber, we ignite opportunity by setting the world in motion. We take on big problems to help drivers, riders, delivery partners, and eaters get moving in more than 600 cities around the world. | We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world forward, together. | About the Role | We’re looking for PhD intern candidates to join the Policy, Research & Economics team in Summer 2020 (3 months). Our team conducts research that forms the foundation for business decisions and our policy positions and helps policymakers, thought leaders, and the general public gain a better understanding of our platform. 
We seek candidates with a strong background in economics, transportation research, or other quantitative social sciences. | There are three subgroups that are hiring interns: | The Business Economics team uses research to inform business strategy, operations, and product decisions. For instance, we know that the flexible work model is very valuable to Uber drivers (see Chen et al., Angrist et al.) and that dynamic pricing is vital in protecting the health and efficiency of the dispatch market (see Castillo et al.); however, it’s likely that consistency (e.g., of pricing or earnings) also carries some value for riders and drivers. What values should we put on these opposing virtues, and how should we alter the product to reflect them? | The Policy Economics team has ongoing projects with leading academics on topics such as labor market dynamics, occupational licensing, and the economics of marketplaces. We conduct rigorous research that places our business and driver-partners in the context of the economies in which Uber operates. | The Mobility Research team conducts research on topics such as public transportation, congestion, environmental impact, micromobility, travel behavior, and transportation equity. We have ongoing projects with academics, conduct our own data investigations, and support the public policy and communications teams with other public-facing data analysis.What You’ll Do | Develop and execute a piece of independent research that could be used to inform business decisions or the policy conversation | Conduct rigorous, careful statistical and econometric analysis in support of our research priorities | Develop assets (maps, visuals etc.) 
that explain our research for policy and communications needs | Communicate cross-functionally to understand the intersection of policy, product, legal, and operations | Present your results internally; in some cases, your project may be developed into a public-facing piece of work | What You’ll Need | PhD student currently in your fourth year or above in Economics, Statistics, Public Policy, Urban Planning, Transportation Research, or other quantitative social science. | 2+ years of quantitative research or data science experience | Strong data skills and the ability to learn to use tools such as SQL, Python, R, and GIS mapping tools to work efficiently at scale | The capacity to work independently and execute a research plan with minimal oversight | The ability to organize and synthesize analyses and communicate data insights with clarity | Enthusiasm for learning and growth | About the Team | The Policy, Research and Economics team, led by the Chief Economist, is responsible for addressing the thorny policy and business challenges impacting Uber globally. As a data scientist on this team, you would be expected to bring analytical and empirical rigor to bear on important questions relating to economics, transportation policy, and urban planning.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   TRUETRUETRUETRUEOur Team:TRUETRUETRUETRUEWhat you get to do:TRUETRUEYou will work to validate and improve automated algorithms which turn raw data with geospatial and text attributes into meaningful information. This data is then visualized on web-based maps that can be easily digested by a non-technical person. | Aggregate multiple proprietary and open-source datasets that will be inputs to train computer vision algorithms. | Utilize Python and packages to write code towards data wrangling, enrichment, analysis, and visualization of spatial data. | Identify and leverage opportunities to continually improve data quality, systems, processes, and standards. | You will collaborate with other data scientists, analysts, product managers, operations, and other departments to ensure products and technology meets internal standards. 
| Internship dates: If college student 16 December 2019 to 17 January 2020. If high school student 23 December 2019 to 3 January 2020. | TRUEYour background likely includes:TRUETRUETRUETRUEInterest in Data Science, Geographical Information Systems, Computer Science or equivalent work experience & analytical skills. | Experience with data analysis, quality, cleaning, and extraction/validation tools. | Experience with Python and associated packages for processing data. | Entry level SQL writing skills (Google BigQuery a plus but not necessary). | Experience with visualization tools like Periscope Data, Data Studio, Tableau, ArcGIS a plus. | Passion for learning and sharing knowledge with the team around you. | TRUEPress:TRUETRUEPRNewswire:TRUETRUETRUEThomson Reuters Foundation:TRUETRUETRUEThe Economist:TRUETRUETRUETechCrunch:TRUETRUETRUEThe New York Times:TRUETRUETRUEBuzzFeed:TRUETRUETRUEWashington Post:TRUETRUETRUEWired:

2.3 City data

2.3.1 2018 median income

The dataset is available on the Statista website and originates from the U.S. Census Bureau. When connected to the campus network, we can download the Excel file from the website for free.

The dataset gives the median household income for each city in 2018 (in U.S. dollars). It has two columns, city name and median household income, and 50 records. Both columns are stored as character.

There are two issues with this dataset. First, the city names do not match the city names in the datasets we processed before. Second, the income column is not numeric, because each value contains a dollar sign ($) and commas (,).
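The second issue can be handled by stripping the formatting characters before converting the values. The following is a minimal Python sketch, not the code used in this project: the function name `parse_income`, the sample rows, and the assumption that every value is a whole-dollar amount are all illustrative.

```python
def parse_income(text):
    """Convert a string such as "$61,000" to an integer number of dollars.

    Assumes whole-dollar amounts with an optional "$" and thousands commas.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return int(cleaned)


# Made-up rows in the same two-column shape as the file (not real data).
rows = [("City A", "$61,000"), ("City B", "$48,500")]

# Build a city -> numeric income lookup.
incomes = {city: parse_income(value) for city, value in rows}
print(incomes)
```

If the file contained missing or non-numeric entries, `int` would raise a `ValueError`, which is usually preferable to silently producing a wrong number.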

2.3.2 Population

The dataset is available on the United States Census Bureau website, and we can download the Excel file from the website for free.

The dataset gives the historical population of each city from 2010 to 2018, labeled by geographic location. There are 20 columns of unique identifiers and 9 population columns (one per year). The data have 782 rows. The identifier columns are character and the population columns are numeric.

There is one issue with this dataset. As with the income data, the city names do not match those in the previous dataset.
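One way to reconcile mismatched city names is to reduce every label to a common key before joining. The sketch below is in Python and rests on assumptions that are not stated in the source: that the population file labels cities in a form like "New York city, New York" (a place-type word in lowercase, followed by the state), while the other datasets use the bare city name. The function name `normalize_city` and the sample labels are hypothetical.

```python
import re


def normalize_city(name):
    """Reduce a city label to a lowercase bare city name usable as a join key."""
    # Keep only the part before the first comma (assumed to be "city, state").
    city = name.split(",")[0].strip()
    # Drop a trailing lowercase place-type word (assumed suffixes). The match is
    # case-sensitive on purpose, so "Oklahoma City" keeps its capitalized "City".
    city = re.sub(r"\s+(city|town|village)$", "", city)
    # Collapse repeated whitespace and lowercase the result.
    return re.sub(r"\s+", " ", city).strip().lower()


# Hypothetical labels from two sources that should refer to the same cities.
census_style = ["New York city, New York", "Oklahoma City city, Oklahoma"]
plain_style = ["New York", "Oklahoma City"]

keys_a = [normalize_city(n) for n in census_style]
keys_b = [normalize_city(n) for n in plain_style]
print(keys_a == keys_b)
```

Names that differ in other ways (abbreviations, metro areas spanning several cities) would still need a manual lookup table.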

2.3.3 Violent crime

The dataset is available from the United States Census Bureau, Population Division, and we can download the Excel file from its website.

The dataset gives the crime rates for each city in 2019, labeled by geographic location. There are 10 location columns identifying each city, and multiple columns showing the rates of different crime types in each city. The data have 9583 rows. The location columns are character and the crime rate columns are numeric.

There are two issues with this dataset. The city names do not match those in the previous dataset. In addition, this dataset is from 2017, which is not in line with the other datasets.

2.3.4 CPI

The dataset is available on the United States Census Bureau website, and we can download the Excel file from the link above as long as our computers are connected to the campus network.

The dataset gives the CPI (consumer price index) levels in different US cities. It has two columns, city name and CPI level, and 22 rows. The location column is character and the CPI column is numeric.

There is one issue with this dataset. The city names do not match those in the previous dataset.