In this lab we carry out an exploratory analysis of open data from Uber. Along the way, we get to know the Pandas and Plotly libraries.
import os
import kagglehub
import pandas as pd
import plotly.express as px
# Download latest version
path = kagglehub.dataset_download("gauravduttakiit/uber-pickups-in-ny-city")
# Load data into Pandas DataFrame
csv_file = os.path.join(path, "uber-pickups-in-new-york-city", "uber-raw-data-apr14.csv")
df = pd.read_csv(csv_file)
print("✅ Data loaded successfully!")
✅ Data loaded successfully!
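The hard-coded subfolder in `csv_file` depends on how the dataset archive happens to be laid out. As a minimal sketch, assuming that layout could differ between dataset versions, the file can instead be searched for by name below the download directory. The helper `find_csv` is hypothetical (it is not part of kagglehub or pandas), and a temporary directory stands in for the folder returned by `kagglehub.dataset_download`.

```python
import tempfile
from pathlib import Path


def find_csv(root, filename):
    """Return the first file called `filename` anywhere below `root`.

    Hypothetical helper for illustration only.
    """
    matches = sorted(Path(root).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {root}")
    return matches[0]


# Stand-in for the directory returned by kagglehub.dataset_download(...)
demo_root = Path(tempfile.mkdtemp())
nested = demo_root / "uber-pickups-in-new-york-city"
nested.mkdir()
(nested / "uber-raw-data-apr14.csv").write_text("Date/Time,Lat,Lon,Base\n")

found = find_csv(demo_root, "uber-raw-data-apr14.csv")
print(found.name)
```

In the lab itself, the call would presumably look like `find_csv(path, "uber-raw-data-apr14.csv")`, with the result passed to `pd.read_csv`.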
df.head()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 564516 entries, 0 to 564515
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   Date/Time  564516 non-null  object
 1   Lat        564516 non-null  float64
 2   Lon        564516 non-null  float64
 3   Base       564516 non-null  object
dtypes: float64(2), object(2)
memory usage: 17.2+ MB
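`info()` reports `Date/Time` as `object`, which for a column read from CSV means plain strings, so the `.dt` accessor and date comparisons are not available yet. One option is to let `pd.read_csv` parse the column while loading, through its `parse_dates` parameter. A minimal sketch on three illustrative rows (typed in by hand, not read from the dataset), assuming the timestamps use a month/day/year layout:

```python
import io

import pandas as pd

# Illustrative rows that mimic the four columns reported by df.info()
raw = io.StringIO(
    "Date/Time,Lat,Lon,Base\n"
    "4/1/2014 0:11:00,40.7690,-73.9549,B02512\n"
    "4/1/2014 0:17:00,40.7267,-74.0345,B02512\n"
    "4/2/2014 8:05:00,40.7316,-73.9873,B02512\n"
)

# parse_dates converts the named column to datetimes during loading
demo = pd.read_csv(raw, parse_dates=["Date/Time"])

# Check the dtype family rather than an exact resolution such as [ns]
is_datetime = pd.api.types.is_datetime64_any_dtype(demo["Date/Time"])
print(is_datetime)
print(demo["Date/Time"].dt.day.tolist())
```

Converting after loading with `pd.to_datetime` gives an equivalent column; parsing at load time simply saves a separate step.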
df["Date/Time"].dtype
dtype('O')
# Convert "Date/Time" column to datetime type
df["Date/Time"] = pd.to_datetime(df["Date/Time"])
df["Date/Time"].dtype
dtype('<M8[ns]')
# Plot a heatmap of the Uber pickups on a specific date
target_date = pd.to_datetime("2014-04-01").date()
fig = px.density_map(
    df.loc[df["Date/Time"].dt.date == target_date, :],  # April 1, 2014
    lat="Lat",
    lon="Lon",
    radius=10,
    map_style="open-street-map",
)
fig.show()
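With `Date/Time` stored as datetimes, the same `.dt` accessor used for the date filter above also supports simple aggregations. A minimal sketch on a handful of made-up timestamps, repeating the date mask and then counting pickups per hour of day; the `hour` column and the `demo` frame are assumptions introduced for illustration:

```python
import pandas as pd

# Made-up timestamps standing in for the real "Date/Time" column
demo = pd.DataFrame(
    {
        "Date/Time": pd.to_datetime(
            [
                "2014-04-01 00:11:00",
                "2014-04-01 00:17:00",
                "2014-04-01 08:05:00",
                "2014-04-02 08:30:00",
                "2014-04-02 17:45:00",
            ]
        )
    }
)

# Boolean mask: keep only the rows that fall on the target date
target_date = pd.to_datetime("2014-04-01").date()
one_day = demo.loc[demo["Date/Time"].dt.date == target_date]

# Count pickups per hour of day across all rows
demo["hour"] = demo["Date/Time"].dt.hour
pickups_per_hour = demo.groupby("hour").size()

print(len(one_day))
print(pickups_per_hour.to_dict())
```

A Series like `pickups_per_hour` could then be handed to a Plotly Express chart such as `px.bar` to look at the daily rhythm of pickups.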