The de-facto library for data manipulation and analysis in Python
Pandas
to measure the growth of IDX
DataFrame object from a CSV file hosted remotely on GitHub.
We then plot the market capitalization data using the plot() method from Pandas, and display the plot using plt.show() from Matplotlib. With pandas, we use the square bracket notation bbri_nasabah[] to access columns in the DataFrame.
df['market_cap'] or df.market_cap selects the ‘market_cap’ column from the DataFrame.
df['column name'] to access the column, but outside of those cases, you can use df.column_name as well.
df[['currency', 'market_cap']].
Series object in return. This is in contrast to selecting multiple columns, which returns another DataFrame with the selected columns.
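The difference between the two return types can be checked directly. The sketch below uses a small made-up DataFrame (the `ticker` values and numbers are hypothetical stand-ins, not the tutorial's dataset); only the `currency` and `market_cap` column names come from the text above.

```python
import pandas as pd

# Hypothetical sample data; the tutorial's real CSV is not reproduced here.
df = pd.DataFrame({
    "ticker": ["AAAA", "BBBB", "CCCC"],
    "currency": ["IDR", "IDR", "USD"],
    "market_cap": [120.5, 98.0, 45.2],
})

single = df["market_cap"]                 # one column name -> Series
same = df.market_cap                      # attribute access returns the same Series
subset = df[["currency", "market_cap"]]   # list of column names -> DataFrame

print(type(single).__name__)
print(type(subset).__name__)
```

Note the double brackets in the last selection: the outer pair is the indexing operator and the inner pair is a Python list of column names.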
loc and iloc accessors. To select a single row, you can use the loc accessor with the row index label. Just like indexing a single column, this returns a Series object.
iloc accessor to select rows by their integer index.
loc or iloc accessors, respectively.
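As a minimal sketch of the two accessors, assume a DataFrame whose row labels are ticker strings. The labels and values below are hypothetical, chosen only so that label-based and position-based selection are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2.
df = pd.DataFrame(
    {"currency": ["IDR", "IDR", "USD"], "market_cap": [120.5, 98.0, 45.2]},
    index=["AAAA", "BBBB", "CCCC"],
)

row_by_label = df.loc["BBBB"]                 # single row by label -> Series
row_by_position = df.iloc[1]                  # same row by integer position -> Series

several_by_label = df.loc[["AAAA", "CCCC"]]   # list of labels -> DataFrame
several_by_position = df.iloc[0:2]            # positional slice -> DataFrame (first two rows)
```

Passing a single label or position gives back a Series, while a list or a slice gives back a DataFrame.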
drop() method. The drop() method returns a new DataFrame with the specified rows or columns removed.
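A short illustration of that behaviour, using hypothetical data with the default integer index. The `index=` and `columns=` keywords are one way to say which axis to drop from.

```python
import pandas as pd

# Hypothetical sample data with the default RangeIndex (0, 1, 2).
df = pd.DataFrame({
    "currency": ["IDR", "IDR", "USD"],
    "market_cap": [120.5, 98.0, 45.2],
})

without_row = df.drop(index=1)                  # new DataFrame without the row labelled 1
without_column = df.drop(columns="currency")    # new DataFrame without the 'currency' column

# The original DataFrame still has all of its rows and columns.
print(df.shape)
```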
pandas
DataFrames Attribute
shape attribute. The shape attribute returns a tuple representing the dimensions of the DataFrame, with the first element being the number of rows and the second element being the number of columns.
columns, which returns the column labels of the DataFrame.
dtypes attribute returns the data type of each column in the DataFrame.
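These three attributes can be inspected on any DataFrame. The sketch below assumes a small hypothetical DataFrame rather than the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ticker": ["AAAA", "BBBB", "CCCC"],
    "currency": ["IDR", "IDR", "USD"],
    "market_cap": [120.5, 98.0, 45.2],
})

n_rows, n_cols = df.shape          # tuple of (rows, columns)
column_labels = list(df.columns)   # column labels as a plain list
column_types = df.dtypes           # Series mapping each column to its dtype

print(n_rows, n_cols)
print(column_labels)
print(column_types)
```

These are attributes, not methods, so they are written without parentheses.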
.values attribute returns the contents of the DataFrame as a NumPy array. This could be useful when you need to perform operations that are easier with NumPy arrays, such as matrix multiplication or reshaping.
The .T attribute returns the transpose of the DataFrame, which swaps the rows and columns, making it more readable when you have a large number of columns relative to rows.
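A minimal sketch of both attributes on an all-numeric DataFrame. The `price` column and all values are hypothetical, added only so that the matrix product has something to work with.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric data: 3 rows, 2 columns.
df = pd.DataFrame({
    "market_cap": [120.5, 98.0, 45.2],
    "price": [10.0, 20.0, 30.0],
})

arr = df.values        # NumPy array of shape (3, 2)
gram = arr.T @ arr     # matrix multiplication on the array, shape (2, 2)

wide = df.T            # transposed DataFrame: columns become rows
print(wide)
```

After transposing, the former column labels (`market_cap`, `price`) become the row labels.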
.index, which provides the row labels of the DataFrame. This index is like a list of row labels. In spreadsheets, this would be the row numbers in the leftmost column. When we read in a CSV file, the index is automatically generated as a RangeIndex object, unless we specify a column to be the index through the index_col parameter in pd.read_csv().
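To see the effect of index_col without depending on a remote file, the sketch below reads the same CSV text twice from an in-memory string. The CSV contents and the `ticker` column are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a remote file.
csv_text = "ticker,currency,market_cap\nAAAA,IDR,120.5\nBBBB,IDR,98.0\n"

default_df = pd.read_csv(io.StringIO(csv_text))                       # index is a RangeIndex
labeled_df = pd.read_csv(io.StringIO(csv_text), index_col="ticker")   # 'ticker' becomes the index

print(default_df.index)
print(labeled_df.index)
```

Once a column is used as the index, it no longer appears among the regular columns.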
.drop() method, remember to now use the new index (row name) in the drop() method.
.drop() operations together, but remember that each call returns a new DataFrame: the original is not modified unless you assign the result back to the same variable.
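The following sketch shows dropping by row name, chaining two drop() calls, and reassigning the result. The ticker labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical data whose index holds row names rather than integers.
df = pd.DataFrame(
    {"currency": ["IDR", "IDR", "USD"], "market_cap": [120.5, 98.0, 45.2]},
    index=["AAAA", "BBBB", "CCCC"],
)

# Chain two drops: first a row by its name, then a column.
trimmed = df.drop(index="BBBB").drop(columns="currency")

# df itself is untouched at this point; it still has 3 rows and 2 columns.
rows_before = len(df)

# Assign the result back to the same name to keep the change.
df = df.drop(index="CCCC")
```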
RangeIndex, you can use the reset_index() method.
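As a small sketch, assume a DataFrame indexed by a hypothetical `ticker` label. Calling reset_index() moves the labels back into a regular column, while passing `drop=True` discards them instead.

```python
import pandas as pd

# Hypothetical data indexed by a named 'ticker' index.
df = pd.DataFrame(
    {"market_cap": [120.5, 98.0]},
    index=pd.Index(["AAAA", "BBBB"], name="ticker"),
)

restored = df.reset_index()             # 'ticker' becomes a column, index is 0, 1
discarded = df.reset_index(drop=True)   # old index is thrown away, index is 0, 1

print(restored)
```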
lambda functions a great deal when starting out with Python and Pandas, but it’s good to know that they exist and can be used to write concise code.
.apply(lambda x: ...): Applies a function to each element of the column.
lambda x: 'positive' if x > 0 else 'negative'
df['change_direction']: Creates a new column in the DataFrame named change_direction and assigns the results of the .apply() method to this column.
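Put together, the pieces described above might look like the sketch below. It assumes the lambda is applied to the `market_cap_change_%` column, which is a guess based on the column name used elsewhere in this material, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical values; the source column is an assumption.
df = pd.DataFrame({"market_cap_change_%": [1.2, -0.7, 0.0]})

# Label each value: strictly greater than 0 is 'positive', everything else 'negative'.
df["change_direction"] = df["market_cap_change_%"].apply(
    lambda x: "positive" if x > 0 else "negative"
)

print(df)
```

With this particular lambda, a change of exactly 0 is labelled 'negative', because the condition only tests for values greater than 0.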
pandas to read in data from CSV files and Excel spreadsheets.
loc and iloc accessors to select rows in a DataFrame.
drop() method.
shape, columns, dtypes, values, T, and index.
& (and) and | (or).
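As a reminder of how the & and | operators combine conditions, here is a minimal sketch on hypothetical data. The column names follow those mentioned in this material; the values and thresholds are made up.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "currency": ["IDR", "IDR", "USD"],
    "market_cap_change_%": [1.2, -0.7, -0.2],
})

# & keeps rows where both conditions hold.
both = df[(df["currency"] == "IDR") & (df["market_cap_change_%"] < -0.5)]

# | keeps rows where at least one condition holds.
either = df[(df["currency"] == "USD") | (df["market_cap_change_%"] > 0)]
```

Each condition needs its own parentheses, because & and | bind more tightly than comparison operators such as == and <.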
idxmin() method to get the index of the minimum value. Similarly, the idxmax() method can be used to get the index of the maximum value.
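A brief sketch of both methods, assuming a hypothetical DataFrame indexed by ticker. The returned label can be passed straight to loc to retrieve the full row.

```python
import pandas as pd

# Hypothetical data indexed by ticker.
df = pd.DataFrame(
    {"market_cap": [120.5, 98.0, 45.2]},
    index=["AAAA", "BBBB", "CCCC"],
)

smallest = df["market_cap"].idxmin()   # index label of the minimum value
largest = df["market_cap"].idxmax()    # index label of the maximum value

largest_row = df.loc[largest]          # look up the whole row by that label
```

Both methods return the index label, not the value itself and not the integer position.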
head(): Returns the first n rows of the DataFrame.
tail(): Returns the last n rows of the DataFrame.
describe(): Generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution.
info(): Prints a concise summary of a DataFrame, including the index dtype and column dtypes, non-null values, and memory usage.
unique(): Returns unique values in a column.
value_counts(): Returns a Series containing counts of unique values.
sort_values(): Sorts the DataFrame by the values along either axis.
dropna(): Removes missing values from the DataFrame, either by dropping rows (axis=0) or columns (axis=1).
value_counts() and sort_values() on the DataFrame that we’ve been working with.
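Since the tutorial's dataset is not reproduced here, the sketch below applies value_counts() and sort_values() to a small hypothetical DataFrame with the same column names.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ticker": ["AAAA", "BBBB", "CCCC"],
    "currency": ["IDR", "IDR", "USD"],
    "market_cap": [98.0, 120.5, 45.2],
})

# How many rows use each currency.
counts = df["currency"].value_counts()

# Rows ordered from the largest market cap to the smallest.
ranked = df.sort_values("market_cap", ascending=False)

print(counts)
print(ranked.head(2))   # head(n) limits the display to the first n rows
```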
groupby() method to group data based on one or more columns and then apply an aggregation function to each group. A common aggregation function is mean(), which calculates the average value for each group.
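A minimal sketch of grouping followed by mean(), using hypothetical data. Here the rows are grouped by `currency` and the average `market_cap` is computed per group.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "currency": ["IDR", "IDR", "USD"],
    "market_cap": [98.0, 120.5, 45.2],
})

# One average per distinct currency; the result is a Series indexed by currency.
average_cap = df.groupby("currency")["market_cap"].mean()

print(average_cap)
```

Selecting the `market_cap` column before calling mean() restricts the aggregation to that column.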
matplotlib installed, you can visualize the results using a bar plot.
value_counts() and sort_values().
groupby().
currency) and rows (29) for the exercises.
Graded Quiz
market_cap_change_% column with (-0.5) used in the condition.