WEDNESDAY – NOVEMBER 8,2023.

Import the “Counter” class from the “collections” module, which is used to count the frequency of words.

Define the column names you want to analyze:
Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies.In this code, the specified columns are ‘threat_type,’ ‘flee_status,’ ‘armed_with,’ and ‘body_camera.’

Specify the file path to your Excel document:
Set the “directory_path” variable to specify the file path to the Excel file we want to analyze.
Load your data into a data frame:
Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a Pandas DataFrame named ‘df.’

Initialize a dictionary to store word counts for each column:
Create an empty dictionary named “word_counts” to store the word counts for each specified column.
Iterate through the specified columns:
Use a for loop to iterate through each column name specified in the “columns_to_analyze” list.
Retrieve and preprocess the data from the column:
Within the loop, retrieve the data from the current column using “df[column_name].” Convert the data to strings using “.astype(str)” to ensure a consistent data type, and store it in the “column_data” variable.

Tokenize the text and count the frequency of each word:
Tokenize the text within each column using the following steps:
Join all the text in the column into a single string using ‘ ‘.join(column_data).
Split the string into individual words using .split(). This step prepares the data for word frequency counting.
Use the “Counter” class to count the frequency of each word in the “words” list and store the results in the “word_counts” dictionary under the column name as the key.

Print the words and their frequencies for each column:
After processing all specified columns, iterate through the “word_counts” dictionary.
For each column, print the column name, followed by the individual words and their counts. This information is used to display the word frequencies for each specified column.

Leave a Reply

Your email address will not be published. Required fields are marked *