WEDNESDAY – NOVEMBER 29, 2023.

Temporal Analysis of Violations

A valuable perspective for examining the dataset involves conducting a temporal analysis of the recorded violations. This entails investigating how the frequency and characteristics of violations evolve over time. Grouping the data by inspection dates allows for the identification of trends in both compliance and non-compliance. For instance, one could explore whether certain types of violations are more prevalent during specific months or seasons. Additionally, delving into the time intervals between consecutive inspections for each establishment offers insights into the efficacy of corrective actions implemented by businesses. Visual tools such as line charts or heatmaps can effectively illustrate temporal patterns in violation occurrences.
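
As a rough sketch, the monthly grouping could be done with Pandas as below; the file name and the 'resultdttm' date column are assumptions about the dataset's schema rather than confirmed fields.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hedged sketch: the file name and the 'resultdttm' date column are assumed.
df = pd.read_csv("food_inspections.csv", parse_dates=["resultdttm"])
df = df.dropna(subset=["resultdttm"])

# Count recorded violations per calendar month to expose seasonal trends.
monthly = df.groupby(df["resultdttm"].dt.to_period("M")).size()

monthly.plot(kind="line", title="Recorded Violations per Month")
plt.ylabel("Number of violations")
plt.show()
```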

MONDAY – NOVEMBER 27, 2023.

This week, I plan to analyze the dataset available at:

https://data.boston.gov/dataset/active-food-establishment-licenses

Approach 1 for Data Analysis: Inspection Results Overview

The dataset encompasses information about diverse food establishments, with a particular emphasis on restaurants, and a thorough examination can reveal insights into their adherence to health and safety standards. It comprises details like business names, license information, inspection results, and specific violations observed during inspections. One method of dissecting this information involves creating a comprehensive overview of inspection results for each establishment. This might entail computing the percentage of inspections resulting in a pass, fail, or other status. Furthermore, uncovering patterns in the types of violations documented and their occurrence across different establishments can offer valuable insights. Visual aids such as pie charts or bar graphs can effectively convey the distribution of inspection outcomes and the most frequently encountered violations.
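
A minimal sketch of that overview, assuming a 'result' column holds the pass/fail outcome (the real column name may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name and 'result' column for inspection outcomes.
df = pd.read_csv("food_inspections.csv")

# Percentage of inspections ending in each outcome (pass, fail, ...).
outcome_pct = df["result"].value_counts(normalize=True) * 100
print(outcome_pct.round(1))

outcome_pct.plot(kind="bar", title="Distribution of Inspection Outcomes")
plt.ylabel("Percent of inspections")
plt.show()
```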

FRIDAY – NOVEMBER 24, 2023

Concluding my analysis of this dataset, I focused on Business Growth and Collaboration aspects.

To support business growth, it is essential to understand key factors such as business size, service offerings, and collaborative opportunities. Examining businesses like “IMMAD, LLC” in Forensic Science or “Sparkle Clean Boston LLC” in Clean-tech/Green-tech reveals specific niches with potential for growth. Strategically implementing targeted marketing and innovation in these areas can pave the way for expansion.

Furthermore, recognizing businesses open to collaboration is crucial for fostering a mutually beneficial environment. For example, “Boston Property Buyers” and “Presidential Properties,” both operating in Real Estate, present opportunities for collaborative ventures, shared resources, and a stronger market presence through strategic partnerships.

Lastly, businesses with no digital presence or incomplete information, indicated as “Not yet” and “N/A,” present opportunities for improvement. Implementing digital strategies, such as creating a website or optimizing contact information, can enhance visibility and accessibility, contributing to overall business success.

WEDNESDAY – NOVEMBER 22, 2023

Continuing my analysis within the same dataset, I delved into Digital Presence and Communication aspects.

The dataset provides insights into businesses’ online presence, including websites, email addresses, and phone numbers. Understanding the digital landscape is crucial in today’s business environment. For example, businesses like “Boston Chinatown Tours” and “Interactive Construction Inc.” have websites, offering opportunities for digital marketing, customer engagement, and e-commerce. Assessing the effectiveness of these online platforms and optimizing them for user experience can enhance business visibility and interaction with customers.

Additionally, a critical aspect is the analysis of contact information, such as email addresses and phone numbers, which plays a vital role in communication strategies. Businesses like “Eye Adore Threading” and “Alexis Frobin Acupuncture” have multiple contact points, ensuring accessibility for potential clients. Employing data-driven communication strategies, such as email marketing or SMS campaigns, can contribute to improved customer engagement and retention.

Exploring the “Other Information” field, which indicates whether a business is “Minority-owned” or “Immigrant-owned,” can influence marketing narratives. Incorporating these aspects into digital communication can positively resonate with diverse audiences, fostering a sense of community and inclusivity.

MONDAY – NOVEMBER 20, 2023

Today, I commenced the examination of a new dataset, available at https://data.boston.gov/dataset/women-owned-businesses, focusing on businesses’ key attributes such as Business Name, Business Type, Physical Location/Address, Business Zipcode, Business Website, Business Phone Number, Business Email, and Other Information. The initial step in data analysis involves categorizing businesses based on their types, facilitating a comprehensive understanding of the diverse industries represented. For instance, businesses like “Advocacy for Special Kids, LLC” and “HAI Analytics” fall under the Education category, while “Alexis Frobin Acupuncture” and “Eye Adore Threading” belong to the Healthcare sector. “CravenRaven Boutique” and “All Fit Alteration” represent the Retail industry, showcasing a variety of business types.

Following this, it is crucial to examine the geographical distribution of businesses. The physical locations and zip codes reveal clusters of businesses within specific regions, providing insights into the economic landscape of different areas. Businesses such as “Boston Sports Leagues” and “All Things Visual” in the 02116 zip code highlight concentrations of services in that region. Understanding the spatial distribution enables targeted marketing and resource allocation for business growth.

Moreover, analyzing the “Other Information” field, which includes details like “Minority-owned” and “Immigrant-owned,” offers valuable socio-economic insights. This information aids in identifying businesses contributing to diversity and inclusivity within the entrepreneurial landscape. Focusing on supporting minority and immigrant-owned businesses could be a strategic approach for community development and economic empowerment.
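
A first pass at this categorization might look like the sketch below; the CSV file name is hypothetical, while the column names mirror the attributes listed above.

```python
import pandas as pd

# Hypothetical file name; column names follow the attributes listed above.
df = pd.read_csv("women_owned_businesses.csv")

# Categorize businesses by type, e.g. Education, Healthcare, Retail.
print(df["Business Type"].value_counts())

# Geographic distribution: business counts per zip code.
print(df["Business Zipcode"].value_counts().head(10))

# Socio-economic flags drawn from the "Other Information" field.
for flag in ["Minority-owned", "Immigrant-owned"]:
    n = df["Other Information"].str.contains(flag, na=False).sum()
    print(f"{flag}: {n} businesses")
```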

FRIDAY – NOVEMBER 17, 2023

While reviewing the data for “Hyde Park” today, I considered several data analysis techniques that can be applied to gain insights into demographic trends across different decades. To begin with, a temporal trend analysis can be conducted to observe population changes over time, identifying peaks and troughs in each demographic category. For age distribution patterns, bar charts would be effective in highlighting shifts in the population structure.

Moving on to educational attainment, trends can be visualized through pie charts or bar graphs, offering a clear understanding of changes in the level of education within the community. The nativity and race/ethnicity data can benefit from percentage distribution analysis, allowing for the tracking of variations in the composition of the population over the specified time periods.

For labor force participation rates, a breakdown by gender can be visualized to discern patterns in workforce dynamics. Utilizing pie charts or bar graphs for housing tenure analysis can reveal shifts in the proportion of owner-occupied and renter-occupied units, providing valuable insights into housing trends.

In summary, a combination of graphical representation and statistical measures will facilitate a comprehensive understanding of the demographic, educational, labor, and housing dynamics in Hyde Park over the specified decades.
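
One way to start the temporal trend analysis, assuming the Hyde Park sheet uses decade columns and a 'Total Population' row (the actual layout may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: decades as columns, demographic categories as rows.
df = pd.read_excel("neighborhoodsummaryclean_1950-2010.xlsx",
                   sheet_name="Hyde Park", index_col=0)

# Plot total population across the decades to spot peaks and troughs.
df.loc["Total Population"].plot(kind="line", marker="o",
                                title="Hyde Park Population, 1950-2010")
plt.ylabel("Residents")
plt.show()
```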

WEDNESDAY – NOVEMBER 15, 2023.

Today, I examined the second sheet, “Back Bay,” in the Excel file available at https://data.boston.gov/dataset/neighborhood-demographics. The dataset on Back Bay offers valuable insights into the neighborhood’s evolution across different decades, enabling a comprehensive analysis of various demographic aspects. Notable patterns include population fluctuations, showing a decline until 1990 followed by relative stability. The age distribution highlights shifts in the percentage of residents across different age groups, particularly a substantial increase in the 20-34 age bracket from 32% in 1950 to 54% in 1980. Educational attainment data displays changing proportions of individuals with varying levels of education, notably showcasing a significant rise in those with a Bachelor’s Degree or Higher from 20% in 1950 to 81% in 2010. Nativity data reveals fluctuations in the percentage of foreign-born residents, while the race/ethnicity distribution indicates a decrease in the white population and a rise in the Asian/PI category. Labor force participation demonstrates gender-based variations, and housing tenure data underscores changes in the ratio of owner-occupied to renter-occupied units. Overall, this dataset provides a nuanced understanding of the socio-demographic landscape in Back Bay over the decades.

MONDAY – NOVEMBER 13, 2023

I am presently examining the dataset on Analyze Boston, specifically concentrating on the “Allston” sheet within the “neighborhoodsummaryclean_1950-2010” Excel file, accessible at https://data.boston.gov/dataset/neighborhood-demographics. The dataset provides a thorough overview of demographic and socioeconomic trends in Allston spanning multiple decades. Notably, there is evident population growth from 1950 to 2010. The age distribution data reveals intriguing patterns, including shifts in the percentage of residents across various age groups over the years. Educational attainment data reflects changes in the population’s education levels, notably showcasing a significant increase in the percentage of individuals holding a Bachelor’s degree or higher. The nativity data sheds light on the proportion of foreign-born residents, indicating shifts in immigration patterns. Changes in the racial and ethnic composition are apparent, with a declining percentage of White residents and an increase in Asian/PI residents. The labor force participation data by gender is noteworthy, illustrating fluctuations in male and female employment rates. Housing tenure data suggests a rise in the number of renter-occupied units over the years. Potential data analysis avenues may involve exploring correlations between demographic shifts, educational attainment, and housing tenure to gain deeper insights into the socio-economic dynamics of Allston.
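
To follow up on the correlation idea, per-decade series could be extracted from the sheet and compared; the placeholder values below would need to be filled in from the “Allston” sheet before the result is meaningful.

```python
import pandas as pd

# Placeholder series to be filled in from the "Allston" sheet; no values
# are fabricated here.
decades = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
data = pd.DataFrame({
    "pct_bachelors_or_higher": [None] * 7,  # fill in from the sheet
    "pct_renter_occupied": [None] * 7,      # fill in from the sheet
}, index=decades, dtype=float)

# Pearson correlation between educational attainment and housing tenure.
print(data["pct_bachelors_or_higher"].corr(data["pct_renter_occupied"]))
```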

FRIDAY – NOVEMBER 10, 2023

I loaded police shooting data from an Excel file into a Pandas DataFrame for today’s research, with the goal of examining how police use of force, both justified and unjustified, varies across racial groups. I also compared incidents involving men and women. To do this, I wrote a function that assessed the justification for using force based on the recorded threat classifications and weapons. I then applied this function to the dataset to add a new column indicating whether the force was justified. Next, I narrowed the data down to incidents involving people who were Asian, White, Black, or Hispanic. After separating the data by gender, I computed the frequencies and percentages of unjustified (“False”) force cases for each race. I made bar plots with Seaborn and Matplotlib to show these percentages for incidents involving men and women. As the resulting bar graphs show, the analysis sheds light on potential differences in how the use of police force is classified across racial and gender groups.
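
A sketch of this workflow might look as follows; the rule inside is_force_justified is a placeholder (the exact criteria are not spelled out above), and the file name, race codes, and column names are assumptions.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_excel("police_shootings.xlsx")  # assumed file name

def is_force_justified(row):
    # Placeholder rule: force counts as justified only for an armed,
    # attacking subject; substitute the criteria actually used.
    return row["threat_type"] == "attack" and row["armed_with"] != "unarmed"

df["justified"] = df.apply(is_force_justified, axis=1)

# Single-letter race codes are an assumption about how the file encodes
# Asian, White, Black, and Hispanic.
df = df[df["race"].isin(["A", "W", "B", "H"])]

# Percentage of "False" (unjustified) cases per race, split by gender.
pct = (
    df.groupby(["gender", "race"])["justified"]
      .apply(lambda s: 100 * (~s).mean())
      .reset_index(name="pct_unjustified")
)

sns.barplot(data=pct, x="race", y="pct_unjustified", hue="gender")
plt.title("Share of incidents classified as unjustified")
plt.show()
```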

WEDNESDAY – NOVEMBER 8, 2023.

Import the “Counter” class from the “collections” module, which is used to count the frequency of words.

Define the column names you want to analyze:
Create a list named “columns_to_analyze” containing the names of the columns you want to analyze for word frequencies. In this code, the specified columns are ‘threat_type,’ ‘flee_status,’ ‘armed_with,’ and ‘body_camera.’

Specify the file path to your Excel document:
Set the “directory_path” variable to specify the file path to the Excel file you want to analyze.
Load your data into a data frame:
Use the pd.read_excel function to read the data from the Excel file specified by “directory_path” into a Pandas DataFrame named ‘df.’

Initialize a dictionary to store word counts for each column:
Create an empty dictionary named “word_counts” to store the word counts for each specified column.
Iterate through the specified columns:
Use a for loop to iterate through each column name specified in the “columns_to_analyze” list.
Retrieve and preprocess the data from the column:
Within the loop, retrieve the data from the current column using “df[column_name].” Convert the data to strings using “.astype(str)” to ensure a consistent data type, and store it in the “column_data” variable.

Tokenize the text and count the frequency of each word:
Tokenize the text within each column using the following steps:
Join all the text in the column into a single string using ' '.join(column_data).
Split the string into individual words using .split(). This step prepares the data for word frequency counting.
Use the “Counter” class to count the frequency of each word in the “words” list and store the results in the “word_counts” dictionary under the column name as the key.

Print the words and their frequencies for each column:
After processing all specified columns, iterate through the “word_counts” dictionary.
For each column, print the column name, followed by the individual words and their counts. This information is used to display the word frequencies for each specified column.
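
Put together, the steps above correspond to a script roughly like the following; the file path is a placeholder to update.

```python
import pandas as pd
from collections import Counter

columns_to_analyze = ["threat_type", "flee_status", "armed_with", "body_camera"]
directory_path = "police_shootings.xlsx"  # update to your file's location
df = pd.read_excel(directory_path)

word_counts = {}
for column_name in columns_to_analyze:
    column_data = df[column_name].astype(str)  # ensure a uniform string type
    words = " ".join(column_data).split()      # tokenize on whitespace
    word_counts[column_name] = Counter(words)  # frequency of each word

# Display the word frequencies for each specified column.
for column_name, counts in word_counts.items():
    print(f"\n{column_name}:")
    for word, count in counts.items():
        print(f"  {word}: {count}")
```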

MONDAY – NOVEMBER 6, 2023.

1. Import the necessary libraries: Import the “pandas” library and assign it the alias ‘pd’ for data manipulation. Import the “matplotlib.pyplot” library and assign it the alias ‘plt’ for data visualization.

2. Load the Excel file into a DataFrame: Specify the file path to the Excel file that you want to load (update this path to your Excel file’s location).
Specify the name of the sheet within the Excel file from which data should be read. Use the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df.’

3. Drop rows with missing ‘race,’ ‘age,’ or ‘gender’ values: Remove rows from the DataFrame where any of these three columns (race, age, gender) have missing values.

4. Create age groups: Define the boundaries for age groups using the ‘age_bins’ variable. Provide labels for each age group, corresponding to ‘age_bins,’ using the ‘age_labels’ variable.

5. Cut the age data into age groups for each race category: Create a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’

6. Count the number of individuals in each age group by race and gender: Group the data by race, gender, and age group. Count the number of individuals in each combination. Use the unstack() function to reshape the data, making it more suitable for visualization. Fill missing values with 0 using fillna(0).

7. Calculate the median age for each race and gender combination: Group the data by race and gender. Calculate the median age for each combination.

8. Print the median age for each race and gender combination: Print a header indicating “Median Age by Race and Gender.” Print the calculated median age for each race and gender combination.

9. Create grouped bar charts for different genders: The code iterates over unique gender values in the DataFrame.

10. For each gender: Subset the DataFrame to include only data for that gender. Create a grouped bar chart that displays the number of individuals in different age groups for each race-gender combination.
Set various plot properties such as the title, labels, legend, and rotation of x-axis labels. Display the plot using plt.show().
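
A runnable sketch of steps 1 through 10; the file path, sheet name, and the bin edges and labels are placeholders, since the entry does not list the exact values used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file path, sheet name, and bin edges/labels.
df = pd.read_excel("police_shootings.xlsx", sheet_name="Sheet1")
df = df.dropna(subset=["race", "age", "gender"])

age_bins = [0, 18, 30, 45, 60, 100]
age_labels = ["0-17", "18-29", "30-44", "45-59", "60+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Median age per race/gender combination.
print("Median Age by Race and Gender")
print(df.groupby(["race", "gender"])["age"].median())

# One grouped bar chart per gender.
for gender in df["gender"].unique():
    subset = df[df["gender"] == gender]
    counts = (subset.groupby(["race", "Age Group"], observed=False)
                    .size().unstack().fillna(0))
    counts.plot(kind="bar", title=f"Age groups by race ({gender})")
    plt.xlabel("Race")
    plt.ylabel("Number of individuals")
    plt.xticks(rotation=45)
    plt.show()
```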

FRIDAY – NOVEMBER 3, 2023.

Import the necessary libraries:

import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd’.
import matplotlib.pyplot as plt: Imports the Matplotlib library, specifically the ‘pyplot’ module, and assigns it the alias ‘plt’. Matplotlib is used for creating plots and visualizations.

Load the Excel file into a Data Frame:
directory_path: Specifies the file path to the Excel file you want to load. Make sure to update this path to the location of your Excel file.
sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
df = pd.read_excel(directory_path, sheet_name=sheet_name): Uses the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df’.

Calculate the median age of all individuals:

median_age = df['age'].median(): Calculates the median age of all individuals in the ‘age’ column of the DataFrame and stores it in the ‘median_age’ variable.
print("Median Age of All Individuals:", median_age): Prints the calculated median age to the console.

Create age groups:

age_bins: Defines the boundaries for age groups. In this case, individuals will be grouped into the specified age ranges.
age_labels: Provides labels for each age group, corresponding to the ‘age_bins’.

Cut the age data into age groups:
df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’

Count the number of individuals in each age group:
age_group_counts = df['Age Group'].value_counts().sort_index(): Counts the number of individuals in each age group and sorts them by the age group labels. The result is stored in the ‘age_group_counts’ variable.

Create a bar graph to analyze age groups:
plt.figure(figsize=(10, 6)): Sets the size of the figure for the upcoming plot.
age_group_counts.plot(kind='bar', color='skyblue'): Plots a bar graph using the ‘age_group_counts’ data, where each bar represents an age group. ‘skyblue’ is the color of the bars.
plt.title('Age Group Analysis'): Sets the title of the plot.
plt.xlabel('Age Group'): Sets the label for the x-axis.
plt.ylabel('Number of Individuals'): Sets the label for the y-axis.
plt.xticks(rotation=45): Rotates the x-axis labels by 45 degrees for better readability.
plt.show(): Displays the bar graph on the screen.
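
Assembled into a single script, the fragments above look roughly like this; the file path, sheet name, and bin boundaries remain placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

directory_path = "police_shootings.xlsx"  # update to your file's location
sheet_name = "Sheet1"                     # assumed sheet name
df = pd.read_excel(directory_path, sheet_name=sheet_name)

median_age = df["age"].median()
print("Median Age of All Individuals:", median_age)

age_bins = [0, 18, 30, 45, 60, 100]       # assumed bin boundaries
age_labels = ["0-17", "18-29", "30-44", "45-59", "60+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

age_group_counts = df["Age Group"].value_counts().sort_index()

plt.figure(figsize=(10, 6))
age_group_counts.plot(kind="bar", color="skyblue")
plt.title("Age Group Analysis")
plt.xlabel("Age Group")
plt.ylabel("Number of Individuals")
plt.xticks(rotation=45)
plt.show()
```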

WEDNESDAY – NOVEMBER 1, 2023.

Import the necessary libraries:

The code imports the “pandas” library for data analysis and the “Counter” class from the “collections” module for counting elements in a list.

Specify the columns to be analyzed:
The code specifies the names of the columns you want to analyze from an Excel file. These columns contain data such as “threat_type,” “flee_status,” “armed_with,” and others.

Set the file path to the Excel document:
The code sets the file path to the location of your Excel file. You should replace this path with the actual path to your Excel file.

Load the data from the Excel file into a DataFrame:
The code uses the pd.read_excel function to load the data from the Excel file into a Pandas DataFrame, which is a table-like structure for data.

Initialize a dictionary for word counts:
The code initializes a dictionary called “word_counts” to store word frequencies for each of the specified columns. Each column will have its own word frequency counts.

Process each specified column:
For each column specified for analysis, the code performs the following steps:
It retrieves the data from that column and converts it to strings to ensure a uniform data type. This is important for text processing.
It tokenizes the text within the column by breaking it into individual words. Tokenization is the process of splitting text into smaller units, such as words or phrases.
It counts how many times each word appears in that column using the “Counter” class, and these word counts are stored in the “word_counts” dictionary under the column’s name.

Print the words and their frequencies:
Finally, the code goes through the “word_counts” dictionary for each specified column and displays the words and how many times they appear in that column. This provides insights into the most common words or phrases in each column.

MONDAY – OCTOBER 30, 2023

Data collection:
Gather geographic information about police stations, including latitude and longitude coordinates. Precise location data is critical for subsequent analysis.
Calculating distance:
Use the obtained coordinates to calculate the distances between police stations; a hedged sketch of one distance formula appears at the end of this entry. The goal of this step is to understand the spatial distribution and coverage of law enforcement within the region.
Demographic analysis:
Analyze race, age, and shooting data. Identify areas with the highest frequency of shootings. This analysis helps identify potential hotspots.
Proximity analysis:
Determine how far each shooting incident occurred from the nearest police station. This analysis provides insight into response times and areas where an increased law enforcement presence may be required.
Segment your data:
Split the data into a training set and a test set. Consider the distribution of the population to ensure that your model is representative and can make accurate predictions and classifications.
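
For the distance-calculation step, the haversine great-circle formula is one common choice (the entry does not specify which method was intended); a minimal sketch:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Example with two arbitrary Boston-area coordinates (not actual stations).
print(haversine_km(42.3601, -71.0589, 42.3467, -71.0972))
```

Applying this function between each incident and every station, then taking the minimum, would yield the nearest-station distances needed for the proximity analysis.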