The provided text is a comprehensive overview of the Kashmir conflict, extracted from a Wikipedia article. It covers the historical background, including the partition, wars between India and Pakistan, and UN mediation attempts. The resource examines internal conflicts, political movements, the rise of separatism, and human rights abuses in the region. The text presents the national stances of India, Pakistan, China, and Kashmiri people regarding the region, alongside efforts to resolve the ongoing dispute. Furthermore, the resource explores Pakistan’s relationship with militants, Al-Qaeda’s involvement, and other recent developments, such as the revocation of Kashmir’s special status by India.
The Kashmir Conflict: A Study Guide
Quiz
Instructions: Answer the following questions in 2-3 sentences each.
What was the primary reason given by Maharaja Hari Singh for initially choosing to remain independent in 1947?
What was the “Instrument of Accession” and what did it entail?
What is the “Dixon Plan” and why did it ultimately fail?
Describe Nehru’s initial stance on a plebiscite in Kashmir and how it changed over time.
What was “Operation Gibraltar” and what was its goal?
What was the initial objective of the Jammu Kashmir Liberation Front (JKLF) and how did other groups change the dynamic of the conflict?
What is Article 370 of the Indian constitution, and what did it grant to Jammu and Kashmir?
Briefly describe the events of the 1989 popular insurgency and militancy.
What roles are associated with the Inter-Services Intelligence (ISI)?
What impact has militancy had on the demographics of the Kashmir Valley?
Quiz Answer Key
Maharaja Hari Singh chose to remain independent initially because he believed that the State’s Muslims would be unhappy with accession to India, and the Hindus and Sikhs would become vulnerable if he joined Pakistan. He hoped to maintain peace and stability within his diverse kingdom.
The Instrument of Accession was a legal document signed by Maharaja Hari Singh in October 1947, acceding the State of Jammu and Kashmir to the Union of India. In exchange for military assistance, the Maharaja transferred control of defense, external affairs, and communications to India.
The Dixon Plan was proposed by UN mediator Sir Owen Dixon, suggesting a plebiscite be limited to the Kashmir Valley while recognizing the pro-India sentiments in Jammu and Ladakh, and pro-Pakistan sentiments in Azad Kashmir and the Northern Areas. It failed because Pakistan believed that India’s commitment to a plebiscite for the whole state should not be abandoned, and India rejected the plan and wanted to keep troops in Kashmir for security purposes.
Nehru initially offered a plebiscite after law and order were restored in Kashmir, promising the people the right to decide their future. However, his stance evolved, and he later withdrew the plebiscite offer, primarily due to Pakistan’s military pact with the United States and skepticism about the plebiscite’s wisdom and practicality.
Operation Gibraltar was a covert operation launched by Pakistan in 1965, involving the infiltration of Pakistani soldiers and irregulars into Indian-administered Kashmir. The goal was to incite a local rebellion and destabilize the region, leading to its annexation by Pakistan.
The JKLF initially aimed for the complete independence of the former princely state of Jammu and Kashmir from both India and Pakistan. Later, other groups such as Hizbul Mujahideen, supported by Pakistan, emerged with the goal of merging with Pakistan, which introduced an Islamist dimension to the conflict.
Article 370 granted special autonomous status to the state of Jammu and Kashmir within the Indian constitution. It specified that the State must concur in the application of laws by the Indian parliament, except those that pertain to Communications, Defence and Foreign Affairs.
The 1989 insurgency erupted in the Indian-administered Kashmir Valley after years of political disenfranchisement and alienation, with logistical support from Pakistan. This insurgency was driven by separatist sentiments and led to widespread violence and displacement.
The Inter-Services Intelligence (ISI) is Pakistan’s intelligence agency and it has been accused of having provided weapons, training, advice, and planning assistance to militant outfits operating in Jammu and Kashmir, especially in the 1990s and early 2000s. Some believe that the ISI was also coordinating the shipment of arms from the Pakistani side of Kashmir to the Indian side.
The militancy in Kashmir resulted in the exodus of Kashmiri Hindus (Pandits) from the predominantly Muslim Kashmir Valley in the early 1990s, significantly altering the region’s demographic composition. At least 506,000 people in the Indian-administered Kashmir Valley are internally displaced due to the militancy, about half of whom are Hindu Pandits.
Essay Questions
Instructions: Choose one of the following questions and write a well-organized essay addressing the prompt, using evidence from the source material.
Analyze the role of external actors, specifically Pakistan and the United Nations, in shaping the trajectory of the Kashmir conflict.
Discuss the evolution of Kashmiri identity and political movements from the Dogra rule to the rise of separatism in the late 20th century.
Evaluate the arguments for and against holding a plebiscite in Kashmir, considering the historical context and contemporary views.
Explore the human rights abuses committed by both state and non-state actors in the Kashmir conflict, and their impact on the civilian population.
Assess the significance of the revocation of Article 370 in 2019 and its potential implications for the future of the region.
Glossary of Key Terms
Instrument of Accession: A legal document signed by Maharaja Hari Singh in 1947, acceding the State of Jammu and Kashmir to India.
Plebiscite: A direct vote by eligible voters to decide on an important question, such as sovereignty or political status.
Line of Control (LoC): The de facto border between Indian-administered and Pakistani-administered Kashmir. It originated as the UN-mediated ceasefire line at the end of the first Indo-Pakistani War and was later named the Line of Control.
Article 370: A provision in the Indian constitution that granted special autonomous status to Jammu and Kashmir, allowing it to have its own constitution and laws.
Azad Kashmir: A region administered by Pakistan, also known as Pakistan-administered Kashmir.
Gilgit-Baltistan: A region administered by Pakistan, formerly known as the Northern Areas.
Kashmiri Pandits: A Hindu minority community native to the Kashmir Valley, many of whom were displaced due to militancy.
Militancy/Insurgency: Armed resistance or rebellion against a government or authority, often involving guerrilla warfare tactics.
Inter-Services Intelligence (ISI): The primary intelligence agency of Pakistan.
Dogra Dynasty: The Hindu dynasty that ruled the princely state of Jammu and Kashmir from 1846 to 1947.
National Conference: A major political party in Jammu and Kashmir, initially led by Sheikh Abdullah.
Muslim Conference: A political party in Jammu and Kashmir that advocated for the rights of Muslims and later supported accession to Pakistan.
Operation Gibraltar: A covert operation launched by Pakistan in 1965, involving the infiltration of Pakistani soldiers into Indian-administered Kashmir.
UN Resolutions: Resolutions passed by the United Nations Security Council regarding the Kashmir dispute, calling for a plebiscite and peaceful resolution.
Jihad: A religious term referring to a struggle or striving, often interpreted as a holy war by some Islamist groups.
Razakars: Volunteers.
Mujahideen: Guerrilla fighters in Islamic countries, especially those who are fighting against non-Muslim forces.
Aburi Hakoomat: Provisional government.
Sadr-i-Riyasat: Constitutional Head of State.
The Kashmir Conflict: Historical Background, Stances, and Key Issues
Kashmir Conflict: Briefing Document
This document provides a briefing on the Kashmir conflict based on the provided Wikipedia excerpt. It covers the historical background, key events, national stances, and ongoing issues related to this protracted dispute.
I. Historical Background and Key Events:
Princely State and Partition: From 1846 to 1947, Kashmir was a princely state ruled by the Dogra dynasty under British paramountcy. “According to the 1941 census, the state’s population was 77 percent Muslim, 20 percent Hindu and 3 percent others (Sikhs and Buddhists).” Despite the Muslim majority, the Hindu Maharaja Hari Singh initially chose to remain independent after the partition of India and Pakistan.
Accession to India: Faced with a tribal invasion from Pakistan in 1947, the Maharaja signed the Instrument of Accession to India. “Accordingly, the Maharaja signed an instrument of accession on 26 October 1947, which was accepted by the Governor General the next day.” India accepted the accession but with the “proviso that it would be submitted to a ‘reference to the people’ after the state is cleared of the invaders.”
Indo-Pakistani Wars: The conflict triggered the first Indo-Pakistani War in 1947. Further wars in 1965 and 1971, and the Kargil conflict in 1999, were also linked to the Kashmir dispute.
Internal Conflict and Insurgency: “In 1989, an armed insurgency erupted against Indian rule in Indian-administered Kashmir Valley, after years of political disenfranchisement and alienation, with logistical support from Pakistan.” This insurgency, initially driven by Kashmiri separatists, was later fueled by Pakistan-backed Jihadist groups. The insurgency led to the exodus of Kashmiri Hindus (Pandits) and increased militarization in the region.
Article 370: The Indian Constitution included Article 370, granting special autonomous status to Jammu and Kashmir. It specified that the State must concur in the application of laws by the Indian parliament, except those that pertain to Communications, Defence and Foreign Affairs. The Central Government could not exercise its power to interfere in any other areas of governance of the state. This article has been a point of contention, with some advocating for its abrogation and full integration of Kashmir into India.
Post-2000s: “The 2010s were marked by civil unrest within the Kashmir Valley, fuelled by unyielding militarisation, rights violations, mis-rule and corruption,” demonstrating the ongoing tensions. Further unrest in the region erupted after the 2019 Pulwama attack.
II. National Stances:
India: Considers Kashmir an “integral part” of India, based on the Instrument of Accession. India does not accept the two-nation theory and holds that Kashmir, despite being a Muslim-majority region, belongs within secular India. India has said it is willing to grant autonomy within the Indian constitution if there is consensus among political parties on the issue.
Pakistan: Maintains that Kashmir is a disputed territory and the “jugular vein of Pakistan”, whose final status should be determined by the Kashmiri people. Pakistan’s claims to the disputed region are based on the rejection of Indian claims to Kashmir, namely the Instrument of Accession.
China: China has a secondary role, controlling Aksai Chin and the Shaksgam Valley.
Kashmiri Views: A significant portion of Kashmiris desire independence or accession to Pakistan, while others support remaining with India with greater autonomy.
III. UN Involvement and Settlement Formulas:
UN Mediation: The UN has been involved since 1948, passing resolutions calling for a plebiscite to determine the future of Kashmir. The United Nations Commission for India and Pakistan (UNCIP) appointed its successor, Sir Owen Dixon, to implement demilitarisation prior to a statewide plebiscite.
Dixon Plan: Sir Owen Dixon proposed that a plebiscite be limited to the Valley, agreeing that people in Jammu and Ladakh were clearly in favor of India; equally clearly, those in Azad Kashmir and the Northern Areas wanted to be part of Pakistan. Pakistan did not accept this plan because it believed that India’s commitment to a plebiscite for the whole state should not be abandoned.
Contemporary Views: The article notes that many neutral parties to the dispute consider the UN resolution on Kashmir no longer relevant.
IV. Pakistan’s Relation with Militants:
Support for Militancy: Several sources, including Pakistani officials, acknowledge Pakistan’s support for militant groups operating in Kashmir. “In 2009, the President of Pakistan Asif Zardari asserted at a conference in Islamabad that Pakistan had indeed created Islamic militant groups as a strategic tool for use in its geostrategic agenda and ‘to attack Indian forces in Jammu and Kashmir.’”
ISI Involvement: The British Government have formally accepted that there is a clear connection between Pakistan’s Inter-Services Intelligence (ISI) and three major militant outfits operating in Jammu and Kashmir: Lashkar-e-Tayiba, Jaish-e-Mohammed and Harkat-ul-Mujahideen. The outfits are provided with “weapons, training, advice and planning assistance”.
V. Human Rights Abuses:
Indian-Administered Kashmir: Indian security forces are accused of human rights violations, including extrajudicial killings, arbitrary arrests, and sexual violence. According to some surveys, popular perception in the Kashmir region holds that the Indian Armed Forces are more to blame for human rights violations than the separatist groups.
Pakistan-Administered Kashmir: Concerns regarding political freedoms, electoral credibility, and the status of women. “UNCR reports that the status of women in Pakistani-administered Kashmir is similar to that of women in Pakistan. They are not granted equal rights under the law, and their educational opportunities and choice of marriage partner remain ‘circumscribed’”.
Gilgit-Baltistan: The main demand of the people of Gilgit-Baltistan is constitutional status for the region as a fifth province of Pakistan. “Almost six decades after Pakistan’s independence, the constitutional status of the Federally Administered Northern Areas (Gilgit and Baltistan), once part of the former princely state of Jammu and Kashmir and now under Pakistani control, remains undetermined, with political autonomy a distant dream.”
VI. Key Issues and Themes:
Self-determination vs. Territorial Integrity: The conflict revolves around the Kashmiri people’s right to self-determination versus India’s claim to territorial integrity.
Role of Pakistan: Pakistan’s involvement, both overt and covert, has significantly shaped the conflict.
Human Rights: The conflict has resulted in widespread human rights abuses on both sides.
Regional Instability: The Kashmir dispute remains a major source of tension between India and Pakistan, with the potential to escalate into larger conflicts.
This briefing provides a foundation for understanding the complexities of the Kashmir conflict. Further research into specific events, political figures, and socio-economic factors is recommended for a more comprehensive analysis.
The Kashmir Conflict: Frequently Asked Questions
What are the historical roots of the Kashmir conflict?
The Kashmir conflict stems from the 1947 partition of British India into India and Pakistan. Princely states were given the choice to join either nation or remain independent. Jammu and Kashmir, a princely state with a Muslim-majority population ruled by a Hindu Maharaja, Hari Singh, initially chose to remain independent. However, an invasion by Pakistani tribesmen, combined with internal revolts, led the Maharaja to accede to India in October 1947. This accession is disputed by Pakistan, which argues that the Maharaja was an unpopular ruler who used force to suppress the Kashmiri population. The conflict has its origins in the tensions surrounding Partition, the indecision of the Maharaja, and the competing claims of India and Pakistan over the region.
The pre-Partition background also matters. From 1846 until the 1947 partition of India, Kashmir was ruled by maharajas of Gulab Singh’s Dogra dynasty as a princely state under British Paramountcy. The British Raj managed the defence, external affairs, and communications for the princely state and stationed a British Resident in Srinagar to oversee the internal administration. According to the 1941 census, the state’s population was 77 percent Muslim, 20 percent Hindu and 3 percent others (Sikhs and Buddhists). Despite its Muslim majority, the state under princely rule was overwhelmingly Hindu-dominated. The Muslim majority suffered under the high taxes of the administration and had few opportunities for growth and advancement.
What is the Instrument of Accession and why is it significant?
The Instrument of Accession is the legal document signed by Maharaja Hari Singh in October 1947, acceding the state of Jammu and Kashmir to India. India considers this document the legal basis for its claim over Kashmir. The Indian government accepted the accession but stated that it would be submitted to a “reference to the people” (a plebiscite) after the state was cleared of invaders. Pakistan disputes the validity of the Instrument of Accession, arguing that it was obtained through “fraud and violence” and did not reflect the will of the Kashmiri people. Pakistan insists that states should have acceded according to their majority population. The instrument, and its contested validity, remain central to the dispute.
What are the main positions of India and Pakistan regarding Kashmir?
India considers Kashmir an integral part of India by virtue of the Instrument of Accession. While willing to grant autonomy to the region, India rejects any external interference and views cross-border militancy as terrorism sponsored by Pakistan. Pakistan maintains that Kashmir is a disputed territory whose final status must be determined by the Kashmiri people through a plebiscite, citing the UN resolutions and the initial Indian promise of a reference to the people. Pakistan accuses India of human rights abuses and suppression of the Kashmiri population. It also insists that the Maharaja was not a popular leader, was regarded as a tyrant by most Kashmiris, and used brute force to suppress the population.
What role have UN resolutions played in the Kashmir conflict?
The United Nations has passed several resolutions on Kashmir, primarily calling for a plebiscite to determine the future of the region. These resolutions are based on the premise of allowing the Kashmiri people to exercise their right to self-determination. However, due to disagreements between India and Pakistan over the conditions for holding a plebiscite, such as troop withdrawal, the resolutions have never been implemented. Some argue that these resolutions are no longer relevant, as the conditions for a free and fair plebiscite can no longer be met.
What has been the impact of militancy and insurgency in Kashmir?
In 1989, an armed insurgency erupted against Indian rule in the Indian-administered Kashmir Valley, after years of political disenfranchisement and alienation, with logistical support from Pakistan. The insurgency was actively opposed in Jammu and Ladakh, where it revived long-held demands for autonomy from Kashmiri dominance and greater integration with India. Spearheaded by a group seeking creation of an independent state based on demands for self-determination, the insurgency was taken over within the first few years of its outbreak by Pakistan-backed Jihadist groups striving for merger with Pakistan. The militancy has resulted in tens of thousands of casualties, including both combatants and civilians. It has also led to human rights abuses by both state and non-state actors, including extrajudicial killings, torture, and sexual violence. The militancy also resulted in the exodus of Kashmiri Hindus (Pandits) from the predominantly Muslim Kashmir Valley in the early 1990s. Counterinsurgency by the Indian government was coupled with repression of the local population and increased militarisation of the region, while various insurgent groups engaged in a variety of criminal activity. The 2010s were marked by civil unrest within the Kashmir Valley, fuelled by unyielding militarisation, rights violations, mis-rule and corruption, in which protesting local youths violently clashed with Indian security forces. Large-scale demonstrations took place during the 2010 unrest, triggered by an allegedly staged encounter, and during the 2016 unrest, which followed the killing of a young militant from a Jihadist group who had risen to popularity through social media. Further unrest in the region erupted after the 2019 Pulwama attack.
What is the role of external actors, particularly China and the United States, in the Kashmir conflict?
China has a secondary role in the Kashmir conflict, administering the Aksai Chin region and the Shaksgam Valley, both of which are claimed by India. China officially opposes “unilateral actions” to resolve the Kashmir issue. The United States, while not directly involved in the dispute, has urged India and Pakistan to seek a bilateral solution. The US has also expressed concerns about the presence of Al-Qaeda and other terrorist groups in the region and their potential to destabilize it and provoke conflict between India and Pakistan.
What are some of the human rights concerns in the region?
Both Indian-administered and Pakistani-administered Kashmir have been subject to significant human rights concerns. In Indian-administered Kashmir, these concerns include excessive use of force by security forces, extrajudicial killings, arbitrary arrests and detentions, torture, and restrictions on freedom of expression and assembly. Some surveys have found that in the Kashmir region itself (where the bulk of separatist and Indian military activity is concentrated), popular perception holds that the Indian Armed Forces are more to blame for human rights violations than the separatist groups. In Pakistani-administered Kashmir, concerns include restrictions on political freedoms, lack of an independent judiciary, and discrimination against women. UNCR reports that the status of women in Pakistani-administered Kashmir is similar to that of women in Pakistan. They are not granted equal rights under the law, and their educational opportunities and choice of marriage partner remain “circumscribed”. Domestic violence, forced marriage, and other forms of abuse continue to be issues of concern.
What are some of the proposed solutions to the Kashmir conflict?
Various solutions have been proposed over the years, including:
Plebiscite: Holding a plebiscite under UN supervision to allow the Kashmiri people to choose between joining India or Pakistan or becoming independent.
Partition: Dividing the territory along the Line of Control, with adjustments to reflect demographic realities and strategic considerations. The Chenab formula, which would divide the region along the Chenab River, is another partition proposal.
Autonomy: Granting greater autonomy to both Indian-administered and Pakistani-administered Kashmir, with guarantees of human rights and democratic governance.
Joint Control: Establishing a joint Indo-Pakistani mechanism to manage the region, with the possibility of eventual self-governance.
No Solution: Maintaining the status quo, which neither side is willing to accept.
The Kashmir Conflict: History, Perspectives, and Resolution Efforts
The Kashmir conflict is a territorial dispute over the Kashmir region, primarily between India and Pakistan, but also involving China. The conflict began after the partition of India in 1947, with both India and Pakistan claiming the entire former princely state of Jammu and Kashmir. The conflict has led to multiple wars and skirmishes between India and Pakistan.
Background:
The Afghan Durrani Empire ruled Kashmir from 1752 until 1819, when the Sikh Empire conquered it.
After the First Anglo-Sikh War (1845–1846), Kashmir was ceded to the East India Company, which then transferred it to Gulab Singh, the Raja of Jammu, who became the Maharaja of Jammu and Kashmir.
Internal Conflict & Political Movements:
In 1932, Sheikh Abdullah and Chaudhry Ghulam Abbas founded the All-Jammu and Kashmir Muslim Conference to advocate for the rights of Muslims in the state.
In 1938, the party was renamed the National Conference to represent all Kashmiris regardless of religion.
National Stances:
India’s View:
The Instrument of Accession signed by Maharaja Hari Singh in 1947 was a legal and irrevocable act.
The Constituent Assembly of Jammu and Kashmir ratified the accession to India and called for a permanent merger.
India considers Kashmir an integral part of its secular nation, despite the Muslim-majority population.
India accuses Pakistan of fueling insurgency and terrorism in Kashmir.
Pakistan’s View: Pakistan views Kashmir as its “jugular vein” and believes the issue should be resolved according to UN resolutions, suggesting a plebiscite to allow Kashmiris to decide their future.
China’s View: China is also party to the Kashmir conflict, holding approximately 15% of the land area.
Efforts to Resolve the Dispute:
Proposed solutions have included independence for Kashmir, formal partition between India and Pakistan, and greater autonomy for Azad Kashmir and Jammu and Kashmir.
The Simla Agreement of 1972 stipulates that all differences, including Kashmir, should be settled through bilateral negotiations between India and Pakistan.
As of 2024, there has been little meaningful dialogue to end the conflict, and India holds the territorially advantageous position.
Human Rights Abuses:
There are concerns over human rights abuses in both Indian-administered and Pakistan-administered Kashmir.
In the Muslim-majority Kashmir Valley, concern over human rights abuses is high, whereas in the Hindu- and Buddhist-majority areas it is low.
The Kashmir Conflict: An Overview of India and Pakistan’s Dispute
The Kashmir conflict is a major point of contention in India-Pakistan relations. The conflict began after the partition of India in 1947 when both countries claimed the entirety of the former princely state of Jammu and Kashmir. This dispute has led to multiple wars and skirmishes between the two nations.
Historical Context:
The conflict’s roots trace back to the end of British rule in the Indian subcontinent in 1947, which led to the creation of India and Pakistan.
British Paramountcy over the 562 Indian princely states ended, and these states were left to decide whether to join India, join Pakistan, or remain independent.
Jammu and Kashmir, the largest of these princely states, had a predominantly Muslim population ruled by a Hindu Maharaja, Hari Singh, who initially decided to remain independent.
Accession and War:
After the partition of India and a rebellion in the western districts of the state, Pakistani tribal militias invaded Kashmir, leading the Hindu ruler of Jammu and Kashmir to join India.
The resulting Indo-Pakistani War ended with a UN-mediated ceasefire along a line that was eventually named the Line of Control.
Points of contention for India:
India considers itself to be in legal possession of Jammu and Kashmir due to the accession of the state.
India views Pakistan’s assistance to rebel forces as a hostile act and the involvement of the Pakistani army as an invasion of Indian territory.
From India’s perspective, a plebiscite was meant to confirm the accession, which it considered already complete.
India accuses Pakistan of fueling instability through proxy wars.
Points of contention for Pakistan:
Pakistan maintains that Kashmir is its “jugular vein” and that its final status should be determined by the Kashmiri people.
Pakistan rejects India’s claim to Kashmir, arguing that the Maharaja was unpopular and used force to suppress the population.
Pakistan holds that the popular Kashmiri insurgency demonstrates that the Kashmiri people no longer wish to remain within India and that Kashmir either wants to be with Pakistan or independent.
Pakistan views India as disregarding UN Security Council resolutions by not holding a plebiscite.
Attempts to resolve the conflict:
Numerous attempts have been made to resolve the conflict, including UN mediation and bilateral agreements such as the Simla Agreement of 1972.
The Simla Agreement stated that the countries would settle their differences by peaceful means through bilateral negotiations while maintaining the sanctity of the Line of Control.
Other considerations:
China also administers portions of the Kashmir region (Aksai Chin and the Shaksgam Valley), both of which are claimed by India.
The conflict has had a significant impact on the people of Kashmir, with many becoming refugees or internally displaced.
Both India and Pakistan have been accused of human rights abuses in the region.
Kashmir Conflict: Human Rights Concerns and Allegations
The Kashmir conflict has a significant human rights dimension, with accusations against both India and Pakistan.
Reports and Findings:
The OHCHR (Office of the High Commissioner for Human Rights) has released reports on the human rights situation in both Indian-Administered Kashmir and Pakistan-Administered Kashmir.
Freedom House categorizes both Indian-administered Kashmir and Pakistani-administered Kashmir as “not free”.
A 2010 Chatham House opinion poll found that concern over human rights abuses varied across the region, with high concern in the Muslim-majority Kashmir Valley and low concern in the Hindu and Buddhist-majority areas.
Human Rights Abuses in Indian-Administered Kashmir:
Scholars and organizations have reported human rights abuses by Indian forces, including extrajudicial killings, rape, torture, and enforced disappearances.
Amnesty International has accused the Indian government of refusing to prosecute perpetrators of abuses and notes that no member of the Indian military in Jammu and Kashmir had been tried in a civilian court as of June 2015.
Armed Forces Special Powers Act (AFSPA) and Public Safety Act: The AFSPA grants broad powers to the military, including the right to shoot to kill and detain individuals without charge, leading to concerns about human rights violations. Separately, some human rights organizations have asked the Indian government to repeal the Public Safety Act, under which “a detainee may be held in administrative detention for a maximum of two years without a court order”.
Enforced Disappearances and Mass Graves: The State Human Rights Commission (SHRC) has found thousands of unmarked graves believed to contain victims of unlawful killings and enforced disappearances.
Sexual Violence: Reports indicate a high incidence of sexual abuse and rape, with allegations that security forces use rape as a cultural weapon of war.
Kashmiri Pandits: There have been killings and displacement of Kashmiri Pandits (Hindus) due to the conflict.
Human Rights Abuses in Pakistan-Administered Kashmir:
There have been instances of human rights abuses in Azad Kashmir, including political repression and forced disappearances.
Lack of Freedoms: Residents of Azad Kashmir are reportedly not free, with Pakistani authorities exercising strict controls on basic freedoms.
Human Rights Watch has accused the ISI (Pakistan’s intelligence agency) and the military of torture.
Religious Discrimination: Claims of religious discrimination and restrictions on religious freedom in Azad Kashmir have been made against Pakistan.
Lack of Representation: Criticisms have been raised regarding the lack of human rights, justice, democracy, and Kashmiri representation in the Pakistan National Assembly.
Kashmiri Perspectives:
Kashmiri scholars claim that India’s military occupation inflicts violence and humiliation, with Indian forces responsible for human rights abuses.
There are assertions that the Kashmiri people have not been able to exercise the right to self-determination.
Kashmiri Perspectives on the Kashmir Conflict
Kashmiri views on the Kashmir conflict are varied and complex, with a central theme of a desire for self-determination.
Key aspects of Kashmiri perspectives include:
Historical Grievances: Kashmiris feel they have been ruled by various empires and governments, fostering a sense of not being in control of their own fate for centuries.
Right to Self-determination: Since the 1947 accession of Kashmir to India was provisional and conditional, Kashmiris maintain their right to determine their future. They assert that state elections do not satisfy this requirement.
Desire for a Plebiscite: Many Kashmiris want a plebiscite to achieve freedom. A constitutional expert, A. G. Noorani, says the people of Kashmir are very much a party to the dispute.
Rejection of Indian Rule: A significant portion of Kashmiris oppose Indian rule, citing broken promises of a plebiscite, violations of autonomy, and subversion of the democratic process as reasons for the 1989-1990 rebellion. Some Kashmiris believe that they were better off under Dogra rule than under Indian rule.
Views on Elections: Kashmiris assert that except for the 1977 and 1983 elections, no state election has been fair. The Hurriyat parties do not want to participate in elections under the framework of the Indian Constitution, viewing them as a diversion from self-determination.
Impact of Military Presence: Opponents of Indian rule say that India has a large military presence in Kashmir, resulting in violence, human rights abuses, and a sense of humiliation among Kashmiris.
Divergent Regional Views:
A 2007 poll indicated that 87% of people in Srinagar wanted independence.
In contrast, 95% of people in Jammu city felt the state should be part of India.
Economic Considerations: Some, like Markandey Katju, argue that secession would harm Kashmir’s economy due to its dependence on Indian markets.
Identity and Religion: Kashmiris have a distinct sense of identity, with Islam being an integral part of it. Some Kashmiris might prefer Pakistan due to religious affinity and socio-economic links if India and Pakistan cannot guarantee the existence and peaceful development of an independent Kashmir.
Views on Voter Turnout: High voter turnout is not necessarily an endorsement of Indian rule, as voters may be motivated by factors such as development and local governance.
Settlement Formulas: Kashmiris seek an “honourable solution” that ensures their dignity without necessarily signifying a victory for either India or Pakistan.
Al-Qaeda and the Kashmir Conflict: Involvement and Claims
Al-Qaeda’s involvement in the Kashmir conflict is a complex issue with various reports and claims.
Key points regarding Al-Qaeda’s involvement:
Osama bin Laden’s Stance: In a 2002 “Letter to American People,” Osama bin Laden stated that one of the reasons he was fighting America was its support for India on the Kashmir issue.
US Intelligence Assessments: In 2002, US Secretary of Defense Donald Rumsfeld suggested Al-Qaeda was active in Kashmir, though without hard evidence. US officials believed Al-Qaeda aimed to provoke conflict between India and Pakistan, potentially forcing Pakistan to move troops to the Indian border and relieving pressure on Al-Qaeda elements in Pakistan. US intelligence analysts also suggested that Al-Qaeda and Taliban operatives in Pakistani-administered Kashmir were assisting terrorists trained in Afghanistan to infiltrate Indian-administered Kashmir.
Al-Qaeda’s Claims: In 2006, Al-Qaeda claimed to have established a wing in Kashmir, raising concerns for the Indian government.
Counterclaims: In 2007, the Indian Army stated that there was no evidence to verify media reports of an Al-Qaeda presence in Indian-administered Jammu and Kashmir. It also ruled out Al-Qaeda ties with militant groups in Kashmir, including Lashkar-e-Taiba and Jaish-e-Mohammed, though it had information about Al-Qaeda’s strong ties with these groups’ operations in Pakistan.
Destabilization Efforts: In 2010, US Defense Secretary Robert Gates stated that Al-Qaeda was seeking to destabilize the region and planning to provoke a nuclear war between India and Pakistan.
Links to Militant Groups: Fazlur Rehman Khalil, leader of Harkat-ul-Mujahideen, signed al-Qaeda’s 1998 declaration of holy war, which called on Muslims to attack all Americans and their allies.
Killing of Ilyas Kashmiri: In June 2011, a US drone strike killed Ilyas Kashmiri, chief of Harkat-ul-Jihad al-Islami, a Kashmiri militant group associated with Al-Qaeda. He was described as a “prominent” Al-Qaeda member and the head of military operations for Al-Qaeda.
New Battlefields: Waziristan became a new battlefield for Kashmiri militants fighting NATO in support of Al-Qaeda.
Appointment of Farman Ali Shinwari: In April 2012, Farman Ali Shinwari, a former member of Kashmiri separatist groups Harkat-ul-Mujahideen and Harkat-ul-Jihad al-Islami, was appointed chief of al-Qaeda in Pakistan.
Investigation Findings: A 2002 investigation by a Christian Science Monitor reporter claimed that Al-Qaeda and its affiliates were prospering in Pakistani-administered Kashmir with the tacit approval of Pakistan’s Inter-Services Intelligence agency (ISI).
Wikipedia is a collaboratively edited, free online encyclopedia. The provided text gives an overview of its history, creation, governance, community, and content policies. It also addresses criticisms, such as accuracy concerns, biases, censorship, and explicit content. The text highlights Wikipedia’s language editions, its access methods, and its cultural influence including its impact on education and journalism. It details various studies about the reliability of Wikipedia content. Finally, the source also lists awards that the website has won and describes associated projects.
Wikipedia: A Comprehensive Study Guide
Content Overview
This study guide covers the history, structure, policies, community, and cultural impact of Wikipedia, drawing from the provided Wikipedia article. It’s designed to help you review the key concepts and details presented in the source material.
Key Concepts
History and Founding: Understand the origins of Wikipedia and its relationship to Nupedia.
Openness and Collaboration: Explore Wikipedia’s model of open collaboration and its consequences, including vandalism and edit wars.
Policies and Content: Familiarize yourself with the core content policies, including Neutral Point of View (NPOV), Verifiability, and No Original Research.
Governance: Learn about the roles of administrators and the Arbitration Committee in managing disputes and maintaining order.
Community and Diversity: Understand the demographics of Wikipedia editors and the ongoing challenges related to diversity and bias.
Language Editions: Explore the multilingual nature of Wikipedia and the variations in content and policies across different language versions.
Reception and Criticism: Consider the criticisms of Wikipedia, including concerns about accuracy, systemic bias, and discouragement in education.
Operation and Technology: Review the technological infrastructure, including the LAMP platform, automated editing, and mobile access.
Cultural Influence: Examine Wikipedia’s impact on education, journalism, and research.
Quiz: Short-Answer Questions
Answer each question in 2-3 sentences.
What were the two main goals of Jimmy Wales and Larry Sanger in creating Wikipedia?
Explain the meaning of the name “Wikipedia.”
What is the “neutral point of view” policy, and why is it important to Wikipedia?
Describe the role of Wikipedia administrators.
What is the Arbitration Committee, and what does it do?
What are some of the challenges Wikipedia faces related to diversity?
List three of the largest language editions of Wikipedia by article count as of February 2025, according to the document.
Explain the concept of “systemic bias” in the context of Wikipedia.
What are Wikipedia bots, and what role do they play in maintaining the encyclopedia?
How does Wikipedia combat misinformation about current events, like the COVID-19 pandemic?
Quiz: Answer Key
Jimmy Wales aimed to create a publicly editable encyclopedia, while Larry Sanger suggested using a wiki to achieve this goal. Together, they wanted to create a free encyclopedia of the highest possible quality, accessible to every person in their own language.
The name “Wikipedia” is a blend of the words “wiki,” referring to the collaborative editing system, and “encyclopedia,” indicating its purpose as a comprehensive source of information. This combination highlights the collaborative nature of the encyclopedia.
The “neutral point of view” (NPOV) policy requires articles to represent significant views fairly, proportionately, and without bias. NPOV is crucial for ensuring the encyclopedia is seen as objective and trustworthy by a global audience.
Wikipedia administrators are volunteer editors who have been granted additional technical abilities, including the ability to delete pages, protect pages from editing, and block users. They help enforce policies and maintain order within the Wikipedia community.
The Arbitration Committee is a group of editors elected by the Wikipedia community to resolve disputes that cannot be solved through other methods. They have the authority to issue binding rulings and sanctions to maintain the integrity of the encyclopedia.
Wikipedia faces challenges related to gender, geographic, and ideological diversity among its editors and content. These challenges can lead to systemic biases in the coverage of topics and perspectives.
As of February 2025, the English Wikipedia had 6,956,747 articles, the Cebuano Wikipedia had 6,116,785 articles, and the German Wikipedia had 2,989,789 articles.
“Systemic bias” refers to the ways in which Wikipedia’s content and structure may reflect the perspectives and priorities of certain groups, such as Western cultures or English speakers. This bias can result in underrepresentation or misrepresentation of other cultures, topics, or viewpoints.
Wikipedia bots are computer programs used to perform simple and repetitive tasks, such as correcting common misspellings, formatting articles, and detecting vandalism. These bots help maintain the encyclopedia’s quality and consistency.
Wikipedia combats misinformation by relying on its community of editors to monitor and verify content, citing reliable sources, and adhering to its neutral point of view policy. It also partners with organizations like the World Health Organization to combat health-related misinformation.
Essay Questions
Discuss the impact of Wikipedia’s open editing model on the quality and reliability of its content. What are the benefits and drawbacks of allowing anyone to edit?
Analyze the challenges Wikipedia faces in achieving and maintaining a neutral point of view across its diverse range of articles and language editions.
Explore the criticisms of Wikipedia’s coverage of topics and the existence of systemic biases. How might these biases be addressed and mitigated?
Evaluate Wikipedia’s role as a source of information in education. Should students be encouraged to use Wikipedia, and if so, how should they be taught to critically assess its content?
Discuss the role and impact of Wikipedia bots on the functioning and content of the website. How do these bots help maintain the integrity of Wikipedia, and what are the potential downsides to relying on automated editing?
Glossary of Key Terms
Administrator: A Wikipedia editor granted additional technical abilities, including the ability to delete pages, protect pages, and block users.
Arbitration Committee (ArbCom): A group of editors elected by the Wikipedia community to resolve complex disputes.
Bot: A computer program used to perform automated tasks on Wikipedia, such as correcting errors or reverting vandalism.
CC Attribution / Share-Alike 4.0: A Creative Commons license allowing users to share and adapt content, provided they give proper attribution and release adaptations under the same license.
Edit War: A content dispute on Wikipedia where editors repeatedly revert each other’s changes to an article.
GFDL (GNU Free Documentation License): A copyleft license for free documentation, often used in conjunction with Creative Commons on Wikipedia.
LAMP platform: A web service stack comprising Linux, Apache, MySQL, and PHP/Python/Perl.
MediaWiki: The wiki software used by Wikipedia.
Nupedia: A free online encyclopedia project that predated Wikipedia and served as its original inspiration.
Neutral Point of View (NPOV): A core content policy requiring articles to represent significant viewpoints fairly, proportionately, and without bias.
No Original Research: A core content policy prohibiting the inclusion of unpublished facts, ideas, or arguments in Wikipedia articles.
Systemic Bias: Skews in Wikipedia content resulting from the demographics and perspectives of its editors.
Vandalism: Deliberate attempts to disrupt Wikipedia by adding false, biased, or nonsensical content.
Verifiability: A core content policy requiring all material in Wikipedia articles to be attributable to reliable, published sources.
Wikimedia Foundation: The American nonprofit organization that hosts Wikipedia and its sister projects.
Wikipedians: Volunteers who write and maintain Wikipedia.
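Wikification: Entity linking that uses Wikipedia as the target knowledge base, connecting mentions in text to the corresponding Wikipedia articles. Similar methods can be used to find “missing” links in Wikipedia.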
Wiki: Software that allows users to easily create and edit web pages collaboratively.
Wikipedia Zero: An initiative to provide free access to Wikipedia in developing countries, which was later discontinued.
Wikipedia: History, Content, Community, and Challenges
Wikipedia Briefing Document
This document summarizes the main themes and important ideas presented in the provided Wikipedia article about Wikipedia itself.
I. Overview & Key Facts:
Definition: Wikipedia is a “free-content online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki.”
Scale: It is “the largest and most-read reference work in history” and consistently ranked among the most visited websites globally.
Founding & Hosting: Founded by Jimmy Wales and Larry Sanger on January 15, 2001, and hosted by the Wikimedia Foundation since 2003.
Governance: The Wikimedia Foundation is an American nonprofit funded by donations from readers.
Language Editions: As of February 2025, there are 341 language editions. The largest editions (by article count) are English, Cebuano, German, French, Swedish, and Dutch.
Active Users: 309,457 active editors and 117,918,423 registered users.
Content License: CC Attribution / Share-Alike 4.0; most text is also dual-licensed under GFDL.
Traffic: As of February 2023, Wikipedia attracts around 2 billion unique devices monthly, with the English Wikipedia receiving 10 billion pageviews each month.
II. History & Development:
Nupedia Precursor: Wikipedia began as a complementary project for Nupedia, a free online encyclopedia with expert-written and formally reviewed articles.
Wiki Strategy: Larry Sanger is credited with the idea of using a wiki to create a publicly editable encyclopedia. On January 10, 2001, Sanger proposed on the Nupedia mailing list to create a wiki as a “feeder” project for Nupedia.
Neutral Point of View: The “neutral point-of-view” policy was established early in Wikipedia’s development.
Rapid Growth: The site experienced rapid growth after its launch.
III. Openness & Community:
Open Collaboration: Wikipedia relies on the principle of open collaboration by a community of volunteer editors.
Restrictions: Certain tasks require registration, like editing protected pages or creating new pages.
Review of Changes: Changes are reviewed by other editors, and some pages are protected to prevent vandalism.
Vandalism & Disputes: Vandalism and disputes are ongoing challenges, addressed through various mechanisms, including dispute resolution processes and the Arbitration Committee.
IV. Policies & Content:
Content Policies: Core content policies and guidelines exist, including the requirement for verifiable information and a neutral point of view.
“Five Pillars”: Wikipedia operates under five fundamental principles (not explicitly defined in the excerpt but referenced).
Verifiability: “Wikipedia’s verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in article space.”
Notability: Articles must meet specific notability criteria to warrant inclusion.
No Original Research: “Wikipedia does not publish original thought.” Articles should be based on reliable, published sources.
V. Governance & Administration:
Administrators: Administrators have elevated privileges to manage the site and enforce policies.
Dispute Resolution: A formal dispute resolution process exists, culminating in the Arbitration Committee.
VI. Language Editions & Cultural Influence:
Global Reach: Jimmy Wales described Wikipedia as “an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language.”
Independent Editions: While sharing global policies, language editions may diverge in policy and practice.
Meta-Wiki Coordination: Meta-Wiki coordinates the various language editions and provides statistics.
Systemic Bias: There is recognition of systemic biases, leading to efforts to address them.
Combatting Fake News: Wikipedia is seen as a “trusted source to combat fake news.”
COVID-19 Coverage: The encyclopedia has been used extensively during the COVID-19 pandemic to disseminate information and combat misinformation.
VII. Challenges & Criticisms:
Accuracy of Content: The accuracy of Wikipedia’s content is a recurring topic of debate and research.
Discouragement in Education: Some educators discourage students from citing Wikipedia due to concerns about reliability.
Systemic Bias: Concerns about systemic bias exist, including gender bias, geographical bias (“Wikipedia’s view of the world is written by the west”), and ideological bias.
Sexism: The article highlights specific instances raising concerns about sexism, such as the initial rejection of an article about Nobel Prize winner Donna Strickland due to perceived lack of media coverage, attributed partly to gender bias in media.
Vandalism & Edit Wars: The website faces constant challenges of vandalism and edit warring, requiring constant moderation.
VIII. Operation & Technology:
Wikimedia Foundation: The Wikimedia Foundation oversees the technical infrastructure and operations.
Automated Editing: “Computer programs called bots have often been used to perform simple and repetitive tasks, such as correcting common misspellings and stylistic issues, or to start articles such as geography entries in a standard format from statistical data.”
Mobile Access: Wikipedia is accessible through mobile apps and optimized for mobile devices.
Wikipedia Zero (Discontinued): An initiative to provide free access to Wikipedia in developing countries.
IX. Cultural Influence:
Impact on Publishing: Wikipedia’s availability has affected the biography publishing business.
Research Use: Wikipedia is widely used as a corpus for linguistic research.
Academic Studies: Studies have explored the influence of Wikipedia on various aspects of knowledge and culture.
X. Key Quotes:
“Wikipedia is a free-content online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki.”
“Wikipedia is the largest and most-read reference work in history.”
Jimmy Wales: “an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language.”
“Wikipedia’s verifiability policy requires inline citations for any material challenged or likely to be challenged, and for all quotations, anywhere in article space.”
“Computer programs called bots have often been used to perform simple and repetitive tasks, such as correcting common misspellings and stylistic issues, or to start articles such as geography entries in a standard format from statistical data.”
This briefing document provides a comprehensive overview of Wikipedia based on the provided source material, touching on its history, governance, content, community, challenges, and cultural impact.
Wikipedia: Origins, Governance, and Impact
How did Wikipedia originate?
Wikipedia began as a complementary project to Nupedia, a free online encyclopedia where articles were written and reviewed by experts. Larry Sanger proposed using a wiki as a feeder project for Nupedia, leading to Wikipedia’s launch on January 15, 2001. Jimmy Wales is credited with the vision of a publicly editable encyclopedia, and Sanger with the wiki strategy.
What are the core principles that govern Wikipedia’s content?
Wikipedia’s content is guided by several key policies, including “neutral point of view” (NPOV), verifiability, and no original research. Articles must be written from a neutral perspective, representing significant viewpoints fairly and proportionately. All material must be attributable to reliable, published sources, and original research is prohibited. Citing sources is required for anything challenged or likely to be challenged.
How is Wikipedia governed and how are disputes resolved?
The Wikimedia Foundation hosts Wikipedia and oversees the project, but the community of editors plays a significant role in governance. Disputes are resolved through discussion, and when necessary, more formal mechanisms like dispute resolution processes. Administrators have the ability to enforce policies, and the Arbitration Committee is the final step in dispute resolution.
How does Wikipedia address vandalism and ensure the accuracy of information?
Wikipedia employs various methods to combat vandalism, including bots that detect and revert malicious edits. Edits by new or unregistered users are closely monitored, and pages prone to vandalism can be protected, restricting editing to established users or administrators. “Flagged revisions” allow for community review of changes before they are visible to all readers. Despite these measures, errors and biases can still occur.
What is the role of bots in Wikipedia’s operation?
Bots play a crucial role in Wikipedia by performing simple, repetitive tasks such as correcting misspellings, standardizing article formats, and notifying editors of common errors. Some bots are designed to create geography entries and other articles from statistical data. Anti-vandal bots quickly detect and revert vandalism. All bots must be approved before activation.
How does Wikipedia address systemic bias and ensure diversity in its content?
Wikipedia acknowledges the existence of systemic biases, including gender, geographical, and ideological biases. Efforts are underway to counter these biases, such as initiatives to increase the representation of women editors and topics related to underrepresented regions and groups. The various language editions diverge on some points of policy and practice, which can lead to differing coverage of topics.
How is Wikipedia funded, and how can users access it?
Wikipedia is hosted by the Wikimedia Foundation, a non-profit organization primarily funded by donations from readers. Wikipedia’s content is available under a Creative Commons Attribution-ShareAlike license, allowing free use and distribution with attribution. Content can be accessed through the website, mobile apps, and some offline versions. The Wikimedia Foundation also promotes access in developing countries.
How is Wikipedia being utilized and how is its influence perceived in various fields?
Wikipedia has become a widely used general reference website. It serves as a corpus for linguistic research in computational linguistics, information retrieval, and natural language processing. It is used in entity linking (wikification) and word-sense disambiguation. Some view it as a trusted source to combat fake news. Wikipedia is cited by courts and has been suggested to have influenced a drop in biography sales.
Wikipedia: Definition, Scope, History, Content, Community, and Criticism
Here is an overview of Wikipedia, based on the provided source:
Definition: Wikipedia is a free online encyclopedia that is written and maintained by a community of volunteers known as Wikipedians. It uses open collaboration and the MediaWiki software.
Scope and Popularity: Wikipedia is the largest and most-read reference work in history and is consistently ranked among the top ten most visited websites. As of December 2024, it was ranked fifth by Semrush and seventh by Similarweb.
History:
Wikipedia was founded by Jimmy Wales and Larry Sanger on January 15, 2001.
It began as a complementary project for Nupedia, a free online English-language encyclopedia project with articles written by experts and reviewed under a formal process.
The name “Wikipedia” is a blend of the words “wiki” and “encyclopedia”.
Content and Policies:
Wikipedia operates under a policy of “neutral point-of-view”.
Entries must be encyclopedic and meet standards of “notability,” meaning the topic has been covered in mainstream media or major academic journal sources independent of the article’s subject.
Wikipedia aims to convey established and recognized knowledge, avoiding original research.
Content requires references to reliable sources.
Wikipedia does not take sides and strives to present information from a neutral point of view.
Community and Editing:
Wikipedia is maintained by a community of volunteers, and each article has a “talk” page for discussion, coordination, and debate.
The community has been described as “cultlike”.
Editing restrictions exist for certain controversial or vandalism-prone pages, where only registered, autoconfirmed, or extended confirmed editors can make changes.
Dispute Resolution:
Wikipedia has a semi-formal dispute resolution process.
Editors can raise issues in community forums, seek third opinions, or initiate a “request for comment” to determine community consensus.
Language Editions:
Wikipedia aims to create and distribute a free encyclopedia of the highest possible quality to every person on the planet in their own language.
Each language edition functions independently but is coordinated by Meta-Wiki.
Reception and Criticism:
Wikipedia has faced criticism regarding its reliability, systemic bias, and unevenness of coverage.
Some critics argue that articles can be dominated by persistent voices with an “ax to grind”.
However, Wikipedia has also been lauded as a valuable source of information and a means to combat fake news.
Cultural Influence:
Wikipedia’s coverage of events such as the COVID-19 pandemic demonstrates its cultural significance and role in providing information.
Operation:
Wikipedia is hosted by the Wikimedia Foundation, a non-profit organization funded mainly by donations.
It involves software operations, automated editing, and hardware support.
Access to Content:
Wikipedia’s content is available under the CC Attribution / Share-Alike 4.0 license, with most text also dual-licensed under GFDL.
Content can be accessed through various methods, including mobile access.
Awards and Recognition:
Wikipedia has received awards such as the Erasmus Prize and the Princess of Asturias Award on International Cooperation.
Explicit Content:
Wikipedia has faced criticism for allowing information about graphic content, including images and videos of sexual content.
The policy of “Wikipedia is not censored” has been controversial.
Wikipedia Community Dynamics: Culture, Contributions, and Challenges
Here’s a discussion of the community aspects of Wikipedia, based on the provided source:
Community Maintenance: Wikipedia is maintained by a community of volunteers. These volunteers are known as “Wikipedians”.
Communication Channels: Each article and user on Wikipedia has a dedicated “talk” page that serves as the primary communication channel for editors to discuss, coordinate, and debate.
Community Culture: Wikipedia’s community has been described as cultlike.
There is a preference for cohesiveness, which sometimes requires compromise, even if it means disregarding credentials. This has been referred to as “anti-elitism”.
Becoming a Wikipedia insider involves learning Wikipedia-specific technological codes, submitting to a dispute resolution process, and learning the internal culture.
Editor Contributions and Activity: The English Wikipedia has a large number of registered editors, but only a fraction are considered active. An editor is considered active if they have made one or more edits in the past 30 days.
Editors who do not comply with Wikipedia cultural rituals may be seen as outsiders, which could affect how their contributions are received.
Editors who do not log in may be considered “second-class citizens” because their contributions cannot be attributed to a particular editor with certainty.
Community Diversity & Bias: Academic studies show that Wikipedia over-represents the views of a specific demographic, typically an educated, technically inclined, English-speaking white male from a developed Christian country in the northern hemisphere. This skew produces cultural, gender, and geographical biases in Wikipedia’s content.
Dispute Resolution: Wikipedia has developed a semi-formal dispute resolution process to determine community consensus. Editors can raise issues in community forums, seek third opinions, or initiate a “request for comment”. Wikipedia encourages local resolutions of conflicts. The Arbitration Committee presides over the ultimate dispute resolution process, but it focuses on how disputes are conducted rather than ruling on specific views.
Language Editions: Although each language edition of Wikipedia functions independently, some efforts are made to supervise them all. They are coordinated in part by Meta-Wiki.
Wikimedia Movement Affiliates: Wikipedia is supported by independently-run organizations and groups affiliated with the Wikimedia Foundation, including Wikimedia chapters, thematic organizations, and user groups. These affiliates participate in the promotion, development, and funding of Wikipedia.
Editor Harassment: Editor harassment has been identified as an issue within the Wikipedia community.
Increasing Diversity: Increasing diversity within the Wikimedia community is a focus.
Wikipedia Content Accuracy and Reliability
Here’s a discussion of content accuracy on Wikipedia, based on the provided source:
No Guarantee of Validity: Wikipedia “makes no guarantee of validity” of its content because no one is ultimately responsible for the accuracy of the claims made in it [24, W 54].
Expertise vs. Open Structure: Traditional encyclopedias, such as Encyclopædia Britannica, are written by experts, giving them a reputation for accuracy. Wikipedia’s open structure allows anyone to contribute.
Nature Study: A peer review in 2005 of scientific entries in both Wikipedia and Encyclopædia Britannica by the journal Nature found few differences in accuracy. The study concluded that “the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three”.
However, this study has faced criticism regarding its sample size and selection method.
Encyclopædia Britannica disputed the findings by Nature, and Nature issued a rebuttal.
Vandalism: Any change that deliberately compromises Wikipedia’s integrity is considered vandalism. Obvious vandalism is generally easy to remove from Wikipedia articles, with a median time to detect and fix it being a few minutes. However, some vandalism can take much longer to detect and repair.
Seigenthaler Biography Incident: In the Seigenthaler biography incident, an anonymous editor introduced false information into the biography of American political figure John Seigenthaler in May 2005, falsely presenting him as a suspect in the assassination of John F. Kennedy. It remained uncorrected for four months. This incident led to policy changes at Wikipedia for tightening up the verifiability of biographical articles of living people.
“Verifiability, Not Truth”: Among Wikipedia editors, the guiding principle is often phrased as “verifiability, not truth,” meaning that readers are responsible for checking the truthfulness of articles and making their own interpretations [11, W 35]. This can sometimes lead to the removal of valid information if it is not properly sourced.
Conflicting Views on Reliability:
Tyler Cowen suggests that Wikipedia may be more likely to be accurate than the median refereed journal article on economics, while also cautioning that errors are frequently found on Internet sites.
Amy Bruckman argues that the content of a popular Wikipedia page is actually the most reliable form of information ever created due to the number of reviewers.
Critics argue that Wikipedia’s open nature and lack of proper sources for much of the information makes it unreliable.
Editors of traditional reference works such as the Encyclopædia Britannica have questioned the project’s utility and status as an encyclopedia.
Use as a Source: Legal Research in a Nutshell (2011) cites Wikipedia as a “general source” that “can be a real boon” in “coming up to speed in the law governing a situation” and, “while not authoritative, can provide basic facts as well as leads to more in-depth resources”.
Medical Information: A 2014 article in The Atlantic stated that 50% of physicians look up conditions on Wikipedia. James Heilman of WikiProject Medicine noted that less than 1% of Wikipedia’s medical articles have passed Wikipedia’s peer review process to be classified as “good” or “featured”.
Combating Fake News: Wikipedia co-founder Jimmy Wales has claimed that Wikipedia has largely avoided the problem of “fake news” because the Wikipedia community regularly debates the quality of sources in articles. In 2017–18, Facebook and YouTube announced they would rely on Wikipedia to help their users evaluate reports and reject false news.
Wikipedia’s Language Editions: Structure, Content, and Global Reach
Number of Editions: There are currently 341 language editions of Wikipedia, also referred to as language versions, or simply Wikipedias.
Article Count: As of February 2025, the six largest editions, in order of article count, are English, Cebuano, German, French, Swedish, and Dutch.
Bot Contributions: The Cebuano and Waray Wikipedias owe their positions to the article-creating bot Lsjbot. As of 2013, this bot had created about half the articles on the Swedish Wikipedia and most of the articles in the Cebuano and Waray Wikipedias.
Million+ Article Editions: Besides the top six, twelve other Wikipedias have more than a million articles each: Russian, Spanish, Italian, Polish, Egyptian Arabic, Chinese, Japanese, Ukrainian, Vietnamese, Waray, Arabic, and Portuguese.
Traffic Distribution: As of January 2021, the English Wikipedia receives 48% of Wikipedia’s cumulative traffic, with the remaining traffic split among other languages. The top 10 editions represent approximately 85% of the total traffic.
Contributors: Since Wikipedia is web-based, contributors to the same language edition may use different dialects or come from different countries, potentially leading to conflicts over spelling or points of view.
Global vs. Local Policies: While various language editions adhere to global policies like “neutral point of view”, they may diverge on policy and practice points, such as the use of non-free images under fair use claims.
Goal: Jimmy Wales described Wikipedia as “an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language” [22, W 49].
Coordination: Though each language edition functions independently, efforts are made to supervise them, coordinated in part by Meta-Wiki. Meta-Wiki provides statistics and maintains a list of articles every Wikipedia should have, covering basic content by subject.
Article Availability: Articles strongly related to a particular language may not have counterparts in other editions, even if they meet notability criteria.
Translation: Translated articles represent a small portion of articles in most editions, partly because fully automated translation is not allowed. Articles available in multiple languages may offer “interwiki links” to counterpart articles.
Regional Contributions: A 2012 study estimated that North America contributed 51% of the edits to the English Wikipedia and 25% to the Simple English Wikipedia.
Editor Retention: The Economist noted in 2014 that the English-language Wikipedia had seen a decline in the number of editors, while non-English Wikipedias had maintained a relatively constant number of active editors.
Wikipedia’s Cultural Impact and Significance
Trusted Source: Wikipedia has become a trusted source to combat fake news. In 2017–18, Facebook and YouTube announced they would rely on Wikipedia to help their users evaluate reports and reject false news.
Cultural Significance:
Wikipedia’s content has been used in academic studies, books, conferences, and court cases [33, W 122, 272, 273].
The Parliament of Canada’s website refers to Wikipedia’s article on same-sex marriage in the “related links” section of its “further reading” list for the Civil Marriage Act.
The encyclopedia’s assertions are increasingly used as a source by organizations such as the US federal courts and the World Intellectual Property Organization.
Content appearing on Wikipedia has also been cited as a source and referenced in some US intelligence agency reports.
In December 2008, the scientific journal RNA Biology launched a new section for descriptions of families of RNA molecules and requires authors who contribute to the section to also submit a draft article on the RNA family for publication in Wikipedia.
Wikipedia has also been used as a source in journalism, often without attribution, and several reporters have been dismissed for plagiarizing from Wikipedia.
Recognition:
In 2006, Time magazine recognized Wikipedia’s participation (along with YouTube, Reddit, MySpace, and Facebook) in the rapid growth of online collaboration and interaction by millions of people worldwide.
The Washington Post reported in 2007 that Wikipedia had become a focal point in the 2008 US election campaign.
A 2007 Reuters article reported the recent phenomenon of how having a Wikipedia article vindicates one’s notability.
Governmental Affairs: Wikipedia was involved in a governmental affair in 2007, when an Italian politician raised a parliamentary question about the necessity of freedom of panorama, claiming that the lack of such freedom forced Wikipedia to forbid all images of modern Italian buildings and art, damaging tourist revenues.
Crowdsourcing: A working group led by Peter Stone called Wikipedia “the best-known example of crowdsourcing…that far exceeds traditionally-compiled information sources, such as encyclopedias and dictionaries, in scale and depth”.
Open and Decentralized Web: Hossein Derakhshan describes Wikipedia as “one of the last remaining pillars of the open and decentralized web”.
Awards: Wikipedia has won many awards, including a Golden Nica for Digital Communities of the annual Prix Ars Electronica contest in 2004, a Judges’ Webby Award for the “community” category in 2004, the Quadriga A Mission of Enlightenment award in 2008, the annual Erasmus Prize in 2015, and the Spanish Princess of Asturias Award on International Cooperation in 2015 [37, 38, W 123, 292, 294, 295, 296].
Satire: Wikipedia has been the subject of satire.
Publishing: The most obvious economic effect of Wikipedia has been the death of commercial encyclopedias, especially printed versions like Encyclopædia Britannica.
Biography Publishing: Wikipedia’s influence on the biography publishing business has been a concern for some.
Research Use: Wikipedia has been widely used as a corpus for linguistic research in computational linguistics, information retrieval, and natural language processing.
The source material is a comprehensive guide to using the Pandas library in Python. It covers fundamental concepts such as importing data from various file formats (CSV, text, JSON, Excel) into DataFrames, and it gives instruction on cleaning, filtering, sorting, and indexing data. It also covers the groupby function, merging DataFrames, and creating visualizations, and it teaches how to conduct exploratory data analysis to identify patterns and outliers within a dataset.
Pandas Data Manipulation: A Comprehensive Study Guide
I. Quiz
Answer the following questions in 2-3 sentences each.
What is a Pandas DataFrame, and why is the index important?
Explain how to read a CSV file into a Pandas DataFrame, including handling potential Unicode errors.
Describe how to read a text file into a Pandas DataFrame using read_table and specify a separator.
How can you specify column names when reading a CSV file if the file doesn’t have headers?
Explain how to filter a Pandas DataFrame based on values in a specific column.
Describe the difference between loc and iloc when filtering data in a Pandas DataFrame using the index.
Explain how to sort a Pandas DataFrame by multiple columns, specifying the sorting order for each.
How do you create a MultiIndex in a Pandas DataFrame, and how does it affect data access?
Describe how to group data in a Pandas DataFrame using the groupby function and calculate the mean of each group.
Explain the different types of joins available in Pandas, including inner, outer, left, and right joins.
II. Answer Key
A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. The index is crucial because it provides a way to access, filter, and search data within the DataFrame, acting as a label for each row.
To read a CSV file, use pd.read_csv('file_path'). To avoid the Unicode error that backslashes in a file path can trigger, prefix the path string with r (e.g., pd.read_csv(r'file_path')) so that it is read as a raw string and the backslashes are not interpreted as escape characters.
Use pd.read_table('file_path', sep='delimiter') to read a text file into a DataFrame. The sep argument specifies the separator between columns in the text file (e.g., sep='\t' for tab-separated).
To specify column names when a CSV lacks headers, use pd.read_csv('file_path', header=None, names=['col1', 'col2', …]). This sets header=None to prevent Pandas from using the first row as headers and then assigns names using the names parameter.
To filter by column values, use boolean indexing: df[df['column_name'] > value]. This selects rows where the condition inside the brackets is True.
loc filters by label, using the actual index value (string, number, etc.) to select rows and columns. iloc filters by integer position, using the row and column number (starting from 0) to select data.
To sort by multiple columns, use df.sort_values(by=['col1', 'col2'], ascending=[True, False]). The by argument takes a list of column names, and ascending takes a list of boolean values specifying the sorting order for each column.
A MultiIndex is created using df.set_index(['col1', 'col2']), which produces a hierarchical index. Rows can then be selected by one or more index levels using .loc.
Use df.groupby('column_name').mean() to group data by a column and calculate the mean of each group. This groups rows with the same value in 'column_name' and computes the mean of each column per group. In recent Pandas versions, pass numeric_only=True when the DataFrame also contains non-numeric columns.
Inner: Returns rows with matching values in both DataFrames.
Outer: Returns all rows from both DataFrames, filling in missing values with NaN.
Left: Returns all rows from the left DataFrame and matching rows from the right, filling in missing values with NaN.
Right: Returns all rows from the right DataFrame and matching rows from the left, filling in missing values with NaN.
III. Essay Questions
Discuss the importance of data cleaning in the data analysis process, providing specific examples of cleaning techniques relevant to the source material.
Compare and contrast the different methods for filtering and sorting data in Pandas DataFrames, illustrating the use cases for each method.
Explain the concept of indexing in Pandas and how MultiIndexing can be used to organize and access complex datasets.
Describe how you can perform exploratory data analysis using Pandas and relevant libraries, and why it is important.
Explain the concept of joining in Pandas and how different types of joins can be used to combine related data from multiple sources.
IV. Glossary of Key Terms
DataFrame: A two-dimensional labeled data structure in Pandas, similar to a table, with columns of potentially different types.
Series: A one-dimensional labeled array in Pandas, capable of holding any data type.
Index: A label for each row in a Pandas DataFrame or Series, used for data alignment and selection.
MultiIndex: A hierarchical index in Pandas, allowing multiple levels of indexing on a DataFrame.
NaN (Not a Number): A standard missing data marker used in Pandas.
Filtering: Selecting a subset of rows from a DataFrame based on specified conditions.
Sorting: Arranging rows in a DataFrame in a specific order based on the values in one or more columns.
Grouping: Aggregating data in a DataFrame based on the values in one or more columns.
Joining: Combining data from two or more DataFrames based on a common column or index.
Inner Join: Returns rows with matching values in both DataFrames.
Outer Join: Returns all rows from both DataFrames, filling in missing values with NaN.
Left Join: Returns all rows from the left DataFrame and matching rows from the right, filling in missing values with NaN.
Right Join: Returns all rows from the right DataFrame and matching rows from the left, filling in missing values with NaN.
Concatenation: Appending or merging DataFrames together, either horizontally or vertically.
Aggregation: Computing summary statistics (e.g., mean, sum, count) for groups of data.
Exploratory Data Analysis (EDA): An approach to analyzing data sets to summarize their main characteristics, often with visual methods.
Unicode Error: An error that occurs when reading a file with characters that are not properly encoded.
loc: A Pandas indexer used to access rows and columns by label.
iloc: A Pandas indexer used to access rows and columns by integer position.
Lambda Function: A small anonymous function defined using the lambda keyword.
Heatmap: Data visualization that uses a color-coded matrix to represent the correlation between variables.
Box Plot: A graphical representation of the distribution of data showing the minimum, first quartile, median, third quartile, and maximum values, as well as outliers.
Pandas Python Data Analysis Tutorial Series
This briefing document summarizes the main themes and ideas from the provided text excerpts, which appear to be transcripts of a series of video tutorials on using the Pandas library in Python for data analysis.
Briefing Document: Pandas Tutorial Series Overview
Main Theme:
This series of tutorials focuses on teaching users how to leverage the Pandas library in Python for various data manipulation, analysis, and visualization tasks. The content covers a range of essential Pandas functionalities, from basic data input and output to more advanced techniques like filtering, grouping, data cleaning, and exploratory data analysis.
Key Ideas and Concepts:
Introduction to Pandas and DataFrames:
Pandas is imported using the alias pd: “we are going to say import and we’re going to say pandas now this will import the Panda’s library but it’s pretty common place to give it an alias and as a standard when using pandas people will say as PD”
Data is stored and manipulated within Pandas DataFrames.
DataFrames have an index, which is important for filtering and searching: “as you can see right here there’s this index and that’s really important in a data frame it’s really what makes a data frame a data frame and we use index a lot in pandas we’re able to filter on the index search on the index and a lot of other things”
The distinction between a Series and a DataFrame is mentioned, suggesting that this will be covered in more detail in a later video.
Data Input/Output:
Pandas can read data from various file formats, including CSV, text, JSON, and Excel.
The pd.read_csv(), pd.read_table(), pd.read_json(), and pd.read_excel() functions are used to import data.
Specifying the file path is crucial. The tutorial demonstrates how to copy the file path: “you have this countries of the world CSV you just need to click on it and right click and copy as path and that’s literally going to copy that file path for us so you don’t have to type it out manually”
The r prefix (e.g., r'file_path') is placed before a file path string so that Python reads it as a raw string and does not interpret backslashes as escape characters.
The sep parameter allows specifying delimiters for text files: “we need to use a separator and I’ll show you in just a little bit how we can do this in a different way but with that read CSV this is how we can do it we’ll just say sep is equal to we need to do backslash t now let’s try running this and as you can see it now has it broken out into country and region”
Headers can be specified or skipped during import using the header parameter.
Column names can be manually assigned using the names parameter when the file doesn’t contain headers or when renaming is desired.
Imported DataFrames should typically be assigned to a variable (e.g., df) for later use.
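The import options above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the in-memory text and the column names Country and Region are assumptions standing in for a real tab-separated file without a header row.

```python
import io
import pandas as pd

# Hypothetical tab-separated data with no header row (stands in for a file on disk).
raw_text = "Albania\tEurope\nBrazil\tSouth America\nJapan\tAsia\n"

# sep="\t" splits on tabs; header=None keeps the first row as data;
# names supplies the column labels.
df = pd.read_csv(
    io.StringIO(raw_text),
    sep="\t",
    header=None,
    names=["Country", "Region"],
)
print(df)
```

With a real file, the io.StringIO(...) argument would be replaced by the file path.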
Data Inspection:
df.info() provides a summary of the DataFrame, including column names, data types, and non-null counts: “we’re going to bring data Frame 2 right down here and we want to take a look at some of this data we want to know a little bit more about it something that you can do is data frame 2. info and we’ll do an open parenthesis and when we run this it’s going to give us a really quick breakdown of a little bit of our data”
df.shape returns the number of rows and columns in a DataFrame.
df.head(n) displays the first n rows of the DataFrame.
df.tail(n) displays the last n rows of the DataFrame.
Specific columns can be accessed using bracket notation (e.g., df['ColumnName']).
loc and iloc are used for accessing data by label (location) and integer position, respectively.
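A short sketch of these inspection calls on a small, made-up DataFrame. The country codes and population figures are illustrative values only, not data from the tutorial.

```python
import pandas as pd

# Hypothetical data with a string index so that loc and iloc differ visibly.
df = pd.DataFrame(
    {
        "Country": ["Albania", "Brazil", "Japan", "Kenya"],
        "Population": [2.8, 214.3, 125.7, 53.0],
    },
    index=["AL", "BR", "JP", "KE"],
)

df.info()                        # prints column names, dtypes, non-null counts
n_rows, n_cols = df.shape        # tuple of (rows, columns)
first_two = df.head(2)           # first 2 rows
last_one = df.tail(1)            # last row
populations = df["Population"]   # a single column comes back as a Series

by_label = df.loc["JP"]          # row whose index label is "JP"
by_position = df.iloc[2]         # third row by integer position (the same row here)
```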
Filtering and Ordering:
DataFrames can be filtered based on column values using comparison operators (e.g., df['Rank'] < 10).
The isin() function allows filtering based on a list of specific values within a column.
The str.contains() function allows filtering for rows where a column contains a specific string.
The filter() function can be used to select columns based on a list of items or to filter rows based on index values using the like parameter.
sort_values() is used to order DataFrames by one or more columns. Ascending or descending order can be specified.
Multiple sorting criteria can be specified by passing a list of column names to sort_values().
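The filtering options listed above might look like this in practice. The DataFrame is a hypothetical stand-in for the tutorial's dataset; only the Rank and country examples echo the source.

```python
import pandas as pd

# Hypothetical data; column names are illustrative.
df = pd.DataFrame({
    "Country": ["United States", "Brazil", "United Kingdom", "Bangladesh"],
    "Rank": [3, 7, 21, 8],
})

top_ten = df[df["Rank"] < 10]                                 # comparison operator
selected = df[df["Country"].isin(["Bangladesh", "Brazil"])]   # membership in a list
united = df[df["Country"].str.contains("United")]             # substring match

only_rank = df.filter(items=["Rank"])                         # keep listed columns
indexed = df.set_index("Country")
united_rows = indexed.filter(like="United", axis=0)           # match on index labels
```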
Indexing:
The index is an important component of a DataFrame and can be customized.
The set_index() function allows setting a column as the index. The parameter inplace=True applies the change to the existing DataFrame instead of returning a new one.
The reset_index() function reverts the index to the default integer index.
Multi-indexing allows for hierarchical indexing using multiple columns.
sort_index() sorts the DataFrame based on the index.
loc and iloc are used for accessing data based on the index. loc uses the string/label of the index, iloc uses the integer position.
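As a rough sketch of the indexing operations above, assuming a small made-up DataFrame (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe"],
    "Country": ["Japan", "India", "France"],
    "Population": [125.7, 1407.6, 67.8],
})

# Single-column index: look rows up by country name.
by_country = df.set_index("Country")
japan_pop = by_country.loc["Japan", "Population"]   # label-based lookup
first_pop = by_country.iloc[0, 1]                    # position-based: row 0, column 1

# reset_index() moves Country back to a column and restores 0, 1, 2, ...
restored = by_country.reset_index()

# Hierarchical (multi-level) index, sorted so that lookups are well defined.
multi = df.set_index(["Continent", "Country"]).sort_index()
asia = multi.loc["Asia"]                              # all rows under the outer level
india_pop = multi.loc[("Asia", "India"), "Population"]
```

This version reassigns the result rather than using inplace=True; both approaches appear in Pandas code.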
Grouping and Aggregating:
groupby() groups rows based on the unique values in one or more columns. This creates a GroupBy object.
Aggregate functions (e.g., mean(), count(), min(), max(), sum()) can be applied to GroupBy objects to calculate summary statistics for each group.
The agg() function allows applying multiple aggregate functions to one or more columns simultaneously using a dictionary to specify the functions for each column.
Grouping can be performed on multiple columns to create more granular groupings.
The describe() function is a shortcut that reports several summary statistics (such as count, mean, minimum, and maximum) at once.
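A minimal grouping sketch. The base_flavor column name appears in the source; the liked and rating columns and all values are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "base_flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate"],
    "liked": ["Yes", "Yes", "No", "Yes"],
    "rating": [8.0, 9.0, 6.0, 7.0],
})

grouped = df.groupby("base_flavor")          # a GroupBy object, nothing computed yet
mean_rating = grouped["rating"].mean()       # one value per flavor
counts = grouped["rating"].count()

# Grouping on two columns gives one result per (flavor, liked) combination.
by_two = df.groupby(["base_flavor", "liked"])["rating"].max()

# describe() reports count, mean, std, min, quartiles, and max per group.
summary = grouped["rating"].describe()
```

Selecting the rating column before aggregating avoids errors from the non-numeric liked column.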
Merging and Joining DataFrames:
merge() combines DataFrames based on shared columns or indices. It’s analogous to SQL joins.
Different types of joins (inner, outer, left, right, cross) can be performed using the how parameter.
Suffixes can be specified to differentiate columns with the same name in the merged DataFrame.
join() is another function for combining DataFrames, but it can be more complex to use than merge().
Cross joins create a Cartesian product of rows from both DataFrames.
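The join types can be compared on two small, hypothetical DataFrames that share an id column and both carry a score column. All names and values are illustrative, and how="cross" assumes Pandas 1.2 or later.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"], "score": [90, 80, 70]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Oslo", "Lima", "Pune"], "score": [5, 6, 7]})

# Inner join keeps only ids present in both; suffixes label the clashing score columns.
inner = left.merge(right, on="id", how="inner", suffixes=("_left", "_right"))

# Outer join keeps every id from either side, filling gaps with NaN.
outer = left.merge(right, on="id", how="outer")

# Left join keeps all rows of left; unmatched right-hand columns become NaN.
left_join = left.merge(right, on="id", how="left")

# Cross join pairs every row of left with every row of right (3 x 3 rows).
cross = left.merge(right, how="cross")
```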
Data Visualization:
Pandas integrates with Matplotlib for basic plotting.
The plot() function creates various types of plots, including line plots, bar plots, scatter plots, histograms, box plots, area plots, and pie charts, based on the kind parameter.
subplots=True creates separate subplots for each column.
Titles and labels can be added to plots using the title, xlabel, and ylabel parameters.
Bar plots can be stacked using stacked=True.
Scatter plots (kind='scatter') require specifying both x and y column names.
Histogram bins can be adjusted using the bins parameter.
Figure size can be adjusted to increase the visualization’s scale.
Matplotlib styles can be used to modify the appearance of plots.
Data Cleaning:
Data cleaning involves handling missing values, inconsistencies, and formatting issues.
str.strip() removes leading and trailing characters from strings. str.lstrip() removes leading characters, and str.rstrip() removes trailing characters.
str.replace() replaces specific substrings within strings.
Regular expressions can be used with str.replace() (pass regex=True) for more complex pattern matching. Inside a character class, a leading caret (^) matches any character except those listed (e.g., [^0-9] matches any non-digit).
apply() applies a function to each element of a column (often used with lambda functions).
Data types can be changed using astype().
fillna() fills missing values with a specified value.
pd.to_datetime() converts columns to datetime objects.
drop_duplicates() removes duplicate rows.
The inplace=True parameter modifies the DataFrame directly.
Columns can be split into multiple columns using str.split() with the expand=True parameter.
Boolean columns can be replaced with ‘yes’ and ‘no’ values to standardize responses.
isna() or isnull() identifies missing values.
drop() removes rows or columns based on labels. With reset_index(), the drop=True parameter discards the former index instead of keeping it as a column and creates a new default integer index.
dropna() removes rows with missing values.
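Several of the cleaning steps above can be chained on a small example. This is a sketch under stated assumptions: the customer-style records, column names, and formats are invented for illustration and are not the tutorial's dataset.

```python
import pandas as pd

# Hypothetical messy data: stray whitespace/underscores, mixed phone formats,
# a duplicate row, a missing value, dates stored as text, and a combined address.
df = pd.DataFrame({
    "name": ["  Ann ", "Bob__", "Bob__", "Cy"],
    "phone": ["123-456", "789 012", "789 012", None],
    "joined": ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-15"],
    "address": ["1 Main St, Oslo", "2 Oak Ave, Lima", "2 Oak Ave, Lima", "3 Elm Rd, Pune"],
})

# Remove the duplicate row, then discard the old index and renumber from 0.
df = df.drop_duplicates().reset_index(drop=True)

# Strip whitespace, then strip trailing underscores.
df["name"] = df["name"].str.strip().str.rstrip("_")

# [^0-9] matches any non-digit; replace those characters with nothing.
df["phone"] = df["phone"].str.replace(r"[^0-9]", "", regex=True)
df["phone"] = df["phone"].fillna("unknown")

# Convert text dates to datetime values.
df["joined"] = pd.to_datetime(df["joined"])

# Split one column into two, then drop the original.
df[["street", "city"]] = df["address"].str.split(", ", expand=True)
df = df.drop(columns=["address"])
```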
Exploratory Data Analysis (EDA):
EDA involves exploring the data to identify patterns, relationships, and outliers.
info() and describe() provide high-level summaries of the data.
The float display format can be adjusted via pd.set_option (e.g., the 'display.float_format' option).
isnull().sum() counts missing values in each column.
nunique() shows the number of unique values in each column.
sort_values() sorts the data based on specific columns.
corr() calculates the correlation matrix, showing the relationships between numeric columns.
Heatmaps (using Seaborn) visualize the correlation matrix.
Grouping (groupby()) and aggregation help understand data distributions and relationships across groups.
Transposing DataFrames (transpose()) can be useful for plotting group means.
Box plots visualize the distribution of data and identify outliers.
select_dtypes() filters columns based on data type.
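A compact sketch of the non-plotting EDA steps above. The DataFrame is hypothetical (column names and values are assumptions), and the heatmap and box plot steps are left out so that the example depends only on Pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["X", "X", "Y", "Y"],
    "pop_2020": [10.0, 20.0, 30.0, None],
    "pop_2022": [12.0, 24.0, 36.0, 48.0],
})

# Show floats with two decimals instead of long or scientific notation.
pd.set_option("display.float_format", lambda x: f"{x:.2f}")

missing = df.isnull().sum()            # missing values per column
unique_counts = df.nunique()           # distinct values per column

numeric = df.select_dtypes(include="number")   # keep only numeric columns
corr = numeric.corr()                          # pairwise correlation matrix

# Group means, then transpose so each group becomes a column (handy for plotting).
group_means = df.groupby("continent")[["pop_2020", "pop_2022"]].mean()
transposed = group_means.transpose()
```

The corr result is the matrix that a Seaborn heatmap would typically take as input.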
Target Audience:
The tutorial series is designed for individuals who want to learn data analysis and manipulation using Python and the Pandas library, regardless of their prior experience with data science.
Overall Impression:
The series appears to be a comprehensive introduction to Pandas, covering a wide range of essential topics in a practical, hands-on manner. The instructor emphasizes best practices, common pitfalls, and useful techniques for working with real-world datasets. The inclusion of practical examples and visual aids helps make the learning process more engaging and effective.
Pandas DataFrame: Common Operations and FAQs
Frequently Asked Questions About Pandas Based on Provided Sources
Here are some frequently asked questions (FAQs) about using the Python Pandas library, based on the provided text excerpts.
1. How do I import the Pandas library and what is the standard alias?
To import the Pandas library, you use the statement import pandas. It’s common practice to give it the alias pd, like this: import pandas as pd. This allows you to refer to Pandas functions and objects using the shorter pd. prefix, which is a widely accepted convention in the Pandas community.
2. How do I read different file types (CSV, text, JSON, Excel) into Pandas DataFrames?
Pandas provides specific functions for reading various file formats:
CSV: pd.read_csv("file_path.csv")
Text: pd.read_table("file_path.txt") (often requires specifying a separator, e.g., sep="\t" for tab-separated files)
JSON: pd.read_json("file_path.json")
Excel: pd.read_excel("file_path.xlsx") (can specify a sheet name using sheet_name="Sheet1")
You typically assign the result of these functions to a variable (e.g., df = pd.read_csv(…)) to create a DataFrame object, making it easier to work with the data later.
3. What is a Pandas DataFrame and why is the index important?
A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. Think of it as a table with rows and columns. The index is a crucial component of a DataFrame; it provides labels for the rows. The index allows you to filter, search, and select data based on these labels. By default, Pandas creates a numerical index (0, 1, 2, …), but you can set a specific column as the index for better data access.
4. How can I handle Unicode errors when reading files?
When the file path contains backslashes (as Windows paths do), you might encounter Unicode errors, because Python treats sequences such as \U in a normal string as escape characters. To resolve this, prepend r to the file path string to treat it as a raw string. For example: pd.read_csv(r"C:\path\to\file.csv"). This ensures that backslashes are interpreted literally and not as escape characters.
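The effect of the r prefix can be checked in plain Python, without reading any file. The path below is hypothetical.

```python
# A raw string keeps every backslash literally.
windows_path = r"C:\Users\name\data.csv"

# The same path written as a normal string needs each backslash doubled.
escaped_path = "C:\\Users\\name\\data.csv"

same = windows_path == escaped_path
print(windows_path, same)
```

Writing the path as a normal string with single backslashes would fail to compile here, since \U begins a Unicode escape sequence.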
5. How can I deal with files that don’t have column headers, or if I want to rename headers?
By default, Pandas infers column names from the first row of the file. You can override this behavior using the header argument. header=None tells Pandas that there are no existing headers, so the first row is treated as data. You can then specify custom column names using the names argument, passing it a list of strings representing the new column names.
6. How can I filter data within Pandas DataFrames?
You can filter rows in a DataFrame based on column values using comparison operators (>, <, ==, etc.) or functions:
Filtering by Column Value: df[df["column_name"] > 10] returns rows where the value in "column_name" is greater than 10.
Using isin(): df[df["country"].isin(["Bangladesh", "Brazil"])] returns rows where the "country" column contains either "Bangladesh" or "Brazil".
Using str.contains(): df[df["country"].str.contains("United")] returns rows where the "country" column contains the string "United".
7. How can I sort and order data within Pandas DataFrames?
Use the sort_values() method to sort a DataFrame by one or more columns. The by argument specifies the column(s) to sort by. ascending=True (default) sorts in ascending order, while ascending=False sorts in descending order. You can sort by multiple columns by providing a list to the by argument. The order of columns in this list determines the sorting priority. You can also specify different ascending/descending orders for different columns by providing a list of boolean values to the ascending argument.
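A small sketch of sorting by two columns with a different direction for each. The data and column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["Asia", "Europe", "Asia", "Europe"],
    "Country": ["Japan", "France", "India", "Spain"],
    "Score": [70, 85, 90, 60],
})

# Region ascending first; within each region, Score descending.
ordered = df.sort_values(by=["Region", "Score"], ascending=[True, False])
print(ordered)
```

sort_values() returns a new DataFrame, so the original df keeps its row order unless the result is assigned back.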
8. How can I perform groupby aggregations in Pandas?
The groupby() method groups rows based on unique values in one or more columns. You can then apply aggregate functions (e.g., mean(), count(), min(), max(), sum()) to the grouped data.
df.groupby("base_flavor").mean(numeric_only=True)  # Mean ratings grouped by base flavor
You can use the agg() method to apply multiple aggregations to different columns simultaneously. The argument to agg() is a dictionary where keys are column names and values are lists of aggregation functions:
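The source's own example for this step is not included in the text, so the following is only a sketch of the dictionary form. The base_flavor name comes from the line above; the two rating columns and all values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "base_flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate"],
    "flavor_rating": [8.0, 9.0, 6.0, 7.0],
    "texture_rating": [5.0, 8.0, 7.0, 6.0],
})

# Keys are column names; values are lists of aggregation function names.
result = df.groupby("base_flavor").agg({
    "flavor_rating": ["mean", "max"],
    "texture_rating": ["min", "count"],
})
print(result)
```

The result has two-level column labels such as ("flavor_rating", "mean"), one pair per column and function.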
The Pandas library in Python is a tool for data analysis, offering data structures like DataFrames and Series.
Key aspects of Pandas:
Alias: When importing the Pandas library, it is common to use the alias pd.
DataFrames: Pandas stores data in DataFrames, a tabular structure that is not part of standard Python. When files are imported using Pandas, the data is read in as a DataFrame. The index is an important component of a DataFrame, enabling filtering and searching. Assigning a DataFrame to the variable name df is a common practice.
Series: The source defers the explanation of Series to the next video in the series.
File Reading: Pandas can read various file types such as CSV, text, JSON, and Excel. The specific function used depends on the file type (e.g., read_csv, read_table, read_json, read_excel).
File Paths: File paths can be copied and pasted into the read function. To avoid Unicode errors caused by backslashes, the path may need to be written as a raw string (r prefix).
Arguments: When reading files, arguments can be specified, such as the file path or separator.
Display Options: Pandas allows you to adjust the display settings to show more rows and columns.
Data Inspection: You can use .info() to get a quick breakdown of the data, .shape to see the dimensions (rows, columns), .head() and .tail() to view the first or last few rows, and column names to select specific columns.
Filtering and Ordering: DataFrames can be filtered based on column values, specific values, or string content. The isin() function is available to check specific values. Data can be filtered by index using .filter(), .loc[], and .iloc[]. Data can be sorted using .sort_values() and .sort_index().
Indexing: The index is customizable and allows for searching and filtering. The index can be set using set_index(). Multi-level indexing is supported.
Group By: The groupby() function groups rows that share a value in a column, so that each group appears as a single row in the result. Aggregate functions can then be applied to those groupings. The agg() (aggregate) method accepts a dictionary that maps columns to aggregate functions.
Merging, Joining, and Concatenating: Pandas enables combining DataFrames through merging, joining, and concatenating.
Visualizations: Pandas allows you to build visualizations such as line plots, scatter plots, bar charts, and histograms.
Cleaning Data: Pandas is equipped with tools for data cleaning, including removing duplicates (drop_duplicates), dropping unnecessary columns (drop), and handling inconsistencies in data. The .fillna() function fills empty values.
Exploratory Data Analysis (EDA): Pandas is used for exploratory data analysis, which involves identifying patterns, understanding relationships, and detecting outliers in a dataset. EDA includes using .info() and .describe() to get a high-level overview of the data. Correlations between columns can be identified using .corr() and visualized with heatmaps.
Pandas DataFrames: Features, Functionalities, and Data Analysis
Pandas DataFrames are a central data structure in the Pandas library, crucial for data analysis in Python.
Key features and functionalities of DataFrames:
Definition A DataFrame is the tabular, indexed structure that Pandas reads data into, rather than the plain string you would get from reading a file with standard Python.
Usual variable name Assigning a DataFrame to the variable name df is a common practice.
Indexing The index is a customizable and important component, enabling filtering and searching. The index can be set using set_index().
Filtering and Ordering DataFrames can be filtered based on column values, specific values using isin(), or string content. Data can be filtered by index using .filter(), .loc[], and .iloc[]. Data can be sorted using .sort_values() and .sort_index().
Display Options Pandas allows adjusting display settings to show more rows and columns.
Data Inspection Tools like .info() provide a breakdown of the data. The .shape shows dimensions. Methods such as .head() and .tail() allow viewing the first or last few rows.
Merging, Joining, and Concatenating Pandas enables combining DataFrames through merging, joining, and concatenating.
Cleaning Data Pandas is equipped with tools for data cleaning, including removing duplicates (drop_duplicates), dropping unnecessary columns (drop), and handling inconsistencies in data. The .fillna() function fills empty values.
Exploratory Data Analysis Pandas is used for exploratory data analysis, including using .info() and .describe() to get a high-level overview of the data. Correlations between columns can be identified using .corr() and visualized with heatmaps.
File Reading When reading files using Pandas, the data is called in as a data frame.
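A short sketch of the display, inspection, and cleaning items listed above. The DataFrame is made up for illustration, and the row and column limits simply mirror the kind of values used in the video.

```python
import pandas as pd

# Show more rows and columns before pandas truncates the output.
pd.set_option("display.max_rows", 235)
pd.set_option("display.max_columns", 40)

# Hypothetical data with one duplicated row, one missing value,
# and one column we do not need.
df = pd.DataFrame({
    "Country": ["India", "India", "Japan", "Kenya"],
    "Population": [1400.0, 1400.0, None, 54.0],
    "Notes": ["", "", "", ""],
})

print(df.shape)    # (rows, columns); an attribute, so no parentheses
df.info()          # column names, non-null counts, dtypes, memory usage
print(df.head(2))  # first two rows
print(df.tail(2))  # last two rows

cleaned = (
    df.drop_duplicates()            # remove fully duplicated rows
      .drop(columns=["Notes"])      # drop an unnecessary column
      .fillna({"Population": 0})    # fill missing values in one column
)
print(cleaned)
```

Each cleaning method returns a new DataFrame by default, so the original df is left unchanged unless the result is assigned back.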
Pandas: Data Import Guide
Pandas can import data from a variety of file types. When the files are imported using Pandas, the data is read in as a data frame. The specific function used depends on the file type.
Types of files that Pandas can read:
CSV
Text
JSON
Excel
Functions for reading different file types:
read_csv
read_table
read_json
read_excel
Key considerations when importing files:
File Paths The file path needs to be specified, and can be copied and pasted into the read function.
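Raw Text Reading On Windows, the backslashes in a pasted path can trigger a Unicode error because Python treats them as escape sequences. To avoid this, put r immediately before the opening quote of the path so it is read as a raw string.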
Arguments When reading files, arguments can be specified, such as the file path or separator.
Alias When importing the Pandas library, it is common to use the alias pd (import pandas as pd).
Headers The header argument controls which row supplies the column names. By default they are inferred from the first row (equivalent to header=0). Set header=None if the file has no header row, in which case integer column labels are created. To supply your own column names, pass a list to the names argument.
Separator When reading in a file, you can specify the separator with sep. With read_csv the default separator is a comma. When importing a text file with read_csv, you may need to specify the separator, for example sep='\t' for tab-separated data; read_table assumes tabs by default.
Missing Data When merging data, if a value doesn’t have a match, it will return NaN.
Sheet names When importing Excel files, you can specify a sheet name to read in a specific sheet, otherwise it will default to the first sheet in the file.
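The import considerations above can be sketched as below. To keep the example self-contained it reads from in-memory text with io.StringIO instead of files on disk; the sample contents are assumptions for illustration. The file paths and sheet name in the trailing comments are hypothetical.

```python
import io
import pandas as pd

csv_text = "Country,Region\nAlbania,Europe\nBrazil,South America\n"
tsv_text = csv_text.replace(",", "\t")

# Default behavior: comma separator, first row used as the header.
df_csv = pd.read_csv(io.StringIO(csv_text))

# header=None treats every row as data and creates integer column labels.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# For a file without a header row, names= supplies the column names.
df_named = pd.read_csv(
    io.StringIO("Albania,Europe\nBrazil,South America\n"),
    header=None,
    names=["Country", "Region"],
)

# Tab-separated text: either tell read_csv the separator or use read_table.
df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")
df_table = pd.read_table(io.StringIO(tsv_text))

# JSON records are read with read_json.
df_json = pd.read_json(io.StringIO('[{"Country": "Albania", "Region": "Europe"}]'))

# With real files on Windows, a raw string avoids backslash escapes
# (hypothetical paths and sheet name):
#   pd.read_csv(r"C:\data\countries.csv")
#   pd.read_excel(r"C:\data\population.xlsx", sheet_name="Sheet1")
```

Every reader returns a DataFrame, so the inspection and filtering methods work the same way regardless of the original file type.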
Filtering Pandas DataFrames
Pandas DataFrames can be filtered in a variety of ways.
Filtering Based on Column Values
You can filter DataFrames based on the data within their columns. To do this, specify the column to filter on. Comparison operators, such as greater than or less than, can be used.
To match an exact set of values, use isin(); to match part of a string, use .str.contains().
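A minimal sketch of column-based filtering, using a small made-up DataFrame (the ranks are illustrative, not real data):

```python
import pandas as pd

df = pd.DataFrame({
    "Rank": [1, 3, 8, 21, 28],
    "Country": ["India", "United States", "Bangladesh", "United Kingdom", "Brazil"],
})

# Comparison operator: keep rows where Rank is 10 or lower.
top_ten = df[df["Rank"] <= 10]

# isin(): keep rows whose Country is in a given list (similar to SQL's IN).
specific_countries = ["Bangladesh", "Brazil"]
picked = df[df["Country"].isin(specific_countries)]

# .str.contains(): keep rows whose Country contains a substring (similar to SQL's LIKE).
united = df[df["Country"].str.contains("United")]

print(top_ten, picked, united, sep="\n\n")
```

In each case the expression inside the outer brackets produces a boolean Series, and the DataFrame keeps only the rows where it is True.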
Filtering Based on Index
You can also filter by the index.
The main ways to filter by index are the .filter() function and the .loc[] and .iloc[] indexers.
The .filter() Function
With .filter() you can specify which columns to keep by using items = and then listing the columns.
For a DataFrame, .filter() works on the column labels by default (axis=1); pass axis=0 to filter on the row index instead.
You can also use like = to pass a string; .filter() then keeps the labels on the chosen axis that contain that string.
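A short sketch of the .filter() options described above, assuming a made-up DataFrame indexed by country (values are illustrative):

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "Continent": ["North America", "Europe", "Africa", "South America"],
        "CCA3": ["USA", "GBR", "ZWE", "BRA"],
        "Rank": [3, 21, 74, 7],
    },
    index=pd.Index(
        ["United States", "United Kingdom", "Zimbabwe", "Brazil"], name="Country"
    ),
)

# items= on the default axis keeps only the listed columns.
two_columns = df2.filter(items=["Continent", "CCA3"])

# axis=0 applies the same idea to the row index.
zimbabwe = df2.filter(items=["Zimbabwe"], axis=0)

# like= keeps index labels that contain the given string.
united = df2.filter(like="United", axis=0)

print(two_columns, zimbabwe, united, sep="\n\n")
```

Note that .filter() looks only at labels, never at the values stored in the cells.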
The .loc[] and .iloc[] Indexers
.loc[] selects by label, that is, the actual name or value in the index.
.iloc[] selects by integer position, regardless of what the index labels are.
With multi-indexing, .loc[] can select by the index labels at each level, whereas .iloc[] still goes by the underlying integer position.
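The difference between the two indexers can be sketched as follows. The rows are illustrative only, and the two-level index at the end is an assumed example of multi-indexing rather than the one built in the video.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Brazil", "United States", "Japan"],
    "Continent": ["South America", "North America", "Asia"],
    "Rank": [7, 3, 11],
})

df2 = df.set_index("Country")

by_label = df2.loc["United States"]  # row whose index label is "United States"
by_position = df2.iloc[1]            # row at integer position 1 (the second row)

# With a two-level index, .loc can select on the outer level by label,
# while .iloc still counts rows from 0.
multi = df.set_index(["Continent", "Country"]).sort_index()
asia_rows = multi.loc["Asia"]
first_row = multi.iloc[0]

print(by_label, by_position, asia_rows, first_row, sep="\n\n")
```

Here by_label and by_position return the same row, because "United States" happens to sit at position 1 in this frame.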
Pandas DataFrame Sorting: Values and Index
Pandas DataFrames can be ordered using the .sort_values() and .sort_index() functions.
Sorting by Values (.sort_values())
The .sort_values() function allows you to sort a DataFrame based on the values in one or more columns.
Specify the column(s) to sort by using the by parameter.
Determine the sorting order using the ascending parameter, which can be set to True (ascending) or False (descending). The default is ascending.
Multiple columns can be specified for sorting by passing a list of column names to the by parameter. The order of importance in sorting is determined by the order of columns in the list.
You can specify different ascending/descending orders for each column when sorting by multiple columns by passing a list of boolean values to the ascending parameter.
Example: To sort a DataFrame by the 'Rank' column in ascending order: df.sort_values(by='Rank', ascending=True).
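Sorting by several columns with a different direction for each can be sketched like this, again on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": ["Asia", "Europe", "Asia", "Europe"],
    "Country": ["Japan", "Spain", "India", "France"],
    "Rank": [11, 30, 1, 23],
})

# Single column, descending.
by_rank = df.sort_values(by="Rank", ascending=False)

# Two columns: Continent descending first, then Country ascending within each continent.
mixed = df.sort_values(by=["Continent", "Country"], ascending=[False, True])

print(by_rank, mixed, sep="\n\n")
```

The list passed to ascending must have one boolean per column in by, in the same order.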
Sorting by Index (.sort_index())
The .sort_index() function sorts the DataFrame based on its index.
You can specify the axis to sort on and whether the order is ascending or not.
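A brief sketch of .sort_index() on both axes, assuming a made-up DataFrame indexed by country:

```python
import pandas as pd

df2 = pd.DataFrame(
    {"Rank": [30, 1, 11], "Continent": ["Europe", "Asia", "Asia"]},
    index=pd.Index(["Spain", "India", "Japan"], name="Country"),
)

rows_sorted = df2.sort_index()               # row labels, A to Z
rows_desc = df2.sort_index(ascending=False)  # row labels, Z to A
cols_sorted = df2.sort_index(axis=1)         # column labels, A to Z

print(rows_sorted, rows_desc, cols_sorted, sep="\n\n")
```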
Learn Pandas in Under 3 Hours | Filtering, Joins, Indexing, Data Cleaning, Visualizations
The Original Text
what’s going on everybody welcome back to another video today we are going to be learning pandas in under 3 [Music] hours so in this lesson we’re going to cover a ton of things as well as some projects at the very end you’re going to learn how you can read data into pandas and actually store it in a data frame we’ll be filtering quering grouping and a ton of other things just on that data and then we’ll be diving into Data visualization data cleaning exploratory data analysis and a ton more so without further Ado letun them on my screen and get started so the first thing that we need to do is import our pandas Library so we’re going to say import and we’re going to say pandas now this will import the Panda’s library but it’s pretty common place to give it an alias and as a standard when using pandas people will say as PD so this is just a quick Alias that you can use uh that’s what I always use and I’ve always used it because that’s how I learned it and I want to teach it to you the right way so that’s how we’re going to do it in this video so let’s hit shift enter now that that is imported we can start reading in our files now right down here I’m going to open up my file explorer and we have several different types of files in here we have CSV files text files Json files and an Excel worksheet which is a little bit different than a CSV so we’re going to import all of those I’m going to show you how to import it as well as some of the different things that you need to be aware of when you’re importing so we’re going to import some of those different file types and I’ll show you how to do that within pandas so the first thing that we need to say is PD Dot and let’s read it in a CSV because that’s a pretty common one we’ll say read CSV and this is literally all you have to write in order to call that in now it’s not going to call it in as a string like it would in one of our previous videos if you’re just using the regular operating system of python when you’re using 
pandas it calls it in as a data frame and I’ll talk about some of the nuances of that so let’s go down to our file explorer we have this of the world CSV you just need to click on it and right click and copy as path and that’s literally going to copy that file path for us so you don’t have to type it out manually you can if you’d like and we’re just going to paste it in between these parentheses now if we run it right now it will not work I’ll do that for you it’s saying we have this Unicode error uh basically what’s happening is is it’s reading in these backslashes and this colon and all those backslashes in there and this period at the end what we need to do is read this in as a raw text so we’re just going to say R and now it’s going to read this as a literal string or a literal value and not as you know with all these backslashes which does make a big difference when we run this it’s going to populate our very first data frame so let’s go ahead and run it and now we have this CSV in here with our country and our region now if we go and pull up this file and let’s do that really quickly let’s bring up this countries of the world it automatically populated those headers for us in the data frame but we don’t have any column for those 0 1 2 3 so if we go back as you can see right here there’s this index and that’s really important in a data frame it’s really what makes a data frame a data frame and we use index a lot in pandas we’re able to filter on the index search on the index and a lot of other things which I’ll show you in future videos but this is basically how you read in a file now if we go right up here in between these parentheses and we hit shift tab this is going to come up for us let’s hit this plus button and what this is is these are all the arguments or all the things that we can specify when we’re reading in a file and there are a lot of different options so let’s go ahead and take a look really quickly
the first thing is obviously the file path we can specify a separator which there is no default so when we’re pulling in the CSV when we’re reading in the CSV it’s automatically going to assume it’s a comma because it’s a comma separated uh file you can choose delimers headers names index columns and a lot of other things as you can see right here now I will say that I don’t use almost any of these uh the few that I’m going to show you really quickly in just a second are up the very top but you can do a ton of different things and I’m just going to slowly go through them so that’s what those are you can also go down here this is our doc string and you can see exactly how these parameters work it’ll show you and give you a text and walk you through how to do this again most of these you’ll probably never use but things like a separator could actually be useful and things like a header could be useful because it is possible that you want to either rename your headers or you don’t have a header in your CSV and you don’t want it to autop populate that header so that is something that you can specify so for example this header one and I’ll show you how to do this uh the default behaviors to infer that there are column names if no names are passed this behavior is identical to header equals zero so it’s saying that first row or that first index which it’s like right here that zero is going to be read in as a header but we can come right over here and we’ll do comma header is equal to and we can say none and as you can see there are no headers now instead it’s 
another index so we have indexes on both the x- axis and the Y AIS and so right now we have this zero and one index indicating the First Column and the second column if we want to specify those names we can say the header equals none then we can say names is equal to and we’ll give it a list and so the first one was country and what’s that second one oh region so right here that’s the first um the first row but we’ll rename it and we’ll just say country and region and when we run that we’ve now populated the country and the region uh we’re just pretending that our CSV does not have these values in it and we have to name it ourselves that’s how you do it but let’s get rid of all that because we actually do want those in there so we’re just going to get rid of those and read it in as normal and there we go now typically when you’re reading in a file what you need to do is you want to assign that to a variable almost always when you see any tutorial or anybody online or even when you’re actually working people will say DF is equal to DF stands for data frame again this is a data frame in the next video in the series I’m going to walk through what a series is as well as what a data frame is because that’s pretty important to know when you’re working with these data frames but we’ll assign it to this value and then we’ll say we’ll call it by saying DF and we’ll run it and that’s typically how you’ll do things because you want to save this data frame so later on you can do things like dataframe Dot and you can uh you know pass in different modules but you can’t really do that it’s not as easy to do it if you’re calling this entire CSV and importing it every time so let’s copy this because now we’re going to import a different type of file so now we’ve been doing read CSV but we can also import text files now you can do that with the read CSV we can import text files let’s look at this one we have the same one it’s countries of the world except now it’s a text file 
because I just converted it for this video I’ll copy that as a path and so now when we do this oops let me get those quotes in there it’ll say world. txt it will still work as you can see this did not import properly um we have this country back SLT region and then all of our values are the exact same with this back SLT that’s because we need to use a separator and I’ll show you in just a little bit how we can do this in a different way but with that read CSV this is how we can do it we’ll just say sep is equal to we need to do back SLT now let’s try running this and as you can see it now has it broken out into country and region we could also do it the more proper way and this is the way you should do it and I’ll get rid of these really quickly but just want to keep them there in case you want to see that but you can also do read table and let’s get rid of this separator and now we have no separators just reading it in as a table let’s run this and it reads it in properly the first time this read table can be used for tons of different data types but typically I’ve been using it for like text files um we can also read in that CSV so let’s change this right here to CSV we can read it in as a CSV but just like we did in the last one when we read in the text file using read CSV this read table to you’re going to need to specify the separator so I’ll just copy this and we’ll say comma and now it reads it in properly again you can use that for a ton of different file types but you just need to specify a few more things if you don’t want to use the more specific read uncore function when you’re using pandas now let’s copy this again we’re going to go right down here and now let’s do Json files Json files usually hold semi-structured data um which is definitely different than and very structured data like a CSV where has columns and rows so let’s go to our file explorer we have this Json sample we will copy this in as path let’s paste it right here and we’ll do reor Json 
again these different functions were built out specifically for these file types that’s why you know each one has a different name so now we’re reading this in as the Json let’s read it in and it it in properly now let’s go ahead and copy this and take a look at Excel files cuz Excel files are a little bit different than other ones that we’ve looked at um so let’s just do read uncore cell and let’s go down to our file explorer and let’s actually open up this workbook as you can see we have sheet one right here but we also have this world population which has a lot more data let’s say we just wanted to read in sheet one one we can do that or by default it’s going to read in this world population because it’s the first sheet in the Excel file well let’s go ahead and take a look at that let’s get out of here and let’s say oops I forgot to copy the file path let’s go ahead and copy as path and we’ll put it right here and let’s just read it in with no arguments or anything in there or no parameters when we read it in it’s reading in that very first sheet so this is the one that has all of the data now let’s say we wanted to read in that extra sheet name or the second sheet name we’ll just go comma sheet unor name say is equal to and then we can specify sheet was it sheet one like this yes it was so we just had to specify the sheet name right here and then it brought in that sheet instead of the default which is the very first sheet in that Excel now that definitely covers a lot of how you read in those files again you can come in here and hit shift Tab and this plus sign and take a look at all the documentation and you can specify a lot of different things things that I didn’t think were very important for you guys to know especially if you’re just starting out the ones that we looked at today are what I would say are like the ones that I use almost all the time so I wanted to show you those but if you’re interested in any of these other ones or you have very unique 
data and you need to do that um you know it’s worth really getting in here and figuring things out a few other things that I wanted to show you just in this kind of first video or this intro video on how to read in files um one thing that you may have noticed especially in this file right here is we’re only looking at the first five and then the last five so if we wanted to see all the data all the data is in these like little three dots right here right we want to be able to see that data but right now we can’t and that’s because of some settings that are already within pandas and all we need to do is change that so this one has 234 rows and four columns so obviously we can see all the columns well let’s just change the rows all we’ll say is pd. setor option now what we need to do is we’re going to change the rows we’re not going to change the columns at least not on this one so we’ll say quote display. max. rows now if we just run this for whatever data we bring in it’s going to be able to show the max rows and then we’ll say 235 although this 34 rows I’m just going to be safe let’s run this and now it has changed it so let’s read in this file again and you’ll see how it’s changed now we have all the numbers and we have this little bar on the right that allows us to go down all the way to the bottom and all the way to the top so now we can actually look and kind of skim and see our values I like that better than just having that you know shorter version um we can do the exact same thing on columns as well so if we look at this one this is our Json file has the same thing right here we have what was it 38 columns but we can only see I think it’s it’s 20 or something like that I can’t remember um but we have 38 we can only see like let’s say 15 of them or 20 of them we’ll do the exact same thing and we’ll just say pd. set options. max. 
columns and we’ll set that to 40 for that one when we run this oops let’s get over here when we run this one again we can now scroll over and see every single one of our columns now that one is a in my opinion a lot more useful I like being able to see every single column so definitely something that you should be using especially when you have these really large files you want to be able to see a lot of the data and a lot of the columns so when you’re slicing and dicing and doing all the things we’re about to learn in this Panda series you know you know what you’re looking at I also want to show you just how to kind of look at your data in these data frames as well so that’s also pretty important so let’s go right down here and the very last one that we imported was this one right here this read Excel so this data frame is the only one that’s going to read in let’s run it um this is the last one to be run so this variable right here DF uh it won’t be applied to all these other ones um which we can always go back and change those typically you’ll do something like data frame two you want to do something like that um so let’s keep data Frame 2 oops so what we’re going to do is we’re going to bring data Frame 2 right down here and we want to take a look at some of this data we want to know a little bit more about it something that you can do is data frame 2. 
info and we’ll do an open parenthesis and when we run this it’s going to give us a really quick breakdown of a little bit of our data so we have our columns right here rank CCA 3 country and capital it’s saying we have 234 values in those columns because there’s 234 scroll up here because there’s 234 uh rows that tells me that there’s no missing data in here at least not you know completely missing like null values there is something in each of those rows the count tells me it’s non null so there’s no null values and it tells me the data type so it’s ringing in as an integer an object an object and an object and it also tells us how much memory it’s using which is also pretty neat because when you get really really large data types memory usage and and knowing how to work around that stuff does become more important than when you’re working at these really small You Know sample sizes that we’re looking at we can also do oops let me get rid of that can also do data frame two and we’ll do shape and for this one we do not need the parentheses and all this is going to tell us is we have 234 rows and four columns we’re also able to look at uh the first few values or rows in each of these data frames so we can just say dataframe 2. 
head and if we do that it’s going to give us the first five values but we can specify how many we want we can say head 10 it’ll give us the first 10 rows right here we can do the exact same thing and let’s go right down here and we’ll say tail so they’ll give us the last 10 rows within our data frame now let’s copy this and let’s say we don’t want to actually look at all of these values or all these columns we can specify that by saying df2 and oops let’s get rid of all of this and we’ll say with a quote we’ll say Rank and now we can take just a look at the rank data now we can’t do that by doing the index or at least not like this if we want to use this index that is right here we can but there’s a very special function called L and I look for that and I’m going to have an entire video on this because it does get a little bit more complex but there’s df2 looc and there’s Lo and IO stands for location and I location that’s only for the indexes whether it’s the x axis or the Y AIS those are the indexes and for location it’s looking for the actual text the actual string of the index so if we come up here that data Frame 2 we can specify 224 and it’ll give us this information right here in a little different format so let’s go bracket and we’ll say 224 and when we run this it gives us our rank CCA country capital with our values over here kind of like a dictionary almost now let’s copy this and we’ll say df2 do IO and right now these look the exact same but we haven’t really talked a lot about changing the index and you can change the index to a string or a different column or something like that and we’ll look at that in future videos the iock looks at the integer location so even if these um let’s go right up here even if this index had changed to let’s say this rank or this CCA three or country or whatever you make this index the ILO will still look at the integer location so that 224 would still be 224 even if it was usbekistan so then when we look at this it’s 
going to be the exact same but if we had changed that Index this Lo is the one that we could search on and we could search whoan is that you spell us beckan hey I nailed it so that is how you use Lo IO again I just wanted to show you a little bit about how you can look at your data frame or search within your data frame hello everybody today we’re going to be looking at filtering and ordering data frames in pandas there are a lot of different ways you can filter and order your data in pandas and I’m going to try to show you all of the main ways that you can do that so let’s kick it off by importing our data set so we’re going to say data frame is equal to and we’ll say pandas and I need to import my andas so we’ll say import andas as PD that’s pretty important I think um so pd. read CSV and we’ll do R and then we’ll say the world population CSV so let’s run this all our data frame right here and this is the data frame that we’re going to be filtering through and ordering in pandas so let’s kick it off the first thing that we can do is filter based off of The Columns so the data within our columns so Asia Europe Africa or whatever data we may have in that column so let’s go right down here we’re going to say DF and then within it we’re going to specify what column we’re going to be filtering on so we’re going to say DF with another bracket and we’ll say rank so we’re going to be looking at this rank column right here and then we’ll say in that rank column we want to do greater than 10 and that’s actually going to be a lot of them let’s do less than so when we run this it’s only going to return these values that are less than 10 we can also do less than or equal to you know all of these um comparison operators so less than or equal to so now we have all of the ranks 1 through 10 now if we look at these countries we can specify by specific values almost exactly like we did here but instead of doing a comparison operator like we did right here and including those names 
let’s say Bangladesh and Brazil we can use the is in function almost like an in function in SQL if you know SQL so let’s go right down here and we’re going to say specific underscore countries so right now we’re just going to make a list of the countries that we want and then we’ll say Bangladesh and Brazil so let’s go right down here and we’ll say okay for these specific countries from the data frame let’s do our bracket we’ll say in this country column so we’ll do data frame and then another bracket for Country so in this country column we can do do is in and then an open parenthesis and then look for our specific countries so we’re looking at just this column and we’re saying is in so we’re looking at are these values within this column and we’re getting this error and this looks very very odd let me um this doesn’t look right there we go I just had some syntax errors I apologize made it way more complicated than it needs to be but here’s how you use this is in function so we’re looking at Bangladesh and Brazil and we return those rows with Bangladesh and Brazil we can also do a contains function kind of similar to is in in except it’s more like the like in SQL as well I’m comparing a lot of this to SQL cuz when you’re filtering things I always my brain always goes to SQL but in pandas it’s called the contains so let’s do let’s actually copy this because I don’t want to make the same mistake again let’s do that and we’ll do the bracket but instead of dot is in we’re going to do string do contains and then an 
open parenthesis so we’re going to be looking for a string if it contain if it contains let’s do United almost like United States or or any other United so let’s run this and as you can see we have United Arab Emirates United Kingdom United States United States Virgin Islands so we can kind of search for a specific string or a number or a value within our data or within that column of country now so far we’ve only been looking at how you can filter on these columns we can also filter based off of the index as well and there’s two different ways you can do it or two of the main ways there’s filter and then there’s Lo and IO Lo stands for location and I look stands for integer location and if you’ve seen other previous videos I’ve kind of mentioned those so we can take a quick look at all of those so really quickly we need to set an index because the index right now is uh not the best we’ll set our index to Country so let’s say df2 is equal to D DF do setor index and we’ll say country I’m just doing df2 because later on I want to use that data frame again so I’m just going to assign it to another data frame so that we can just easily switch back and forth so now we have this index as the country and what we can do is use the filter function so let’s go down here we’ll say df2 filter and we’ll do an open parenthesis and now we can specify our items so these these are actually going to be specifying which columns we want to keep so we’re going to say items is equal to then we’ll make a list we’ll say continent hope that’s how we spell continent I’m always messing up with my uh my stuff here my spelling then we’ll do CCA 3 because why not you can specify whichever ones you want when we run this it’s going to only bring in those two columns Now by default it’s choosing the axis for us but we can also specify which axis we want to search on so if we say axis is equal to zero it’s actually going to search this axis this is the zero axis this is the one axis so where our 
columns are is one so if we go back and do one we’re searching on that one Axis or those header accesses again and this is the default but you can specify that so if you just want to search on uh you know filtering right here you can do that and let’s actually copy this and do that right down here just so you can see what it looks like but let’s let’s search for Zimbabwe and we’ll do Zimbabwe and we’ll be looking at the zero axis which is the up and down on the left hand side and when we filter on that we can filter by Zimbabwe by looking just at the country index we can also use the like just like we did before and I’ll show you the exact same demonstration that we did which you can say like is equal to and instead of having to put in a concrete um text you can just say United just like we did before and we’re searching where the AIS is equal to zero which again is this left-handed access so now we’re looking for United and it’s going to give us all of the countries or all the indexed values that have United in it like we were talking about before we also have Lo and ILO so we can say data frame 2. 
Loke now this is a specific value so we’ll do United States so location is just looking at the actual name or the value of it not its position so if we search for United States it’s going to give us this right here where it gives us all of the columns for United States and then all of the uh values for United States or we can do the io which is the energer location which is not the exact same because we’re looking at the string for the L we’re looking at this string but underneath it there still is a position that’s that integer location let’s do a completely Rand random one let’s just say three if we look at the third position it’s going to give us ASM which I’m not exactly sure what it is but it still gives us basically the same kind of output which is the columns and the values so that’s another way that you can search within your index when you’re actually trying to filter down that data now let’s go look at the order bu and let’s start with the very first one that we looked at let’s do data frame that’s why I kept it because I wanted to use it later now we can sort and order these values instead of it just being kind of a jumbled mess in here we can sort these columns however we would like ascending descending multiple columns single columns and let’s look at how to do that so we’ll say data frame and then we’ll do data frame look at rank again just like we were doing above and let’s do data frame where it’s less than 10 I should have just gone and copied this I apologize so now we have this data frame that is greater than 10 now we can do dot sort underscore values and this is the function that’s going to allow us to sort everything that we want to sort so we can do buy is equal to and we’ll just order it by the exact same thing that we were doing uh or calling it on we’ll do rank so now what this going to do it’s going to order our rank column and as you can see it did that 1 2 3 4 5 we can also do it with ascending or descending so if you want to you can 
look in here and see what you can do. We’ll do ascending and say that’s equal to True. That’s the automatic default, so that didn’t change anything, but if we say False it’s going to be descending, from highest to lowest, so now we have it in the opposite direction.
We don’t have to sort on just one single column. We can do multiple columns by making a list right here, and we’ll input different ones as well. Let’s input our country, and when we run this it will give us rank of 9, 8, 7, 6 as well as the country of Russia, Bangladesh, Brazil. If you noticed, the order really didn’t change once we added country, because the rank stayed the exact same. That’s because there’s an order of importance here, and it starts with the very first one. If we change this around, putting country first with a comma right here, now the country is going to be sorted descending and the rank comes second, so the rank isn’t really going to have any effect. Now we have the country United States, Russia, Pakistan, and the rank really didn’t get ordered at all.
If we want to see how that can actually work, let’s put continent right here and country here. If we run this, it’s first going to sort the continent, then it’s going to come back and sort the country within it. Keep your eye right here on this Asia area, because we’re going to sort this differently. We have ascending False, and that applies to both of these, it’s False and False, but we can specify which one we want for each. We’ll do False comma True, and what this is going to do is say False for the continent, so the continent right here is going to stay the exact same, and True for the country. And so that is a lot of how you can filter and order your data within pandas.
Hello everybody, today we’re going to be looking
at indexing in pandas. If you remember from previous videos, the index is an object that stores the axis labels for all pandas objects. The index in a data frame is extremely useful because it’s customizable, and you can also search and filter based off of that index. In this video we’re going to talk all about indexing: how you can change the index and customize it, as well as how you can search and filter on it. Then we’re also going to be looking at something a little bit more advanced called multi-indexing. You won’t always use it, but it’s really good to know in case you come across a data frame that has it.
Let’s get started by importing pandas: import pandas as pd. Now we’ll get our first data frame. We’ll say df is equal to pd.read_csv, and I’ve already copied this, but we’re going to do r and then put in this file path. I have this world population CSV, and I will have that in the description just like I do in all of my other videos. Let’s run df and take a look at this data frame. We have a lot of information here: rank, country, continent, population, as well as the default index from zero all the way up to 233.
If you haven’t watched any of my previous videos on pandas, the index is pretty important, and it’s basically just a number or a label for each row. You can create or add an index yourself if you want to. It doesn’t have to be unique, but it really should be, especially if you want to use it appropriately. For what we’re doing, the country is actually going to be a pretty great index, because it’s going to be all unique: every single row is a different country with its own population. So let’s go ahead and add this country as our index. We can do this in a lot of different ways, but the first way, if you already know what you are going to create that index on
is we can just go right in here when we’re reading in this file and say comma index_col (I spelled that completely wrong at first), and we’ll say that is equal to, in quotes, country. So we’re taking this country column and assigning it as the index. Let’s read this in, and as you can see, this is our index now. It looks a little bit different: we didn’t have this country header right here before, which specifies that this is still the country. You can tell that this is the index based off the bold letters as well as it being on the far left. All the regular columns for the data are over here, while the country header is right here and it’s lower than all the others. That’s just a quick way that you can see that it is the index.
I want to show you some other ways that you can do this as well, but first I’m going to show you how to reverse this index. We have our data frame right here, so we’ll say df.reset_index, and then we’ll say inplace is equal to True, which means we don’t have to assign this to another variable; the change is applied to df itself. Now when we run that data frame again, the index was reset to the default numbers.
Let’s go down here and I’ll show you how to do this in a different way. You can do df.set_index and then just say country, very similar to when we were reading in that file and said index_col equals country. If we do this and run it, it works, but if we look at df right down here, it’s not going to have saved that. If we want to save it, just like we did above, we say inplace is equal to True, which saves it so that we don’t have to assign it to another variable. So now when we run this, the data frame right here is going to use inplace equal to True.
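Here is a minimal sketch of the index steps walked through above: index_col while reading, reset_index, and set_index with and without inplace. The little CSV text is a made-up stand-in for the world population file (the rows and numbers are placeholders, and the column names are assumed from what the video shows), so treat it as an illustration rather than the exact notebook.

```python
import io
import pandas as pd

# Toy stand-in for the world population file; these rows and numbers are
# placeholders, not the real data set used in the video.
csv_text = """Rank,Country,Continent,Population
3,Brazil,South America,30
1,Albania,Europe,10
2,Angola,Africa,20
"""

# 1) Set the index while reading the file.
df = pd.read_csv(io.StringIO(csv_text), index_col="Country")
print(df.index.name)  # Country

# 2) Put the default integer index back; inplace=True modifies df itself.
df.reset_index(inplace=True)
print(list(df.columns))  # Country is a regular column again

# 3) Without inplace=True, set_index returns a new frame and leaves df alone.
preview = df.set_index("Country")
index_unchanged = df.index.name is None

# 4) With inplace=True the change is saved on df.
df.set_index("Country", inplace=True)
print(df)
```

Assigning the result (df = df.set_index("Country")) does the same job as inplace=True, if you prefer that style.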
That country column will now be our index again. Let’s run this, and there we go.
Really quickly, I wanted to give a huge shout-out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend; it’s going to teach you just about everything you need to know about pandas. So huge shout-out to Udemy for sponsoring this pandas series, and let’s get back to the video.
What’s really great about this index is that we’re able to search based off just this index, so we can filter on it and basically look through our data with it. There are two very common ways that people who use pandas search through that index: loc and iloc, which stand for location and integer location. Let’s look at loc first. Let’s say df.loc and then do a bracket. Now we’re able to specify the actual string, the label, so let’s go right up here and say Albania. Again, this is just looking at the label. Let’s run this, and it’s going to bring up all the Albania data, just like here, where it kind of looks like a column. We can get this exact same data using iloc. When we ran loc, we were searching based off Albania, which is in the one position (zero, then one), so if we pull the one position with iloc, this should give us the exact same data.
Now let’s take a look at multi-indexing, and we’ll come back to a little bit of this in a second. Multi-indexing is creating multiple indexes. We’re not just going to have the country as the index; we’re going to add an additional index on top of that. Let’s pull up our data frame. Right now we have the country, but let’s do .reset_index and we’ll say inplace
equals True. Let’s run it, so now we have our data frame back. Now let’s set our index, but this time we’re going to add the country as the index as well as the continent. We’ll say df.set_index, then do a parenthesis, and instead of just doing country like we did before, we’re going to create a list: continent and country, separated by a comma. Let’s say inplace is equal to True. Let’s run this and see what it looks like. Now we have country as well as continent as our index.
You may notice that these indexes are repeating themselves. On this continent index we have Europe right here and Europe right here, as well as Asia and Asia, and it looks a little bit funky, but we are able to sort these values and make them look a lot better. Let’s go ahead and try this. We’ll do df.sort_index, and when we run this it should sort our index alphabetically. We can also look in here and see what kinds of things we can specify. We can specify the axis, but it’s automatically going to be looking at zero. This is zero and this is one, so we have two axes within our data frame. You can choose the level, whether it’s ascending or not, inplace, kind, sort_remaining, all of these different things. The only one that I really think is worth looking at is ascending; we already know some of these other ones. Let’s run it. Now it’s sorted these, and it’s kind of grouped together, so we have Africa and all the African ones as well as South America and all the South American ones. Let’s really quickly say pd.set_option, and we’ll say display.max.
columns, just like this. Let’s run it, and I need to specify a number right here. Let’s see how many rows we have: 235, so let’s do 235. Now when we run this, you can see that Africa is all grouped together and all the countries are in alphabetical order under it, and then we go all the way down to Asia, and again it’s all in alphabetical order. If we wanted to, we could say ascending equals False (I first said True by mistake), and then when we run this it’s the exact opposite: it starts with South America, the last one, and then goes in reverse alphabetical order. We could also make it a list and say False comma True, and then it would sort this first column as False and this next column as True. So you can really customize it, but for what we’re doing we don’t need any of that. We just need to be able to see this right here.
Now let’s try to search by our index like we did before with df.loc. When we specify Angola, it’s not going to work properly, because it’s searching in this first index for the first string that we give it. We can search for Africa instead, and now we have all of the African countries. If we want to narrow down to Angola, we can go down another level by adding Angola, and now we have what we were looking at before, where we’re calling all the data within those. We couldn’t do it just based off Africa because we had an additional index right here, so once we called both indexes we get this view.
Let’s look at iloc really quick. Let’s just say one, because right up here we have zero and then Angola at one, so you’d think it may pull up Angola. Let’s go ahead and run this, and it’s still pulling up Albania. Let’s go right up here: if you remember, when we didn’t have the multiple indexes it was also pulling up Albania. The difference when you’re doing these multi-indexes
is that loc is able to use those labels, whereas iloc does not go based off that multi-indexing at all; it goes based off the integer position of the row. So that’s a lot about indexing in pandas. We’ll cover even a few more things in future videos as we get more and more into pandas, but this is a lot of what indexing looks like, and again it’s super important to know how to do because it’s a pretty important building block as we go through this pandas series.
Hello everybody, today we’re going to be taking a look at the group by function and aggregating within pandas. Group by is going to group together the values in a column and display them all on the same row, and this allows you to perform aggregate functions on those groupings. Let’s start by reading in our data. We’re going to do import pandas as pd, and then we’re going to say df is equal to pd.read_csv, with an open parenthesis, r, and our file path. We’re going to be looking at the flavors CSV right here. We have our flavor of ice cream, the base flavor, whether it was vanilla or chocolate, whether I liked it or not, the flavor rating, the texture rating, and its overall or total rating. These are all my own personal scores, and I’ve spent years researching this, so they are all very accurate, but this should be a low-stress environment to learn group by and the aggregate functions.
The first thing that we can do is look at our group by. You can group by flavor, but as you can see these are all unique values. What we need is something that has duplicate or similar values on different rows that will group together, so this base flavor is actually a perfect one to group on. We’ll do that by saying df.groupby, an open parenthesis, and we’ll just specify base flavor, and this will then group together those values.
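Here is a small sketch of that grouping step, carried through to one aggregation so there is something to look at. The rows are invented and the column names (Flavor, Base Flavor, Liked, Flavor Rating, Texture Rating) are my assumption about the flavors CSV, not its real contents. One hedge worth knowing: in pandas 2.x, calling mean on a group that still contains string columns raises an error instead of quietly skipping them, so the sketch passes numeric_only=True.

```python
import pandas as pd

# Made-up ratings standing in for the flavors CSV; column names are assumed.
df = pd.DataFrame({
    "Flavor": ["Mint Chocolate Chip", "Chocolate", "Vanilla", "Rocky Road"],
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate"],
    "Liked": ["Yes", "Yes", "No", "Yes"],
    "Flavor Rating": [10, 8, 4, 6],
    "Texture Rating": [8, 6, 5, 8],
})

# Grouping alone produces a DataFrameGroupBy object, not a table.
grouped = df.groupby("Base Flavor")
print(type(grouped).__name__)

# An aggregation turns the groups into one row per base flavor.
# numeric_only=True skips the string columns (Flavor, Liked).
means = grouped.mean(numeric_only=True)
print(means)
```

The group labels end up as the index of the result, which is why Base Flavor shows up on the far left.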
Spelled properly, this will group those flavors together, so let’s run this. As you can see, it actually is its own object, a DataFrameGroupBy object. Now that we’ve grouped them, let’s give it a variable. We’ll say group_by_frame is equal to this, so let’s copy it and run it. What we need to do now is run our aggregations in order to get an output. We’re going to say mean, and that’s all we’re going to put for now, just to get an output that we can take a look at, and then we’ll build from there. Let’s go ahead and run this. Right here we have our base flavor, which is now the index, chocolate or vanilla, and it’s taking the mean, or the average, of all the columns that have integers. Notice that it did not take the liked column or the flavor column, because those are strings and it cannot average those; we’ll take a look at that later. It took all the columns that have integers and gave us the average of those ratings.
Right off the bat, as averages, the ones with chocolate have a much higher rating overall than the ones with vanilla bases. We can actually combine all of this together into one line: we’ll say df.groupby and then mean, just like this, and this will actually run. Before, we didn’t have any aggregating function on there, so it didn’t give us an output, but now that we combine it all into one it will run properly. There are a lot of different aggregate functions, but I’m
going to show you some of the most popular ones, the most common ones that you will see. Let’s copy this right here. We can do .count, and when we run this it will show us the actual count of the rows that were aggregated. For chocolate we had three, so there are going to be threes all the way across, and for vanilla we had six. We’re looking at a higher count of vanilla, which, if you’re comparing it to this mean up here, could be a big skew towards the chocolate: if you have one or two good chocolates it could really pull the numbers up, whereas if you had two good vanillas but all the other ones were bad, it pulls that average down. So knowing the count of something is really good.
Let’s take a look at the next ones, min and max, and I’ll just run these really quickly. We can do min, and when we run this, the first thing that you should notice is that it now has a flavor and a liked column. That’s because min and max will compare strings alphabetically, starting with the first letter. So chocolate, with the C-H, is the minimum value for that string in the chocolate group, and cake batter is the minimum value in vanilla. With liked it’s interesting, because apparently I liked all the chocolate ones. I’m going to go take a look: chocolate, I liked; chocolate, I liked; chocolate, I liked. So there is no “no” option in this liked column for chocolate, and yes was the only option.
Now let’s look at max, and it should do the exact opposite, which is to take the highest value even if it’s a string. Rocky Road: the letter R comes later in the alphabet, so that’s what it’s looking at, and the same goes for vanilla, and then we have yes as well. And of course right here it’s taking the max value of the ratings. Before, when we were looking at min, I just focused on the strings, but it still does the exact same thing to
these integer columns as well. For the max value for vanilla, it was mint chocolate chip, which had vanilla as its base, so I had a rating of 10 for this vanilla grouping. Then we can also look at the sum, and there are all the sums for these. Again it only does the integer columns because we can’t add the strings. Here are the total values for all of them, and since we had six rows grouped into vanilla, we now have a much higher score for vanilla.
That’s a really simple way to do your aggregations, but there is actually an aggregation function, so let’s take a look at it, because it’s a little bit more complex, although hopefully it makes a lot of sense once I write it out. We can do .agg, so this is our aggregate function, and what we need to pass into it is actually a dictionary. Let’s do an open parenthesis and a squiggly bracket, and then we need to specify what column we’re going to be aggregating on. Let’s do this flavor rating, and I need to put that as a string. Then we’ll do a colon, and now we can specify what aggregate functions we want. We’ve done sum, count, mean, min, and max, and we can put all of those in here and perform all of those aggregations on just one column. Let’s make a list and say mean, max, count, and, what’s another one, sum. Let’s do all four of those only on this flavor rating column. When we run this we have our base flavor right here, chocolate and vanilla, but now instead of multiple columns we have one column with our aggregations as sub-columns under it. It is possible to pass in multiple columns like that, so we’ll come right here and do a comma, then we’ll say texture rating and then a colon (I don’t know why I typed it out when I had copied it, but I did), and then we’ll do the exact same ones. Now when we
run it, we’re getting the exact same columns: mean, max, count, and sum for flavor rating, then mean, max, count, and sum for our texture rating.
So far we’ve only grouped on one column, but we can actually group on multiple columns. Let’s go back up here to our data (I should have just copied this down here). Really we only grouped on this base flavor, but you can group by multiple columns, so let’s do our base flavor, which we did already, as well as the liked column. We’re going to say df.groupby, then an open parenthesis, and instead of just passing in one string we’re going to do a list, and we’ll say base flavor, comma, liked. Now when it groups this it should make two levels of groupings, so let’s run this and see. I have to add an aggregation, so let’s just do mean. Now we have our chocolate and our vanilla. Remember, chocolate only had yes, so that’s the only one it’s going to group on, but vanilla had a no and a yes. If we look at vanilla, within liked we have no and yes, which shows us that within our vanilla group the nos were really low but the yeses were really high. We actually had very close to the same rating as the ones we really liked in chocolate.
Just like we did above, we can take this .agg, and I’m going to copy this, and it’ll perform it on each of those rows. What did I do wrong? Oh, I need the squiggly bracket. It’ll show us the mean, max, count, and sum for all of the chocolate and vanilla as well as the groupings of liked, yes and no.
After we’ve looked at all that, and that’s how I usually do it, there is one shortcut function that can give you some of these things really quickly. Let’s go back up here and take this. It’s just called describe, and if you’ve ever used it, it’s just going to give you some
high-level overview of some of those different aggregations. Let’s run this, and it’s going to give us our chocolate and vanilla, and within each column it’s going to give us our count, our mean, our standard deviation (I believe that is what std is), our minimum, the 25%, 50%, and 75% values, and our max, and then the count and mean again for the next column. So that’s a lot of those aggregate functions, but describe is a very generalized function. We can’t get as specific as we were with the previous ones, but I just wanted to throw this out there in case it’s something you’d be interested in, because it technically is showing a lot of those aggregate functions all at one time.
Hello everybody, today we’re going to be talking about merging, joining, and concatenating data frames in pandas. This whole video is basically about being able to combine two separate data frames into one data frame. The join types shown here are really important to understand when we’re actually using merge and join. Right here we have what’s called an inner join, and the shaded part is what’s going to be returned: it’s only the things that are in both the left and the right data frames. Then we have an outer join, or a full outer join, and this will take all the data from the left data frame and the right data frame, including everything that matches, so basically it just takes everything. We also have a left join, which is going to take everything from the left, and if there’s anything that matches in the right it’ll also include that. The exact opposite of that is the right join, which is going to give us everything from the right data frame and everything that matches, but it’s not going to give us anything that is unique to the left data frame. This is just for reference, because in a little bit, when we start merging, these become very important, so I just wanted to show you how that works visually. Let’s get started by pulling in our files.
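Since the diagram itself isn’t visible in text, here is a rough sketch of the four join types described above, using two tiny hypothetical frames that share a key column I’m calling id (these are not the Lord of the Rings files, and the values are made up). It just prints which keys survive each kind of join.

```python
import pandas as pd

# Two tiny hypothetical frames sharing the key column "id".
left = pd.DataFrame({"id": [1, 2, 3], "skill": ["hiding", "gardening", "spells"]})
right = pd.DataFrame({"id": [2, 3, 4], "age": [20, 30, 40]})

# Run the same merge once per join type and keep the results by name.
joined = {
    how: left.merge(right, on="id", how=how)
    for how in ("inner", "outer", "left", "right")
}

for how, frame in joined.items():
    # Show which keys each join type keeps.
    print(how, sorted(frame["id"]))

# Rows that exist on only one side get NaN in the other side's columns.
print(joined["outer"])
```

Inner keeps only the keys found in both frames, outer keeps every key, and left and right keep every key from their own side.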
First we’re going to say import pandas as pd. We’ll run this, and then we’ll have a data frame one and a data frame two. These are the left and the right data frames that we’ll be using to join, merge, and concatenate. We’ll say df1 is equal to pd.csv_read, and we’ll do r and then our file path. So we have this Lord of the Rings CSV, and let’s call it really quickly so we can see what’s in there. And I’m having a dyslexic moment, because it’s supposed to be read_csv; I apologize for that. This is our data frame one. We have three columns: the fellowship ID, 1001, 1002, 1003, and 1004; the first name, Frodo, Samwise, Gandalf, and Pippin; and their skills, hiding, gardening, spells, and fireworks. So this is the very first data frame that we’re going to be working with.
Let’s go down a little bit and say df2, and this is the Lord of the Rings 2 file, so let’s pull this one in. As you can see, it’s very similar. We have fellowship ID 1001, 1002, 1006, 1007, and 1008, so we have three different IDs here; we don’t have 1006, 1007, and 1008 in the first data frame. We also have the first name, so Frodo and Samwise are in both the first and the second data frame, but now we have three new people, Boromir, Elrond, and Legolas. And now we have this age column, which is unique to just this second data frame.
The first one that I want to look at is merge, and I want to look at it first because I think this one is the most important. I use it more than any of the others that we’re going to talk about today. The merge is just like the joins that we were just looking at: the outer, the inner, the left, and the right. There’s also one called cross, and I’ll show you that one, although if I’m being honest I don’t really use it that much, but it’s worth showing just in case you come into a scenario where you do want to do that. So let’s go
right down here, and I want to be able to see these while we do it. We’re going to say df1, and when we specify df1 first and say df1.merge, this is automatically going to be our left data frame. Then if we do our parenthesis right here and say df2, this is our right data frame. Let’s see what happens when we do this. We didn’t specify it, but by default it’s going to do an inner join, so it’s only going to give us an output where the keys are the same. You can’t see this happening, but it’s taking this fellowship ID and saying: I have 1001 here and 1002 here, and this is the exact same as up here, with fellowship IDs of 1001 and 1002. But when we look at 1003 and 1004, those aren’t in this right data frame, and 1006, 1007, and 1008 are not in this left data frame. So the only ones that match are 1001 and 1002, and that’s why they get pulled in down here. Because we didn’t explicitly say what to merge on between these two data frames, it actually is looking at both the fellowship ID and the first name, so it’s matching on these values of Frodo and Samwise, which are the same in both, which is why it pulled them over.
Really quickly, let’s just check and make sure that it did an inner join, because again we didn’t specify anything; that was just the default. We’re going to say how is equal to inner, and if we run this it’s going to be the exact same, because inner is the default. Now, just to show you how it’s joining these two data frames together, I’m going to say on is equal to, and I’m only going to put fellowship ID. Let’s run this. The first thing that you may have noticed is this first name _x and this first name _y. What merge does by default, when you are only joining on fellowship ID, with the right data frame’s fellowship ID and the left data frame’s fellowship ID, and not on the first name, is separate the first name into an _x and an _y column. Even though they have the exact same values, since we are not merging on that column it automatically splits it into two separate columns so we can see the values from each data frame. If we go into this on, make it a list, add a comma, write first name, and then run this, it’s going to look exactly like it did before. It automatically used both of these columns when it merged the first time, even though we didn’t write anything, so this is doing exactly what it was doing when we just had df2; we’re just writing it out now.
There are other arguments that we can pass into this merge function. Let’s hit Shift+Tab and scroll down here. Within this merge function we have a lot of different arguments you can pass into it. First we have right, which is the right data frame, this df2. Then we have the how and the on, which we’ve already shown how to do. There’s left_on, right_on, left_index, and right_index, not something you’ll probably use that much, but you definitely can if you want to look into that, and there are all these docstrings which show you exactly how to use all of these. One that is really good is sort, and you can set it to either False or True. Then we have these suffixes. If you remember, when we took first name out of on, it automatically put in these _x and _y suffixes. You can customize that and put in whatever custom strings you’d like instead. We also have an indicator and a validate, again all things that you can go
in here and look at. I’m just going to show you the things that I use the most. Now that we’ve looked at the inner join, let’s copy this right down here and look at the outer join, and these get a little bit more tricky; I think the inner join is probably the easiest one to understand. Let’s put in outer (I don’t know why I always want to misspell it), run this, and see what we get. This looks quite different. The inner join only gave us the rows whose keys are the exact same in both. This one is going to give us all of the rows regardless of whether they match, so we have 1001, 1002, 1003, 1004, 1006, 1007, and 1008. Let’s scroll back up here: we have 1001 through 1004 in one, and 1001, 1002, 1006, 1007, and 1008 in the other, so we don’t have a 1005. If you notice in this data frame right here, when we can’t match on the fellowship ID or the first name, like Legolas, who has no matching row in the left data frame, it just gives us NaN, which is not a number. It’s going to do that for any value where it couldn’t find a match on that ID or first name. In age we also have that for the ones that weren’t in the right data frame. We only had 1001 and 1002 there, so we’ll have the age for both Frodo and Sam, but for Gandalf and Pippin we don’t have their corresponding IDs, so the age is just going to be blank for them, and you can see that right here. So again, outer joins are kind of the opposite of inner joins: they’re going to return everything from both, and if there is overlapping data it won’t be duplicated.
Now let’s go on to the left join. I’m going to pull this down right here, and now we’re just going to say how is equal to left, and let’s run this. What this is going to do is take everything from the left table, the left data frame right here, so everything from data frame one. Then, if there is
any overlap, it’ll also pull whatever we’re able to merge on from data frame two. Let’s go back up to our data frames one and two. It’s going to pull everything from this left data frame, because we’re specifying a left join. We’re also going to try to bring in everything from the right, but only if it matches, so just this information right here will come over. We weren’t able to join on 1006, 1007, or 1008, so none of that information is going to come over. Let’s go down and check on this. Again we have 1001, 1002, 1003, and 1004, with all of the first name and skills data, but when we try to bring over the age, we only have matches for 1001 and 1002, so only these two values come in.
Let’s look at the right join, because it’s basically the exact opposite of the left, in that now we’re taking everything from the right-hand data frame, and if there’s something that matches in data frame one, we pull that in. So this basically looks just like data frame two, except we’re pulling in that skills column, and since only 1001 and 1002 are the same, that’s why the skills values are here.
Those are the main types of merges that I will use when I’m trying to merge a data frame, but there is also one called a cross join, and this one is quite a bit different. Let’s run this. It’s different in that it takes each row from the left data frame and pairs it with each row in the right data frame. So for Frodo in this left data frame, it looks at Frodo in the right data frame, then Samwise, Legolas, Elrond, and Boromir, all in the right data frame. Then it goes to the next value, Samwise, and does the exact same thing: Frodo, Samwise, Legolas, Elrond, Boromir. And it does that
for every single value. Let’s go right back up here. It’s taking this 1001 and pairing it with rows one, two, three, four, five; then it’s taking Samwise and pairing it with one, two, three, four, five; then Gandalf with one, two, three, four, five; then Pippin; and you kind of see the pattern. That’s what a cross join is. In my opinion there are very few reasons to use a cross join. If you ever do an interview on Python you will sometimes be asked about cross joins, but there aren’t a lot of instances in actual work where you really need one.

Now let’s take a look at joins. The join function is pretty similar to the merge function and can do a lot of the same things, except in my opinion it isn’t as easily understood as merge; it’s a little bit more complicated. But let’s see how we can join these data frames together using the join function. Let’s go right up here and say df1.join, and then we’ll pass df2, very similar to how we did it before. Let’s try running this, and it’s not going to work. When we used the merge function it had a lot of defaults for us. Let’s go down and see what this error is: it says the columns overlap but no suffix was specified. It’s telling us that both data frames have the fellowship ID and first name columns, the ones merge joined on automatically, and join isn’t able to distinguish which is which, so we need to go in there and help it out a little bit. Again, it’s a little more hands-on than merge.

Let’s see what we can do to make this work. Let’s add a comma and say on, and really quickly let’s open the docstring and see what we have. This one has fewer options than merge does. We have other, which is our other data frame. We have on, where we specify which column we want to join on. Then we have how, whether we want a left, an inner or an outer, the same types of joins as merge. Then we have that left
suffix and right suffix, lsuffix and rsuffix, and that is part of the issue we were just facing: those columns have the same names in both data frames. With lsuffix we can give it an underscore and whatever string we want to specify; for columns that are in both the left and the right, it gives them a unique name, so we’ll no longer have that issue. And then we can also sort, like we could with merge.

Anyway, let’s go back to our on. We’ll say on is equal to fellowship ID. Let’s try running this, and we’re still getting an error; it’s just not as simple as merge. So let’s keep going and specify the type: we’ll say how is equal to outer. If we run this it still doesn’t work; we’re still getting the exact same issue with the left suffix and the right suffix. So now let’s finally resolve it; I just wanted to show you how it’s a little more frustrating. Let’s say lsuffix is equal to, and where merge automatically added an _x, let’s do _Left, and then a comma and rsuffix equal to _Right. Now when we run this it should work properly. Let’s run this.

So this is our output, and obviously it looks quite a bit different. Over here we have this fellowship ID, then we also have fellowship ID left, first name left, fellowship ID right and first name right, so it just doesn’t look right. Something I didn’t mention when I first started this, because I wanted to show you, is that join is usually better when you’re working with indexes. Before, when we were using merge, we were using the column names, and that worked really well and was pretty easy to do. But as you can see right here, when we try to use these column names with join, it’s not working exceptionally well. Let’s go ahead and create our index, and then I can show you how this actually works, and how it works a little bit better when we’re working with
just the index. You can get it to work just the same as merge; it’s just a lot more work. So let’s go right down here and create a new data frame, df4. We’ll say df1.set_index, open a parenthesis, and say we want the index on the fellowship ID. Then we do the join: we’re going to join it to df2.set_index, also on the fellowship ID. We also need to specify the left and the right suffixes, so I’ll just copy those from above. Now let’s display df4.

Really quickly, just to recap: above, we were joining data frame one with data frame two. Now we’re joining data frame one with data frame two again, except in both cases we’re setting the index to the fellowship ID, so we’re joining on that index. Let’s run this, and it should look a lot more similar to the merge than the join we did above, except now the fellowship ID right here is actually the index, so it’s just a little bit different. We can still go in here and say how is equal to outer, so we can still specify the different types of joins. Again, it’s just a little bit different, and that’s why in most instances I use the merge function: it’s a little more seamless and a little more intuitive. The join function can still get the job done, but as you can see it takes a little more work.

Now let’s look at concatenate. Concatenating data frames can be really useful. The distinction between a merge or join and a concatenate is that the concatenate is kind of like putting
one data frame on top of the other, rather than putting one data frame next to the other, which is what merge and join do. So concatenating is just a little bit different in how it operates. Let’s actually write this out and see how it looks. Let’s go up here and say pd.concat, open a parenthesis, and then pass data frame one and data frame two as a list; that’s all we have to write. Let’s run this. Just like I said, it literally took the first data frame, 1001 to 1004, and put it on top of the second data frame, 1001, 1002, 1006, 1007, 1008. So that is our left data frame, this is our right data frame, and they’re literally just sitting one on top of the other. But just like when we merged with a left or a right, where there are no values to populate a column such as skills, it is going to say NaN, not a number. And since we’re not actually joining, we’re not matching on 1001 and 1002; even though these rows have the same IDs, it’s not populating those values, because again we’re not joining these together, we’re just concatenating, putting one on top of the other.

Now if we go into this concat and hit Shift+Tab, there are a lot of different things we can do. There’s axis: if you remember, axis zero is the left-hand index (the rows) and axis one is the top index, which is the columns, so you can specify that. We can also do joins, and that’s the one I’m going to take a look at, but there are others that you can look into as well. Let’s do a comma and say join is equal to inner, and see what happens. As you can see, it is only taking the columns that are the same; that’s what this inner is doing. It keeps the columns both data frames share, and the ones that were different it didn’t take, because they aren’t common to both data frames. Let’s do an outer, and now it’s going to take all of them. Like I said, that’s doing this on these
columns right here, but we can also do it on the other axis as well. Let’s go ahead and say axis is equal to 1, and when we run this, now it’s joining on this index right here of 0, 1, 2, 3, 4. So now the rows with the same index are being joined together, and it’s putting the data frames side by side, much like a merge would. So that’s how concatenate works.

I’m going to show you one more thing, and it’s not up here in the title because it’s not one that I recommend: it’s called append. The append function is used to append rows from one data frame to the end of another data frame, and it returns that new data frame. So let’s do df1.append, open a parenthesis, and pass df2, very similar to how we’ve been doing other things, and let’s run this. As you can see, this is almost exactly what concatenate did when we first ran it. But if we read this warning, it says the frame.append method is deprecated and will be removed from pandas in a future version; use pandas.concat instead. So it’s literally warning us that append is on its way out: if you want to do exactly what you’re doing right here, use concat, because that’ll do the exact same thing. I’m not going to show you any other variations of append, because there’s no reason to; it’s on its way out.

So that is our video on merge, join, concatenate, and append as well, in pandas. I hope that was helpful and that you learned something. This stuff is really important, because oftentimes you’re not just working with one CSV or one JSON or one text file; you’re working with several of them and you need to combine them all into one data frame. So this is a really important concept to understand.

Hello everybody, today we’re going to be building visualizations in pandas. In this video we’ll look at how we can build visualizations like line plots, scatter plots, bar charts, histograms and more. I’ll
also show you some of the ways that you can customize these visualizations to make them just a little bit better. With that being said, let’s go right over here and start importing our libraries. We’ll start with import pandas as pd, and this one is really all you need to create visualizations in pandas, but we may get a little bit crazy, so we’re going to import a few others as well: import numpy as np, and then import matplotlib.pyplot as plt. I may or may not use these; when I get into visualizations I may want to change some things, so we’re going to at least have them here in case we want them. Let’s go ahead and run this.

Now let’s read in the data set that we’re going to be using. Let’s say df is equal to pd.read_csv, and get the file path in right here. We’re going to be using these ice cream ratings; let’s take a look really quickly. These values are completely randomly generated, they are not real in any way, but that’s what we’re using because I wanted something generic, something that wouldn’t be confusing, where you can see that they’re just numerical values. Let’s also set the index really quick: we’ll say df.set_index, pass in date, and assign that back to df. Now we have this date column as our index, so we have January 1st, 2nd, 3rd, 4th, and then our ratings right here. Again, these are all just numbers, and they make it really easy to demonstrate how you can visualize data, so that’s why we’re using them today.

The way that we visualize something in pandas is with something called plot. So let’s take our data frame and do df.plot with our parentheses. Now let’s go in here really quickly and hit Shift+Tab, and this is going to come up. This is pretty important, because it tells us what we can do within this plot. Unfortunately there isn’t a quick overview, we just have this docstring, but we have our parameters right here; these are what we can pass in to customize our visualization. The data is going to be our data frame. Then we have our x and y labels. We can specify the kind, and this one’s important, because it’s where we say what kind of visualization we want: a line plot, a vertical or horizontal bar plot, a histogram, a box plot, and a few others including area, pie and density. We can also specify whether we want subplots. A lot of the things I’m mentioning I’m going to show you how to do. You can use different indexes, add titles, grids, legends, styles, all these different things. You can go through here, because there are a lot, and customize all of them. We won’t be going into all of them, but I will show you some of the ones that I use the most and that I think are the most useful to know right away.

So let’s get out of here, and we’re just going to do df.plot(). When we run this we get this right here, and that was super easy: we created a line plot by doing just about nothing. By default it gives us a line plot. If we come up here and say kind is equal to line and run this, we get the same thing, so without us having to input anything it was giving us that line plot as the default. As you can see, we already have all of our data right here; we didn’t have to specify anything, it automatically took it in. It is visualizing all three of these columns, and it has this little legend right here, and we can specify where we want
that; there is an argument for it. It also gave us these tick marks of 0.2, 0.4, 0.6, 0.8 and 1.0. Again, it read the data in and saw that it only goes from 0.0 to 1.0, that is the peak, so it automatically gave us these ticks. That’s another thing you can specify: we can make it go up to 2, 5, 10, 1,000, whatever you want it to be. And we’re plotting this against the date index right here.

Really quickly, I wanted to give a huge shout-out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend; it’s going to teach you just about everything you need to know about pandas. So huge shout-out to Udemy for sponsoring this pandas series, and let’s get back to the video.

If we wanted to break these out by column, we could go in here and say subplots is equal to True (it’s subplots, not subplot), and now when we run that we can see each of those columns broken out by itself. Instead of them all being in one visualization, it’s now three separate visualizations.

Now let’s go right over here and get rid of the subplots. I want to show you some of the different arguments you can use to make this look nice; I don’t want to do this on every single visualization, I just want to show you what you can do. So we have this one right here. We can add a title; notice there’s no title or anything telling us what this is. We can add a comma and say title is equal to Ice Cream Ratings, and if we run this we now have this nice title right here. We can also customize the labels for the x and y axes. It automatically took this date, which is our date index, for the x axis, but we can customize that if we’d like to. All we have to do is add a comma and then say xlabel is equal to, and so
our x is this date one right here, and we can say Daily Ratings. Then we can do the y label: we’ll say ylabel is equal to, and for this one we can say Scores. (I hope you cannot hear my dogs in the background, because they’re being insane.) Let’s go ahead and run this, and now we have Daily Ratings on the x axis and Scores on the y axis.

Now let’s go right down here and take a look at our next kind of visualization, which is a bar plot. We’ll do df.plot with kind equal to bar. This is what your typical bar plot looks like, and a lot of the arguments we just used on the line plot also apply to this bar plot. Something that’s unique to the bar plot is that you can make it a stacked bar plot. All we have to do is go in here, add a comma, and say stacked is equal to True, so now it’s going to be a stacked bar chart instead of a regular bar chart. Let’s go ahead and run this, and as you can see the bars are now stacked on top of one another, with each of these columns representing its values.

We don’t always have to plot every single column; we can also specify the column that we want. Let’s take the flavor rating, for example: we select the flavor rating column, and then it only takes in that column. If you notice, we don’t have a legend; that only appears when you have multiple columns, and here we are only looking at this one column, so all the values are right here.

This bar chart defaults to a vertical bar chart, but you can change it to a horizontal one. Let’s take a look at how to do that. We’ll bring back all of the columns and do df.plot.barh. I don’t know if I can keep that kind equals bar in there; let me run this. Yeah, I need to get rid of that, because barh is its own method. Now when I run this it should still be a stacked bar chart, except horizontal. You can see this worked properly. It’s basically the exact same thing as a vertical bar chart, just horizontal, which may look better depending on your values.

The next one we’re going to take a look at is the scatter plot. We’re going to say df.plot.scatter, and if we run this we get an error. In order for the scatter plot to work, we need to specify the x and the y axes. So let’s go in here and take any of the columns we have up above: we’ll say x is equal to texture rating, and y is equal to overall rating. Now when we run this it should work properly; let’s take a look. If we go in here and hit Shift+Tab we can see some other things we can specify. We have our x and our y, the ones we just did. We can also pass an s, which changes the size of the dots in our scatter plot, and a c, which is the color of each point. Let’s start with s: let’s say s is equal to 100 and see what that looks like. So we have much larger dots. Let’s do 500 and see what that looks like. So we can make these much larger, depending on what you’re looking for. We can also look at the color. Let’s add a comma and c, and say c is equal to yellow, and see if this works. So now we’ve changed it to yellow. That looks absolutely terrible, but it does work.

Now let’s move on to the histogram. The histogram is always a good one. It’s similar to a bar chart, but what’s great about a histogram is that you can specify
the bins. So let’s go ahead and say df.plot.hist, open a parenthesis, and hit Shift+Tab in here to take a look at this one as well. Some of our parameters are the columns of the data frame that we want to pull in, and we can choose the bins, which have a default of 10. Let’s take a look at how this works. We’ll just run this as it is, so this is what the histogram looks like by default. Now let’s specify our bins: it was 10 by default, so let’s do 20 and see what that looks like. The bars are narrower right off the bat. Remember, histograms are really good for showing the distribution of a variable; that’s really what a histogram is for. Of course, since these are completely random numbers, this histogram isn’t going to mean much, but you can at least see visually how it works.

If I didn’t mention it before, which I should have, the bins are the number of intervals the values are grouped into along the bottom here. If we do just one, it’s only going to be one very large bar. We could also go down from 10 to five, and now there are only one, two, three, four, five bars. With fewer bins things get more compact, and with more, say 100, it spreads out a lot. That’s what it shows: the distribution of the values across however many bins you want. The default of 10 is usually pretty good for a lot of different things.

Now let’s go down here and look at the box plot, which is a pretty interesting one. Let’s visualize it really quickly and then I’ll explain how it works. Let’s do df.boxplot() and run this. What we’re looking at is some different markers within our data. This line right here is the minimum value within that column. We also have the bottom of the box, which is the 25th percentile of all the values within just this
column. This line is the 50th percentile, the median; then we have the 75th percentile at the top of the box; and up here we have our maximum value. So I can take a glance at this one and see that we have a low minimum, a high maximum, and it definitely skews towards the lower range. Whereas if I look over here, we have a lower minimum and a higher maximum, and you can see that the median is at 0.6 versus 0.4 over here, so this one skews a lot higher.

Now let’s go down here and take a look at an area plot. We’ll do df.plot.area and just run this; this is what we get by default. Something I wanted to show you earlier and haven’t gotten around to is figure size, or figsize. This plot looks small and a little bit cramped, so let’s increase the size: we’ll say figsize is equal to, in parentheses, 10 comma 5. That should be pretty large; this makes it a lot larger. Just something I wanted to throw in there. I look at area charts as pretty similar to line charts; if we compared them they’d look pretty similar, but they’re different visually, and you absolutely can use them for different types of visualizations. I don’t use this one a lot, if I’m being honest, which is why it’s towards the end of the video, but you definitely can.

Let’s go on to our very last one of the video, the beautiful pie chart. Let’s say df.plot.pie, open a parenthesis, and run it. We get this error, because we need to specify which column we’re working with. Let me open the docstring: right here we have y, which is the label of the column we’re going to plot, and that’s really all we need. So we can just say y is equal to flavor rating and run this, and now we get this visualization right here. Let’s make this one a little bit bigger: figsize is equal to 10 comma 6. Now it’s a little
bit bigger. This legend autopopulates, and you can make the chart as big as you want; obviously it’s going to look a little better if you make it larger. These colors autopopulate too. You can customize them, although I’ve found that when you have a lot of slices it’s harder to customize them easily, but definitely look into it. Almost everything in here is something you can customize in some way, although it does get a little tricky; you have to do some research and some Googling around to figure out how to do those things.

One last thing that I wanted to show, and something I could have done at the beginning, is that you can change the overall look of these visuals, and we can do that pretty easily within matplotlib. There are different styles. So let’s go right here, add a new cell, and say print, and inside it plt, that’s matplotlib right here, so plt.style.available. What this does is show us all the different stylings you can use to change up the visualization. Then once we find the one we like, we just do plt.style.use, and in the parentheses we specify which one we want. There are all these seaborn ones, and seaborn is a really great library. Let’s try seaborn-deep; I haven’t tried this one at all. Let’s go ahead and run this, and it just changes some of the colors, some of the visuals. We can try something like fivethirtyeight; let’s try this, and that looks quite a bit different. And let’s try something like classic; I don’t know what this one looks like, let’s just try it. So you can try out all these different styles, find one that you think looks really nice, and run with it through all your visualizations.

Hello everybody, today we’re going to be cleaning data using pandas. There are literally hundreds of ways that you can clean data within pandas, but I’m going to show you some of the ones that I use a lot and that I think are really good to know when you are cleaning your data sets. We’re going to start by saying import pandas as pd, and run that. Now we’re going to import our file. We’ll say df is equal to pd.read_, and we actually have this in an Excel file, so read_excel, open a parenthesis, then r, and then paste the path right here. Now we’ll just call that variable, df, to read it in and look at the data.

Let’s scroll down here and take a look at this data frame, this Excel file that we’re reading in. Right off the bat we have this customer ID that goes from 1001 all the way down to 1020. We have this first name, and everything looks pretty good here. In the last name column, though, it looks like we have some errors: some forward slashes, some dots, some null values. So we’re definitely going to have to clean that up, because we don’t want that in the data. We have a phone number, and it looks like we have a lot of different formats, as well as NaNs (not a number),
just lots of different stuff, so we’re going to need to clean that up and then standardize it so it all looks the same. We also have address, and on some of these we just have a street address, but on others we have a street address plus another location, and a zip code on some of them, so we’ll probably want to split those out. We have a paying customer column, which is yeses and nos, and some of those aren’t written the same way, so I’ll have to standardize that. We have a do not contact column, which has the same kind of issue as paying customer. And we have this not useful column, which we’ll probably just want to get rid of.

So the scenario is that we got handed this list of names, and we need to clean it up and hand it off to the people who are actually going to call this customer list. They want all the data in here standardized and cleaned, so that the people making those calls can make them as quickly as possible, but they also don’t want columns and rows that aren’t useful to them. So things like this not useful column we’re probably going to get rid of, and for rows where do not contact says yes, we should not contact them, so we’ll probably want to get rid of those somehow. That’s a lot of what we’re going to be doing to clean this data set.

Normally the very first thing I do when I’m working with a data set, except in the rare cases when you’re actually supposed to have duplicates, is drop the duplicates from the data set completely. All you have to do for that is say df.drop_duplicates(), so they make it super easy for you. Let’s just run it. Up here in our original data set we have these rows 19 and 20, and those are obviously duplicates: they have the exact same data, so it’s just a duplicate row that we need to get rid of. If we look right down here, we no longer have that row 20; we now have just one row of Anakin Skywalker. Of course we want to save
that so we’re just going to say DF is equal to and DF so now it’s going to save that to the data frame variable again and now when we run this our data frame Now does not have any duplicates that’s definitely one of the easier steps that we’re going to look at uh things are going to get quite a bit more complicated as we go but I’m starting out you know kind of simple so that we can kind of get a feel for it then we’ll start getting into the really tough stuff so the next thing that I want to do is remove any columns that we don’t need I don’t want to clean data that we’re not going to use so if we’re just looking through here you know they may need you know first name last name phone number for sure address might give them some information of where they’re calling to or time zone so we want that this not useful column looks like a pretty good candidate to delete and it’s very easy to do that we’re going to go right down here and we’re going to say DF do drop we’ll do an open parenthesis drop just means we are dropping that column and we can specify that by saying columns is equal to and then we’ll paste in that column that we want to delete so let’s run this and see what it looks like and it literally just drops that column exactly like we were talking about it no longer has that column again we want to save that we can always do in place equals true um if you follow this tutorial series you can always do in place equals true and that’ll save it as well but just for our workflow most of the time I’m going to assign it back to that variable um just for keeping it the same really quickly I wanted to give a huge shout out to the sponsor of this entire Panda series and that is udemy udemy has some of the best courses at the best prices and it is no exception when it comes to pandas courses if you want to master pandas this is the course that I would recommend it’s going to teach you just about everything you need to know about pandas so huge shout out to UD to me for 
sponsoring this Panda series and let’s get back to the video now let’s kind of go column by column and see what we need to fix and we’ll start on this left hand side this customer ID to me looks perfectly fine I’m not going to mess with it at all the first name at a glance also looks perfectly fine I don’t see anything wrong with it visually which is a good thing um although sometimes that can be deceiving and that can cause errors down the line but we’re not going to uh assume that there are errors in here now let’s look at this last name now the last name obviously I’m I’m seeing some obvious things things that we talked about when we were first looking at this data set we have this forward slash which we definitely need to get rid of we have null values so not a number right here we have some periods as well as an underscore right here so all those things I think we should clean up and get rid of it so that when the person is making these calls you know it’s all cleaned up for them so how are we going to do that we can actually do this in several different ways but let’s just copy this last name the first one I’m going to show you is strip and we’ll write it kind of like this we’ll say data frame and then we’ll specify the column that we’re working with because we don’t want to make these changes or strip all of these values from everywhere we only want to do it on just this column if we do this and we don’t specify the column name it will apply to everywhere so if we’re trying to do these yeah let’s say bum these underscores maybe that would mess with something else in another column and we don’t want that so we just want to specify just this last name so let’s go last name. 
str.strip. Now, what strip does (I was hitting Shift+Tab in here to see if it could bring up some of the notes on it, but no, we can’t) is take characters off the ends of the value: lstrip takes from the left side, rstrip takes from the right side, and strip takes from both, and we can specify which characters to remove. For what we’re doing in this column we can just use strip, because as you can see, this forward slash, these dots and this underscore are all on the far sides. If there were a value like Swan_son, strip wouldn’t work at all, because the underscore isn’t on the outside of the word. I’ll also show you how to use replace, which is another really good option for things like this, but let’s start with strip and see if we can get done what we need.
So let’s just run this for now and see what happens. It looks like nothing has changed, because we’re not specifying any value; by default it only takes out whitespace, like spaces that shouldn’t be there. Now, we can specify exactly which characters we want to take out, so let’s do that. Let’s say lstrip and try to take out these dots real quick: we just put the three dots, in quotes, inside the parentheses. Let’s run this and see what it looks like. For this one, Potter, they’re now gone: those three dots were there before, and when I ran it like this, they’re gone. That’s what lstrip does; it takes characters only off the left-hand side. We can also do a forward slash, like this, and it’ll clean up White, but as you can see, now we aren’t taking out those three dots, so they’re still there. Now, is it possible to do something like this,
where we put these values inside of a list? Let’s try it. We’ll write it just like this, with 1, 2, 3; let’s run it, and no, it doesn’t work. lstrip expects one plain string of characters rather than a list. It isn’t a regular expression, which, if you’ve ever worked with them, you know gets very complicated very quickly, so we can keep it simple, especially with these values where we’re just taking a few out. What we’re going to do is put in the three dots and take the characters out one by one.
Now, we want to save this, but we don’t just want to say df equals, because that would be very bad: that would say the whole DataFrame is now only equal to the values we’re seeing right here. We want to apply it only to this column, so we assign it to the column like this. Now when we do it and then call the entire DataFrame, it’s applied only to the one column, the last name column. Let’s run it, and when we go down to Potter right here, it’s cleaned up. We’re going to do the same thing for those other values: we’ll do a forward slash with lstrip, and then I’ll do lstrip on this underscore too, just to show you that it won’t work. It’s not removing it because we’re looking at the left-hand side only; we need to use rstrip. So now let’s use rstrip, and that looks perfect, with no underscore. That’s how you can use strip for either the left side, the right side, or strip by itself, which covers both sides.
Now, I showed you all of that because I’m going to show you a different way to do it, and I apologize, because I somewhat lied to you earlier. Let’s run this right here; actually, we’re just going to pull the data in again like this, remove the duplicates again (bear with me) and drop that column, and now we’re sitting with that DataFrame again, with those exact same mistakes.
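To make the one-side-at-a-time stripping above concrete, here is a minimal sketch. The sample names and the Last_Name column name are assumptions for illustration, not the tutorial’s actual data. As far as I know, the argument to these methods is treated as a set of characters to remove from the ends, not as a pattern or a list, which is why each call takes one plain string.

```python
import pandas as pd

# Hypothetical sample data; the column name "Last_Name" is an assumption.
df = pd.DataFrame(
    {"Last_Name": ["...Potter", "/White", "Schrute_", "  Flenderson ", "Halpert"]}
)

# With no argument, str.strip() removes only leading and trailing whitespace.
whitespace_only = df["Last_Name"].str.strip()

# lstrip() and rstrip() each work on one side only.
step1 = df["Last_Name"].str.lstrip(".")  # leading dots
step2 = step1.str.lstrip("/")            # leading forward slash
step3 = step2.str.rstrip("_")            # trailing underscore

# Assign back to the column (not to df itself) so the rest of the frame is kept.
df["Last_Name"] = step3.str.strip()
print(df["Last_Name"].tolist())
```

Assigning to df["Last_Name"] rather than to df is the important design choice here: it changes one column and leaves every other column untouched.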
Reset it for a second. There is a way that you can do this, and I just wanted to show you how. We’re again looking at just this column and using strip, and let’s get rid of the r, because we want to apply it to both sides. You can put all of those characters in together and it will clean them up. Let’s say we want to get rid of numbers, so we’ll do 123; then we can do the dot, which is for our three-dots Potter; we can also do the underscore and the forward slash. So we put it all in one string right here. Let’s get rid of this really quickly, and now let’s take a look: all of them were removed. I showed you the other way first because that’s how my mind would think about it: I’d think, oh, I can put it in a list and run it through lstrip or rstrip and it would work, but that’s not how strip works. You have to combine it all into one string. So yes, I deceived you, and I apologize. But now when we assign what we just did to the last name column and call the DataFrame, everything should look perfect, and it does. So our customer ID, first name and last name are all cleaned up.
Now we come to a much more difficult one. I said we were going to work our way up, but if I’m being honest this is probably the hardest one of the whole video: working with phone numbers. Look at all these different formats; it’s not going to be fun. And imagine there are 20,000 of these: you can’t just go and manually clean those up, you need something to automate it, so that’s what we’re going to do. Let’s go right down here, copy the DataFrame, and pull it in right here. So now we need to clean up this phone number. What we want is for it all to look exactly the same unless
it’s blank, and then we’ll keep it blank; we don’t want to populate that data. But we want all of them to look exactly like this one. Right off the bat, we’re going to take all of the non-numeric characters and completely get rid of them, stripping each value down to just the numbers. So this 123/643 one, with the forward slashes, will be just the numbers, and the same with these bars and these slashes; all of these will just be numeric. Then we’ll go back and reformat it how we want, which will look exactly like this one, but for the entire column.
So let’s go right up here, and we’re going to try replace for the first time. We’ll do df, a bracket, phone number, then .str.replace, just like we did before. Now we’re going to use some regular expressions in here; I’ll give a really high-level overview, but I’m not going to dive super deep into regular expressions. We do a parenthesis, and within there a bracket, then a caret (I think it’s called a caret; it’s the little upward arrow), then a-z, A-Z and 0-9, so [^a-zA-Z0-9]. At a super high level, that first character, the caret, says we’re going to match any character except the ones we then specify: anything a to z or A to Z, so upper or lower case, and 0 to 9. So values like a, b, c, 1, 2, 3 are not going to be matched; it’s going to match everything except those. Then we say comma and what to replace the matches with, which is nothing, so just an empty string. So literally, we’re taking everything that is not a letter or a number and replacing it with nothing.
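Here is a small sketch of the replace step just described, run on made-up phone numbers. The Phone_Number column name and the sample values are assumptions. I pass regex=True explicitly because recent pandas versions treat the pattern as a literal string unless you ask for a regular expression.

```python
import pandas as pd

# Hypothetical phone numbers in mixed formats; "Phone_Number" is an assumed name.
df = pd.DataFrame(
    {"Phone_Number": ["123-545-5421", "123/643/9775", "876|678|3469", "N/a", None]}
)

# [^a-zA-Z0-9] matches any single character that is NOT a letter or a digit.
# Every match is replaced with an empty string, i.e. removed.
df["Phone_Number"] = df["Phone_Number"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(df["Phone_Number"].tolist())
```

Note that letters are deliberately kept by this pattern, so a placeholder such as N/a survives as Na and still has to be dealt with later, and missing values stay missing.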
Run this and see what it looks like, and it looks like that worked properly. We do have this Na, because we had an N/a for (I don’t remember, maybe that was Creed Bratton), but it worked for basically everything else. We’re going to go through the entire process, and at the end we’ll remove values like that; we want them to be completely blank. We don’t want the callers to see NaN and wonder what that is, so we’ll handle that at the very end. Now that we know it worked, let’s assign it: we’ll say df, phone number, is equal to that expression, and then call the DataFrame. This already looks a lot more standardized than it did before.
Now what we want to do is format this. I’ve done this many, many times, and I always use a lambda. You can definitely use a for loop; I just don’t do it that way myself, so I’m going to show you how to do it using a lambda. Let’s get rid of this (we’ve already assigned the phone number column). Now we’re going to say df, phone number, then .apply with an open parenthesis, and this is where we build out our lambda, so we’ll say lambda x followed by a colon. This is where we format it: I want to take the first three characters, 1 2 3, then add a dash, then the next three characters, add another dash, and then the rest, and have that be the value that’s returned. It’s not super difficult. We do x and then a bracket with 0:3, so it goes 0, 1, 2; it doesn’t include the 3, it goes up to 3. So 0, 1, 2 gives our first three values. Then we do plus, a quote and a dash; that’s our first sequence, and I’m just going to copy it. We do plus again, and this time we start at 3, because the start is inclusive, so we go from 3 all the way up to
6, so it should be 3, 4, 5, our next three values. Then we have a dash, and we’ll copy this, say plus, and now go from 6 all the way to 10. Now let’s try running this, and as you can see we get an error. I already know what the error is: “'float' object is not subscriptable”, which means we’re trying to index into a value like it’s a string when it’s actually a number. Let me get rid of this for a second and show you what it’s talking about. Right now we have values that are floats, values that are strings, and values that are NaN, and if we want to index into them like this, they all have to be strings. So we need to change this entire column into strings before we can apply the formatting.
If I’m being honest, when I was creating this my first thought was to do it like this: str of df phone number. Let’s just run that; this is what the values look like. I can’t remember why it was doing this, but I looked into it quite a bit and realized I need to apply the string conversion to each value, not to the entire column at once. How we do that is actually fairly easy, because we’ve already done a lot of the heavy lifting: we’re just going to copy this and say str(x). Again, a lambda is like a little anonymous function. You could do this with a for loop, saying for x in this column, x equals str(x), and that changes each one to a string, but a lambda just does it a lot quicker. So let’s do that really quickly, and all of our values look exactly the same, and that’s how we want it. We’re just going to copy this and assign it. Good. Now we’re going to take this and run it again; just ignore all my commented-out stuff.
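The string conversion and the slicing lambda described above might look like the sketch below. The column name and the sample digits are assumptions; the missing entry is a float NaN, which is the kind of value that triggers the “not subscriptable” error if you slice before converting.

```python
import pandas as pd

# Hypothetical digits-only column with one missing value (a float NaN).
df = pd.DataFrame({"Phone_Number": ["1235455421", "1236439775", float("nan")]})

# Slicing a float raises TypeError, so convert every value to a string first.
df["Phone_Number"] = df["Phone_Number"].apply(lambda x: str(x))

# Take characters 0-2, 3-5 and 6-9 (slice ends are exclusive) and join with dashes.
df["Phone_Number"] = df["Phone_Number"].apply(
    lambda x: x[0:3] + "-" + x[3:6] + "-" + x[6:10]
)

print(df["Phone_Number"].tolist())
```

Because the missing value becomes the text nan before slicing, it comes out as nan-- rather than staying empty, so placeholders like that still need to be replaced afterwards.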
Pretend I don’t have that. So now when we run this it should work. There we go. If we look at these numbers, we get 123-545-5421, and it does that for every single value, even where there’s a NaN or an Na; it’s still adding the dashes to those, but we expected that. So let’s apply it by assigning it back, and then look at the DataFrame. This looks almost exactly like what we’re hoping for; we just need to get rid of these, the nan-- and the Na--, and that is super easy to do. Now that we’ve done it, I’ll comment that out, say df, and copy this. Ignore the messiness; I do apologize for that, it’s very messy, but if you’re following along with me you get what we’re doing. So df, phone number, so only on the phone number column, then .str.replace and a parenthesis. Now we can specify the value: we want to take this exact value and replace it with nothing. Let’s see if that works. It does. Now we still have these Na values, so I’ll paste that right down here, assign the result back, take this entire expression again, put this other value in as what we’re looking for and replace it too. When we call the DataFrame, it should work properly, and it is perfectly cleaned. So we have every single value in exactly the same format, with no different characters or formatting, and we got rid of the ones that were just random values. This column is now completely cleaned up.
Again, this is definitely one of the more difficult ones, and one that I’ve done a thousand times; I’ve had to work with a lot of phone numbers. It does get very tricky, especially if you have something like a +1, which is a country code. But at a high level this is how you can do it, and it’s pretty neat how you can actually clean up and
standardize those phone numbers. So let’s go right down here and run it. The next thing we’re going to look at is this address. Let’s pretend that the people in the call center want this separated into three different columns, so they can read it more easily, see what the zip code is or where the person lives, whatever they want it for. For this use case it may not make much sense, but I do this all the time; you will need to split columns. Luckily, all of these parts are separated by a comma, so we can split on the comma and create three separate columns from this one column, which is exactly what we want, and then we can name them as well. We can do that very easily using split. So we’re going to say df (oh jeez, not again) and specify that we’re looking at the address, then we’re going to say .str.
split with an open parenthesis. The very first value we need to specify is what we’re splitting on; we want to split on the comma, so we specify that. Then we need to specify how many splits it should make, going from left to right. We’ll start with 1 and go from there. Let’s see what this looks like. It doesn’t really look like it did anything. Let’s do 2; well, let’s go back to 1 and also say expand=True. When we expand it, it’s actually going to separate the pieces into columns, I believe. Okay, so now we’re expanding, but we’re only splitting on the very first comma, and in one of these rows there is an additional comma, so we should go up to 2. Let’s do this. Okay, now we have three columns. If we just save it like this, it’s going to give us 0, 1, 2, basically index values as the names of these columns, and we don’t want that; we want to specify what these actually are. We can do that by saying df, a bracket, and then within there a list of the three names, and setting that equal to the split. The first one is the street address. The next one (Shire is not a state, but the rest are) I’m just going to call State, and the very last one looks like a zip code, so we’ll say Zip_Code; in fact, I also want the first to be Street_Address, with an underscore. What this does is apply these three names to the three columns, and they’re basically appended. It doesn’t replace the address; we’re not saying df address equals df address, we’re creating new columns. So let’s run it and then call the DataFrame. They’re right over here on the right-hand side; I couldn’t see them at first, but it did exactly what we needed it to do. At the very end there’s one more thing we could do here if we want to.
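The split described above could be sketched as follows. The addresses are made up, and the new column names follow the ones mentioned in the walkthrough. One thing this sketch adds that the walkthrough does not mention: splitting on a bare comma leaves a leading space on the later pieces, so the pieces are stripped before being assigned.

```python
import pandas as pd

# Hypothetical addresses with zero, one, and two commas.
df = pd.DataFrame(
    {
        "Address": [
            "123 Shire Lane, Shire",
            "93 West Main Street",
            "2394 Hogwarts Avenue, Pennsylvania, 18503",
        ]
    }
)

# n=2 means at most two splits (three pieces); expand=True returns the pieces
# as separate columns instead of one column of lists.
parts = df["Address"].str.split(",", n=2, expand=True)

# Remove the space left behind after each comma (an addition in this sketch).
parts = parts.apply(lambda col: col.str.strip())

# Assigning to a list of new names appends three columns; "Address" is kept.
df[["Street_Address", "State", "Zip_Code"]] = parts

print(df)
```

Rows with fewer than three pieces simply get missing values in the columns they cannot fill.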
We’re not going to, but we could just delete this address column and keep the street address, the state and the zip code. Splitting like this is another really common thing to do; it happens often with first name and last name, where you’ll have Alex Freberg but it’s stored as Alex, comma, Freberg or Alex, space, Freberg, and you can separate those out into different columns.
The next one we want to look at is this paying customer column. Paying customer and do not contact are very similar, in that both are a mix of Yes, No, N and Y. So let’s go right on down here. We want to replace these values so they’re all yeses or all nos in the same format, just to keep it consistent. Let’s make anything that’s an N into a No and anything that’s a Y into a Yes; I like it spelled out. Or let’s change anything that’s a Yes into a Y and anything that’s a No into an N; that’s usually how I do it, since it saves on data because the strings are shorter, although the saving is often very minimal. So let’s specify the paying customer column: we say df, a bracket, paying customer, and then we’ll do .str.
replace. Now we’re just going to look for those specific values: if it’s a Y (oops, a capital Y), then we’ll say Yes. Let’s run it, and now we have no more Ys, just Yeses, although the ones that already said Yes now read Yeses. Okay, we don’t want that. It’s literally looking at each value and saying, here’s a Y, let’s change that Y into a Yes, so Yes turns into Yeses. So instead let’s look for the Yes and change it into a Y. When we run this, that looks a lot better. So we’ll set df paying customer equal to this, and then we’ll copy it and do the exact same thing with No and N. Let’s call it, and now that entire column looks really good, except for that value right there, but I’m going to leave it, because I’m going to get rid of those across the entire DataFrame all at once at the end instead of going column by column. Do not contact is literally going to be the exact same thing, so I’m not even going to scroll down (whoops), I’m just going to put it right up here, because this is the exact same thing and it saves us all some time. When we run this, it looks exactly like what we’re looking for. Again, there are some NaN values, but we can get rid of those in just a second by doing a replace over the entire DataFrame. That is basically the end of cleaning up individual columns. Now let’s go right down here; we’re going to try df.str.replace.
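The Yes/No cleanup above, including the Yeses mistake, can be reproduced in a few lines. The column name and values here are assumptions for illustration. The last line shows a whole-value alternative using Series.replace with a dictionary; that is my own addition, not something the walkthrough uses.

```python
import pandas as pd

# Hypothetical column mixing long and short forms.
df = pd.DataFrame({"Paying Customer": ["Yes", "No", "Y", "N", "Yes"]})
original = df["Paying Customer"].copy()

# str.replace works on substrings, so "Y" -> "Yes" also rewrites the Y in "Yes".
pitfall = original.str.replace("Y", "Yes", regex=False)

# Going from the long form to the short form avoids that problem.
col = original.str.replace("Yes", "Y", regex=False)
col = col.str.replace("No", "N", regex=False)
df["Paying Customer"] = col

# Alternative (not from the walkthrough): Series.replace matches whole values.
whole_value = original.replace({"Yes": "Y", "No": "N"})

print(pitfall.tolist())
print(df["Paying Customer"].tolist())
```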
With replace, we’ll first do these values (oops, let me do that, there we go) and replace them with nothing. Let’s just see what it looks like. Oops: “'DataFrame' object has no attribute 'str'”. Well, that’s because we were looking at columns before; I think I just need to get rid of this str, since we’re doing this across the entire DataFrame. Now let’s try that. Okay, that worked appropriately, so we’ll say df is equal to that, and then we’ll copy this and do the NaN as well. Now when we do this, it is not going to replace these, because these aren’t actually string values, and we’re looking for a string. I completely forgot this, I’m not going to lie to you. Let’s get rid of this. Those values are literally not a number; technically they’re empty. What we can do is df.fillna: we’re going to fill these values if there’s nothing in them, and we’re going to fill them with a blank, an empty string. When we run that, every value that doesn’t have something in it shows up blank, even over here where we only had a few; throughout the DataFrame, if a cell doesn’t have a value, it is now blank. So let’s apply that and run this, and now all of our cleaning of the individual columns is completely done. We’ve removed columns, we’ve split columns, we’ve formatted and cleaned up phone numbers, we’ve stripped characters off the last name column, and we’ve standardized paying customer and do not contact.
Now, they also asked us to only give them a list of phone numbers that they can call. If we take a look, some of these do not contact values are Y, which means we cannot contact them, and then there are some rows that don’t even have phone numbers. So we don’t want to give the call center numbers they can’t call, or people who
don’t have numbers, so we want to remove those. There are a few different ways we can do this, but let’s start with do not contact; it seems like the most obvious one. If it’s blank, we still want to give them a call; we only want to not call them if they’ve specifically said we cannot call them. So if it’s Y, we’re not going to call them. What we need to do isn’t anything like what we’ve done so far: we probably need to loop through this column, look at each row that has this value, and drop that entire row, and we’ll probably need to do that based on the index instead of just the column. That may not make sense yet, so let’s start writing it. We’ll do for x in, and we need to look at our index, so in df.index, then a colon and Enter. How do we look at a row by its index? We use loc, so that’s df.loc, and then we give it the index value, which is this x right here. Each time through the loop it’s looking at a row, but we only want to look at the value in this one column, do not contact (I don’t know if I copied this before, so let me copy it). If we didn’t specify the column, it would look at other values too, and we don’t want that. So we’re looking at just that value, and if it is equal to Y, then we want to drop the row. So we actually need to say if: if the value at x in this column is equal to Y, then we do df.drop and pass x. I think we have to say inplace=True here, otherwise it won’t take effect; otherwise we’d have to say df is equal to df.drop, and I don’t want to start messing with that, so let’s just do inplace=True and see if that works. I can’t remember if this is going to work or not. Invalid syntax; okay, we need a colon. Now let’s try to run this. Okay, yeah, if we look at our index we can already tell that there
are ones missing: the 1 is missing, the 3 is missing, and, let’s see, the 18 is missing. So we already got rid of those rows, and you can see that there are no Ys in here anymore, which is really good. If we want to, and we probably should, we can populate the blanks. Let me just go up here really quickly and copy this. I didn’t plan on doing this, but if it’s blank (oops), give it an N, and we assign it to do not contact (whoops). Let’s see if that works; we probably need to do .str. So if it’s blank... okay, I don’t know why it’s giving us a triple N. Maybe I need to strip this or something. Okay, never mind, let’s not do that.
Now we basically need to do the exact same thing for the phone number, because if it’s blank we don’t want them calling it. So we can copy this entire thing, go right down here, and now look at phone number. We’re looking just at the values within phone number, and we only care whether it’s blank: if it literally has no value, we want to get rid of the row. Let’s run this and see if it works; it should. Good, and now our list is getting much smaller. You can see in our index that a lot of those rows were removed. And okay, good, this actually worked itself out, because the remaining rows all have Ns. So right now we’re sitting really good: everything looks standardized and cleaned, everything looks great. I might drop this address column, and you can drop it if you want to, but besides that this is all looking really good. In the paying customer column the yeses and nos don’t really matter for this. Now, before we hand this off to the client or the call center, we probably should reset this index, because they might be confused as to why there are numbers missing, or they might use this index to show how many people they’ve
called, or, I don’t know, something like that. So let’s go right down here and say df.reset_index, and let’s just see what this looks like. It does work, but as you can tell it didn’t get rid of the old index completely; it actually took the original index and saved it as a column. We don’t need to save that (whoops, let’s put it right in here), so we’re just going to add drop=True. When we do that, it completely resets it: it drops the original index and gives us a new one, and that is what we want. Let’s do df equals that, and this is our final product.
Now, one thing that you definitely could have done here (I probably made this a little more complicated than it needed to be; that was just how my brain was working at the time I was typing this out): we could have used df.dropna, which looks for null values. We couldn’t do that for do not contact, because there we weren’t looking for nulls, we were looking for Ys. But where we are looking for null values, we could have done dropna with subset equal to just this phone number column and inplace=True, or assigned it back with df equals. I mean, I can run it now, but it’s not going to do anything; I could run it on a different column, but that would mess everything up. This is another way you can do it, and I’ll save it in case you want it, with a comment: another way to drop null values. There you go; that’ll be a note for us in the future.
But this is our final product. It looks a lot different than when we first started: we had mistakes here, completely different formatting in the phone numbers, a different address layout, everything that we just talked about, and this looks a lot, lot better. You can tell why it’s really important to do this process, and again, we’re working on a very small data set.
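As a recap of the row-removal steps above, here is a minimal sketch on made-up data (the names, numbers and column names are assumptions). It shows the index loop from the walkthrough, then a boolean-mask filter as a more compact alternative that I am adding for comparison, then reset_index and the dropna variant.

```python
import pandas as pd

# Hypothetical cleaned data: blanks are empty strings, as after fillna("").
df = pd.DataFrame(
    {
        "First_Name": ["Frodo", "Abed", "Walter", "Dwight", "Jon"],
        "Phone_Number": ["123-545-5421", "", "706-695-0392", "123-543-2345", ""],
        "Do_Not_Contact": ["N", "Y", "", "Y", "N"],
    }
)

# Loop version from the walkthrough: drop every row flagged "Y".
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)

# Alternative (not from the walkthrough): keep rows whose phone is not blank.
df = df[df["Phone_Number"] != ""]

# drop=True discards the old index instead of saving it as a new column.
df = df.reset_index(drop=True)

# dropna only helps when the gaps are real nulls rather than empty strings.
with_nulls = pd.DataFrame({"Phone_Number": ["123-545-5421", None]})
with_nulls = with_nulls.dropna(subset=["Phone_Number"])

print(df)
```

The mask version avoids changing the frame while looping over it, which tends to be easier to reason about on larger data, but both approaches end up with the same rows here.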
I purposely created this data set with these mistakes, because when you’re looking at data that has tens of thousands, hundreds of thousands or a million rows, these are all things that you’ll be applying at a much larger scale, and you won’t be able to see the problems as easily. You’ll have to do some exploratory data analysis to find these mistakes, and then you’ll need to clean the data, or you’ll do both at the same time and clean it up as you go. But these are a lot of the ways that I clean data, and a lot of the things you can do to make your data more standardized and visually better, which really helps later on with visualizations and your actual data analysis.
Hello everybody. Today we’re going to be looking at exploratory data analysis using pandas. Exploratory data analysis, or EDA for short, is basically just the first look at your data. During this process we’ll look at identifying patterns within the data, understanding the relationships between the features, and looking at outliers that may exist within your data set. You’re looking for patterns, but you’re also looking for mistakes and missing values that you’ll need to clean up during your cleaning process later on. There are hundreds of ways to perform EDA on your data set, and we can’t possibly look at every single one, so I’m just going to show you what I think are some of the most popular and the best things you can do when you’re first looking at a data set.
The first thing we’re going to do is import our libraries. We’ll do import pandas as pd, and we’re also going to import seaborn and matplotlib. During the exploratory data analysis process I often like to visualize things as I go, because sometimes you just can’t fully comprehend the data unless you visualize it, and it gives you a larger, broader glimpse of everything. So we’re going to import, let’s do, seaborn
(oops) as sns, and then we’ll import matplotlib.pyplot as plt. Let’s run this; this should work. Okay, perfect. Now we need to bring in our data set. We’ve worked with the world population data set before, and that is the exact one we’re going to use now. So we’ll say df equals pd.read_csv, an r for a raw string, and paste in the path to our CSV. This is what it should look like, although your path may be different, so make sure you have the correct file path. Then we’ll read it in. This data set should look extremely familiar if you’ve done some of my previous pandas tutorials, but I did make some alterations to this one: I took out a little bit of data and put in a little bit of data here and there to change things up. I got this data set from Kaggle, and if it were exactly how I pulled it, like we’ve looked at in the previous videos, it would be too simple; we wouldn’t be able to do some of the things I’d like to show you. So be sure to download this exact data set for this video, because it is a little bit different.
What we’re going to do now is try to get some high-level information from this. If yours looks a little bit different, like your values are in scientific notation (I have applied this setting so many times that I think it’s still applied here), you can do the following, and we’ll write it right down here: we’re going to do pd.set_option with an open parenthesis, and we’ll say display.
flator format and so we’re going to change that float format by just saying Lambda X colon and then we’re going to change basically how many um decimal points we’re looking at so let’s just do here so we do a quote sign 2f so we’re formatting it whoops 2f so we’re going to format it and we’ll do percent X this is going to format it appropriately I’m I can run it um and actually it will change it CU this is at0 one I believe last time I did it so let’s run this and then let’s run this again n it’ll change it to0 2 so that’s two I like it at 0.1 we don’t really need it any well let’s keep it at0 2 why not we’re going to keep it at0 two that’s how you change that and I like looking at it like this a lot better than scientific notation so just something to point out um let’s go down here and let’s just pull up data frame so we have this data one of the first things that I like to do when I get a data set is to just look at the info so we’re going to do do info and this gives gives us just some really high level information this is how many columns we have here are the column names here are how many uh values we have and if you notice this is where it kind of gets so we have 234 in each of these so in each of these columns we have 234 until we get to this 2022 population once we get there we start losing some values and then at the world population percentage we have all of our values all 234 of them the count tells us that it’s nonnull so it does have values in it and then we also have the data types and these come in handy later um and these are really great to know and we’ll be able to kind of use those in a few different ways later on in this tutorial really quickly I wanted to give a huge shout out to the sponsor of this entire Panda series and that is udemy udemy has some of the best courses at the best prices and it is no exception when it comes to pandas courses if you want to master pandas this is the course that I would recommend it’s going to teach you just 
about everything you need to know about pandas. So huge shout-out to Udemy for sponsoring this pandas series, and let’s get back to the video.
The next thing that I really like to do is df.describe(). This gives you a high-level overview of all of your columns very quickly: you can get the count, the mean, the standard deviation, the minimum and maximum values, as well as the 25th, 50th, and 75th percentiles of your values. So just at a super quick glance, there is a row somewhere in here where the country’s population is 510 for 2022, and in fact if you go back to 1970 it was higher, at 752. That’s just interesting. Then if we look at the max population, one has 1.42 billion, I believe that’s China, and over here in 1970 we have 822 million. Again, I still believe that’s China. This gives you a really nice high-level view of all these values, all these different calculations that you can run, and we can run all of these individually on specific columns, but this is just a nice high-level overview.
One thing that we just talked about was the null values that we’re seeing in here. I’d like to see how many values we’re actually missing, because that is a problem. We don’t want to have too many missing values; that could really obscure or change the data set entirely. So we’ll say df.isnull(), with the parentheses, and then .sum(). When we do this, it gives us all the columns and how many values we’re missing in each. Now, we have 234 rows of data, and we have 41477 55424, so we definitely have data missing. What we choose to do with it in the data cleaning process... maybe we want to populate it with a median value, maybe we just want to delete those countries entirely if the data is missing (I don’t think you’re going to do that), but these are things that you need to think about when you’re actually finding these
missing values. This is what the EDA process is all about: we want to find outliers, missing values, things that are wrong with the data, or we can find insights into it while we’re doing this as well. So this is definitely something that I would consider when I’m actually going through that data cleaning process. Really important information to know.
Now let’s go right down here to our next cell and say df.unique(), and it’s actually nunique, df.nunique(). This is going to show us how many unique values are in each of these columns. This one makes the most sense for continent, because I think there’s only seven continents, right? But we have six right here. For rank, country, and capital, these should all be unique, and that makes perfect sense. As for the populations, these are such specific and such large numbers that I would be shocked if any of them were the same. For the world population percentage it’s much lower, and again that makes a lot of sense, because when we’re looking at these world population percentages (we’ll pull it up right here), a lot of them are really low: 0.00, 0.01, like this one, 0.2. There are a lot of really low values for those small countries, and so those all count as one unique value.
Now let’s say we have this data right here and we want to take a look at some of the largest countries. We could just say max and take a look at the largest country, but I want to be a little bit more strategic. I want to look at the top range of countries, and we can do that based off this 2022 population. So we’ll say df.sort_values. This is how we sort (not filter, but order) our data. So we’ll do sort_values, then by= and we’ll specify that we want this 2022 population, and then we’re going to say comma and
we’ll say... actually, let’s just run this as is, but we’ll do .head() because we just want to look at the top values. So what we’re looking at is this 2022 population, that’s what we’re sorting on, and we’re looking at the very bottom values, because it’s sorting ascending, from lowest to highest. So this Vatican City in Europe is 510; that’s the value that we were looking at earlier. Now we can do comma, ascending=False, because it was True by default, and then it’ll give us the very largest ones. If we take a look at the top five largest by population, we’re looking at China, India, United States, Indonesia, and Pakistan. We can even specify that we want the top 10 in this head, and then we also have Nigeria, Brazil, Bangladesh, Russia, and Mexico. You can do this for literally any of these columns. Whether you want to look at continent, capital, or country, you can sort on these and look at them, and you can even look at things like growth rate or world percentage. This one seems really interesting, so let’s look at it really quick before we move on to the next thing. If we look at this world percentage, just China alone, I believe... yep, just China alone is 17.88% of the world. Again, just getting in here and looking around, that’s all we’re really doing.
Now I want to look at something that I have always liked doing, which is looking at correlations, usually only between numeric values. We can do that by saying df.corr(), and we’ll run this. What this does is compare every column to every other column and look at how closely correlated they are. So this 2022 population, if we look across the board, is very highly... I mean, this is one-to-one, this is highly correlated, and that’s true for almost all of these populations. They’re very, very closely
tied to each other, which makes perfect sense, because for most countries they’re going to be steadily increasing, and so they’re probably almost exactly correlated. But we can look at these populations against the area, and it’s only somewhat correlated. That’s because some countries have a very high population but a small area, or vice versa, a large area and a small population, so there isn’t a one-to-one correlation there.
But it’s hard to just glance at this and understand everything that’s there. We could just visualize it and it would be a lot easier, so let’s go ahead and do that. Let’s go down here; we’re going to visualize this using a heat map. We’re going to say sns.heatmap, open parenthesis, and the data that we’re going to be looking at is df.corr(), and then we also want to say annot=True. I’ll show you what that looks like in just a little bit. Then let’s do plt.show(), and this will be our first look. I need to say show, not shot. We can get a little glimpse of what it looks like, but this looks absolutely terrible. Let’s change the figure size really quickly, because I want to make this much larger than it already is. We’ll do plt.rcParams, and right here in quotes we’re going to do 'figure.figsize'. This actually needs to be in brackets, I believe, just like this, not parentheses. Then we’ll say that is equal to, and now we can specify the value that we want. Let’s do 10, 7 and see if this looks any better. No, that doesn’t look good. Let’s do 20. Okay, that looks a lot better.
This is just a quick way to look at it, because it gives you basically a color-coded system: highly correlated is this tan, all the way down to basically no correlation, or even negative correlation, which is black. So when we’re looking at these 2022 populations (these are populations right down here on this axis), we can see very quickly that all of these are extremely highly correlated, whereas the rank is negatively correlated; it doesn’t really have anything to do with it. Then for the population and the world population percentage, it again is quite correlated, except for the area, density, and growth rate. I find it really interesting that the density, the growth rate, and the area aren’t all that correlated with the population numbers. I kind of would have assumed that on some level they went hand in hand. For the area it would, again, make sense: larger area, larger population, that kind of thing. Growth rate I can see, because that’s a percentile thing that could definitely be not correlated, but I thought the density would be more correlated than it is. All that to say, this is one way that you can look at your data and see how correlated the columns are to one another. That can definitely help you know what to analyze and look at later when you’re actually doing your data analysis.
Let’s go right down here. Something that I do almost all the time when I’m doing any type of exploratory data analysis like this is group together columns and start looking at the data a little bit closer. So let’s go ahead and group on the continent.
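The correlation and heat map steps described above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook from the video: the tiny data frame, its column names, and the helper plot_corr_heatmap are all assumptions made up for illustration. The numeric columns are selected before calling corr(), because recent pandas versions may raise an error when text columns are included.

```python
import pandas as pd

# Toy stand-in for the world population data. The column names and
# values are made up for illustration; they are not the real file.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["Asia", "Asia", "Europe", "Africa", "Oceania"],
    "2022 Population": [1400.0, 1300.0, 80.0, 200.0, 5.0],
    "1970 Population": [800.0, 550.0, 75.0, 55.0, 3.0],
    "Area": [9.6, 3.3, 0.36, 0.92, 0.27],
})

# Keep only the numeric columns, then compare every column with
# every other column.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))


def plot_corr_heatmap(corr_matrix, figsize=(20, 7)):
    """Hypothetical helper: draw an annotated heat map of a correlation matrix."""
    # Imported inside the function so the numeric part above still
    # runs when the plotting libraries are not installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.rcParams["figure.figsize"] = figsize  # square brackets, not parentheses
    sns.heatmap(corr_matrix, annot=True)      # annot=True writes the value in each cell
    plt.show()
```

Calling plot_corr_heatmap(corr) should draw the color-coded grid, assuming matplotlib and seaborn are available.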
Let’s look at it right here. Let’s group on this continent. Sometimes when you’re doing this EDA you already know what the end goal of this data set is: you know what you’re looking for and what you’re going to visualize at the end, and that really comes in handy. But sometimes you don’t; sometimes you’re just going in blind. So far we’ve really just been going in blind. We’re throwing things at the wind, seeing some overviews, looking at correlation, that’s all we’ve done. Now I want to get more specific. I want to have a use case, something to look for. Not doing full data analysis, not diving into the depths, but something we can aim for. So the use case, or the question, for us is: are there certain continents that have grown faster than others, and in which ways? We want to focus on these continents, so we know that’s the most important column for this use case, this very fake use case. We can group on continent and look at these populations right here. We can’t really see growth any other way: you can see a growth rate, but for that and for the density per kilometer we don’t have multiple values, it’s just one static value. Same for world population percentage. But the populations we have over a long span, 50 years of data, so with these we can see which countries, or which continents, have really done well. So without talking about it even more, let’s do df.groupby, and then we’ll say Continent (oops, let me just copy this, I’m not good at spelling), and then we’ll do .mean(). We can do it just like this, and now we have Africa, Asia, Europe, North America, Oceania, and South America. Okay, so if I’m being completely honest, I knew most of these. I’m no geography expert, but I knew most of these. I don’t know what this Oceania is. I genuinely don’t know what that is.
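A minimal sketch of the grouping step, using a made-up data frame whose column names and values are assumptions rather than the real data set. One caveat worth labeling: in recent pandas versions, calling .mean() on a grouped frame that still contains text columns raises an error, so the sketch passes numeric_only=True. It also adds a sort on the averaged population so the largest continent comes first.

```python
import pandas as pd

# Made-up rows for illustration only; not the real data set.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Continent": ["Asia", "Asia", "Europe", "Europe", "Oceania", "Oceania"],
    "2022 Population": [1400.0, 1300.0, 80.0, 60.0, 5.0, 1.0],
    "1970 Population": [800.0, 550.0, 75.0, 50.0, 3.0, 0.5],
})

# Average every numeric column per continent. numeric_only=True
# leaves out the text column "Country".
by_continent = df.groupby("Continent").mean(numeric_only=True)

# Order the continents from highest to lowest average 2022 population.
by_continent = by_continent.sort_values(by="2022 Population", ascending=False)
print(by_continent)
```

The result has one row per continent, with the continent name as the index.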
So let’s just search for that value and see. We’ll come back up here in just a second, but I want to understand what this is. So we’re going to do df, and in brackets df['Continent'] (let me sound that out for you guys), then .str.contains (oops, contains, good night), and I want to look for Oceania. Let’s run this. Oh, I need to do it like this. Now let’s run this. So now we’re looking at our data frame and seeing which rows have this continent as Oceania. Okay, so these look like islands, I’m guessing. We have Fiji, Guam, New Zealand, Papua New Guinea. Yeah, these all look like islands, I’m guessing based off the continent. Oceana... Oceania... guys, this is tough for me, okay? I’m doing my best. This is part of the EDA process: I don’t know what that means. Oceana, Oceania, geez. I’m just going to call it Oceana. That’s so wrong, but it’s so easy for me to say. So I’m now seeing this, and it looks like islands, which would make sense, because for their average they have the highest average rank, and I’m guessing that’s because they’re mostly small countries.
So let’s order this really quickly. We’re going to do .sort_values, open parenthesis, and I want to sort on the population, the average population, so we’ll do by= with the population column, and we’ll do ascending=False. So when we’re looking at this average, or mean, population, Asia has the highest population on average, then we have South America, Africa, Europe, North America, and then Oceana at the very bottom, which makes perfect sense: again, small islands. For world population percentage, each of those countries in Asia makes up about 1% on average. Really interesting to know and to look at. And the density in Asia is far higher, almost double every other continent. Really, really interesting, actually, now that
I’m looking at this. But that’s something that I would actually look into. I would be like, what is this Oceana, or Oceania, what does that mean? Let me look into that and explore it more, because I’m trying to really understand this data set well.
What I want to do now is visualize this, because just looking at the numbers it’s hard to picture. Again, the use case is: which continent has grown the fastest? It could be percentage-wise, it could be as a whole on average. Let’s take a look. So we’re going to take this, copy it, and bring it right down here. Let’s try to visualize this. Let’s do df2 is equal to this, because I already know it’s not going to look good based off how the data’s sitting. (Oops, what am I doing? I don’t need to do that, but I will.) Okay, df2, and we’ll do df2.plot() and run it just like this. As you can see, we have Asia, South America, Africa, Europe, North America, Oceana, and we can kind of understand what’s happening, but these are the actual values that are being visualized, not the continents, which is what I wanted. It’s actually pretty easy to switch it, and this is something that is good to know: we can transpose it so that these continents become the columns and the columns become the index. All we have to do is say df2.transpose(), with the parentheses. Let’s just look at it, and then we’ll save it. So now all these columns are down the side, and all of the index values are the columns. We’ll say df3 is equal to that, and I’m just doing that so I don’t write over df or my earlier data frames. So now we have this data frame three. Now let’s do df3.plot(), and it should look quite a bit different. Whoops, I didn’t run this. Let’s run this, and run this, and as you can see, this does not look right at all. The reason is that we’re not only looking at the correct columns. We have this density in here, world population percentage, rank. We don’t need any of those. The only ones that we want to keep are these population columns.
We can go right up here, where we created that data frame two that we transposed, and we can specify that we only want specific columns. Now, we can go through and hand-write all of these, and by all means go for it, but I am going to go down here and say df.columns and run this. It’s going to give us a list of all of our columns, and you can just copy this and put it right in here. I need a list, so I think it needs to be like this. Let me try running this. Okay, so this worked properly. You can do it just like this, or, if you want a little shortcut (I would hope you would), you would do df.columns, just like how we looked at it down here, except since this is an index, we can index into it. So we can just say 0, 1, 2... okay, so we can do five up to 13, so I think it’s seven... let’s see if this works. It may not; I may actually need to go like this. Let’s see. There we go. So you can just use the indexing to save you some visual space, and it gives you the exact same output. So now we have this, this is our df2. Now let’s go down and transpose it, so now we just have these populations, and we have our continents right here. Now we’re going to plot it, and this looks good, although it’s backward. So what I actually want to do is not this. That is a quick way to do it, although not the best way to do it. So I’m actually going to copy all of these, and although I said
it would save us time, it did not at all. So I’m going to put a bracket right here, paste this in here, and I’m literally going to change the order of these. I might speed this up, or I might just have you sit through this, because this is an interesting part of the process and I want you to get the full experience. You know what, now that I’m talking about it, that is what we’re going to do. You guys can hang out with me, this is a good time. We have 2010, 2015, 2020, and 2022. Now let’s run it. What did I do? Oh, too many brackets. There we go. So now it’s ordered appropriately: we have 1970 all the way up to 2022. This is how we want it. Let’s transpose it, let’s run it, and now we basically have the inverted image of this.
Now, just at a glance (and we haven’t done anything to this except change what we are looking at), we can see that from 1970, China, you know, Asia with China, is already in the lead by quite a bit, and it continues to drastically go up, especially in the 2000s. Right here it explodes, just straight up, and then it keeps going up and starts leveling off. Every other continent, especially Oceania, is just really low; it has never done a bunch. Let’s look at green: green has gone up from, let’s say, 0.1 up to about 0.2, so they’ve almost doubled in the last 50 years. Again, you can get a high-level overview of each of these continents over this span of time. So this is one way that we can look at that use case. We’re not going to harp on that too long; I just want to give you an example. When you’re looking at this, sometimes you’ll have something in mind that you’re looking for, and sometimes you go exploring and just find what’s out there.
The next thing I want to look at is a box plot. Now, I personally love box plots. They’re really good for finding outliers, and there’s a lot of outliers. I already know
this because the average and the 25th and 50th percentiles are very low, and then there are some really big outliers. For your data set it may not be that way, and those outliers may be something that you really need to look into. Box plots are something that I’ve used a lot, where I found outliers that way, started to dig into the data, and came across some stuff where I’m like, oh, I have to clean this up, I have to go back to the source. It’s really powerful and useful to be able to find these. All you have to do is df.boxplot(), and let’s take a look at it. This already looks good as is. Maybe I’ll make it a little bit wider. Let’s do figsize (oops, sorry), figsize is equal to, let’s try 20 by 10. Okay, that didn’t help at all. I apologize, I thought it would, but let’s keep going.
What this is showing us is that these little boxes down here, which are usually much larger because you have a more equal distribution of values, are where our averages lie. This line right here is the upper range, and all these open circles stand for outliers. So looking at the 2022 population, there’s a lot of outliers. Now, knowing your data set is really important. For our data set, outliers are to be expected, especially when most countries are small. All of these little dots are outlier values, and each value corresponds to a country. If this was a different data set, I would be searching on these and trying to find them, so that I can see what’s wrong with them, if anything, or whether they are real numbers. If this was revenue, and everyone’s revenue is way down here, and then there’s one company that’s making like 10 trillion dollars, that would be an outlier up here, and it would definitely be something that you want to look into. For our data set, knowing that, you know, we’re
looking at population, this is more than acceptable, oddly enough. But that’s what box plots are really good for: showing you some of those quartiles, the upper and the lower, as well as denoting the points that fall outside of those normal ranges for you to look into. So really, really useful.
Now let’s go down here and pull up our data frame again. We’ve kind of just zoomed through the whole EDA process. There was one last thing that I wanted to show you, and this is the very last thing that we’re going to look at. We’re ending on a low point, if I’m being honest, because the last stuff was much more exciting. But there is something, df.dtypes (oops), let’s do df.dtypes and we’ll run this. Just like info, it gave us these values, but we’re actually able to search on these values now. So this object, float, and integer, we can search on those, which is really great, because we can do include= and use something like 'number'. None of these are numbers, right, or none of them explicitly say number. But when we run it, I’m getting an error, 'Series' object is not... oh, that’s because dtypes is a Series. We need to do select_dtypes. Now let’s run this. Now it’s only returning the columns in this data frame where the data types are included in this number, so you won’t see country or any of those text columns, the strings. If we want those, we go in here and say 'object' and run that. This is another really quick way to filter those columns down to a specific type. We could even do 'float' in here, and now it’s not including that rank, which was an integer. So we can specify the data type and it’ll filter all of the columns based off of that. When you’re doing stuff like this, it is good to know what kind of data types you’re working with, and to look at just those types, because there might be some type of analysis you want to perform on just
that, whether it’s the numeric or just the string or integer columns within your data set. So again, ending on a low note, I apologize. Everything else that we looked at, those are all things that I typically do in some way or another when I’m looking at a data set. Exploratory data analysis is really just the first look. You’re looking at the data, then you’re going to be cleaning it up in the data cleaning process, and then you’re going to be doing your actual data analysis: finding those trends and patterns and then visualizing them in some way to find some kind of meaning, insight, or value from that data. Again, there’s a thousand different ways you can go about this, and it does typically depend on the data set, but these are a lot of the ways that you’ll clean a lot of different data sets, and that’s why I went into the things that we looked at in this video. So I hope that you guys liked it. I hope that you enjoyed something in this tutorial. If you like this video, be sure to like and subscribe, as well as check out all my other videos on pandas and Python, and I will see you in the next video.
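As a recap, the first-look checks from this walkthrough can be collected into one short script. This is only a sketch under stated assumptions: the four-row data frame, its column names, and its values are invented for illustration and stand in for the real CSV, which would normally be loaded with pd.read_csv. Plotting calls are left out so the script runs without matplotlib.

```python
import pandas as pd

# Invented rows standing in for the real CSV; None marks a missing value.
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Country": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Europe", "Oceania"],
    "2022 Population": [1400.0, 1300.0, None, 5.0],
    "1970 Population": [800.0, 550.0, 75.0, 3.0],
})

# Show floats with two decimals instead of scientific notation.
pd.set_option("display.float_format", lambda x: "%.2f" % x)

df.info()             # column names, non-null counts, data types
print(df.describe())  # count, mean, std, min, quartiles, max

missing = df.isnull().sum()   # number of missing values per column
unique_counts = df.nunique()  # number of distinct values per column
print(missing)
print(unique_counts)

# Largest countries first; head(2) keeps the top two rows.
top = df.sort_values(by="2022 Population", ascending=False).head(2)
print(top)

# Average the population columns per continent, listed in chronological
# order, then transpose so the years become the index.
pop_cols = ["1970 Population", "2022 Population"]
by_year = df.groupby("Continent")[pop_cols].mean().transpose()
print(by_year)

# Filter columns by data type: numeric columns only.
numeric_cols = df.select_dtypes(include="number")
print(list(numeric_cols.columns))
```

With matplotlib installed, by_year.plot() would draw one line per continent over time, and df.boxplot(figsize=(20, 10)) would draw the box plots discussed in the video. The transcript also filters text columns with include='object'; how string columns are typed can differ between pandas versions, so that call is not shown here.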
This course teaches backend development using Node.js and Express.js, covering topics such as building APIs, database management with MongoDB and Mongoose, user authentication with JWTs, and securing the API with Arcjet. The curriculum also includes implementing rate limiting, bot protection, and automating email reminders using Upstash. Finally, the course details deploying the application to a VPS server for scalability and real-world experience. The instruction progresses from theoretical concepts to a hands-on project building a production-ready subscription management system. Throughout, the importance of clean code practices and error handling is emphasized.
Backend Development with Node.js and Express.js: A Study Guide
Quiz
Answer the following questions in 2-3 sentences each.
What is the primary difference between REST APIs and GraphQL APIs, as described in the text?
What are backend frameworks and why are they important in backend development? Give two examples.
What are the two main types of databases, and how do they differ in terms of data storage and querying?
When might you choose a non-relational database (NoSQL) over a relational database (SQL)?
What does a “rate limit exceeded” message indicate in the context of an API, and why is this implemented?
What is the purpose of a linter in software development, and why is it beneficial?
What is the significance of using nodemon during development? How does it streamline the development process?
Explain what environment variables are and why it’s crucial to manage them for different environments (development, production).
What are routes in the context of a backend application, and how do they relate to HTTP methods?
Briefly describe what middleware is and give an example of middleware that was mentioned in the text.
Quiz Answer Key
REST APIs often require multiple endpoints to fetch different data, while GraphQL uses a single endpoint where clients specify the exact data fields they need, making it more flexible and efficient. GraphQL minimizes over-fetching or under-fetching issues for complex applications.
Backend frameworks provide a structured foundation for building servers, handling repetitive tasks like routing and middleware. This allows developers to focus on the unique logic of their application. Examples include Express.js, Django, and Ruby on Rails.
Relational databases store data in structured tables with rows and columns and use SQL for querying and manipulating data. Non-relational databases offer more flexibility, storing unstructured or semi-structured data, and don’t rely on a rigid table structure.
You might choose NoSQL for handling large volumes of data, real-time analytics, or flexible data models, such as in social media apps, IoT devices, or big data analytics, where relationships between data points are less complex or not easily defined.
A “rate limit exceeded” message indicates that a client has made too many requests to an API within a certain time frame, which could potentially overwhelm the server. This is implemented to prevent bad actors or bots from making excessive calls that could crash the server.
A linter is a tool that analyzes source code for potential errors, bugs, and style inconsistencies. It helps developers maintain a clean and consistent codebase, making it easier to scale the application and avoid future issues.
Nodemon automatically restarts the server whenever changes are made to the codebase. This eliminates the need to manually restart the server after each change, making development smoother and more efficient.
Environment variables are dynamic values that can affect the behavior of running processes. Managing them for different environments (like development and production) allows for different settings (like port numbers or database URIs) to be used without changing the underlying code.
Routes are specific paths (endpoints) in a backend application that map to specific functionalities. They define how the backend will respond to different HTTP requests (GET, POST, PUT, DELETE).
Middleware in a backend application is code that is executed in the middle of the request/response cycle. For example, the error handling middleware intercepts errors and returns useful information, and the Arcjet middleware protects the API against common attacks and bot traffic.
Essay Questions
Answer the following questions in well-structured essays.
Compare and contrast relational and non-relational databases. Discuss situations in which you would favor each type, and discuss benefits and challenges related to each.
Describe the process of creating user authentication using JSON Web Tokens (JWTs). Explain how JWTs are created, how they are used to authorize access, and how security is maintained within the authentication process.
Discuss the importance of middleware in backend application development. Provide examples of how middleware can be used to handle common tasks or security issues.
Describe how you would set up and configure a virtual private server (VPS) for hosting a backend application. What are some steps that must be taken to ensure a robust and secure setup?
Discuss the role of API rate limiting and bot protection in maintaining a stable and secure web application. Explain how these measures contribute to the overall user experience, and discuss the consequences of not implementing them.
Glossary
API (Application Programming Interface): A set of rules and protocols that allows different software applications to communicate with each other.
Backend: The server-side of a web application, responsible for processing data, logic, and interacting with databases.
Controller: In the MVC architectural pattern, controllers handle the application’s logic. They take user input from a view, process the information using a model, and update the view accordingly.
CRUD: An acronym that stands for Create, Read, Update, and Delete. These are the four basic operations that can be performed on data in databases.
Database: A system that stores, organizes, and manages data. It can be either relational or non-relational.
Environment Variable: A named value that is set outside the application to affect its behavior without changing the code.
GraphQL: A query language for APIs that allows clients to request exactly the data they need, avoiding over-fetching and under-fetching.
HTTP Client: Software used to send HTTP requests to servers, commonly used for testing and interacting with APIs.
HTTP Method: A verb (e.g., GET, POST, PUT, DELETE) that specifies the type of action to be performed in an HTTP request.
JSON (JavaScript Object Notation): A lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
JSON Web Token (JWT): A standard method for securely transferring information between parties as a JSON object. Used for authentication and authorization in web applications.
Linter: A tool that analyzes source code for potential errors, bugs, and style inconsistencies.
Middleware: Code that is executed in the middle of the request/response cycle in an application, performing specific tasks, such as request logging, data validation, and error handling.
Model: In the MVC architectural pattern, models represent the data and business logic of the application.
Mongoose: An Object Data Modeling (ODM) library for MongoDB and Node.js, providing a schema-based way to structure data.
NoSQL Database (Non-Relational Database): A type of database that doesn’t follow the relational model of tables with rows and columns, often used for unstructured or semi-structured data.
ORM (Object Relational Mapper): Software that acts as a bridge between object-oriented programming languages and relational databases allowing developers to interact with the database using objects instead of SQL.
Rate Limiting: A technique used to control the number of requests a client can make to an API within a given time frame, preventing overuse or abuse.
Relational Database (SQL Database): A type of database that stores data in structured tables with rows and columns and uses SQL (Structured Query Language) for querying and manipulating data.
REST API (Representational State Transfer Application Programming Interface): An API that adheres to the REST architectural style, using standard HTTP methods (GET, POST, PUT, DELETE).
Route: A specific path (endpoint) in a backend application that maps to a specific function, allowing an application to handle different HTTP requests and deliver content accordingly.
Salt: Random data that is used as an additional input to a one-way function that “hashes” data like a password, preventing dictionary and rainbow table attacks.
SQL (Structured Query Language): A standard language for accessing and manipulating data in relational databases.
VPS (Virtual Private Server): A virtual server that operates within a larger server, often used for hosting web applications and APIs.
Node.js Backend API Development
Briefing Document: Building a Backend API with Node.js
Introduction
This document summarizes a tutorial focused on building a backend API using Node.js, Express.js, and MongoDB, covering essential concepts such as API design, database management, security measures, and deployment. The tutorial emphasizes a practical approach, guiding users through each stage of development, from setting up the environment to deploying the final application.
Key Themes & Concepts
API Fundamentals:
REST vs. GraphQL: The tutorial briefly introduces GraphQL as a more flexible alternative to REST APIs, allowing clients to request specific data, avoiding over-fetching.
Quote: “graphql apis developed by Facebook which offer more flexibility than Rest apis by letting clients request exactly the data they need instead of multiple endpoints for different data”
Backend Languages and Frameworks: It highlights that to build APIs, backend languages like Python, Ruby, Java or JavaScript runtimes such as Node.js are needed. Frameworks like Express, Hono, and NestJS (for JavaScript) are introduced as structured foundations for building servers, reducing repetitive tasks.
Quote: “Frameworks provide a structured foundation for building servers and they handle repetitive tasks like routing middleware and error handling so you can focus on your apps unique logic”
API Endpoints: The text emphasizes the importance of creating well-defined API endpoints, showing how routes are handled with Express.js (e.g., app.get, app.post).
Database Management:
Database Fundamentals: The source explains that a database is “a system that stores organizes and manages data” and emphasizes they’re optimized for speed, security, and scalability.
Relational (SQL) vs. Non-Relational (NoSQL) Databases: The tutorial differentiates between relational databases (using SQL, like MySQL, PostgreSQL) and non-relational databases (NoSQL, like MongoDB, Redis). It recommends SQL for highly structured data and NoSQL for more flexible models.
Quote: “relational databases store data in structured tables with rows and columns much like a spreadsheet… they use something known as SQL a structured query language which allows you to query and manipulate data”
Quote: “non relational databases also referred to as nosql databases they offer more flexibility and don’t rely on a rigid structure of tables they handled unstructured or semi-structured data making them perfect when data relationships are less complex”
MongoDB Atlas: The course uses MongoDB Atlas, a cloud-based NoSQL database service, for its convenience and free tier.
Setting up the Development Environment
Node.js & npm: The tutorial demonstrates using Node.js with npm for package management, including installing dependencies such as Express, nodemon, and eslint.
Express Generator: It shows how to use the Express generator to quickly set up a basic application structure.
Quote: “simply run npx express-generator and add a no-view flag which will skip all the front-end stuff since we’re focusing just on the back end”
Nodemon: Nodemon is used to automatically restart the server whenever code changes, enhancing the development experience.
Quote: “what nodemon does is it always restarts your server whenever you make any changes in the code”
ESLint: ESLint is employed to maintain code quality and consistency.
Environment Variables: The text explains the use of .env files and the dotenv package for managing environment-specific configurations.
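As a rough illustration of the environment-variable idea above, the sketch below builds a configuration object from process.env with defaults and a fail-fast check. In a real project the dotenv package's config() call would first load the .env file into process.env. The variable names (PORT, NODE_ENV, DB_URI, JWT_SECRET) and the loadConfig helper are assumptions made for this example, not names taken from the tutorial.

```javascript
// Build a config object from environment variables.
// Variable names and defaults here are illustrative assumptions.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`);
  }

  const config = {
    port,
    nodeEnv: env.NODE_ENV ?? "development",
    dbUri: env.DB_URI,
    jwtSecret: env.JWT_SECRET,
  };

  // Fail fast when a required value is missing instead of failing later at runtime.
  const missing = ["dbUri", "jwtSecret"].filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required config: ${missing.join(", ")}`);
  }
  return config;
}

// Passing a plain object instead of process.env keeps the example self-contained.
const config = loadConfig({
  PORT: "5500",
  DB_URI: "mongodb://localhost/demo",
  JWT_SECRET: "dev-only-secret",
});
console.log(config.port, config.nodeEnv); // 5500 development
```

Keeping all reads of process.env in one function means the rest of the code never touches environment variables directly, which makes missing settings show up at startup.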
Security and Authentication
Rate Limiting: It introduces rate limiting as a way to prevent API abuse and shows how to implement rate limiters with tools like Arcjet.
Quote: “you’ll be hit with a rate limit exceeded this means that you’ll stop bad users from making additional requests and crashing your server”
Bot Protection: It shows how to add a layer of bot protection to block malicious users or Bots from accessing your API.
Quote: “we’ll also Implement a bot protection system that will block them from accessing your API all of that using arcjet”
JSON Web Tokens (JWT): JWTs are used for user authentication. The tutorial demonstrates generating and verifying JWTs to protect API endpoints.
Password Hashing: Bcrypt is used to hash passwords, ensuring secure storage in the database.
Authorization Middleware: A custom middleware is introduced to verify user tokens and protect private routes.
Quote: “This means if at any point something goes wrong don’t do anything abort that transaction”
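The password hashing step described above can be sketched without any third-party package. The tutorial uses bcrypt; the example below swaps in scrypt from Node's built-in crypto module so that it runs on its own. The hashPassword and verifyPassword helpers and the "salt:hash" storage format are assumptions for illustration only.

```javascript
const { randomBytes, scryptSync, timingSafeEqual } = require("node:crypto");

// Hash a password with a fresh random salt.
// Note: scrypt is used here as a stand-in for bcrypt, which the tutorial uses.
function hashPassword(password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  // Store the salt next to the hash so the password can be verified later.
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
console.log(verifyPassword("wrong password", stored)); // false
```

Because each call generates a new salt, hashing the same password twice gives two different stored values, which is what defeats precomputed lookup tables.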
Application Logic:
Controllers: It introduces the use of controller files to house the logic for handling API routes, keeping the routes files clean.
Models: Mongoose is used to create data models (schemas) for both users and subscriptions, defining data structure and validation rules. The subscription model is very comprehensive, showcasing use of validators, enums and timestamps as well as pre-save hooks and virtual fields.
CRUD Operations: The tutorial shows how to implement Create, Read, Update, and Delete (CRUD) operations for users and subscriptions.
Quote: “a foundational element of every single API out there you need to be able to delete create or read or update literally anything out there”
Error Handling: A global error handling middleware is created to manage and format responses for various types of errors, such as resource not found, duplicate keys, and validation errors, which helps with debugging.
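A global error handler of the kind described above might look roughly like the sketch below. The checks rely on how Mongoose and the MongoDB driver report errors (a CastError name, error code 11000 for duplicate keys, and a ValidationError carrying an errors map). The status codes chosen for each case, the response shape, and the normalizeError helper are assumptions for this example rather than the tutorial's exact code.

```javascript
// Translate common database errors into an HTTP status and a readable message.
// The status code chosen for each case is an assumption for this sketch.
function normalizeError(err) {
  if (err.name === "CastError") {
    // For example, a malformed document id in the URL.
    return { statusCode: 404, message: "Resource not found" };
  }
  if (err.code === 11000) {
    // Duplicate key error reported by MongoDB.
    return { statusCode: 409, message: "Duplicate field value entered" };
  }
  if (err.name === "ValidationError") {
    const messages = Object.values(err.errors ?? {}).map((e) => e.message);
    return { statusCode: 400, message: messages.join(", ") };
  }
  return { statusCode: err.statusCode ?? 500, message: err.message || "Server Error" };
}

// Express recognizes error middleware by its four arguments (err, req, res, next).
function errorMiddleware(err, req, res, next) {
  const { statusCode, message } = normalizeError(err);
  res.status(statusCode).json({ success: false, error: message });
}

// Minimal stand-in for an Express response object, used only for this demo.
const fakeRes = {
  status(code) { this.statusCode = code; return this; },
  json(payload) { this.body = payload; return this; },
};
errorMiddleware({ code: 11000, message: "E11000 duplicate key" }, {}, fakeRes, () => {});
console.log(fakeRes.statusCode, fakeRes.body.error); // 409 Duplicate field value entered
```

In an Express app this middleware would be registered after all routes, so that any error passed to next(err) ends up here.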
Advanced Features:
Atomic Operations: The tutorial introduces atomic operations through database transactions, which treat multiple operations as a single unit of work and so prevent partial updates.
Quote: “database operations have to be Atomic which means that they either have to do All or Nothing insert either works completely or it doesn’t”
Upstash Workflow: The guide also sets up email reminders using Upstash, a platform for serverless workflows. Upstash helps define tasks that can be triggered on a cron schedule to send emails or SMS messages to a user. The guide also shows how to run Upstash workflows against a local development server for testing.
Email Reminders: NodeMailer is used to send reminder emails to users based on subscription renewal dates. This includes a nice custom email template with HTML.
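The all-or-nothing behavior described under Atomic Operations can be illustrated with a small in-memory sketch. This is not a MongoDB transaction: it only mimics the semantics by applying every operation to a draft copy and committing the draft when all of them succeed. The runAtomically helper and the store shape are hypothetical.

```javascript
// Apply every operation to a draft copy; commit only if none of them throws.
function runAtomically(store, operations) {
  const draft = structuredClone(store.data);
  for (const operation of operations) {
    operation(draft); // a thrown error aborts before anything is committed
  }
  store.data = draft; // commit: all operations succeeded
}

const store = { data: { users: [], subscriptions: [] } };

try {
  runAtomically(store, [
    (draft) => draft.users.push({ id: "u1", name: "Ada" }),
    () => { throw new Error("subscription insert failed"); },
  ]);
} catch (err) {
  console.log("aborted:", err.message); // aborted: subscription insert failed
}
console.log(store.data.users.length); // 0, the user insert was not kept

runAtomically(store, [
  (draft) => draft.users.push({ id: "u1", name: "Ada" }),
  (draft) => draft.subscriptions.push({ id: "s1", userId: "u1" }),
]);
console.log(store.data.users.length, store.data.subscriptions.length); // 1 1
```

A real database transaction gives the same guarantee across collections and concurrent clients, which an in-memory copy cannot do.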
Deployment
Virtual Private Server (VPS): The tutorial uses Hostinger VPS for deploying the API, emphasizing the flexibility and control it offers.
Git: Git is used for version control and for transferring the code to the VPS.
PM2: PM2 is used as a process manager to keep the Node.js application running reliably on the VPS. The tutorial notes that the deployment steps may produce errors depending on the operating system chosen for the VPS, and it points to a free step-by-step guide for finishing the deployment.
Key Quotes
“to build any of these apis you’ll need a backend language so let’s explore our C to build your apis you could use languages like python Ruby Java or JavaScript runtimes like node bun or Deno”
“building a backend isn’t just about creating API endpoints it’s about managing data you might think well why not just store the data directly on the server well that’s inefficient and doesn’t scale as your app grows that’s why every backend relies on dedicated Storage Solutions commonly known as databases”
“think of a database as a specialized software that lives on a computer somewhere whether that’s your laptop a company server or a powerful machine in a remote data center just like your laptop stores files on a hard drive or an SSD databases store data but here’s the difference databases are optimized for Speed security and scalability”
“you should do what we’re doing in this video where you’re going to have users or subscriptions and then you can either have a specific Item ID or you can just have /subscriptions and get all of them”
“you never want to share those with the internet great in the next lesson let’s set up our routes to make your API serve its purpose”
“typically you needed routes or endpoints that do their job that way front-end apps or mobile apps or really any one that you allow can hit those endpoints to get the desired data”
“we’re basically dealing with crud functionalities right here a foundational element of every single API out there you need to be able to delete create or read or update literally anything out there”
“now is the time to set up our database you could use something older like Postgres or maybe something modern like neon which is a serverless platform that allows you to host Postgres databases online then you could hook it up with an orm like drizzle and it would all work but in this course I’ll use mongodb Atlas”
“models in our application let us know how our data is going to look like”
“we’re not keeping it super simple I got to keep you on your toes so you’re always learning something and then we also have references pointing to other models in the database”
“we can create another middleware maybe this one will actually check for errors and then only when both of these middlewares call their next status we are actually navigated over to the controller which handles the actual logic of creating a subscription”
“what we’re doing here is we’re intercepting the error and trying to find a bit more information about it so we much more quickly know what went wrong”
“controllers form the logic of what happens once you hit those routes”
“hashing a password means securing it because you never want to store passwords in plain text”
“rate limiting is like a rule that says hey you can make a certain number of request in a given time and it prevents people or most commonly Bots from overwhelming your servers with too many requests at once keeping your app fast and available for everyone”
“not all website visitors are human there are many Bots trying to scrape data guess passwords or just Spam your service bot protection helps you detect and block this kind of bad traffic so your platform stays secure and functional”
“you’ll be able to see exactly what is happening on your website are people spamming it or are they using it politely”
“every routes file has to have its own controllers file”
“you should always validate your request with the necessary authorization procedure before creating any kind of document in your application”
“it’s going to say something like USD 10 monthly”
“you built your own API but as I said we’re not finishing here in that free guide that will always be up to date you can finish this course and deploy this API to a VPS so it becomes publicly and globally accessible”
Conclusion
The tutorial provides a comprehensive guide to building a backend API from start to finish. It covers setting up a development environment, creating an API, managing a database, implementing security, and deploying the application. The step-by-step approach and the focus on practical tooling make this a useful guide for anyone trying to build their own API.
Building and Securing GraphQL APIs
Frequently Asked Questions:
What are the advantages of using GraphQL APIs compared to REST APIs? GraphQL APIs offer greater flexibility than REST APIs by allowing clients to request the specific data they need. Unlike REST where multiple endpoints may be required for different data sets, GraphQL uses a single endpoint and clients can specify the precise fields required. This is particularly efficient for complex applications with lots of interconnected data, as it reduces over-fetching (getting more data than required) or under-fetching (not getting all the data required) of information.
What are backend frameworks and why are they essential for building APIs? Backend frameworks provide a structured foundation for building servers and APIs. They handle repetitive tasks like routing, middleware, and error handling, allowing developers to focus on the application’s specific logic. This significantly reduces the amount of code needed to start, thus accelerating the development process. Popular frameworks include Express.js, Hono, and NestJS for JavaScript; Django for Python; Ruby on Rails for Ruby; and Spring for Java.
Why are databases essential for backend development, and what are the two primary types? Databases are specialized systems designed for efficient storage, organization, and management of data, essential for the backend of an application. They are optimized for speed, security, and scalability. The two primary types are relational and non-relational databases: relational databases store data in structured tables with rows and columns, using SQL, and are best for structured data like in banking systems, while non-relational (NoSQL) databases like MongoDB offer greater flexibility for unstructured or semi-structured data, ideal for social media apps or real-time analytics.
How do relational (SQL) and non-relational (NoSQL) databases differ, and when should each be used? Relational databases (SQL) organize data into tables with rows and columns, using SQL for querying and manipulation, making them best for structured data and complex relationships, such as in banking or e-commerce. NoSQL databases, like document-based MongoDB or key-value stores like Redis, offer greater flexibility and can handle unstructured or semi-structured data. NoSQL databases are preferred when dealing with large volumes of data, real-time analytics, or flexible data models, as often seen in social media platforms, IoT devices or big data analytics.
What is rate limiting and bot protection, and why are they crucial for API security? Rate limiting is a technique used to control the number of requests a user can make within a specific time frame, preventing API spam and denial-of-service attacks. Bot protection systems identify and block malicious bot traffic, protecting the API from unauthorized access and abuse. Both are essential to maintain server stability, performance, and prevent potential system crashes due to malicious or unintended excessive use.
What is middleware, and how is it utilized in the context of a backend application? Middleware in a backend application is code that is executed before or after a request is processed by your application routes. It acts as a layer to intercept, modify, or add to the request/response cycle. Some common middleware examples are authentication middleware to check authorization levels or global error handling middleware to ensure any application errors are handled gracefully. Middleware is useful to maintain modular and reusable code, implementing functionalities like logging, authorization, or data validation and transformation.
What are JSON Web Tokens (JWTs) and how are they used in the provided system for authentication and authorization? JSON Web Tokens (JWTs) are a standard method for representing claims securely between two parties. In the provided system, JWTs are used for authentication and authorization. When a user signs up or signs in, the server generates a JWT containing the user ID and sends it back to the client. For subsequent requests to protected routes, clients include the JWT in the request header. The server then verifies the JWT, authenticating the user and determining whether they have the necessary authorization to access the route. If the token is invalid or missing, the user receives an unauthorized error message.
What is the purpose of using a local development server for workflows, such as those developed with Upstash, and why is it beneficial? Local development servers allow you to test and debug workflows without having to deploy code to a live environment. They simulate a production-like setup, enabling you to identify and fix potential issues. This is particularly useful with tools like Upstash, where a local server allows unlimited test runs without incurring the costs of running the workflows in the cloud. It saves both money and setup time, making the development process more efficient.
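The JWT flow described in the FAQ above (issue a token at sign-in, send it in a header, verify it in middleware) can be sketched with Node's built-in crypto module. This is a hand-rolled HS256 token for illustration only; a production app would normally use a maintained JWT library. The signToken, verifyToken and requireAuth names, the userId claim, and the one-hour default expiry are assumptions.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

const base64url = (text) => Buffer.from(text).toString("base64url");

// Create a signed token of the form header.payload.signature (HS256).
function signToken(payload, secret, expiresInSeconds = 3600) {
  const now = Math.floor(Date.now() / 1000);
  const header = base64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = base64url(JSON.stringify({ ...payload, iat: now, exp: now + expiresInSeconds }));
  const signature = createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Check the signature and expiry, then return the decoded payload.
// This sketch always verifies with HMAC-SHA256 and ignores the header's alg field.
function verifyToken(token, secret) {
  const [header, body, signature] = token.split(".");
  if (!header || !body || !signature) throw new Error("Malformed token");
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const received = Buffer.from(signature, "base64url");
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    throw new Error("Invalid signature");
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (payload.exp <= Math.floor(Date.now() / 1000)) throw new Error("Token expired");
  return payload;
}

// Express-style middleware that protects a route with a bearer token.
function requireAuth(secret) {
  return (req, res, next) => {
    const [scheme, token] = (req.headers.authorization ?? "").split(" ");
    try {
      if (scheme !== "Bearer" || !token) throw new Error("Missing bearer token");
      req.user = verifyToken(token, secret);
      next();
    } catch (err) {
      res.status(401).json({ message: "Unauthorized", error: err.message });
    }
  };
}

const token = signToken({ userId: "u1" }, "dev-only-secret");
console.log(verifyToken(token, "dev-only-secret").userId); // u1
```

The payload of such a token is only encoded, not encrypted, so it should carry identifiers like a user ID rather than secrets.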
Backend Development Fundamentals
Backend development is crucial for the functionality of applications, handling data, security, and performance behind the scenes. It involves servers, databases, APIs, and authentication.
Here’s a breakdown of key backend concepts:
The Web’s Two Parts: The web is divided into the front end, which focuses on user experience, and the backend, which manages data and logic.
Servers: Servers are powerful computers that store, process, and send data. They host the backend code that manages users, processes data, and interacts with databases.
Client-Server Communication: Clients (like browsers) send requests to servers. Servers process these requests and send back data.
Protocols: Computers use communication rules called protocols, with HTTP (Hypertext Transfer Protocol) as the backbone of the web. HTTPS is the secure version of HTTP.
DNS (Domain Name System): DNS translates domain names (like google.com) into IP addresses (like 203.0.113.10), which are numeric identifiers for devices on the internet.
APIs (Application Programming Interfaces): APIs allow applications to communicate with the backend. They define how clients and servers interact by using HTTP methods to define actions, endpoints (URLs for specific resources), headers (metadata), request bodies (data sent to the server), and response bodies (data sent back).
HTTP Methods/Verbs: APIs use HTTP methods like GET (retrieve data), POST (create new data), PUT/PATCH (update data), and DELETE (remove data).
API Endpoints: A URL that represents a specific resource or action on the backend.
Status Codes: API calls use status codes to indicate what happened, such as 200 (OK), 201 (created), 400 (bad request), 404 (not found), and 500 (internal server error).
RESTful APIs: REST (Representational State Transfer) APIs are structured, stateless, and use standard HTTP methods, making them widely used for web development.
GraphQL APIs: GraphQL APIs, developed by Facebook, allow clients to request specific data, reducing over-fetching and under-fetching, which makes them efficient for complex applications.
Backend Languages: Languages like Python, Ruby, Java, and JavaScript (with runtimes like Node.js) can be used to build APIs.
Backend Frameworks: Frameworks like Express.js (for JavaScript), Django (for Python), Ruby on Rails (for Ruby), and Spring (for Java) provide a structured foundation for building servers, handling routing, middleware, and errors, allowing developers to focus on the app’s logic.
Databases are crucial for storing, organizing, and managing data, optimized for speed, security, and scalability. They are classified into two main types:
Relational Databases: These store data in tables with rows and columns and use SQL (Structured Query Language) to query and manipulate data (e.g., MySQL, PostgreSQL). They are suitable for structured data with clear relationships.
Non-Relational Databases (NoSQL): These databases offer more flexibility, handling unstructured or semi-structured data (e.g., MongoDB, Redis). They are useful for large data volumes, real-time analytics, or flexible data models.
ORM (Object-Relational Mappers): ORMs simplify database interactions by allowing queries to be written in the syntax of the chosen programming language, instead of raw SQL.
Backend Architectures:
Monolithic Architecture: All application components are combined into a single codebase. It’s simple to develop and deploy but can become difficult to scale.
Microservices Architecture: An application is broken down into independent services communicating via APIs. This is good for large-scale applications requiring flexibility and scalability.
Serverless Architecture: Allows developers to write code without managing the underlying infrastructure. Cloud providers manage provisioning, scaling, and server management.
Other important concepts include:
Authentication: Securing applications by verifying user identity and using techniques like JWTs (JSON Web Tokens) to authenticate users.
Authorization: Managing access to resources based on the user’s role or permissions.
Middleware: Functions that intercept requests, allowing for actions like error handling, authorization, and rate limiting.
Rate Limiting: Restricting the number of requests a user can make within a given time frame, preventing server overload or abuse.
Bot Protection: Techniques that detect and block automated traffic from malicious bots.
In summary, backend development involves creating the logic and infrastructure that power applications, handling data storage, user authentication, and ensuring smooth performance.
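To make the rate limiting concept above more concrete, here is a minimal fixed-window limiter. It is a simplified illustration, not how Arcjet or any other service implements its limits, and the createRateLimiter helper, the client key, and the numbers used are assumptions.

```javascript
// Fixed-window rate limiter: at most `limit` requests per client per window.
function createRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // clientId -> { start, count }

  return function isAllowed(clientId, now = Date.now()) {
    const current = windows.get(clientId);
    // First request, or the previous window has expired: start a new window.
    if (!current || now - current.start >= windowMs) {
      windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    if (current.count < limit) {
      current.count += 1;
      return true;
    }
    return false; // over the limit; a server would typically answer 429 Too Many Requests
  };
}

// Timestamps are passed explicitly so the demo does not depend on the clock.
const isAllowed = createRateLimiter({ limit: 3, windowMs: 10_000 });
console.log(isAllowed("203.0.113.10", 0));      // true
console.log(isAllowed("203.0.113.10", 1_000));  // true
console.log(isAllowed("203.0.113.10", 2_000));  // true
console.log(isAllowed("203.0.113.10", 3_000));  // false
console.log(isAllowed("203.0.113.10", 10_000)); // true, a new window has started
```

In a middleware, the client key would usually be the caller's IP address or user ID, and a rejected request would end the chain instead of calling next().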
Subscription System Backend Development
A subscription system, as discussed in the sources, involves several key components related to backend development:
Core Functionality: The primary goal of a subscription system is to manage users, their subscriptions, and related business logic, including handling real money.
Backend Focus: The backend handles all the logic, from processing data to managing users and interacting with databases, while the front end is focused on the user interface.
Subscription Tracking API: This API is built to manage subscriptions, handle user authentication, manage data, and automate email reminders. It includes functionalities such as:
User Authentication: Using JSON Web Tokens (JWTs) to authenticate users.
Database Modeling and Relationships: Utilizing databases like MongoDB and Mongoose to model data.
CRUD Operations: Performing create, read, update, and delete operations on user and subscription data.
Subscription Management: Managing subscription lifecycles, including calculating renewal dates and sending reminders.
Global Error Handling: Implementing middleware for input validation, error logging, and debugging.
Rate Limiting and Bot Protection: Securing the API with tools like Arcjet to prevent abuse.
Automated Email Reminders: Using services like Upstash to schedule email notifications for subscription renewals.
API Endpoints: These are specific URLs that handle different actions related to subscriptions. Examples include:
GET /subscriptions: Retrieves all subscriptions.
GET /subscriptions/:id: Retrieves details of a specific subscription.
POST /subscriptions: Creates a new subscription.
PUT /subscriptions/:id: Updates an existing subscription.
DELETE /subscriptions/:id: Deletes a subscription.
GET /subscriptions/user/:id: Retrieves all subscriptions for a specific user.
PUT /subscriptions/:id/cancel: Cancels a user subscription.
GET /subscriptions/renewals: Retrieves all upcoming renewals.
Data Validation: Ensuring that the data sent to the backend is correct, for example, by using validation middleware to catch any errors.
Database Interaction: Using queries to store, retrieve, update, and delete data in the database. An object data modeling (ODM) library like Mongoose, the MongoDB counterpart of an ORM, is used to simplify these interactions.
Workflows: Automating tasks using systems like Upstash, particularly for scheduling notifications or other business logic. This includes:
Triggering workflows when a new subscription is created.
Retrieving subscription details from the database.
Checking the subscription status and renewal date.
Scheduling email reminders before the renewal date.
Email Reminders: The system sends automated email reminders for upcoming subscription payments, allowing users to cancel subscriptions on time.
Deployment: The subscription system can be deployed to a virtual private server (VPS) for better performance, control, and customization. This requires server management, database backups, and real-world deployment skills.
Security: Includes measures to protect the system from abuse such as rate limiting and bot protection.
In summary, a subscription system involves building a comprehensive backend infrastructure that handles user authentication, manages subscription data, ensures data integrity, and automates notifications, all while providing a secure and scalable environment.
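The renewal and reminder logic mentioned above (calculating renewal dates and scheduling reminders before them) can be sketched with plain date arithmetic. The period lengths, the list of reminder offsets, and both helper names are assumptions chosen for this example; the source does not specify them.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed billing period lengths in days (an approximation for this sketch).
const PERIOD_DAYS = { daily: 1, weekly: 7, monthly: 30, yearly: 365 };

// Renewal date = start date + one billing period.
function calculateRenewalDate(startDate, frequency) {
  const days = PERIOD_DAYS[frequency];
  if (!days) throw new Error(`Unknown frequency: ${frequency}`);
  return new Date(startDate.getTime() + days * DAY_MS);
}

// Reminder dates that are still in the future relative to `now`.
function reminderDates(renewalDate, daysBefore, now) {
  return daysBefore
    .map((days) => ({ days, date: new Date(renewalDate.getTime() - days * DAY_MS) }))
    .filter((reminder) => reminder.date > now);
}

const renewal = calculateRenewalDate(new Date("2025-01-01T00:00:00Z"), "monthly");
console.log(renewal.toISOString()); // 2025-01-31T00:00:00.000Z

const upcoming = reminderDates(renewal, [7, 5, 2, 1], new Date("2025-01-25T00:00:00Z"));
console.log(upcoming.map((r) => r.days)); // [ 5, 2, 1 ]
```

A workflow would sleep until each of these dates and then send the email; the 7-day reminder is skipped here because its date has already passed.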
Database Management in Backend Development
Database management is a critical aspect of backend development, involving the storage, organization, and management of data. Databases are optimized for speed, security, and scalability and are essential for applications to function effectively. The sources discuss key aspects of database management, including types of databases, how applications interact with them, and methods to manage data efficiently:
Types of Databases:
Relational Databases (SQL): These databases store data in structured tables with rows and columns. They use SQL (Structured Query Language) for querying and manipulating data. Relational databases are suitable for structured data with clear relationships and are often used in banking, e-commerce, and inventory management. Popular examples include MySQL and PostgreSQL.
Non-Relational Databases (NoSQL): These databases offer more flexibility and do not rely on a rigid table structure. They are designed to handle unstructured or semi-structured data, making them suitable for social media apps, IoT devices, and big data analytics. NoSQL databases include document-based databases like MongoDB, which store data in JSON-like documents, and key-value pair databases like Redis.
Database Interactions:
Client-Server Communication: The client sends a request to the backend, which processes the request and determines what data is needed.
Queries: The backend sends queries to the database to fetch, update, or delete data. In SQL databases, queries use SQL syntax, while in NoSQL databases like MongoDB, queries are often similar to JavaScript syntax.
Data Retrieval: The database returns the requested data to the server, which then formats it (usually as JSON) and sends it back to the client.
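To illustrate the difference in query style noted above, the sketch below expresses the same question as a SQL string and as a MongoDB-style filter object, then applies the filter to an in-memory array and formats the result as JSON. The matches helper is a hypothetical stand-in that understands only equality and the $gte operator, and the sample data is invented.

```javascript
// The same question in two styles: "active subscriptions costing at least 10".
const sqlQuery = "SELECT * FROM subscriptions WHERE status = 'active' AND price >= 10";
const mongoFilter = { status: "active", price: { $gte: 10 } };

// Hypothetical in-memory matcher: supports equality and $gte only.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([operator, value]) => {
        if (operator === "$gte") return doc[field] >= value;
        throw new Error(`Unsupported operator: ${operator}`);
      });
    }
    return doc[field] === condition;
  });
}

// Invented sample data standing in for a database collection.
const subscriptions = [
  { name: "Streaming", price: 15.99, status: "active" },
  { name: "Music", price: 9.99, status: "active" },
  { name: "Gym", price: 30, status: "cancelled" },
];

const found = subscriptions.filter((doc) => matches(doc, mongoFilter));

// The server formats the result as JSON before sending it to the client.
const responseBody = JSON.stringify({ success: true, data: found });
console.log(found.map((doc) => doc.name)); // [ 'Streaming' ]
```

The SQL version is a string the database parses, while the filter is an ordinary JavaScript object, which is why MongoDB queries tend to feel native in Node.js code.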
Data Management:
CRUD Operations: Databases support CRUD (Create, Read, Update, Delete) operations, which are fundamental for managing resources.
Raw Queries: Developers can write raw queries to interact with the database, offering full control but potentially increasing complexity and errors.
ORMs (Object-Relational Mappers): ORMs simplify database interactions by allowing developers to write queries in the syntax of their chosen programming language instead of raw SQL. Popular ORMs include Prisma and Drizzle for SQL databases; Mongoose plays the equivalent role for MongoDB as an ODM (Object Data Modeling) library. These tools speed up development and help prevent errors.
Database Selection:
Structured vs. Unstructured Data: The choice between relational and non-relational databases depends on the type of data and the application’s needs. Relational databases are best for structured data with clear relationships, while non-relational databases are suitable for massive, unstructured data and flexible data models.
Database Modeling:
Schemas: Databases utilize schemas to define the structure of data.
Models: Models are built from a schema and are used to create and query instances of that data structure, for example, User or Subscription.
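The relationship between a schema and a model can be sketched in plain JavaScript. This is not Mongoose's API: the schema format, the validate function, and createModel are hypothetical and only show the idea that a schema declares rules and a model enforces them whenever a document is created.

```javascript
// A hypothetical schema format: each field declares its type and rules.
const subscriptionSchema = {
  name: { type: "string", required: true, minLength: 2 },
  price: { type: "number", required: true, min: 0 },
  frequency: { type: "string", enum: ["daily", "weekly", "monthly", "yearly"] },
};

// Check a document against the schema and collect human-readable errors.
function validate(schema, doc) {
  const errors = [];
  for (const [field, rules] of Object.entries(schema)) {
    const value = doc[field];
    if (value === undefined || value === null) {
      if (rules.required) errors.push(`${field} is required`);
      continue;
    }
    if (typeof value !== rules.type) {
      errors.push(`${field} must be a ${rules.type}`);
      continue;
    }
    if (rules.minLength !== undefined && value.length < rules.minLength) {
      errors.push(`${field} must have at least ${rules.minLength} characters`);
    }
    if (rules.min !== undefined && value < rules.min) {
      errors.push(`${field} must be at least ${rules.min}`);
    }
    if (rules.enum && !rules.enum.includes(value)) {
      errors.push(`${field} must be one of ${rules.enum.join(", ")}`);
    }
  }
  return errors;
}

// A model wraps a schema and refuses to create invalid documents.
function createModel(schema) {
  return {
    create(doc) {
      const errors = validate(schema, doc);
      if (errors.length > 0) throw new Error(errors.join("; "));
      return { ...doc };
    },
  };
}

const Subscription = createModel(subscriptionSchema);
console.log(Subscription.create({ name: "Music", price: 9.99, frequency: "monthly" }));
console.log(validate(subscriptionSchema, { price: -5, frequency: "hourly" }));
```

The second call reports three problems: the missing name, the negative price, and the unsupported frequency.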
Key Considerations:
Speed: Databases are optimized for fast data retrieval and storage.
Security: Databases implement security measures to protect data.
Scalability: Databases are designed to handle growing amounts of data and user traffic.
MongoDB:
MongoDB Atlas: A cloud-based service that allows for easy creation and hosting of MongoDB databases, including free options.
Mongoose: An ODM (Object Data Modeling) library used with MongoDB to create database models and schemas. Mongoose also simplifies data validation and other model-level operations.
In summary, effective database management involves choosing the right type of database for the application’s needs, using efficient methods to interact with the database, and ensuring that data is stored, retrieved, and managed securely and scalably. The use of ORMs can significantly simplify these processes, allowing developers to focus on application logic rather than low-level database operations.
API Development Fundamentals
API (Application Programming Interface) development is a crucial part of backend development, facilitating communication between different software systems. APIs define the rules and protocols that allow applications to interact with each other, enabling the exchange of data and functionality. The sources provide a detailed overview of API development, covering key concepts, components, types, and best practices:
Fundamentals of APIs:
Definition: An API is an interface that enables different applications to communicate and exchange data. It acts like a “waiter” that takes requests from the client (e.g., a web app or mobile app) to the backend (the “kitchen”) and returns the requested data.
Client-Server Communication: APIs facilitate how clients and servers communicate, using protocols such as HTTP.
Function: APIs enable apps to fetch new data, manage resources, and perform actions on the backend.
Key Components of APIs:
HTTP Methods (Verbs): These methods define the type of action to be taken on a resource.
GET: Retrieves data from the server. For example, GET /users to get a list of users.
POST: Creates a new resource on the server. For example, POST /users to create a new user.
PUT/PATCH: Updates an existing resource; PUT typically replaces the whole resource, while PATCH applies a partial update. For example, PUT /users/:id to update a specific user.
DELETE: Removes a resource from the server. For example, DELETE /users/:id to delete a specific user.
Endpoints: These are URLs that specify a particular resource or action on the backend. For example, /users, /subscriptions and /auth/signup.
Headers: Headers contain metadata about the request or response, such as authentication tokens, content type, or caching instructions. For example, an authorization header often includes a bearer token for verifying the user’s identity.
Request Body: The request body contains the data being sent to the server, usually in JSON format. This is used in POST, PUT, and PATCH requests.
Response Body: The response body contains the data sent back by the server after processing the request, typically also in JSON format.
Status Codes: These codes indicate the outcome of an API call.
200 (OK): Indicates a successful request.
201 (Created): Indicates a resource has been successfully created.
400 (Bad Request): Indicates something went wrong with the request.
404 (Not Found): Indicates that the requested resource does not exist.
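401 (Unauthorized): Indicates that the request lacks valid authentication credentials, for example a missing or invalid token. A user who is authenticated but not permitted to access the resource typically receives 403 (Forbidden) instead.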
500 (Internal Server Error): Indicates a server-side error.
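The pieces above (methods, endpoints, request bodies, and status codes) can be sketched together in a few lines of Python. This is only an illustrative in-memory handler, not a real web framework: the handle function, the USERS dictionary, and the use of 405 (Method Not Allowed) for unsupported verbs are assumptions made for this example.

```python
# Hypothetical in-memory "database" of users, keyed by integer id.
USERS = {1: {"id": 1, "name": "Ada"}}


def handle(method, path, body=None):
    """Return (status_code, response_body) for a tiny /users API."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "users" or len(parts) > 2:
        return 404, {"error": "unknown endpoint"}

    if len(parts) == 1:  # collection endpoint: /users
        if method == "GET":
            return 200, list(USERS.values())
        if method == "POST":
            if not body or "name" not in body:
                return 400, {"error": "name is required"}
            new_id = max(USERS, default=0) + 1
            USERS[new_id] = {"id": new_id, "name": body["name"]}
            return 201, USERS[new_id]
        return 405, {"error": "method not allowed"}

    # item endpoint: /users/:id
    if not parts[1].isdigit():
        return 400, {"error": "id must be a number"}
    user_id = int(parts[1])
    if user_id not in USERS:
        return 404, {"error": "user not found"}
    if method == "GET":
        return 200, USERS[user_id]
    if method == "PUT":
        if not body or "name" not in body:
            return 400, {"error": "name is required"}
        USERS[user_id] = {"id": user_id, "name": body["name"]}
        return 200, USERS[user_id]
    if method == "DELETE":
        del USERS[user_id]
        return 200, {"deleted": user_id}
    return 405, {"error": "method not allowed"}


print(handle("GET", "/users"))
print(handle("POST", "/users", {"name": "Grace"}))
print(handle("GET", "/users/99"))
```

A real backend would delegate routing and JSON parsing to a framework, but the mapping from (method, endpoint, body) to (status code, response body) follows the same shape.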
API Design:
Naming Conventions: Use nouns in URLs to specify resources and HTTP verbs to specify actions. For example, use /users instead of /getUsers. Use plural nouns for resources, and use hyphens to connect words together in URLs.
RESTful Principles: Follow the principles of RESTful architecture, the most common style in web development. For example, use standard HTTP methods like GET, POST, PUT, and DELETE, and keep requests stateless.
Types of APIs:
RESTful APIs: These are the most common type of APIs, following a structured approach where clients interact with resources via URLs and standard HTTP methods. RESTful APIs are stateless and typically use JSON.
GraphQL APIs: These APIs offer more flexibility by allowing clients to request only the data they need via a single endpoint, which avoids over-fetching or under-fetching data. This approach is beneficial for complex applications.
API Development Process:
Backend Language: Use languages such as Python, Ruby, Java, or JavaScript runtimes like Node, Bun, or Deno.
Backend Frameworks: Utilize frameworks like Express (for JavaScript), Django (for Python), Ruby on Rails (for Ruby), or Spring (for Java) to provide a structured foundation for building servers and handling repetitive tasks such as routing, middleware, and error handling.
Database Management: Connect your API to a database to store and retrieve data, using either raw queries or ORMs.
Middleware: Implement middleware for input validation, error handling, authentication, rate limiting and bot protection.
Security: Implement security measures such as authorization and protection from malicious users.
Authorization: Ensure only authorized users can access certain routes by verifying tokens included in requests.
Rate Limiting: Restrict the number of requests a user can make within a specific period to prevent abuse.
Bot Protection: Implement systems to detect and block bot traffic.
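Rate limiting can be illustrated with a fixed-window counter. The sketch below is one simple strategy among several (sliding windows and token buckets are common alternatives); the class name, its parameters, and the injectable clock are assumptions for illustration rather than part of any particular framework.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most max_requests per client within each time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter is easy to test
        self._windows = {}  # client_id -> (window_start, request_count)

    def allow(self, client_id):
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired
        if count >= self.max_requests:
            self._windows[client_id] = (start, count)
            return False  # the caller would typically respond with 429
        self._windows[client_id] = (start, count + 1)
        return True


limiter = FixedWindowRateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("client-a") for _ in range(3)])
```

In middleware, a False result would usually translate into a 429 (Too Many Requests) response before the request reaches the route handler.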
Example API Endpoints:
/api/v1/auth/signup (POST): Creates a new user.
/api/v1/auth/signin (POST): Signs in an existing user.
/api/v1/users (GET): Retrieves a list of users.
/api/v1/users/:id (GET): Retrieves a specific user by ID.
/api/v1/subscriptions (GET): Retrieves all subscriptions.
/api/v1/subscriptions (POST): Creates a new subscription.
/api/v1/subscriptions/user/:id (GET): Retrieves subscriptions of a specific user.
Testing APIs:
Use tools like HTTP clients (e.g., HTTPie, Postman, Insomnia, Bruno) to test API endpoints and simulate requests.
Test with different HTTP methods and request bodies to ensure correct functionality.
In summary, API development involves designing and building interfaces that allow applications to communicate effectively. This includes defining endpoints, choosing HTTP methods, structuring request and response bodies, handling errors, and implementing security measures. The use of backend frameworks and adherence to best practices ensure that APIs are scalable, maintainable, and secure.
Backend Application Server Deployment
Server deployment is a critical step in making a backend application accessible to users. It involves setting up the necessary infrastructure and configurations to host the application, making it available over the internet. The sources provide key insights into server deployment, covering essential aspects such as types of servers, deployment processes, and tools involved:
Types of Servers:
Physical Servers: These are actual machines in data centers that you can own or rent.
Virtual Private Servers (VPS): A VPS is like having your own computer in the cloud, offering dedicated resources, full control, and customization without the high cost of a physical machine. VPS hosting is suitable for deploying APIs, full-stack applications, databases, and other server-side applications.
Cloud Servers: Cloud providers such as AWS provide servers that can be rented and configured through their services.
Serverless Architecture: This allows developers to write code without managing the underlying infrastructure, with cloud providers handling provisioning, scaling, and server management.
Deployment Process:
Setting up the Server: This involves configuring the server’s operating system (often Linux) and installing necessary software, such as Node.js, npm, and Git.
Transferring Codebase: Use Git to transfer your application’s code from your local development environment to the server. This usually involves pushing code to a repository and cloning it on the server.
Installing Dependencies: Install all the application’s dependencies on the server using a package manager like npm.
Configuring Environment Variables: Set up environment variables on the server to handle different environments such as development, staging, or production. This involves adding environment variables for databases, API keys, and other sensitive information.
Running the Application: Use a process manager like pm2 to ensure that the application runs continuously, even if it crashes or the server reboots. A process manager also allows for background execution.
Testing: After deploying the server, testing API endpoints through HTTP clients is essential to ensure the deployed application functions as expected.
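The environment-variable step can be sketched as a small configuration loader. The example is written in Python for illustration (a Node.js application would read the same values from process.env), and the variable names APP_ENV, PORT, and DATABASE_URL are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    env: str
    port: int
    database_url: str


def load_settings(environ=os.environ):
    """Read settings from environment variables (names are hypothetical)."""
    database_url = environ.get("DATABASE_URL")
    if not database_url:
        # Fail fast on the server instead of starting half-configured.
        raise RuntimeError("DATABASE_URL must be set")
    return Settings(
        env=environ.get("APP_ENV", "development"),
        port=int(environ.get("PORT", "3000")),
        database_url=database_url,
    )


settings = load_settings({"DATABASE_URL": "postgres://example", "APP_ENV": "production"})
print(settings)
```

Keeping secrets such as database URLs and API keys out of the codebase means the same code can run unchanged in development, staging, and production.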
Key Considerations for VPS Hosting:
Dedicated Resources: A VPS provides dedicated RAM and SSD storage for better performance.
Full Control: You have full control and customization over your server, allowing you to run applications as desired.
Real-World Skills: Hosting on a VPS provides hands-on experience with server management, database backup, and real-world deployments.
Cost-Effective: VPS hosting is a cost-effective alternative to physical servers and provides better performance than regular shared hosting.
Tools and Technologies
Git: A version control system for managing and transferring code.
npm: A package manager used for installing packages, libraries, and tools needed for Node.js applications.
pm2: A process manager for Node.js applications that ensures applications keep running.
SSH: Secure Shell Protocol is used to remotely manage the server through a terminal.
Operating System: Linux, like Ubuntu, is often the preferred choice for hosting servers.
Deployment Workflow
Development: Develop the application on a local machine using a code editor or IDE.
Testing: Test the application locally to ensure all features work as expected before deploying.
Code Transfer: Use Git to upload the code to a repository like GitHub, and then clone the repository to the VPS.
Environment Setup: Configure all necessary environment variables on the server.
Dependency Installation: Install all the required packages using npm install.
Application Execution: Run the application using pm2 to start the process and keep it running in the background.
Monitoring: Regularly monitor the server to ensure optimal performance and identify any potential issues.
In summary, server deployment is a crucial process for making a backend application accessible to users. It involves setting up a server (physical, virtual, or serverless), transferring the codebase, installing dependencies, configuring the environment, and running the application. VPS hosting offers dedicated resources, full control, and real-world deployment skills, making it a valuable option for deploying backend applications. Following best practices and using the right tools will ensure a smooth and successful deployment process.
Complete Backend Course | Build and Deploy Your First Production-Ready API
This PDF excerpt details a PyTorch deep learning course. The course teaches PyTorch fundamentals, including tensor manipulation and neural network architecture. It covers various machine learning concepts, such as linear and non-linear regression, classification (binary and multi-class), and computer vision. Practical coding examples using Google Colab are provided throughout, demonstrating model building, training, testing, saving, and loading. The course also addresses common errors and troubleshooting techniques, emphasizing practical application and experimentation.
PyTorch Deep Learning Study Guide
Quiz
What is the difference between a scalar and a vector? A scalar is a single number, while a vector has magnitude and direction and is represented by multiple numbers in a single dimension.
How can you determine the number of dimensions of a tensor? You can determine the number of dimensions of a tensor by counting the number of pairs of square brackets, or by reading the tensor’s ndim attribute.
What is the purpose of the .shape attribute of a tensor? The .shape attribute of a tensor returns a tuple that represents the size of each dimension of the tensor. It indicates the number of elements in each dimension, providing information about the tensor’s structure.
What does the dtype of a tensor represent? The dtype of a tensor represents the data type of the elements within the tensor, such as float32, float16, or int32. It specifies how the numbers are stored in memory, impacting precision and memory usage.
What is the difference between reshape and view when manipulating tensors? Both reshape and view change the shape of a tensor. view always returns a view of the existing tensor data, so changes made through the view also affect the original tensor. reshape returns a view when possible and copies the data into new memory only when a view cannot be created (for example, for some non-contiguous tensors).
Explain what tensor aggregation is and provide an example. Tensor aggregation involves reducing the number of elements in a tensor by applying an operation like min, max, or mean. For example, finding the minimum value in a tensor reduces all of the elements to a single number.
What does the stack function do to tensors and how is it different from unsqueeze? The stack function joins a sequence of tensors along a new dimension, so the result has one more dimension than each input. unsqueeze works on a single tensor and adds one dimension of size 1 at a specified position.
What does the term “device agnostic code” mean, and why is it important in PyTorch? Device-agnostic code in PyTorch means writing code that can run on either a CPU or GPU without modification. This is important for portability and leveraging the power of GPUs when available.
In PyTorch, what is a “parameter”, how is it created, and what special property does it have? A “parameter” is a special type of tensor created using nn.Parameter. When assigned as a module attribute, it is automatically added to the module’s parameter list, enabling gradient tracking during training.
Explain the primary difference between the training loop and the testing/evaluation loop in a neural network. The training loop involves the forward pass, loss calculation, backpropagation and updating the model’s parameters through optimization, whereas the testing/evaluation loop involves only the forward pass and loss and/or accuracy calculation without gradient calculation and parameter updates.
Essay Questions
Discuss the importance of tensor operations in deep learning. Provide specific examples of how reshaping, indexing, and aggregation are utilized.
Explain the significance of data types in PyTorch tensors, and elaborate on the potential issues that can arise from data type mismatches during tensor operations.
Compare and contrast the use of reshape, view, stack, squeeze, and unsqueeze when dealing with tensors. In what scenarios might one operation be preferable over another?
Describe the key steps involved in the training loop of a neural network. Explain the role of the loss function, optimizer, and backpropagation in the learning process.
Explain the purpose of the torch.utils.data.DataLoader and the advantages it provides. Discuss how it can improve the efficiency and ease of use of data during neural network training.
Glossary
Scalar: A single numerical value. It has no direction or multiple dimensions.
Vector: A mathematical object that has both magnitude and direction, often represented as an ordered list of numbers, i.e. in one dimension.
Matrix: A rectangular array of numbers arranged in rows and columns, i.e. in two dimensions.
Tensor: A generalization of scalars, vectors, and matrices. It can have any number of dimensions.
Dimension (dim): Refers to the number of indices needed to address individual elements in a tensor, which is also the number of bracket pairs.
Shape: A tuple that describes the size of each dimension of a tensor.
Dtype: The data type of the elements in a tensor, such as float32, int64, etc.
Indexing: Selecting specific elements or sub-tensors from a tensor using their positions in the dimensions.
Reshape: Changing the shape of a tensor while preserving the number of elements.
View: Creating a new view of a tensor’s data without copying. Changing the view will change the original data, and vice versa.
Aggregation: Reducing the number of elements in a tensor by applying an operation (e.g., min, max, mean).
Stack: Combining multiple tensors along a new dimension.
Squeeze: Removing dimensions of size 1 from a tensor.
Unsqueeze: Adding a new dimension of size 1 to a tensor.
Device: The hardware on which computations are performed (e.g., CPU, GPU).
Device Agnostic Code: Code that can run on different devices (CPU or GPU) without modification.
Parameter (nn.Parameter): A special type of tensor that, when assigned as a module attribute, is automatically added to the module’s parameter list so its gradients can be tracked during training.
Epoch: A complete pass through the entire training dataset.
Training Loop: The process of iterating through the training data, calculating loss, and updating model parameters.
Testing/Evaluation Loop: The process of evaluating model performance on a separate test dataset.
DataLoader: A utility in PyTorch that creates an iterable over a dataset, managing batching and shuffling of the data.
Flatten: A layer that flattens a multi-dimensional tensor into a single dimension. By default, nn.Flatten keeps the first (batch) dimension and flattens the remaining ones.
PyTorch Deep Learning Fundamentals
Briefing Document: PyTorch Deep Learning Fundamentals
Introduction:
This document summarizes the core concepts and practical implementations of PyTorch for deep learning, as detailed in the provided course excerpts. The focus is on tensors, their properties, manipulations, and usage within the context of neural network building and training.
I. Tensors: The Building Blocks
Definition: Tensors are the fundamental data structure in PyTorch, used to encode data as numbers. Traditional terms like scalars, vectors, and matrices are all represented as tensors in PyTorch.
“basically anytime you encode data into numbers, it’s of a tensor data type.”
Scalars: A single number.
“A single number, number of dimensions, zero.”
Vectors: Have magnitude and direction and typically have more than one number.
“a vector typically has more than one number”
“a number with direction, number of dimensions, one”
Matrices: Two-dimensional tensors.
“a matrix, a tensor.”
Dimensions (ndim): Represented by the number of square bracket pairings in the tensor’s definition.
“dimension is like number of square brackets…number of pairs of closing square brackets.”
Shape: Defines the size of each dimension in a tensor.
For example, a vector [1, 2] has a shape of torch.Size([2]), meaning two elements in its single dimension. A matrix [[1, 2], [3, 4]] has a shape of (2, 2).
“the shape of the vector is two. So we have two by one elements.”
Data Type (dtype): Tensors have a data type (e.g., float32, float16, int32, long). The default dtype in PyTorch is float32.
“the default data type in pytorch, even if it’s specified as none is going to come out as float 32.”
It’s important to ensure tensors have compatible data types when performing operations to avoid errors.
Device: Tensors can reside on different devices, such as the CPU or GPU (CUDA). Device-agnostic code is recommended to handle this.
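The properties above can be inspected directly on small tensors. This is a minimal sketch; the variable names are chosen only for illustration.

```python
import torch

scalar = torch.tensor(7)                         # zero dimensions
vector = torch.tensor([1, 2])                    # one dimension
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # two dimensions

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim counts the bracket pairs, shape gives the size of each dimension
    print(name, t.ndim, tuple(t.shape), t.dtype, t.device)
```

Note that integer inputs produce an integer dtype (int64), while float inputs produce the default float32.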
II. Tensor Creation and Manipulation
Creation:
torch.tensor(): Creates tensors from lists or NumPy arrays.
torch.zeros(): Creates a tensor filled with zeros.
torch.ones(): Creates a tensor filled with ones.
torch.arange(): Creates a 1D tensor with a range of values.
torch.rand(): Creates a tensor with random values.
torch.randn(): Creates a tensor with random values from normal distribution.
torch.zeros_like()/torch.ones_like()/torch.rand_like(): Creates tensors with the same shape as another tensor.
Indexing: Tensors can be accessed via numerical indices, allowing one to extract elements or subsets.
“This is where the square brackets, the pairings come into play.”
Reshaping:
reshape(): Changes the shape of a tensor, provided the total number of elements remains the same.
view(): Creates a view of the tensor, sharing the same memory, but does not change the shape of the original tensor. Modifying a view changes the original tensor.
Stacking: torch.stack() joins tensors along a new dimension. torch.vstack() and torch.hstack() stack tensors vertically (row-wise) and horizontally (column-wise), respectively.
Squeezing and Unsqueezing: squeeze() removes dimensions of size 1, and unsqueeze() adds dimensions of size 1.
Element-wise operations: standard operations like +, -, *, / are applied element-wise.
These operations return a new tensor and leave the original unchanged unless the result is reassigned (e.g., tensor = tensor * 10).
Matrix Multiplication: Use the @ operator (or torch.matmul()). Inner dimensions must match for valid matrix multiplication.
“inner dimensions must match.”
Transpose: tensor.T transposes a tensor (swaps rows and columns).
Aggregation: Functions like torch.min(), torch.max(), torch.mean(), and their respective index finders like torch.argmin()/torch.argmax() reduce the tensor to scalar values.
“So you’re turning it from nine elements to one element, hence aggregation.”
Attributes: tensors have attributes like dtype, shape (or size), and can be retrieved with tensor.dtype or tensor.shape (or tensor.size())
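A short sketch of the element-wise, matrix-multiplication, transpose, and aggregation behavior described above. The tensor values are arbitrary and chosen only so the results are easy to check by hand.

```python
import torch

x = torch.arange(1, 10, dtype=torch.float32).reshape(3, 3)  # values 1..9

# Element-wise operations return a new tensor; x itself is unchanged.
scaled = x * 10

# Matrix multiplication: inner dimensions must match.
a = torch.rand(3, 2)
b = torch.rand(3, 2)
# a @ b would fail because (3, 2) @ (3, 2) has mismatched inner dimensions.
product = a @ b.T  # (3, 2) @ (2, 3) -> (3, 3)

# Aggregation reduces nine elements to a single value.
print(x.min(), x.max(), x.mean())
# argmin/argmax return the position (in the flattened tensor) of the value.
print(x.argmin(), x.argmax())
```

Transposing one operand with .T is a common way to fix a shape mismatch when the data layout allows it.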
III. Neural Networks with PyTorch
torch.nn Module: The module provides building blocks for creating neural networks.
“nn is the building block layer for neural networks.”
nn.Module: The base class for all neural network modules. Custom models should inherit from this class.
Linear Layers (nn.Linear): Represents a linear transformation (y = Wx + b).
Activation Functions: Non-linear functions such as ReLU (Rectified Linear Unit) and Sigmoid, enable neural networks to learn complex patterns.
“one divided by one plus torch exponential of negative x.”
Parameter (nn.Parameter): A special type of tensor that is added to a module’s parameter list when assigned as a module attribute, allowing automatic gradient tracking.
“Parameters are torch tensor subclasses…automatically added to the list of its parameters.”
Parameters that should be optimized during training need requires_grad=True, which is the default for nn.Parameter.
Sequential Container (nn.Sequential): A convenient way to create models by stacking layers in a sequence.
Forward Pass: The computation of the model’s output given the input data. This is implemented in the forward() method of a class inheriting from nn.Module.
“Do the forward pass.”
Loss Functions: Measure the difference between the predicted and actual values.
“Calculate the loss.”
Optimizers: Algorithms that update the model’s parameters based on the loss function during training (e.g., torch.optim.SGD).
“optimise a step, step, step.”
Use optimizer.zero_grad() to reset the gradients before each training step.
Training Loop: The iterative process of:
Forward pass
Calculate Loss
Optimizer zero grad
Loss backward (backpropagation)
Optimizer Step
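Evaluation Mode: Call model.eval() before inference (testing/evaluation). This switches layers such as dropout and batch normalization to their evaluation behavior; it does not turn off gradient tracking. To disable gradient tracking during inference, wrap the forward pass in torch.inference_mode() or torch.no_grad().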
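As a minimal sketch of these building blocks, the model below subclasses nn.Module, registers two nn.Parameter tensors, and implements forward() for a linear transformation. The class name and input values are illustrative assumptions.

```python
import torch
from torch import nn


class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning nn.Parameter as attributes registers them with the module.
        self.weight = nn.Parameter(torch.randn(1))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        # The forward pass: y = weight * x + bias
        return self.weight * x + self.bias


model = LinearRegressionModel()
print(list(model.parameters()))  # both parameters have requires_grad=True

# Inference: eval mode for layer behavior, inference_mode to skip gradients.
model.eval()
with torch.inference_mode():
    preds = model(torch.tensor([[1.0], [2.0]]))
print(preds.shape, preds.requires_grad)
```

The same model could be written with nn.Linear(in_features=1, out_features=1), which creates the weight and bias parameters internally.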
IV. Data Handling
torch.utils.data.Dataset: A class for representing datasets, and custom datasets can be built using this.
torch.utils.data.DataLoader: An iterable to batch data for use during training.
“This creates a Python iterable over a data set.”
Transforms: Functions that modify data (e.g., images) before they are used in training. They can be composed together.
“This little transforms module, the torch vision library will change that back to 64 64.”
Device Agnostic Data: Send data to the appropriate device (CPU/GPU) using .to(device)
NumPy Interoperability: torch.from_numpy() converts NumPy arrays to tensors. It keeps the array’s dtype, and NumPy’s default float type is float64, so convert the result to torch.float32 when the rest of the code expects PyTorch’s default.
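The data-handling pieces above can be combined in a small sketch. The toy arrays and the batch size are made up for illustration, and TensorDataset is used here as a ready-made Dataset rather than a custom subclass.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 10 samples with 2 features each, created in NumPy (float64).
features_np = np.arange(20, dtype=np.float64).reshape(10, 2)
labels_np = np.arange(10) % 2

# from_numpy keeps float64, so convert to PyTorch's default float32.
features = torch.from_numpy(features_np).type(torch.float32)
labels = torch.from_numpy(labels_np)

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Device-agnostic: use the GPU if one is available, otherwise the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
for batch_features, batch_labels in loader:
    batch_features = batch_features.to(device)
    batch_labels = batch_labels.to(device)
    print(batch_features.shape, batch_labels.shape)
```

With 10 samples and a batch size of 4, the loader yields three batches, the last one smaller than the others.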
V. Visualization
Matplotlib: A library used for plotting data and displaying images.
“Our data explorers motto is visualize, visualize, visualize.”
plt.imshow(): Displays images.
plt.plot(): Displays data in a line plot.
VI. Key Practices
Visualize, Visualize, Visualize: Emphasized for data exploration.
Device-Agnostic Code: Aim to write code that can run on both CPU and GPU.
Typo Avoidance: Be careful to avoid typos as they can cause errors.
VII. Specific Examples/Concepts Highlighted:
Image data: tensors are often (height, width, color_channels) or (batch_size, color_channels, height, width)
Linear regression: the formula y=weight * x + bias
Non linear transformations: using activation functions to introduce non-linearity
Multi-class data sets: Using scikit-learn’s make_blobs function to generate data with multiple classes.
Convolutional layers (nn.Conv2d): For processing images, which require specific parameters like in-channels, out-channels, kernel size, stride, and padding.
Flatten layer (nn.Flatten): Used to flatten the input into a vector before a linear layer.
Data Loaders: Batches of data in an iterable for training or evaluation loops.
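To make the convolution and flatten items concrete, the sketch below pushes a random batch of images through nn.Conv2d, nn.Flatten, and nn.Linear and tracks the shapes. The batch size, image size, channel counts, and number of classes are arbitrary assumptions.

```python
import torch
from torch import nn

# A fake batch in (batch_size, color_channels, height, width) layout.
images = torch.rand(8, 3, 64, 64)

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=1)
flatten = nn.Flatten()  # keeps the batch dimension, flattens the rest
classifier = nn.Linear(in_features=10 * 64 * 64, out_features=3)

features = conv(images)      # padding=1 with a 3x3 kernel keeps 64x64
flat = flatten(features)     # one feature vector per image
logits = classifier(flat)    # one score per class

print(features.shape, flat.shape, logits.shape)
```

Printing shapes after each layer is a quick way to find the in_features value that the linear layer needs.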
Conclusion:
This document provides a foundation for understanding the essential elements of PyTorch for deep learning. It highlights the importance of tensors, their manipulation, and their role in building and training neural networks. Key concepts such as the training loop, device-agnostic coding, and the value of visualization are also emphasized.
This briefing should serve as a useful reference for anyone learning PyTorch and deep learning fundamentals from these course materials.
PyTorch Fundamentals: Tensors and Neural Networks
1. What is a tensor in PyTorch and how does it relate to scalars, vectors, and matrices?
In PyTorch, a tensor is the fundamental data structure used to represent data. Think of it as a generalization of scalars, vectors, and matrices. A scalar is a single number (0 dimensions), a vector has magnitude and direction, and is represented by one dimension, while a matrix has two dimensions. Tensors can have any number of dimensions and can store numerical data of various types. In essence, when you encode any kind of data into numbers within PyTorch, it becomes a tensor. PyTorch uses the term tensor to refer to any of these data types.
2. How are the dimensions and shape of a tensor determined?
The number of dimensions of a tensor can be determined by the number of square bracket pairs used to define it. For example, [1, 2, 3] is a vector with one dimension (one pair of square brackets), and [[1, 2], [3, 4]] is a matrix with two dimensions (two pairs). The shape of a tensor refers to the size of each dimension. For instance, [1, 2, 3] has a shape of torch.Size([3]), meaning 3 elements in the first dimension, while [[1, 2], [3, 4]] has a shape of (2, 2), meaning 2 rows and 2 columns.
3. How do you create tensors with specific values in PyTorch?
PyTorch provides various functions to create tensors:
torch.tensor([value1, value2, …]) directly creates a tensor from a Python list. You can control the data type (dtype) of the tensor during its creation by passing the dtype argument.
torch.zeros(size) creates a tensor filled with zeros of the specified size.
torch.ones(size) creates a tensor filled with ones of the specified size.
torch.rand(size) creates a tensor filled with random values from a uniform distribution (between 0 and 1) of the specified size.
torch.arange(start, end, step) creates a 1D tensor containing values from start to end (exclusive), incrementing by step.
torch.zeros_like(other_tensor) and torch.ones_like(other_tensor) create tensors with the same shape and dtype as the other_tensor, filled with zeros or ones respectively.
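The creation functions listed above can be tried side by side. The sizes and values here are arbitrary.

```python
import torch

from_list = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
random_uniform = torch.rand(2, 3)   # values in [0, 1)
steps = torch.arange(0, 10, 2)      # tensor([0, 2, 4, 6, 8])
zeros_like = torch.zeros_like(from_list)  # same shape and dtype as from_list

for t in (from_list, zeros, ones, random_uniform, steps, zeros_like):
    print(tuple(t.shape), t.dtype)
```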
4. What is the importance of data types (dtypes) in tensors, and how can they be changed?
Data types determine how data is stored in memory, which has implications for precision and memory usage. The default data type in PyTorch is torch.float32. To change a tensor’s data type, you can use the .type() method, e.g. tensor.type(torch.float16) will convert a tensor to 16 bit float. While PyTorch can often automatically handle operations between different data types, using the correct data type can prevent unexpected errors or behaviors. It’s good to be explicit.
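A brief sketch of dtype conversion and mixed-dtype arithmetic, using made-up values:

```python
import torch

float32_tensor = torch.tensor([3.0, 6.0, 9.0])          # default float32
float16_tensor = float32_tensor.type(torch.float16)     # explicit conversion
int_tensor = torch.tensor([1, 2, 3])                     # integers become int64

# Mixed dtypes are promoted automatically in simple arithmetic.
mixed = float32_tensor * int_tensor
print(mixed.dtype)

# Some operations need a floating-point dtype, so convert explicitly first.
mean_value = int_tensor.type(torch.float32).mean()
print(mean_value)
```

Being explicit about the conversion makes it clear which precision the rest of the computation uses.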
5. What are tensor attributes such as shape, size, and Dtype and how do they relate to tensor manipulation?
These are attributes that can be used to understand, manipulate, and diagnose issues with tensors.
Shape: An attribute that represents the dimensions of the tensor. For example, a matrix might have a shape of (3, 4), indicating it has 3 rows and 4 columns. You can access this information by using .shape
Size: Acts like .shape but is a method i.e. .size(). It will return the dimensions of the tensor.
Dtype: Stands for data type. This defines the way the data is stored and impacts precision and memory use. You can access this by using .dtype.
These attributes can be used to diagnose issues, for example you might want to ensure all tensors have compatible data types and dimensions for multiplication.
6. How do operations like reshape, view, stack, unsqueeze, and squeeze modify the shape of tensors?
reshape(new_shape): Changes the shape of a tensor to a new shape, as long as the total number of elements remains the same. For example, a tensor with 9 elements can be reshaped into (3, 3) or (9, 1).
view(new_shape): Similar to reshape, but it only works when the requested shape is compatible with the tensor’s memory layout (for example, on a contiguous tensor, whose elements are stored in continuous memory). The result shares memory with the original tensor, so changes to one affect the other.
stack(tensors, dim): Concatenates multiple tensors along a new dimension (specified by dim) and increases the overall dimensionality by 1.
unsqueeze(dim): Inserts a new dimension of size one at a specified position, increasing the overall dimensionality by 1.
squeeze(): Removes all dimensions with size one in a tensor, reducing overall dimensionality of a tensor.
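The shape-changing operations above can be compared on one small tensor. This is an illustrative sketch with arbitrary values.

```python
import torch

x = torch.arange(1, 10)               # shape (9,)

reshaped = x.reshape(3, 3)            # shape (3, 3), same 9 elements
viewed = x.view(1, 9)                 # shape (1, 9), shares memory with x
viewed[0, 0] = 100                    # also changes x[0]

stacked = torch.stack([x, x], dim=0)  # new dimension: shape (2, 9)
unsqueezed = x.unsqueeze(dim=0)       # shape (1, 9)
squeezed = unsqueezed.squeeze()       # back to shape (9,)

print(x[0], stacked.shape, unsqueezed.shape, squeezed.shape)
```

Because x is contiguous, reshape also returns a view here, so reshaped[0, 0] reflects the change made through viewed.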
7. What are the key components of a basic neural network training loop?
The key components include:
Forward Pass: The input data goes through the model, producing the output.
Calculate Loss: The error is calculated by comparing the output to the true labels.
Zero Gradients: Previous gradients are cleared before starting a new iteration to prevent accumulating them across iterations.
Backward Pass: The error is backpropagated through the network to calculate gradients.
Optimize Step: The model’s parameters are updated based on the gradients using an optimizer.
Testing / Validation Step: The model’s performance is evaluated against a test or validation dataset.
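The steps above can be sketched as a complete loop on synthetic linear data. The data, learning rate, number of epochs, and the choice of nn.L1Loss with torch.optim.SGD are assumptions made for this example, not the only reasonable settings.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Synthetic data following y = 0.7 * x + 0.3
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

train_losses = []
for epoch in range(200):
    model.train()
    y_pred = model(X_train)          # 1. forward pass
    loss = loss_fn(y_pred, y_train)  # 2. calculate loss
    optimizer.zero_grad()            # 3. zero gradients
    loss.backward()                  # 4. backward pass
    optimizer.step()                 # 5. optimizer step
    train_losses.append(loss.item())

    model.eval()                     # 6. testing step, no gradient tracking
    with torch.inference_mode():
        test_loss = loss_fn(model(X_test), y_test)

print(train_losses[0], train_losses[-1], test_loss.item())
```

Zeroing gradients can happen anywhere before loss.backward() within an iteration; what matters is that gradients from the previous iteration are cleared before new ones are computed.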
8. What is the purpose of torch.nn.Module and torch.nn.Parameter in PyTorch?
torch.nn.Module is a base class for creating neural network models. Modules provide a way to organize and group layers and functions, such as linear layers, activation functions, and other model components. It keeps track of learnable parameters.
torch.nn.Parameter is a special subclass of torch.Tensor that is used to represent the learnable parameters of a model. When parameters are assigned as module attributes, PyTorch automatically registers them so they appear in the module’s parameter list and can be passed to an optimizer. Parameters have requires_grad=True by default, which tells PyTorch to calculate and store gradients for them during backpropagation.
PyTorch: A Deep Learning Framework
PyTorch is a machine learning framework written in Python that is used for deep learning and other machine learning tasks [1]. The framework is popular for research and allows users to write fast deep learning code that can be accelerated by GPUs [2, 3].
Key aspects of PyTorch include:
Tensors: PyTorch uses tensors as a fundamental building block for numerical data representation. These can be of various types, and neural networks perform mathematical operations on them [4, 5].
Neural Networks: PyTorch is often used for building neural networks, including fully connected and convolutional neural networks [6]. These networks are constructed using layers from the torch.nn module [7].
GPU Acceleration: PyTorch can leverage GPUs via CUDA to accelerate machine learning code. GPUs are fast at numerical calculations, which are very important in deep learning [8-10].
Flexibility: The framework allows for customization, and users can combine layers in different ways to build various kinds of neural networks [6, 11].
Popularity: PyTorch is a popular research machine learning framework, with 58% of papers with code implemented using PyTorch [2, 12, 13]. It is used by major organizations such as Tesla, OpenAI, Facebook, and Microsoft [14-16].
The typical workflow when using PyTorch for deep learning includes:
Data Preparation: The first step is getting the data ready, which can involve numerical encoding, turning the data into tensors, and loading the data [17-19].
Model Building: PyTorch models are built using the nn.Module class as a base and defining the forward computation [20-23]. This includes choosing appropriate layers and defining their interconnections [11].
Model Fitting: The model is fitted to the data using an optimization loop and a loss function [19]. This involves calculating gradients using back propagation and updating model parameters using gradient descent [24-27].
Model Evaluation: Model performance is evaluated by measuring how well the model performs on unseen data, using metrics such as accuracy and loss [28].
Saving and Loading: Trained models can be saved and reloaded using the torch.save, torch.load, and torch.nn.Module.load_state_dict functions [29, 30].
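The saving and loading step can be sketched as follows. The example saves only the model’s state_dict to a temporary directory; the file name and the small nn.Linear model are illustrative assumptions.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(in_features=2, out_features=1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pth")

    # Save only the learned parameters (the state_dict).
    torch.save(model.state_dict(), path)

    # Create a fresh instance of the same architecture, then load the weights.
    loaded_model = nn.Linear(in_features=2, out_features=1)
    loaded_model.load_state_dict(torch.load(path))

print(torch.equal(model.weight, loaded_model.weight))
```

Loading a state_dict requires constructing a model with the same architecture first, since the file stores parameter values rather than the model class.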
Some additional notes on PyTorch include:
Reproducibility: Randomness is important in neural networks; it’s necessary to set random seeds to ensure reproducibility of experiments [31, 32].
Device Agnostic Code: It’s useful to write device agnostic code, which means code that can run on either a CPU or a GPU [33, 34].
Integration: PyTorch integrates well with other libraries, such as NumPy, which is useful for pre-processing and other numerical tasks [35, 36].
Documentation: The PyTorch website and documentation serve as the primary resource for learning about the framework [2, 37, 38].
Community Support: Online forums and communities provide places to ask questions and share code [38-40].
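Reproducibility, device-agnostic code, and NumPy integration from the notes above can be shown in a few lines. The seed value 42 is an arbitrary choice.

```python
import torch

# Device-agnostic setup: use the GPU if one is available, otherwise the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Setting the same seed before each call reproduces the same random values.
torch.manual_seed(42)
a = torch.rand(3, 4)
torch.manual_seed(42)
b = torch.rand(3, 4)
print(torch.equal(a, b))

# Move the tensor to the chosen device, then back to the CPU for NumPy.
x = a.to(device)
as_numpy = x.cpu().numpy()
print(x.device, as_numpy.shape)
```

NumPy arrays live in CPU memory, which is why the tensor is moved back with .cpu() before calling .numpy().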
Overall, PyTorch is a very popular and powerful tool for deep learning and machine learning [2, 12, 13]. It provides tools to enable users to build, train, and deploy neural networks with ease [3, 16, 41].
Understanding Machine Learning Models
Machine learning models learn patterns from data, which is converted into numerical representations, and then use these patterns to make predictions or classifications [1-4]. The models are built using code and math [1].
Here are some key aspects of machine learning models based on the sources:
Data Transformation: Machine learning models require data to be converted into numbers, a process sometimes called numerical encoding [1-4]. This can include images, text, tables of numbers, audio files, or any other type of data [1].
Pattern Recognition: After data is converted to numbers, machine learning models use algorithms to find patterns in that data [1, 3-5]. These patterns can be complex and are often not interpretable by humans [6, 7]. The models can learn patterns through code, using algorithms to find the relationships in the numerical data [5].
Traditional Programming vs. Machine Learning: In traditional programming, rules are hand-written to manipulate input data and produce desired outputs [8]. In contrast, machine learning algorithms learn these rules from data [9, 10].
Supervised Learning: Many machine learning algorithms use supervised learning. This involves providing input data along with corresponding output data (features and labels), and then the algorithm learns the relationships between the inputs and outputs [9].
Parameters: Machine learning models learn parameters that represent the patterns in the data [6, 11]. Parameters are values that the model sets itself [12]. They are numerical, and a model can have a very large number of them, sometimes millions or even trillions [6].
Explainability: The patterns learned by a deep learning model are often uninterpretable by a human [6]. Sometimes, these patterns are lists of numbers in the millions, which is difficult for a person to understand [6, 7].
Model Evaluation: The performance of a machine learning model can be evaluated by making predictions and comparing those predictions to known labels or targets [13-15]. The goal of training a model is to move from some unknown (often random) parameters to parameters that represent the data better [16]. The loss function is used to measure how wrong a model’s predictions are compared to the ideal predictions [17].
Model Types: Machine learning models include:
Linear Regression: Models which use a linear formula to draw patterns in data [18]. These models use parameters such as weights and biases to perform forward computation [18].
Neural Networks: Neural networks are the foundation of deep learning [19]. These are typically used for unstructured data such as images [19, 20]. They use a combination of linear and non-linear functions to draw patterns in data [21-23].
Convolutional Neural Networks (CNNs): These are a type of neural network often used for computer vision tasks [19, 24]. They process images through a series of layers, identifying spatial features in the data [25].
Gradient Boosted Machines: Algorithms such as XGBoost are often used for structured data [26].
Use Cases: Machine learning can be applied to virtually any problem where data can be converted into numbers and patterns can be found [3, 4]. However, simple rule-based systems are preferred if they can solve a problem, and machine learning should not be used simply because it can [5, 27]. Machine learning is useful for complex problems with long lists of rules [28, 29].
Model Training: The training process is iterative and involves multiple steps, and it can also be seen as an experimental process [30, 31]. In each step, the machine learning model is used to make predictions and its parameters are adjusted to minimize error [13, 32].
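To make the contrast between hand-written rules and learned rules concrete, here is a small plain-Python sketch of linear regression. The rule y = 2x + 1, the learning rate, and the step count are assumptions chosen for illustration. The model starts from arbitrary parameters and recovers the rule from features and labels alone.

```python
# Traditional programming: the rule is written by hand.
def handwritten_rule(x):
    return 2.0 * x + 1.0


# Supervised learning: only inputs (features) and outputs (labels) are given.
features = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [handwritten_rule(x) for x in features]

# Parameters the model sets itself, starting from arbitrary values.
weight, bias = 0.0, 0.0
learning_rate = 0.05  # hyperparameter chosen for this toy example


def mse(w, b):
    """Loss function: mean squared error between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(features, labels)) / len(features)


initial_loss = mse(weight, bias)
for step in range(2000):
    # Gradients of the loss with respect to each parameter.
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(features, labels)) / len(features)
    grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(features, labels)) / len(features)
    # Adjust the parameters a small step against the gradient.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b
final_loss = mse(weight, bias)

print(round(weight, 3), round(bias, 3))
```

The learned weight and bias end up close to the 2 and 1 of the hand-written rule, even though the training code never sees that rule directly.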
In summary, machine learning models are algorithms that can learn patterns from data by converting the data into numbers, using various algorithms, and adjusting parameters to improve performance. Models are typically evaluated against known data with a loss function, and there are many types of models and use cases depending on the type of problem [6, 9-11, 13, 32].
Understanding Neural Networks
Neural networks are a type of machine learning model inspired by the structure of the human brain [1]. They are composed of interconnected nodes, or neurons, organized in layers, and they are used to identify patterns in data [1-3].
Here are some key concepts for understanding neural networks:
Structure:
Layers: Neural networks are made of layers, including an input layer, one or more hidden layers, and an output layer [1, 2]. The ‘deep’ in deep learning comes from having multiple hidden layers [1, 4].
Nodes/Neurons: Each layer is composed of nodes or neurons [4, 5]. Each node performs a mathematical operation on the input it receives.
Connections: Nodes in adjacent layers are connected, and these connections have associated weights that are adjusted during the learning process [6].
Architecture: The arrangement of layers and connections determines the neural network’s architecture [7].
Function:
Forward Pass: In a forward pass, input data is passed through the network, layer by layer [8]. Each layer performs mathematical operations on the input, using linear and non-linear functions [5, 9].
Mathematical Operations: Each layer is typically a combination of linear (straight line) and nonlinear (non-straight line) functions [9].
Nonlinearity: Nonlinear functions, such as ReLU or sigmoid, are critical for enabling the network to learn complex patterns [9-11].
Representation Learning: The network learns a representation of the input data by manipulating patterns and features through its layers [6, 12]. The learned representation is stored in the network’s weights, also called a weight matrix or weight tensor [13].
Output: The output of the network is a representation of the learned patterns, which can be converted into a human-understandable format [12-14].
Learning Process:
Random Initialization: Neural networks start with random numbers as parameters, and they adjust those numbers to better represent the data [15, 16].
Loss Function: A loss function is used to measure how wrong the model’s predictions are compared to ideal predictions [17-19].
Backpropagation: Backpropagation is an algorithm that calculates the gradients of the loss with respect to the model’s parameters [20].
Gradient Descent: Gradient descent is an optimization algorithm used to update model parameters to minimize the loss function [20, 21].
Types of Neural Networks:
Fully Connected Neural Networks: These networks have connections between all nodes in adjacent layers [1, 22].
Convolutional Neural Networks (CNNs): CNNs are particularly useful for processing images and other visual data, and they use convolutional layers to identify spatial features [1, 23, 24].
Recurrent Neural Networks (RNNs): These are often used for sequence data [1, 25].
Transformers: Transformers have become popular in recent years and are used in natural language processing and other applications [1, 25, 26].
Customization: Neural networks are highly customizable, and they can be designed in many different ways [4, 25, 27]. The specific architecture and layers used are often tailored to the specific problem at hand [22, 24, 26-28].
Neural networks are a core component of deep learning, and they can be applied to a wide range of problems including image recognition, natural language processing, and many others [22, 23, 25, 26]. The key to using neural networks effectively is to convert data into a numerical representation, design a network that can learn patterns from the data, and use optimization techniques to train the model.
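The structure described above (layers, weighted connections, and a mix of linear and nonlinear functions) can be sketched as a small fully connected network. The layer sizes (4, 8, 3) and the batch size are arbitrary assumptions for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Input layer of 4 features, one hidden layer of 8 units, output layer of 3 units.
network = nn.Sequential(
    nn.Linear(4, 8),  # linear function: weights and biases
    nn.ReLU(),        # nonlinear function
    nn.Linear(8, 3),
)

# Forward pass: a batch of 5 samples flows through the layers in order.
batch = torch.randn(5, 4)
output = network(batch)

# ReLU on its own: negative inputs become zero, positive inputs pass through.
relu_out = nn.ReLU()(torch.tensor([-1.0, 0.0, 2.0]))

# Randomly initialized parameters that training would later adjust.
n_params = sum(p.numel() for p in network.parameters())
print(output.shape, relu_out, n_params)
```

Without the nonlinear layer, the two linear layers would collapse into a single linear function, which is why nonlinearity matters for learning complex patterns.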
Machine Learning Model Training
The model training process in machine learning involves using algorithms to adjust a model’s parameters so it can learn patterns from data and make accurate predictions [1, 2]. Here’s an overview of the key steps in training a model, according to the sources:
Initialization: The process begins with a model that has randomly assigned parameters, such as weights and biases [1, 3]. These parameters are what the model adjusts during training [4, 5].
Data Input: The training process requires input data to be passed through the model [1]. The data is typically split into a training set for learning and a test set for evaluation [6].
Forward Pass: Input data is passed through the model, layer by layer [7]. Each layer performs mathematical operations on the input, which may include both linear and nonlinear functions [8]. This forward computation produces a prediction, called the model’s output; in classification, the raw outputs are often called logits [9, 10].
Loss Calculation: A loss function is used to measure how wrong the model’s predictions are compared to the ideal outputs [4, 11]. The loss function provides a numerical value that represents the error or deviation of the model’s predictions from the actual values [12]. The goal of the training process is to minimize this loss [12, 13].
Backpropagation: After the loss is calculated, the backpropagation algorithm computes the gradients of the loss with respect to the model’s parameters [2, 14, 15]. Gradients indicate the direction and magnitude of the steepest increase in the loss, so parameters are moved in the opposite direction to reduce it [1].
Optimization: An optimizer uses the calculated gradients to update the model’s parameters [4, 11, 16]. Gradient descent is a commonly used optimization algorithm that adjusts the parameters to minimize the loss [1, 2, 15]. The learning rate is a hyperparameter that determines the size of the adjustments [5, 17].
Training Loop: The process of forward pass, loss calculation, backpropagation, and optimization is repeated iteratively through a training loop [11, 17, 18]. The training loop is where the model learns patterns on the training data [19]. Each full pass through the training data is called an epoch [20].
Evaluation: After training, the model’s performance is evaluated on a separate test data set [19]. This evaluation helps to measure how well the model has learned and whether it can generalize to unseen data [21].
In PyTorch, the training loop typically involves these steps:
Setting the model to training mode using model.train() [22, 23]. This puts layers such as dropout into their training behavior; gradient tracking is already on by default, which is what allows the gradients to be used to update the model’s parameters [23].
Performing a forward pass by passing the data through the model.
Calculating the loss by comparing the model’s prediction with the actual data labels.
Setting gradients to zero using optimizer.zero_grad() [24].
Performing backpropagation using loss.backward() [15, 24].
Updating the model’s parameters using optimizer.step() [24].
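The steps above can be sketched as a complete loop. This is a minimal example under stated assumptions: the toy data (y = 0.7x + 0.3), the 80/20 split, the choice of MSE loss with SGD, the learning rate, and the epoch count are all illustrative choices, not values taken from the sources.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Toy data, split into a training set and a test set.
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr is a hyperparameter

losses = []
for epoch in range(200):
    model.train()                    # 1. training mode
    y_pred = model(X_train)          # 2. forward pass
    loss = loss_fn(y_pred, y_train)  # 3. calculate the loss
    optimizer.zero_grad()            # 4. reset gradients from the previous step
    loss.backward()                  # 5. backpropagation
    optimizer.step()                 # 6. update the parameters
    losses.append(loss.item())

# Evaluation on data the model did not train on.
model.eval()
with torch.inference_mode():
    test_loss = loss_fn(model(X_test), y_test).item()

print(losses[0], losses[-1], test_loss)
```

Here the whole training set is passed through the model in one go, so one loop iteration is one epoch. With mini-batches, an epoch would contain several such update steps.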
During training, models can have two modes: train and evaluation.
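The train mode, set with model.train(), puts layers such as dropout into their training behavior so the model can learn from the data [22, 23].
The evaluation mode, set with model.eval(), turns off settings not needed for evaluation, such as dropout [25, 26]. Gradient tracking is turned off separately, with the torch.inference_mode context manager, to make the code run faster [25, 26].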
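A small experiment shows what the two modes change. In PyTorch, model.train() and model.eval() switch the behavior of layers such as dropout, while gradient tracking is a separate mechanism controlled by a context manager such as torch.inference_mode. The tensor sizes and the dropout probability below are arbitrary assumptions for the sketch.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Train mode: dropout randomly zeroes elements and rescales the survivors.
layer.train()
train_out = layer(x)

# Evaluation mode: dropout is switched off and the input passes through unchanged.
layer.eval()
eval_out = layer(x)

# Gradient tracking is independent of the mode.
model = nn.Linear(2, 1)
model.eval()
tracked = model(torch.ones(1, 2)).requires_grad        # still tracked after eval()
with torch.inference_mode():
    untracked = model(torch.ones(1, 2)).requires_grad  # tracking disabled here

print(int((train_out == 0).sum()), torch.equal(eval_out, x), tracked, untracked)
```

This is why evaluation code usually combines both: model.eval() for correct layer behavior, and torch.inference_mode() for speed.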
Other key points about the model training process are:
Hyperparameters: The training process involves the use of hyperparameters, which are values set by the user, like the learning rate or the number of epochs [5, 23].
Experimentation: Model training is often an experimental process, with various parameters and settings being tried to find the best performing model [27, 28].
Data: The quality and quantity of the training data has a big effect on the model’s performance [29, 30].
Reproducibility: Randomness is an important part of training; to reproduce results, it is necessary to set random seeds [31, 32].
Visualization: Visualizing model training through metrics such as accuracy and loss curves is important in understanding whether the model is learning effectively [33-35].
Inference: When making predictions after training, the term inference is also used [36]. Inference uses a model to make predictions using unseen data [26, 36].
In summary, the model training process in machine learning involves iteratively adjusting a model’s parameters to minimize error by using the techniques of gradient descent and backpropagation [1, 2, 14, 15].
PyTorch Model Deployment
The sources discuss model deployment in the context of saving and loading models, which is a key part of making a model usable in an application or other context. Here’s a breakdown of model deployment methods based on the sources:
Saving Models:
State Dictionary: The recommended way to save a PyTorch model is to save its state dictionary [1, 2]. The state dictionary contains the model’s learned parameters, such as weights and biases [3, 4]. This is more flexible than saving the entire model [2].
File Extension: PyTorch models are commonly saved with a .pth or .pt file extension [5].
Saving Process: The saving process involves creating a directory path, defining a model name, and then using torch.save() to save the state dictionary to the specified file path [6, 7].
Flexibility: Saving the state dictionary provides flexibility in how the model is loaded and used [8].
Loading Models:
Loading State Dictionary: To load a saved model, you must create a new instance of the model class and then load the saved state dictionary into that instance [4]. This is done using the load_state_dict() method, along with torch.load(), which reads the file containing the saved state dictionary [9, 10].
New Instance: When loading a model, it’s important to remember that you must create a new instance of the model class, and then load the saved parameters into that instance using the load_state_dict method [4, 9, 11].
Loading Process: The loading process involves creating a new instance of the model, reading the saved file with torch.load(), and passing the resulting state dictionary to load_state_dict() on the new instance [12].
Inference Mode:
Evaluation Mode: Before a loaded model is used for predictions, it is typically set to evaluation mode by calling model.eval() [13, 14]. This turns off settings not needed for evaluation, such as dropout layers [15-17].
Gradient Tracking: It is also common to use inference mode via the context manager torch.inference_mode to turn off gradient tracking, which speeds up the process of making predictions [18-21]. This is used when you are not training the model, but rather using it to make predictions [19].
Deployment Context:
Reusability: The sources mention that a saved model can be reused in the same notebook or sent to a friend to try out, or used in a week’s time [22].
Cloud Deployment: Models can be deployed in applications or in the cloud [23].
Model Transfer:
Transfer Learning: The source mentions that parameters from one model could be used in another model; this process is called transfer learning [24].
Other Considerations:
Device Agnostic Code: It is recommended to write code that is device agnostic, so it can run on either a CPU or a GPU [25-27].
Reproducibility: Random seeds should be set for reproducibility [28, 29].
Model Equivalence: After loading a model, it is important to test that the loaded model is equivalent to the original model by comparing predictions [14, 30-32].
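The save, load, and equivalence-check steps above can be sketched as follows. This is a minimal example, not the sources' exact code: the directory name, the file name example_model.pth, and the use of a temporary directory are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

torch.manual_seed(42)
model = nn.Linear(in_features=2, out_features=1)  # stands in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    # Saving: create a directory path, define a model name, save the state dict.
    model_dir = Path(tmp_dir) / "models"           # hypothetical directory
    model_dir.mkdir(parents=True, exist_ok=True)
    save_path = model_dir / "example_model.pth"    # hypothetical file name
    torch.save(obj=model.state_dict(), f=save_path)

    # Loading: create a new instance, then load the saved parameters into it.
    loaded_model = nn.Linear(in_features=2, out_features=1)
    loaded_model.load_state_dict(torch.load(f=save_path))

# Model equivalence: both models should give the same predictions.
sample = torch.tensor([[1.0, 2.0]])
model.eval()
loaded_model.eval()
with torch.inference_mode():
    original_preds = model(sample)
    loaded_preds = loaded_model(sample)
predictions_match = torch.equal(original_preds, loaded_preds)
print(predictions_match)
```

Note that load_state_dict() takes the state dictionary returned by torch.load(), not the file path itself.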
In summary, model deployment involves saving the trained model’s parameters using its state dictionary, loading these parameters into a new model instance, and making predictions with the model in evaluation mode inside the torch.inference_mode context. The sources emphasize the importance of saving models for later use, sharing them, and deploying them in applications or cloud environments.
PyTorch for Deep Learning & Machine Learning – Full Course
This text comprises a lecture delivered in Pakistan celebrating the completion of a reading of Sahih al-Bukhari, a highly esteemed collection of Hadith (sayings and traditions of the Prophet Muhammad). The speaker emphasizes the importance of Muslim unity, rejecting sectarianism and promoting a return to the core principles of Islam as exemplified in the Hadith. He explores the life and scholarship of Imam Bukhari, highlighting his deep love for the Prophet and emphasizing the significance of Sahih al-Bukhari as a source of religious knowledge. The lecture also details the speaker’s own chain of transmission (isnad) linking him back to the Prophet Muhammad, thereby establishing his scholarly credentials. Finally, the speaker urges his audience to embrace the principles of justice and piety found within the Hadith.
Sahih al-Bukhari Study Guide
Short Answer Quiz
What is the main point the speaker makes about unity in the Muslim community? He emphasizes that differences should be seen as a blessing, not a burden, and that Muslims should unite on the basis of their common love for the Prophet Muhammad and his Sunnah, rather than allowing minor differences to create division and takfir.
Why does the speaker mention Imam Bukhari and the Sahih Bukhari so frequently? The speaker highlights Imam Bukhari and the Sahih al-Bukhari as central to understanding authentic hadith and as a means of fostering unity and love for the Prophet, also as an antidote to extremism by returning to primary sources.
What does the speaker mean by the phrase “tear down the walls of Takfir”? He means that Muslims should stop declaring other Muslims as infidels based on minor differences in belief or practice, and instead focus on unity and common ground within the Ummah.
Explain the significance of the manuscript of Sahih al-Bukhari that the speaker references. The speaker emphasizes that the manuscript is 750 years old, representing a direct link to a reliably preserved version, and that despite the age of the manuscript, it has been found to match other manuscripts with high accuracy.
According to the speaker, when did the practice of holding gatherings to mark the end of reading Sahih al-Bukhari begin? He identifies Imam Ibn Hajar al-Asqalani as one of the founders of this practice and explains that this custom began after Imam Ibn Hajar completed his commentary on Sahih al-Bukhari.
What point does the speaker make about Imam Bukhari’s love of the Prophet Muhammad? He emphasizes that Imam Bukhari showed great love for the Prophet, evident in his placing the book of revelation first in Sahih Bukhari and that it serves as an example of the importance of showing love for the Prophet.
What does the speaker argue about the relationship between faith and prophethood? He asserts that mention of prophethood is the door to faith. He emphasizes that faith, at its core, is linked to acknowledging and following the example of Prophet Muhammad.
How does the speaker connect the beginning and end of Imam Bukhari’s Sahih? He asserts that Imam Bukhari began his work by describing the prophethood of Muhammad and concluded it with an emphasis on Tawhid, suggesting a full journey of faith beginning with the Prophet and ending with God.
What is the speaker’s main criticism of the Khawarij? The speaker criticizes them for their extreme interpretations of Islam, accusing them of considering themselves as superior and for creating innovation under the guise of monotheism, rejecting the Sunnah.
What is the significance of the hadith with the words Subhan Allah wa bihamdihi, Subhan Allah al-Azim at the end of the Sahih? The speaker explains that this hadith emphasizes the importance of glorifying and praising Allah, which he sees as a natural progression of a journey through the hadith starting with prophethood and ending with the recognition of God’s majesty, also as the way for people of paradise.
Essay Questions
Analyze the speaker’s arguments for why it is important to emphasize unity and avoid declaring others as infidels. What historical context does he provide, and how does he use the example of Imam Bukhari to support his ideas?
Discuss the significance of hadith literature, particularly Sahih al-Bukhari, in the speaker’s arguments. How does he use examples of hadith to promote specific viewpoints on spirituality and understanding?
Explore the relationship between knowledge and love for the Prophet Muhammad as presented by the speaker. How does he tie together actions, intentions, and love to a greater understanding of Islam?
Evaluate the speaker’s critique of extremism. How does he use historical figures and groups to illustrate the dangers of deviation from the Sunnah and the importance of moderation?
Based on the content of this source, discuss what, according to the speaker, are the main elements and characteristics of true Sufism? How is this distinct from the Sufism he criticizes?
Glossary of Key Terms
Hadith: A record of the words, actions, and tacit approvals of the Prophet Muhammad, considered a primary source for Islamic law and practice.
Sunnah: The way of life and practices of Prophet Muhammad, meant to be followed by Muslims.
Sahih al-Bukhari: One of the most revered collections of hadith in Sunni Islam, compiled by Imam Muhammad al-Bukhari.
Takfir: The act of declaring another Muslim to be an infidel, a practice the speaker strongly criticizes.
Ummah: The global community of Muslims.
Ishq wa Mohabbate Rasool: Love and devotion to the Messenger (Prophet Muhammad), a central theme in the text and in Islamic spirituality.
Tawhid: The concept of the oneness of God in Islam.
Khawarij: An early Islamic sect known for their extreme views and practice of declaring other Muslims as infidels, a group the speaker uses as a negative example.
Tawsal: In this context, seeking intercession or blessings through a medium, in this case, the reading of Sahih Bukhari for healing and other forms of help.
Rifa: The speaker states that he does not believe in rifa on earth, but that it exists in heaven, indicating some kind of ranking or status that is not possible in this world.
Ahl al-Sunnah wal Jama’ah: “People of the Sunnah and the Community,” a term used by the majority of Sunni Muslims to describe their adherence to traditional Islamic teachings.
Muhaddith: A scholar of Hadith, someone who studies and transmits the sayings and actions of the Prophet Muhammad.
Tasawwuf (Sufism): The mystical dimension of Islam, focused on spiritual purification and the direct experience of God.
Unity, Hadith, and the Legacy of Imam Bukhari
Briefing Document: Analysis of Excerpts from a Speech on Sahih al-Bukhari
Overview:
This document analyzes excerpts from a speech, likely delivered in Pakistan, centered around the completion of a lesson on Sahih al-Bukhari, one of the most important collections of hadith (sayings and actions of the Prophet Muhammad). The speaker, identified as Muhammad Tahir al-Qadri, is a prominent Islamic scholar, and his address is a passionate call for unity, moderation, and a deeper understanding of Islamic tradition, particularly the hadith. He uses the occasion to highlight key themes from the life of Imam Bukhari, the author of the text, as well as to address contemporary issues of sectarianism and extremism. The overall tone is one of reverence, scholarship, and an appeal to reason and faith.
Main Themes and Ideas:
Emphasis on Unity and Solidarity:
The speaker repeatedly stresses the need for unity (ittehad), solidarity (yakjehti), and making differences a source of “mercy” rather than conflict.
He criticizes those who use petty differences to declare other Muslims as infidels (takfir), emphasizing the need to overcome sectarian divisions and approach disagreements with moderation.
Quote: “It’s time to create unity, unity and solidarity, make differences a mercy, don’t make them a burden. They used to convert the disbelievers into Muslims on the basis of petty differences and personal natures. Some schools of thought make Muslims infidels because of hatred.”
He argues that all Islamic schools of thought contain some part of the truth, and that no single school holds a monopoly on it.
Quote: “It is not that all truth is limited to only one school of thought. The truth is contained in all Islamic schools of thought and religions. Everyone has some part of the right…”
The speaker advocates for a focus on the shared heritage of Hadith and Sunnah, specifically emphasizing the love of the Prophet Muhammad (Ishq wa Mohabbate Rasool) as the basis for unity.
Quote: “If we were gathered on the basis of Hadith and Sunnah, as if we gathered on the basis of Ishq wa Mohabbate Rasool.”
The Importance of Hadith and Sunnah:
The speech places great importance on Hadith and Sunnah, the practices of the Prophet, as central to Islamic life.
He frames Sahih al-Bukhari as a critical resource for the understanding of these traditions, emphasizing the immense value of this particular collection. He describes it as a “boat” that saves one from sinking even if a storm comes.
He highlights the recitation of Sahih al-Bukhari as having spiritual and healing benefits.
Quote: “Recitation of Sahih Bukhari averts diseases so great…”
The speaker also criticizes the idea that all authentic hadith are found solely in Sahih al-Bukhari, asserting that many other authentic hadith exist. This point is emphasized multiple times with supporting quotes from classical scholars. He stresses, “Confining Sahih Hadith to Sahih Bukhari is not a form of knowledge. Ignorance is the sign of ignorance.”
Reverence for Imam Bukhari:
The speaker expresses deep reverence for Imam Bukhari, highlighting his piety, scholarship, and love for the Prophet.
He shares anecdotes from Bukhari’s life, including a story of his mother’s prayers restoring his eyesight, his exceptional memory for hadith, and his intense devotion to the Quran.
Quote: “Imam Muhammad Bin Ismail Bukhari was still a child when his eyes started to see. Now Imam Bukhari is starting the Tazkira from here Your mother, Majda, was a very devout Abida. She cried a lot and prayed in her dreams Ibrahim (peace be upon him) was visited. Now this is the incident that I am describing.”
He details the numerous classical scholars who praised and admired Bukhari’s contributions to the field of hadith.
He mentions Bukhari’s refusal to teach in royal courts and his subsequent expulsion, drawing a parallel to contemporary issues of scholars succumbing to political pressures and selling their religious knowledge for personal gain.
The History and Significance of Khatam al-Bukhari:
The speaker explores the history of holding gatherings to commemorate the completion (khatam) of Sahih al-Bukhari, attributing the practice to Imam Ibn Hajar al-Asqalani, whom he describes as the ‘Ameerul Momineen’ of hadith. He also mentions his own participation in one such event when he was nine years old.
He describes the grand scale of such gatherings and the importance they hold in the tradition of hadith study.
Critique of Extremism and Takfir:
The speech strongly denounces extremism, particularly the practice of declaring fellow Muslims as kafir (unbeliever).
Quote: “Let’s tear down the walls Takfir change and tear down the walls of confusion. The ancestors used to bring non-Muslims into the circle of Islam with the blessing of their words and deeds. We were Muslims by working hard all our lives. Excluded from the circle of Islam.”
He connects takfir with small personal differences and differences in nature, claiming that such practices are against the spirit of Islam.
He criticizes the Khawarij, an early sect known for their extremism, for their flawed understanding of Tawheed (monotheism), warning his audience not to follow their interpretation. He draws on the writings of Allama Ibn Taymiyyah to support this view.
Love for the Prophet Muhammad:
The speaker emphasizes the importance of love for the Prophet Muhammad (Ishq-e-Rasool) as a core element of faith. He believes that love should be accompanied by a genuine effort to adhere to the Prophet’s teachings.
He highlights various passages from Sahih al-Bukhari that showcase the Prophet’s virtues and the deep love and devotion he inspired in his companions. He draws attention to Imam Bukhari’s placement of the book on the beginning of revelation as the first book of Sahih al-Bukhari to demonstrate the importance of Prophethood.
He cites various instances of the Prophet’s interaction with his companions, his miracles, and his devotion to Allah as expressions of his supreme character.
The Importance of Authenticity and Knowledge:
The speaker dedicates a section to emphasizing the high standards of authenticity and reliability within Sahih al-Bukhari. He describes the lineage of transmission and highlights various reliable transmitters of the hadith from Bukhari onwards.
He points out that a group of these transmitters were also great Sufis (mystics), underscoring the synthesis of knowledge and spirituality within Islamic tradition.
He presents numerous chains of transmission going back to classical scholars and to the Prophet Muhammad himself. These chains serve to emphasize the authenticity of the tradition and the speaker’s own legitimate authority as a teacher.
Practical Application of Islamic Teachings:
The speaker emphasizes the practical application of Islamic knowledge to daily life.
He calls for justice in the lives of individuals as a path to success in this world and the next, connecting it with the final hadith in Sahih al-Bukhari. He urges scholars and religious leaders to uphold the truth, avoid selling their religion, and remain devoted to the teachings of the Prophet.
He emphasizes how the collection begins with the mention of Prophethood and ends with the mention of the oneness of God (Tawheed).
Call for Spiritual Renewal:
The speaker calls for a renewal of traditional Sufism by purifying it from corrupt practices and emphasizing the importance of living by the Book and Sunnah. He laments that modern-day Sufism has been reduced to ceremonies, business, and materialism, contrasting it with the true asceticism of early Sufis and the original Sufism of the Salaf-e-Saliheen.
He also mentions the links between Sufism and those who passed along the teachings of Sahih al-Bukhari.
Key Quotes:
“It’s time to create unity, unity and solidarity, make differences a mercy, don’t make them a burden.”
“If we were gathered on the basis of Hadith and Sunnah, then it is as if we have gathered on the basis of love and love of the Messenger.”
“It is not that all truth is limited to only one school of thought. The truth is contained in all Islamic schools of thought and religions. Everyone has some part of the right…”
“Recitation of Sahih Bukhari averts diseases so great…”
“Confining Sahih Hadith to Sahih Bukhari is not a form of knowledge. Ignorance is the sign of ignorance.”
“The ancestors used to bring non-Muslims into the circle of Islam with the blessing of their words and deeds. We were Muslims by working hard all our lives. Excluded from the circle of Islam.”
“Imam Muhammad Bin Ismail Bukhari was still a child when his eyes started to see. Now Imam Bukhari is starting the Tazkira from here Your mother, Majda, was a very devout Abida. She cried a lot and prayed in her dreams Ibrahim (peace be upon him) was visited. Now this is the incident that I am describing.”
Concluding Remarks:
This speech presents a multi-faceted argument for a return to the core principles of Islam, rooted in a deep understanding of Hadith and Sunnah. It is a passionate plea for unity within the Muslim community, a call to abandon extremism and sectarianism, and a challenge to live a life of true devotion to Allah and His Messenger. The speaker’s detailed analysis of Sahih al-Bukhari, combined with his personal anecdotes and passionate delivery, makes it a compelling address that seeks to educate, inspire, and mobilize his audience towards a more enlightened and unified practice of Islam. The speech blends scholarship, spirituality, and an engagement with contemporary issues, making it relevant to the audience and its context.
Unity, Knowledge, and the Legacy of Imam Bukhari
FAQ: Unity, Knowledge, and the Legacy of Imam Bukhari
What is the primary call of this discourse, and what issues does it seek to address?
The primary call is for the unity and solidarity of the Muslim Ummah, emphasizing that differences should be a source of mercy, not division. The speaker argues against takfir (declaring fellow Muslims as infidels) based on minor differences in schools of thought or personal inclinations, which he sees as a corruption of true Islamic teachings. The discourse aims to dismantle the walls of confusion, extremism, and hatred that have led to divisions within the Muslim community.
How does the speaker view the various schools of thought within Islam, and what does this have to do with the overall theme?
The speaker believes that all Islamic schools of thought contain some part of the truth and that no single school possesses the entirety of it. He argues against the notion that one school is entirely correct while all others are wrong, promoting instead a view that differences should be viewed as a mercy. This perspective is fundamental to the theme of unity, as it challenges the exclusivity that fuels division. The speaker emphasizes that focusing on the common ground, especially the love of the Messenger and the Hadith and Sunnah, is more important than using differences to create conflict. He himself identifies as Hanafi in fiqh, Ash’ari in belief, and Qadri in Tariqa.
What is the significance of Sahih al-Bukhari in this discourse?
Sahih al-Bukhari is presented as a central unifying factor for the Ummah. It is not just a book of hadith, but a source of knowledge, guidance, and spiritual connection to the Prophet Muhammad (peace be upon him). The speaker stresses the immense value of this collection, emphasizing its accuracy and reliability, highlighting the discovery of a 750-year-old manuscript that confirms the authenticity of its text. Reciting and studying Sahih al-Bukhari is seen as a way to avert diseases, receive blessings and connect the faithful to the Prophet Muhammad (peace be upon him). The speaker advocates for gathering on the basis of Hadith and Sunnah, embodied in Sahih al-Bukhari.
What historical context is given regarding the tradition of concluding Sahih al-Bukhari with a large gathering?
The tradition of holding large gatherings at the completion (Khatm) of Sahih al-Bukhari is attributed to Imam Ibn Hajar al-Asqalani, who organized a grand feast after completing his commentary on Sahih al-Bukhari, Fath al-Bari, in 842 AH. This event is used as an example of how the end of a book is not the end of its effects but an occasion for learning and unity, demonstrating that such gatherings are a tradition of knowledge, blessing, and beauty across schools of thought.
What is the speaker’s view on the relationship between knowledge, piety, and actions, particularly as it pertains to Imam Bukhari?
The speaker emphasizes that true knowledge is inseparable from piety, righteousness, and good actions. Imam Bukhari’s life is presented as the ideal integration of these elements, with his immense knowledge being complemented by his devotion to worship and asceticism. The speaker highlights Imam Bukhari’s extensive Quran recitations during Ramadan, emphasizing that the blessings of knowledge are greater when they are accompanied by the same level of righteous acts of worship. He also suggests a key to understanding Sahih al-Bukhari is that it begins with the mention of prophethood, and ends with the mention of Tawheed, noting that faith will remain imperfect without Tawheed and proper actions.
How does the speaker discuss the importance of understanding Imam Bukhari’s personality and life to better understand his work?
The speaker explains that Imam Bukhari faced jealousy and hardship, and that he rejected the request of the ruler of Bukhara to teach Sahih al-Bukhari in his palace, saying “I do not humiliate knowledge, nor do I raise knowledge at the gates of kings”. This was not only a display of bravery but is also presented as a great piece of advice for students of religion. The speaker notes that this refusal led to his expulsion from Bukhara, which highlights Imam Bukhari’s immense love for knowledge as well as his devotion and loyalty to Allah over material pursuits. The speaker highlights his struggles and ultimate glory after death to show that Imam Bukhari had a high status in life and in death, exemplifying the importance of preserving knowledge. This is also meant to underscore that the speaker values knowledge and religion over worldly gain.
What does the speaker say about the unique structure and content of Sahih al-Bukhari in relation to its author’s love for the Prophet Muhammad (peace be upon him)?
The speaker emphasizes that Sahih al-Bukhari is unique in its structure. Instead of beginning with the Book of Faith or purification, it starts with Bada al-Wahi, the book about the beginning of revelation and Prophethood. For the speaker, this demonstrates that Imam Bukhari was a lover of the Messenger: the book begins with a mention of Prophethood and ends with a discussion of tawheed, showing that the core of his work and faith is rooted in the message of Prophethood. He also discusses how Imam Bukhari repeated certain hadith related to love and reverence for the Messenger multiple times in the text, and highlights that Imam Bukhari ended Sahih al-Bukhari with the hadith relating to the praise and glorification of Allah.
What is the significance of the speaker presenting his sanad (chain of transmission) at the conclusion of this lesson?
The speaker presents his sanad as a demonstration of the connection he has with the great scholars of the past all the way up to the Prophet Muhammad (peace be upon him). He emphasizes the small number of intermediaries between himself and key figures of the past such as Shah Waliullah Muhaddith Dehlavi, Imam Ibn Hajar al-Asqalani, and Imam Jalaluddin Suyuti, as well as his direct connection with several living scholars who had direct connections to the most famous scholars in history. This highlights the value and importance of a living tradition in which knowledge is passed down, and how that same knowledge connects him to both the historical Islamic tradition as well as the Prophet Muhammad (peace be upon him). He also uses his sanad as the means to grant permission to all the listeners of the lesson to participate in this tradition of knowledge, and to show that the tradition is still alive and vibrant to this day.
A 63-Year Journey Through Hadith
Timeline of Events
12 Years Old (Approximately 1963): The speaker begins his journey in search of knowledge. He studies in the Haramain Sharifain (Mecca and Medina), learning from scholars in Masjid Nabawi and the Haram of Makkah and living at a madrasa near the house of Hazrat Sayyiduna Abu Ayyub Ansari.
63 Years of Journey: The speaker states he is near this age, having traveled to all corners of Arab and Islamic countries, and has spent the last 63 years of his life learning.
Prior to 701 Hijri: Imam Sharafuddin al-Yunini has a version of Sahih al-Bukhari prepared by his scribe.
701 Hijri: Imam Sharafuddin al-Yunini passes away, his death finalizing the revision of his manuscript of Sahih al-Bukhari.
1313 Hijri (Approximately 1895 AD): The manuscript of Sahih al-Bukhari prepared by Imam Sharafuddin al-Yunini is destroyed in Egypt, in the beginning of Rabi al-Thani.
842 Hijri: Imam Ibn Hajar al-Asqalani organizes a grand feast (Walima) in Cairo after completing his commentary on Sahih al-Bukhari, Fath al-Bari. The event is attended by scholars, judges, and people of status, and is described as a very large and unprecedented gathering. Imam Sakhavi, then nine years old, also participated.
Before Imam Bukhari’s Birth: Imam Bukhari’s mother has a dream where she meets Ibrahim (peace be upon him), who tells her that her son’s sight will be restored due to her prayers.
Imam Bukhari’s Youth: Imam Bukhari starts building Milli Karamat with 1080 Hadiths from Shaykhs. In his youth, during a lesson, he corrects a narration from Abi Arwah and Abi Khattab that had troubled many scholars there. Imam Bukhari also finishes the Qur’an 40-41 times in the month of Ramadan through the Taraweeh prayer, recites a complete Qur’an each night in Tahajjud, and completes one each day before Iftar. He was also noted for keeping the blessed face of the Holy Prophet (pbuh) within his clothing.
Imam Bukhari’s Life: Imam Ahmad bin Hanbal states that he has not seen anyone like Muhammad bin Ismail al-Bukhari on earth. Imam Ishaq b. Rahawayh urges people to seek knowledge from Bukhari. Imam Muslim asked permission to kiss the feet of Imam Bukhari for his knowledge.
Compilation of Sahih al-Bukhari: Imam Bukhari compiles the Sahih al-Bukhari from 600,000 hadiths. He claims to remember 100,000 authentic and 200,000 non-authentic hadiths. He states there are only 2700 narrations in his book, excluding the repetition in hadith chains, which is also a great number.
Imam Bukhari’s Exile: The emir of Bukhara, Khalid bin Ahmad al-Dhuhli, demands that Imam Bukhari teach his sons privately in the palace. Imam Bukhari refuses, citing his respect for knowledge. As a result, the emir expels Imam Bukhari.
Imam Bukhari’s Death: Imam Bukhari dies in the village of Khartank near Samarkand. It is said that people could smell a fragrance from his grave even after several days.
Imam Bukhari’s Death Dream: A group of Companions are seen standing in a dream, with the Messenger of Allah (peace be upon him) waiting for Imam Bukhari. Imam Bukhari dies on the same night that this dream is seen, and the area is covered in a forest as a result of his grace after that day.
After Imam Bukhari’s Death: A famine occurs in Samarkand. On the advice of a righteous elder, people visit Imam Bukhari’s mausoleum and supplicate, after which rain falls. The same was often done in times of trouble, during the centuries of the Ummah.
2001 AD: The manuscript version of Sahih al-Bukhari of Imam Sharafuddin al-Yunini is printed again after being hidden for 106 years.
Contemporary Time: The speaker is giving a lesson on Sahih al-Bukhari. He is announcing the demolition of the walls of Takfir, calling for unity and solidarity within the Ummah, and emphasizing the importance of Hadith and Sunnah. He also emphasizes the commonality of Islamic schools of thought and says the truth lies in them all. The speaker says it is time for unity, and all must come together to learn and follow the example of the Holy Prophet (pbuh) in all ways.
Cast of Characters
Speaker: A scholar of Islamic studies, likely from Pakistan. He is 63 years old, has traveled extensively in pursuit of knowledge, has studied with many different scholars, and seems to be associated with various educational institutions. He identifies himself as Hanafi-ul-Madhab in jurisprudence, from Ahle Sunnat Wal Jamaat in belief, and a Qadri according to the Tariqat. He is a great believer in the unity of the Ummah and in teaching and following the life of the Holy Prophet (pbuh).
Imam Muhammad bin Ismail al-Bukhari (RA): The author of the Sahih al-Bukhari, a highly revered collection of hadith. He is depicted as a devoted worshiper, a lover of the Prophet, and a scholar of great memory and intellect.
Imam Sharafuddin al-Yunini: A scholar who had a version of Sahih al-Bukhari prepared by his scribe. The speaker calls it the most reliable version and states it is 750 years old. His death in 701 Hijri finalized the work. The manuscript is destroyed in 1895 and re-printed in 2001.
Ibrahim (Peace be upon him): A prophet who appears in a dream to Imam Bukhari’s mother and promises the restoration of her son’s sight.
Mother of Imam Bukhari: A devout woman who prays for the restoration of her son’s sight.
Imam Ibn Hajar al-Asqalani: A renowned scholar and commentator on Sahih al-Bukhari. He is the author of Fath al-Bari, a famous commentary on Sahih al-Bukhari. The speaker considers him among the founders of the ritual of gathering at the end of the lesson of hadith.
Imam Sakhavi: A student of Imam Ibn Hajar al-Asqalani who, as a child of nine, participated in the gathering on the completion of Fath al-Bari.
Khalid bin Ahmad al-Dhuhli: The emir of Bukhara who orders the expulsion of Imam Bukhari.
Sayyiduna Abu Ayyub Ansari (RA): A companion of the Prophet, in whose house the madrasa the speaker lived was located.
Hazrat Qutaiba: One of the jurisprudents of Imam Bukhari’s time, who said that, since reaching the age of awareness, he had not seen a scholar as great as Muhammad bin Ismail al-Bukhari.
Imam Ahmad bin Hanbal: A scholar who highly praised Imam Bukhari. He also said that seven lakh (700,000) hadiths are authentic.
Imam Ishaq b. Rahawayh: A scholar who urged people to seek knowledge from Imam Bukhari.
Imam Muslim: A scholar of hadith, author of Sahih Muslim, who sought permission from Imam Bukhari to kiss his feet.
Imam Abu Bakr Muhammad bin Ishaq bin Khuzaymah: A scholar of hadith who stated that he had not seen a greater hadith scholar than Imam Bukhari.
Imam Sufyan: An Imam whose narration of a hadith troubled the scholars in attendance until Imam Bukhari clarified who had narrated it and where it came from.
Imam al-Hafiz Ibn Hajar al-Asqalani: A scholar who studied the records of grand ceremonies held for the completion of a book such as Sahih al-Bukhari.
Dhul-Khuwaisira Al-Tamimi: A Kharijite whom Allama Ibn Taymiyyah cites for his lack of respect in the Prophet’s (pbuh) presence.
Allama Ibn Taymiyyah: A scholar whose writings on the Khawarij are quoted extensively in the text. He calls the Khawarij the founders of innovation in Islam, and says that they should not be taken for their interpretations of Tawheed.
Sultan Abdul Hamid Khan II: The last ruler of the Ottoman Empire, a patron of one version of Sahih al-Bukhari.
Allama Syed Siddiq Hasan Khan Al-Kanoji Al-Bhopali: A scholar of the Ahl al-Hadith school of thought and Muhaddith of the sub-continent of India. He is quoted for the tawassul of Sahih al-Bukhari.
Allama Muhammad Abd al-Rahman bin Abd al-Rahim al-Mubarak Puri: A well-known scholar, also of the Ahl al-Hadith and Salafi school of thought. He mentioned that tawassul through Sahih al-Bukhari is a legitimate Ruqyah.
Imam Alauddin al-Deem Mashki: A scholar from whom a poem about Imam al-Bukhari sitting in the Ruza Muhammadi is quoted.
Imam Ibrahim bin Muhaq Al-Nusr Ash’arabat al-Bukhari: One of the four major narrators of Imam Bukhari, and a well known Hanafi.
Imam Abu Muhammad Hamad bin Shakir al-Nasfi: One of the four major narrators of Imam Bukhari.
Imam Abu Talha Mansoor bin Muhammad al-Bazi?: One of the four major narrators of Imam Bukhari.
Imam Abu Abdullah Muhammad bin Yusuf al-Farbari: The fourth and most famous of the four major narrators of Imam Bukhari.
Imam Abu Ali Saeed bin Uthman Ibn Suk: One of the 12 Imams and grand-disciples of Imam Farbari.
Imam Abu bin Ibrahim Ahmad Al-Balghi Al-Mustamil: One of the 12 Imams and grand-disciples of Imam Farbari.
Imam Abu Muhammad Abdullah bin Ahmad al-Sarakhi: One of the 12 Imams and grand-disciples of Imam Farbari.
Imam Abu al-Hasan Abd al-Rahman bin Muzaffar al-Dawoodi al-Bushanji: One of the 12 Imams and grand-disciples of Imam Farbari.
Imam Abu al-Hassan Muhammad bin Makki al-Kush Maini: One of the 12 Imams and grand-disciples of Imam Farbari.
Abu Muhammad Jafar bin Muhammad Nasir al-Khaldi: A Sufi scholar and narrator of hadith. The speaker states that the narrators of the Sahih al-Bukhari on one hand were muhaddith and on the other hand the Sufi mystics of the time.
Abu Hussain Noori, Ruwaym bin Ahmad Baghdadi, Samnoon bin Hamza al-Muhab: Names of Sufi scholars associated with the narration of Sahih al-Bukhari by Jafar bin Muhammad Nasir al-Khaldi.
Imam Junaid al-Baghdadi: A Sufi Imam who is the teacher and the Sheikh of Abu Muhammad Jafar bin Muhammad Nasir.
al-Hafiz Abu Dharr Abd bin Ahmad bin Muhammad bin Abdullah al-Harawi: The narrator through whom trust in the text of Sahih al-Bukhari has been established in the world.
Imam Abu al-Waqt Abd al-Awwal bin Isa bin Shoaib al-Sajzi al-Sufi: A Sufi scholar and a narrator of the Sahih al-Bukhari.
Shaykh al-Islam Abu Ismail Al-Ansari: Imam of knowledge who was a follower of Arabic and a teacher of Imam Abu al-Waqt Abd al-Awwal bin Isa bin Shoaib al-Sajzi al-Sufi.
Syedna Sheikh Abdul Qadir Jilani (RA): A revered Sufi saint, who led the funeral prayer of Imam Abu al-Waqt Abd al-Awwal bin Isa bin Shoaib al-Sajzi al-Sufi.
Imam Dawoodi al-Bushanji: Also one of the akabir (senior scholars) and a muhaddith who took from Imam Bukhari, and also a Sufi on the path of sayr ila Allah (journeying toward Allah).
Imam Ibn Al-Muzaffar al-Dawoodi al-Bushanji: The author of the tradition who is mentioned by the speaker and who was both a scholar of hadith and Sufism.
Sheikh Abu Ali al-Daqqaq: A Sufi Sheikh in whose circle Imam Dawoodi al-Bushanji sat.
Abu al-Qasim al-Qushayri: A sheikh mentioned in Kashf al-Mahjub, which notes that he had the same teachers as Imam Dawoodi al-Bushanji.
Sheikh Nawab Siddique Hasan Khan Bhupali Al-Kanoji: A great scholar of the Ahl al-Hadith and Salafi school of thought in India.
Maulana Rashid Ahmad Ganguhi: A well known scholar who is quoted for his description of how much the companions loved the Holy Prophet (pbuh) in the time he lived.
Sayyiduna Siddiq Akbar: A companion and the first caliph of the Prophet, known for his deep love of the Prophet. His actions and response to the Holy Prophet (pbuh) on the prayer and leading of it are used to show the nature of a true lover.
Abu Saeed bin al-Mualla: A companion of the Prophet, whose story about answering the call of the Prophet is used to illustrate the importance of prioritizing the Prophet’s call over other acts of worship.
Hazrat Ubayy bin Ka’b: A companion of the Holy Prophet (pbuh) whose story is cited to show that Allah commanded believers to answer the call of the Holy Prophet (pbuh) even during prayer.
Imam Tayyibi: The scholar whose ruling that it is permissible to speak to the Holy Prophet (pbuh) during prayer and to call out to him is brought forth.
Imam Shafi’i: A scholar, whose view that the dry palm preferred the hereafter to the world is quoted.
Imam Bayhaqi: A scholar who related that Imam Shafi’i said the dry palm preferred the hereafter to the world, and who also quoted the conversation between the Holy Prophet (pbuh) and the palm.
Imam Abu Naim: A scholar whose book is cited for its description of the companions’ love for the Holy Prophet (pbuh): they would not eat or drink before he did.
Imam Abdul Qadir Jilani: A scholar and mystic who is listed in the line of the transmission of the Sahih al-Bukhari.
Allama Dhahabi: A scholar from whose work many quotes are given in the text.
Ustad Maulana Abdul Baqi Muhaddith: The speaker’s grandfather and Ustad, a great Muhaddith who lived in Madinah.
Maulana Ziauddin Ahmad Madani: One of the teachers of the speaker who lived in Madinah.
Maulana Badr Alam Mirthi: One of the teachers of the speaker who lived in Madinah.
Maulana Fazlur Rahman Ganj Moradabadi: A scholar from whom the speaker’s father, dear Sheikh Fariduddin Al-Qadri, heard.
Hazrat Shah Abdul Aziz Muhaddith Dehlavi: One of the scholars in the speaker’s lineage of transmission.
Hazrat Shah Wali Allah Muhaddith Dehlavi: A renowned scholar of Islam, considered the greatest scholar on earth and mentioned often for his lineage and knowledge.
Al-Shaykh Abd al-Baqi bin Ali al-Ansari al-Muhadith al-Kanwi al-Madani: A Sheikh from whom the speaker is given an ijaza or certification.
Al-Shaykh Al-Mumar Fawq al-Ma’ulna Maulana Fazlur Rahman bin Ahlullah bin Al Faiz al-Kunj Murad Abadi: Another Sheikh in the chain of transmission.
Al-Shaykh Alwi bin Abbas al-Maliki al-Maki: A muhaddith of the Haram of Mecca and one of the speaker’s teachers, known for his line of transmission and his knowledge.
Sheikh Muhammad bin Alwi Al-Maliki al-Maki: Sheikh Alwi bin Abbas al-Maliki’s son and another teacher of the speaker.
Sheikh Alami bin Abbas Alwal: A teacher in the speaker’s lineage.
Imam Umar bin Hamdan al-Marsi: A great Arab scholar and also mentioned as one of the teachers of the speaker.
Sheikh Abdul Haib bin Abdul Kabir al-Qatani: A Sheikh from whom, the speaker states, Imam Umar bin Hamdan used to narrate.
Shaykh Abdul Qadir al-Shalbi: A Sheikh who, the speaker states, used to narrate from Al-Trab al-Wasi.
Sheikh Muhammad bin Ali bin Zahir al-Watani: A Sheikh who, the speaker states, used to narrate from the student of Sheikh Ahmed bin Ismail al-Barzanji.
Al-Sheikh Abul Barakat Al Syed Ahmad Al Qadri Muhaddith Alwari: The founder of Hizb-ul-Ahnaf Lahore and one of the speaker’s teachers.
Anas Shah Ahmad Raza Khan Al-Barilvi: One of the speaker’s teachers, who the Sheikh Abul Barakat Ahmad Al-Qadir intimately listened to.
Shah Al Rasool Ahmad Almar Harwi: One of the scholars in the transmission.
Shaykh Ahmad bin Saleh al-Suwaidi Al-Baghdadi: A Sheikh that is mentioned in the line of transmission.
Ibn al-Hafiz al-Sayyid Muhammad al-Murtaza al-Zabidi: A Sheikh mentioned in the transmission.
Imam al-Muhammad Muhammad bin Sunnah Al-Falani: A Sheikh and Imam who lived over 100 years, which is why chains of transmission passing through him have fewer links.
Muhammad bin Abdullah al-Idrisi al-Walati: One of the scholars and imams mentioned.
al-Imam al-Tjab al-Din Muhammad bin Arqamash al-Shabki al-Hanafi: A well known muhaddith and one of the imams of the line of transmission.
Imam Al-Hafiz Ahmed Ibn Hajar al-Asqalani: Considered to be the Amir-ul-Momineen of hadith, and one of the imams in the lineage of transmission.
Imam Jalaluddin Suyuti: One of the last great muhadditheen and one of the imams mentioned in the lineage of transmission.
Imam Shamsuddin Muhammad b Abd al-Rahman bin Ali al-Qami al-Masri: One of the Imams in the line of transmission.
Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani: One of the Sheikhs mentioned in the line of transmission, of whom many great scholars were students.
Muhammad Badr al-Din bin Yusuf al-Hasani: A scholar from Syria and mentioned as a great muhaddith, and a student of Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani.
Abi al-Makaram Amin Sabid Al-Damashqi: A student of Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani.
Sheikh Muhammad Mustafa, known as Ma’ Al-Ainin al-Shankiti: A student of Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani.
Sheikh Umar Bin Hamdan Al-Maarithi: A student of Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani.
Sheikh Ahmad Ibn Ismail al-Barzanji: A Sheikh who was one of the teachers of Shaykh Umar Bin Hamdan Al-Maarithi.
Imam Abdullah bin Salim al-Muhadith al-Basri: A Sheikh whose teachings were used by Shaykh Muhammad al-Fatih bin Muhammad bin Muhammad bin Jafar al-Qatani.
Al-Sheikh Al-Sayyid Tahir Alauddin Al-Jilani al-Baghdadi: One of the speaker’s teachers, through whom he has a chain of transmission of only four wastas (intermediaries) to Imam al-Alusi.
Al-Naqeeb al-Sayyid Mahmud Asam al-Din al-Jilani al-Baghdadi: A Sheikh in the line of transmission.
Imam al-Muhaddith al-Naqeeb Al-Sayyid Abdul Rahman Zaheeruddin Al-Maaz Al-Jilani Al-Baghdadi: A Sheikh in the line of transmission.
Imam Nu’man bin Mahmood Al-Alusi: An Imam in the line of transmission.
al-Walada al-Imam Mahmud bin Abdullah al-Alusi Sahib Ruh al-Ma’ani: The author of Ruh al-Ma’ani and teacher of Imam Nu’man al-Alusi.
Imam Yusuf bin Ismail al-Nabhani: A Muhaddith of Sham from the early part of the last century, mentioned as one of the speaker’s teachers.
Al-Sheikh Hussain bin Ahmed Asiran al-Asadi al-Lubnani: One of the speaker’s teachers, mentioned as one of the last students of Imam Yusuf bin Ismail al-Nabhani.
Al-Sheikh Abdul Moeed Abdul Maoud Al-Jilani Al-Madani: A Sheikh of the speaker, who lived over 155 years.
Al-Shah Imdad Abdullah Al-Muhajir al-Makki: A direct student of the Sheikh Abdul Moeed Abdul Maoud Al-Jilani Al-Madani, who is a well known figure to those of the Deoband movement.
Sheikh Syed Ahmed Saeed Al-Qazimi Al-Marohi: A sheikh listed as a student of Sheikh Mustafa Raza Khan Al Barili.
Sheikh Mustafa Raza Khan Al Barili: The Sheikh who is stated to be in the line of transmission of the certificate.
Sheikh Muhammad Sardar Ahmad Al Qadri Al-Shati: One of the scholars in the line of transmission.
Sheikh Hamid Raza Khan Al-Qadri: One of the scholars in the line of transmission.
Sheikh Muhammad Abdul Rasheed Qutbuddin Al-Rizween: One of the scholars in the line of transmission.
Allama Al-Sheikh Muhammad Tahir: A scholar who wrote a treatise on the permissions (ijazat) and hadith chains that he held.
Al-Mufida and Arabic Al-Jaami al-Sahih Al-Imam al-Bukhari: A figure mentioned in the text who seems to be an allegory for those listening to the lesson.
Imam Abi Abdullah Muhammad bin Yusuf al-Farbari: Whose mention was also held in this meeting, for being the most famous narrator of Imam Bukhari in his time, from whom the tradition of the Sahih al-Bukhari became known around the world.
Abi Luqman Yahya bin Ammar bin Muqbal bin Shahan al-Khatlani: One of the eleven Sheikhs who transmit between the speaker and Imam Bukhari.
Islamic Unity: A Call for Solidarity
The sources emphasize the importance of unity and solidarity within the Muslim Ummah, highlighting the need to overcome sectarianism and conflict [1, 2]. Several key points about Islamic unity are discussed:
Common Ground: The sources stress focusing on the commonalities of Islam, such as the Quran, Hadith, and Sunnah, rather than differences in schools of thought [2]. It is noted that all Islamic schools of thought contain some part of the truth [3]. The common love and respect for the Prophet Muhammad is also emphasized as a basis for unity [1, 3].
Rejection of Extremism: There is a call to eliminate extremism and sectarianism, with the need for moderation and centrality emphasized [2, 3]. The sources argue that differences should be seen as a mercy, not a burden, and that Muslims should not declare each other infidels over minor disagreements [1, 4, 5].
Importance of Hadith and Sunnah: The sources argue that the protection and revival of religion requires focusing on the knowledge of Hadith and Sunnah. It is stated that unity is not possible without this protection [1, 2, 5]. Connecting with the righteous Salaf (early generations of Muslims) through Hadith and Sunnah is presented as a way to revive the culture of knowledge [2].
The Role of Scholars: Scholars are urged to promote unity and solidarity rather than division [1, 2, 4]. It is mentioned that scholars should not sell their consciences or character for worldly gain [6]. The sources emphasize that scholars and students should keep in mind that the knowledge of hadith is not limited to Sahih Bukhari alone, but is contained in other books as well [7, 8].
Historical Context: The sources refer to historical figures, such as Imam Ibn Hajar al-Asqalani, who organized gatherings to celebrate the completion of hadith books to emphasize the importance of knowledge and unity [9, 10]. The sources present a historical perspective on how the pursuit of knowledge and unity was a shared goal among various Islamic schools of thought. The narrators of Sahih Bukhari include both Muhaddith and Sufi mystics, showing the combination of these aspects [11-13].
Practical Steps: The sources call for tearing down the “walls of confusion” and “walls of Takfir,” referring to the practice of declaring other Muslims infidels [5, 6, 14]. The idea is to revive the values of religion by emphasizing the love of the Prophet and the knowledge of Hadith [4, 10, 15]. The sources also suggest focusing on commonalities and seeking the truth in all Islamic schools of thought [2-4].
Call to Action: The sources conclude with an announcement to tear down the walls of hatred and unite the Ummah based on the teachings of the Prophet, the Quran, and the Sunnah, while following the example of Imam Bukhari and the predecessors [1, 2, 14, 16].
Overall, the sources present a view of Islamic unity based on shared principles, mutual respect, and a commitment to knowledge and the teachings of the Prophet. The emphasis is on moving beyond sectarianism and focusing on the common goals of the Muslim Ummah.
Hadith, Sunnah, and Islamic Unity
The sources discuss the importance of Hadith knowledge, its preservation, and its role in Islamic unity and practice. Here are some key points:
Central Role of Hadith and Sunnah: The sources emphasize that Hadith and Sunnah are essential for the protection and revival of religion [1, 2]. It’s stated that unity among Muslims is not possible without focusing on Hadith and Sunnah [1, 2]. The sources suggest that a connection to the righteous Salaf (early generations of Muslims) through Hadith and Sunnah is vital to revive the culture of knowledge [2].
Sahih Bukhari: The text discusses Sahih Bukhari extensively, noting its significance as a primary source of Hadith [1-28]. It highlights the meticulousness of Imam Bukhari in compiling the book, who is said to have memorized 100,000 authentic and 200,000 non-authentic hadiths [29].
Not Limited to One Book: The sources make it clear that Hadith knowledge is not limited to Sahih Bukhari alone [5, 30]. It is noted that Imam Bukhari himself stated that he left out many authentic hadiths to keep the book from becoming too long [5, 30]. It is also mentioned that Imam Muslim did the same, and that other books also contain authentic hadith [30, 31]. To only accept hadith from Sahih Bukhari is considered a sign of ignorance [5].
Importance of the Chain of Narration: The sources discuss the importance of the chain of narrators (Isnad) in verifying the authenticity of Hadith [5, 25-28, 32-36]. The transmission of Hadith through various scholars is highlighted with emphasis on the reliability of the narrators [4, 6, 8, 9, 27, 28, 33, 35, 37].
Love for the Prophet: The text illustrates how Imam Bukhari’s compilation of Hadith was motivated by his deep love and respect for the Prophet Muhammad [1, 11-13, 17, 18, 22-24, 38, 39]. Imam Bukhari is said to have started his book with the mention of Prophethood and ended with the knowledge of Tawheed [40, 41]. The sources contain various hadiths about the Prophet’s life, actions, and character, emphasizing his importance in the Muslim faith [4, 12, 17, 20-22, 40].
Practical Application: The text discusses the concept of tawassul, which is using the recitation of Sahih Bukhari as a means to seek blessings, cure diseases, and ask for help from Allah [1, 6]. The sources emphasize that the true claim of love for the Prophet (Ishq Rasool) is shown through adapting to the actions and character of the Prophet Muhammad, including following his ways in eating, drinking, praying, and other aspects of life [22, 23].
Levels of Authenticity: The sources describe a seven-level system for categorizing the authenticity of hadith, with Sahih Bukhari and Sahih Muslim at the top, followed by hadith recorded by either one of them, and then by other scholars who met specific conditions [31]. This highlights the meticulousness and systematic approach to hadith verification within Islamic scholarship [32].
Sufism and Hadith: The sources note that both Muhaddith scholars of hadith and Sufi mystics were among the narrators of Sahih Bukhari [9, 10, 33, 42]. This connection between Hadith and Sufism indicates that these two traditions were not separate and that knowledge and spirituality were both important in the preservation and transmission of Hadith [43].
Rejection of Extremism: The sources state that Imam Bukhari rejected the ideas of the Khawarij, an early Islamic sect, at the end of Kitab al-Tawheed, as a warning against extremism [14, 15, 44]. The Khawarij are considered the founders of innovation in the history of Islam, and the source emphasizes that Muslims should not take their interpretation of Tawheed or their ideas about who is a believer and who is not [15, 44].
Importance of Scholars: Scholars are portrayed as having a vital role in preserving, transmitting, and explaining the Hadith [5, 11, 14, 23, 30, 40, 43-46]. They are urged to promote unity and solidarity, and to avoid selling their principles for personal gain [23, 46].
In summary, the sources highlight that Hadith knowledge is central to understanding and practicing Islam and that it promotes unity and love for the Prophet, while also warning against extremism and division. The sources emphasize that a true understanding of hadith comes from careful study, adherence to the chain of narration, and putting the teachings into practice.
Sahih Bukhari: A Comprehensive Overview
The sources discuss Sahih Bukhari as a central text in Islam, revered for its collection of hadith, and emphasize its importance in various aspects of Islamic faith and practice [1-4]. Here’s a detailed overview:
Compilation and Significance: Sahih Bukhari is described as a highly respected collection of hadith, and the sources highlight one manuscript as among its most reliable versions [3]. It was compiled by Imam Muhammad bin Ismail al-Bukhari, who is portrayed as a great scholar with a deep love for the Prophet Muhammad [4, 5]. Imam Bukhari is said to have memorized 100,000 authentic hadith and 200,000 non-authentic hadith [6, 7]. He selected the hadith for the book from a collection of six lakh (600,000) hadith [6].
Structure and Content:
The first book of Sahih Bukhari, Bada al-Wahi, begins with the mention of Prophethood and how the revelation was revealed to the Prophet Muhammad [4, 8]. This is unique, as other collections of hadith begin with other subjects, such as the Book of Faith [9].
The second book of Sahih Bukhari is Kitab al-Iman, or the Book of Faith. Imam Bukhari includes Bismillah, “In the name of Allah, the Most Gracious, the Most Merciful,” at the beginning of this second book, but does not do so in the first book, which is about the Prophet [10].
The last book of Sahih Bukhari is Kitab al-Tawheed, which deals with the concept of the oneness of God [11].
The book concludes with a condemnation of the Khawarij, an early Islamic sect considered heretical [12, 13].
The last hadith in the collection is on the glorification of God [11, 12].
The sources note that the hadith in Sahih Bukhari relate to the words and actions of the Prophet Muhammad, but that the collection also includes the regulations, follow-ups, and comments of the scholars. The collection contains approximately 7563 hadith, or 9082 if all the regulations, comments, and observations are counted. The number of hadith that record only the words of the Prophet is 2607, with an additional 1341 suspended (mu’allaq) reports [14].
The collection includes 97 books and 386 chapters [15]. The narrators in the collection number 1597, with 42 female narrators. There are 153 Companions of the Prophet who are mentioned, and 304 sheikhs who are direct teachers of Imam Bukhari [15].
Preservation and Transmission: The sources emphasize that the preservation of Hadith is a divine blessing. A specific manuscript of Sahih Bukhari written 750 years ago is highlighted to show the accuracy of transmission [3]. The manuscript, written by Imam Sharafuddin al-Yunini, was hidden and then rediscovered centuries later [3]. Manuscripts of Sahih Bukhari are found all over the world and differ very little from one another, which highlights how accurately the text has been preserved over time [3].
Commentaries and Gatherings:
The sources mention that scholars organize gatherings to celebrate the end of the reading of Sahih Bukhari, with the first such event held by Imam Ibn Hajar al-Asqalani [4, 16]. Such gatherings, together with the recitation of the entire book, are referred to as Khatam al-Bukhari [17].
Commentaries on Sahih Bukhari, such as Fath al-Bari, by Imam Ibn Hajar al-Asqalani, are also discussed, highlighting the tradition of scholarly analysis of this text [16].
The sources note the historical practice of celebrating the completion of Sahih al-Bukhari, which involves large gatherings of scholars and other prominent people [16, 18].
Such gatherings are described as a way to revive the culture of knowledge [18].
Role in Islamic Life:
The recitation of Sahih Bukhari is believed to avert diseases and to bring blessings and solutions to problems [19, 20]. The sources also mention the concept of tawassul, that is, using Sahih Bukhari as a means to seek blessings from Allah [1].
The text states that traveling with Sahih Bukhari can save a boat from sinking, highlighting the symbolic and spiritual value attached to the book [1, 19].
The sources state that Imam Bukhari included many hadith about the love of the Prophet and the importance of following the Prophet’s example [21, 22].
Authenticity and Scope:
The sources note that not all authentic hadith are contained within Sahih Bukhari [7, 23, 24]. Imam Bukhari himself is quoted as saying that he did not include all authentic hadith in his collection to keep it from being too large [7, 23]. It is emphasized that other books of hadith also contain authentic material, and that it is ignorance to limit authentic hadith to the contents of Sahih Bukhari alone [7, 23].
The sources discuss a seven-level system for categorizing the authenticity of hadith, which is based on which collections the hadith is found in, and whether the hadith meets the specific conditions of hadith scholars. According to this system, hadith in Sahih Bukhari and Sahih Muslim are the most authentic, but hadith in other collections may also be considered authentic [24].
Imam Bukhari’s Character and Methods:
Imam Bukhari is described as a very pious man, who was known for his devotion to prayer and reading the Quran [25]. It is mentioned that his mother prayed for him and that he was granted the blessing of having his eyesight restored.
The text highlights Imam Bukhari’s refusal to bring knowledge to the courts of kings, and his decision to be exiled instead, which is used as an example of how scholars should not compromise their principles for worldly gain [26, 27].
The text emphasizes Imam Bukhari’s love for the Prophet, as evidenced by the way he structured his book, by his selection of hadith, and by his personal devotion to the Prophet [8, 9].
Imam Bukhari is said to have visited the grave of the Prophet before writing his collection [19].
Narrators of Sahih Bukhari: The sources name various scholars who transmitted Sahih Bukhari and include both Muhaddith (scholars of hadith) and Sufi mystics [28]. The four most famous are Imam Ibrahim bin Muhaq Al-Nusr Ash’arabat al-Bukhari, Imam Abu Muhammad Hamad bin Shakir al-Nasfi, Imam Abu Talha Mansoor bin Muhammad al-Bazi, and Imam Abu Abdullah Muhammad bin Yusuf al-Farbari [15, 28]. Imam Farbari is considered the most reliable source for the text of Sahih Bukhari, and the lineage of the transmission of hadith to Imam Farbari is detailed [28].
Relevance to Contemporary Issues: The sources connect the importance of Sahih Bukhari to the contemporary issue of sectarianism, stating that Muslims should unite on the basis of shared beliefs and practices, as taught in the hadith. They also emphasize the need to avoid declaring other Muslims infidels [1, 2]. The text argues that the Khawarij, who are condemned at the end of Sahih Bukhari, are an example of the dangers of extremism and declaring other Muslims as infidels [12, 13].
In summary, Sahih Bukhari is portrayed as a highly important and reliable collection of hadith, compiled by a great scholar who was deeply devoted to the Prophet Muhammad. The text emphasizes the book’s importance in Islamic life, while cautioning against limiting Hadith knowledge to this book alone and against using it to justify division and extremism [8, 9].
Love for the Prophet Muhammad in Islam
The sources emphasize the profound significance of love for the Prophet Muhammad (Ishq Rasool and Mohabbate Rasool) in Islam, portraying it as a core element of faith and practice, and a central theme in Sahih Bukhari [1, 2]. Here’s a detailed exploration of this concept:
Centrality of Love for the Prophet: The sources assert that love for the Prophet is a fundamental aspect of Islamic belief and practice [2, 3]. It is presented not just as an emotion but as a defining principle that should shape the actions and character of a believer [4, 5]. This love is not just a matter of personal devotion but also a foundation for unity among Muslims [3].
Manifestations of Love: The sources describe several ways in which love for the Prophet is expressed:
Following the Prophet’s example: True love for the Prophet is demonstrated by adhering to his teachings and emulating his behavior and character [4, 5]. This includes adopting his ways in matters of eating, drinking, praying, and all other aspects of life [5].
Deep respect and longing: The sources highlight a deep respect and yearning for the Prophet. Imam Bukhari is described as having a deep love for the Prophet, and this love motivated him to collect hadith [6, 7].
Recitation of Sahih Bukhari: The text notes that the recitation of Sahih Bukhari, which contains the Prophet’s words and actions, is a form of expressing love and seeking blessings [8, 9].
Gatherings and celebrations: Organizing gatherings to celebrate the end of the reading of Sahih Bukhari is seen as an expression of love for the Prophet and a way to revive the culture of knowledge [10].
Love for the Prophet in Sahih Bukhari: The sources highlight Imam Bukhari’s emphasis on love for the Prophet in his collection of hadith:
Starting with Prophethood: Imam Bukhari begins his book with the mention of Prophethood, which is unique among other hadith collections [2]. This is presented as an indication of the Imam’s focus on the Prophet and his message [11].
Hadith Selection: Imam Bukhari is said to have selected hadiths that show the love of the Prophet in various contexts [2]. He emphasizes the love of the Prophet in twelve places, while also including 33 hadith about the Prophet and his family, particularly Sayyida Fatima, Sayyidna Ali, Sayyidna Imam Hasan, and Sayyidna Imam Hussain [12].
Recurring Themes: The sources highlight recurring themes in Sahih Bukhari that demonstrate love and respect for the Prophet. For instance, the incident of the Prophet showing his face during prayer is repeated six times, and the incident of Sayyiduna Siddique Akbar stepping aside during prayer to allow the Prophet to lead the prayer is repeated nine times in the collection [12-15]. These are seen as examples of the Imam’s emphasis on the Prophet’s importance and the love and devotion he inspired [16, 17].
Emphasis on the Prophet’s character: The first few hadith in Sahih Bukhari highlight the Prophet’s character, including his good behavior after the first revelation, and Sayyidah Khadijah’s description of his virtues [11].
The Trunk of the Date Palm: The sources describe how Imam Bukhari includes hadith relating how the dry trunk of a date palm cried when the Prophet began using a different place to deliver his sermons. This illustrates the love and connection between the Prophet and even inanimate objects [17, 18].
Love as a Condition of Faith: The sources stress that love for the Prophet is so important that it is essential for the perfection of faith. The hadith states that a person’s faith is not complete until they love the Prophet more than they love their own lives, parents, and children [12, 19].
Love and Deeds: The sources make it clear that love for the Prophet must be accompanied by righteous deeds [4, 5]. It should not be a mere claim or empty custom [4]. True love, according to the sources, manifests in adopting the Prophet’s character and following his teachings [5].
Avoiding Extremism: The text emphasizes that love for the Prophet must be balanced with a proper understanding of Islamic teachings. It warns against extremism and declaring other Muslims infidels. The sources condemn the Khawarij, who are presented as an example of those who are considered heretics because they did not adhere to the Sunnah and the example of the Prophet and the companions [20, 21].
Love as a Means to Divine Favor: The sources connect love for the Prophet with seeking divine favor and blessings from God. They highlight that those who love the Prophet and follow his example are more likely to receive Allah’s grace [4, 5].
Unity Through Love: The sources portray love for the Prophet as a unifying factor for the Muslim community, and they emphasize the need to focus on the common love for the Prophet and his teachings as a means of overcoming division and sectarianism [1, 22].
In summary, the sources depict love for the Prophet Muhammad as an indispensable aspect of the Islamic faith, which is to be demonstrated through devotion, emulation of his character, and righteous deeds. This love is presented as a unifying force for the Muslim community, a means to seek divine favor, and a central theme in the teachings of Sahih Bukhari.
Islamic Schools of Thought: Unity in Diversity
The sources discuss religious schools of thought within Islam, emphasizing both their diversity and the need for unity despite differences. Here’s a breakdown of the key points:
Diversity of Schools: The sources acknowledge the existence of various Islamic schools of thought (Masalik) [1], including:
Schools of jurisprudence: Hanafi, Maliki, Shafi’i, and Hanbali schools of jurisprudence are specifically mentioned [2]. The speaker identifies as following the Hanafi school of jurisprudence [3].
Schools of belief: The speaker identifies as being from the Ahle Sunnat Wal Jamaat in terms of belief [3].
Sufi orders: The Qadri order (Tariqat) is mentioned [3].
Other groups: The text also refers to the Deoband and Bareilly schools of thought [4].
The text also refers to the Ahl al-Hadith or Salafi school of thought [5, 6].
Commonalities: Despite the differences between these schools, the sources stress that there are more commonalities than differences [7].
The core of the religion, the Hadith and Sunnah of the Prophet, is a common ground for all [1, 7]. The sources emphasize the need to focus on these commonalities rather than the differences to foster unity and solidarity [7].
Love for the Prophet is presented as the greatest common asset of the Ummah, and a basis for unity [3].
Validity of Different Schools: The sources suggest that all Islamic schools of thought contain some part of the truth [3, 8]. It is asserted that no single school possesses the entirety of the truth, but rather, each has a portion of it [3].
The speaker argues against the idea that only one school of thought is correct, stating that “the right lies in all of them” [3].
The speaker uses the analogy of heaven to demonstrate this point, stating that it would be illogical for only one Imam to be admitted to heaven while the others are excluded [8]. The speaker asks where Imams such as Sufyan Thori, Abdullah Bin Mubarak, Sufyan Ibn Aina, Waqi bin Al-Jarrah, Imam Bukhari and Imam Muslim would be placed if that were the case [8].
The text states that the truth is contained in all Islamic schools of thought and religions, and everyone has some part of the right [3].
Sectarianism and Conflict: The sources strongly condemn sectarianism and sub-religious conflicts, emphasizing the need to eliminate or minimize such divisions [7]. The sources express concern that differences have led to Muslims declaring each other infidels [1, 9]. The sources highlight that in the past, people used to bring non-Muslims into Islam through their good behavior, whereas now Muslims exclude each other from Islam because of minor differences [9].
Unity and Solidarity: The text emphasizes the importance of unity and solidarity (Ittihad wa Ittifaq) within the Ummah [1, 7]. The speaker calls for the demolition of the “walls” of division and confusion [1, 9] and for focusing on what unites Muslims, such as the love of the Prophet and the knowledge of the Hadith and Sunnah [1, 3, 7]. The sources call for unity based on the knowledge of Hadith and Sunnah [7].
Moderation and Centrality: The sources stress the need to create moderation and centrality within the Ummah and to eliminate extremism [3]. It is noted that there is a need to revive the culture of knowledge and connection with Hadith and Sunnah, which have been disconnected [5, 7].
The Importance of Knowledge: The sources see a connection between religious knowledge and unity. By focusing on commonalities through the Hadith and Sunnah, Muslims can avoid the problems of sectarianism and conflict.
In summary, the sources advocate for a balanced approach that acknowledges the diversity of Islamic schools of thought while emphasizing the need for unity, mutual respect, and a focus on the common ground of the Hadith, Sunnah, and love for the Prophet. The sources call for setting aside differences and sectarianism for the sake of unity within the Ummah and the pursuit of common religious goals.
This collection of articles, dated February 28 – March 1, 2025, explores a variety of topics relevant to Saudi Arabia and the wider world. The articles examine preparations for Ramadan, including market trends, price stability, and the tradition of gift-giving. They also discuss digital transformation within the Kingdom and various aspects of Saudi culture, including the founding day. The paper further covers global issues such as NATO’s future and displacement in Palestine. The newspaper also features sports coverage, health advice, and local advertisements, giving a snapshot of life in Saudi Arabia.
Ramadan and Saudi Arabia in 2025: A Comprehensive Study Guide
I. Quiz
Answer each question in 2-3 sentences.
According to the texts, what are some ways that Ramadan in Saudi Arabia is marked by a sense of spirituality and community?
How has the التحول الوطني (National Transformation Program) impacted the lives of citizens in Saudi Arabia?
What indications are there in the source material that Ramadan is a significant commercial period in Saudi Arabia?
How are advancements in technology impacting the experience of Ramadan in Saudi Arabia, specifically at the المسجد الحرام and المسجد النبوي?
What are some economic concerns regarding Ramadan in Saudi Arabia as expressed in the documents?
In what ways does the article suggest Saudi Arabian culture is preserving its heritage and values?
How has the Kingdom used space technologies? What goals were announced?
Describe the dispute between former US President Trump and NATO. What solutions were proposed?
What is the essence of the Palestinian “right of return”? How does this apply to Israeli settlements in the West Bank?
How has Saudi Arabia supported its local artists?
II. Quiz – Answer Key
Ramadan is marked by a sense of spirituality and community in Saudi Arabia through increased prayer, attendance at mosques for Tarawih prayers, charitable giving, and a focus on family and social connections. This is further exemplified by the unique customs and traditions observed in the country during the holy month.
The التحول الوطني (National Transformation Program) has improved the lives of citizens in Saudi Arabia by providing easier digital government services that save time and effort, as well as enhancing the role of the private sector to improve citizens’ lives. The program also prioritizes the financial security of citizens and offers a variety of innovative services that reflect Saudi Arabia’s leadership and global development.
Ramadan is a significant commercial period in Saudi Arabia, with increased consumer spending, especially on food and gifts. The وزارة التجارة (Ministry of Commerce) announces seasonal discounts, and many businesses launch promotions targeting Ramadan shoppers, indicating the economic importance of the month.
Advancements in technology are significantly improving the Ramadan experience in Saudi Arabia by providing digital access to خطب (sermons) and translation services at the المسجد الحرام, as well as providing guidance and information to visitors in multiple languages. Additionally, digital platforms like “توكلنا” enhance emergency response and healthcare services during the holy month.
Economic concerns surrounding Ramadan include rising costs of Ramadan gifts, which can strain household budgets, especially for low-to-middle-income families. Some people resort to loans or savings depletion to keep up with social customs of lavish giving, leading to potential financial instability.
Despite the growing digital world, Saudi Arabia is working to preserve its heritage and values by integrating them with modernization, for instance in Najdi architecture, which blends authentic historical elements. National celebrations and historical museums also help preserve local history and culture.
Saudi Arabia has invested billions of dollars into technology to enhance its economy and has increased its presence in space. The stated goals for these projects are to provide advanced technological services, including Earth monitoring.
Trump repeatedly insisted that NATO member states increase their defense spending to 2% of their gross domestic product and threatened to reduce American support. One proposal to resolve the issues was for Europe to develop an independent defense.
The Palestinian “right of return” refers to the claim of Palestinians who were displaced during the establishment of Israel in 1948 to return to their former homes. Settlements in the West Bank, claimed as Palestinian territory, are considered illegal under international law.
Local Saudi artists are supported by building cultural coffee shops to share their work, helping architects use sustainable materials to design buildings, encouraging Islamic calligraphy, and highlighting authors in newspapers. This highlights a commitment to promote their work.
III. Essay Questions
Analyze the evolving role of technology in shaping the Ramadan experience in Saudi Arabia. How does technology enhance both the spiritual and practical aspects of the holy month, and what are the potential challenges or drawbacks of this increased reliance on technology?
Discuss the economic implications of Ramadan in Saudi Arabia. How does the holy month impact consumer spending, business activity, and the overall economy, and what measures are being taken to ensure economic stability and prevent price manipulation during this period?
Evaluate the challenges and opportunities facing حلف الناتو (NATO) in the context of changing global politics. How do factors such as shifting U.S. foreign policy, relations with Russia, and burden-sharing among member states impact the future of the alliance?
Examine the humanitarian situation in the West Bank and Gaza, focusing on the impact of Israeli policies and military operations on the Palestinian population. What are the key issues and challenges facing civilians in these areas, and what role do international organizations play in addressing their needs?
Explore the intersection of tradition and modernity in contemporary Saudi Arabian society, with specific reference to the topics and trends highlighted in the provided articles. How are cultural heritage, religious values, and technological advancements being integrated to shape the identity and development of the kingdom?
IV. Glossary of Key Terms
رمضان (Ramadan): The ninth month of the Islamic calendar, observed by Muslims worldwide as a month of fasting, prayer, reflection, and community.
المسجد الحرام (Al-Masjid al-Haram): The Great Mosque of Mecca, the holiest mosque in Islam, containing the Kaaba.
المسجد النبوي (Al-Masjid an-Nabawi): The Prophet’s Mosque in Medina, the second holiest mosque in Islam.
التحول الوطني (Al-Tahawul Al-Watani): The National Transformation Program, a Saudi Arabian initiative aimed at diversifying the economy, improving public services, and promoting sustainable development.
وزارة التجارة (Wizarat Al-Tijara): Ministry of Commerce, the government agency responsible for regulating and promoting trade and commerce in Saudi Arabia.
خطب (Khutab): Sermons (singular: khutbah).
تكافل وتراحم (Takaful wa tarahum): Mutual support and compassion; Islamic virtues emphasized during Ramadan.
حلف الناتو (Helf Al-Nato): The North Atlantic Treaty Organization (NATO), a military alliance established in 1949.
القدس (Al-Quds): Jerusalem, a city of religious importance to Judaism, Christianity, and Islam.
الضفة الغربية (Al-Diffah Al-Gharbiyyah): The West Bank, a landlocked territory near the Mediterranean coast of Western Asia.
غزة (Ghazzah): Gaza, a self-governing Palestinian territory on the eastern coast of the Mediterranean Sea, that borders Egypt on the southwest and Israel on the east and north.
القدس الشرقية (Al-Quds al-Sharqia): East Jerusalem, a part of Jerusalem occupied by Israel in 1967 and later annexed.
Saudi Arabia: Vision 2030, Ramadan, and Regional Tensions in 2025
Briefing Document: Analysis of “20702.pdf” Excerpts
Date: October 26, 2023
Source: Excerpts from “20702.pdf” (various articles from the Al Riyadh newspaper, dated Friday-Saturday, February 28 – March 1, 2025, Issue: 20702)
Overview:
These excerpts from the Al Riyadh newspaper, dated February 28 – March 1, 2025, provide insights into various aspects of Saudi Arabian life, focusing on the approaching month of Ramadan, economic trends, cultural preservation, technological advancements, and regional political concerns. The overarching themes include:
The significance of Ramadan in Saudi society and the preparations surrounding it.
Economic developments, digitalization, and the impact of Vision 2030.
Cultural preservation and national identity.
Regional political tensions, particularly concerning Palestinian territories.
The evolving role of Saudi Arabia in the global landscape.
The future of NATO.
Main Themes and Key Ideas:
1. Ramadan in Saudi Arabia:
Spiritual Significance: Ramadan is portrayed as a month of great spiritual importance, marked by increased religious observance, charitable activities, and a focus on community. “رمضان.. أبواب الجنة فتحت” (Ramadan… the gates of paradise are opened). The excerpts highlight the filling of mosques, increased prayer, and the spirit of “تكافل وتراحم” (solidarity and compassion).
Social Traditions: The article emphasizes the unique customs and traditions associated with Ramadan, noting how even non-Muslims in some countries participate in the spirit of the month by respecting the atmosphere. Saudi society places a high value on the spiritual aspects of Ramadan, reinforcing Islamic values and social bonds.
Economic Impact: The excerpts discuss the significant increase in consumer spending during Ramadan, leading to seasonal sales and promotions. The Ministry of Commerce announces discounts on various products, including food, clothing, electronics, and home appliances. There’s a focus on ensuring fair pricing and preventing price manipulation. (Early on, the Ministry of Commerce announced seasonal discounts for the holy month of Ramadan and Eid).
Digitalization of Religious Services: The excerpts highlight the high level of preparedness at the Masjid al-Haram (Grand Mosque) in Mecca, with services enhanced by modern technology. This includes digital broadcasting of sermons in multiple languages, audio services, and informational services accessible through digital platforms. “المسجد الحرام في رمضان.. جاهزية عالية بخدمات تقنية” (The Grand Mosque in Ramadan… high readiness with technological services).
Ramadan Gift Giving: The text addresses the growing trend of Ramadan gift-giving and its economic implications. The practice is described as potentially becoming an “التزام مرهق” (onerous obligation), especially for families with limited income. The article suggests alternatives, such as exchanging homemade dishes or giving symbolic gifts.
Consumer Trends: There’s a focus on consumer behavior during Ramadan, including the tendency to buy excessive amounts of food, the increase in demand for meat, and the availability of promotional offers. The data indicates a move towards electronic payments and a positive reception of this trend by the public.
2. Economic Development and Vision 2030:
National Transformation Program: The excerpts emphasize the positive changes and growth driven by the National Transformation Program, including the development of digital government services and the strengthening of the private sector.
Digital Economy: The significant number of point-of-sale transactions (210,088,000 transactions worth 13,045,971,000 riyals during Feb 16-22, 2025) reflects Saudi Arabia’s leadership in the digital realm and the increasing adoption of electronic payments.
Technology Investment: Saudi Arabia is investing heavily in technology, with spending exceeding $34.5 billion in various fields.
Economic Diversification: The Saudi government is actively working on economic diversification, reducing reliance on oil, and enabling the private sector. The Kingdom is striving to become a leading nation through the National Transformation Program, focusing on strategic partnerships and economic development.
3. Cultural Preservation and National Identity:
Architectural Heritage: The excerpts touch on the importance of preserving architectural heritage, highlighting the “Najdi style” as a symbol of Saudi identity.
Day of Founding: The excerpts discuss the official website for the Saudi “Day of Founding”, which presents historical events in a rich literary style. It describes a long history that is said to predate Islam.
Importance of Saudi Figures in Literature: The excerpts recall a number of important Saudi writers who have had a major influence on the history of literature and culture in Saudi Arabia.
The Concept of Shame (Aib): The article discusses the concept of Aib and its influence on the norms of society, where it may serve as a means of regulating social behaviour. It notes, however, that the concept can be controversial in some cases.
Al-Hussain, the man of good qualities: The article recounts Ibrahim Al-Hussain’s efforts to help people, stand with them, and attend to their needs.
4. Regional Political Tensions:
Palestinian Territories: The excerpts express concern about the situation in the West Bank, with reports of demolitions, displacement of Palestinians, and Israeli military operations. The articles note the potential for permanent displacement and the impact on healthcare. “شبح التهجير يطارد الفلسطينيين” (The specter of displacement haunts the Palestinians). The article mentions the destruction of homes in Jenin and the fears of long-term Israeli presence.
Concerns about Expansion: The reports suggest that some Israeli officials are calling for the annexation of the West Bank, which raises concerns about the future of a Palestinian state.
5. International Affairs and NATO:
Concerns over US Commitment to NATO: Donald Trump’s questioning of NATO’s relevance and his call for increased defense spending by member states raise concerns about the future of the alliance. “«الناتو» السيناريو المقبل.. البقاء أو الانهيار” (NATO: the next scenario… survival or collapse).
Quotes of Significance:
“رمضان.. أبواب الجنة فتحت” (Ramadan… the gates of paradise are opened) – Illustrates the profound spiritual significance of Ramadan.
“المسجد الحرام في رمضان.. جاهزية عالية بخدمات تقنية” (The Grand Mosque in Ramadan… high readiness with technological services) – Highlights the integration of technology in religious services.
(Early on, the Ministry of Commerce announced seasonal discounts for the holy month of Ramadan and Eid). – Shows the importance of commerce in celebrating the month of Ramadan.
(The specter of displacement haunts the Palestinians). – Highlights the severity of political tension in the Palestinian territories.
Key Trends:
Digital Transformation: Saudi Arabia’s rapid adoption of digital technologies across various sectors is a prominent trend.
Economic Restructuring: Vision 2030 is driving significant changes in the Saudi economy, with a focus on diversification and private sector growth.
Cultural Preservation alongside Modernization: There’s an effort to balance cultural preservation with modernization and development.
Continued Regional Instability: The political situation in the Palestinian territories remains a significant concern.
Potential Implications:
The focus on Ramadan provides an opportunity for businesses to cater to increased consumer demand while upholding Islamic values.
The growth of the digital economy can attract foreign investment and create new job opportunities.
Preserving cultural heritage can strengthen national identity and promote tourism.
Regional political tensions may require Saudi Arabia to play a mediating role.
The future of the US commitment to NATO could have direct implications for the Kingdom of Saudi Arabia.
Further Research:
Investigate the specific initiatives under the National Transformation Program and their impact on various sectors.
Analyze the economic impact of Ramadan on different industries in Saudi Arabia.
Examine Saudi Arabia’s role in regional peace efforts and its relationship with international organizations.
Investigate the role of Saudi Arabia in maintaining NATO.
Ramadan 2025: Saudi Arabia, West Bank, and Global Implications
Ramadan and Saudi Arabia in 2025: An FAQ
1. What is the significance of Ramadan for Muslims, particularly in Saudi Arabia?
Ramadan holds immense spiritual significance for Muslims worldwide, including those in Saudi Arabia. It is considered the most blessed month, characterized by heightened religious observance, community, and charitable activities. In Saudi Arabia, this is reflected in increased attendance at mosques for Tarawih prayers, greater emphasis on prayer and remembrance of God, and a strong sense of unity and social cohesion. The Saudi community places great importance on upholding the Islamic values and traditions associated with the holy month.
2. What are some of the changes and developments occurring in Saudi Arabia that are highlighted in the provided articles?
The articles spotlight the significant transformations in Saudi Arabia, driven by the National Transformation Program. These changes include the expansion of digital government services, improvements in quality of life for citizens, and the Kingdom’s growing digital prowess. This is reflected in the substantial increase in point-of-sale transactions, indicating the public’s embrace of electronic payments. These changes also highlight the Kingdom’s focus on economic diversification and the adoption of cutting-edge technologies.
3. How is Saudi Arabia preparing for and facilitating religious observances during Ramadan in Mecca and Medina?
Saudi Arabia invests heavily in ensuring a smooth and spiritually fulfilling experience for pilgrims and residents during Ramadan. This includes comprehensive readiness plans for the Grand Mosque in Mecca and the Prophet’s Mosque in Medina, with technical services facilitating communication in multiple languages. Digital content, such as translated sermons and lectures, is also available, and an organized volunteer force assists visitors. Health and safety measures are also prioritized, with strategic positioning of medical facilities and the use of technology to manage emergency response.
4. How have shopping habits and spending patterns changed during Ramadan in Saudi Arabia, and what are the economic implications?
There is a marked increase in consumer demand for food and other essential goods during Ramadan, which can cause inflationary pressure on prices. Families and restaurants tend to purchase larger quantities of food, and the tradition of giving Ramadan gifts has evolved, with costs rising in recent years. This has led some families to re-evaluate their budgets and consider less extravagant gift-giving options. Many shops also offer Ramadan deals to compete for consumer spending.
5. What are some of the trends observed in Ramadan gift-giving, and what advice is given to families regarding this custom?
The practice of giving gifts during Ramadan has become increasingly elaborate and expensive, placing a financial burden on some families. Experts recommend returning to simpler traditions, like exchanging homemade dishes or giving small, personalized gifts. They also advise families to plan their spending in advance, seek out discounts, and consider group purchases to reduce individual costs. The articles emphasize that excessive spending on gifts can detract from the spiritual purpose of Ramadan.
6. What issues are Palestinians in the West Bank facing during Ramadan, according to these sources?
The articles highlight the difficult circumstances faced by Palestinians in the West Bank. Israeli military operations in refugee camps have resulted in displacement and destruction of homes, raising concerns about forced displacement. There are also reports of increased violence and attacks on healthcare facilities, as well as concerns regarding restrictions on aid and access to services for Palestinian refugees.
7. What was the role of Ibrahim al-Hussayyin in relation to Sheikh Ibn Baz, and what does this reveal about scholarly life?
Ibrahim al-Hussayyin served as a close companion and assistant to Sheikh Ibn Baz for many years. He managed his office, read correspondence, and accompanied him on travels. This demonstrates the important role played by assistants and students in supporting prominent scholars.
8. What potential challenges does the article “The Future Scenario of NATO… Survival or Collapse” identify for the NATO alliance, especially considering the possibility of Donald Trump’s return to power?
The article highlights the potential for instability within NATO due to pressures from the United States under a possible Trump administration. Key challenges include demands for increased defense spending from member states, potential reduction of U.S. commitments to European security, and a more conciliatory approach towards Russia. These factors could weaken alliance cohesion, undermine deterrence strategies, and force European nations to develop independent defense policies.
Saudi Arabia Prepares for Ramadan
Saudi Arabia gears up for Ramadan with a focus on spirituality, community, and charitable giving. Preparations include:
Charitable Initiatives Ramadan is considered a prime time for philanthropy, with many individuals and organizations increasing their efforts to aid those in need. Digital platforms are used to streamline donations and distribute zakat and sadaqat to eligible recipients.
Community Events Ramadan is a time for strengthening family and community bonds.
Families gather daily for Iftar, sharing meals with relatives and neighbors.
Many mosques and community centers organize mass Iftar gatherings for the needy and travelers.
The Saudi Ministry of Culture usually launches events to celebrate cultural heritage and enhance the atmosphere of Ramadan.
Increased Religious Observance During Ramadan, there is a greater emphasis on religious practices.
Mosques are filled with worshippers, and Tarawih prayers are performed.
Many Muslims aim to complete a reading of the Quran during the month and engage in Itikaf in the last ten days.
Economic Activity The Ministry of Commerce announces seasonal discounts for Ramadan, with many establishments and online stores participating.
The markets see an increase in demand for consumer goods, especially food.
The commercial sector sees increased activity, and shopping centers record a rise in sales of goods and products related to Ramadan.
Maintaining Price Stability The Saudi government works to ensure the availability of essential goods at reasonable prices and intensifies inspections to protect consumers.
Spiritual Preparations Al-Madinah Al-Munawwarah is preparing for Ramadan by redoubling cleaning efforts and providing medical services and transportation.
Saudi Arabia: Digital Transformation in Finance and Charity
The sources highlight aspects of digital transformation in Saudi Arabia, particularly in the context of facilitating financial transactions and charitable giving.
Here’s a summary:
Digital Financial Services: The Saudi Central Bank (SAMA) is actively promoting digital transformation by expanding options for electronic payments and encouraging their adoption. This includes developing the infrastructure for national payment systems, improving digital payment solutions, and increasing the efficiency and decreasing the costs of financial transactions.
Digital Charitable Donations: During Ramadan, a period marked by increased charitable giving, digital platforms are used to streamline the donation process. This facilitates assistance to those in need and reinforces the spirit of solidarity among individuals.
Saudi Arabia: Ramadan Food Price Analysis
The sources discuss food prices in Saudi Arabia, particularly in the context of Ramadan.
Price Fluctuations: During the month of Ramadan, data indicates some fluctuations in food prices.
The price of Chinese garlic increased by 1.7% to 12.26 riyals per kilo, while the price of local jute mallow recorded a decrease of 0.4% to 2.55 riyals.
Parsley prices rose by 1.9% to 1.07 riyals, while spinach prices increased by 4.1% to 1.26 riyals.
Increased Demand: The holy month of Ramadan typically sees increased demand for food.
There is a high demand from charities and individuals for food items needed for Iftar meals.
The increased demand plays a significant role in boosting sales of food and Ramadan products.
Price Ranges for Meat: Meat prices vary, with larger-sized Swakni ranging around 2000 riyals, medium-sized Swakni selling for about 1700 riyals, and Najdi sheep priced at 1300 riyals. The price of al-Harri is about 1500 riyals.
Price of Dates: The availability of Saudi dates, with over 300 varieties, ensures options suitable for different consumer segments in the local markets.
Price Concerns: There are concerns about potential hoarding or monopolization (احتكار) by some traders and about prices rising each year, especially in the days leading up to Ramadan and during its first week.
Price Stability Efforts: The Saudi government works to ensure the availability of essential goods at reasonable prices and intensifies inspections to protect consumers.
NATO’s Uncertain Future: Challenges, Scenarios, and the U.S. Role
The future of the North Atlantic Treaty Organization (NATO) is uncertain, with scenarios ranging from continued unity to potential collapse, as explored in the sources.
Key points regarding NATO’s future:
Challenges to NATO:
Political tensions and doubts: Criticism, threats, and doubts regarding the U.S.’s commitment to NATO reveal significant challenges for the alliance’s future.
Financial Burdens: The U.S. has historically been a major contributor to NATO, providing substantial financial support. There have been calls for other member states to increase their defense spending.
Internal Disputes: Disputes among member states, such as those between Turkey and other European nations, pose challenges to NATO’s unity and effectiveness.
Scenarios for NATO’s Future:
Potential Collapse: The possibility of the U.S. withdrawing from NATO is considered a radical step. While there are legal constraints, a U.S. president might exploit legal loopholes to initiate withdrawal.
NATO’s Purpose:
Original Goal: NATO was founded in 1949 to counter Soviet expansion in Europe, with the U.S. providing essential support due to Europe’s limited resources after World War II.
Revival: Russia’s invasion of Ukraine in 2022 has seemingly revitalized the alliance by uniting members against a common adversary.
U.S. Role:
Dominant Force: The U.S. is a central player in NATO, contributing approximately 70% of the alliance’s military strength and defense spending.
Impact of US policy: Some analysts believe that certain policies may lead to a reduction in the U.S.’s role in funding and supporting NATO activities, potentially causing a rift between the U.S. and European allies.
Palestinian Displacement: Israeli Operations in the West Bank
The sources discuss Palestinian displacement in the context of Israeli military operations in the West Bank.
Key points:
Displacement:
Jenin: Israeli forces have reportedly demolished large areas of the Jenin refugee camp, displacing at least 40,000 Palestinians in Jenin and nearby Tulkarm.
Strategic Intent: An expert in military intelligence suggests the actions are part of a broader, continuous displacement operation targeting residents, especially in refugee camps near Tulkarm and Jenin.
Israeli Perspective:
Counterterrorism: Israel has justified the operations as necessary to target Iranian-backed groups, such as Hamas and Islamic Jihad, which it claims have infiltrated refugee camps.
Palestinian Perspective:
Accusations of Occupation: Islamic Jihad views the extensive evacuations and the presence of Israeli tanks as confirmation of an Israeli plan to seize the West Bank by force.
Condemnation: Islamic Jihad has condemned the actions as an aggressive step aimed at uprooting the Palestinian people from their land.
The provided text consists of a series of coding tutorials and projects focused on Python. The initial tutorials cover fundamental Python concepts, including Jupyter Notebooks, variables, data types, operators, and conditional statements. Later tutorials explore looping constructs, functions, data type conversions, and practical projects like building a BMI calculator. The final segment introduces web scraping using Beautiful Soup and Requests, culminating in a project to extract and structure data from a Wikipedia table into a Pandas DataFrame and CSV.
Python Fundamentals: A Study Guide
Quiz
What is Anaconda, and why is it useful for Python development?
Anaconda is an open-source distribution of Python and R, containing tools like Jupyter Notebooks. It simplifies package management and environment setup for Python projects.
Explain the purpose of a Jupyter Notebook cell, and how to execute code within it.
A Jupyter Notebook cell is a block where code or markdown text can be written and executed. Code is executed by pressing Shift+Enter, which runs the cell and moves to the next one.
Describe the difference between code cells and markdown cells in a Jupyter Notebook.
Code cells contain Python code that can be executed, while markdown cells contain formatted text for notes and explanations. Markdown cells use a simple markup language for formatting.
What is a variable in Python, and how do you assign a value to it?
A variable is a named storage location that holds a value. Values are assigned using the assignment operator (=), such as x = 10.
Explain why variable names are case-sensitive, and provide an example.
Python treats uppercase and lowercase letters differently in variable names. For example, myVar and myvar are distinct variables.
List three best practices for naming variables in Python.
Use descriptive names, follow snake_case (words separated by underscores), and avoid starting names with numbers.
What are the three main numeric data types in Python?
The three main numeric data types are integers (whole numbers), floats (decimal numbers), and complex numbers (numbers with a real and imaginary part).
Explain the difference between a list and a tuple in Python.
A list is mutable (changeable), while a tuple is immutable (cannot be changed after creation). Lists use square brackets, while tuples use parentheses.
Describe the purpose of comparison operators in Python, and give three examples.
Comparison operators compare two values and return a Boolean result (True or False). Examples: == (equal to), != (not equal to), > (greater than).
Explain the purpose of the if, elif, and else statements in Python.
if executes a block of code if a condition is true. elif checks additional conditions if the previous if or elif conditions are false. else executes a block of code if none of the preceding conditions are true.
Essay Questions
Discuss the differences between for loops and while loops in Python. Provide examples of situations where each type of loop would be most appropriate.
Explain the concept of web scraping using Python. What libraries are commonly used for web scraping, and what are some ethical considerations involved in web scraping?
Describe the process of defining and calling functions in Python. Explain the purpose of function arguments and return values, and provide examples of how to use them effectively.
Explain the different data types in Python and provide examples of using them in variable assignments, and data manipulation.
Explain the difference between an arbitrary argument, an arbitrary keyword argument, and an ordinary argument, and what are the use cases for each one.
Glossary of Key Terms
Anaconda: An open-source distribution of Python and R used for data science and machine learning, simplifying package management.
Jupyter Notebook: An interactive web-based environment for creating and sharing documents containing live code, equations, visualizations, and explanatory text.
Cell (Jupyter): A block in a Jupyter Notebook where code or markdown can be entered and executed.
Markdown: A lightweight markup language used for formatting text in markdown cells.
Variable: A named storage location that holds a value in a program.
Data Type: The classification of a value, determining the operations that can be performed on it (e.g., integer, string, list).
Integer: A whole number (positive, negative, or zero).
Float: A number with a decimal point.
String: A sequence of characters.
List: An ordered, mutable collection of items.
Tuple: An ordered, immutable collection of items.
Set: An unordered collection of unique items.
Dictionary: A collection of key-value pairs.
Comparison Operator: Symbols used to compare two values (e.g., ==, !=, >, <).
Logical Operator: Symbols used to combine or modify Boolean expressions (e.g., and, or, not).
if Statement: A conditional statement that executes a block of code if a condition is true.
elif Statement: A conditional statement that checks an additional condition if the preceding if condition is false.
else Statement: A conditional statement that executes a block of code if none of the preceding if or elif conditions are true.
for Loop: A control flow statement that iterates over a sequence (e.g., list, tuple, string).
while Loop: A control flow statement that repeatedly executes a block of code as long as a condition is true.
Function: A reusable block of code that performs a specific task.
Argument: A value passed to a function when it is called.
Return Value: The value that a function sends back to the caller after it has finished executing.
Web Scraping: Extracting data from websites using automated software.
Beautiful Soup: A Python library for parsing HTML and XML documents, making it easier to extract data from web pages.
Request: The act of asking a URL for its information.
HTTP request: A request using the standard “Hypertext Transfer Protocol,” which is the foundation for data communication on the World Wide Web.
CSV file: A Comma Separated Value file, which allows data to be saved in a table-structured format.
Pandas data frame: A two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes.
List comprehension: An elegant syntax for creating lists based on existing iterables. It provides a concise way to generate lists using a single line of code, making it efficient and readable.
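The list comprehension entry can be made concrete with a small sketch. The variable names and sample numbers below are illustrative assumptions, not taken from the tutorial.

```python
numbers = [1, 2, 3, 4, 5, 6]  # hypothetical sample data

# Build a new list from an existing iterable in a single expression.
squares = [n * n for n in numbers]

# An optional condition filters which items are included.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same result written as an ordinary for loop, for comparison.
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n * n)

print(squares)
print(even_squares)
```

The comprehension and the loop produce the same list; the comprehension simply packs the loop and the condition into one line.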
Python Programming and Web Scraping Tutorial
Briefing Document: Python Basics and Web Scraping Tutorial
Overall Theme:
This document contains the script for a video tutorial aimed at teaching beginners the fundamentals of Python programming, including setting up the environment, covering core concepts, and introducing web scraping with Beautiful Soup. The tutorial is structured as a hands-on lesson, walking the viewer through installing Anaconda, using Jupyter Notebooks, understanding variables, data types, operators, control flow (if/else, for loops, while loops), functions, data type conversions, and finally applying these skills to a web scraping project.
Key Ideas and Facts:
Setting up the Python Environment:
The tutorial recommends installing Anaconda, describing it as “an open source distribution of python and R products. So within Anaconda is our jupyter notebooks as well as a lot of other things but we’re going to be using it for our Jupyter notebooks.”
It walks through the Anaconda installation process, emphasizing the importance of selecting the correct installer for the operating system (Windows, Mac, or Linux) and system architecture (32-bit or 64-bit).
Introduction to Jupyter Notebooks:
Jupyter Notebooks are the primary environment for writing and executing Python code in the tutorial. “Right here is where we’re going to be spending 99% of our time in future videos this is where we’re going to write all of our code.”
Notebooks are composed of “cells” where code or markdown can be written.
Markdown is introduced as a way to add comments and organization to the notebook: “markdown is its own kind of you could say language but um it’s just a different way of writing especially within a notebook.”
Basic notebook operations are explained, including saving, renaming, inserting/deleting cells, copying/pasting cells, moving cells, running code, interrupting the kernel, and restarting the kernel.
Python Variables:
A variable is defined as “basically just a container for storing data values.”
Variables are dynamically typed in Python, meaning the data type is automatically assigned based on the value assigned to the variable.
Variables can be overwritten with new values.
Multiple variables can be assigned values simultaneously (e.g., x, y, z = 1, 2, 3).
Multiple variables can be assigned the same value (e.g., x = y = z = “hello”).
Lists, dictionaries, tuples, and sets can all be assigned to variables.
The tutorial covers naming conventions such as camel case, Pascal case, and snake case, recommending snake case for readability: “when I’m naming variables I usually write it in snake case because I just find it a lot easier to read because each word is broken up by this underscore”.
It also covers invalid naming practices and what symbols can be used within variable names.
Strings can be concatenated using the + operator, but + cannot join a string and a number directly; the number must first be converted with str(). A print statement can still display both by passing them as separate, comma-separated arguments.
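A minimal sketch of the variable behaviours listed above. The names and values (first_name, age, and so on) are hypothetical examples rather than the tutorial's own.

```python
# A variable can be overwritten, even with a value of a different type.
count = 10
count = "ten"

# Assign several values at once, or one value to several names.
x, y, z = 1, 2, 3
a = b = c = "hello"

# snake_case names, the convention the tutorial recommends.
first_name = "Ada"  # hypothetical example value
age = 36            # hypothetical example value

# "+" joins strings; a number has to be converted with str() first.
greeting = "Name: " + first_name + ", age: " + str(age)
print(greeting)

# print() accepts mixed types as separate comma-separated arguments.
print("Name:", first_name, "age:", age)
```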
Python Data Types:
The tutorial covers the main data types in Python.
Numeric: Integers, floats, and complex numbers.
Boolean: True or False values.
Sequence Types: Strings, lists, and tuples.
Set: Unordered collections of unique elements.
Dictionary: Key-value pairs.
Strings can be defined with single quotes, double quotes, or triple quotes (for multi-line strings). In Python 3, a string is an immutable sequence of Unicode characters.
Indexing in strings starts at zero and can use negative indices to access characters from the end of the string. Slicing also works.
Lists are mutable, meaning their elements can be changed after creation using their indexes. Lists are indexed just like strings. “One of the best things about lists is you can have any data type within them.”
Tuples are immutable; apart from that one main difference, lists and tuples are very similar. “typically people will use tuple for when data is never going to change”.
Sets only contain unique values and are unordered.
Dictionaries store key-value pairs, and values are accessed by their associated keys. Dictionaries are changeable. “within a data type we have something called a key value pair… we have a key that indicates what that value is attributed to”.
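The data types described above can be tried out in a few lines. This is an illustrative sketch; all values are made up for the example.

```python
text = "Python"
# Indexing starts at zero; negative indices count from the end; slices work too.
print(text[0], text[-1], text[0:3])

scores = [90, "A", 3.5]  # a list can hold any mix of data types
scores[0] = 95           # lists are mutable, so an element can be replaced

point = (3, 4)           # a tuple cannot be changed after creation
try:
    point[0] = 10
except TypeError as err:
    print("Tuples are immutable:", err)

unique = {1, 2, 2, 3}    # a set keeps only unique values
print(unique)

person = {"name": "Ada", "age": 36}  # dictionary of key-value pairs
person["age"] = 37                   # values are read and changed by key
print(person["name"], person["age"])
```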
Operators:
Comparison Operators: Used to compare values (e.g., ==, !=, >, <, >=, <=).
Logical Operators: Used to combine or modify boolean expressions (and, or, not).
Membership Operators: Used to check if a value exists within a sequence (in, not in).
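A short sketch of the three operator families, using made-up values (temperature, fruits) purely for illustration.

```python
temperature = 72  # hypothetical example value

# Comparison operators return a Boolean.
is_exactly_72 = temperature == 72
is_not_100 = temperature != 100
is_at_least_60 = temperature >= 60

# Logical operators combine or negate Boolean expressions.
is_mild = temperature > 60 and temperature < 80
is_extreme = temperature < 0 or temperature > 100
is_not_mild = not is_mild

# Membership operators test whether a value is present in a sequence.
fruits = ["apple", "banana"]
has_apple = "apple" in fruits
lacks_cherry = "cherry" not in fruits

print(is_exactly_72, is_mild, is_extreme, has_apple, lacks_cherry)
```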
Control Flow:
If/Elif/Else Statements: Used to execute different blocks of code based on conditions. The tutorial mentions “You can have as many elif statements as you want but you can only have one if statement and one else statement”. Nested if statements are also covered.
For Loops: Used to iterate over a sequence. The diagram in the text walks through this, ending in “exit the loop and the for loop would be over”.
While Loops: Used to repeatedly execute a block of code as long as a condition is true. Break statements, continue statements, and using else statements to create a “counter” are also covered.
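The control flow constructs above might look like the following in practice. The function name classify and the sample numbers are assumptions made for this sketch.

```python
def classify(number):
    """Return a label using one if, one elif, and one else branch."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"

# A for loop visits each item in a sequence, then exits.
labels = []
for n in (-2, 0, 5):
    labels.append(classify(n))
print(labels)

# A while loop repeats as long as its condition holds.
evens = []
count = 0
while count < 10:
    count += 1
    if count % 2 == 1:
        continue  # skip the rest of this pass for odd numbers
    if count > 6:
        break     # leave the loop early
    evens.append(count)
else:
    # Runs only when the loop ends without break; skipped in this example.
    print("Loop finished without break")
print(evens)
```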
Functions:
Functions are defined using the def keyword.
Arguments can be passed to functions.
The tutorial describes many types of arguments including custom arguments, multiple arguments, arbitrary arguments, and keyword arguments. Arbitrary arguments use *args, and arbitrary keyword arguments use **kwargs.
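As a rough illustration of the argument types mentioned above, here is a sketch with three hypothetical functions (greet, total, describe) that are not from the tutorial.

```python
def greet(name, greeting="Hello"):
    """Ordinary arguments; greeting has a default and can be passed by keyword."""
    return f"{greeting}, {name}!"

def total(*args):
    """*args collects any number of positional arguments into a tuple."""
    return sum(args)

def describe(**kwargs):
    """**kwargs collects any number of keyword arguments into a dictionary."""
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())

print(greet("Ada"))
print(greet(name="Ada", greeting="Hi"))
print(total(1, 2, 3))
print(describe(city="Paris", year=2024))
```

Ordinary arguments suit a fixed, known set of inputs, while *args and **kwargs are useful when the number of inputs is not known in advance.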
Data Type Conversion:
Functions like int(), str(), list(), tuple(), and set() are used to convert between data types. This is important because the tutorial also says “it cannot add both an integer and a string.” Converting a list to a set automatically removes duplicate elements.
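A brief sketch of these conversions, with made-up values. It also shows the error that arises when an integer and a string are added without converting one of them.

```python
quantity = "7"                  # text, for example as returned by input()
new_total = int(quantity) + 3   # convert to int before doing arithmetic
label = str(new_total) + " items"  # convert back to str before concatenating

letters = list("abc")           # a string becomes a list of characters
frozen = tuple([1, 2, 3])       # a list becomes a tuple
deduped = set([1, 2, 2, 3, 3])  # converting to a set drops duplicates

# Adding an integer and a string without conversion raises TypeError.
try:
    1 + "1"
    mixed_add_failed = False
except TypeError:
    mixed_add_failed = True

print(new_total, label, letters, frozen, deduped, mixed_add_failed)
```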
BMI Calculator Project:
The tutorial walks through building a BMI calculator.
The program takes user input for weight (in pounds) and height (in inches).
The BMI is calculated using the formula: weight * 703 / (height * height).
The program then uses if/elif/else statements to categorize the BMI into categories like underweight, normal weight, overweight, obese, severely obese, and morbidly obese.
The program uses input() for user input, which is then converted to an integer.
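The BMI project could be sketched roughly as follows. The formula is the one given above; the category cut-offs (18.5, 25, 30, 35, 40) and the function names are assumptions for illustration, since the summary does not state the tutorial's exact thresholds. Fixed sample values stand in for input() so the sketch runs without prompting.

```python
def calculate_bmi(weight_lb, height_in):
    """BMI from pounds and inches: weight * 703 / (height * height)."""
    return weight_lb * 703 / (height_in * height_in)

def bmi_category(bmi):
    """Map a BMI value to a category (thresholds are assumed, not from the tutorial)."""
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "normal weight"
    elif bmi < 30:
        return "overweight"
    elif bmi < 35:
        return "obese"
    elif bmi < 40:
        return "severely obese"
    else:
        return "morbidly obese"

# In an interactive version these would come from int(input(...)).
weight = 150  # hypothetical weight in pounds
height = 65   # hypothetical height in inches

bmi = calculate_bmi(weight, height)
print(round(bmi, 2), bmi_category(bmi))
```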
Web Scraping Project:
Introduction to Web Scraping: Web scraping is the process of extracting data from websites.
Libraries Used:
requests: Used to send HTTP requests to retrieve the HTML content of a webpage.
Beautiful Soup: Used to parse the HTML content and make it easier to navigate and extract data. “Beautiful Soup takes this messy HTML or XML and makes it into beautiful soup”.
pandas: Used for data manipulation and analysis, specifically creating a DataFrame to store the scraped data. “We can use Pandas and manipulate this”.
Steps in Web Scraping:
Send a Request: Use the requests library to get the HTML content of the target webpage.
Parse the HTML: Use Beautiful Soup to parse the HTML content into a navigable data structure.
Locate Elements: Use Beautiful Soup’s methods (e.g., find(), find_all()) to locate the specific HTML elements containing the data you want to extract.
Extract Data: Extract the text or attributes from the located HTML elements.
Store Data: Store the extracted data in a structured format, such as a pandas DataFrame or a CSV file.
Key Beautiful Soup Methods:
find(): Finds the first element that matches the specified criteria.
find_all(): Finds all elements that match the specified criteria.
HTML Element Attributes: The tutorial mentions the importance of HTML element attributes (e.g., class, href, id) for targeting specific elements with Beautiful Soup.
Targeting Elements with Classes: The .find() and .find_all() methods can be used to select elements based on their CSS classes.
Navigating HTML Structure: The tutorial demonstrated how to navigate the HTML structure to locate specific data elements, particularly focusing on table, tr (table row), and td (table data) tags.
Data Cleaning: The tutorial showed how to clean up the extracted data by stripping whitespace from the beginning and end of the strings.
Creating Pandas DataFrame: The scraped data is organized and stored into a pandas DataFrame.
Exporting Data to CSV: The tutorial shows how to export the data in a data frame to a CSV file.
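The scraping steps above can be sketched end to end. To keep the example runnable offline, a small inline HTML table stands in for the downloaded page; the table contents, the class name "wikitable", and the variable names are assumptions for illustration. In a live script the HTML would typically come from requests.get(url).text for a URL of your choosing. The sketch needs the beautifulsoup4 and pandas packages installed.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the HTML of a downloaded page.
HTML = """
<table class="wikitable">
  <tr><th> Rank </th><th> Company </th></tr>
  <tr><td> 1 </td><td> Example Corp </td></tr>
  <tr><td> 2 </td><td> Sample Inc </td></tr>
</table>
"""

# Parse the HTML into a navigable structure.
soup = BeautifulSoup(HTML, "html.parser")

# Locate the table by tag and CSS class.
table = soup.find("table", class_="wikitable")

# Header cells become column names; strip() removes surrounding whitespace.
headers = [th.text.strip() for th in table.find_all("th")]

# Each remaining row (tr) yields one list of cleaned cell (td) values.
rows = []
for tr in table.find_all("tr")[1:]:
    rows.append([td.text.strip() for td in tr.find_all("td")])

# Store the data in a DataFrame and export it as CSV text.
df = pd.DataFrame(rows, columns=headers)
csv_text = df.to_csv(index=False)
print(df)
print(csv_text)
```

Passing a file path to to_csv, for example df.to_csv("output.csv", index=False), would write the file to disk instead of returning a string.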
Quotes:
“Right here is where we’re going to be spending 99% of our time in future videos this is where we’re going to write all of our code.”
“a variable is basically just a container for storing data values.”
“when I’m naming variables I usually write it in snake case because I just find it a lot easier to read because each word is broken up by this underscore”
“typically people will use tuple for when data is never going to change”
“Beautiful Soup takes this messy HTML or XML and makes it into beautiful soup”
“we can use Pandas and manipulate this”
“It cannot add both an integer and a string”
Overall Assessment:
The provided text outlines a comprehensive introductory Python tutorial suitable for individuals with little to no prior programming experience. It covers a wide range of essential concepts and techniques, culminating in practical projects that demonstrate how these skills can be applied. The step-by-step approach and clear explanations, supplemented by hands-on examples, should make it accessible and engaging for beginners. However, some familiarity with HTML is assumed for the web scraping portion.
Python Programming: Basics and Fundamentals
Python Basics & Setup
1. What is Anaconda and why is it recommended for Python beginners?
Anaconda is an open-source distribution of Python and R, containing tools like Jupyter Notebooks. It simplifies setting up a Python environment, especially for beginners, by providing pre-installed packages and tools, avoiding individual installations and configurations that can be complex.
2. What is a Jupyter Notebook and how do you use it to write and run Python code?
A Jupyter Notebook is an interactive environment where you can write and execute Python code, as well as include formatted text (Markdown), images, and other content. You create cells within the notebook, type code, and then run each cell individually by pressing Shift+Enter.
3. What are variables in Python and why are they useful?
Variables are containers for storing data values. They are useful because they allow you to assign a name to a value (like a number, string, or list) and then refer to that value throughout your code by using the variable name, without having to rewrite the value itself.
4. How does Python automatically determine the data type of a variable, and what are some common data types?
Python uses dynamic typing, meaning it automatically infers the data type of a variable based on the value assigned to it. Common data types include integers (whole numbers), floats (decimal numbers), strings (text), Booleans (True/False), lists, dictionaries, tuples, and sets.
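Dynamic typing is easy to see by asking Python for the type of a few values with the built-in type() function. The sketch below is only an illustration; the variable names are invented for the example.

```python
# Python infers each variable's type from the value assigned to it.
scoops = 22                      # int
price = 3.5                      # float
flavor = "mint chocolate chip"   # str
in_stock = True                  # bool

for value in (scoops, price, flavor, in_stock):
    print(value, type(value).__name__)

# Assigning a different kind of value changes the type the name refers to.
scoops = "twenty-two"
print(type(scoops).__name__)
```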
Python Fundamentals & Usage
5. What are the key differences between lists, tuples, and sets in Python?
Lists: Ordered, mutable (changeable) collections of items. They allow duplicate values.
Tuples: Ordered, immutable (unchangeable) collections of items. They also allow duplicate values.
Sets: Unordered collections of unique items. Sets do not allow duplicate values.
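The three differences can be checked side by side. This is a small sketch with made-up values: the same three items go into a list, a tuple, and a set, then the list is changed, the tuple refuses to change, and the set drops the duplicate.

```python
flavors_list = ["chocolate", "vanilla", "vanilla"]    # ordered, mutable, duplicates kept
flavors_tuple = ("chocolate", "vanilla", "vanilla")   # ordered, immutable, duplicates kept
flavors_set = {"chocolate", "vanilla", "vanilla"}     # unordered, duplicates dropped

flavors_list[0] = "rocky road"        # allowed: lists can be changed in place
print(flavors_list)

try:
    flavors_tuple[0] = "rocky road"   # not allowed: tuples cannot be changed
except TypeError as error:
    print("tuple error:", error)

print(len(flavors_list), len(flavors_tuple), len(flavors_set))
```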
6. What are comparison, logical, and membership operators in Python, and how are they used?
Comparison Operators: Used to compare values (e.g., == (equal), != (not equal), > (greater than), < (less than)). They return Boolean values (True or False).
Logical Operators: Used to combine or modify Boolean expressions (e.g., and, or, not).
Membership Operators: Used to test if a value is present in a sequence (e.g., in, not in).
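A short sketch of the three operator families working together, using invented values. Each expression evaluates to a Boolean, which is what makes these operators useful inside if statements and loops.

```python
scoops = 3
flavors = ["chocolate", "vanilla", "rocky road"]

# Comparison operators return a Boolean.
is_large = scoops >= 3          # True
is_exactly_two = scoops == 2    # False

# Logical operators combine or invert Boolean expressions.
large_and_not_two = is_large and not is_exactly_two   # True
either = is_large or is_exactly_two                   # True

# Membership operators test whether a value is in a container.
has_vanilla = "vanilla" in flavors       # True
missing_mint = "mint" not in flavors     # True

print(is_large, is_exactly_two, large_and_not_two, either, has_vanilla, missing_mint)
```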
7. Explain the purpose of if, elif, and else statements in Python, and how they control the flow of execution.
if, elif (else if), and else statements are used to create conditional blocks of code. The if statement checks a condition, and if it’s true, the code block under the if statement is executed. elif allows you to check additional conditions if the previous if or elif conditions were false. The else statement provides a default code block to execute if none of the if or elif conditions are true.
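As an illustration, the hypothetical function below (describe_order is not from the tutorial, and its thresholds are arbitrary) walks through one if, two elif branches, and a final else. Only the first branch whose condition is true runs.

```python
def describe_order(scoops):
    """Return a label for an order size (thresholds are arbitrary)."""
    if scoops <= 0:
        return "no ice cream"
    elif scoops == 1:
        return "single"
    elif scoops == 2:
        return "double"
    else:
        # Reached only when none of the conditions above were true.
        return "sundae"

for count in (0, 1, 2, 5):
    print(count, describe_order(count))
```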
8. How do for and while loops work in Python, and what are the differences between them?
for Loops: Used to iterate over a sequence (like a list, tuple, or string) and execute a block of code for each item in the sequence.
while Loops: Used to repeatedly execute a block of code as long as a specified condition is true. The loop continues until the condition becomes false.
The main difference is that for loops are typically used when you know in advance how many times you want to iterate, while while loops are used when you want to repeat a block of code until a specific condition is no longer met.
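The contrast can be sketched with two small loops over invented data: the for loop runs once per item in a list, while the while loop keeps going only as long as its condition holds.

```python
flavors = ["chocolate", "vanilla", "rocky road"]

# for loop: the number of iterations is fixed by the length of the sequence.
shouted = []
for flavor in flavors:
    shouted.append(flavor.upper())
print(shouted)

# while loop: repeat until the condition stops being true.
scoops_left = 5
servings = 0
while scoops_left >= 2:     # each serving uses two scoops
    scoops_left -= 2
    servings += 1
print(servings, scoops_left)
```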
Creating Jupyter Notebooks with Anaconda
To create a notebook with Jupyter Notebook in Anaconda, follow these steps:
Download Anaconda, an open-source distribution of Python and R, from the Anaconda website. Make sure to select the correct installer for your operating system (Windows, Mac, or Linux). Windows users should check their system settings to determine whether the system is 32-bit or 64-bit.
Install Anaconda by clicking ‘next’ on the installer window. Review the license agreement and click ‘I agree’. Choose the installation type, either for the current user only or for all users on the computer. Select the file path for the installation, ensuring there is enough disk space (approximately 3.5 GB).
In the advanced options, it is not recommended to add Anaconda to the path environment variable unless you are experienced with Python. It is safe to register Anaconda as the default Python version. Allow the installation process to complete.
After the installation is complete, search for and open Anaconda Navigator.
In Anaconda Navigator, launch Jupyter Notebook. This will open a new tab in the default web browser. If this is the first time opening Jupyter Notebook, the file directory may be blank.
In the Jupyter Notebook interface, go to the ‘New’ drop-down and select ‘Python 3 (ipykernel)’ to create a new notebook with a Python 3 kernel.
A new Jupyter notebook will open. This is where code will be written in future tutorials.
In the notebook, there are cells where code can be typed. To run the code in a cell, press Shift + Enter.
Besides code, a notebook can hold Markdown notes that organize it. In a cell set to Markdown, a pound sign (#) followed by text renders as a heading; in a code cell, the same line is a Python comment that is not executed.
After creating a Jupyter notebook, the title can be changed by clicking on the name at the top of the page. It is also possible to insert cells, delete cells, copy and paste cells, and move cells up or down.
Python Code: Concepts and Web Scraping
Python code involves several key concepts, including variables, data types, operators, control flow (if statements, loops), functions, and web scraping.
Variables:
Are containers for storing data values, such as numbers or strings.
A value can be assigned to a variable using the equal sign (=), for example, x = 22 assigns the value 22 to the variable x.
The print() function displays the value of a variable. For example, print(x) would output 22 if x has been assigned the value of 22.
Python automatically assigns a data type to a variable based on the assigned value.
Variables can be overwritten with new values.
Variables are case-sensitive.
Multiple values can be assigned to multiple variables. For example, x, y, z = "chocolate", "vanilla", "rocky road" assigns "chocolate" to x, "vanilla" to y, and "rocky road" to z.
One value can be assigned to multiple variables. For example, x = y = z = "root beer float" assigns "root beer float" to all three variables.
Variables can be used in arithmetic operations. For example, y = 3 + 2 assigns the value 5 to the variable y.
Variables can be combined within a print statement using the + operator for strings or commas to combine different data types.
Data Types:
Are classifications of the data that you are storing.
Numeric data types include integers, floats, and complex numbers.
Integers are whole numbers, either positive or negative.
Floats are decimal numbers.
Complex numbers are numbers with a real and imaginary part, where j represents the imaginary unit.
Booleans have two built-in values: True or False.
Sequence types include strings, lists, and tuples.
Strings are sequences of Unicode characters and can be enclosed in single quotes, double quotes, or triple quotes. Triple quotes are used for multi-line strings. Strings can be indexed to access specific characters.
Lists store multiple values and are changeable (mutable). Lists are defined using square brackets []. Lists can contain different data types. Items can be added to the end of a list using .append(). Items in a list can be changed by referring to the index number. Lists can be nested.
Tuples are similar to lists but are immutable, meaning they cannot be modified after creation. Tuples are defined using parentheses ().
Sets are unordered collections of unique elements. Sets do not allow duplicate elements. Sets are defined using curly brackets {}, although an empty set must be written as set() because {} on its own creates an empty dictionary.
Dictionaries store key-value pairs. Dictionaries are defined using curly brackets {}, with each key-value pair separated by a colon :. Dictionary values are accessed using the key. Dictionary items can be updated, and key-value pairs can be deleted.
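The string, list, and dictionary operations listed above can be tried in a few lines. The values below are invented for illustration; the operations themselves (indexing, .append(), assignment by index or key, and del) are standard Python.

```python
# Strings: indexing starts at 0; triple quotes allow line breaks.
flavor = "vanilla"
first_letter = flavor[0]          # "v"
last_letter = flavor[-1]          # "a"
note = """ice cream
is my favorite"""

# Lists: mutable, can mix data types, and can be nested.
order = ["chocolate", 2, True]
order.append("sprinkles")         # add an item to the end
order[1] = 3                      # change an item by its index
nested = ["cone", ["chocolate", "vanilla"]]
inner_flavor = nested[1][0]       # "chocolate"

# Dictionaries: values are looked up by key.
prices = {"cone": 2.5, "cup": 2.0}
prices["cup"] = 2.25              # update an existing key
prices["sundae"] = 4.0            # add a new key-value pair
del prices["cone"]                # delete a key-value pair

print(first_letter, last_letter, order, inner_flavor, prices)
```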
Operators:
Comparison operators compare two values.
== (equal to)
!= (not equal to)
> (greater than)
< (less than)
>= (greater than or equal to)
<= (less than or equal to)
Logical operators combine conditional statements.
and (returns True if both statements are true)
or (returns True if one of the statements is true)
not (reverses the result, returns False if the result is true)
Membership operators test whether a value is present in a sequence or other collection.
in (returns True if the value is present)
not in (returns True if the value is not present)
Control Flow:
If statements execute a block of code if a condition is true.
if condition: (body of code)
elif condition: (body of code) – checks an additional condition if the preceding if or elif conditions are false
else: (body of code) – executes if none of the preceding conditions are true
Nested if statements can be used for more complex logic.
For loops iterate over a sequence (list, tuple, string, etc.).
for variable in sequence: (body of code)
Nested for loops can be used to iterate over multiple sequences.
While loops execute a block of code as long as a condition is true.
while condition: (body of code)
break statement: stops the loop even if the while condition is true
continue statement: skips the remaining statements in the current iteration and moves on to the next iteration of the loop
else statement: runs a block of code once the condition is no longer true (it is skipped if the loop exits through break)
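The loop controls above are easier to follow in action. The following sketch uses invented counters and lists to show break, continue, the else clause on a while loop, and a nested for loop.

```python
# break: leave the loop early, even though the condition is still true.
count = 0
while count < 10:
    count += 1
    if count == 4:
        break
print(count)            # the loop stopped at 4

# continue: skip the rest of this iteration and go to the next one.
odds = []
for number in range(1, 8):
    if number % 2 == 0:
        continue
    odds.append(number)
print(odds)

# else on a while loop: runs once the condition becomes false
# (it is skipped if the loop exits through break).
countdown = 3
finished = False
while countdown > 0:
    countdown -= 1
else:
    finished = True
print(finished)

# Nested for loops: every cone is paired with every flavor.
pairs = []
for cone in ["waffle", "sugar"]:
    for flavor in ["chocolate", "vanilla"]:
        pairs.append((cone, flavor))
print(pairs)
```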
Functions:
Are blocks of code that run when called.
Defined using the def keyword.
Arguments can be passed to functions.
Arbitrary arguments allow an unspecified number of arguments to be passed.
Keyword arguments allow arguments to be passed with a key-value assignment.
Arbitrary keyword arguments allow an unspecified number of keyword arguments to be passed.
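Each kind of argument can be shown with a tiny function. The function names below are hypothetical and chosen only for this sketch; the def keyword, *args-style, and **kwargs-style parameters are standard Python syntax.

```python
def greet(name):
    """A function with one positional argument."""
    return "hello " + name

def total_scoops(*scoops):
    """Arbitrary arguments: *scoops collects any number of values into a tuple."""
    return sum(scoops)

def describe(flavor, topping="none"):
    """Keyword arguments can be passed by name, in any order."""
    return flavor + " with " + topping

def build_order(**details):
    """Arbitrary keyword arguments: **details collects them into a dictionary."""
    return details

print(greet("world"))
print(total_scoops(1, 2, 3))
print(describe(topping="sprinkles", flavor="vanilla"))
print(build_order(flavor="chocolate", scoops=2))
```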
Web Scraping:
Involves extracting data from websites using libraries like Beautiful Soup and requests.
The requests library is used to send HTTP requests to a website.
Beautiful Soup is used to parse HTML content.
find() and find_all() methods are used to locate specific HTML elements.
Python Variable Assignment: A Comprehensive Guide
Variable assignment in Python involves using variables as containers for storing data values. You can assign a value to a variable using the equal sign (=). For example, x = 22 assigns the value 22 to the variable x. You can display the value of a variable using the print() function, such as print(x).
Key aspects of variable assignment:
Data Type Assignment: Python automatically assigns a data type to a variable based on the assigned value. For example, assigning 22 to x makes it an integer.
Overwriting: Variables can be overwritten with new values.
y = "mint chocolate chip"
print(y) # Output: mint chocolate chip
y = "chocolate"
print(y) # Output: chocolate
Case Sensitivity: Variables are case-sensitive. Y and y are treated as different variables.
Y = "mint chocolate chip"
y = "chocolate"
print(Y) # Output: mint chocolate chip
print(y) # Output: chocolate
Multiple Assignments: Multiple values can be assigned to multiple variables. For example:
x, y, z = "chocolate", "vanilla", "rocky road"
print(x) # Output: chocolate
print(y) # Output: vanilla
print(z) # Output: rocky road
One Value to Multiple Variables: Multiple variables can be assigned the same value. For example:
x = y = z = "root beer float"
print(x) # Output: root beer float
print(y) # Output: root beer float
print(z) # Output: root beer float
Combining Variables in Print: Variables can be combined within a print statement using the + operator for strings or commas to combine different data types. However, it is important to note that you can only concatenate a string with another string, not with an integer, unless you are separating the values by a comma in the print statement.
x = "ice cream"
y = "is"
z = "my favorite"
print(x + " " + y + " " + z) # Output: ice cream is my favorite
x = 1
y = 2
z = 3
print(x, y, z) # Output: 1 2 3
Lists, dictionaries, tuples, and sets can also be assigned to variables.
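Two points from this section are worth trying directly: unpacking a list into several variables, and what happens when a string and an integer are joined with +. The sketch below reuses the ice_cream list from the tutorial; the message and label names are invented for the example.

```python
# Unpacking: a three-item list can be spread across three variables.
ice_cream = ["chocolate", "vanilla", "rocky road"]
x, y, z = ice_cream
print(x, y, z)

# Adding a string and an integer with + raises a TypeError.
try:
    message = "ice cream" + 2
except TypeError as error:
    message = None
    print("TypeError:", error)

# Two ways around it: convert the number to a string,
# or pass both values to print() separated by a comma.
label = "ice cream " + str(2)
print(label)
print("ice cream", 2)
```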
Python Data Types: Numeric, Boolean, Sequence, Set, and Dictionary
Data types are classifications of the data that is stored; they inform what operations can be performed on the data. The main data types within Python include numeric, sequence type, set, Boolean, and dictionary.
Numeric data types include integers, floats, and complex numbers.
An integer is a whole number, whether positive or negative.
A float is a decimal number.
A complex number has a real part and an imaginary part, with j marking the imaginary part (for example, 3 + 4j).
The Boolean data type has only two built-in values: True or False.
Sequence types include strings, lists, and tuples.
Strings are sequences of Unicode characters. Strings can be written in single quotes, double quotes, or triple quotes. Triple-quoted strings can span multiple lines. Strings can be indexed, with the index starting at zero.
Lists store multiple values and are changeable. A list is indexed just like a string. Square brackets create a list, and commas separate its items. Lists can hold any data type, and lists can be nested.
Tuples are quite similar to lists but the biggest difference is that a tuple is immutable, meaning that it cannot be modified or changed after it is created. Typically, tuples are used when data is never going to change.
A set is similar to a list and a tuple, but does not have any duplicate elements. The values within a set cannot be accessed using an index, because it does not have one.
A dictionary differs from the other data types because it stores key-value pairs.
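A brief sketch of the numeric and Boolean types, plus the point that a set cannot be indexed. All values are invented for illustration; .real, .imag, and abs() are standard for Python complex numbers.

```python
whole = -7            # int
decimal = 2.5         # float
z = 3 + 4j            # complex: real part 3, imaginary part 4
flag = 10 > 3         # bool produced by a comparison

print(type(whole).__name__, type(decimal).__name__,
      type(z).__name__, type(flag).__name__)
print(z.real, z.imag, abs(z))

# Sets have no positions, so indexing one raises a TypeError.
toppings = {"nuts", "sprinkles", "nuts"}
try:
    toppings[0]
    set_is_indexable = True
except TypeError:
    set_is_indexable = False
print(len(toppings), set_is_indexable)
```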
Web Scraping with Beautiful Soup and Requests
Web scraping involves extracting data from websites using libraries like Beautiful Soup and requests.
Key points regarding web scraping:
Libraries:
Requests: Used to send HTTP requests to a website to retrieve its HTML content. The requests.get() function sends a GET request to the specified URL and returns a response object. A status code of 200 indicates a successful request.
Beautiful Soup: Used to parse HTML content, making it easy to navigate and search for specific elements.
Send a GET request to the URL and retrieve the page content:
page = requests.get(URL)
Create a Beautiful Soup object to parse the HTML content:
soup = BeautifulSoup(page.text, 'html.parser')
HTML Structure:
HTML (Hypertext Markup Language) is used to describe the structure of web pages.
HTML consists of elements defined by tags (e.g., <html>, <head>, <body>, <p>, <a>).
Tags can have attributes, such as class, id, and href.
Inspecting web pages using browser developer tools helps identify the relevant HTML elements for scraping.
Finding Elements:
find(): Locates the first occurrence of a specific HTML element.
find_all(): Locates all occurrences of a specific HTML element and returns them as a list.
Elements can be filtered by tag name, class, id, or other attributes.
soup.find_all('div', class_='container')
Extracting Data:
.text: Extracts the text content from an HTML element.
.strip(): Removes leading and trailing whitespace from a string.
Workflow:
Import libraries: Import Beautiful Soup and requests.
Get the HTML: Use requests to fetch the HTML content from the URL.
Parse the HTML: Create a Beautiful Soup object to parse the HTML.
Find elements: Use find() or find_all() to locate the desired elements.
Extract data: Use .text to extract the text content from the elements.
Organize data: Store the extracted data in a structured format, such as a list or a Pandas DataFrame.
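The workflow can be sketched end to end without touching the network. This example assumes the beautifulsoup4 package is installed, and it parses a small made-up HTML string in place of a downloaded page; in a real run the HTML would come from requests.get(url).text for a page you are permitted to scrape. The tag names and class names are invented for the illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for page.text from a real request (hypothetical markup).
html = """
<html><body>
  <div class="container">
    <p class="flavor"> Chocolate </p>
    <p class="flavor"> Vanilla </p>
  </div>
  <a href="/menu">Menu</a>
</body></html>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

first_paragraph = soup.find("p")                     # first matching element
flavor_tags = soup.find_all("p", class_="flavor")    # all matching elements

# .text gives an element's text; .strip() trims surrounding whitespace.
flavors = [tag.text.strip() for tag in flavor_tags]
link_target = soup.find("a")["href"]                 # read an attribute value

print(first_paragraph.text.strip(), flavors, link_target)
```

Collecting the cleaned text into a plain list, as done here with flavors, is often enough before moving the data into a DataFrame.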
Pandas DataFrames and Exporting to CSV:
The extracted data can be organized into a Pandas DataFrame for further analysis and manipulation.
The DataFrame can be exported to a CSV file using df.to_csv(). To prevent the index from being included in the CSV, use index=False.
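A minimal sketch of the export step, assuming pandas is installed. The rows are hypothetical stand-ins for scraped data, and the CSV is written to an in-memory buffer so the example leaves no file behind; passing a path such as "flavors.csv" to to_csv() would write to disk instead.

```python
import io
import pandas as pd

# Hypothetical rows, standing in for data collected by a scraper.
rows = [
    {"flavor": "Chocolate", "price": 2.5},
    {"flavor": "Vanilla", "price": 2.0},
]
df = pd.DataFrame(rows)

# index=False leaves the row numbers out of the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```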
Learn Python in Under 3 Hours | Variables, For Loops, Web Scraping + Full Project
The Original Text
What’s going on, everybody? Welcome back to another video. Today we’re going to be learning the basics of Python in under 3 hours. Python is a fantastic skill to know, but I remember when I was first learning Python it was a little bit intimidating. It was a little bit more difficult than I was used to when I had just known Excel and SQL, and Python seemed really, really difficult. But I’ve been using Python for over seven years now, and it is a fantastic skill to know how to use. So in this really long lesson we’re going to be walking through everything you need to know in order to get started in Python. I’ll be walking you through how to set up your environment to make sure that you can actually run your code, and then we’ll be walking through all of the basics: variables, for loops, while loops, and even web scraping, and we’ll even have a full project in this as well. So we have a ton of things to cover, and I hope it is really helpful. Without further ado, let’s jump onto my screen and get started.
All right, so let’s get started by downloading Anaconda. Anaconda is an open-source distribution of Python and R products. Within Anaconda is our Jupyter Notebooks, as well as a lot of other things, but we’re going to be using it for our Jupyter Notebooks. So let’s go right down here, and if I hit download, it’s going to download for me because I’m on Windows, but if you want additional installers, if you’re running on Mac or Linux, then you can get those all right here. Now, if you are running on Windows, just make sure to check your system to see if it’s 32-bit or 64-bit; you can go into your About and your system settings to find that information. I’m going to click on this 64-bit, it’s going to pop up on my screen right here, and I’m going to click save. Now it’s going to start downloading. It says it could take a little while, but honestly it’s going to take probably about two to three minutes, and then we’ll get going. Now that it’s done, I’m just going to click on it and it’s
going to pull up this window right here. We are just going to click Next because we want to install it. This is our license agreement; you can read through this if you would like. I will not, I’m just going to click I Agree. Now we can select our installation type, and you can either select it for just me or, if you have multiple admins or users on one laptop, you can do that as well. For me it’s just me, so I’m going to use this one as it recommends. Now it’s going to show you where it’s installing it on your computer. This is the actual file path. It’s going to take about 3.5 gigs of space. I have plenty of space, but make sure you have enough space, and then once you do, you can come right over here to Next. Now we can do some advanced options. We can add Anaconda3 to my PATH environment variable. When you’re using Python, you typically have a default path with whatever Python IDE or notebook that you’re using. I use a lot of Visual Studio Code, so if I do this I’m worried it might mess something up, so I am not going to do this. It also says it doesn’t recommend it. Again, messing with these paths is kind of something that you might want to do once you know more about Python, so I don’t really recommend you having this checked. We can also register Anaconda3 as my default Python 3.9. You can do this one, and I’m going to keep it this way just so I have the exact same settings as you do. So let’s go ahead and click Install, and now it is going to actually install this on your computer. Now once that’s complete we can hit Next, and now we’re going to hit Next again, and finally we’re going to hit Finish. If you want to, you can have this tutorial and this Getting Started with Anaconda. I don’t want either of them because I don’t need them, but if you would like to have those, keep those checked and you can get those. Let’s click Finish. Now let’s go down and we’re going to search for Anaconda, and it’ll say Anaconda Navigator, and we’re going to click on that and it should open up for us. So
this is what you should be seeing on your screen. This is the Anaconda Navigator, and this is where that distribution of Python and R is going to be. We have a lot of different options in here, and some of them may look familiar. We have things like Visual Studio Code, Spyder, RStudio, and then right up here we have our Jupyter Notebooks, and this is what we’re going to be using throughout our tutorials. So let’s go ahead and click on Launch, and this is what should kind of pop up on your screen. Now, I’ve been using this a lot, so I have a ton of notebooks and files in here, but if you are just now seeing this it might be completely blank or just have some default folders in here. This is where we’re going to open up a new Jupyter notebook where we can write code and all the things that we’re going to be learning in future tutorials, and you can use this area to save things and create folders and organize everything. If you already have some notebooks from previous projects or something, you can upload them here, but what we’re going to do is go right to this New, we’re going to click on the drop-down, and we’re going to open up a Python 3 kernel. So we’re going to open this up right here. Now, right here is where we’re going to be spending 99% of our time in future videos. This is where we’re going to write all of our code. So right here is a cell, and this is where we can type things. I can say print, I can do the famous hello world, and then I’ll run that by pressing Shift+Enter. This is where all of our code is going to go. These are called cells, so each one of these is a cell. We have a ton of stuff up here, and I’m going to get to that in just a second. One thing I want to show you is that you don’t only have to write code here. You can also do something called Markdown. Markdown is its own kind of, you could say, language, but it’s just a different way of writing, especially within a notebook. So all we’re going to do is do this little hashtag, and
actually I think it’s a pound sign, but I’m going to call it hashtag. We’re going to do that and we’re going to say first notebook, and then if I run that we have our first notebook. We can make little comments and little notes like that that don’t actually run any code; they just kind of organize things for us, and I’m going to do that in a lot of our future videos, so I just wanted to show you how to do that. Now let’s look right up here. A lot of these things are pretty important. One of the first things that’s really important is actually saving this. So let’s say we wanted to change the title. I’m going to do AAA because I want it to be at the beginning so I can show you this. I’ll do AAA new notebook, and I’m going to rename it, and then I’m going to save that. So if I go right back over here, you can see AAA new notebook. That green means that it’s currently running, and when I say running I mean right up here. If we wanted to, we could go ahead and shut that down, which means it wouldn’t run the code anymore, and then we’d have to run up a new cluster. So let’s go ahead and do that. I didn’t plan on doing that, but let’s do it. So we have no notebooks running, and right here it says we have a dead kernel. This was our Python 3 kernel, and now since I stopped it, it’s no longer processing anything. So let’s go ahead and say try restarting now, and it says kernel is ready, so it’s back up and running and we’re good to go. The next thing is this button right here. This is insert cell below, so if I have a lot of code I know I’m going to be writing, I can click that a lot, and I often do that because I just don’t like having to do that all the time, so I make a bunch of cells just so I can use them. You can also delete cells. So say we have some code here, we’ll say here, and we have code here, and then we have this empty cell right here. We can just get rid of that by doing this cut selected cells. We can also copy selected cells, so if I hit copy selected cells then I can go right
here and say paste selected cells, and as you can see it pasted that exact same cell. You can also move this up and down, so I can actually take this one, and say I wanted it in this location, I can take this cell and move it up or I can move it down. That’s just an easy way to kind of organize it, instead of having to copy this and move it right down here and paste it; you can just take this cell and move it up, which is really nice. Now, earlier when I ran this code right here I hit Shift+Enter. You can also hit Run and it’ll run the cell below, so you can hit Run and it works properly. If you’re running a script and it’s taking forever and it’s not working properly, or at least you don’t think it’s working properly, you can stop that by doing this interrupt the kernel right here, and anything you’re trying to do within this kernel, if it’s just not working properly, it’ll stop it. You can restart it, then you can try fixing your code. You can also hit this button if you want to restart your kernel, and this button if you want to restart the kernel and then rerun the entire notebook. As we talked about just a second ago, we have our code and our markdown code. We’re not going to talk about either of these because we’re not going to use that throughout the entire series. The next thing I want to show you is right up here. If you open this File, we can create a new notebook, we can open an existing notebook, we can copy it, save it, rename it, all that good stuff. We can also edit it, so a lot of these things that we were talking about, you can cut the cells and copy the cells using these shortcuts if you would like to. We can also go to View, and you can toggle a lot of these things if you would like to, which just means it’ll show it or not show it depending on what you want. So if we toggle this toolbar it’ll take away the toolbar for us, or if we go back and we toggle the toolbar we can bring it back. We can also insert a few different things, like inserting a cell above or a cell below, so
instead of using this plus button you can just press A or B, adding above or below. We also have the Cell menu, in which we can run our cells, or run all of them, or all above or all below, and then we have our Kernel right here, which we were talking about earlier, where we can interrupt it and restart those. There are widgets. We’re not going to be looking at any widgets in this series, but if it’s something you’re interested in you can definitely do that. Then we have Help, so if you are looking for some help on any of these things, especially some of these references, which are really nice, you can use those, and you can also edit your own keyboard shortcuts. And now that we walked through all of that, you now have Anaconda and Jupyter Notebooks installed on your computer. In future videos this is where we’re going to be writing all of our Python code, so be sure to check those out so we can learn Python together.
Hello everybody, today we’re going to be learning about variables in Python. A variable is basically just a container for storing data values. So you’ll take a value like a number or a string, you can assign it to a variable, and then the variable will carry and contain whatever you put into it. So for example, let’s go right over here. We’re going to say x, and this is going to be our variable. We’re going to say is equal to, and now we can assign the value to it. So let’s say I want to put 22. x is now equal to 22, so we won’t have to write out the number 22 in later scripts that we write; we can just say x, because x is equal to 22. It now contains that number. So now we can hit Enter and say print, we do an open parenthesis, and we’ll say x. Now I’m going to hit Shift+Enter, and now it prints out that 22, because we are printing x and x is equal to 22. This is our value and this is our variable. One really great thing about variables is that it assigns its own data type. It’s going to automatically do this, so we didn’t have to go and tell x that it’s an integer; it just automatically knew that 22 is a
number. We can check that by saying type, then an open parenthesis, and writing x, and we’ll do Shift+Enter again, and this says that x is an integer type. Now, we only assigned an integer to x. Let’s try assigning a string value, or some text, to a variable. So we’ll say y is equal to, let’s say, mint chocolate chip. I’m feeling some ice cream today, so we’ll say mint chocolate chip. Now if we print that again, we’ll do print, open parenthesis, y, and do Shift+Enter, it’ll print mint chocolate chip, and if we look at the type we can see that the type is a string this time and not an integer. Again, we did not tell it that x was an integer and y was a string; it just automatically knew this. Let’s go up here really quickly. We’re going to add several rows in here, because we’re about to write a lot of different variables and really learn in depth how to use variables.
The next thing to know about variables is that you can overwrite previous variables. Right now we have mint chocolate chip, and that is assigned to the variable y. So if I go down here, I say print y, I hit Shift+Enter, it’s going to print out mint chocolate chip. But if I go right above it, I say y is equal to, and let’s say chocolate, if I print that out it’s now going to say chocolate, whereas up here, I’m reassigning it to y, it’s still going to say mint chocolate chip. So if I come right down here and I copy this and I’m going to paste this right here, initially it is going to assign y to chocolate, but then right here it will automatically overwrite y as mint chocolate chip, and when we hit Shift+Enter it’s going to show mint chocolate chip. Variables are also case sensitive. So if I come up here and I say a capital Y, this is a lowercase y and this is a capital Y, it is going to print out the correct one instead of mint chocolate chip, and then if I go down here to the print and I type the capital Y, it will give us the mint chocolate chip. Up till now we’ve only assigned one value to one variable, but we can actually assign multiple
values to multiple variables. So let’s do x comma y comma z is equal to, and now we can assign multiple values to all of those. We can say chocolate, and then we’ll do a comma, then we can say vanilla, and then we’ll do another comma, and we’ll say rocky road. Now this is going to assign chocolate to x, vanilla to y, and rocky road to z. So what we can do is we’ll say print, and we’ll go print, print, print, and we’ll say x, y, and z, so it prints out chocolate, vanilla, and rocky road, and these are our three different values. We can also assign multiple variables to one value, and we can do this by saying x is equal to y is equal to z is equal to, and we can put whatever we would like. Let’s do root beer float. Then we’ll come back up here, we’ll copy this, and let’s print off our x, our y, and z, and they are all the exact same. Now, so far we’ve really only looked at integers and strings, but you can assign things like lists, dictionaries, tuples, and sets all to variables as well. So let’s go right down here, and let’s create our very first list. I’m going to say ice_cream is equal to, and that is our variable right there; ice_cream is our variable. So now we’re going to do an open bracket like this, and we’re going to come up here and copy all of these values, and we’re going to stick them within our list. So now within ice_cream we have three string values, chocolate, vanilla, and rocky road, all within this list. So what we can do is we can say x comma y comma z is equal to ice_cream, so now these three values, chocolate, vanilla, and rocky road, will be assigned to these three variables x, y, and z. We can copy this print up here, and we’ll hit Shift+Enter, and now x, y, and z were all assigned these values of chocolate, vanilla, and rocky road. Now, something that we just did which is really important, or something that you really need to consider, is how you name your variables. So right here we have ice_cream. Now this, to me, is exactly how I usually write my variables, but there are
many different ways that you can write your variables, so let’s take a look at that really quickly, and let’s add just a few more because I have a feeling we’re going to go a little bit longer than what we have. So there are a few best practices for naming variables. First I’m going to show you kind of what a lot of people will do. I’ll show you some good practices, and I’m going to show you some bad practices as well that you should avoid. The first thing that we’re going to look at is something called camel case, and let’s say we want to name it test variable case. Now if we have a test variable case, the camel case is going to look like this: we’ll have lowercase test, and then we’ll have uppercase Variable and uppercase Case (testVariableCase), is equal to. This is what this variable is going to look like, and we can assign it vanilla swirl. This is what your camel case will look like: it’s going to be lowercase, and then all the rest of those compound words, or however you want to say that, these letters are going to be capitalized to kind of separate where the words end and begin. Let’s go right down here, we’re going to copy this. The next one is called Pascal case. Pascal case is going to look just a little bit different: instead of the lowercase t in test, it’s going to be a capital T in test, so TestVariableCase. Again, this is a very similar way of writing it, very similar to camel case, but just a capital at the beginning. Now let’s look at the last one, and this one is my personal favorite. This one is going to be the snake case. Now this one is quite a bit different in the fact that you don’t use any capital letters and you separate everything using underscores, so we’re going to write test_variable_case. Now typically, let me have them all in there, typically these are the best practices, these are what you typically want to do, but probably the best one to use is this snake case right here. What a lot of people say is that it improves readability. If you take a look at
either the camel case or the Pascal case, which you will see people use, it's not as easy to distinguish exactly what it says, and the name of a variable is important because you can gain information from it if people name them appropriately. So when I'm naming variables I usually write them in snake case, because I find it a lot easier to read when each word is broken up by an underscore. Now let's look at some good variable names; these are all ones that you could use. Something like testvar is completely appropriate. We can also do test_var with an underscore. We could do _test_var; you'll see that often as well, where people start the name with an underscore. You can do TestVar with a capital T and a capital V, or you could even do something like testvar2. Adding a number to your variable name is not inherently a bad thing. It's usually somewhat frowned upon, but there are definitely some use cases for it. One thing that you cannot do is put the 2 at the front. If you put the 2 at the front it no longer works; it won't run at all, so we're going to take that out. I'm going to use this as an example of what you should not do. You also can't use a dash, so something like test-var2 doesn't work either, and you can't use a space, a comma, or really any kind of symbol like a period, a backslash, or an equal sign. None of those things will work within a variable name. Now, another thing that you can do with a variable is use the plus sign. Let's say x = and we'll do a string, "Ice cream is my favorite", then a plus sign, and then "." as a second string. What this does is literally add these two strings together. Let's do print(x), and now it says Ice cream is my favorite, with the period at the end. One thing that we cannot do in a variable is add a string
and a number or an integer. So we can't do "Ice cream is my favorite" + 2. If we try that, it gives us this error right here, and the error says that you can only concatenate a string, not an integer, to a string. So it has to be a string plus a string for this example. You can also say y = 3 + 2, and it should output 5, because you can also add an integer and an integer. So far we've only been outputting one variable in the print statement, but you can actually use multiple variables within a print statement. Let's go right down here and give ourselves some more room. We'll say x = "Ice cream", y = "is", and the last one, z = "my favorite." with a period at the end. Now we can go to the bottom and say print(x + y + z), and when we run that, the three strings are joined with no spaces between them. We can add a space before "is" and a space before "my", and when we hit Shift+Enter it says Ice cream is my favorite. You can do this exact same thing with numbers as well, so we'll say x = 1, y = 2, and z = 3, and this should equal 6. Now, one thing that we tried earlier was assigning a string plus an integer to one variable, and that did not work. What you can do is take something like this, set x back to "Ice cream", keep the 2, and get rid of the z. Using the plus sign is still not going to work; let's try running this. So again, we can't concatenate these, but in the print statement we can separate them with a comma. When we add this comma it should work properly. Let's hit Enter, and it says Ice cream 2. Again, this makes no sense as a sentence, but you are able to combine a string and an integer in a print statement by separating them with a comma. Now, this is the meat and potatoes of variables. There are some other things as well, but some of those are a little bit more advanced and not something I wanted to cover
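A short sketch of the concatenation rules described here. The try/except wrapper and the str() conversion are additions for illustration (the tutorial simply shows the error in the notebook), and the variable names are illustrative.

```python
# String + string joins the two strings.
sentence = "Ice cream is my favorite" + "."
print(sentence)

# Integer + integer is ordinary addition.
total = 3 + 2
print(total)

# String + integer is not allowed and raises a TypeError.
try:
    broken = "Ice cream is my favorite" + 2
except TypeError as err:
    print("TypeError:", err)

# print() accepts several arguments of different types, separated by
# commas, and puts a space between them in the output.
print("Ice cream", 2)

# Not covered in the tutorial: converting the number with str() first
# also makes the concatenation work.
combined = "Ice cream " + str(2)
print(combined)
```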
in this tutorial, although we may be looking at some of those things in future tutorials. But this is definitely the basics, what you really need to know about variables.

Hello everybody, today we're going to be talking about data types in Python. Data types are the classification of the data that you are storing, and these classifications tell you what operations can be performed on your data. We're going to be looking at the main data types within Python, including numeric, sequence type, set, Boolean, and dictionary. So let's get started actually writing some of this out, and first let's look at numeric. There are three different numeric data types: integers, floats, and complex numbers. Let's take a look at integers. An integer is basically just a whole number, whether it's positive or negative. So an integer could be 12, and we can check that by saying type with an open parenthesis and a close parenthesis. If we ask for the type of 12, it's going to give us int, and -2 is also an integer. We can also perform basic calculations like -2 + 100, and that'll tell us it is also an integer. So whether it's just a static value or you're performing an operation on it, it's still going to be that data type as long as the numbers are whole numbers, negative or positive. Now let's take this exact one and say 12 + 10.25. When we run this, it's no longer going to be a whole number; it'll now be a float. Let's check this. It's a float type because it is no longer a whole number, it's now a decimal number. The last data type within the numeric data types is called complex. Let's copy this right down here. Personally, this is not one that I've used almost ever, but it is worth noting. You can do 12 + 3j, and if we check this it's going to give us complex. The complex data type is used for imaginary numbers. For me it's not often used, but if you do use it, j is used as that imaginary unit. If you use something
like c or any other letter, it's going to give you an error; j is the only one that will work. Now let's take a look at Boolean values. The Boolean data type only has two built-in values, either True or False. So let's go right down here and say type(True), and when we run this it'll say bool, which stands for Boolean. We can do the exact same thing with False; that is also Boolean. This can be used with something like a comparison operator, so let's say 1 > 5 and check the type. This is giving us a Boolean, because it's telling us whether 1 is greater than 5. Let's bring that right down here and run it on its own: this gives us False, so it's telling us that 1 is not greater than 5. And just as we got a False, we can say 1 == 1 and this should give us a True. Now let's take a look at our sequence type data types, and that includes strings, lists, and tuples. Let's start off by looking at strings. In Python, strings are sequences of Unicode characters. When you're using strings you put them in single quotes, double quotes, or triple quotes. I call them apostrophes, it's just what I was raised to call them, but most people who use Python call them quotes. So right here we have a single quote and that works well, and we can do a double quote and that works also, and as you can see they give the exact same output. Then we have a triple quote, just like this, and this is called a multi-line string, so we can write on multiple lines here. Let's write a nice little poem: The ice cream vanquished my longing for sweets, upon this diet I look away, it no longer exists on this day. If we run that, it's going to look a little bit weird; it's basically giving us the raw text, which is completely fine. Let's assign this to a variable called multi_line, and we're going to come down here and say print. Before I run this, I have to make sure that the cell above has been run. So now let's print out our
multi_line, and now we have our nice little poem right down here. Now, something to know about single and double quotes is how they're actually used. If we use a single quote and we write I've always wanted to eat a gallon of ice cream, and then we put a single quote at the end, obviously something went wrong here. What went wrong is that when you use single quotes and your text contains another apostrophe, Python treats that apostrophe as the end of the string, and it's going to give you an error. So whenever we have an apostrophe within the text, we need to use double quotes. Inside double quotes, any single quotes are treated as ordinary characters. That doesn't apply to another double quote, however, so you need to make sure you aren't using double quotes within your sentence. If you want to do something like that, you need to use the triple quotes like we did above. So we can do triple double quotes and paste this within them, and anything you do within these triple quotes will be completely fine, as long as you don't put triple quotes within your triple quotes. We'll say "this is wrong" with triple quotes around it, and even though it's between the two outer triple quotes, it doesn't work. Again, you just have to understand how that works; you have to use the proper quotes around your string. And just to check this, here's our multi_line: we can always say type(multi_line), and that is still a string. One really important thing to know about strings is that they can be indexed. Indexing means that you can look up positions within the string, and that index starts at zero. So let's go ahead and create a variable: we'll say a = and do the ever-popular Hello World. Let's run this. Now when we print the string, we can say a and then a bracket, and we can search through our string using the index. All you have to do is put a colon and then 5. What this is going to do is go from position 0 up to, but not including, position 5, which should give us the whole
hello, I believe. Let's run this, and it's giving us the first five characters of this string. We can also get rid of the colon and just say 5, and when we run this it's going to give us position 5. So this is 0, 1, 2, 3, 4, and then position 5 is the space. Let's do 6 so we can see an actual letter, and that is our W. We can also use a negative number when we're indexing through our string, so we could say -3 and it'll give us the l, because it counts back from the end: -1, -2, and -3. We can also specify a range if we don't want to use the default start of 0. Before, we did 0 to 5, and it started at 0 because that was our default, but we could also do 2 to 5. Let's run this, and now we skip positions 0 and 1 and start at position 2: l, l, o. We can also multiply strings. We have this a, Hello World, so we can do a * 3, and if we run this it'll give us Hello World three times. We can also do a + a, and that is Hello World Hello World. Now let's go down here and take a look at lists. Lists are really fantastic because they store multiple values. The string was stored as one value made of multiple characters, but a list can store multiple separate values. So let's create our very first list. We'll add a List heading really quickly, and then we'll put a bracket, and a bracket means this is going to be a list. There are other ones, like a squiggly bracket and a parenthesis, and those denote different data types. The bracket is what makes a list a list. To keep it super simple, we'll say 1, 2, 3 and run this, and now we have a list that has three separate values in it. The commas in our list denote that they are separate values, and a list is indexed just like a string is indexed, so position 0 is the 1, position 1 is the 2, and position 2 is the 3. When we made this list we didn't have to use any quotes because these are numbers, but if we want to create a list with string values, we have to use quotes. So we'll say quote, Cookie Dough, then we'll do a comma
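A small sketch of the string indexing and slicing rules described here. One assumption: the exact string typed in the tutorial is not visible in the text, so "Hello World!" (with the exclamation mark) is used because it matches the results the narration describes.

```python
# Assumed value: chosen to match the results described in the narration.
a = "Hello World!"

print(a[:5])    # 'Hello': positions 0 through 4 (the end position is excluded)
print(a[5])     # ' ': position 5 is the space
print(a[6])     # 'W'
print(a[-3])    # 'l': third character counting back from the end
print(a[2:5])   # 'llo': positions 2, 3, and 4
print(a * 3)    # the string repeated three times
print(a + a)    # two copies joined together
```

The rule to remember is that a slice start:stop includes the start position and stops just before the stop position, so a[:5] returns exactly five characters.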
to separate the values, and then we'll say "Strawberry", and then one more, and this will just be "Chocolate". When we run this, we have all three of these values stored in our list. Now, one of the best things about lists is that you can have any data type within them. They don't have to be just numbers or strings; you can basically put anything you want in there. So let's create a new list. Let's say "Vanilla", then 3, then we'll add a list within a list and say "Scoops", "Spoon", then we'll get out of that inner list and add another value of True for a Boolean. Now we can hit Shift+Enter, and we just created a list with several different data types inside it. Now let's take the list right here with all of our different ice cream flavors, and we'll say ice_cream = this list. One thing that's really great about lists is that they are changeable. That means we can change the data in them, and we can also add and remove items from the list after we've already created it. So let's take ice_cream, and we'll say ice_cream.
append, and this is going to append a value to the very end of the list. We do an open parenthesis and say "Salted Caramel". When we run this and then call ice_cream, it takes the list, adds Salted Caramel to the end, and prints it off, and as you can see it was added to the list. And just like I said before, we can also change items in this list. So let's say ice_cream, and then we need to give the index position, so we're going to say 0, and that's the Cookie Dough right here. We can say that is equal to a new value, so let's call it "Butter Pecan". Now when we call the list, we can see that Cookie Dough was changed to Butter Pecan. Another thing that you saw a little bit ago is a list within a list, basically a nested list. We had "Vanilla", 3, the inner list with "Scoops" and "Spoon", and True. Let's assign this and say nested_list = that list. When we run this we have our nested list, so if we look at index 0 we'll get Vanilla, and if we say 2 we'll get the inner list with Scoops and Spoon. Since we have a list within a list, we can also look at the index within that nested list, so let's add a second index of 1, and that should give us just Spoon. You can go on and on with this: you can do lists within lists within lists, and all of them have indexing that you can call. Now let's go down here and start taking a look at tuples. A list and a tuple are actually quite similar, but the biggest difference is that a tuple is immutable, which means it cannot be modified or changed after it's created. Let's go right up here and add a Tuple heading, and let's write our very first tuple. We'll say tuple_scoops = and then we'll do an open parenthesis. You've seen these parentheses in something like a print statement, but that's different because that's calling a function; this is actually creating a tuple, which is going to store data for us. So we'll say 1, 2,
3, 2, and 1. Let's go ahead and create that tuple, and we can check the data type really quickly, and it's a tuple. And just like we saw before, a tuple is also indexed, so if we go to the very first position, which holds a 1, we get the output of 1. But we can't do something like append and then add a value like 3. If we do that, it's going to say 'tuple' object has no attribute 'append'. That's because you cannot change or add anything to a tuple, just like we were talking about before. Typically people use tuples when the data is never going to change. An example might be something like a city name, a country, or a location, something that won't change. They definitely have their use cases, but I don't think they're as popular as just using a list. So now let's scroll down and start taking a look at sets, but really quickly let me add a few more cells and a Sets heading. A set is somewhat similar to a list and a tuple, but it's a little bit different in that it doesn't have any duplicate elements. Another big difference is that the values within a set cannot be accessed using an index, because a set doesn't have an index; it's actually unordered. We can still loop through the items in a set with something like a for loop, but we can't access an item using a bracket and an index position. So let's go ahead and create our very first set. We're going to say daily_pints =, and to create a set we're going to use these squiggly brackets. I don't know if there's an actual name for those, if I'm being honest; I call them squiggly brackets and that's what we're going to go with. We're going to put in a 1, a 2, and a 3. Let's go ahead and run this and look at the type, and as you can see it is a set. When we print this out, it's going to show us 1, 2, and 3, and those are all the values within the set. But if we copy this and say daily_pints_log, this is going to be every single
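A brief sketch contrasting a mutable list with an immutable tuple, using the values from the walkthrough. The try/except wrapper is an addition so the example keeps running after the error; in a notebook you would simply see the traceback.

```python
# Lists are mutable: items can be appended and replaced.
ice_cream = ["Cookie Dough", "Strawberry", "Chocolate"]
ice_cream.append("Salted Caramel")   # add to the end
ice_cream[0] = "Butter Pecan"        # replace the item at index 0
print(ice_cream)

# A nested list is indexed one level at a time.
nested_list = ["Vanilla", 3, ["Scoops", "Spoon"], True]
print(nested_list[2])      # the inner list
print(nested_list[2][1])   # the second item of the inner list

# Tuples are indexed like lists but cannot be changed.
tuple_scoops = (1, 2, 3, 2, 1)
print(tuple_scoops[0])
try:
    tuple_scoops.append(3)
except AttributeError as err:
    print("AttributeError:", err)
```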
day, and maybe I had different values each day. Now when we run this and do the exact same thing, when we print it, it's going to have just the unique values within that set. A use case for sets, and this is something that I've done in the past, is comparing two separate sets. Maybe you have a list or a tuple and you convert it into a set, and that narrows it down to its unique values. Then you can compare the unique values of one set to the unique values in another set and see what's the same and what's different. So let's go down here and say wifes_daily_pints_log; we'll just copy the name from right here and say it's equal to, then our squiggly brackets, and then 1, 2, and some random numbers. So now this is my daily log and this is my wife's daily log, and we can compare these values. Let's go right down here and say print, then daily_pints_log, and then this bar right here, |. This is going to show us the combined unique values; it's basically like putting them all in one set and then trimming it down to just the unique values. Then we'll add wifes_daily_pints_log. We actually need to run the cell above first, and then when we run this we should see all the unique values across these two sets. As you can see, it's 0, 1, 2, 3, 4, 5, 6, 7, 24, 31, so these are all the unique values between the two sets. We can also do another one, and instead of the bar we're going to use this symbol right here, &, which I believe is called an ampersand, don't quote me on that. When we run this it's going to show what matches, meaning which values show up in both sets. The only ones that show up in both sets are 1, 2, 3, and 5. We can also do the opposite of that with a minus sign, and this is going to show us what doesn't match, so we have 4, 6, and 31. Now, where is our 24 that was in my wife's daily pints log? It's in that set, but we're subtracting the values of that set from mine. So let's reverse this and put daily_pints_log second, and let's run it. Now those are
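A sketch of the set comparisons described here. The exact contents of the two sets are not given in the text, so the values below are an assumption, reconstructed to be consistent with the results the narration reads out. sorted() is used only to print the unordered sets in a stable order.

```python
# Converting a list to a set keeps only the unique values.
print(set([1, 2, 3, 2, 1]))

# Assumed values, reconstructed from the results described in the text.
daily_pints_log = {1, 2, 3, 4, 5, 6, 31}
wifes_daily_pints_log = {0, 1, 2, 3, 5, 7, 24}

combined = daily_pints_log | wifes_daily_pints_log     # union: in either set
shared = daily_pints_log & wifes_daily_pints_log       # intersection: in both sets
only_mine = daily_pints_log - wifes_daily_pints_log    # difference: in mine only
only_wifes = wifes_daily_pints_log - daily_pints_log   # difference, reversed
in_one_only = daily_pints_log ^ wifes_daily_pints_log  # in exactly one of the sets

print(sorted(combined))
print(sorted(shared))
print(sorted(only_mine))
print(sorted(only_wifes))
print(sorted(in_one_only))
```

Note that the difference operator depends on order: the set on the left is the one whose leftover values you get back.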
our other values. So we're taking the values of this set, subtracting all the ones that are the same, and getting the remaining values. For our last one, we can get rid of the minus sign and use this symbol right here, the caret (^), and this is going to show whether a value is in one set or the other but not in both. Let's run this, and these values are unique to one or the other of those sets. Now, the very last one that we're going to look at in this video is dictionaries. Let's go right down here, add a few cells, and add a Dictionaries heading. I saved dictionaries for last because this one is probably the most different out of all the data types that we've looked at. Within a dictionary we have something called a key-value pair. That means when we use a dictionary, it's not like a list where you just have value, comma, value, comma, value; we have a key that indicates what each value is attributed to. So let's write out a dictionary to see how this looks. We're going to say dictionary_cream =, and just like a set we use squiggly brackets, but the thing that differentiates it is that in a dictionary we have key-value pairs, whereas in a set each value is just separated by a comma. So let's write "name", and this is our key, and then we do a colon, and this is where we put our value, so we're going to say "Alex Freeberg". Then we separate that key-value pair with a comma, and now we can do another key-value pair. We'll say "weekly intake", then a colon, and we'll say 5, for five pints of ice cream. We do a comma and then "favorite ice creams", and now what we're going to put in here is a list, so within this dictionary we can also add a list. We'll do "MCC" for mint chocolate chip, and then we'll add "Chocolate", another one of my favorites. So now we have our very first dictionary. Let's copy the name and run it, and let's just look at the type, and as you can see it says that this is a dict. Let's also print it out. Now if we want to, we can take our
dictionary_cream and say .values with an open and close parenthesis, and when we execute this we'll see all of the values within this dictionary. So here are our values: Alex Freeberg, 5, and MCC and Chocolate. We can also say .keys, and when we run this we get all of the keys: name, weekly intake, and favorite ice creams. And we can also say .items, so this key-value pair is one item and this key-value pair is another item. Now, one difference between something like a list and a dictionary is how you look things up. You can't do it with a bracket and a 0, which would in theory take our very first key-value pair; that's going to give us an error. How you look up a value in a dictionary is actually by the key. It doesn't technically have an index, but you can specify the key that you want to pull out. So we're going to say dictionary_cream["name"], and this is going to look up that key right here, and when we run this we'll get the value, which is Alex Freeberg. One other thing that you can do is update information in a dictionary, which we can't do with some other data types. For the name, it was Alex Freeberg; now let's say "Christine Freeberg". When we update that I'm also going to print the dictionary, and I'll get rid of this line. So it's going to put Christine Freeberg in as the value of name. Let's go ahead and run this, and now it changed the name from Alex Freeberg to Christine Freeberg. We can also update several of these values at one time. So let's copy this and put it right down here, and I'm going to say dictionary_cream.
update, and then we're going to put parentheses around the key-value pairs in their squiggly brackets. So now what we're going to do is update this entire thing. Let me add a print of the dictionary as well. We can update this with anything we want, so in here I'll say "weight", and because of all that ice cream I now weigh 300 lb. Let's run this, and as you can see it did not delete our existing key-value pairs; it just added the new one. When you're using update you can't actually delete anything; that's what the delete statement is for, and I'll show you that in just a second. All we did was add this new value. Update also checks whether you changed the value for an existing key, so we can go in here and change the weekly intake to 10, and now when we run this the value of that key-value pair was changed. But let's say we do want to delete something. We'll say del, which stands for delete, then dictionary_cream, and then we specify the key that we want to get rid of, which will also delete the value with it. Let's say "weight", and then let's print the dictionary again, and as you can see the weight was deleted from the dictionary.

Hello everybody, today we're going to be taking a look at comparison, logical, and membership operators in Python. Operators are used to perform operations on variables and values. For example, you're often going to want to compare two separate values to see if they are the same or if they're different, and that's where the comparison operators come in. Right here you can see our operators and what they do: equal sign equal sign (==) stands for equal, then we have does not equal (!=), greater than (>), less than (<), greater than or equal to (>=), and less than or equal to (<=). Honestly, I use these almost every single time I use Python, so they're very important to know and to know how to use. Let's get rid of that really quickly and actually start writing it out and see how these comparison operators work in Python. The
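A compact sketch of the dictionary operations walked through in this part of the tutorial. The key names and values follow the narration as closely as the text allows; the exact spelling of the keys is an assumption.

```python
dictionary_cream = {
    "name": "Alex Freeberg",
    "weekly intake": 5,
    "favorite ice creams": ["MCC", "Chocolate"],
}

# Views of the keys, the values, and the key-value pairs.
print(list(dictionary_cream.keys()))
print(list(dictionary_cream.values()))
print(list(dictionary_cream.items()))

# Values are looked up by key, not by position.
print(dictionary_cream["name"])

# Assigning to an existing key replaces its value.
dictionary_cream["name"] = "Christine Freeberg"

# update() changes existing keys and adds new ones; it never removes keys.
dictionary_cream.update({"weekly intake": 10, "weight": 300})
print(dictionary_cream)

# del removes a key together with its value.
del dictionary_cream["weight"]
print(dictionary_cream)
```

Looking up a key that does not exist, such as dictionary_cream[0] here, raises a KeyError, which is the error mentioned when a position is used instead of a key.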
very first one that we're going to look at is equal to. Now, you can't just say 10 = 10. Let's try running that really quickly by hitting Shift+Enter: it's going to say cannot assign to literal. That's because a single equal sign is for assigning a variable. We're trying to say 10 is assigned the value 10, so that we can call that 10 later, but that's not how this works. What we're trying to do is determine whether 10 is equal to 10, so we're going to use equal sign equal sign, and if we run that by hitting Shift+Enter again, it's going to say True. If we put something else like 50 in there and run it, it's going to say False. So what you get when you use these comparison operators is either True or False. If we take this right down here, we can also say does not equal, using an exclamation point and an equal sign. That says 10 is not equal to 50, and that should be True. You can also compare strings and variables. Let's go right down here and check whether "Vanilla" is equal to "Chocolate" using ==, and when we run this it'll say False. If they were the same, just like when we did our numbers, it would say True. We can also compare variables, so we'll say x = "Vanilla" and y = "Chocolate", and then when we come down here we can say x == y and it'll give us False, and we say x != y and it'll give us True. The next one that we're going to take a look at is less than. Let's copy this one right up here, scroll down, and say 10 < 50. This will come out as True. Now let's put a 10 in place of the 50. Before, 10 was of course less than 50, but is 10 less than 10? No, that's False, because they are the same. If we want an output that is True, all we have to add is an equal sign right here, and this says 10 is less than or equal to 10, and now it's True. Of course we can do the exact same thing with greater than, so 10 is greater than or equal to 10, and that'll
be True because 10 is equal to 10. We can also say 50 is greater than or equal to 10, and that's True because 50 is obviously greater than 10. Now let's look at logical operators, which are often combined with comparison operators. Our operators are and, or, and not. An and returns True if both statements are true. With or, only one of the statements has to be true. And not basically reverses the result, so if it was going to return True it returns False. I don't use not a lot, but I will show you how it works. So let's actually test that out. Before, we were saying 10 > 50, and of course this returned False. Now let's add parentheses around 10 > 50, then say and, and then in another set of parentheses, 50 > 10. This second statement is true, since 50 is greater than 10, so we have one true statement and one false statement. The and is going to look at both of them, and they both need to be true in order to return True. Let's try running this, and we still have False. If we want it to return True, we're going to have to change the first part to make it a true statement, so 70 > 50 and 50 > 10. When we run this it should return True. Now let's look at or. Let's copy this and say 10 > 50 or 50 > 10. The first is a false statement and the second is a true statement, and if even one of them is true, the output should be True. Again, we can do this even with strings, so we can compare "vanilla" and "chocolate". "vanilla" is actually greater than "chocolate", because strings are compared letter by letter and v comes later in the alphabet than c; v is like twenty-something whereas c is three, so it actually looks at the spelling. If we say or here it will come out True, and if we say and here it should also be True, because v is greater than c and 50 is greater than 10. Now let's copy this right here and we're going to say
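A quick sketch of the comparison and logical operators covered in this part, including why "vanilla" compares as greater than "chocolate". The ord() calls and the "Zebra" example are additions for illustration: Python compares strings character by character using each character's Unicode code point, which is why capitalization matters.

```python
# Comparison operators always produce True or False.
print(10 == 10)   # True
print(10 != 50)   # True
print(10 < 10)    # False
print(10 <= 10)   # True
print(50 >= 10)   # True

# and needs both sides to be true; or needs at least one.
print((10 > 50) and (50 > 10))   # False
print((70 > 50) and (50 > 10))   # True
print((10 > 50) or (50 > 10))    # True

# Strings are compared character by character by code point.
print("vanilla" > "chocolate")   # True
print(ord("v"), ord("c"))        # 118 99

# Uppercase letters have lower code points than lowercase letters,
# so this is True even though z comes after a in the alphabet.
print("Zebra" < "apple")         # True
```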
not. What we had before is 50 > 10, which returned True, but now all we're doing is putting not in front of it, so instead of returning True it's going to return False. Now let's take a look at membership operators. We use these to check whether something, a value or a string or something like that, is within another value, string, or sequence. Our operators are in and not in. It's pretty simple: in returns True if the specified value is present in the object, just like we were talking about, and not in is basically the exact same thing but returns True if the value is not in that object. Let's start out by taking a look at a string. We're going to say ice_cream = "I love chocolate ice cream", and then we're going to say "love" in ice_cream, and that will return True. All we're doing is checking whether the word love, that string, is in this larger string. We could also do that by literally copying the full string and putting it where the variable name is, so we can check whether this string is part of that string, and it'll say True. We can also make a list, so we'll say scoops = and then a bracket and 1, 2, 3, 4, 5, and then we'll say 2 in scoops. All we're doing is checking whether 2 is within this list, and that should return True. If we put a 6 here and say not in, it will also return True, because 6 is not in scoops. And we could also use a variable: we'll say wanted_scoops = 8, so I wanted eight scoops. Then we can say wanted_scoops not in scoops, and this should return True because there's not an 8 within scoops. And if we say in instead, asking whether the 8 we wanted is within the list we created, that's going to return False.

Hello everybody, today we're going to be taking a look at the if statement within Python. It's actually the if-elif-else statement, but that's a mouthful, so I'm just going to call it the if-else statement. Now we have this
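A short sketch of not together with the membership operators in and not in, using the values from the walkthrough.

```python
# not reverses a Boolean result.
print(not (50 > 10))   # False

# in checks whether a substring appears inside a larger string.
ice_cream = "I love chocolate ice cream"
print("love" in ice_cream)   # True

# in and not in also work on lists.
scoops = [1, 2, 3, 4, 5]
print(2 in scoops)       # True
print(6 not in scoops)   # True

# The value being checked can come from a variable.
wanted_scoops = 8
print(wanted_scoops not in scoops)   # True
print(wanted_scoops in scoops)       # False
```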
flowchart, and I apologize for it being blurry, but this is the absolute best one that I could find. Right up top we have our if condition. If this if condition is true, we're going to run a body of code, but if that condition is false, we're going to go over here to the elif condition. The elif condition or statement is basically saying: if the first if statement doesn't work, let's try this condition. If the elif condition is true, it goes to this body of code; if it's false, it comes over here to the else, and the else is basically saying that if none of these things work, run this body of code. You can have as many elif statements as you want, but you can only have one if statement and one else statement. So let's write out some code and see how this actually looks. Let's first start off by writing if; that is our if statement. Now we have to write our condition, which is about to be either met or not met. We'll say if 25 > 10, which is true, then a colon, and then we're going to hit Enter and it's going to automatically indent that line of code for us. This is our body of code, so if 25 is greater than 10, our body of code will execute. For us, we're just going to write print and say "It worked". If we run this, it's going to check: is 25 greater than 10? If that is true, print this. So let's hit Shift+Enter, and it worked. Now let's take this exact code, paste it right down here, and change it to less than. Right now this if condition is not true, so the body is not going to run. As you can see there's no output; nothing really happened, but it did check whether 25 was less than 10, and it just wasn't true. Now we can use our else statement. We're going to come right down here and say else with a colon, hit Enter, which again indents automatically, and we're going to say print and "It did not work...". So what it's going to do is come up here and check: is 25 less
than 10? No, it’s not, so this body of code is not going to be executed; it’s going to go right down to this else statement. Now, this else body is what’s going to be printed. There’s no condition on it: the if statement has a condition, 25 is less than 10, but the else has no condition, so if the if is false, it’s going to come down here and run this body of code. Let’s run this by clicking shift enter, and as you can see our output is “it did not work”. Now let’s go back up here and put greater than. Because this is now true, it’s going to say if 25 is greater than 10, print “it worked”, and then it’s going to stop; it’s not going to go to this else statement at all. So let’s run this, and our output is “it worked”.
So what if we have a lot of different conditions that we want to try? This is where the elif comes in. Really quickly, let’s change the if back to a false statement. We’re going to go down and say elif, and we’re going to check whether 25 is less than, let’s say, 30, and we’ll print “elif worked”. So now it’s going to check: is 25 less than 10? No, it’s not. Let’s look at the next condition: is 25 less than 30? If it is, we’ll print “elif worked”. So let’s try running this, and “elif worked”.
Now, we can do as many of these elif statements as we want. Let’s just try a few of them right here: we’ll check whether 25 is less than 20, less than 21, then 40, then 50, and we’ll label them elif, elif 2, elif 3, and elif 4. If you look at this, the first one that is actually going to work is this 25 is less than 40 right here. Once this one is checked and it comes out as true, none of the other elif or else statements will run. So let’s try this one; it should be elif 3, and this one ran properly.
Now, within our condition, so far we’ve only used a comparison operator. We can also use a logical operator like and or or. So we can say if 25 is less than 10, which it’s not, or 1 is less than 3, which is true. If we run this now, it will actually work. So we can use several
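The if, elif, and else chain walked through above can be sketched like this. It is a rough illustration under stated assumptions: the wrapper function check is hypothetical (the video types the conditions directly into notebook cells with 25 hard-coded), and the message strings are taken from the dictation.

```python
def check(value):
    """Hypothetical wrapper: return the message of the first true branch."""
    if value < 10:
        return 'it worked'
    elif value < 20:
        return 'elif worked'
    elif value < 21:
        return 'elif 2 worked'
    elif value < 40:
        return 'elif 3 worked'   # first true condition when value is 25
    elif value < 50:
        return 'elif 4 worked'   # also true for 25, but never reached
    else:
        return 'it did not work...'

print(check(25))  # prints: elif 3 worked

# A logical operator inside the condition: only one side has to be true for "or".
if (25 < 10) or (1 < 3):
    logical_result = 'it worked'
else:
    logical_result = 'it did not work...'
print(logical_result)  # prints: it worked
```

Only the first branch whose condition is true runs, which is why the 25 < 50 branch is skipped even though its condition also holds.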
different types of operators within our if statement to see if a condition is true or not, or whether several conditions are true.
There’s also a way to write an if-else statement in one line if you want to do that. We can write print, we’ll say “it worked”, and then we’ll come over here and say if 10 is greater than 30, and then we’ll write else print and we’ll say “it did not work”, just like we had before, except now it’s all occurring on one line. So let’s just try this and see if it works. It’s saying print “it worked” if 10 is greater than 30, which it wasn’t, so it went to the else and printed out that body right here, although we didn’t have any indentation or multiple lines; it was all done in one line.
Now there’s one other thing that we haven’t looked at yet, and I’m going to show it to you really quickly, and that’s a nested if statement. When we run this, it’s going to say “it worked”. It works because the condition says 25 is less than 10 or 1 is less than 3, and since the second part is true, it’s going to print out “it worked”. But we can also put an if statement inside of it, so we can do multiple if statements as well. We’re going to hit enter and say if, and we’ll do a true statement here, so we’ll say if 10 is greater than 5, do a colon, hit enter, then we’ll say print and type a string saying “this nested if statement worked”. Now let’s try this out and see what we get. It went through the first if statement, it was true, and it printed out “it worked”. This is still the body of code, so it goes down to this next if statement, and it says if 10 is greater than 5, we’re going to print this out. You could do this on and on and on; it can basically go on forever, and you can create really in-depth logic. That actually happens a lot when you start writing more advanced code.
Hello everybody, today we’re going to be learning about for loops in Python. The for loop is used to iterate over a sequence, which could be a list, a tuple, an array, a string, or even a
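The one-line if-else and the nested if from the end of the if-statement video can be sketched as below. This is an illustration only; the names one_line_message and messages are assumptions added so the results can be inspected, and are not part of the video.

```python
# One-line if-else, as dictated: the else branch runs because 10 > 30 is false.
print('it worked') if 10 > 30 else print('it did not work...')

# The same conditional expression, keeping the chosen message in a variable.
one_line_message = 'it worked' if 10 > 30 else 'it did not work...'

# Nested if: the inner check only happens when the outer condition is true.
messages = []
if (25 < 10) or (1 < 3):
    messages.append('it worked')
    if 10 > 5:
        messages.append('this nested if statement worked')

for message in messages:
    print(message)
```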
dictionary. Here’s the list that we’ll be working with throughout this video, and I have this little diagram right here which kind of explains how a for loop works. The for loop is going to start by looking at the very first item in our sequence, or our list, and that’s going to be our 1 right here. It’s going to ask, is this the last element in our list? It is not, so it’s going to go down to the body of the for loop. Now, we can have a thousand different things happen in the body of the for loop, as we’re about to look at in just a second. Then it’s going to go up to the next element and ask again whether the last element has been reached. It’ll be no again, because we’ll be going to the 2, and then the 3, and then the 4, and the 5. Once it reaches the 5, it’ll go to the body of the for loop, and then when it asks if that’s the last element, the answer will be yes, because it’s iterated through all the items within the list. Then we would exit the loop, and the for loop would be over.
Now, that may not have made perfect sense, so let’s actually start writing out the syntax of a for loop so we can understand this better. To start our for loop, we’re going to say for, and then we’re going to give it a temporary variable. As the loop iterates through these numbers, it’s going to assign each number to that variable. For this one we’re just going to say number, which is pretty appropriate because these are all numbers, and then we’re going to say in integers. Now, right here you can put just about anything: this could be a list, this could be a tuple, this could even be a string, but that is what we’re going to iterate through. So we’re saying for the variable, each of these numbers, within this list of integers, and then we’re going to write a colon. Next is the body of code that’s actually going to be executed as we iterate through our list. For our first example, we’re going to start off super simple, and all we’re
going to do is say print, open parenthesis, and say number. As it iterates through the 1, 2, 3, 4, and 5, number becomes the variable that is going to be printed. So during that first loop our 1 will be printed, because that will be assigned right here; then through the next iteration the 2 will be assigned and put right here, and so on in each loop until the very end. So let’s hit shift enter, and as you can see it did exactly that.
Now, in this body, and I’ll copy and paste this down here, we really can do just about anything we want. We don’t even have to use this variable number; we can just print “yep” if we wanted to, and what it’s going to do is print off “yep” for each iteration, all five of them, every time it loops through. So let’s hit shift enter, and it printed it off for us. So really we weren’t even using the numbers within the list; we were just using the list as almost a counter.
Now let’s copy this integers list once again, go right up here, and copy the for loop that we wrote. We do not have to call this variable number; it can be any variable name that you’d like. We could call it jelly, and we can print jelly plus jelly. I think you’re getting the picture, right? When it loops through the 1, it’s doing 1 + 1; when it loops through the 2, it’s doing 2 + 2. That is basically how a for loop works.
Now, for a dictionary, it’s going to handle it a little bit differently, so let’s create a dictionary really quickly. We’ll say ice_cream_dict is equal to, and we’re going to do squiggly brackets. We’re going to say “name”, then a colon, and we need to assign our value for that item, so we’re going to say “Alex Freberg”. We’ll do our next one, separated by a comma, and we’ll say “weekly intake”, and I’ll say 5 scoops per week. The next one we will do is “favorite ice creams”, and for this one we’re going to do something a little bit different: we’re going to have a list within this
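The three list loops described above can be sketched as follows, assuming the list is named integers and holds 1 through 5 as in the video. The doubled list is an assumption added here so the results of jelly + jelly can be checked.

```python
integers = [1, 2, 3, 4, 5]

# The loop variable takes each value in turn.
for number in integers:
    print(number)

# The body does not have to use the loop variable; the list acts as a counter.
for number in integers:
    print('yep')

# Any variable name works; here each element is added to itself.
doubled = []
for jelly in integers:
    doubled.append(jelly + jelly)
    print(jelly + jelly)
```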
dictionary. So within our list of my favorite ice creams, we’ll say mint chocolate chip, and I’ll just do “MCC” for that, and we’ll separate that out by a comma and say “Chocolate”. So now we have this dictionary, ice_cream_dict, and within it we have my name, my weekly intake, and my favorite ice creams, with a list in there as well. Let’s hit shift enter, and now we’re going to start writing our for loop.
Now, the for loop is going to look very similar, but calling a dictionary is just a little bit different. We’re going to say for cream in ice_cream_dict.values, then parentheses, and then a colon, and we’re going to print cream. In order to indicate what we actually want to pull, we have to specify within the dictionary what we want: are we pulling the item, are we pulling the value? We need to specify this, so that’s why we have this .values right here. So let’s run this and see what we get. As you can see, we are pulling in the values right here; that’s why we’re getting Alex Freberg, 5, and mint chocolate chip and chocolate.
Now, we are able to call both the key and the value, so we can pull two things at one time, and we’re going to do this by saying .items. We could also do .keys if we just wanted the keys, but we want both of them, so we’re going to use .items. We’re going to go right down here and say for key and value in ice_cream_dict.
items, then print, and let’s write key, then a comma, and then let’s give it a little arrow or something like that, then another comma, and we’ll say value. Let’s print this off and see what we get. It’s looping through, and for each key and value it’s saying here is the key, so that’s name, then weekly intake, then favorite ice creams; it’s giving us a little arrow, and then we’re also printing off the value. So we have name, Alex Freberg; weekly intake, 5; favorite ice creams, mint chocolate chip and chocolate.
So now let’s talk about nested for loops. We’ve looked at for loops, and we understand how they work and why they do what they do, but what about a nested for loop, a for loop within a for loop? For this example let’s create two separate lists. Let’s create flavors and make that a list by opening a bracket, and we’ll do vanilla, the classic, chocolate, and then cookie dough, all great flavors. So that’s our first list. Then we’re going to say toppings, and we’ll do a bracket for that as well, and we’ll say hot fudge, and then Oreos, and then marshmallows. Is that how you spell marshmallows? I think it’s an e; that looks wrong. I might be spelling it wrong, but that’s okay. So let’s save this by clicking shift enter, and now we have our flavors and our toppings.
Now let’s write our first for loop. We’re going to say for one, as in our number one for loop, in flavors, and we’ll do a colon and click enter. Now we can write our second for loop, so we’re going to say for two in toppings, then a colon and enter, and then we’re going to say print, open parenthesis, and then one, so we’re printing the one from flavors, then a comma, then “topped with”, then a comma, then two. What this is essentially going to do is, for one, take the very first item in flavors, and then loop through all of two as well. So we’re going to
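The dictionary loops described above can be sketched like this. The keys and values follow the dictation; the dictionary name ice_cream_dict, the exact arrow string, and the keys_only variable are assumptions.

```python
ice_cream_dict = {
    'name': 'Alex Freberg',
    'weekly intake': 5,
    'favorite ice creams': ['MCC', 'Chocolate'],
}

# .values() yields only the values.
for cream in ice_cream_dict.values():
    print(cream)

# .items() yields (key, value) pairs, unpacked into two loop variables.
for key, value in ice_cream_dict.items():
    print(key, '->', value)

# .keys() yields only the keys; collected into a list here for inspection.
keys_only = list(ice_cream_dict.keys())
print(keys_only)
```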
loop through hot fudge, Oreos, and marshmallows, and once we print those off, we will loop all the way back to flavors and look at the next iteration, or the next item, within the first for loop. So let’s run this really quickly and see what we get. As you can see, it goes vanilla, vanilla, vanilla: vanilla is topped with the hot fudge, the Oreos, and the marshmallows, and then we move on to the second item in our first for loop. So there’s that hierarchy: we’re iterating completely through the inner loop before we go back to the first for loop and move to its next item. That is essentially how a nested for loop works. These nested for loops can get very complicated; in fact, for loops in general can get very complicated the more you add to them and the more you want to do with them. But that is basically how a for loop and a nested for loop work.
Hello everybody, today we’re going to be taking a look at while loops in Python. The while loop in Python is used to iterate over a block of code as long as the test condition is true. Now, the difference between a for loop and a while loop is that a for loop is going to iterate over the entire sequence regardless of a condition, but the while loop is only going to keep iterating as long as a specific condition is met. Once that condition is not met, the loop is going to stop, and it’s not going to iterate any further. So if we take a look at this flowchart right here, we’re going to enter this while loop and we have a test condition right here. The first time that this test condition comes back false, it’s going to exit the while loop.
So let’s start actually writing out the code and see how this while loop works. Let’s create a variable: we’re just going to say number is equal to 1. Then we’ll say while, and now we need to write the condition that needs to be met in order for the block of code beneath it to run. So we’re going to say while number is less than
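The nested for loop from the end of the for-loop video can be sketched as below. The list contents follow the dictation; the combos list is an assumption added so the iteration order can be checked.

```python
flavors = ['Vanilla', 'Chocolate', 'Cookie Dough']
toppings = ['Hot Fudge', 'Oreos', 'Marshmallows']

combos = []
for one in flavors:          # outer loop: one flavor at a time
    for two in toppings:     # inner loop: runs fully for each flavor
        combos.append((one, two))
        print(one, 'topped with', two)
```

The inner loop finishes all three toppings before the outer loop advances, so the first three lines of output all start with Vanilla.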
5, and then we’ll do colon, enter, and now this is our block of code. We’re going to say print and then number. Now what we need to do is basically create a counter, so we’re going to say number equals number + 1. If you’ve never done something like this, it’s kind of like a counter. Most people start it at zero; in fact, let’s start it at zero. Each time it runs through this while loop, it’s going to add one to this number up here, so it’s going to become a 1, a 2, a 3, each time it iterates through the loop. Once this number is no longer less than 5, it’ll break out of the while loop and it will no longer run. So let’s run this really quick by hitting shift enter. It starts at zero, and it’s going to say while the number is less than 5, print number. The first time that it runs through, number is 0, so it prints 0, then it adds one to number, and then it continues the while loop right here. It keeps looping through this portion; it never goes back up to this first line of code, which is just the variable that we start with. Then once this condition is no longer met, once it is false, it’s going to break out of the loop.
Now that we basically know how a while loop works, let’s look at something called a break statement. Let’s copy this right down here, and what we’re going to say is if number is equal to 3, we’re going to break. With the break statement we can basically stop the loop even if the while condition is true. While this number is less than 5, it’s going to continue to loop through, but now we have this break statement, so it’s going to say if the number equals 3, break out of this while loop, but if this is false, continue adding to that number just like normal. So let’s execute this. As you can see, it only went to 3 instead of 4 like before, because each time it was running through this while loop, it was checking if the number was equal to 3, and once it got to 3
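A minimal sketch of the basic while loop and the break variant described above. It assumes the if check sits between the print and the increment, which matches the output the video reports; the printed and printed_with_break lists are assumptions added for inspection.

```python
# Basic while loop: runs until number is no longer less than 5.
number = 0
printed = []
while number < 5:
    printed.append(number)
    print(number)
    number = number + 1

# Same loop with a break: stops as soon as number equals 3.
number = 0
printed_with_break = []
while number < 5:
    printed_with_break.append(number)
    print(number)
    if number == 3:
        break                 # leave the loop even though number < 5 is still true
    number = number + 1
```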
this became true, and then we broke out of this while loop.
The next thing that I want to look at, and we’ll copy this right down here, is an else statement. It’s much like the else on an if statement, but we can use the else statement with a while loop: the loop runs its block of code, and when the condition is no longer true, it activates the else statement. So we’ll go right down here and say else, do a colon and enter, and then we’ll say print “no longer less than five”. Now, because this if statement is still in there, it will break, so let’s change the if to check for 6, and then we’ll run this. It’s going to iterate through this block of code, and once the while condition is no longer true, once we fall out of the loop, we’re going to go to our else statement. As long as the condition is true, it’s going to continue to iterate, but once the condition is not met, it will go to our else statement and run that line of code.
Now, the else statement is only going to trigger if the while condition is no longer true. If we have something like this if statement that causes it to break out of the while loop, the else statement will not run. So let’s say if the number is 3, and we run this: the else statement is no longer going to trigger, so this body of code will not be run.
The next thing that I want to look at is the continue statement. If the continue statement is triggered, it basically skips all remaining statements in the current iteration of the loop and goes to the next iteration. To demonstrate this, I’m going to change this break into a continue. Before, when we had the break, if the number was equal to 3 it would stop the loop completely. But when we change this to continue, which we’ll do right now, it’s no longer going to run any of the subsequent code in this block; it’s just going to go straight up to the beginning and restart our while loop. So what’s going to happen when we run this is it’s going to come to 3,
it’s going to become 3, and it’s going to continue back into the while loop, but it’s never going to get to the line that adds one to the number, so the number never changes. This will create an infinite loop. Let’s try this really quickly, and as you can see, it’s going to stay 3 forever. Eventually this would time out, but I’m just going to stop the code really quick.
So let’s just change up the order in which we’re doing things: we’re going to take the print from there and put it down here. Now, instead of printing the number immediately and then adding to the number later, we’re going to add to the number right away, and then we’re going to say if it is 3, continue, and otherwise it’s going to print the number. So let’s try executing this and see what happens. As you can see, we no longer have the 3 in our output. When we got to the number 3, it continued and didn’t execute this line right here, which prints off that number.
Hello everybody, today we’re going to be taking a look at functions in Python. A function is a block of code which is only run when you call it. So right here we’re defining our function, and then this is our body of code that is going to be run when we actually call it. Right here we have our function call, and all we’re doing is writing the function name with the parentheses; that is basically us calling that function. And then we have our output. Throughout this video I’m going to show you how to write a function as well as pass arguments to that function, and then a few other things like arbitrary arguments, keyword arguments, and arbitrary keyword arguments. All these things are really important to know when you are using functions.
So let’s get started by writing our very first function together. We’re going to start off by saying def; that is the keyword for defining a function. Then we can actually name our function, and for this one we’re just going to do first_function, and then we do an open
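The while-else behavior and the reordered continue loop from the while-loop video can be sketched as follows. The helper run_loop and its break_at parameter are hypothetical, introduced only so both cases (the if checking for 6, then for 3) can be run side by side; the video edits a single notebook cell instead.

```python
def run_loop(break_at):
    """Hypothetical helper: return (numbers printed, whether the else ran)."""
    number = 0
    printed = []
    else_ran = False
    while number < 5:
        printed.append(number)
        if number == break_at:
            break              # leaving via break skips the else block
        number = number + 1
    else:
        else_ran = True        # runs only when the condition becomes false
    return printed, else_ran

print(run_loop(6))  # the if never triggers, so the else runs
print(run_loop(3))  # the break triggers, so the else is skipped

# continue, with the increment moved first so the loop cannot get stuck at 3.
number = 0
without_three = []
while number < 5:
    number = number + 1
    if number == 3:
        continue               # skip the rest of this iteration
    without_three.append(number)
    print(number)
```

Placing the increment before the continue is what prevents the infinite loop described in the transcript.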
parenthesis, close it, and then we’ll put a colon. We’ll hit enter and it’ll automatically indent for us, and this is where our body of code is going to go. Within our body of code we can write just about anything, and in this video I’m not going to get super advanced; we’re just going to walk through the basics to make sure that you understand how to use functions. So for right now all we’re going to say is print, open parenthesis, an apostrophe, and we’ll say “we did it”. Now we’re going to hit shift enter, and this is not going to do anything, or at least you won’t see any output from it. If we actually want to run that function (and some functions don’t have outputs), what we have to do is copy the function name with its parentheses and put it right down here, and now we’re actually calling our function. So let’s go ahead and click shift enter, and now we’ve successfully called our first function.
This function is about as simple as it could possibly be, so let’s take it up a notch and start looking at arguments. Let’s go right down here and say def number_squared, and we’ll do parentheses and our colon as well. Really quickly, when you’re naming your function, it’s kind of like naming a variable: you can use something like x or y, but I tend to like to be a little bit more descriptive. Now let’s take a look at passing an argument into a function. The argument is going to be passed right here in the parentheses, so for us I’m just going to call it number. Then we’re going to hit enter and write our body of code, and all we’re going to do is type print, open parenthesis, number, then two stars, at least that’s what I call them, and a 2. What this is going to do is take the number that we pass into our function, put it right here in our body of code, and then, for what we’re doing, raise it to the power of
2. So when the user, or you, run this and call this function, this number is something that you can specify. It’s an argument that you can input, which will then be run in this body of code. So let’s copy this function name, put it right down here into this next cell, and we’ll say 5. This 5 is going to be passed through into this function and be used right here in this print statement. Let’s run it, and it should come out as, I believe, 25. That is my fault, I forgot to actually run the block of code that defines the function, so I’m going to hit shift enter. Now we’ve defined our function up here and we can actually call it, so we’ll hit shift enter, and we got our output of 25.
Now, in this function we only passed one argument, but you can basically pass as many arguments as you want; you just have to separate them by commas. So let’s copy this and put it right down here. Now we’ll say number_squared_custom, and then we’ll do number and then power. Now we can specify our number as well as the power that we want to raise it to. Instead of having 2, which is what you’d call hard-coded, we can customize that, and we’ll have power. Now when we call this function, we can specify the number and the power, and both of those will go into this body of code and be run, so we can customize those numbers. Let’s copy this and we’ll say 5 to the power of 3. Let’s make sure I ran this, so let’s do shift enter, and now we will call our function, hit shift enter, and we got 5 to the power of 3, which is 125. Just one last thing to mention: if you have two parameters within your function, then when you are calling it right here, you have to pass in two arguments; you can’t just have one. If we just have a 5 right here, it’s going to error out. We have to specify both arguments for it to work.
Now let’s take a look at arbitrary arguments. Arbitrary arguments are really interesting, because if you don’t know how many arguments you want to pass through, if you
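The three functions defined so far can be sketched as below, using the names dictated in the video (first_function, number_squared, number_squared_custom). The try/except at the end is an addition, not something shown in the video, included to illustrate the error the transcript mentions when only one argument is passed.

```python
def first_function():
    print('We did it!')

def number_squared(number):
    print(number ** 2)

def number_squared_custom(number, power):
    print(number ** power)

first_function()             # prints: We did it!
number_squared(5)            # prints: 25
number_squared_custom(5, 3)  # prints: 125

# Passing one argument to a two-parameter function raises a TypeError.
try:
    number_squared_custom(5)
    missing_argument_error = None
except TypeError as error:
    missing_argument_error = type(error).__name__
print(missing_argument_error)  # prints: TypeError
```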
don’t know if it’s one, two, or three, you can specify that later when you’re calling the function, so you don’t have to know that information ahead of time. So let’s define our function. We’re going to say def, and then number_args, and we’ll do an open parenthesis and a colon. Now, within our parentheses right here, typically we would just specify what our argument will be: it will be number, or it will be a word, right? But what we’re going to do is something called an arbitrary argument, so it’s unknown. We’re going to put a star and then we’ll say args. You will typically see something exactly like this if you’re looking at tutorials; they’ll have *args in there, or if you’re looking at just a generic piece of code, this is what it will look like. But for us, we’re going to actually put number. So again we have the star and then our arbitrary argument right here. Then we’ll hit enter and say print, open parenthesis, and this is where it’s going to get a little bit different. We’re going to say number, then an open bracket, and let’s say 0, and then times, and then number again with a bracket of 1. So in a little bit, once we run this and then call this number_args function, the arguments at position 0 and position 1 are the ones that are going to be used.
Let’s go ahead and run this, and then we are going to call it with, let’s say, 5, 6, 1, 2, 8. Right up here we did not know how many arguments we were going to pass through; it could be five, it could be a thousand. These arguments come in as a tuple, and that’s what this is right here: we’re passing in a tuple. So what it’s going to do now is, when it reads number at index 0, it’s going to take the very first value within that tuple, which will be that 5, and then it’ll also read number at index 1, which is the 6. So let’s hit shift enter, and it’s going
to multiply these numbers together, so 5 * 6 is equal to 30. Now, like I just said, this is a tuple, so we don’t actually have to write out these numbers like we just did; we can pass through a tuple when we are calling this function. Let’s do that right up here. Let’s just create, let’s call it args_tuple, and we’ll do open parenthesis and use the same numbers; let’s just copy them to make it easier. Now we’ve created this tuple right here, which we can then pass in. This is a lot more handy, and it’s most likely how someone would do something like this. Let’s now create this, and now we can copy args_tuple and pass it through. Really quickly, this is going to fail, and I’m doing that on purpose, because I want to show you what you need to do in order to pass through this tuple. Right now it’s going to say “tuple index out of range”. All you have to do in order to use this is put a star before it, just like you did when you were creating your parameter up here. We have to put a star in front of the tuple that we just passed through. Now let’s try running this, and now it works properly.
Now, the last two things that we’re going to look at are keyword arguments and arbitrary keyword arguments. There are more things that you can learn and do with functions, but again, I’m just trying to teach you the basics to make sure that you understand how they work. A keyword argument is kind of similar to this function right here, so let’s actually copy this and put it right down here. A keyword argument is very similar in that you’re going to specify your arguments right here. What we did up here, let me bring this down, when we actually called the function, was just put a 5 and a 3, and when we did that, it automatically assigned number to 5 and power to 3. That’s totally fine, and you can do that, but if you want a little bit more control, you can use a keyword argument. So right here we
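The arbitrary-argument function and the tuple unpacking described above can be sketched as follows. The names number_args and args_tuple follow the dictation; returning the product in addition to printing it is an assumption made here so the result can be checked.

```python
def number_args(*number):
    # All positional arguments arrive packed into one tuple named number.
    product = number[0] * number[1]
    print(product)
    return product  # returned as well as printed (an addition for inspection)

direct_result = number_args(5, 6, 1, 2, 8)   # uses the 5 and the 6

args_tuple = (5, 6, 1, 2, 8)

# Without the star, the whole tuple becomes the single element number[0],
# so number[1] does not exist.
try:
    number_args(args_tuple)
    unpack_error = None
except IndexError as error:
    unpack_error = str(error)
print(unpack_error)

# With the star, the tuple is unpacked into separate positional arguments.
unpacked_result = number_args(*args_tuple)
```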
could say power is equal to 5 and number is equal to 3. So I just switched it around, right? Before, number was assigned to 5 and power was assigned to 3, but I switched them to show you how this might work. Let’s run both of these, and now it’s 3 to the power of 5, which is 243. That essentially is a keyword argument. Again, it just gives you a little bit more control; you don’t have to put the arguments in specific positions like you do if you’re just passing multiple arguments by position.
Now let’s come right down here; we’re going to create basically another custom function. For this one we’re going to write def number_kwarg, and then we’ll do an open parenthesis, a colon, and enter. What this one takes is an arbitrary keyword argument. To specify an arbitrary argument, all we did was a star and then number, but if we’re doing an arbitrary keyword argument, we actually have to have two stars right here. And again, arbitrary means we don’t really know how many keyword arguments we want to pass into our function, so we’re just going to put star star number, and then later, within our body of code and when we’re calling it, we’ll be able to specify them.
So to demonstrate this, let’s write print, do an open parenthesis, and we’ll say, oops, need to do an apostrophe, “my number is”, with a little space just like that, and we’ll say plus. This is kind of where it gets a little interesting, or a little bit more tricky. What we’re going to say is number, so this is us using our number parameter, and then we’re going to do a bracket. And then I’m actually going to go to calling the function. It’s a little bit backward, or a little bit different than what you might think, but when we’re calling it, what I’m going to do is I’m going to say integer is equal
to, let’s just do some random number. Now, when we’re using that keyword within our body of code, what we’re going to do is actually type out “integer” inside the brackets, just like this. This looks a little bit different, but what it allows us to do is put as many keyword arguments in here as we want later, and I’ll show you in just a second. For us, we’re just creating this key and this value when we are calling the function. So now when we create this and run it, oh, whoops, I forgot this has to be a string. So let’s run this again, and now it says my number is 2309. Then we’re going to add to it: we’ll say plus, and this isn’t going to look great, but we’ll say “my other number”, which will all be on the same line, and that’s okay, and then we’ll say number and specify again what we want in the brackets. Now we can go down here to where we’re calling it, put a comma, and say integer, oops, integer_2 is equal to, and we’ll do a random number. Then we’ll put “integer_2” right here in the brackets, and we’ll add a plus right here so we don’t error out. We’ll create this, we’ll run this, and as you can see, both numbers were passed through. Again, the formatting is terrible, but now you can see that you have this arbitrary keyword argument right here, and all we have to do is put star star number, and we can pass through as many of these arbitrary keyword arguments as we want, as long as we specify them when we’re calling the function.
Hello everybody, today we’re going to be talking about converting data types in Python. In this video I’m going to show you how to convert several different data types, including strings, numbers, sets, tuples, and even dictionaries. So let’s start off by creating a variable. We’ll say num_int is equal to 7, and we can check that data type by saying type and then inserting our variable num_int, and that will tell us that the data type for this variable is an integer. Let’s go ahead and create another one. We’re going to say
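The keyword-argument call and the arbitrary keyword argument function from the functions video can be sketched as below. Several details are assumptions: the function name number_kwarg (the transcript garbles it), the second value '4356' (the video only says "a random number"), the exact message strings, and the use of return values in place of bare prints so the results can be checked.

```python
def number_squared_custom(number, power):
    return number ** power

positional = number_squared_custom(5, 3)            # number=5, power=3
keyword = number_squared_custom(power=5, number=3)  # order no longer matters

def number_kwarg(**number):
    # All keyword arguments arrive packed into one dict named number.
    message = ('My number is: ' + number['integer']
               + ' My other number: ' + number['integer_2'])
    print(message)
    return message

# Values are passed as strings because they are concatenated with +.
kwarg_message = number_kwarg(integer='2309', integer_2='4356')
print(positional, keyword)  # prints: 125 243
```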
num_string is equal to, and for this one we’ll also do a 7, but as a string. Let’s check the type: we’ll do an open parenthesis and say the type of num_string, and that one is a string. Now let’s say we wanted to add those. We’ll say num_sum is equal to num_int plus num_string. When we try adding these two values, it is not going to work. It’s going to give us an error saying unsupported operand types for int and str, so it cannot add an integer and a string. What we need to do in order to add these two numbers is convert that string into an integer. So let’s go right up here, add another cell, and say num_string_converted is equal to, and we want to convert it into an integer. All we have to do is type int and then put num_string inside the parentheses, and that is as easy as it’s going to get. Then it’s going to convert it, and we can even check it right after by saying type of num_string_converted. Let’s run this, and now we can see that it was converted into an integer. So now let’s use num_string_converted right here: let’s copy it and replace the string with the converted version, and let’s actually print out num_sum, and it worked properly. Now, we did not specify what type of value num_sum was going to be, but because we added two integers, Python automatically gives num_sum the integer data type.
Let’s go right down here and look at how we can convert lists, sets, and tuples. Let’s say we have list_type, and that’s equal to 1, 2, 3 in brackets, and we can check it again by saying type, and that is a list. Let’s say we want to convert it to a tuple. It’s fairly easy: all we’re going to do is write tuple and pass in list_type, and the result is going to be a tuple. We can check that by saying type and wrapping it around this tuple call, and it shows us that it is converting that list into a
tuple. Now, we can also convert a list into a set, but it may change the actual values within it. Let’s check that out really quickly. Let’s take this list and add a few more values to it, just like that. Now let’s say we want to convert it to a set, so we’ll say set of list_type, run it, and see what the output is. This is something you really need to be aware of when you are converting data types, because a set does not act the same as a list. A set keeps only the unique values from the list, so the conversion can fundamentally change the data that was in the original list. Just to check the data type we can say type (I’m doing this for all of them), and as you can see, that is now a set.
Now let’s go down here and take a look at dictionaries. Let’s say we have a dictionary called dictionary_type. We’ll do a curly bracket, and we’ll say name, a colon, and Alex; then age, a colon, and 28; and then hair, a colon, and N/A. Really quickly, let’s take dictionary_type and confirm that it is a dictionary, and it is. Now we’re going to take a look at all the items within that dictionary, so we’re going to do dictionary_type.
items(), and this shows us all the items within it. We can also look at something like the values, and when we run that, these are our values. So within our dictionary we have items, and this right here is one item. Within that we have our values, which are Alex, 28, and N/A. Then we have something called a key: name, age, and hair are all keys, and we can look at those by saying .keys().
Let’s say we want to take all of the keys and put them into a list. We’ll say list, open parenthesis, and put that inside, so we’re converting these keys into a list. Let’s run that, and now this is a list. Let’s check the type as well just to confirm, and as you can see, it was converted properly into a list. We can do the exact same thing with the values; they can also be converted into a list.
We can also convert longer strings that aren’t just numbers like the one in our very first example. Let’s do long_string, and we’ll say “I like to party”. Now we’re going to say list of long_string, so we’re converting this string into a list. Let’s see what happens: it took every single character in that string and put it into a list. We could do a set as well, and that one’s a lot shorter because it only keeps unique values. So that is how you convert data types in Python.
Hello everybody, today we’re going to be working on building a BMI calculator in Python. Before we get started, I want to show you this BMI calculator that I found online. It shows the basic calculation that they use, which is the one we’re going to use in this video, and they also have this calculator right down here and some ranges that we can use for our calculator as well. For reference, I weigh about 170 and I’m about 5 foot 9, so let’s
calculate this. I’m at about a 25.1 BMI, which falls into the overweight category. That’s unfortunate, but we can see exactly how this works and how ours should work when we actually build it, so we’re going to reference this throughout the video.
Let’s go over to our BMI calculator. We need to collect weight and height and then run this calculation right here, so let’s copy it and put it down here. Now we have our calculation. What we need is input from a user, and there is an input function within Python that we’re going to use. Let me add a few more cells. The first thing we need is their weight, so we’ll say weight is equal to, and this is where we’ll use our input function. When we run this, it just gives us a blank box where a user can type something. We’ll say Alex. Our output is what the user typed, and it does save it to this variable, so if we say print(weight) it will print out Alex.
Now, we want the user to enter their weight, so we want to give them a prompt. We’ll put a string in here, in double quotes: “Enter your weight in pounds: ”. Now when we run this it shows that prompt, I’ll type 170, and it stores that. Let’s print weight again. It’s only storing the value 170, not the prompt string, and that’s really important for when we do our calculations later. I’m going to keep this down here because I’m sure I’m going to use it later. So that’s working. Now we need our height too, so let’s copy this, put it right here, change it to height, and say “Enter your height in inches: ”. So now for this one
if we hit enter, it’s actually still running, so let’s stop it really quick and interrupt it. Let’s try running this. It says “Enter your weight in pounds”, that’s the first input, so I’ll say 170, and when I hit enter it prompts me for the second input. In inches, 5 foot 9 is 69 inches, and then I hit enter again, and now we have both of our inputs.
Now we need this calculation right down here. We have weight in pounds times 703, divided by height in inches times height in inches. It’s already written there, but I’m going to write it like this: weight times 703, divided by (height times height). So this is our calculation. This “times” of course is not going to work; we need to use the star for both of these. Let’s run this: 170 pounds, 69 inches, hit enter, and it says can’t multiply sequence by non-int of type 'str'. Ah, that’s because these are being stored as strings. If I check the type of height down here, it is a string. We don’t want that; we need those to be integers or floats, really anything numerical. So let’s use int and wrap the input in it, and do the same thing for the other one.
Now we have an integer for our weight and an integer for our height, so the calculation should work properly. Let’s run this again: our pounds are 170, our height is 69 inches, and it’s not giving us any output because we’re not printing anything. So I just need to do print(BMI). Let’s try this again, 170 and 69, and there is our BMI, 25.1. It worked exactly the same as this one. So they input, well, we input
our weight, we input our height, and then it calculated our BMI. The next thing we need to do is give the user some context. Is their BMI within a good range or a bad range? We don’t know. So I’m going to see if I can copy these ranges; I don’t know if this will work or not. Let’s copy this right down here. Perfect. What we now need to say is: if the user has given us this input, we want to tell them whether they are normal weight, overweight, obese, severely obese, anything like that. We have these ranges, so that should help us out quite a bit.
Let’s just write our if statement, and then we’ll include it up here. We’ll say if BMI is greater than zero. So if they gave any input where the BMI was not zero, which should be every time if they do it properly and don’t put a string in there or type out “forty” (maybe we should make a prompt for that if it happens), then we can say if BMI, and now we need that first range. If it’s under 18.5, so we need a less than. It just says “under”, not “under or equal to”, so I’ll keep it at less than 18.5. If it’s under 18.5, then let’s give the output: we’ll say print, and the message is “You are underweight”.
Then we’re going to add several elif statements in here. For now let’s just say else; I guess this would be for when they don’t input something properly or something messes up. I’m thinking this through as I go, but we could write something like print “Enter valid inputs”, and we can always change that. Really quickly, let’s run this. OK, so I’m not in that range.
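
The steps so far can be sketched roughly as follows. This is only an illustration under a few assumptions: compute_bmi is a hypothetical helper name (the walkthrough just uses the variables weight, height, and BMI), and the typed answers are replaced by fixed strings so the cell runs without prompting.

```python
def compute_bmi(weight_lb, height_in):
    """Imperial BMI formula from the walkthrough: pounds * 703 / inches squared.

    compute_bmi is a hypothetical helper name, not something from the video.
    """
    return weight_lb * 703 / (height_in * height_in)


# input() always returns a string, so the value has to be converted before
# doing arithmetic. In a notebook the lines would look like:
#   weight = int(input("Enter your weight in pounds: "))
#   height = int(input("Enter your height in inches: "))
# Fixed strings stand in for the typed answers here.
weight = int("170")
height = int("69")

bmi = compute_bmi(weight, height)
print(round(bmi, 1))

# First version of the check: only the "underweight" range exists so far,
# so a BMI outside that range prints nothing at all.
if bmi > 0:
    if bmi < 18.5:
        print("You are underweight")
else:
    print("Enter valid inputs")
```

With 170 pounds and 69 inches the rounded value is 25.1, which is outside the only range defined at this point, so no category message is printed.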
Let’s make the next one, so then I can be within a certain range. We need at least one more, so we’ll say elif, and elif again. The next cutoff is 24.9. It’s going to check the first condition first, so if the BMI is below 18.5 it automatically prints that one. That means for the next one we don’t have to write a range or anything; we only need the upper bound. This one should be less than or equal to, so less than or equal to 24.9, and this one is going to say “You are normal weight”. Let’s run this. My BMI was 25.1... oh guys, I’m just messing up here, I apologize. All right, the next one, up to 29.9, is the one I fall into, so now I’m going to be part of the overweight crowd. Let’s run this, and now our message is “You are overweight”. Remember the BMI was saved up here as 25.1, so running through this logic: no, you’re not under 18.5; no, you’re not at or under 24.9; but you are at or under 29.9, so you are overweight. So that did work properly, and that’s really good.
I don’t think I want this to be the only output for the person, though. When we add this logic up here, it’s just going to give us the BMI and then say “You are overweight”. Let’s make it a little more customized. I’m going to say name is equal to input, with the prompt “Enter your name: ”. So we enter Alex, 170, 69, and there’s our BMI. It will run through this logic in just a second, once we’ve actually finished it. Next we have 34.9, and let’s do one more for 39.9. So this one was overweight, this one is obese, and this one is severely obese. Then anything over that, 40 and over, so anything else, should be morbidly obese. So actually this statement right here should say “You are severely obese”, and this one is going to
say “morbidly obese”. Now, I added that name up here because I wanted to use it down below. We’re going to say name plus, and then something like “, you are underweight”, so it’ll be a little more personalized. I think it’ll be a nice touch, I really do. Let’s go back and do that to all of them, and let me see how quickly I can do this. Oh whoops, I got rid of that. Geez, you guys are seeing me mess up a ton. Name plus “you are”, and so on for each one. Now let’s run this, and it’s a little more personalized: it says “Alex, you are overweight”. So this is all really good.
Now, this is all inside an if statement. What we had before is, I think, what we should put right down here: we’ll say else, and if the first check fails we’ll print “Enter valid input”. I don’t know if this will error out or if it will even work, so let me see if I can test it. Actually, let’s copy this whole thing and include it right here, and now we have basically our entire calculator. Let’s run this: enter your name, we’ll say Alex; enter your pounds, 170; enter your inches, 69; and it says 25.1, “Alex, you are overweight”. That’s perfect.
We could even go as far as adding some feedback. We could say “you are overweight”, then a period, and then “You need to exercise more and stop sitting and writing so many Python tutorials.” Now if we run this with Alex, 170, 69, it says “Alex, you are overweight. You need to exercise more and stop sitting and writing so many Python tutorials.” And that’s it, this is the entire project. You can go a ton farther: you can include much more complex logic, and you could even build out a UI to create your own app just like this, where it has this input and this UI.
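
Pulling the pieces of the walkthrough together, the finished calculator might look something like the sketch below. The function names bmi_category and bmi_message are hypothetical (the video writes everything directly in notebook cells), and the check for zero or negative inputs before dividing is an added assumption so the example cannot fail on a height of 0.

```python
def bmi_category(bmi):
    """Map a BMI value to the ranges used in the walkthrough.

    Returns None when the value is not positive, mirroring the outer
    "if BMI > 0 ... else" check. The name is hypothetical.
    """
    if bmi <= 0:
        return None
    if bmi < 18.5:
        return "underweight"
    elif bmi <= 24.9:
        return "normal weight"
    elif bmi <= 29.9:
        return "overweight"
    elif bmi <= 34.9:
        return "obese"
    elif bmi <= 39.9:
        return "severely obese"
    else:
        return "morbidly obese"


def bmi_message(name, weight_lb, height_in):
    """Build the personalized message for one user (hypothetical helper)."""
    # Assumption: reject non-positive inputs up front to avoid dividing by zero.
    if weight_lb <= 0 or height_in <= 0:
        return "Enter valid input"
    bmi = weight_lb * 703 / (height_in * height_in)
    category = bmi_category(bmi)
    if category is None:
        return "Enter valid input"
    return name + ", you are " + category + "."


# In a notebook the three values would come from input(), for example:
#   name = input("Enter your name: ")
#   weight = int(input("Enter your weight in pounds: "))
#   height = int(input("Enter your height in inches: "))
print(bmi_message("Alex", 170, 69))
```

One detail worth noticing: because the cutoffs are written as 24.9, 29.9, and so on, a value such as 24.95 skips “normal weight” and lands in “overweight”. Writing the comparisons as less than 25, less than 30, and so on would be one way to close those small gaps.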
You can build that out within Jupyter notebooks with Python, but that’s not really what this tutorial is for. This is just to help you think through some of the logic of creating something like this.
Hello everybody, in this lesson we’re going to be taking a look at Beautiful Soup and Requests. These Python packages are really useful. They’re the two main ones I used when I was first starting out with web scraping, and they can get a lot of what you want done. Of course there are other, more advanced packages you can use, but this is just the beginner series; in a future series we’ll look at other packages that have more advanced functionality. What we’re going to do is import these packages, get all of the HTML from our website, and make sure it’s in a usable state. Then in the next lesson we’re going to query around in the HTML and pick and choose exactly what we want. We’ll look at things like tags, navigable strings, classes, attributes, and more.
Let’s get started by importing our packages. We’re going to say from bs4, which is the module we’re taking it from, import BeautifulSoup. Then we’re going to come down and say import requests. Let’s run this, I’ll hit Shift+Enter, and it works for me. If this does not work for you, you may need to install bs4, so you may have to go to your terminal window and say pip install bs4. I’ll let you Google how to do that if you need to, because it’s pretty easy. But if you’re using Jupyter notebooks through Anaconda, like how we set it up at the beginning of this Python series, you should be totally fine; it should be there for you. The next thing we need to do is specify where we’re taking this HTML from. What we need to do is come
right over here to our web page and get the URL. We’re going to copy this URL, and I’m just going to put it right here for a second. We’re going to be using this URL quite a bit, so we want to assign it to a variable: url is equal to, and then we’ll put it in here. Now we can get rid of that. So this is our URL going forward; this is where we’re going to be pulling data from. Let’s run this.
Now we’re going to use requests. We’ll say requests.get and put in url. This get function uses the Requests library to send a GET request to that URL, and it returns a response object. Let’s run this. As you can see, I got a response of 200. If you got something like a 204, 400, 401, or 404, all of these are potentially bad. A 204 means there was no content in the response. A 400 means a bad request: it was invalid, the server couldn’t process it, and you don’t get the page back. A 404 might be one you’re familiar with; it means the server could not find the page you asked for.
The next thing we’re going to do is take the HTML. If you remember, when we come back here and inspect the page, we have all of this HTML. This web page specifically is completely static; it’s not a bunch of moving stuff. On something like Amazon the pages can update, but when you pull the HTML into Python you’re basically getting a snapshot of it at that time. So we’re going to bring in all of this HTML, which is our snapshot of the website, and then we can take a look at it. We’ll come down here and use the Beautiful Soup library, so we need to say
BeautifulSoup, open parenthesis, and there are two parameters we need to put in here. First we need this GET request. We need to name it, so we’ll call it page: page is equal to requests.get(url), and let’s run this. Now we put that page in here, and what we’re going to say is page.text. The page holds the response to our request, and text is what retrieves the actual raw HTML that we’re going to be using. Then we put a comma, and we need to specify how we’re going to parse this information. This is HTML, so we’ll say 'html', just like this. This is standard and already built into the library, so we don’t need to go any further; it’s basically going to parse the information as HTML.
Let’s run this and see what we get. As you can see, we have a lot of information, and as we scroll down I’ll try to point out some things we’ve already looked at in previous lessons. Something like this th tag should look very familiar; that’s the title. Then we have these td tags, and if we scroll down even further we’ll see things like a tr tag. These are all things we looked at in that first lesson when learning about HTML.
Again, we want to assign this to a variable, so we’re going to say soup is equal to this right here. I’m not going to go into all the history behind Beautiful Soup. What I will say is that the guy who created the library said it takes this really messy HTML or XML, which you can also use it for, and makes it into this kind of beautiful soup. I just thought that was kind of funny, but that’s why we’re calling it soup. Let’s run this, then come down here and say print(soup), and run it. Now we have everything in here: we have our html, our head, we have
some hrefs and some links in here. Let’s scroll down a little more, and then we have our body right there, with a bunch of information in it. In the next lesson we’re going to learn how to query all of this to take specific information out, and to understand what’s going on in this HTML so we can actually get what we need. If this looks really messy to you and doesn’t make a lot of sense, there is one more thing I’m going to show you. We’ll come down here and say soup.prettify(). If you’ve used other programming languages, a prettify option is very common in a lot of them; it just makes things a little easier to visualize. You’ll notice it has this hierarchy built in, whereas if we scroll up there’s no hierarchy; it’s all just down the left-hand side. So if you want to visually see the structure, this helps a lot, but it doesn’t really help when you’re querying it or using find and find_all, which is what we’re going to look at in the next lesson.
Now, the first thing that we need to learn is HTML. HTML stands for HyperText Markup Language, and it’s used to describe all of the elements on a web page. When we go to a website and start pulling data, we need to know HTML so we can specify exactly what we want to take off of that website. We’re going to look at the basic structure of HTML, then we’ll go look at a real website, and you’ll see that it’s a little more difficult than what we have right here. But these are the basic building blocks for understanding what HTML actually looks like on a website. This is basically what HTML looks like: we have these angle brackets with things like html, head, title, and body, and you’ll notice that at the end we’ll have a
body, and then a /body at the bottom. This forward-slash body denotes the end of the body section in HTML, so everything inside of it is within the body. So there is this hierarchy within HTML: we have html, and /html at the bottom, which encapsulates all the HTML on the page; then we have things like head and /head, body and /body. Within these sections we usually have things like classes, tags, attributes, text, and other things we’ll get to in different lessons. One of the easiest to notice are tags, things like a p tag or a title tag. Within these tags, because this is a super simple example, we have these strings, like “My First Web Page”. This is what’s called a navigable string, and it is actual text that we could take out of this web page.
Now that you understand the super basics of HTML, let’s go to our website. I’ll have a link down below, but it’s this one right here. This is basically a website you can practice web scraping on, called scrapethissite.com. We’re going to look at the HTML behind this web page, and you can do this on any website you go to. We right-click and go down to Inspect. Right off the bat, this looks a lot more complicated than the simple illustration we were looking at, but let’s collapse it a little. You’ll notice we have html and /html at the bottom, a head and the end of the head, and then a body and the end of the body. So in a super simple sense it is similar; there’s just a lot more inside it. If we look at this title right here, this is our title tag. If we click this little arrow to expand it, you’ll notice we have the string “Hockey Teams: Forms, Searching and Pagination”. Now let’s say we didn’t want to click through and go find it. There’s something that’s
super helpful within this inspection panel that you can click on. Right here it says “Select an element in the page to inspect it”. We’re going to click on that, and as we move over the page and click on this title, it takes us to exactly where it is in our HTML. This is extremely useful. For example, let’s say the data I want is down here: I want the Boston Bruins. I can click on it and it takes me to exactly where that is in the HTML. This is where we can start writing our web scraping script to say, okay, I’m looking for a tr tag, I’m looking for a td tag, I’m looking for the class called team. This is all information we can use to specify exactly what we want to pull out of the web page.
There are other things we didn’t look at in our simple illustration. Let’s come over here: there are things like hrefs. These are hyperlinks. This looks like regular text, but inside it is a hyperlink, and if we clicked on it, it would take us to another page. Typically that’s denoted by this href right here. You’ll also typically see things like a p tag, which usually stands for a paragraph.
The last thing I want to show you while we’re here, and we’ll learn a lot more in the next several lessons: if we come down here, there is this entire table. Let’s try to find it. I’m having trouble selecting the whole thing, but let’s select this team name, and you can see that what’s encapsulating it is this table tag. These are super helpful because a table tag takes in the entire table. If we collapse it and look at just this, it says class table, and then we have the end of the table tag. When we open it, it has all of this information. As I hover over it, you can see we have these th tags, these td tags, and even these tr tags,
which hold the individual data. This is something we’ll look at when we’re actually scraping all of the data from this table in a future lesson. So this is how we can use HTML: how we can inspect the web page and see exactly what’s going on under the hood. In future lessons we’ll see how we can use this HTML to specify exactly what data we want to pull out. Thank you.
Hello everybody, in this lesson we’re going to be taking a look at find and find_all. Really, we’re going to be looking at a ton of different things in this lesson. This is where we start digging in and seeing how we can extract specific information from our web page. But in order to do that, let’s set everything up and bring in the HTML like we did in the last lesson. We’re going to write all of this out one more time, just for practice if nothing else, and then we’ll get into actually getting information from the HTML.
We start by saying from bs4 import BeautifulSoup, and import requests, and we’ll run this. Then we come up here and grab our HTML, or sorry, our URL. We’ll say url is equal to, and paste it right here. Now we need to say page is equal to requests.get with our url inside, and run this. Lastly we need soup: we’ll say soup is equal to BeautifulSoup, and within our parentheses we need to specify page.
text, because we need that, and our parser, which is 'html'. There we go. Let’s run this and print it out to make sure it’s working. And there we go, we have our soup right here. All of this should look really similar to our last lesson. So now we’ve brought in the HTML from our page, and we have a lot of information in here.
Really quickly, let’s come over and inspect our web page. In here we have a ton of information, right? We have a bunch of different tags and classes and all these other things, but how do we actually use them? That’s where find and find_all come into play. They’re pretty similar, and you’ll see that in just a little bit. Let’s say we just want to take this div tag. There are going to be a lot of different div tags in our HTML, but let’s come down here and call soup, which is all of our information, and say .find. Within the parentheses we can specify a lot of different things, but we’ll keep it really simple for now and just say 'div'. Let’s run this. What this brings up is the very first div tag in our HTML, which is this information right here.
Now let’s copy this and do the exact same thing, except we’ll say find_all. Let’s run this, and now we have a ton more information. Really, all find and find_all do is find the information. find only returns the first match in our HTML, which is the div with class container (let’s go back up to the top; that’s our div class container), but find_all finds all of them and puts them in a list for you. So it has this first one, which goes down to this /div right here, and then we have a comma, which separates it from our next div tag. So that is how we
can use it. Now, what if we want to pick out one of these div tags? We pulled in a ton of them, but we want to look for just one. This is where the class comes in handy. Right now we have class equal to container, class equal to col-md-12. I don’t know what these are off the top of my head, but usually they’ll be somewhat unique, and we can use them to specify what we’re looking for. For example, just glancing at this, we could also use this a tag. We could say we’re looking for these hrefs: we have an href here, and down here we have another one, which, if you remember from the previous lesson, is a hyperlink. Something like the class, or the href, or these ids are all attributes, so we can filter down based on them.
Let’s try it. We can do class first, and in something like find_all you write it as class_, with an underscore. Coming back up, we have this div, and here’s our class, so we need both the div and the class. If we took this a tag instead, 'a' would go right here, with a class of something like nav-link. But we have our div, so we’ll say col-md-12 right here. Let’s run this, and now it pulls in just that information. We’re still getting a list, because this div with class col-md-12 doesn’t happen just once. If we scroll down we’ll see it multiple times: here’s the comma, then here’s our next one. So we have two of these div tags with a class of col-md-12, and each of them has different information in it. This looks like a paragraph, with this p tag right here. Let’s scroll back up. I also think we should try out doing something
like this p tag. Typically p tags stand for paragraphs, or they have text in them. Let’s try the p tag really quickly and see what we get. Let’s run this, and it looks like we get multiple p tags. If we come back here, you can see there’s this information, which is what we’re pulling in, then this information right here, and it looks like there’s one more, with this href, which looks like the open source note, “Data via”, and then that hyperlink. So we have three different p tags.
Just to verify that’s correct, we can come over here and click on this paragraph. It takes us to the p tag where the class is equal to lead. Let’s look at this next paragraph: we have another p tag over here where the class is equal to glyphicon glyphicon-education. I have no idea what that means. And then our last one, right here, is a p tag that contains an a tag with an href, a class, and a bunch of other information.
Let’s say we just want to pull in this first paragraph. Let’s see how we can specify it. It looks like the class is equal to lead, and that looks like it’s going to be unique to just that one. So down here we say comma, and then class_ is equal to 'lead'. Let’s try running this, and we’re pulling in just that information. Now let’s say we actually want the text of this paragraph. This is a very real use case: say I’m trying to pull in a paragraph of text. Let’s copy this, and what we’re going to do then is say.
text and let’s run this now we’re going to get an error right here and this is a very common error because we’re trying to use find all unfortunately find all does not have a text attribute we actually need to change this to find typically when I’m working with these find and find alls I’m using find all most of the time until I want to start extracting text then when I specify it I’ll change this back to find just like this now let’s try this and now we’re getting in parentheses this information now this is all wonky it needs to definitely be cleaned up a little bit but if we Cod back up it’s no longer in a list and we no longer have things like these P tags in here or this class attribute so we’re really just trying to pull out this information now again this does not look perfect we could even trying to do something like strip look like there’s some white space uh that cleans it up a little bit this definitely looks a little better um and we could definitely go in here and clean this up more but just for you know an example this is how we can then extract that information now let’s look at one more example this is some information and this is what we’re going to do kind of our little mini project in the next lesson on let’s say we wanted to take all this information well what if we wanted to pull in something like the team name that’s going to be in right here in this TR tag and each of these TR tags have th tags underneath them so if we scroll down you’ll notice that each row is this TR tag so let’s go ahead and search for let’s do th let’s just search for that first so let’s come right back up here let’s use this find all and we’ll get rid of this text for right now and let’s just say we want to look for the TR is that what we said we were looking for no th so let’s say we’re looking for th let’s go ahead and run this so we’re going to have underneath this th we have team name year wins losses and notice these are all the titles so these titles are the only 
ones with these th tags. If we go down, you’ll notice that the data is actually in td tags. So let’s go back and look for td. This is going to be a lot longer, since we have a lot of information, but these are all the rows of data.
Let’s see if we can get just one piece of this data. We want just this team name; that’s all we’re trying to pull in for now. Then we’ll try to get this row, and in the next lesson we’ll try to get all of this information, make it look really nice, and put it into a pandas DataFrame. So let’s just get this team name right now. We’ll say th and run this, and we have this th. Now that we know we’re getting this information in, we can switch to find. Let’s run this, so there’s our team name. I’m just going to add .text, and again we can add strip, just like that, and bam, we have our team name.
So you can start getting the idea of how we’re pulling this information out: we’re really just specifying exactly what we’re seeing in this HTML. What’s really helpful, and something I do all the time, is inspecting the page. I’m asking, what piece of information do I want? Then I click on it and look at where it sits in the hierarchy. It’s within the body, it’s within this table with the class of table, then it’s down here in this tr tag and then this td tag. So I’m looking at the hierarchy and specifying exactly what I’m looking for.
So that is what we looked at in today’s lesson: how we can use find and find_all. We looked at classes, tags, attributes, and variable strings, which is this right here, getting that text, and we looked at how find and find_all pull that information in and how we can specify exactly what we’re looking for.
Hello everybody! In this lesson we are going to be scraping data
from a real website and putting it into a pandas DataFrame, and maybe even exporting it to CSV if we’re feeling a bit spicy. In the last several lessons we’ve been looking at this page right here, and I even promised that we were going to be pulling this data. But as I was building out the project, I honestly thought it was a little bit too easy, since in the last lesson we already pulled some information out of this table, and I want to throw you guys off. So we’re going to be pulling from a different table: we’re going on to Wikipedia, looking at the list of the largest companies in the United States by revenue, and pulling all of this information. If you thought this was going to be an easy little mini project, it’s now a full project, because why not.
So let’s get started. We’re going to import BeautifulSoup and requests, get this information, and see how we can do this. It’s going to get a little more complicated and a little more tricky: we’ll have to format things properly to get them into our pandas DataFrame so it looks good and is more usable. Let’s get rid of this easy table; we don’t want that one. This should look really familiar by now: we’re going to say from bs4 import BeautifulSoup (I don’t know if you’ve noticed, but I’ve messed up spelling BeautifulSoup in every single video). Let’s run this.
Now we need our URL. Let’s come up here and say url is equal to this address, and we’ll keep it all in the same cell, because we know this by heart by now, right? We’ll say page is equal to requests.get and then url, which gives us a response object. Hopefully it’ll be 200, which means a good response. Then we’ll say soup is equal to BeautifulSoup with our page.text, which pulls in the information from this URL, and then our parser, which will be html. Let’s run this, and it looks like everything went well. Let’s print our soup. This is completely new to you, it’s completely new to me, I don’t know what I’m doing, but it looks like we’re pulling in the information, am I right? So we’ve got a lot going for us: everything was imported properly, we got our URL, and we got our soup, which is not beautiful in my opinion, but let’s keep on rolling.
Let’s come right down here. Now we need to specify what data we’re looking for, so let’s inspect this web page. The only information we want is right in here. We want these titles, or headers, so rank, name, industry, etc., and then we for sure want all of this information. Let’s scroll down and see if there’s anything tricky in here. All right, that looks pretty good. There is another table, so there’s not just one table on this page; there are two, and that might change things for us. Let’s come right back and inspect the page using this little button right here, and see if I can highlight just this table. Oh, let’s do that right there. So now we have this wikitable sortable. I’m going to come right here and copy the outer HTML and paste it in here real quick. That’s a ton of information; I didn’t think it was going to copy all of it, so we’re just going to delete that. I just wanted to keep that class, because I wanted to then come right down here at the bottom and just see what
this table looks like. I don’t know if it’s part of the first one or if it’s its own table; I can’t tell. Let’s look at this rank and come up. It’s under this table tag, and it looks like it’s its own table, but its class also says wikitable sortable jquery-tablesorter. So it looks like there are two tables with the same class. That shouldn’t be a problem if we’re using find, because find takes the first match, which will be this table, and this is the table we want. If we wanted the other one, we could use find_all, and since that returns a list, we could use indexing to pull that table, right? But I think we’re going to be okay with just pulling in this one.
So let’s do our find. We’ll do soup.find (we could do find_all, or we could just find a table). Let’s try this and see what we get; if it pulls in the one we’re looking for, that’d be great. Now, this does not look correct at all. I don’t know what table it’s pulling in. Oh, maybe it’s this right here; this might be a table. Yeah, it is: we have this “box more citations” one. So we are actually going to have to do exactly what I was talking about. We could pull this with find_all, or we could do comma class right here. You know what, this is a learning opportunity, so let’s do both.
Let me go back up to the top, because I need these. I’ll come right down here, add another cell, and push this one up. There we go. So we’re going to say find_all, and let’s run this. Now we have multiple tables. Again we get that weird one first, but if we scroll down, here’s our comma, and then here’s our wikitable sortable, and then rank, name, industry, all the ones we were hoping to see. And I guarantee you, if you scroll all the way to the bottom, we’re going to see potentially Wells Fargo and Goldman Sachs. I’m pretty sure those are... let’s see, yeah, here
we go: Ford Motor, Wells Fargo, Goldman Sachs. That’s this table right here, so now we’re looking at the third table. But again, this is a list, so we can use indexing on it. We won’t choose position zero, because that’s this one right here, which we did not like; we’ll take position one. Let’s run this and go back up to the top, and this is our table right here. Rank, name, industry, etc.: this is the information we actually wanted, and we’re able to specify it with our find_all and an index. We now want to make this the only information we’re looking at, so I’m just going to copy this.
We didn’t need to use the class for this one, but we could have. So let’s put this right down here; this will be our table. I’ll come right here and say soup.find, just for demonstration purposes, with table, comma, class_ is equal to, and then this class right here. Let’s see if we get the correct output. Let’s run this, and it looks like we’re getting a NoneType object. If I remember, the actual class is this right here, so let’s run this instead, and I’ve got to get rid of the index. There we go. Okay, so we were able to pull it in using just find: find, table, class, and it says wikitable sortable. At least, that’s the HTML we’re pulling in right here.
Let me go back, because I don’t know if that’s what I was seeing earlier. Let’s find this rank, and go up to the table, and there’s our class. Yeah, and to me that’s a little bit odd. In the browser it says wikitable sortable jquery-tablesorter right here, but in the actual Python script we were running, it was only pulling in wikitable sortable. So it wasn’t pulling in
the jquery-tablesorter part. Why? I’m not 100% sure, but these are all things we’re working through, and we were able to figure it out. So we’re going to make this our table: we’ll say table is equal to soup.find_all at position one, and let’s run this. If we print out our table, we have this table, and now this is the only data we’re looking at.
The first thing I want to get is these titles, or headers, right here. If we look in this information, you can see that these sit in th tags, and we can pull those out really easily. Let’s come right down here; we’re just going to say th, and we can get rid of this. Let’s run this. These are our only th tags, because everything else is in tr tags for the rows of data. So these th tags are pretty unique, which makes it really easy, because then we can just say world_titles is equal to that.
So now we have these titles, but they’re not perfect, so we’re going to loop through them. world_titles is a list, and each item is within these th tags: th, and then there’s the string we’re trying to get. So we can take this list and use a list comprehension, right down here. I’m going to keep this where we can see it. We’ll say world_table_titles is equal to, and then our list comprehension, which should be super easy: title.text for title in world_titles. That’s it, because we’re just taking the text from each of these. We loop through and get rank, then name, then industry. Let’s print our world_table_titles and see if it worked. It did, but it looks like it needs to be cleaned up just a little bit, so let’s do that while we’re here, before we put it into the pandas DataFrame.
What we’re going to do is try to get rid of those backslash-n characters. If we do .strip on the whole thing, that won’t work, because this is a list. What we can do instead is title.text.strip, inside the comprehension. Let’s try it in there, and there we go. Now this world_table_titles is good to go.
Now, I’m noticing one thing that may be odd. We have rank, name, industry, and it goes to headquarters, but then we’re getting rank, name, industry again and then profits, which is from this second table right here, which we don’t want. Let’s scroll back up and backtrack to see where this happened. We did find_all table, we’re looking at position one, right? And then we print the table... ah, okay, I think I found the issue. Again, we’re working through this together, and we’re going to make mistakes. The table is what we actually wanted to search. We just did soup.find_all th, which is going to pull in that secondary table too. Jeez, we were not thinking here. We need to do find_all on the table, not the soup, because with the soup we were looking at all of the tables. Oh, what a rookie mistake. Okay, let’s go back. Now it’s just down to headquarters. Let’s run this, and run this, and now we just have up to headquarters. Now we are sitting pretty. Excuse my mistakes. Hey, listen, if it happens to me, it happens to you, I promise you. This is a little project we’re creating here, so we’re going to run into issues, and that’s okay; we’re figuring it out as we go.
What I want to do before we start pulling in all the data is put this into our pandas DataFrame. We’ll have the headers there ready to go, so we won’t have to get them later, and it just makes things easier in general, trust me. So we’re going to import pandas as pd; let’s run this. Now we’re going to create our DataFrame. We have these world_table_titles, so we’ll say pd.DataFrame, and in here, for our columns, we’ll say columns is equal to world_table_titles. Let’s say that’s our DataFrame, df, and call df right here. Let’s run it, and there we go. We were able to extract those headers, the titles of these columns, and put them into our DataFrame. So we’re set up and ready to go; we’re rocking and rolling.
The next thing we need (let’s go back up) is to start pulling in this data right here. If you remember, we had those th tags; those were our titles, as you can see where I’m highlighting. But down here we have these td tags, and those are all encapsulated within a tr tag. So the tr tags represent the rows, and the td tags represent the data within those rows: r for rows, d for data. Let’s see how we can use that to get the information we want.
Let’s go back up here. I’m going to take this, because again we’re only pulling from table, not soup (not soup, what were we thinking?), and let’s look for tr. Let’s run this. When we do tr, these do come in with the headers, so later on we’re going to have to get rid of those; we don’t want to pull them in as part of our data. But if we scroll down, there’s our Walmart, we have the location, and these are all within td tags. Then of course it’s separated by a comma, and we have our second row, and above it we had our first: row one, row two, row three, all the way down. We’ll easily be able to use this, because this is our column data, and we can even call it that: column_data is equal to this. We’ll run that.
What we’re going to do next is loop through it, because it’s all in a list. But instead of looking at the tr tag, we’re going to look at the td tag. So let’s come right down here; we’ll say for row in column_data, and
we’ll do a colon. Now we need to loop through this, so we’ll do something like row.find_all, and what are we looking for? Not the tr; we’re looking for the td. Just for now, let’s print this off and see what it looks like. (Apparently I didn’t run this column_data cell, that’s why.) Let’s run this.
What we actually need to do is something almost exactly like what we did above, and I’m going to put it right below. Instead of printing this off, which just prints another list because we’re using find_all, and that isn’t super helpful, we can call this row_data and put the row data in here. Then we’ll say for data in row_data, take data.text.strip, and instead of world_table_titles we’ll call the result individual_row_data. Now let’s print off the individual_row_data. It’s the exact same process we used up here, and that’s how we cleaned those up. We may not need strip, but let’s run this and see what we get. There we go. Let’s actually get rid of strip to check... yeah, strip was helpful; it’s the exact same thing that happened on the last one, so let’s keep it.
Let’s run this, and now let’s glance through this information. It looks exactly like what’s in the table. Let’s confirm with this first one: 572,754 and 2.4 in the table, and 572,754 and 2.4 in our output, so this looks exactly correct.
Now we have to figure out a way to get this into our DataFrame, because these are all individual lists. We can’t just take the entire table and plop it into the DataFrame at one time; we need a way to put it in one row at a time. Now, if you’re just here for web scraping and
you haven’t taken my pandas series, that’s totally fine; that’s not what we’re here for anyway. What we can do is take our individual_row_data and put it in one row at a time. The reason we have to do that is this: when we had it like this (let’s go back), it printed out all of it, but what it’s really doing is printing one row at a time, and it only holds the current row as it loops through, so at the end only the last one is saved. What we actually want is that every time it loops through, we append that row onto the DataFrame. So as it goes through, it puts this one in, then the next time it loops through it puts the next one in, and so on all the way down.
Let’s see how we can do this. We have our DataFrame right here; let’s get rid of this and bring our DataFrame in. Again, like I just mentioned, if you don’t know pandas, go take my series on it. It’s really good, and we do something very similar to this in that series, so I’m not going to walk through the entire logic. But there is something called loc, which stands for location, for when you’re looking at the index of a DataFrame, and we’re going to use that to our advantage. We’re going to take the length of the DataFrame, so we’re looking at how many rows are in it, and say that’s our length. Then we’re going to use that length when we’re actually putting in the new information. Pretty cool. So we’ll say df.loc, then a bracket, and we put in that length. We’re checking the length of our DataFrame each time it loops through, and then we’re going to put the information
in the next position. That’s exactly what we’re doing. Let’s put in the individual_row_data.
So let’s recap. We’re looping through these tr tags; this is our column_data, so each tr is a row of data. As we loop through, we’re doing find_all and looking for td tags; that’s our individual data, so that’s our row_data. Then we’re taking each piece of data, getting out the text, and stripping it to clean it, and now it’s in a list for each individual row. Then we’re looking at our current DataFrame, which has nothing in it right now, taking its length, and appending each row into the next position.
Let’s run this. It’s working, it’s thinking, and it looks like we got an issue: “cannot set a row with mismatched columns.” This isn’t one I got earlier, but we’re going to cancel this out and figure it out together. Let’s print off our individual_row_data and look at it. This first one is empty (it’s the header row, which has th tags and no td tags), and I’m almost certain that’s the issue. I didn’t encounter this when I wrote the lesson, but I’m almost certain that this is it. So let’s take column_data but start at position one. Not parentheses; I need brackets, because this is a list. There we go, so now that first one’s gone and we just have the information. I didn’t even think about that a second ago, but I’m glad we ran into it, in case you do too.
Let’s try this again, and it looks like it worked. Let’s pull our DataFrame down (I could have just written df), and now this is looking fantastic. These three dots just mean there’s information in there that it doesn’t want to display. It looks like we have our rank, name, industry, revenue, revenue growth,
employees, and headquarters for every single one. So this is perfect; this is exactly what I was hoping to get. Now you can go in and use pandas to manipulate this, change it, and dive into all the information in there. But we can also export this to a CSV, if that’s what you want. We can easily do that with df.to_csv, and within here we’re going to put an r, for a raw string, and specify our file path. So let’s come down here to our file path and go to our folder for output. I have this path in my OneDrive, Documents, Python web scraping, folder for output, which I already made, and I’m going to put it right down here. I do have to specify what we’re going to call the file; we’ll call it companies, and then we have to add .csv, which is very important.
Now, if we run this, I already know that because we have this rank and this index here, we’re going to keep the index in the output. Not great, but let’s run it and look at our output. There’s our companies file, and when we pull it up, as you can see, this is not what we want, because we have this extra column right here. If we’re automating this, that would get super annoying. So we’re going to go back and say index equals False. Let’s get out of here, come right down here, and say comma, index equals False, so it’s not going to export the index into the CSV. Let’s run this, pull up our folder one more time, and refresh just to make sure. Should be good, and now this looks a lot better. We were able to take all of that information and put it into a CSV, and it’s all there.
So this is the whole project. If we scroll all the way back up, let’s glance at what we did. We brought in our libraries and packages, we specified our URL, we brought in our soup,
and then we tried to find our table. That took a little bit of testing, but we found that the table was the second one, so in position one, and we took that table. We were also able to specify it using find with the class. We just wanted to work with that table, since that’s all the data we wanted, so we said this is our table and worked with just our table going forward. Of course we encountered some small issues (user errors on my end), but we were able to get our world_titles and put those into our DataFrame right here using pandas. Next we went back and got all the row data, and the individual data from those rows, and put it into our pandas DataFrame. Then we came below and exported this into an actual CSV file.
So that is how we can use web scraping to get data from something like a table and put it into a pandas DataFrame. I hope that this lesson was helpful. I know we encountered some issues; that’s on my end, and I apologize, but if you run into those same issues, hopefully that helped. If you liked this, be sure to like and subscribe below. I appreciate you, I love you, and I will see you in the next lesson.
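For readers who want the steps of this lesson in one place, here is a minimal sketch of the same workflow: find versus find_all, filtering with class_, scoping the search to one table, cleaning text with .text.strip(), appending rows with df.loc, and exporting without the index. To keep it runnable offline it parses a small inline HTML string instead of fetching the Wikipedia page, so the HTML, the "box-notice" class, and the company rows are made-up stand-ins, and the variable names are illustrative rather than the exact ones typed in the video.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up stand-in for the downloaded page: a notice box that is also a
# <table>, then two tables that share the same class (all values are fake).
html = """
<table class="box-notice"><tr><td>More citations needed</td></tr></table>
<table class="wikitable sortable">
  <tr><th>Rank
</th><th>Name
</th><th>Industry
</th></tr>
  <tr><td>1
</td><td>Company A
</td><td>Retail
</td></tr>
  <tr><td>2
</td><td>Company B
</td><td>Technology
</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th><th>Profits</th></tr>
  <tr><td>1</td><td>Company C</td><td>100</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, which here is the unwanted notice box.
first_table = soup.find("table")

# find_all() returns a list, so the wanted table can be picked by index...
table = soup.find_all("table")[1]
# ...or found by class; find() stops at the first table with that class.
by_class = soup.find("table", class_="wikitable sortable")

# Search inside `table`, not `soup`, so the second data table's headers
# do not leak in. strip() removes the trailing newlines.
headers = [th.text.strip() for th in table.find_all("th")]
df = pd.DataFrame(columns=headers)

# Skip the first <tr>: it is the header row and contains no <td> cells,
# which would otherwise trigger a mismatched-columns error.
for row in table.find_all("tr")[1:]:
    cells = [td.text.strip() for td in row.find_all("td")]
    df.loc[len(df)] = cells  # append at the next free index position

# With no path, to_csv returns the CSV as a string; index=False drops
# the extra index column. Pass a file path to write a file instead.
csv_text = df.to_csv(index=False)
print(df)
print(csv_text)
```

Against the live page, the inline string would be replaced by the text of a requests.get response, and the table index or class may need adjusting if the page layout differs from what the video shows.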