{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NOTEBOOK 3 - ANALYSE COUNTERS AND WITHDRAWALS\n",
"\n",
"In this notebook we analyse `df_withdrawals`, the dataframe of unique revocation and counter-notice records for termination notices filed under ss 203 and 304."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import libraries and file"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import re\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option(\"display.max_rows\",800)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option(\"display.max_colwidth\",1000)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df_withdrawals = pd.read_json(r'df_withdrawals.json')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(177, 95)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_withdrawals.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explode out titles to enable us to apply the filters to them "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this part we \"explode\" the titles for each record. A record may relate to multiple titles: this is normal, as an author may be terminating the copyright grants for many works at once. Exploding gives each title, currently an element of a list, its own row as a separate string, and repeats the values of all other columns on each of those rows.\n",
"\n",
"We also reset the index so that no index number is duplicated. This makes searching and filtering by index number easier later."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2162"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals = df_withdrawals.explode(\"titles\").copy()\n",
"titles_withdrawals.title.count()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals.reset_index(drop=True,inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=2162, step=1)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.index"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.titles.count()==titles_withdrawals.title.count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build and apply filters\n",
"\n",
"In this section, we create filters based on previously identified patterns in the data. \n",
"\n",
"We then apply those filters to the data, creating new columns in the dataframe. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pre-78 registration patterns\n",
"\n",
"These are patterns based on pre-1978 Copyright Office registration number prefixes."
]
},
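{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before a pattern is applied to the dataframe, it can help to try it on a few hand-written strings. The sketch below is illustrative only: `demo_pattern` is a shortened pattern covering just the EU and EP prefixes, and the sample strings are made up rather than taken from the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Illustrative pattern: only two registration prefixes, not a full filter\n",
"demo_pattern = r\"\\bEU\\s*\\d{2,}|\\bEP\\s*\\d{2,}\"\n",
"\n",
"# Made-up sample strings for a quick sanity check\n",
"samples = [\n",
"    \"My Song; EU209098\",\n",
"    \"Another Song (EP 104176)\",\n",
"    \"Catalogue no. REU1234\",  # 'EU' inside a longer token: \\b prevents a match\n",
"    \"No registration number here\",\n",
"]\n",
"\n",
"# findall returns every non-overlapping match in each string\n",
"demo_matches = [re.findall(demo_pattern, s) for s in samples]\n",
"for s, m in zip(samples, demo_matches):\n",
"    print(f\"{s!r} -> {m}\")"
]
},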
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"music_reg_304_pattern = r\"\\bE\\.*\\s*for|\\bE\\.*\\s*pub|\\bE\\.*\\s*unp|\\bE\\s*pub|\\bE\\s*unpub|\\bEU\\s*\\d{2,}|\\bEu\\s*\\d{2,}|\\bEP\\s*\\d{2,}|\\bEO\\s*\\d{2,}|\\bEp\\d{2,}|\\bEF\\d{2,}|\\bE\\d{2,}|\\bEFO-\\d{2,}|\\bEFO\\d{2,}|\\bEU\\s*\\d{1,}\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"literary_reg_304_pattern = r\"\\bA\\s*\\d{2,}|\\bB\\s*\\d{3,}|\\bAIO\\s*\\d{2,}|\\bAF\\s*\\d{2,}|\\bAFO-\\d{2,}|\\bAI\\d{2,}|\\bAFO\\d{2,}|\\bAIO-\\d{2,}|\\bC\\d{4,}|\\bAI-\\d{2,}|\\bBF\\s*\\d{2,}|\\bBI\\s*\\d{2,}|\\bBIO\\s*\\d{2,}\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"dramatic_reg_304_pattern = r\"\\bDP\\s*\\d{2,}|\\bDU\\s*\\d{2,}|\\bD\\s*\\d{2,}|\\bLP\\s*\\d{2,}|\\bL\\s*\\d{3,}|\\bDF\\s*\\d{2,}|\\bDFO-\\s*\\d{2,}|\\bLU\\s*\\d{2,}|\\bLF\\s*\\d{2,}|\\bLFO\\s*\\d{2,}|\\bM\\s*\\d{2,}|\\bMP\\s*\\d{2,}|\\bMU\\s*\\d{2,}|\\bMFO\\s*\\d{2,}\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"sound_recording_reg_304_pattern = r\"Reg\\.\\sN\\s*\\d{2,}\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"art_reg_304_pattern = r\"\\bGU\\s*\\d{2,}|\\bG\\s*\\d{2,}|\\bGF\\s*\\d{2,}|\\bGFO\\s*\\d{2,}|\\bGP\\s*\\d{2,}|\\bH\\s*\\d{2,}|\\bHFO\\s*\\d{2,}|\\bHF\\s*\\d{2,}|\\bI\\s*\\d{2,}|\\bIFO\\s*\\d{2,}|\\bIP\\s*\\d{2,}|\\bIU\\s*\\d{2,}|\\bJ\\s*\\d{2,}|\\bJFO\\s*\\d{2,}|\\bJP\\s*\\d{2,}|\\bJU\\s*\\d{2,}|\\bK\\s*\\d{2,}|\\bKF\\s*\\d{2,}|\\bKFO\\s*\\d{2,}|\\bKK\\s*\\d{2,}|\\bKKF\\s*\\d{2,}|\\bKKFO\\s*\\d{2,}\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Post-78 registration patterns\n",
"\n",
"These are patterns based on post-1978 Copyright Office registration number prefixes."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"tx_reg_203_pattern = r\"\\bTX\\d{2,}|\\bTX\\s*\\d{2,}|\\bTXu\\d{2,}|\\bTXu\\s*\\d{2,}|\\bTX\\s*\\d\\-|\\bTX\\s*\\-\\d{2,}\"\n",
"pa_reg_203_pattern = r\"\\bPAu\\s*\\d{2,}|\\bPA\\s*\\d{2,}|\\bPAu*\\s*\\d-\"\n",
"sr_reg_203_pattern = r\"\\bSR\\d{2,}|\\bSR\\s*\\d{2,}|\\bSR[Uu]\\s*\\d{2,}|\\bSRu*\\s*\\d-\"\n",
"va_reg_203_pattern = r\"\\bVAu\\s*\\d{2,}|\\bVA\\s*\\d{2,}|\\bVA\\s*\\d\\-\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Descriptors\n",
"\n",
"These are self-identifiers describing the type of work. They appear in only some records, but where present they help us identify the types of works subject to termination notices."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"descriptors_pattern = \"literary work|sound recording|composition|musical score|musical work|artwork|dramatic work|musical play\""
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"#create a column list for each of the patterns for ease of reference\n",
"columns_list = ['music_reg_304','literary_reg_304',\n",
" 'dramatic_reg_304',\"sound_recording_reg_304\",\n",
" \"art_reg_304\",\"descriptors\",\n",
" \"tx_reg_203\",\"pa_reg_203\",\n",
" \"sr_reg_203\",\"va_reg_203\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#for this column, which has registration numbers, replace anything that evaluates as 'false' with an empty string.\n",
"\n",
"titles_withdrawals[\"registration_number_not_verified\"] = titles_withdrawals.registration_number_not_verified.apply(lambda x:\"\" if not x else x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract patterns and reproduce in columns\n",
"\n",
"The next section extracts the patterns and reproduces them in the relevant columns. \n",
"\n",
"At each change, it is prudent to do a random sample check to make sure that the patterns extracted are indeed the registration numbers/self-identifiers we are looking for. "
]
},
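{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells that follow build each column by converting the `findall` results to strings, joining them, and splitting on whitespace. As a hedged alternative sketch, and not the method this notebook uses, the hypothetical helper `extract_matches` below collects the matches from several text fields into one flat list and returns `None` when nothing matches. The sample values are made up for illustration. With pandas, a helper like this could be applied row by row, for example through `DataFrame.apply(..., axis=1)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def extract_matches(pattern, fields):\n",
"    \"\"\"Return a flat list of regex matches across several text fields, or None.\"\"\"\n",
"    matches = []\n",
"    for text in fields:\n",
"        # Skip missing values (None, NaN, empty strings) and other non-strings\n",
"        if isinstance(text, str) and text:\n",
"            matches.extend(re.findall(pattern, text))\n",
"    return matches or None\n",
"\n",
"# Made-up record standing in for the four text columns searched in this notebook\n",
"demo_fields = [\"My Song EU 209098\", None, \"\", \"EP104176\"]\n",
"demo_result = extract_matches(r\"\\bEU\\s*\\d{2,}|\\bEP\\s*\\d{2,}\", demo_fields)\n",
"print(demo_result)"
]
},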
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filter reg_304"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Music_reg_304**"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = (titles_withdrawals.titles.str.findall(music_reg_304_pattern).map(str)\n",
" + ' ' + titles_withdrawals.title.str.findall(music_reg_304_pattern).copy().map(str)\n",
" + ' ' + titles_withdrawals.notes.str.findall(music_reg_304_pattern).copy().map(str)\n",
" + ' ' + titles_withdrawals.registration_number_not_verified.str.findall(music_reg_304_pattern).copy().map(str))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 ['EP104176'] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 ['EU209098'] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: music_reg_304, dtype: object"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.music_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 ['EP104176'] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 ['EU209098'] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: music_reg_304, dtype: object"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.music_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 [['EP104176']]\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 [['EU209098']]\n",
"613 []\n",
"403 []\n",
"Name: music_reg_304, dtype: object"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.music_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 [['EP104176']]\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 [['EU209098']]\n",
"613 []\n",
"403 []\n",
"Name: music_reg_304, dtype: object"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.music_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"music_reg_304\"] = titles_withdrawals.music_reg_304.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 [['EP104176']]\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 [['EU209098']]\n",
"613 NaN\n",
"403 NaN\n",
"Name: music_reg_304, dtype: object"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.music_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2096, 96)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.music_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(66, 96)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.music_reg_304.isnull()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Literary_reg_304**"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = (titles_withdrawals.titles.str.findall(literary_reg_304_pattern).map(str)\n",
"    +' '+titles_withdrawals.title.str.findall(literary_reg_304_pattern).copy().map(str)\n",
"    +' '+titles_withdrawals.notes.str.findall(literary_reg_304_pattern).copy().map(str)\n",
"    +' '+titles_withdrawals.registration_number_not_verified.str.findall(literary_reg_304_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"literary_reg_304\"] = titles_withdrawals.literary_reg_304.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: literary_reg_304, dtype: object"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.literary_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2159, 97)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.literary_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 97)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.literary_reg_304.isnull()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sound_recording_reg_304**"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = (titles_withdrawals.titles.str.findall(sound_recording_reg_304_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(sound_recording_reg_304_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(sound_recording_reg_304_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(sound_recording_reg_304_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: sound_recording_reg_304, dtype: object"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: sound_recording_reg_304, dtype: object"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: sound_recording_reg_304, dtype: object"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: sound_recording_reg_304, dtype: object"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: sound_recording_reg_304, dtype: object"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sound_recording_reg_304\"] = titles_withdrawals.sound_recording_reg_304.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: sound_recording_reg_304, dtype: float64"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sound_recording_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2162, 98)"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.sound_recording_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Series([], Name: sound_recording_reg_304, dtype: float64)"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.sound_recording_reg_304.isnull()].sound_recording_reg_304"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Art_reg_304**"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = (titles_withdrawals.titles.str.findall(art_reg_304_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(art_reg_304_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(art_reg_304_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(art_reg_304_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: art_reg_304, dtype: object"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.art_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: art_reg_304, dtype: object"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.art_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: art_reg_304, dtype: object"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.art_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: art_reg_304, dtype: object"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.art_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"art_reg_304\"] = titles_withdrawals.art_reg_304.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: art_reg_304, dtype: float64"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.art_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2162, 99)"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.art_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 99)"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.art_reg_304.isnull()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Dramatic_304_reg**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note it doesn't filter the \"notes\" column because they all have Vxxxx Dxxxx"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"#titles_304[\"dramatic_reg_304\"] = titles_304.titles.str.findall(dramatic_reg_304_pattern).map(str)+' '+titles_304.title.str.findall(dramatic_reg_304_pattern).copy().map(str)+' '+titles_304.notes.str.findall(dramatic_reg_304_pattern).copy().map(str)+' '+titles_304.registration_number_not_verified.str.findall(dramatic_reg_304_pattern).copy().map(str)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = (titles_withdrawals.titles.str.findall(dramatic_reg_304_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(dramatic_reg_304_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(dramatic_reg_304_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] []\n",
"1226 [] [] []\n",
"1417 [] [] []\n",
"56 [] [] []\n",
"1868 [] [] []\n",
"661 [] [] []\n",
"303 [] [] []\n",
"134 [] [] []\n",
"1237 [] [] []\n",
"1359 [] [] []\n",
"2034 [] [] []\n",
"241 [] [] []\n",
"2043 [] [] []\n",
"812 [] [] []\n",
"1044 [] [] []\n",
"790 [] [] []\n",
"1547 [] [] []\n",
"201 [] [] []\n",
"613 [] [] []\n",
"403 [] [] []\n",
"Name: dramatic_reg_304, dtype: object"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.dramatic_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] []\n",
"1226 [] [] []\n",
"1417 [] [] []\n",
"56 [] [] []\n",
"1868 [] [] []\n",
"661 [] [] []\n",
"303 [] [] []\n",
"134 [] [] []\n",
"1237 [] [] []\n",
"1359 [] [] []\n",
"2034 [] [] []\n",
"241 [] [] []\n",
"2043 [] [] []\n",
"812 [] [] []\n",
"1044 [] [] []\n",
"790 [] [] []\n",
"1547 [] [] []\n",
"201 [] [] []\n",
"613 [] [] []\n",
"403 [] [] []\n",
"Name: dramatic_reg_304, dtype: object"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.dramatic_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], []]\n",
"1226 [[], [], []]\n",
"1417 [[], [], []]\n",
"56 [[], [], []]\n",
"1868 [[], [], []]\n",
"661 [[], [], []]\n",
"303 [[], [], []]\n",
"134 [[], [], []]\n",
"1237 [[], [], []]\n",
"1359 [[], [], []]\n",
"2034 [[], [], []]\n",
"241 [[], [], []]\n",
"2043 [[], [], []]\n",
"812 [[], [], []]\n",
"1044 [[], [], []]\n",
"790 [[], [], []]\n",
"1547 [[], [], []]\n",
"201 [[], [], []]\n",
"613 [[], [], []]\n",
"403 [[], [], []]\n",
"Name: dramatic_reg_304, dtype: object"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.dramatic_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: dramatic_reg_304, dtype: object"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.dramatic_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"dramatic_reg_304\"] = titles_withdrawals.dramatic_reg_304.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: dramatic_reg_304, dtype: object"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.dramatic_reg_304.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2151, 100)"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.dramatic_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11, 100)"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.dramatic_reg_304.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [],
"source": [
"fil1 = (titles_withdrawals.notes.str.contains(dramatic_reg_304_pattern,na=False,case=True,regex=True))"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"doc_filter = (titles_withdrawals.notes.str.contains(\"V\\d{2,}\\s*\\D\\d{2,}\",case=True,na=False,regex=True))"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[fil1&~doc_filter].title.count()"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"381 001 Islands (V3620 D554)\n",
"382 002 Technopop (V3620 D554)\n",
"384 004 The age of plastic (V3626 D126) / Reg. SR15916.\n",
"385 005 Living in the plastic age (V3626 D126) / Reg. SR15916.\n",
"386 006 Video killed the radio star (V3626 D126) / Reg. SR15916.\n",
"387 007 Kid dynamo (V3626 D126) / Reg. SR15916.\n",
"388 008 I love you (Miss Robot) (V3626 D126 / Reg. SR15916.\n",
"389 009 Clean, clean (V3626 D126) / Reg. SR15916.\n",
"390 010 Elstree (V3626 D126) / Reg. SR15916.\n",
"391 011 Astroboy (and the proles on parade) (V3626 D126) / Reg. SR15916.\n",
"392 012 Johnny on the monorail (V3626 D126) / Reg. SR13269.\n",
"Name: titles, dtype: object"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dramatic_remove = titles_withdrawals[((titles_withdrawals.dramatic_reg_304.notna())\n",
" &(titles_withdrawals.titles.str.contains(\"V\\d{2,}\\s*\\D\\d{2,}\",case=True,na=False,regex=True)))]\n",
"dramatic_remove.titles"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"filters = ((titles_withdrawals.dramatic_reg_304.notna())\n",
" &(titles_withdrawals.titles.str.contains(\"V\\d{2,}\\s*\\D\\d{2,}\",case=True,na=False,regex=True)))"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals.loc[filters,'dramatic_reg_304'] = np.nan"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 100)"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.dramatic_reg_304.notna()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Descriptors"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = (titles_withdrawals.titles.str.findall(descriptors_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(descriptors_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(descriptors_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(descriptors_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] ['composition'] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] ['composition'] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] ['composition'] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] ['composition'] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], ['composition'], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], ['composition'], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 [['composition']]\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 [['composition']]\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"descriptors\"] = titles_withdrawals.descriptors.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 [['composition']]\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 [['composition']]\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1915, 101)"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.descriptors.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(247, 101)"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.descriptors.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"665 [['composition']]\n",
"1030 [['composition']]\n",
"1651 [['composition']]\n",
"1036 [['composition']]\n",
"1918 [['sound, recording']]\n",
"867 [['composition']]\n",
"472 [['composition'], ['composition']]\n",
"847 [['composition']]\n",
"663 [['composition']]\n",
"1650 [['composition']]\n",
"1049 [['composition']]\n",
"1643 [['composition']]\n",
"1055 [['composition']]\n",
"1074 [['composition']]\n",
"855 [['composition']]\n",
"1928 [['sound, recording']]\n",
"1797 [['sound, recording']]\n",
"1784 [['sound, recording']]\n",
"2138 [['composition']]\n",
"1649 [['composition']]\n",
"Name: descriptors, dtype: object"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.descriptors.isnull()].sample(n=20,replace=False,random_state=3).descriptors"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['composition']] 150\n",
"[['sound, recording']] 94\n",
"[['sound, recording'], ['sound, recording']] 2\n",
"[['composition'], ['composition']] 1\n",
"Name: descriptors, dtype: int64"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filter 203s"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TX_reg_203**"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = (titles_withdrawals.titles.str.findall(tx_reg_203_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(tx_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(tx_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(tx_reg_203_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"tx_reg_203\"] = titles_withdrawals.tx_reg_203.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.tx_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2151, 102)"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.tx_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11, 102)"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.tx_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"938 [['TX1060434']]\n",
"937 [['TX221068']]\n",
"934 [['TX840937']]\n",
"935 [['TX1717768']]\n",
"942 [['TX1528818']]\n",
"939 [['TX1060434']]\n",
"940 [['TX1570961']]\n",
"933 [['TX221068']]\n",
"936 [['TX1749189']]\n",
"941 [['TX1537903']]\n",
"943 [['TX1570534']]\n",
"Name: tx_reg_203, dtype: object"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.tx_reg_203.isnull()].tx_reg_203.sample(n=11,replace=False,random_state=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**PA_reg_203**"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = (titles_withdrawals.titles.str.findall(pa_reg_203_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(pa_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(pa_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(pa_reg_203_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 ['PA302759'] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 ['PA17735'] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 ['PA302759'] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 ['PA17735'] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [['PA302759'], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [['PA17735'], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 138,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 [['PA302759']]\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 [['PA17735']]\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 140,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 [['PA302759']]\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 [['PA17735']]\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 143,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"pa_reg_203\"] = titles_withdrawals.pa_reg_203.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 [['PA302759']]\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 [['PA17735']]\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.pa_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2117, 103)"
]
},
"execution_count": 146,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.pa_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(45, 103)"
]
},
"execution_count": 147,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.pa_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"817 [['PA17740']]\n",
"1596 [['PAu696534'], ['PAu000696534']]\n",
"1091 [['PA375972'], ['PA375972']]\n",
"814 [['PA17737']]\n",
"818 [['PA17741']]\n",
"823 [['PA17737']]\n",
"70 [['PA128090']]\n",
"64 [['PA266520']]\n",
"1101 [['PA375972'], ['PA375972']]\n",
"1590 [['PAu141089',, 'PAu66677']]\n",
"1095 [['PA375972'], ['PA375972']]\n",
"825 [['PA17739']]\n",
"820 [['PA17736']]\n",
"1099 [['PA375972'], ['PA375972']]\n",
"816 [['PA17739']]\n",
"1585 [['PAu141091',, 'PA66672']]\n",
"1586 [['PAu141092',, 'PA294865']]\n",
"1093 [['PA375972'], ['PA375972']]\n",
"1098 [['PA375972'], ['PA375972']]\n",
"1591 [['PAu141086',, 'PA294858']]\n",
"Name: pa_reg_203, dtype: object"
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.pa_reg_203.isnull()].pa_reg_203.sample(n=20,replace=False,random_state=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**SR_reg_203**"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = (titles_withdrawals.titles.str.findall(sr_reg_203_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(sr_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(sr_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(sr_reg_203_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 ['SR36'] [] [] []\n",
"1226 ['SR54'] [] [] []\n",
"1417 ['SR37'] [] [] []\n",
"56 ['SR77'] [] [] []\n",
"1868 ['SR42'] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 ['SR44'] [] [] []\n",
"1237 ['SR97'] [] [] []\n",
"1359 [] [] [] []\n",
"2034 ['SR85'] [] [] []\n",
"241 [] [] [] []\n",
"2043 ['SR56'] [] [] []\n",
"812 ['SR40'] [] [] []\n",
"1044 ['SR32'] [] [] []\n",
"790 [] [] [] []\n",
"1547 ['SR13'] [] [] []\n",
"201 [] [] [] []\n",
"613 ['SR10'] [] [] []\n",
"403 ['SR12'] [] [] []\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 ['SR36'] [] [] []\n",
"1226 ['SR54'] [] [] []\n",
"1417 ['SR37'] [] [] []\n",
"56 ['SR77'] [] [] []\n",
"1868 ['SR42'] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 ['SR44'] [] [] []\n",
"1237 ['SR97'] [] [] []\n",
"1359 [] [] [] []\n",
"2034 ['SR85'] [] [] []\n",
"241 [] [] [] []\n",
"2043 ['SR56'] [] [] []\n",
"812 ['SR40'] [] [] []\n",
"1044 ['SR32'] [] [] []\n",
"790 [] [] [] []\n",
"1547 ['SR13'] [] [] []\n",
"201 [] [] [] []\n",
"613 ['SR10'] [] [] []\n",
"403 ['SR12'] [] [] []\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 152,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [['SR36'], [], [], []]\n",
"1226 [['SR54'], [], [], []]\n",
"1417 [['SR37'], [], [], []]\n",
"56 [['SR77'], [], [], []]\n",
"1868 [['SR42'], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [['SR44'], [], [], []]\n",
"1237 [['SR97'], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [['SR85'], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [['SR56'], [], [], []]\n",
"812 [['SR40'], [], [], []]\n",
"1044 [['SR32'], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [['SR13'], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [['SR10'], [], [], []]\n",
"403 [['SR12'], [], [], []]\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 154,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 [['SR36']]\n",
"1226 [['SR54']]\n",
"1417 [['SR37']]\n",
"56 [['SR77']]\n",
"1868 [['SR42']]\n",
"661 []\n",
"303 []\n",
"134 [['SR44']]\n",
"1237 [['SR97']]\n",
"1359 []\n",
"2034 [['SR85']]\n",
"241 []\n",
"2043 [['SR56']]\n",
"812 [['SR40']]\n",
"1044 [['SR32']]\n",
"790 []\n",
"1547 [['SR13']]\n",
"201 []\n",
"613 [['SR10']]\n",
"403 [['SR12']]\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [['SR36']]\n",
"1226 [['SR54']]\n",
"1417 [['SR37']]\n",
"56 [['SR77']]\n",
"1868 [['SR42']]\n",
"661 []\n",
"303 []\n",
"134 [['SR44']]\n",
"1237 [['SR97']]\n",
"1359 []\n",
"2034 [['SR85']]\n",
"241 []\n",
"2043 [['SR56']]\n",
"812 [['SR40']]\n",
"1044 [['SR32']]\n",
"790 []\n",
"1547 [['SR13']]\n",
"201 []\n",
"613 [['SR10']]\n",
"403 [['SR12']]\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"sr_reg_203\"] = titles_withdrawals.sr_reg_203.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [['SR36']]\n",
"1226 [['SR54']]\n",
"1417 [['SR37']]\n",
"56 [['SR77']]\n",
"1868 [['SR42']]\n",
"661 NaN\n",
"303 NaN\n",
"134 [['SR44']]\n",
"1237 [['SR97']]\n",
"1359 NaN\n",
"2034 [['SR85']]\n",
"241 NaN\n",
"2043 [['SR56']]\n",
"812 [['SR40']]\n",
"1044 [['SR32']]\n",
"790 NaN\n",
"1547 [['SR13']]\n",
"201 NaN\n",
"613 [['SR10']]\n",
"403 [['SR12']]\n",
"Name: sr_reg_203, dtype: object"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.sr_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(659, 104)"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.sr_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1503, 104)"
]
},
"execution_count": 163,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.sr_reg_203.isnull()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**VA_reg_203**"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = (titles_withdrawals.titles.str.findall(va_reg_203_pattern).map(str)\n",
" +' '+titles_withdrawals.title.str.findall(va_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.notes.str.findall(va_reg_203_pattern).copy().map(str)\n",
" +' '+titles_withdrawals.registration_number_not_verified.str.findall(va_reg_203_pattern).copy().map(str))"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: va_reg_203, dtype: object"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: x.strip())"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [] [] [] []\n",
"1226 [] [] [] []\n",
"1417 [] [] [] []\n",
"56 [] [] [] []\n",
"1868 [] [] [] []\n",
"661 [] [] [] []\n",
"303 [] [] [] []\n",
"134 [] [] [] []\n",
"1237 [] [] [] []\n",
"1359 [] [] [] []\n",
"2034 [] [] [] []\n",
"241 [] [] [] []\n",
"2043 [] [] [] []\n",
"812 [] [] [] []\n",
"1044 [] [] [] []\n",
"790 [] [] [] []\n",
"1547 [] [] [] []\n",
"201 [] [] [] []\n",
"613 [] [] [] []\n",
"403 [] [] [] []\n",
"Name: va_reg_203, dtype: object"
]
},
"execution_count": 167,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: x.split())"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 [[], [], [], []]\n",
"1226 [[], [], [], []]\n",
"1417 [[], [], [], []]\n",
"56 [[], [], [], []]\n",
"1868 [[], [], [], []]\n",
"661 [[], [], [], []]\n",
"303 [[], [], [], []]\n",
"134 [[], [], [], []]\n",
"1237 [[], [], [], []]\n",
"1359 [[], [], [], []]\n",
"2034 [[], [], [], []]\n",
"241 [[], [], [], []]\n",
"2043 [[], [], [], []]\n",
"812 [[], [], [], []]\n",
"1044 [[], [], [], []]\n",
"790 [[], [], [], []]\n",
"1547 [[], [], [], []]\n",
"201 [[], [], [], []]\n",
"613 [[], [], [], []]\n",
"403 [[], [], [], []]\n",
"Name: va_reg_203, dtype: object"
]
},
"execution_count": 169,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: [i for i in x if i != '[]'])"
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: va_reg_203, dtype: object"
]
},
"execution_count": 171,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 172,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: [i for i in x if i != 'None'])"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: [i for i in x if i != 'nan'])"
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 []\n",
"1226 []\n",
"1417 []\n",
"56 []\n",
"1868 []\n",
"661 []\n",
"303 []\n",
"134 []\n",
"1237 []\n",
"1359 []\n",
"2034 []\n",
"241 []\n",
"2043 []\n",
"812 []\n",
"1044 []\n",
"790 []\n",
"1547 []\n",
"201 []\n",
"613 []\n",
"403 []\n",
"Name: va_reg_203, dtype: object"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"va_reg_203\"] = titles_withdrawals.va_reg_203.apply(lambda x: np.nan if len(x)==0 else x)"
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"120 NaN\n",
"1226 NaN\n",
"1417 NaN\n",
"56 NaN\n",
"1868 NaN\n",
"661 NaN\n",
"303 NaN\n",
"134 NaN\n",
"1237 NaN\n",
"1359 NaN\n",
"2034 NaN\n",
"241 NaN\n",
"2043 NaN\n",
"812 NaN\n",
"1044 NaN\n",
"790 NaN\n",
"1547 NaN\n",
"201 NaN\n",
"613 NaN\n",
"403 NaN\n",
"Name: va_reg_203, dtype: float64"
]
},
"execution_count": 176,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.va_reg_203.sample(n=20,replace=False,random_state=1)"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2162, 105)"
]
},
"execution_count": 177,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[titles_withdrawals.va_reg_203.isnull()].shape"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 105)"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals[~titles_withdrawals.va_reg_203.isnull()].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyse titles_withdrawals\n",
"\n",
"In this part, we use the registration numbers and self-identifiers extracted from the relevant columns to classify each title by type of work."
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['composition']] 150\n",
"[['sound, recording']] 94\n",
"[['sound, recording'], ['sound, recording']] 2\n",
"[['composition'], ['composition']] 1\n",
"Name: descriptors, dtype: int64"
]
},
"execution_count": 179,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.descriptors.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create and apply filters"
]
},
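{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [],
"source": [
"# A title is treated as a performing-arts work if it has a pre-1978 music or drama registration number,\n",
"# a post-1978 PA registration number, or a matching self-descriptor.\n",
"performing_art_filter = (titles_withdrawals.music_reg_304.notna()\n",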
" |titles_withdrawals.dramatic_reg_304.notna()\n",
" |titles_withdrawals.pa_reg_203.notna()\n",
" |titles_withdrawals.descriptors.map(str).str.contains(\"screenplay|composition|musical|dramatic\",na=False,case=False,regex=True))"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [],
"source": [
"literary_filter = (titles_withdrawals.literary_reg_304.notna()\n",
" |titles_withdrawals.descriptors.map(str).str.contains(\"literary\",case=False,na=False,regex=True)\n",
" |titles_withdrawals.tx_reg_203.notna())"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [],
"source": [
"sound_recording_filter = (titles_withdrawals.sound_recording_reg_304.notna()\n",
" |titles_withdrawals.descriptors.map(str).str.contains(\"sound|recording\",case=False,na=False,regex=True)\n",
" |titles_withdrawals.sr_reg_203.notna())"
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [],
"source": [
"art_filter = (titles_withdrawals.art_reg_304.notna()\n",
" |titles_withdrawals.descriptors.map(str).str.contains(\"artwork\",case=False,na=False,regex=True)\n",
" |titles_withdrawals.va_reg_203.notna())"
]
},
{
"cell_type": "code",
"execution_count": 184,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"is_performing_art\"] = performing_art_filter"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"is_literary\"] = literary_filter"
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"is_sound_recording\"] = sound_recording_filter"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [],
"source": [
"titles_withdrawals[\"is_art\"] = art_filter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Generate true/false counts**"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 1950\n",
"True 212\n",
"Name: is_performing_art, dtype: int64"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.is_performing_art.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 2148\n",
"True 14\n",
"Name: is_literary, dtype: int64"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.is_literary.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True 1583\n",
"False 579\n",
"Name: is_sound_recording, dtype: int64"
]
},
"execution_count": 190,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.is_sound_recording.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 2162\n",
"Name: is_art, dtype: int64"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals.is_art.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Create whole and category dataframes and concatenate them**"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2162, 23)"
]
},
"execution_count": 192,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals_final = titles_withdrawals.loc[:,columns].copy()\n",
"titles_withdrawals_final.shape"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [],
"source": [
"musical_work_filter = ((titles_withdrawals_final.titles.str.contains(\"musical work\",case=False,na=False,regex=True))\n",
" |(titles_withdrawals_final.title.str.contains(\"musical work\",case=False,na=False,regex=True))\n",
" |(titles_withdrawals_final.notes.str.contains(\"musical work\",case=False,na=False,regex=True))\n",
" |(titles_withdrawals_final.registration_number_not_verified.str.contains(\"musical work\",case=False,na=False,regex=True)))"
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 23)"
]
},
"execution_count": 194,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titles_withdrawals_final[musical_work_filter].shape"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(212, 23)"
]
},
"execution_count": 195,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"performing_art_df = titles_withdrawals[performing_art_filter].loc[:,columns].copy()\n",
"performing_art_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(14, 23)"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"literary_df = titles_withdrawals[literary_filter].loc[:,columns].copy()\n",
"literary_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 197,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1583, 23)"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sound_recording_df = titles_withdrawals[sound_recording_filter].loc[:,columns].copy()\n",
"sound_recording_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 23)"
]
},
"execution_count": 198,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"art_df = titles_withdrawals[art_filter].loc[:,columns].copy()\n",
"art_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1809, 23)"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"categorised_incl_duplicates = pd.concat([performing_art_df,literary_df,sound_recording_df,art_df])\n",
"categorised_incl_duplicates.shape"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(122, 23)"
]
},
"execution_count": 200,
"metadata": {},
"output_type": "execute_result"
}
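],
"source": [
"# Titles matched by more than one category appear more than once after the concat; select the repeat occurrences.\n",
"multiple_patterns_df = categorised_incl_duplicates[categorised_incl_duplicates.index.duplicated(keep=\"first\")].copy()\n",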
"multiple_patterns_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(122, 23)"
]
},
"execution_count": 201,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"multiple_patterns_df[(multiple_patterns_df[\"is_performing_art\"]==True)].shape"
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(475, 23)"
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"not_categorised = titles_withdrawals[(titles_withdrawals.is_performing_art==False)\n",
" &(titles_withdrawals.is_literary==False)\n",
" &(titles_withdrawals.is_sound_recording==False)\n",
" &(titles_withdrawals.is_art==False)].loc[:,columns].copy()\n",
"\n",
"not_categorised.shape"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1687, 23)"
]
},
"execution_count": 203,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"categorised = titles_withdrawals[(titles_withdrawals.is_performing_art==True)\n",
" |(titles_withdrawals.is_literary==True)\n",
" |(titles_withdrawals.is_sound_recording==True)\n",
" |(titles_withdrawals.is_art==True)].loc[:,columns].copy()\n",
"\n",
"categorised.shape"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 204,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"categorised.title.count()+not_categorised.title.count()==titles_withdrawals.title.count()"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [
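{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>category</th>\n",
"      <th>number_of_titles</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>performing_art</td>\n",
"      <td>212</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>literary</td>\n",
"      <td>14</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>sound_recording</td>\n",
"      <td>1583</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>art</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>total_include_duplicates</td>\n",
"      <td>1809</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>total_without_duplicates</td>\n",
"      <td>1687</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"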
],
"text/plain": [
" category number_of_titles\n",
"0 performing_art 212\n",
"1 literary 14\n",
"2 sound_recording 1583\n",
"3 art 0\n",
"4 total_include_duplicates 1809\n",
"5 total_without_duplicates 1687"
]
},
"execution_count": 205,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tabulate import tabulate\n",
"\n",
"counts = pd.DataFrame({\"category\": [\"performing_art\",\"literary\",\"sound_recording\",\"art\",\"total_include_duplicates\",\"total_without_duplicates\"],\n",
" \"number_of_titles\": [titles_withdrawals[titles_withdrawals.is_performing_art==True].title.count(),\n",
" titles_withdrawals[titles_withdrawals.is_literary==True].title.count(),\n",
" titles_withdrawals[titles_withdrawals.is_sound_recording==True].title.count(),\n",
" titles_withdrawals[titles_withdrawals.is_art==True].title.count(),\n",
" (titles_withdrawals[titles_withdrawals.is_performing_art==True].title.count()+\n",
" titles_withdrawals[titles_withdrawals.is_literary==True].title.count()+\n",
" titles_withdrawals[titles_withdrawals.is_sound_recording==True].title.count()+\n",
" titles_withdrawals[titles_withdrawals.is_art==True].title.count()),\n",
" (titles_withdrawals[(titles_withdrawals.is_performing_art==True)\n",
" |(titles_withdrawals.is_literary==True)\n",
" |(titles_withdrawals.is_sound_recording==True)\n",
" |(titles_withdrawals.is_art==True)].title.count())]})\n",
"\n",
"counts\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"78.02960222016651"
]
},
"execution_count": 206,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Percentage of all titles assigned to at least one category\n",
"(categorised.title.count()/titles_withdrawals.title.count())*100"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [],
"source": [
"reg_304 = titles_withdrawals_final[(titles_withdrawals_final.music_reg_304.notna())\n",
" |(titles_withdrawals_final.dramatic_reg_304.notna())\n",
" |(titles_withdrawals_final.literary_reg_304.notna())\n",
" |(titles_withdrawals_final.sound_recording_reg_304.notna())\n",
" |(titles_withdrawals_final.art_reg_304.notna())].copy()"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [],
"source": [
"reg_203 = titles_withdrawals_final[(titles_withdrawals_final.tx_reg_203.notna())\n",
" |(titles_withdrawals_final.sr_reg_203.notna())\n",
" |(titles_withdrawals_final.pa_reg_203.notna())\n",
" |(titles_withdrawals_final.va_reg_203.notna())].copy()"
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [],
"source": [
"word_descriptors = titles_withdrawals_final[titles_withdrawals_final.descriptors.notna()]"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [
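{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>type</th>\n",
"      <th>number of titles</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>pre_1978_registration_number</td>\n",
"      <td>69</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>post_1978_registration_number</td>\n",
"      <td>1538</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>identifiers</td>\n",
"      <td>247</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"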
],
"text/plain": [
" type number of titles\n",
"0 pre_1978_registration_number 69\n",
"1 post_1978_registration_number 1538\n",
"2 identifiers 247"
]
},
"execution_count": 210,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type_counts = pd.DataFrame({'type':['pre_1978_registration_number','post_1978_registration_number','identifiers'],\n",
" 'number of titles':[reg_304.title.count(),reg_203.title.count(),word_descriptors.title.count()]})\n",
"\n",
"type_counts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Divide into counters and withdrawals ##\n",
"\n",
"At present the dataframe mixes titles from two kinds of record: counter-notices and withdrawals. Because the two are different, we split them and analyse each separately."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Counters"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {},
"outputs": [],
"source": [
"# Counter-notices are identified by their wording in the notes column (case-insensitive)\n",
"counters = titles_withdrawals[(titles_withdrawals.notes.str.contains(\"count notice|counter notice|counter-notice|counter to|contesting notice of termination\",na=False,case=False,regex=True))].copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Build dataframe of 203 counter-notice titles**"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1720, 109)"
]
},
"execution_count": 243,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keep counter-notices whose notes mention both \"203\" and \"termination\"\n",
"counters_203 = counters[(counters.notes.str.contains(\"203\",case=False,na=False,regex=True)\n",
" &counters.notes.str.contains(\"termination\",case=False,na=False,regex=True))]\n",
"\n",
"counters_203.shape"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"134"
]
},
"execution_count": 244,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203.document_number.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 245,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1277, 109)"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203[((counters_203.is_performing_art==True)\n",
" |(counters_203.is_literary==True)\n",
" |(counters_203.is_sound_recording==True)\n",
" |(counters_203.is_art==True))].shape"
]
},
{
"cell_type": "code",
"execution_count": 246,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7424418604651163"
]
},
"execution_count": 246,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Share of s 203 counter-notice titles that fall into at least one category (1277 of 1720)\n",
"1277/1720"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1399, 109)"
]
},
"execution_count": 247,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203_pa = counters_203[counters_203.is_performing_art==True].copy()\n",
"counters_203_tx = counters_203[counters_203.is_literary==True].copy()\n",
"counters_203_sr = counters_203[counters_203.is_sound_recording==True].copy()\n",
"counters_203_va = counters_203[counters_203.is_art==True].copy()\n",
"counters_203_incl_duplicates = pd.concat([counters_203_pa,counters_203_tx,counters_203_sr,counters_203_va])\n",
"counters_203_incl_duplicates.shape\n"
]
},
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(122, 109)"
]
},
"execution_count": 248,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_multiple_patterns = counters_203_incl_duplicates[counters_203_incl_duplicates.index.duplicated(keep=\"first\")]\n",
"counters_multiple_patterns.shape"
]
},
{
"cell_type": "code",
"execution_count": 249,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(122, 109)"
]
},
"execution_count": 249,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_multiple_patterns[counters_multiple_patterns.is_sound_recording].shape"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(122, 109)"
]
},
"execution_count": 250,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_multiple_patterns[counters_multiple_patterns.is_performing_art].shape"
]
},
{
"cell_type": "code",
"execution_count": 251,
"metadata": {},
"outputs": [
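{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>category</th>\n",
"      <th>number_of_titles</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>performing_art</td>\n",
"      <td>149</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>literary</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>sound_recording</td>\n",
"      <td>1249</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>art</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>total_include_duplicates</td>\n",
"      <td>1399</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>total_without_duplicates</td>\n",
"      <td>1277</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"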
],
"text/plain": [
" category number_of_titles\n",
"0 performing_art 149\n",
"1 literary 1\n",
"2 sound_recording 1249\n",
"3 art 0\n",
"4 total_include_duplicates 1399\n",
"5 total_without_duplicates 1277"
]
},
"execution_count": 251,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tabulate import tabulate\n",
"\n",
"counter_203_counts = pd.DataFrame({\"category\": [\"performing_art\",\"literary\",\"sound_recording\",\"art\",\"total_include_duplicates\",\"total_without_duplicates\"],\n",
" \"number_of_titles\": [counters_203[counters_203.is_performing_art==True].title.count(),\n",
" counters_203[counters_203.is_literary==True].title.count(),\n",
" counters_203[counters_203.is_sound_recording==True].title.count(),\n",
" counters_203[counters_203.is_art==True].title.count(),\n",
" (counters_203[counters_203.is_performing_art==True].title.count()+\n",
" counters_203[counters_203.is_literary==True].title.count()+\n",
" counters_203[counters_203.is_sound_recording==True].title.count()+\n",
" counters_203[counters_203.is_art==True].title.count()),\n",
" (counters_203[(counters_203.is_performing_art==True)\n",
" |(counters_203.is_literary==True)\n",
" |(counters_203.is_sound_recording==True)\n",
" |(counters_203.is_art==True)].title.count())]})\n",
"\n",
"counter_203_counts\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" document_number date_of_recordation year_of_recordation \\\n",
"474 V9944D062 2017-10-10 2017 \n",
"475 V9944D062 2017-10-10 2017 \n",
"476 V9944D062 2017-10-10 2017 \n",
"1027 V9939D513 2016-10-08 2016 \n",
"1061 V9939D513 2016-10-08 2016 \n",
"1091 V9940D902 2016-10-25 2016 \n",
"1092 V9940D902 2016-10-25 2016 \n",
"1093 V9940D902 2016-10-25 2016 \n",
"1094 V9940D902 2016-10-25 2016 \n",
"1095 V9940D902 2016-10-25 2016 \n",
"1096 V9940D902 2016-10-25 2016 \n",
"1097 V9940D902 2016-10-25 2016 \n",
"1098 V9940D902 2016-10-25 2016 \n",
"1099 V9940D902 2016-10-25 2016 \n",
"1100 V9940D902 2016-10-25 2016 \n",
"1101 V9940D902 2016-10-25 2016 \n",
"1343 V9935D877 2016-05-13 2016 \n",
"1585 V9940D572 2016-10-08 2016 \n",
"1586 V9940D572 2016-10-08 2016 \n",
"1587 V9940D572 2016-10-08 2016 \n",
"1588 V9940D572 2016-10-08 2016 \n",
"1589 V9940D572 2016-10-08 2016 \n",
"1590 V9940D572 2016-10-08 2016 \n",
"1591 V9940D572 2016-10-08 2016 \n",
"1592 V9940D572 2016-10-08 2016 \n",
"1595 V9958D042 2018-03-30 2018 \n",
"1596 V9958D042 2018-03-30 2018 \n",
"\n",
" registration_number_not_verified \\\n",
"474 \n",
"475 \n",
"476 \n",
"1027 \n",
"1061 \n",
"1091 PA375972 \n",
"1092 PA375972 \n",
"1093 PA375972 \n",
"1094 PA375972 \n",
"1095 PA375972 \n",
"1096 PA375972 \n",
"1097 PA375972 \n",
"1098 PA375972 \n",
"1099 PA375972 \n",
"1100 PA375972 \n",
"1101 PA375972 \n",
"1343 PA74417 \n",
"1585 \n",
"1586 \n",
"1587 \n",
"1588 \n",
"1589 \n",
"1590 \n",
"1591 \n",
"1592 \n",
"1595 PAu000696534 \n",
"1596 PAu000696534 \n",
"\n",
" party_1 \\\n",
"474 Adams and Reese, LLP \n",
"475 Adams and Reese, LLP \n",
"476 Adams and Reese, LLP \n",
"1027 Chrysalis Records, Inc., predecessor to EMI Music \n",
"1061 Chrysalis Records, Inc., predecessor to EMI Music \n",
"1091 Capitol Records, LLC \n",
"1092 Capitol Records, LLC \n",
"1093 Capitol Records, LLC \n",
"1094 Capitol Records, LLC \n",
"1095 Capitol Records, LLC \n",
"1096 Capitol Records, LLC \n",
"1097 Capitol Records, LLC \n",
"1098 Capitol Records, LLC \n",
"1099 Capitol Records, LLC \n",
"1100 Capitol Records, LLC \n",
"1101 Capitol Records, LLC \n",
"1343 UMG Recordings, Inc., successor in interest to Casablanca Record, & Filmworks, Inc. \n",
"1585 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1586 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1587 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1588 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1589 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1590 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1591 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1592 Universal Music-MGB Songs, sucessor in interest to Attic Records, Ltd. \n",
"1595 Columbia Pictures Industries, Inc., Thomas Lee Holland, & T.H. Productions, Inc. \n",
"1596 Columbia Pictures Industries, Inc., Thomas Lee Holland, & T.H. Productions, Inc. \n",
"\n",
" party_2 \\\n",
"474 None \n",
"475 None \n",
"476 None \n",
"1027 None \n",
"1061 None \n",
"1091 None \n",
"1092 None \n",
"1093 None \n",
"1094 None \n",
"1095 None \n",
"1096 None \n",
"1097 None \n",
"1098 None \n",
"1099 None \n",
"1100 None \n",
"1101 None \n",
"1343 None \n",
"1585 None \n",
"1586 None \n",
"1587 None \n",
"1588 None \n",
"1589 None \n",
"1590 None \n",
"1591 None \n",
"1592 None \n",
"1595 None \n",
"1596 None \n",
"\n",
" title \\\n",
"474 Cathy’s clown & 2 other titles. \n",
"475 Cathy’s clown & 2 other titles. \n",
"476 Cathy’s clown & 2 other titles. \n",
"1027 Huey Lewis and the news & 74 other titles; musical compositions. \n",
"1061 Huey Lewis and the news & 74 other titles; musical compositions. \n",
"1091 Small world & 10 other titles / Reg. PA375972. \n",
"1092 Small world & 10 other titles / Reg. PA375972. \n",
"1093 Small world & 10 other titles / Reg. PA375972. \n",
"1094 Small world & 10 other titles / Reg. PA375972. \n",
"1095 Small world & 10 other titles / Reg. PA375972. \n",
"1096 Small world & 10 other titles / Reg. PA375972. \n",
"1097 Small world & 10 other titles / Reg. PA375972. \n",
"1098 Small world & 10 other titles / Reg. PA375972. \n",
"1099 Small world & 10 other titles / Reg. PA375972. \n",
"1100 Small world & 10 other titles / Reg. PA375972. \n",
"1101 Small world & 10 other titles / Reg. PA375972. \n",
"1343 Funkytown / Reg. PA74417; Reg. PAu151074. \n",
"1585 Movin’ on & 7 other titles. \n",
"1586 Movin’ on & 7 other titles. \n",
"1587 Movin’ on & 7 other titles. \n",
"1588 Movin’ on & 7 other titles. \n",
"1589 Movin’ on & 7 other titles. \n",
"1590 Movin’ on & 7 other titles. \n",
"1591 Movin’ on & 7 other titles. \n",
"1592 Movin’ on & 7 other titles. \n",
"1595 Night shadows & 1 other title; screenplay / Reg. PAu696534. \n",
"1596 Night shadows & 1 other title; screenplay / Reg. PAu696534. \n",
"\n",
" notes \\\n",
"474 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Don Everly filed under V9925 D363 P 1-3, recorded 6Sep16. \n",
"475 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Don Everly filed under V9925 D363 P 1-3, recorded 6Sep16. \n",
"476 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Don Everly filed under V9925 D363 P 1-3, recorded 6Sep16. \n",
"1027 Counter notice to Notice of Termination under 17 U.S.C. 203 filed under V3612 D265, V3612 D267, V3612 D266 & V3612 D264 on behalf of Huey Lewis, John Colla, Bill Gibson & Sean Hopper. \n",
"1061 Counter notice to Notice of Termination under 17 U.S.C. 203 filed under V3612 D265, V3612 D267, V3612 D266 & V3612 D264 on behalf of Huey Lewis, John Colla, Bill Gibson & Sean Hopper. \n",
"1091 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1092 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1093 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1094 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1095 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1096 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1097 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1098 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1099 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1100 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1101 Counter notice to Notice of Termination under 17 U.S.C. 203 on behalf of Huey Lewis, John Colla, Bill Gibson and Sean Hopper filed under V3624 D215, recorded on 28Jan14. \n",
"1343 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Steven Greenberg filed under V3547 D065, recorded on 5Dec06. \n",
"1585 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1586 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1587 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1588 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1589 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1590 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1591 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1592 Counter notice to Notice of termination under 17 USC Section 203; on behalf of Richard Emmitt, Gil Moore, and Michael Levine filed under V3620 D166, recorded on 21Oct13. \n",
"1595 Counter notice to Notice of Termination under 17 USC Section 203; on behalf of Columbia Pictures Industries, Inc., filed under V3582 D370. \n",
"1596 Counter notice to Notice of Termination under 17 USC Section 203; on behalf of Columbia Pictures Industries, Inc., filed under V3582 D370. \n",
"\n",
" titles tx_reg_203 \\\n",
"474 001 Cathy's clown / Reg. EP139961. NaN \n",
"475 002 Sigh, cry, almost die / Reg. EP144148. NaN \n",
"476 003 That's just too much / Reg. EP147820. NaN \n",
"1027 015 Picture this / Reg. PA128089. NaN \n",
"1061 049 Sports / Reg. PA267039. NaN \n",
"1091 001 Small world & 10. NaN \n",
"1092 002 Small world (part one) NaN \n",
"1093 003 Old Antone's. NaN \n",
"1094 004 Perfect world. NaN \n",
"1095 005 Bobo tempo. NaN \n",
"1096 006 Small world. NaN \n",
"1097 007 Walking with kid. NaN \n",
"1098 008 World to me. NaN \n",
"1099 009 Better be true. NaN \n",
"1100 010 Give me the keys (and I'll drive you crazy) NaN \n",
"1101 011 Slammin. NaN \n",
"1343 Funkytown / Reg. PA74417; Reg. PAu151074. NaN \n",
"1585 001 Movin' on / Reg. PAu141091; Reg. PA66672. NaN \n",
"1586 002 Lay it on the line / Reg. PAu141092; Reg. PA294865. NaN \n",
"1587 003 Young enough to cry / Reg. PAu141090; Reg. PA66673. NaN \n",
"1588 004 American girls / Reg. PAu141089; Reg. PA66674. NaN \n",
"1589 005 Just a game / Reg. PAu141085. NaN \n",
"1590 006 Fantasy serenade / Reg. PAu141089; Reg. PAu66677. NaN \n",
"1591 007 Hold on / Reg. PAu141086; Reg. PA294858. NaN \n",
"1592 008 Suitcase blues / Reg. PAu141088. NaN \n",
"1595 001 Night shadows. NaN \n",
"1596 002 Fright night. NaN \n",
"\n",
" ... music_reg_304 dramatic_reg_304 sound_recording_reg_304 \\\n",
"474 ... [['EP139961']] NaN NaN \n",
"475 ... [['EP144148']] NaN NaN \n",
"476 ... [['EP147820']] NaN NaN \n",
"1027 ... NaN NaN NaN \n",
"1061 ... NaN NaN NaN \n",
"1091 ... NaN NaN NaN \n",
"1092 ... NaN NaN NaN \n",
"1093 ... NaN NaN NaN \n",
"1094 ... NaN NaN NaN \n",
"1095 ... NaN NaN NaN \n",
"1096 ... NaN NaN NaN \n",
"1097 ... NaN NaN NaN \n",
"1098 ... NaN NaN NaN \n",
"1099 ... NaN NaN NaN \n",
"1100 ... NaN NaN NaN \n",
"1101 ... NaN NaN NaN \n",
"1343 ... NaN NaN NaN \n",
"1585 ... NaN NaN NaN \n",
"1586 ... NaN NaN NaN \n",
"1587 ... NaN NaN NaN \n",
"1588 ... NaN NaN NaN \n",
"1589 ... NaN NaN NaN \n",
"1590 ... NaN NaN NaN \n",
"1591 ... NaN NaN NaN \n",
"1592 ... NaN NaN NaN \n",
"1595 ... NaN NaN NaN \n",
"1596 ... NaN NaN NaN \n",
"\n",
" literary_reg_304 art_reg_304 descriptors is_performing_art \\\n",
"474 NaN NaN NaN True \n",
"475 NaN NaN NaN True \n",
"476 NaN NaN NaN True \n",
"1027 NaN NaN [['composition']] True \n",
"1061 NaN NaN [['composition']] True \n",
"1091 NaN NaN NaN True \n",
"1092 NaN NaN NaN True \n",
"1093 NaN NaN NaN True \n",
"1094 NaN NaN NaN True \n",
"1095 NaN NaN NaN True \n",
"1096 NaN NaN NaN True \n",
"1097 NaN NaN NaN True \n",
"1098 NaN NaN NaN True \n",
"1099 NaN NaN NaN True \n",
"1100 NaN NaN NaN True \n",
"1101 NaN NaN NaN True \n",
"1343 NaN NaN NaN True \n",
"1585 NaN NaN NaN True \n",
"1586 NaN NaN NaN True \n",
"1587 NaN NaN NaN True \n",
"1588 NaN NaN NaN True \n",
"1589 NaN NaN NaN True \n",
"1590 NaN NaN NaN True \n",
"1591 NaN NaN NaN True \n",
"1592 NaN NaN NaN True \n",
"1595 NaN NaN NaN True \n",
"1596 NaN NaN NaN True \n",
"\n",
" is_literary is_sound_recording is_art \n",
"474 False False False \n",
"475 False False False \n",
"476 False False False \n",
"1027 False False False \n",
"1061 False False False \n",
"1091 False False False \n",
"1092 False False False \n",
"1093 False False False \n",
"1094 False False False \n",
"1095 False False False \n",
"1096 False False False \n",
"1097 False False False \n",
"1098 False False False \n",
"1099 False False False \n",
"1100 False False False \n",
"1101 False False False \n",
"1343 False False False \n",
"1585 False False False \n",
"1586 False False False \n",
"1587 False False False \n",
"1588 False False False \n",
"1589 False False False \n",
"1590 False False False \n",
"1591 False False False \n",
"1592 False False False \n",
"1595 False False False \n",
"1596 False False False \n",
"\n",
"[27 rows x 23 columns]"
]
},
"execution_count": 252,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203_pa[counters_203_pa.is_sound_recording==False].loc[:,columns]"
]
},
{
"cell_type": "code",
"execution_count": 253,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" document_number date_of_recordation year_of_recordation \\\n",
"1088 V9914D199 2016-02-03 2016 \n",
"\n",
" registration_number_not_verified party_1 \\\n",
"1088 A917154 Columbia Pictures Industries, Inc. \n",
"\n",
" party_2 title \\\n",
"1088 None Kramer versus Kramer; book / Avery Corman; Reg. A917154. \n",
"\n",
" notes \\\n",
"1088 Statement of Columbia Pictures Industries, Inc., contesting notice of termination: \"Kramer versus Kramer\" pursuant to 17 U.S.C. 203 (a) \n",
"\n",
" titles tx_reg_203 \\\n",
"1088 Kramer versus Kramer; book / Avery Corman; Reg. A917154. NaN \n",
"\n",
" ... music_reg_304 dramatic_reg_304 sound_recording_reg_304 \\\n",
"1088 ... NaN NaN NaN \n",
"\n",
" literary_reg_304 art_reg_304 descriptors \\\n",
"1088 [['A917154'], ['A917154'], ['A917154']] NaN NaN \n",
"\n",
" is_performing_art is_literary is_sound_recording is_art \n",
"1088 False True False False \n",
"\n",
"[1 rows x 23 columns]"
]
},
"execution_count": 253,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203[counters_203.is_literary==True].loc[:,columns]"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [],
"source": [
"counters_203_notcategorised = counters_203[((counters_203.is_performing_art==False)\n",
" &(counters_203.is_literary==False)\n",
" &(counters_203.is_sound_recording==False)\n",
" &(counters_203.is_art==False))].copy()"
]
},
{
"cell_type": "code",
"execution_count": 340,
"metadata": {},
"outputs": [],
"source": [
"counters_203_notcategorised.party_1.value_counts().to_excel(\"counters_203_notcategorised_party1.xlsx\")"
]
},
{
"cell_type": "code",
"execution_count": 255,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(443, 109)"
]
},
"execution_count": 255,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203_notcategorised.shape"
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Capitol, LLC f.k.a. Capitol Records, Inc., successor in interest to Liberty/United Records, Liberty Records, Inc., a division of Capitol Records, Inc. 58\n",
"Capitol Records, LLC, f.k.a. Capitol Records, Inc. 44\n",
"A & M Records, Ltd. & UMG Recordings, Inc. the successor to A & M Records, Inc. 35\n",
"Capitol Records, LLC, successor in interest to Capitol Records, Inc. 31\n",
"UMG Recordings, Inc., successor to MCA Records, Inc. 28\n",
"UMG Recordings, Inc., successor in interest to AVI Entertainment Group, Inc. & Thomas Associates, Inc. 25\n",
"UMG Recordings, Inc., successor to Casablanca Record, & FilmWorks, Inc. 22\n",
"UMG Recordings, Inc., the successors in interest to MCA Records, Inc. 21\n",
"Polydor, Ltd. 21\n",
"UMG Recordings, Inc., the successor to A & M Records, Inc. 11\n",
"Polydor, Ltd., & Virgin Records, Ltd., successors to E.G. Records, Ltd. 10\n",
"UMG Recordings, Inc., successor to Capricorn Records, Inc. 9\n",
"UMG Recordings, Inc., successor in title to De-Lite Recorded Sound Corporation 9\n",
"Verve Music Group, a division of UMG Recordings, Inc., as successor-in-interest to GRP Records, Inc. 8\n",
"UMG Recordings, Inc. f.k.a. PolyGram Records, Inc., the successor-in-interest to Phonogram, Inc. 7\n",
"Verve Music Group, a division of UMG Recordings, Inc., as successor-in-interest to GRP Records, Inc. . 7\n",
"UMG Recordings, Inc., successor in interest to A & M Records, Inc. 7\n",
"Capitol, LLC, fka Capitol Records, Inc., successor-in-interest to International Record Syndicate, Inc., & EMI America Records, a division of Capitol Records, Inc. 6\n",
"Virgin Records, Ltd. 6\n",
"UMG Recordings, Inc., successor to David Geffen Company. 5\n",
"Capitol Records, LLC, successor in interest to Liberty/United Records, Inc. 4\n",
"UMG Recordings, Inc., as successor in interest to Casablanca Records & Filmworks, Inc. 4\n",
"UMG Recordings, Inc., successor-in-interest to Motown Record Corporation. 4\n",
"UMG Recordings, Inc., successor-in-interest to Motown Record Corp. 4\n",
"A & M Records, Ltd. & UMG Recordings, Inc., successor-in-interest to A & M Records, Inc. 4\n",
"UMG, Recordings, Inc., successor-in-interest to A & M Records, Inc. 3\n",
"UMG Recordings, Inc. , successor-in-interest to A & M Records, Inc. 3\n",
"Capitol Records, LLC, successor-in-interest to Chrysalis Records, Inc., & UMG Recordings, Inc., successor-in-interest to Casablanca Record, & Filmworks, Inc. 3\n",
"Universal-Island Records, Ltd. f.k.a. Island Records, Ltd. 3\n",
"Capitol Records, LLC, successor to Chrysalis Records, Inc. 3\n",
"Capitol Records, LLC, f.k.a Capitol Records, Inc., successor to Chrysalis Records, Inc. 3\n",
"UMG Recordings, Inc., successor to Island Records, Inc., & David Geffen Company. 3\n",
"UMG Recordings, Inc., successor-in-interest to MCA Records, Inc. 3\n",
"UMG Recordings, Inc., the successor in interest to MCA Records, Inc. 2\n",
"Capitol Records, LLC, successor to Capital Records, Inc. 2\n",
"Capitol Records, LLC, successor-in-interest to International Record Syndicate, Inc. 2\n",
"UMG Recordings, Inc., successor to Island Records, Inc. . 2\n",
"UMG Recordings, Inc., successor in interest to David Geffen Company. 2\n",
"UMG Recordings, Inc., successor-in-interest to Casablanca Records, Inc. & Casablanca Record & FilmWorks, Inc. 2\n",
"UMG Recordings, Inc., fka Polygram Records, Inc., Universal Music, LLC, successor to Polydor, K.K. 2\n",
"Capitol Records, LLC, formerly known as Capitol Records, Inc. 2\n",
"Capitol Records, LLC, successor in interest to International Record Syndicate, Inc. 2\n",
"Capitol Records, LLC, successor-in-interest to Enigma Entertainment Corporation. 2\n",
"UMG Recordings, Inc., successor-in-interest to Motown Record Company LP. 1\n",
"UMG Recordings, Inc., successor to ABC Records, Inc. 1\n",
"A&M Records, Ltd., UMG Recordings, Inc., successor to A&M Records, Inc. 1\n",
"Capital, LLC, f.k.a. Capitol Records, Inc., successor-in-interest to EMI America Records, a division of Capitol Records, Inc. 1\n",
"UMG Recordings, Inc., successor to United Artists Music and Record Group, Inc. 1\n",
"UMG Recordings, Inc., successor-in-interest to Island Records, Inc., &. 1\n",
"UMG Recordings, Inc., successor to A&M Records, Inc. 1\n",
"UMG Recordings, Inc., successor-in-interest to A&M Records of Canada, Ltd. & A & M Records, Inc. 1\n",
"Capitol Records, LLC f.k.a. Capitol Records, Inc. 1\n",
"Name: party_1, dtype: int64"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203_notcategorised.party_1.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [],
"source": [
"counters_203_final = counters_203.loc[:,columns].copy()"
]
},
{
"cell_type": "code",
"execution_count": 258,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['document_number',\n",
" 'date_of_recordation',\n",
" 'year_of_recordation',\n",
" 'registration_number_not_verified',\n",
" 'party_1',\n",
" 'party_2',\n",
" 'title',\n",
" 'notes',\n",
" 'titles',\n",
" 'tx_reg_203',\n",
" 'sr_reg_203',\n",
" 'pa_reg_203',\n",
" 'va_reg_203',\n",
" 'music_reg_304',\n",
" 'dramatic_reg_304',\n",
" 'sound_recording_reg_304',\n",
" 'literary_reg_304',\n",
" 'art_reg_304',\n",
" 'descriptors',\n",
" 'is_performing_art',\n",
" 'is_literary',\n",
" 'is_sound_recording',\n",
" 'is_art',\n",
" 'notice_type']"
]
},
"execution_count": 258,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns=columns+['notice_type']\n",
"columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, and for the remaining three counter-notice and withdrawal dataframes, we add a `notice_type` column recording the kind of notice each row comes from. We will later concatenate these dataframes with the main 203 and 304 dataframes, so we need to be able to tell whether a title relates to a termination notice, a counter-notice, or a withdrawal notice."
]
},
{
"cell_type": "code",
"execution_count": 259,
"metadata": {},
"outputs": [],
"source": [
"counters_203_final['notice_type']='counter_notice'"
]
},
{
"cell_type": "code",
"execution_count": 260,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1720, 24)"
]
},
"execution_count": 260,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_203_final=counters_203_final.loc[:,columns]\n",
"counters_203_final.shape"
]
},
{
"cell_type": "code",
"execution_count": 261,
"metadata": {},
"outputs": [],
"source": [
"counters_203_final.to_json('counters_203_final.json')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Build dataframe of 304 counter-notice titles** "
]
},
{
"cell_type": "code",
"execution_count": 262,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(65, 109)"
]
},
"execution_count": 262,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304 = counters[(counters.notes.str.contains(\"304\",case=False,na=False,regex=True)\n",
" &counters.notes.str.contains(\"termination\",case=False,na=False,regex=True))].copy()\n",
"\n",
"counters_304.shape"
]
},
{
"cell_type": "code",
"execution_count": 263,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"13"
]
},
"execution_count": 263,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304.document_number.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(65, 109)"
]
},
"execution_count": 264,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304_pa = counters_304[counters_304.is_performing_art==True].copy()\n",
"counters_304_tx = counters_304[counters_304.is_literary==True].copy()\n",
"counters_304_sr = counters_304[counters_304.is_sound_recording==True].copy()\n",
"counters_304_va = counters_304[counters_304.is_art==True].copy()\n",
"counters_304_incl_duplicates = pd.concat([counters_304_pa,counters_304_tx,counters_304_sr,counters_304_va])\n",
"counters_304_incl_duplicates.shape\n"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" category number_of_titles\n",
"0 performing_art 63\n",
"1 literary 2\n",
"2 sound_recording 0\n",
"3 art 0\n",
"4 total_include_duplicates 65\n",
"5 total_without_duplicates 65"
]
},
"execution_count": 265,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counter_304_counts = pd.DataFrame({\"category\": [\"performing_art\",\"literary\",\"sound_recording\",\"art\",\"total_include_duplicates\",\"total_without_duplicates\"],\n",
" \"number_of_titles\": [counters_304[counters_304.is_performing_art==True].title.count(),\n",
" counters_304[counters_304.is_literary==True].title.count(),\n",
" counters_304[counters_304.is_sound_recording==True].title.count(),\n",
" counters_304[counters_304.is_art==True].title.count(),\n",
" (counters_304[counters_304.is_performing_art==True].title.count()+\n",
" counters_304[counters_304.is_literary==True].title.count()+\n",
" counters_304[counters_304.is_sound_recording==True].title.count()+\n",
" counters_304[counters_304.is_art==True].title.count()),\n",
" (counters_304[(counters_304.is_performing_art==True)\n",
" |(counters_304.is_literary==True)\n",
" |(counters_304.is_sound_recording==True)\n",
" |(counters_304.is_art==True)].title.count())]})\n",
"\n",
"counter_304_counts\n"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(63, 109)"
]
},
"execution_count": 266,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304[counters_304.is_performing_art==True].shape"
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2, 109)"
]
},
"execution_count": 267,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304[counters_304.is_literary==True].shape"
]
},
{
"cell_type": "code",
"execution_count": 268,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" \"\"\"Entry point for launching an IPython kernel.\n"
]
}
],
"source": [
"counters_304['notice_type'] = 'counter_notice'"
]
},
{
"cell_type": "code",
"execution_count": 269,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"counter_notice 65\n",
"Name: notice_type, dtype: int64"
]
},
"execution_count": 269,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters_304_final = counters_304.loc[:,columns]\n",
"counters_304_final.notice_type.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 270,
"metadata": {},
"outputs": [],
"source": [
"counters_304_final.to_json('counters_304_final.json')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Withdrawals"
]
},
{
"cell_type": "code",
"execution_count": 271,
"metadata": {},
"outputs": [],
"source": [
"withdrawals = titles_withdrawals[(titles_withdrawals.notes.str.contains(\"withdrawal|withdraw|revocation\",na=False,case=False,regex=True))].copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Build dataframe of 304 withdrawal/revocation notice titles** "
]
},
{
"cell_type": "code",
"execution_count": 272,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, 109)"
]
},
"execution_count": 272,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_304 = withdrawals[(withdrawals.notes.str.contains(\"304\",case=False,na=False,regex=True)\n",
" &withdrawals.notes.str.contains(\"termination\",case=False,na=False,regex=True))].copy()\n",
"\n",
"withdrawals_304.shape"
]
},
{
"cell_type": "code",
"execution_count": 273,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" \"\"\"Entry point for launching an IPython kernel.\n"
]
}
],
"source": [
"withdrawals_304['notice_type'] = 'withdrawal_notice'"
]
},
{
"cell_type": "code",
"execution_count": 274,
"metadata": {},
"outputs": [],
"source": [
"withdrawals_304_final=withdrawals_304.loc[:,columns]"
]
},
{
"cell_type": "code",
"execution_count": 275,
"metadata": {},
"outputs": [],
"source": [
"withdrawals_304_final.to_json('withdrawals_304_final.json')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Build dataframe of 203 withdrawal/revocation notice titles** "
]
},
{
"cell_type": "code",
"execution_count": 276,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(376, 109)"
]
},
"execution_count": 276,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203 = withdrawals[(withdrawals.notes.str.contains(\"203\",case=False,na=False,regex=True)\n",
" &withdrawals.notes.str.contains(\"termination\",case=False,na=False,regex=True))].copy()\n",
"\n",
"withdrawals_203.shape"
]
},
{
"cell_type": "code",
"execution_count": 277,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"29"
]
},
"execution_count": 277,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203.document_number.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 278,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(345, 109)"
]
},
"execution_count": 278,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203_pa = withdrawals_203[withdrawals_203.is_performing_art==True].copy()\n",
"withdrawals_203_tx = withdrawals_203[withdrawals_203.is_literary==True].copy()\n",
"withdrawals_203_sr = withdrawals_203[withdrawals_203.is_sound_recording==True].copy()\n",
"withdrawals_203_va = withdrawals_203[withdrawals_203.is_art==True].copy()\n",
"withdrawals_203_incl_duplicates = pd.concat([withdrawals_203_pa,withdrawals_203_tx,withdrawals_203_sr,withdrawals_203_va])\n",
"withdrawals_203_incl_duplicates.shape\n"
]
},
{
"cell_type": "code",
"execution_count": 279,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(345, 109)"
]
},
"execution_count": 279,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203[((withdrawals_203.is_performing_art==True)\n",
" |(withdrawals_203.is_literary==True)\n",
" |(withdrawals_203.is_sound_recording==True)\n",
" |(withdrawals_203.is_art==True))].shape"
]
},
{
"cell_type": "code",
"execution_count": 280,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" category number_of_titles\n",
"0 performing_art 0\n",
"1 literary 11\n",
"2 sound_recording 334\n",
"3 art 0\n",
"4 total_include_duplicates 345\n",
"5 total_without_duplicates 345"
]
},
"execution_count": 280,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203_counts = pd.DataFrame({\"category\": [\"performing_art\",\"literary\",\"sound_recording\",\"art\",\"total_include_duplicates\",\"total_without_duplicates\"],\n",
" \"number_of_titles\": [withdrawals_203[withdrawals_203.is_performing_art==True].title.count(),\n",
" withdrawals_203[withdrawals_203.is_literary==True].title.count(),\n",
" withdrawals_203[withdrawals_203.is_sound_recording==True].title.count(),\n",
" withdrawals_203[withdrawals_203.is_art==True].title.count(),\n",
" (withdrawals_203[withdrawals_203.is_performing_art==True].title.count()+\n",
" withdrawals_203[withdrawals_203.is_literary==True].title.count()+\n",
" withdrawals_203[withdrawals_203.is_sound_recording==True].title.count()+\n",
" withdrawals_203[withdrawals_203.is_art==True].title.count()),\n",
" (withdrawals_203[(withdrawals_203.is_performing_art==True)\n",
" |(withdrawals_203.is_literary==True)\n",
" |(withdrawals_203.is_sound_recording==True)\n",
" |(withdrawals_203.is_art==True)].title.count())]})\n",
"\n",
"withdrawals_203_counts\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 281,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8882978723404256"
]
},
"execution_count": 281,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
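"# Share of 203 withdrawal titles categorised as sound recordings (334 of 376)\n",
"(withdrawals_203.is_sound_recording==True).sum()/len(withdrawals_203)"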
]
},
{
"cell_type": "code",
"execution_count": 282,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(31, 109)"
]
},
"execution_count": 282,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203_notcategorised = withdrawals_203[((withdrawals_203.is_performing_art==False)\n",
" &(withdrawals_203.is_literary==False)\n",
" &(withdrawals_203.is_sound_recording==False)\n",
" &(withdrawals_203.is_art==False))].copy()\n",
"\n",
"withdrawals_203_notcategorised.shape"
]
},
{
"cell_type": "code",
"execution_count": 283,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"David Coverdale, Geffen Records, Mirage Records, Inc., Unidisc Music, Inc., United Artists Music and Records Group, Inc., Shackelford Attorneys and Counselors & Universal Music Group. 10\n",
"Robert E. Bell, Ronald Bell, Dennis Ronald Thomas, George Melvin Brown & August Smith Williams, heir of the Estate of Charles Claydes Smith. 9\n",
"Tom Petty 7\n",
"Anita Ward Richardson p.k.a. Anita Ward 2\n",
"Jill Croston p.k.a. Lacy J. Dalton & Capitol Records, LLC f.k.a. Capitol Records, Inc. 2\n",
"Pat Travers. 1\n",
"Name: party_1, dtype: int64"
]
},
"execution_count": 283,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203_notcategorised.party_1.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" \"\"\"Entry point for launching an IPython kernel.\n"
]
}
],
"source": [
"withdrawals_203['notice_type']='withdrawal_notice'"
]
},
{
"cell_type": "code",
"execution_count": 285,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(376, 24)"
]
},
"execution_count": 285,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"withdrawals_203_final=withdrawals_203.loc[:,columns]\n",
"withdrawals_203_final.shape"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {},
"outputs": [],
"source": [
"withdrawals_203_final.to_json('withdrawals_203_final.json')"
]
},
{
"cell_type": "code",
"execution_count": 287,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 287,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counters.title.count()+withdrawals.title.count()==titles_withdrawals.title.count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}