data_structure) implementing binary search tree and searching word algorithm ~!!
Hello~~ I'm lulu!
Today, I'm going to show how to implement binary search tree and searching word algorithm using a hundred thousand value which is from Shakespeare's works and lines.
Binary search tree is effective when we have to insert, delete, search a lot. I think it is really well balanced because the left side of the tree is smaller than current node, and right side is bigger than current node.
Let's see how we can implement this beautiful algorithm using python.
Today, I'm going to show how to implement binary search tree and searching word algorithm using a hundred thousand value which is from Shakespeare's works and lines.
Binary search tree is effective when we have to insert, delete, search a lot. I think it is really well balanced because the left side of the tree is smaller than current node, and right side is bigger than current node.
Let's see how we can implement this beautiful algorithm using python.
import random
import pandas as pd
class BinarySearchTree:
class Node:
def __init__(self, key, value, left, right):
self.key = key
self.value = [value]
self.left = left
self.right = right
def __init__(self):
self.root = None
self.key_count = 0
self.value_count = 0
def insert(self, key, value):
self.root = self.__insert(self.root, key, value)
self.value_count += 1
def __insert(self, node, key, value):
if node:
if node.key > key:
node.left = self.__insert(node.left, key, value)
elif node.key < key:
node.right = self.__insert(node.right, key, value)
elif node.key == key:
node.value.append(value)
else:
node = self.Node(key, value, None, None)
self.key_count += 1
# Tree의 노드에 key, value를 insert하는 알고리즘입니다. / we are inserting
key and value into Tree's node.
# 새로운 키가 추가되면 key_count도 증가시킵니다. (동일한 키가 들어오는 경우는
증가 안함) when new key is added,
#key_count also increase. (when same key is added, key_count
does not increase.)
return node
def in_order(self):
for x in self.__in_order(self.root):
yield x
def __in_order(self, node):
if not node:
return
if node.left:
for n in self.__in_order(node.left):
yield n
yield node
if node.right:
for n in self.__in_order(node.right):
yield n
def find(self, key):
return self.__find(self.root, key)
def __find(self, node, key):
if node:
if node.key == key:
return node.value
elif node.key > key:
return self.__find(node.left, key)
elif node.key < key:
return self.__find(node.right, key)
else:
return []
# Tree의 노드에 key, value를 insert하는 알고리즘입니다. we are inserting key
and value into a Tree's node.
# 키가 존재하지 않으면 [] (빈 리스트)를 반환합니다. when the key does not exist,
it returns [](empty list)
# 키가 존재하면 node.value를 반환합니다. when the key does exist,
it returns node.value
def height(self):
return self.__height(self.root)
def __height(self, node):
if node is None:
return 0
else:
return max(self.__height(node.left), self.__height(node.right)) + 1
# 노드의 높이를 반환하는 알고리즘입니다. This is a algorithm which returns
node's height
def print_stat(tree):
print(f"Height = {tree.height()}")
print("{:,} keys / {:,} values".format(tree.key_count, tree.value_count))
if __name__ == "__main__":
#-------------------------------------------------
# # 간단한 BST 테스트 a simple BST(Binary Search Tree) Test
tree = BinarySearchTree()
print_stat(tree)
tree.insert(5, "F")
print_stat(tree)
tree.insert(3, "E")
tree.insert(8, "D")
tree.insert(6, "C")
print_stat(tree)
tree.insert(7, "B")
tree.insert(1, "A")
for x in tree.in_order():
print(f"{x.key}:{x.value}", end="->")
print()
print("Key=6", tree.find(6))
print("Key=4", tree.find(4))
print("Key=1", tree.find(1))
print("Key=5", tree.find(5))
print("Key=9", tree.find(9))
print_stat(tree)
#-------------------------------------------------
# # 10만개 데이터 넣는 실험 This is a experiment inserting a hundred
thousand data
t = BinarySearchTree()
for i in range(100000):
k = random.randint(0, 100000)
t.insert(k, f"DATA {i}")
print_stat(t)
#-------------------------------------------------
# 세익스피어의 희곡에서 단어를 검색하는 실험 Searching a word in a
Shakespeare's works and lines
# 단어를 입력하면 해당 단어가 나오는 희곡의 라인을 검색해줌. When we input a word,
it searches the word and lines
bst = BinarySearchTree()
#csv 파일을 읽어들인다. read csv file
df = pd.read_csv('./Shakespeare_data.csv', dtype=str)
print("Indexing all data ...")
for index, row in df.iterrows():
#대사를 단어단위로 잘라 keys 리스트에 저장하고, 전체 행은 합쳐 value에 저장한다.
break lines into word
# and store it in a keys list, and then bring all the lines into value
keys = [x.upper() for x in row[5].split(" ")]
value = " ".join([x for x in row if str(x) != 'nan'])
for key in keys:
# 각 key, value를 트리에 저장한다. Store each key and value in a Tree
bst.insert(key, value)
print("Done")
print_stat(bst)
while True:
#사용자에게 keyword를 입력받아서 트리에서 찾아준다. Let user
input a keyword and then find in a Tree
keyword = input(">> Enter keyword ('\\q' for quit) : ")
if keyword == '\\q':
break
result = bst.find(keyword.upper())
print("\n".join(result))
print("--- Found {} lines".format(len(result)))
If you have any questions or any opinions, tell me you'll be welcomed :-)
Thank you :-)
Titanium Wok Poker - Play Online at TITaniumArt
ReplyDeleteTITanium Wok is a great titanium white octane blueprint poker table with revlon hair dryer brush titanium a nice titanium symbol layout, and is very similar medical grade titanium earrings to many others made in the titanium alloy same design.