data_structure) implementing binary search tree and searching word algorithm ~!!

Hello~~ I'm lulu!

Today, I'm going to show how to implement binary search tree and searching word algorithm using a hundred thousand value which is from Shakespeare's works and lines.

Binary search tree is effective when we have to insert, delete, search a lot. I think it is really well balanced because the left side of the tree is smaller than current node, and right side is bigger than current node.

Let's see how we can implement this beautiful algorithm using python.

import random
import pandas as pd

class BinarySearchTree:
    class Node:
        def __init__(self, key, value, left, right):
            self.key = key
            self.value = [value]
            self.left = left
            self.right = right

    def __init__(self):
        self.root = None
        self.key_count = 0
        self.value_count = 0

    def insert(self, key, value):
        self.root = self.__insert(self.root, key, value)
        self.value_count += 1

    def __insert(self, node, key, value):
        if node:
            if node.key > key:
                node.left = self.__insert(node.left, key, value)
            elif node.key < key:
                node.right = self.__insert(node.right, key, value)
            elif node.key == key:
                node.value.append(value)
        else:
            node = self.Node(key, value, None, None)
            self.key_count += 1
        # Tree의 노드에 key, value를 insert하는 알고리즘입니다. / we are inserting
                                              key and value into Tree's node.
        # 새로운 키가 추가되면 key_count도 증가시킵니다. (동일한 키가 들어오는 경우는
                                               증가 안함) when new key is added,
        #key_count also increase. (when same key is added, key_count
                                                      does not increase.)
        return node

    def in_order(self):
        for x in self.__in_order(self.root):
            yield x

    def __in_order(self, node):
        if not node:
            return
        if node.left:
            for n in self.__in_order(node.left):
                yield n
        yield node
        if node.right:
            for n in self.__in_order(node.right):
                yield n

    def find(self, key):
        return self.__find(self.root, key)

    def __find(self, node, key):
        if node:
            if node.key == key:
                return node.value

            elif node.key > key:
                return self.__find(node.left, key)
       
            elif node.key < key:
                return self.__find(node.right, key)
               
        else:
            return []

        # Tree의 노드에 key, value를 insert하는 알고리즘입니다. we are inserting key
                                                   and value into a Tree's node.
        # 키가 존재하지 않으면 [] (빈 리스트)를 반환합니다. when the key does not exist,
                                                     it returns [](empty list)
        # 키가 존재하면 node.value를 반환합니다. when the key does exist,
                                             it returns node.value

    def height(self):
        return self.__height(self.root)

    def __height(self, node):
        if node is None:
            return 0
           
        else:
            return max(self.__height(node.left), self.__height(node.right)) + 1
           
        # 노드의 높이를 반환하는 알고리즘입니다. This is a algorithm which returns
                                              node's height

def print_stat(tree):
    print(f"Height = {tree.height()}")
    print("{:,} keys / {:,} values".format(tree.key_count, tree.value_count))


if __name__ == "__main__":
   
    #-------------------------------------------------
    # # 간단한 BST 테스트 a simple BST(Binary Search Tree) Test
    tree = BinarySearchTree()
    print_stat(tree)
    tree.insert(5, "F")
    print_stat(tree)
    tree.insert(3, "E")
    tree.insert(8, "D")
    tree.insert(6, "C")
    print_stat(tree)
    tree.insert(7, "B")
    tree.insert(1, "A")
    for x in tree.in_order():
        print(f"{x.key}:{x.value}", end="->")
    print()
    print("Key=6", tree.find(6))
    print("Key=4", tree.find(4))
    print("Key=1", tree.find(1))
    print("Key=5", tree.find(5))
    print("Key=9", tree.find(9))
    print_stat(tree)




    #-------------------------------------------------
    # # 10만개 데이터 넣는 실험 This is a experiment inserting a hundred
       thousand data

    t = BinarySearchTree()
    for i in range(100000):
        k = random.randint(0, 100000)
        t.insert(k, f"DATA {i}")
    print_stat(t)



    #-------------------------------------------------
    # 세익스피어의 희곡에서 단어를 검색하는 실험 Searching a word in a
             Shakespeare's works and lines
    # 단어를 입력하면 해당 단어가 나오는 희곡의 라인을 검색해줌. When we input a word,
       it searches the word and lines
    bst = BinarySearchTree()

     #csv 파일을 읽어들인다. read csv file
    df = pd.read_csv('./Shakespeare_data.csv', dtype=str)
    print("Indexing all data ...")
    for index, row in df.iterrows():
       
        #대사를 단어단위로 잘라 keys 리스트에 저장하고, 전체 행은 합쳐 value에 저장한다.
           break lines into word
        # and store it in a keys list, and then bring all the lines into value
        keys = [x.upper() for x in row[5].split(" ")]
        value = " ".join([x for x in row if str(x) != 'nan'])
        for key in keys:
           
           #  각 key, value를 트리에 저장한다. Store each key and value in a Tree
            bst.insert(key, value)
    print("Done")
    print_stat(bst)
    while True:
         #사용자에게 keyword를 입력받아서 트리에서 찾아준다. Let user
                input a keyword and then find in a Tree
        keyword = input(">> Enter keyword ('\\q' for quit) : ")
        if keyword == '\\q':
            break
        result = bst.find(keyword.upper())
        print("\n".join(result))
        print("--- Found {} lines".format(len(result)))
       


If you have any questions or any opinions, tell me you'll be welcomed :-)
Thank you :-)

Comments

  1. Titanium Wok Poker - Play Online at TITaniumArt
    TITanium Wok is a great titanium white octane blueprint poker table with revlon hair dryer brush titanium a nice titanium symbol layout, and is very similar medical grade titanium earrings to many others made in the titanium alloy same design.

    ReplyDelete

Post a Comment

Popular posts from this blog

C언어) C언어를 이용하여 평균 구하기~!!

파이썬) 파이썬으로 사칙연산 계산기 만들기~!!

C언어) C언어를 이용하여 나의 BMI 측정하기(저체중, 표준체중, 과체중) 여러분도 직접 비만도를 측정해보세요~!!