data_structure) implementing binary search tree and searching word algorithm ~!!

Hello~~ I'm lulu!

Today, I'm going to show how to implement binary search tree and searching word algorithm using a hundred thousand value which is from Shakespeare's works and lines.

Binary search tree is effective when we have to insert, delete, search a lot. I think it is really well balanced because the left side of the tree is smaller than current node, and right side is bigger than current node.

Let's see how we can implement this beautiful algorithm using python.

import random
import pandas as pd

class BinarySearchTree:
    class Node:
        def __init__(self, key, value, left, right):
            self.key = key
            self.value = [value]
            self.left = left
            self.right = right

    def __init__(self):
        self.root = None
        self.key_count = 0
        self.value_count = 0

    def insert(self, key, value):
        self.root = self.__insert(self.root, key, value)
        self.value_count += 1

    def __insert(self, node, key, value):
        if node:
            if node.key > key:
                node.left = self.__insert(node.left, key, value)
            elif node.key < key:
                node.right = self.__insert(node.right, key, value)
            elif node.key == key:
                node.value.append(value)
        else:
            node = self.Node(key, value, None, None)
            self.key_count += 1
        # Tree์˜ ๋…ธ๋“œ์— key, value๋ฅผ insertํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. / we are inserting
                                              key and value into Tree's node.
        # ์ƒˆ๋กœ์šด ํ‚ค๊ฐ€ ์ถ”๊ฐ€๋˜๋ฉด key_count๋„ ์ฆ๊ฐ€์‹œํ‚ต๋‹ˆ๋‹ค. (๋™์ผํ•œ ํ‚ค๊ฐ€ ๋“ค์–ด์˜ค๋Š” ๊ฒฝ์šฐ๋Š”
                                               ์ฆ๊ฐ€ ์•ˆํ•จ) when new key is added,
        #key_count also increase. (when same key is added, key_count
                                                      does not increase.)
        return node

    def in_order(self):
        for x in self.__in_order(self.root):
            yield x

    def __in_order(self, node):
        if not node:
            return
        if node.left:
            for n in self.__in_order(node.left):
                yield n
        yield node
        if node.right:
            for n in self.__in_order(node.right):
                yield n

    def find(self, key):
        return self.__find(self.root, key)

    def __find(self, node, key):
        if node:
            if node.key == key:
                return node.value

            elif node.key > key:
                return self.__find(node.left, key)
       
            elif node.key < key:
                return self.__find(node.right, key)
               
        else:
            return []

        # Tree์˜ ๋…ธ๋“œ์— key, value๋ฅผ insertํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. we are inserting key
                                                   and value into a Tree's node.
        # ํ‚ค๊ฐ€ ์กด์žฌํ•˜์ง€ ์•Š์œผ๋ฉด [] (๋นˆ ๋ฆฌ์ŠคํŠธ)๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. when the key does not exist,
                                                     it returns [](empty list)
        # ํ‚ค๊ฐ€ ์กด์žฌํ•˜๋ฉด node.value๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. when the key does exist,
                                             it returns node.value

    def height(self):
        return self.__height(self.root)

    def __height(self, node):
        if node is None:
            return 0
           
        else:
            return max(self.__height(node.left), self.__height(node.right)) + 1
           
        # ๋…ธ๋“œ์˜ ๋†’์ด๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. This is a algorithm which returns
                                              node's height

def print_stat(tree):
    print(f"Height = {tree.height()}")
    print("{:,} keys / {:,} values".format(tree.key_count, tree.value_count))


if __name__ == "__main__":
   
    #-------------------------------------------------
    # # ๊ฐ„๋‹จํ•œ BST ํ…Œ์ŠคํŠธ a simple BST(Binary Search Tree) Test
    tree = BinarySearchTree()
    print_stat(tree)
    tree.insert(5, "F")
    print_stat(tree)
    tree.insert(3, "E")
    tree.insert(8, "D")
    tree.insert(6, "C")
    print_stat(tree)
    tree.insert(7, "B")
    tree.insert(1, "A")
    for x in tree.in_order():
        print(f"{x.key}:{x.value}", end="->")
    print()
    print("Key=6", tree.find(6))
    print("Key=4", tree.find(4))
    print("Key=1", tree.find(1))
    print("Key=5", tree.find(5))
    print("Key=9", tree.find(9))
    print_stat(tree)




    #-------------------------------------------------
    # # 10๋งŒ๊ฐœ ๋ฐ์ดํ„ฐ ๋„ฃ๋Š” ์‹คํ—˜ This is a experiment inserting a hundred
       thousand data

    t = BinarySearchTree()
    for i in range(100000):
        k = random.randint(0, 100000)
        t.insert(k, f"DATA {i}")
    print_stat(t)



    #-------------------------------------------------
    # ์„ธ์ต์Šคํ”ผ์–ด์˜ ํฌ๊ณก์—์„œ ๋‹จ์–ด๋ฅผ ๊ฒ€์ƒ‰ํ•˜๋Š” ์‹คํ—˜ Searching a word in a
             Shakespeare's works and lines
    # ๋‹จ์–ด๋ฅผ ์ž…๋ ฅํ•˜๋ฉด ํ•ด๋‹น ๋‹จ์–ด๊ฐ€ ๋‚˜์˜ค๋Š” ํฌ๊ณก์˜ ๋ผ์ธ์„ ๊ฒ€์ƒ‰ํ•ด์คŒ. When we input a word,
       it searches the word and lines
    bst = BinarySearchTree()

     #csv ํŒŒ์ผ์„ ์ฝ์–ด๋“ค์ธ๋‹ค. read csv file
    df = pd.read_csv('./Shakespeare_data.csv', dtype=str)
    print("Indexing all data ...")
    for index, row in df.iterrows():
       
        #๋Œ€์‚ฌ๋ฅผ ๋‹จ์–ด๋‹จ์œ„๋กœ ์ž˜๋ผ keys ๋ฆฌ์ŠคํŠธ์— ์ €์žฅํ•˜๊ณ , ์ „์ฒด ํ–‰์€ ํ•ฉ์ณ value์— ์ €์žฅํ•œ๋‹ค.
           break lines into word
        # and store it in a keys list, and then bring all the lines into value
        keys = [x.upper() for x in row[5].split(" ")]
        value = " ".join([x for x in row if str(x) != 'nan'])
        for key in keys:
           
           #  ๊ฐ key, value๋ฅผ ํŠธ๋ฆฌ์— ์ €์žฅํ•œ๋‹ค. Store each key and value in a Tree
            bst.insert(key, value)
    print("Done")
    print_stat(bst)
    while True:
         #์‚ฌ์šฉ์ž์—๊ฒŒ keyword๋ฅผ ์ž…๋ ฅ๋ฐ›์•„์„œ ํŠธ๋ฆฌ์—์„œ ์ฐพ์•„์ค€๋‹ค. Let user
                input a keyword and then find in a Tree
        keyword = input(">> Enter keyword ('\\q' for quit) : ")
        if keyword == '\\q':
            break
        result = bst.find(keyword.upper())
        print("\n".join(result))
        print("--- Found {} lines".format(len(result)))
       


If you have any questions or any opinions, tell me you'll be welcomed :-)
Thank you :-)

Comments

  1. Titanium Wok Poker - Play Online at TITaniumArt
    TITanium Wok is a great titanium white octane blueprint poker table with revlon hair dryer brush titanium a nice titanium symbol layout, and is very similar medical grade titanium earrings to many others made in the titanium alloy same design.

    ReplyDelete

Post a Comment

Popular posts from this blog

C์–ธ์–ด) C์–ธ์–ด๋ฅผ ์ด์šฉํ•˜์—ฌ ํ‰๊ท  ๊ตฌํ•˜๊ธฐ~!!

C์–ธ์–ด) C์–ธ์–ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‚˜์˜ BMI ์ธก์ •ํ•˜๊ธฐ(์ €์ฒด์ค‘, ํ‘œ์ค€์ฒด์ค‘, ๊ณผ์ฒด์ค‘) ์—ฌ๋Ÿฌ๋ถ„๋„ ์ง์ ‘ ๋น„๋งŒ๋„๋ฅผ ์ธก์ •ํ•ด๋ณด์„ธ์š”~!!

๊ตญ๊ฐ€๊ทผ๋กœ์žฅํ•™์ƒ, ๊ตญ๊ฐ€์žฅํ•™๊ธˆ, ๊ต๋‚ด ๊ตํ•™ํŒ€ ํ–‰์ •์ธํ„ด์— ๋Œ€ํ•ด์„œ!!