Word choice can steer a reader's attention in a text document. If an article uses more words related to groups, the reader's attention shifts toward group behaviour (executive, leader, state, ...). If it uses more words or phrases related to investment, readers focus more on future investments (fund, banker, corporations, ...). The example below is built with the NLTK library for Python.
# import os
import os
# Read the Wikipedia description of UBS as of 6.9.2021 (any other article would work as well)
base_file = open("ubs.txt", 'rt')
raw_text = base_file.read()
raw_text
'UBS Group AG[nb 1] is a Swiss multinational investment bank and financial services company founded and based in Switzerland. Co-headquartered in the cities of Z\\\'fcrich and Basel, it maintains a presence in all major financial centres as the largest Swiss banking institution and the largest private bank in the world. UBS client services are known for their strict bank\\\'96client confidentiality and culture of banking secrecy.Because of the bank\'s large positions in the Americas, EMEA, and Asia Pacific markets, the Financial Stability Board considers it a global systemically important bank.\\\n\n\n\nUBS was founded in 1862 as the Bank in Winterthur alongside the advent of the Swiss banking industry. During the 1890s, the Swiss Bank Corporation (SBC) was founded, forming a private banking syndicate that expanded, aided by Switzerland\'s international neutrality. In 1912, t ......................
base_file.close()
print("Text read from file : ",raw_text[:200])
Text read from file : UBS Group AG[nb 1] is a Swiss multinational investment bank and financial services company founded and based in Switzerland. Co-headquartered in the cities of Z\'fcrich and Basel, it maintains a prese
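The output above still contains sequences such as `Z\'fcrich` and `bank\'96client`. These look like RTF hex escapes, which suggests that ubs.txt was saved from a rich-text editor. A minimal sketch for decoding them follows, assuming the escapes encode single bytes in the Windows-1252 code page. The helper name `decode_rtf_hex_escapes` and the code page are assumptions and not part of the original notebook.

```python
import re

def decode_rtf_hex_escapes(text, codepage="cp1252"):
    """Replace RTF-style \\'hh escapes with the character they encode.

    Assumption: each escape is one byte in the given code page.
    """
    def _decode(match):
        byte_value = int(match.group(1), 16)
        return bytes([byte_value]).decode(codepage, errors="replace")

    # A backslash, an apostrophe, then exactly two hex digits
    return re.sub(r"\\'([0-9a-fA-F]{2})", _decode, text)

sample = r"Co-headquartered in the cities of Z\'fcrich and Basel"
print(decode_rtf_hex_escapes(sample))
```

Applying a cleanup like this before tokenization would keep tokens such as `fcrich` out of the bigram table.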
pip install nltk
import nltk
# nltk.tokenize ships with the nltk package, so no separate install is needed
#For tweets, TweetTokenizer would be the tokenizer to use
from nltk.tokenize import TweetTokenizer
#For a plain text file, word_tokenize is used
from nltk.tokenize import word_tokenize
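nltk.download('punkt')
#Recent NLTK releases may ask for the 'punkt_tab' resource instead; if word_tokenize raises a LookupError, download the resource it names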
token_list = nltk.word_tokenize(raw_text)
#Strip apostrophes from the tokens
token_list2 = [word.replace("'", "") for word in token_list]
#Remove punctuation tokens
token_list3 = list(filter(lambda token: nltk.tokenize.punkt.PunktToken(token).is_non_punct, token_list2))
#Convert to lower case
token_list4 = [word.lower() for word in token_list3]
print("\nSample token list : ", token_list4[:10])
Sample token list : ['ubs', 'group', 'ag', 'nb', '1', 'is', 'a', 'swiss', 'multinational', 'investment']
print("\nTotal Tokens : ",len(token_list4))
Total Tokens : 10953
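To see what the three cleanup steps do on a single sentence, here is a small sketch that runs without NLTK. It swaps `word_tokenize` for a simple regular expression, so its token boundaries will not match NLTK's in every case. The function name `simple_tokens` is hypothetical.

```python
import re
import string

PUNCTUATION = set(string.punctuation)

def simple_tokens(text):
    # Regex stand-in for nltk.word_tokenize: runs of word characters,
    # or single non-space symbols
    raw = re.findall(r"\w+|[^\w\s]", text)
    # Step 1: strip apostrophes
    no_quotes = [token.replace("'", "") for token in raw]
    # Step 2: drop empty strings and pure punctuation tokens
    no_punct = [token for token in no_quotes if token and token not in PUNCTUATION]
    # Step 3: convert to lower case
    return [token.lower() for token in no_punct]

print(simple_tokens("UBS Group AG is a Swiss bank. It's based in Zurich!"))
```

A possessive or contraction such as "It's" leaves a separate `s` token behind, which is the same kind of token that appears as `'s'` among the recommended next words for "group".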
# Create ngrams for text prediction
from nltk.util import ngrams
#Use a sqlite database to store ngrams information
import sqlite3
conn = sqlite3.connect(":memory:")
#Table to store the first word, the second word and the occurrence count
conn.execute('''DROP TABLE IF EXISTS NGRAMS''')
conn.execute('''CREATE TABLE NGRAMS
(FIRST TEXT NOT NULL,
SECOND TEXT NOT NULL,
COUNTS INT NOT NULL,
CONSTRAINT PK_GRAMS PRIMARY KEY (FIRST,SECOND));''')
#Generate bigrams
bigrams = ngrams(token_list4,2)
#Store bigrams in DB; the upsert increments COUNTS when a pair already exists
for first_word, second_word in bigrams:
    conn.execute(
        "INSERT INTO NGRAMS (FIRST,SECOND,COUNTS) VALUES (?, ?, 1) "
        "ON CONFLICT(FIRST,SECOND) DO UPDATE SET COUNTS=COUNTS + 1",
        (first_word, second_word))
#Look at sample data from the table
cursor = conn.execute("SELECT FIRST, SECOND, COUNTS from NGRAMS LIMIT 5")
for gram_row in cursor:
    print("FIRST=", gram_row[0], "SECOND=", gram_row[1], "COUNT=", gram_row[2])
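The counting logic is easier to follow on a handful of tokens. The sketch below repeats the same table layout and upsert on a toy token list, using `zip` as a stand-in for `nltk.util.ngrams(tokens, 2)`. The names `toy_tokens` and `conn_demo` are made up for this illustration, and the `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Toy token list, invented for this illustration
toy_tokens = ["investment", "bank", "and", "investment", "bank",
              "and", "investment", "fund"]

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE NGRAMS ("
    "FIRST TEXT NOT NULL, SECOND TEXT NOT NULL, COUNTS INT NOT NULL, "
    "PRIMARY KEY (FIRST, SECOND))"
)

# zip pairs every token with its successor, which yields the bigrams
for first_word, second_word in zip(toy_tokens, toy_tokens[1:]):
    conn_demo.execute(
        "INSERT INTO NGRAMS (FIRST, SECOND, COUNTS) VALUES (?, ?, 1) "
        "ON CONFLICT(FIRST, SECOND) DO UPDATE SET COUNTS = COUNTS + 1",
        (first_word, second_word),
    )

rows = conn_demo.execute(
    "SELECT FIRST, SECOND, COUNTS FROM NGRAMS "
    "ORDER BY COUNTS DESC, FIRST, SECOND"
).fetchall()
print(rows)
```

Each pair is inserted once with a count of 1, and every repeat only bumps the counter, so the table ends up with one row per distinct bigram.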
# Create prediction
#Function to query the DB and find the next words
def recommend(word):
    #Find the next words, sorted from most to least frequent
    cur_filter = conn.execute(
        "SELECT SECOND FROM NGRAMS WHERE FIRST = ? ORDER BY COUNTS DESC",
        (word,))
    #Build a list ordered from most frequent to least frequent next word
    return [filt_row[0] for filt_row in cur_filter]
#Recommend next words for "group" and "investment"
print("Next words for group are: ", recommend("group"))
print("\nNext words for investment are: ", recommend("investment"))
Next words for group are: ['ag', 'leader', 'received', 'also', 'and', 'ceo', 'companies', 'dillon', 'executive', 'ing', 'kengeter', 'of', 's', 'state', 'were']
Next words for investment are: ['bank', 'banking', 'banks', 'bankers', 'bank\\', 'banker', 'fund', 'grade', '265', 'advisory', 'capabilities', 'corporation', 'management', 'managers', 'officer', 'products', 'teams', 'trust', 'trusts']
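The lists above give only the order of the next words, not how strongly each one dominates. A possible variation, sketched here with `collections.Counter` instead of the SQLite table, returns every next word together with its share of all continuations. The function names and the toy token list are hypothetical, and this is an alternative to the notebook's approach, not a description of it.

```python
from collections import Counter, defaultdict

def build_model(tokens):
    # Map every word to a Counter of the words that follow it
    model = defaultdict(Counter)
    for first_word, second_word in zip(tokens, tokens[1:]):
        model[first_word][second_word] += 1
    return model

def recommend_with_share(model, word):
    # Next words ordered by frequency, each with its relative frequency
    followers = model.get(word, Counter())
    total = sum(followers.values())
    return [(next_word, count / total)
            for next_word, count in followers.most_common()]

toy_tokens = "investment bank and investment bank and investment fund".split()
toy_model = build_model(toy_tokens)
print(recommend_with_share(toy_model, "investment"))
```

On the toy data, "bank" follows "investment" in two of three cases and "fund" in one. A word that never occurs simply yields an empty list.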