To install:
Edit MYPATH in install.sh to the desired install location, then run:

$ source install.sh

Notes:
==============

# Idea: Word embeddings are a simple way to compare semantic similarity,
# but they only operate at the word level. Sentence embeddings are
# less interpretable, but more powerful.

# Starting place was this thread about sentence embeddings:
# https://www.reddit.com/r/MachineLearning/comments/11okrni/discussion_compare_openai_and_sentencetransformer/
# OpenAI Ada is better, but that thread talks about the HF option,
# which is a simple and effective model. You can also see it has good
# performance here:
# https://huggingface.co/spaces/mteb/leaderboard

# Model
# https://huggingface.co/sentence-transformers/all-mpnet-base-v2

# Based on the above page:
ml python3/3.10.5
ml pytorch/1.13.1
pip install --no-cache-dir --prefix=/projectnb/jbrcs/tweet2 sentence-transformers

# Models by default save to home directory
export SENTENCE_TRANSFORMERS_HOME=/projectnb/jbrcs/tweet2

# Saving model to disk (run inside a python3 session)
# https://stackoverflow.com/a/69717491
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
model.save('/projectnb/jbrcs/tweet2/models')
exit()
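
# Using the saved model (sketch)
# A minimal sketch of how the model saved above might be loaded back from
# disk and used to compare two sentences. The helper names
# (cosine_similarity, compare_sentences) and the MODEL_DIR constant are
# illustrative assumptions, not part of install.sh. The path is the one
# passed to model.save() in the notes; adjust it if MYPATH differs.

```python
import math
from typing import Sequence

# Assumption: same directory that model.save() wrote to in the notes.
MODEL_DIR = "/projectnb/jbrcs/tweet2/models"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_sentences(first: str, second: str, model_dir: str = MODEL_DIR) -> float:
    """Embed two sentences with the saved model and return their similarity."""
    # Imported lazily so cosine_similarity works without the package installed.
    from sentence_transformers import SentenceTransformer

    # Passing a local directory loads the model from disk instead of downloading it.
    model = SentenceTransformer(model_dir)
    emb_first, emb_second = model.encode([first, second])
    return cosine_similarity(emb_first.tolist(), emb_second.tolist())
```

# Example call, after the ml/pip steps above:
#   compare_sentences("The cat sat on the mat.", "A kitten rested on the rug.")
# Scores closer to 1 mean the sentences are more similar in meaning.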