| | Direct query: information appearing in the text (entity extraction, summarization, finding relevant paragraphs, etc.) | Indirect query: inferred information (mathematical calculation, comparison, conclusion, etc.) |
|---|---|---|
| Simple text: text containing descriptions, excluding tables | Complexity: + Accuracy: +++ | Complexity: ++ Accuracy: +++ |
| Complex text: Text | | |
# Dense and sparse embeddings for the batch of chunks
values = embedd_model.encode([b['content'] for b in batch])
sparse_values = sparsed_model.encode([b['content'] for b in batch])
# Create unique IDs and collect the metadata for each chunk
ids = [str(b['metadata']['id']) for b in batch]
metas = [b['metadata'] for b in batch]
# Build the upsert payload
to_upsert = [
    {'id': i, 'values': v, 'metadata': m, 'sparse_values': sv}
    for (i, v, m, sv) in zip(ids, values, metas, sparse_values)
]
# Upsert these records to Pinecone
index.upsert(vectors=to_upsert)
index.describe_index_stats()
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("projet_esg")
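Once the index is populated, a hybrid query sends both the dense and the sparse representation of the question. A common pattern is to scale the two vectors by a convex weight before querying; the helper below is an illustrative sketch (the name `hybrid_scale` and the weighting scheme are assumptions, not part of the original code):

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight dense vs. sparse scores: alpha=1.0 is pure dense, alpha=0.0 pure sparse."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        'indices': sparse['indices'],
        'values': [v * (1 - alpha) for v in sparse['values']],
    }
    return scaled_dense, scaled_sparse

# Example: favour the dense embedding 3:1
dense_q, sparse_q = hybrid_scale([0.2, 0.4], {'indices': [7], 'values': [2.0]}, alpha=0.75)
```

The scaled pair is then passed as `vector=` and `sparse_vector=` to `index.query(..., top_k=..., include_metadata=True)`.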
| Provider | Model | Number of parameters |
|---|---|---|
| Meta with Microsoft | Llama 2 | 7B, 13B, 70B |
| Meta | LLaMA | 7B, 13B, 32.5B, 65.2B |
| Technology Innovation Institute of UAE | Falcon LLM | 7B, 40B |
| Stanford's CRFM | Alpaca | 7B |
| Google | Flan-T5 | 80M, 250M, 780M, 3B, 11B |
| MosaicML | MPT | 7B, 30B |
| Provider | Model | Cost for input | Cost for output | Cost per request |
|---|---|---|---|---|
| OpenAI | text-davinci-004 | $0.03 / 1K tokens | $0.06 / 1K tokens | $0 |
| OpenAI | text-davinci-003 | $0.02 / 1K tokens | $0.02 / 1K tokens | $0 |
| OpenAI | text-davinci-002 | $0.002 / 1K tokens | $0.002 / 1K tokens | $0 |
| OpenAI | gpt-3.5-turbo | $0.002 / 1K tokens | $0.002 / 1K tokens | $0 |
[Cohere](https://cohere.com/pri |
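To compare providers concretely, the per-request cost follows directly from the two per-token prices. A minimal sketch (the function name and the example token counts are illustrative):

```python
def request_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Dollar cost of one request given per-1K-token input/output prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Example: 1,500 prompt tokens and 500 completion tokens on gpt-3.5-turbo,
# with both directions priced at $0.002 per 1K tokens
cost = request_cost(1500, 500, 0.002, 0.002)
```

At those prices a typical RAG request stays well under a cent, which is why per-request cost only becomes decisive at high query volumes.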
| | API-access solution (3rd-party model) | On-premise solution (open-source model) |
|---|---|---|
| R&D development | The low initial cost, both in terms of time and money, allows us to quickly reach a Minimum Viable Product (MVP). The procedure for model parameter optimization and MLOps is overseen by a third-party e | |
| Task | Model version | Comments |
|---|---|---|
| Voice Activity Detection | Multilingual MarbleNet | Other versions exist, trained on telephonic conversations or only on English data. |
| Speaker Embeddings | TitaNet Large | A smaller version of the model exists. |
| Multiscale Clustering | Diarization MSDD Telephonic | Specifically trained on telephonic conversations, which makes it suitable for similar use cases. |
| Model | Parameter name | Value |
|---|---|---|
| General | Input sample rate | 16 000 |
| General | Batch size | 16 |
| VAD | Window length | 0.8 |
| VAD | Shift length | 0.04 |
| VAD | Pad onset | 0.1 |
| VAD | Pad offset | -0.05 |
| Speaker embedding | Window length | [1.5, 1.25, 1.0, 0.75, 0.5] |
| Speaker embedding | Shift length | [0.75, 0.625, 0.5, 0.375, 0.25] |
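The parameters above map onto NeMo's diarization inference configuration. The fragment below is a hedged sketch: the field names follow the layout of NeMo's `diar_infer_telephonic.yaml`, and should be verified against the NeMo release actually in use.

```yaml
# Sketch of a NeMo diarizer config carrying the values from the table above.
sample_rate: 16000
batch_size: 16
diarizer:
  vad:
    model_path: vad_multilingual_marblenet
    parameters:
      window_length_in_sec: 0.8
      shift_length_in_sec: 0.04
      pad_onset: 0.1
      pad_offset: -0.05
  speaker_embeddings:
    model_path: titanet_large
    parameters:
      window_length_in_sec: [1.5, 1.25, 1.0, 0.75, 0.5]
      shift_length_in_sec: [0.75, 0.625, 0.5, 0.375, 0.25]
  msdd_model:
    model_path: diar_msdd_telephonic
```

The multiscale window/shift lists pair element-wise: each window length has a matching shift, giving the five scales used by the multiscale clustering step.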