Nathan Cooper ncoop57

Prompt engineering

Enhance results with prompt engineering strategies.

This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4o. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.

You can also explore example prompts which showcase what our models are capable of:

Prompt examples

Studying the usage of text-to-text transfer transformer to support code-related tasks | A Mastropaolo, S Scalabrino, N Cooper, DN Palacio, D Poshyvanyk, ... | 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE …, 2021 | 249 | 2021
A systematic literature review on the use of deep learning in software engineering research | C Watson, N Cooper, DN Palacio, K Moran, D Poshyvanyk | ACM Transactions on Software Engineering and Methodology (TOSEM) 31 (2), 1-58, 2022 | 117 | 2022
An empirical study on the usage of bert models for code completion | M Ciniselli, N Cooper, L Pascarella, D Poshyvanyk, M Di Penta, G Bavota | 2021 IEEE/ACM 18th International Conference on Mining Software Repositories …, 2021 | 84 | 2021
An empirical study on the usage of transformer models for code completion | M Ciniselli, N Cooper, L Pascarella, A Mastropaolo, E Aghajani, ... | IEEE Transactions on Software Engineering 48 (12), 4818-4837, 2021 | 83 | 2021
Translating video recordings of mobile app usages into replayable scenarios | C Bernal

	base_model: NousResearch/Meta-Llama-3-8B
	model_type: LlamaForCausalLM
	tokenizer_type: AutoTokenizer

	load_in_8bit: true
	load_in_4bit: false
	strict: false

	datasets:
	- path: answerdotai/tiny_programs_haiku3_critiques

	import boto3
	from spark_session_builder import build_spark_session

	# Collect an s3a:// path for every object under the group1 prefix.
	s3 = boto3.resource("s3")
	my_bucket = s3.Bucket("s-eai-neox")
	file_paths = []
	for my_bucket_object in my_bucket.objects.filter(Prefix="data/codepile/group1/"):
	    # print(my_bucket_object.key)
	    file_paths.append(f"s3a://s-eai-neox/{my_bucket_object.key}")
	print(len(file_paths))

	# Keep only the second batch of 100 files.
	file_paths = file_paths[100:200]

	import time
	import os

	from pyspark.ml import Pipeline
	from pyspark.ml.feature import RegexTokenizer, NGram, HashingTF, MinHashLSH
	from pyspark.sql.functions import col
	from spark_session_builder import build_spark_session

	spark = build_spark_session("spark://cpu64-dy-c6i-16xlarge-1:7077", 32, 128)
	db = spark.read.parquet("/fsx/shared/pilev2_parquet/StackExchange_ver4_non_local_dedupped/dataset.parquet").limit(1_000_000) # Stage 0 & 1
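
The imports above (RegexTokenizer, NGram, HashingTF, MinHashLSH) point toward MinHash-based near-duplicate detection, but the snippet stops before any stages are assembled. As a rough illustration of the underlying idea only, and not the pipeline used here, the sketch below estimates Jaccard similarity between two texts with MinHash in plain Python. The function names, the 5-character shingle size, and the 64 hash functions are all assumptions chosen for the example.

```python
import hashlib
import re


def char_ngrams(text, n=5):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def minhash_signature(shingles, num_hashes=64):
    """Keep the minimum hash value per seeded hash function (hypothetical helper)."""
    if not shingles:
        return []
    signature = []
    for seed in range(num_hashes):
        # Salting blake2b with the seed gives one independent-looking hash per seed.
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingles
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of the Jaccard similarity."""
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def similarity(text_a, text_b):
    """Convenience wrapper: shingle both texts, then compare their signatures."""
    return estimated_jaccard(
        minhash_signature(char_ngrams(text_a)),
        minhash_signature(char_ngrams(text_b)),
    )
```

Pairs whose estimated similarity exceeds a chosen threshold would be treated as near-duplicates. An LSH index avoids comparing every pair directly, which is what makes the approach practical at the scale of a million-row dataset.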