Skip to content

Instantly share code, notes, and snippets.

@rodloboz
Created April 16, 2025 09:09
Show Gist options
  • Save rodloboz/6e44b4a9df7dfec2d3ccbe64d11e5c2b to your computer and use it in GitHub Desktop.
Save rodloboz/6e44b4a9df7dfec2d3ccbe64d11e5c2b to your computer and use it in GitHub Desktop.
Mistral Livebook

Mistral

Mix.install([
  {:mistral, "~> 0.3.0"},
  {:anthropix, "~> 0.6"},
  {:kino, "~> 0.15.0"},
])

Section

api_key = <YOUR API KEY>
api_key = <YOUR API KEY>
client = Mistral.init(api_key)
{:ok, res} = Mistral.chat(client,
 model: "mistral-small-latest",
 messages: [
   %{
     role: "user",
     content: "Write a haiku that starts with 'Waves crash against stone…'"
   }
 ]
)
system_message = """
  You are an AI specialized in parsing resume/CV data and returning structured fields for creating profile records.
  The data you return will be used to populate database entities for a user's profile, work experiences, and education.

  You must extract these profile fields:
  1) headline (professional title/headline)
  2) bio (short professional summary)
  3) location (city, state/country)
  4) skills (array of skills)
  5) github_handle (GitHub username, extract from URLs like https://github.com/username)
  6) linkedin_handle (LinkedIn username, extract from URLs like https://linkedin.com/in/username)
  7) website_url (Personal website URL, ensure it includes http:// or https://)
  8) interests (Professional or personal interests and hobbies, with formatting preserved)

  For each work experience, extract:
  1) company (company name)
  2) role (job title)
  3) employment_type (full_time, part_time, contractor, employer_of_record, internship)
  4) workplace_type (remote, on_site, hybrid)
  5) start_date (YYYY-MM format)
  6) end_date (YYYY-MM format, or "present")
  7) description (bullet points of responsibilities/achievements)
  8) location (city, state/country where job was performed)

  For each education entry, extract:
  1) school (institution name)
  2) degree (degree type - BS, MS, PhD, etc)
  3) field_of_study (major or field)
  4) start_date (YYYY-MM format)
  5) end_date (YYYY-MM format)
  6) description (any additional information about the education)

  Return valid JSON with the following structure:
  {
    "profile": {
      "headline": string,
      "bio": string,
      "location": string,
      "skills": array of strings,
      "github_handle": string,
      "linkedin_handle": string,
      "website_url": string,
      "interests": string
    },
    "experiences": [
      {
        "company": string,
        "role": string,
        "employment_type": string,
        "workplace_type": string,
        "start_date": string,
        "end_date": string,
        "description": string,
        "location": string
      }
    ],
    "education": [
      {
        "school": string,
        "degree": string,
        "field_of_study": string,
        "start_date": string,
        "end_date": string,
        "description": string
      }
    ]
  }

  For extracting handles and URLs:
  - Extract the github_handle from GitHub URLs (e.g., https://github.com/username → "username")
  - Extract the linkedin_handle from LinkedIn URLs (e.g., https://linkedin.com/in/username → "username")
  - For website_url, include the full URL with protocol (http:// or https://)
  - If handles or URLs aren't explicitly found, look for patterns like "github.com/username" in text

  For workplace_type:
  - Identify as "remote", "on_site", or "hybrid" based on context clues
  - Look for terms like "Remote", "Work from home", "On-site", "In-office", "Hybrid"

  Normalize dates to YYYY-MM format. Convert "present" or "current" to null for end_date.
  If any required field is missing, leave it null or empty array.
  Keep the text formatting (bullet points, paragraphs) in descriptions and interests.
  """
input = Kino.Input.file("Upload your CV")
%{file_ref: file_ref, client_name: _} = Kino.Input.read(input) 
path = file_ref |> Kino.Input.file_path()
{:ok, response } =
  Mistral.upload_file(client, path, purpose: "ocr")
file_id = Map.get(response, "id")
{:ok, %{"url" => file_url}} = Mistral.get_file_url(client, file_id)
{:ok, ocr} = Mistral.ocr(
  client, 
  model: "mistral-ocr-latest", 
  document: %{type: "document_url", document_url: file_url}
)
text =
  ocr
  |> Map.get("pages", [])
  |> Enum.map(&Map.get(&1, "markdown"))
  |> Enum.join()
message = """
This is the image's OCR in markdown:

"<BEGIN_IMAGE_OCR>
#{text}
<END_IMAGE_OCR>

Convert this into a structured JSON response with the OCR contents in the following dictionnary:

{
  "profile": {
    "headline": "",
    "bio": "",
    "location": "",
    "skills": [],
    "github_handle": "",
    "linkedin_handle": "",
    "website_url": "",
    "interests": ""
  },
  "experiences": [
    {
      "company": "",
      "role": "",
      "employment_type": "",
      "workplace_type": "",
      "start_date": "",
      "end_date": "",
      "description": "",
      "location": ""
    }
  ],
  "education": [
    {
      "school": "",
      "degree": "",
      "field_of_study": "",
      "start_date": "",
      "end_date": "",
      "description": ""
    }
  ]
}
"""

Parse OCR text with Mistral

model = "pixtral-12b-latest"
messages=[
  %{role: "system", content: system_message},
  %{role: "user", content: message}
]
{:ok, mist} =
  Mistral.chat(client, model: model, messages: messages)
mist
|> Map.get("choices")
|> List.first()
|> Map.get("message")
|> Map.get("content")
|> String.trim()
|> String.replace(~r/^```json\s*/m, "")
|> String.replace(~r/\s*```$/m, "")
|> JSON.decode!()

Parse OCR text with Anthropic

api_key = System.fetch_env!("LB_ANTHROPIC_API_KEY")
client = Anthropix.init(api_key)
msgs=[
  %{role: "user", content: message}
]
{:ok, anth} = 
  Anthropix.chat(client, [
  model: "claude-3-5-haiku-20241022",
  system: system_message,
  messages: msgs,
])
anth
|> Map.get("content")
|> List.first()
|> Map.get("text")
|> JSON.decode!()

Mistral OCR with images

img_url = "https://unsplash.com/photos/95t94hZTESw/download?ixid=M3wxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNzQwMDYwMjg4fA&force=true&w=640"
input = Kino.Input.file("Upload image")
%{file_ref: file_ref, client_name: _} = Kino.Input.read(input) 
path = file_ref |> Kino.Input.file_path()
api_key = "peppjbqgy1XPqHhFX31NNe908X3DFcIC"
client = Mistral.init(api_key)
{:ok, response } =
  Mistral.upload_file(client, path, purpose: "ocr")
file_id = Map.get(response, "id")
{:ok, %{"url" => file_url}} = Mistral.get_file_url(client, file_id)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment