- What do Etcd, Consul, and Zookeeper do?
- Service Registration:
- Host, port number, and sometimes authentication credentials, protocols, version numbers, and/or environment details.
- Service Discovery:
- Ability for a client application to query the central registry to learn a service's location.
- Consistent and durable general-purpose K/V store across a distributed system.
- Some solutions support this better than others.
- Based on Paxos or some derivative (e.g. Raft) algorithm to quickly converge to a consistent state.
- Service Registration:
- Centralized locking can be based on this K/V store.
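
The three roles listed above (registration, discovery, and locking built on the K/V store) can be sketched with a toy, single-process registry. This is only an illustration under loud assumptions: `ToyRegistry`, `register`, `discover`, `acquire_lock`, and `release_lock` are hypothetical names, not the API of Etcd, Consul, or Zookeeper, and there is no replication, consensus, or durability here. The `now` parameter is an assumption added so that expiry can be exercised deterministically.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    host: str
    port: int
    metadata: dict = field(default_factory=dict)  # e.g. version, protocol, environment


class ToyRegistry:
    """Hypothetical in-memory K/V store with TTLs (no consensus, no durability)."""

    def __init__(self):
        self._kv = {}  # key -> (value, expiry time or None)

    def _alive(self, key, now):
        entry = self._kv.get(key)
        return entry is not None and (entry[1] is None or entry[1] > now)

    def put(self, key, value, ttl=None, now=None):
        now = time.monotonic() if now is None else now
        self._kv[key] = (value, None if ttl is None else now + ttl)

    def put_if_absent(self, key, value, ttl=None, now=None):
        """Write only if the key is missing or expired; True means we won."""
        now = time.monotonic() if now is None else now
        if self._alive(key, now):
            return False
        self.put(key, value, ttl=ttl, now=now)
        return True

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        return self._kv[key][0] if self._alive(key, now) else None

    def get_prefix(self, prefix, now=None):
        now = time.monotonic() if now is None else now
        keys = sorted(k for k in self._kv if k.startswith(prefix))
        return [self._kv[k][0] for k in keys if self._alive(k, now)]

    def delete(self, key):
        self._kv.pop(key, None)


def register(registry, name, instance, ttl, now=None):
    # Registration: publish host/port/metadata under a per-service prefix.
    key = f"services/{name}/{instance.host}:{instance.port}"
    registry.put(key, instance, ttl=ttl, now=now)


def discover(registry, name, now=None):
    # Discovery: a client lists the live instances of a service.
    return registry.get_prefix(f"services/{name}/", now=now)


def acquire_lock(registry, name, owner, ttl, now=None):
    # Locking: the first writer of the lock key holds it until release or expiry.
    return registry.put_if_absent(f"locks/{name}", owner, ttl=ttl, now=now)


def release_lock(registry, name, owner, now=None):
    # Only the current holder may release the lock.
    if registry.get(f"locks/{name}", now=now) != owner:
        return False
    registry.delete(f"locks/{name}")
    return True
```

In a real deployment the TTL would typically be refreshed by a heartbeat from the service, and the put-if-absent step would need to be an atomic operation provided by the replicated store rather than a plain dictionary check.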
// This can be imported via ./bin/gremlin.sh -i describe.groovy
// A variable 'graph' must be defined with a JanusGraph graph
// Run it as a plugin command ':schema'
// :schema describe
//
import org.janusgraph.graphdb.database.management.MgmtLogType
import org.codehaus.groovy.tools.shell.Groovysh
import org.codehaus.groovy.tools.shell.CommandSupport
#include <boost/flyweight.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <string>
#include <cstdint>
#include <vector>
#include <iostream>
#include <tuple>
typedef std::tuple<short, std::uint8_t, std::uint8_t> Date;
from __future__ import absolute_import, division, print_function
import argparse
import glob
import logging
import os
import random
import numpy as np
import torch
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include <utf8proc.h>
// https://unicode.org/reports/tr15/#Norm_Forms
// https://ssl.icu-project.org/apiref/icu4c/uchar_8h.html
Recently, I learned that some of the top reward models on RewardBench were trained on a preference dataset that is unintentionally contaminated with the benchmark. The dataset, Skyworks Preferences 80k, picked up the contamination by mixing in a Magpie dataset. Magpie is a new method for having language models generate instructions by prompting them with an empty chat template. The contaminated source for the Skyworks dataset is Argilla/magpie-ultra-v0.1, generated with Llama 3.1 405B Instruct. I would never have expected a Magpie dataset to be contaminated.
What seems likely is that Meta trained on some of these prompts, but the exact provenance of each prompt needs more examination. For example, we learned that some of the prompts we used in our LLMBar subsets were taken from popular training sets like Al
import torch
from triton.testing import do_bench
from torch.nn.attention.flex_attention import create_block_mask, flex_attention, noop_mask
torch.manual_seed(0)
import torch | |
torch.set_default_device('cuda') | |
def sliding_window(b, h, q_idx, kv_idx): |