Hello :) Today is Day 228!
A quick summary of today:
- read a bit more of the ML Algorithms in depth book
- made promotional material and a LinkedIn page for my potential society - Not Google Devs
As a fan of the ‘basics’ and testing my knowledge, I decided to continue with the ML Algs in Depth book today.
Machine Learning Algorithms in Depth Chapter 4: Software implementation
Why the chapter name? The book provides implementations for some of the algorithms explained below.
1. Data Structures: Storing and Organizing Data
A data structure is a systematic way of organizing and storing data so that it can be accessed and modified efficiently. Different data structures are suited for different types of applications, and they can be categorized into three main types:
- Linear Data Structures: These store data in a linear or sequential order. Examples include arrays, linked lists, stacks, and queues. In these structures, elements are arranged one after another in memory.
- Nonlinear Data Structures: These store data in a hierarchical or interconnected way rather than in a straight line. Examples include trees and graphs. Nonlinear structures are used to represent complex relationships among data.
- Probabilistic Data Structures: These use randomization or probabilistic techniques to store and retrieve data efficiently, often with a trade-off between accuracy and performance. Examples include Bloom filters and skip lists (see the sketch below).
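To make the probabilistic category concrete, here is a minimal Bloom filter sketch (my own toy example, not from the book; the bit-array size and number of hash functions are arbitrary choices):

import hashlib

class BloomFilter:
    """Toy Bloom filter: fast membership tests with a small false-positive rate."""
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, item):
        # derive num_hashes bit positions from salted md5 digests
        for salt in range(self.num_hashes):
            digest = hashlib.md5(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means definitely absent; True means probably present
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("gradient descent")
print("gradient descent" in bf)      # True
print("simulated annealing" in bf)   # almost certainly False

A lookup can return a false positive but never a false negative, which is exactly the accuracy/performance trade-off mentioned above.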
2. Algorithmic Paradigms
An algorithmic paradigm is a general approach or strategy to solve a class of problems. The chapter discusses four key paradigms:
- Complete Search: Also known as brute force, this method explores the entire search space to find a solution. It’s exhaustive and guarantees finding the correct solution if one exists. Pruning techniques can be applied to cut off parts of the search space that don’t need to be explored, improving efficiency.
- Greedy Algorithm: This approach makes a series of choices, each of which looks best at the moment (locally optimum), with the hope that these choices lead to a globally optimum solution. However, greedy algorithms don’t always produce the best solution for every problem.
- Divide and Conquer: This technique divides a problem into smaller, independent subproblems, solves each subproblem, and then combines their solutions to solve the original problem. Classic examples include merge sort and quicksort algorithms.
- Dynamic Programming (DP): This approach is similar to divide and conquer but is used when subproblems overlap. DP solves each subproblem once, stores the solution in a table, and reuses it when needed. This avoids redundant calculations and optimizes performance.
3. Complete Search (Brute Force)
Complete search or brute force involves exploring every possible option in the search space to find the correct solution. For example, if you were trying to crack a password, a brute force approach would involve trying every possible combination until the correct one is found. Pruning refers to eliminating parts of the search space that cannot possibly contain the solution, which improves the efficiency of the search.
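As a toy illustration (mine, not the book's), here is a brute-force subset-sum search where pruning skips any item that is already larger than the remaining target:

def subset_sum_exists(nums, target):
    """Return True if some subset of nums (non-negative ints) sums to target."""
    nums = sorted(nums, reverse=True)  # try large items first so pruning kicks in early

    def search(i, remaining):
        if remaining == 0:
            return True
        if i == len(nums):
            return False
        # pruning: never explore a branch that takes an item bigger than what is still needed
        if nums[i] <= remaining and search(i + 1, remaining - nums[i]):
            return True
        return search(i + 1, remaining)  # branch that excludes nums[i]

    return search(0, target)

print(subset_sum_exists([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(subset_sum_exists([3, 34, 4, 12, 5, 2], 31))  # False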
4. Greedy Algorithm
A greedy algorithm makes decisions step-by-step, each time choosing the option that looks the best at that moment. The hope is that by making a series of locally optimal choices, you will arrive at a globally optimal solution. However, greedy algorithms do not always guarantee the best solution, and they work best for problems where a locally optimal choice leads to a globally optimal solution.
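A classic case where the greedy choice is provably optimal is interval scheduling: always pick the activity that finishes earliest. A minimal sketch (mine, not the book's):

def max_non_overlapping(intervals):
    """Greedy interval scheduling: repeatedly pick the interval that finishes earliest."""
    selected = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):  # sort by finish time
        if start >= last_end:  # compatible with everything chosen so far
            selected.append((start, end))
            last_end = end
    return selected

print(max_non_overlapping([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (8, 10)]))
# [(1, 4), (5, 7), (8, 10)]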
5. Divide and Conquer
Divide and conquer is a strategy that breaks down a problem into smaller, more manageable subproblems that are easier to solve. Each subproblem is solved independently, and then their solutions are combined to form a solution to the original problem. This method is effective for problems like sorting (e.g., merge sort) or finding the closest pair of points in a set.
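Merge sort, mentioned above, is the textbook example; here is a minimal sketch of the divide, conquer, and combine steps:

def merge_sort(arr):
    """Divide: split in half; conquer: sort each half; combine: merge the sorted halves."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:])

    # merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]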
6. Dynamic Programming (DP)
Dynamic programming is a method used when a problem can be broken down into overlapping subproblems. DP solves each subproblem once, stores the results in a table (often called a DP table), and reuses these results to construct the solution to the original problem. This approach is especially useful for optimization problems, like finding the shortest path in a graph.
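The overlapping-subproblem idea is easiest to see with Fibonacci numbers: naive recursion recomputes the same values exponentially many times, while DP fills a table once. A minimal sketch (mine, not the book's):

def fib_dp(n):
    """Bottom-up DP: each subproblem fib(i) is solved once and stored in a table."""
    if n < 2:
        return n
    table = [0] * (n + 1)  # the "DP table"
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]  # reuse stored subproblem solutions
    return table[n]

print(fib_dp(50))  # 12586269025, computed in O(n) instead of O(2^n)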
7. Mastery Through Competitive Programming
Mastering algorithms and software implementation often requires extensive practice, which can be effectively achieved through competitive programming. This involves solving algorithmic problems under time constraints, which sharpens problem-solving skills and deepens understanding of various algorithms and data structures. Appendix A provides resources for getting started with competitive programming.
8. Machine Learning Mastery
To achieve mastery in machine learning, it’s crucial to have a strong grasp of the underlying mathematical and algorithmic principles. This includes knowledge of linear algebra, probability, statistics, and optimization. Appendix A suggests texts that can help deepen your understanding and broaden your expertise in these fundamental areas.
9. Staying Current in Machine Learning
The field of machine learning is evolving rapidly, with new techniques and algorithms being developed continuously. To stay up-to-date, it’s important to regularly read the latest research papers presented at major conferences like NeurIPS, ICML, and CVPR. These papers provide insights into the cutting-edge developments in the field.
Machine Learning Algorithms in Depth Chapter 5: Classification
This chapter (and I suspect all the following ones will be similar) is pretty cool because it implements each algorithm from scratch.
- The goal of a classification algorithm is to learn a mapping from inputs x to outputs y, where y is a discrete quantity.
Perceptron is a classification algorithm that updates the decision boundary until there are no more classification mistakes.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

class perceptron:
    def __init__(self, num_epochs, dim):
        self.num_epochs = num_epochs
        self.theta0 = 0
        self.theta = np.zeros(dim)

    def fit(self, X_train, y_train):
        n = X_train.shape[0]
        dim = X_train.shape[1]

        k = 1
        for epoch in range(self.num_epochs):
            for i in range(n):
                # sample a random training point (randint's upper bound is exclusive)
                idx = randint.rvs(0, n, size=1)[0]
                # update only when the point is misclassified (hinge-loss condition)
                if (y_train[idx] * (np.dot(self.theta, X_train[idx,:]) + self.theta0) <= 0):
                    # decaying learning rate
                    eta = pow(k+1, -1)
                    k += 1
                    #print("eta: ", eta)
                    # update theta
                    self.theta = self.theta + eta * y_train[idx] * X_train[idx, :]
                    self.theta0 = self.theta0 + eta * y_train[idx]
                #end if
                print("epoch: ", epoch)
                print("theta: ", self.theta)
                print("theta0: ", self.theta0)
            #end for
        #end for

    def predict(self, X_test):
        n = X_test.shape[0]
        y_pred = np.zeros(n)
        for idx in range(n):
            y_pred[idx] = np.sign(np.dot(self.theta, X_test[idx,:]) + self.theta0)
        #end for
        return y_pred

if __name__ == "__main__":
    # load dataset (first two iris classes give a binary problem)
    iris = load_iris()
    X = iris.data[:100,:]
    y = 2*iris.target[:100] - 1  # map to {+1,-1} labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # perceptron (binary) classifier
    clf = perceptron(num_epochs=5, dim=X.shape[1])
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    cmt = confusion_matrix(y_test, y_pred)
    acc = np.trace(cmt)/np.sum(np.sum(cmt))
    print("perceptron accuracy: ", acc)

    # generate plots
    plt.figure()
    sns.heatmap(cmt, annot=True, fmt="d")
    plt.title("Confusion Matrix"); plt.xlabel("predicted"); plt.ylabel("actual")
    #plt.savefig("./figures/perceptron_acc.png")
    plt.show()
Output:
SVM is a max-margin classifier. The training data points that lie on the margin boundaries become support vectors.
import cvxopt
import numpy as np

from sklearn.svm import SVC  # for comparison only
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def rbf_kernel(gamma, **kwargs):
    def f(x1, x2):
        distance = np.linalg.norm(x1 - x2) ** 2
        return np.exp(-gamma * distance)
    return f

class SupportVectorMachine(object):
    def __init__(self, C=1, kernel=rbf_kernel, power=4, gamma=None, coef=4):
        self.C = C
        self.kernel = kernel
        self.power = power
        self.gamma = gamma
        self.coef = coef
        self.lagr_multipliers = None
        self.support_vectors = None
        self.support_vector_labels = None
        self.intercept = None

    def fit(self, X, y):
        n_samples, n_features = np.shape(X)

        # Set gamma to 1/n_features by default
        if not self.gamma:
            self.gamma = 1 / n_features

        # Initialize kernel method with parameters
        self.kernel = self.kernel(
            power=self.power,
            gamma=self.gamma,
            coef=self.coef)

        # Calculate kernel matrix
        kernel_matrix = np.zeros((n_samples, n_samples))
        for i in range(n_samples):
            for j in range(n_samples):
                kernel_matrix[i, j] = self.kernel(X[i], X[j])

        # Define the quadratic optimization problem
        P = cvxopt.matrix(np.outer(y, y) * kernel_matrix, tc='d')
        q = cvxopt.matrix(np.ones(n_samples) * -1)
        A = cvxopt.matrix(y, (1, n_samples), tc='d')
        b = cvxopt.matrix(0, tc='d')

        if not self.C:  # if C is empty (hard-margin SVM)
            G = cvxopt.matrix(np.identity(n_samples) * -1)
            h = cvxopt.matrix(np.zeros(n_samples))
        else:  # soft-margin SVM: 0 <= alpha_i <= C
            G_max = np.identity(n_samples) * -1
            G_min = np.identity(n_samples)
            G = cvxopt.matrix(np.vstack((G_max, G_min)))
            h_max = cvxopt.matrix(np.zeros(n_samples))
            h_min = cvxopt.matrix(np.ones(n_samples) * self.C)
            h = cvxopt.matrix(np.vstack((h_max, h_min)))

        # Solve the quadratic optimization problem using cvxopt
        minimization = cvxopt.solvers.qp(P, q, G, h, A, b)

        # Lagrange multipliers
        lagr_mult = np.ravel(minimization['x'])

        # Extract support vectors
        # Get indexes of non-zero lagr. multipliers
        idx = lagr_mult > 1e-11
        # Get the corresponding lagr. multipliers
        self.lagr_multipliers = lagr_mult[idx]
        # Get the samples that will act as support vectors
        self.support_vectors = X[idx]
        # Get the corresponding labels
        self.support_vector_labels = y[idx]

        # Calculate intercept with first support vector
        self.intercept = self.support_vector_labels[0]
        for i in range(len(self.lagr_multipliers)):
            self.intercept -= self.lagr_multipliers[i] * self.support_vector_labels[
                i] * self.kernel(self.support_vectors[i], self.support_vectors[0])

    def predict(self, X):
        y_pred = []
        # Iterate through list of samples and make predictions
        for sample in X:
            prediction = 0
            # Determine the label of the sample by the support vectors
            for i in range(len(self.lagr_multipliers)):
                prediction += self.lagr_multipliers[i] * self.support_vector_labels[
                    i] * self.kernel(self.support_vectors[i], sample)
            prediction += self.intercept
            y_pred.append(np.sign(prediction))
        return np.array(y_pred)

def main():
    # load dataset (first two iris classes give a binary problem)
    iris = load_iris()
    X = iris.data[:100,:]
    y = 2*iris.target[:100] - 1  # map to {+1,-1} labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

    clf = SupportVectorMachine(kernel=rbf_kernel, gamma=1)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy (scratch):", accuracy)

    clf_sklearn = SVC(gamma='auto')
    clf_sklearn.fit(X_train, y_train)
    y_pred2 = clf_sklearn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred2)
    print("Accuracy (sklearn):", accuracy)

if __name__ == "__main__":
    main()
Comparing from scratch and out-of-the-box SVM:
Logistic regression is a classification algorithm that models the class posterior probability using the sigmoid function (the softmax in the multiclass case).
import numpy as np
import matplotlib.pyplot as plt

def generate_data():
    n = 1000
    mu1 = np.array([1,1])
    mu2 = np.array([-1,-1])
    pik = np.array([0.4,0.6])

    X = np.zeros((n,2))
    y = np.zeros((n,1))

    for i in range(n):
        u = np.random.rand()
        idx = np.where(u < np.cumsum(pik))[0]
        if (len(idx)==1):
            X[i,:] = np.random.randn(1,2) + mu1
            y[i] = 1
        else:
            X[i,:] = np.random.randn(1,2) + mu2
            y[i] = -1
    return X, y

class sgdlr:
    def __init__(self):
        self.num_iter = 100
        self.lmbda = 1e-9

        self.tau0 = 10
        self.kappa = 1
        self.eta = np.zeros(self.num_iter)

        self.batch_size = 200

    def fit(self, X, y):
        # random init
        theta = np.random.randn(X.shape[1],1)

        # learning rate schedule
        for i in range(self.num_iter):
            self.eta[i] = (self.tau0+i)**(-self.kappa)

        # divide data in batches
        batch_data, batch_labels = self.make_batches(X,y,self.batch_size)
        num_batches = batch_data.shape[0]
        num_updates = 0

        J_hist = np.zeros((self.num_iter * num_batches,1))
        t_hist = np.zeros((self.num_iter * num_batches,1))

        for itr in range(self.num_iter):
            for b in range(num_batches):
                Xb = batch_data[b]
                yb = batch_labels[b]

                J_cost, J_grad = self.lr_objective(theta, Xb, yb, self.lmbda)
                theta = theta - self.eta[itr]*(num_batches*J_grad)

                J_hist[num_updates] = J_cost
                t_hist[num_updates] = np.linalg.norm(theta,2)
                num_updates = num_updates + 1
            print("iteration %d, cost: %f" %(itr, J_cost))

        y_pred = 2*(self.sigmoid(X.dot(theta)) > 0.5) - 1
        y_err = np.size(np.where(y_pred - y)[0])/float(y.shape[0])
        print("classification error:", y_err)

        self.generate_plots(X, J_hist, t_hist, theta)
        return theta

    def make_batches(self, X, y, batch_size):
        n = X.shape[0]
        d = X.shape[1]
        num_batches = int(np.ceil(n/batch_size))

        groups = np.tile(range(num_batches),batch_size)
        batch_data=np.zeros((num_batches,batch_size,d))
        batch_labels=np.zeros((num_batches,batch_size,1))

        for i in range(num_batches):
            batch_data[i,:,:] = X[groups==i,:]
            batch_labels[i,:] = y[groups==i]

        return batch_data, batch_labels

    def lr_objective(self, theta, X, y, lmbda):
        n = y.shape[0]
        y01 = (y+1)/2.0

        # compute the objective
        mu = self.sigmoid(X.dot(theta))

        # bound away from 0 and 1
        eps = np.finfo(float).eps
        mu = np.maximum(mu,eps)
        mu = np.minimum(mu,1-eps)

        # compute cost
        cost = -(1/n)*np.sum(y01*np.log(mu)+(1-y01)*np.log(1-mu))+np.sum(lmbda*theta*theta)

        # compute the gradient of the lr objective
        grad = X.T.dot(mu-y01) + 2*lmbda*theta

        # compute the Hessian of the lr objective
        #H = X.T.dot(np.diag(np.diag( mu*(1-mu) ))).dot(X) + 2*lmbda*np.eye(np.size(theta))

        return cost, grad

    def sigmoid(self, a):
        return 1/(1+np.exp(-a))

    def generate_plots(self, X, J_hist, t_hist, theta):
        plt.figure()
        plt.plot(J_hist)
        plt.title("logistic regression")
        plt.xlabel('iterations')
        plt.ylabel('cost')
        #plt.savefig('./figures/lrsgd_loss.png')
        plt.show()

        plt.figure()
        plt.plot(t_hist)
        plt.title("logistic regression")
        plt.xlabel('iterations')
        plt.ylabel('theta l2 norm')
        #plt.savefig('./figures/lrsgd_theta_norm.png')
        plt.show()

        plt.figure()
        plt.plot(self.eta)
        plt.title("logistic regression")
        plt.xlabel('iterations')
        plt.ylabel('learning rate')
        #plt.savefig('./figures/lrsgd_learning_rate.png')
        plt.show()

        plt.figure()
        x1 = np.linspace(np.min(X[:,0])-1,np.max(X[:,0])+1,10)
        plt.scatter(X[:,0], X[:,1])
        plt.plot(x1, -(theta[0]/theta[1])*x1)
        plt.title('logistic regression')
        plt.grid(True)
        plt.xlabel('X1')
        plt.ylabel('X2')
        #plt.savefig('./figures/lrsgd_clf.png')
        plt.show()

if __name__ == "__main__":
    X, y = generate_data()
    sgd = sgdlr()
    theta = sgd.fit(X,y)
The naive Bayes algorithm assumes features are conditionally independent, given the class label. It is commonly used in document classification.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from time import time
import nltk
nltk.download('stopwords', quiet=True)  # ensure the stopword corpus is available before use
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

from sklearn.metrics import accuracy_score
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

sns.set_style("whitegrid")

tokenizer = RegexpTokenizer(r'\w+')
stop_words = set(stopwords.words('english'))
stop_words.update(['s','t','m','1','2'])

class naive_bayes:
    def __init__(self, K, D):
        self.K = K  # number of classes
        self.D = D  # dictionary size
        self.pi = np.ones(K)  # class priors
        self.theta = np.ones((self.D, self.K))  # bernoulli parameters

    def fit(self, X_train, y_train):
        num_docs = X_train.shape[0]
        for doc in range(num_docs):
            label = y_train[doc]
            self.pi[label] += 1
            for word in range(self.D):
                if (X_train[doc][word] > 0):
                    self.theta[word][label] += 1
                #end if
            #end for
        #end for

        # normalize pi and theta
        self.pi = self.pi/np.sum(self.pi)
        self.theta = self.theta/np.sum(self.theta, axis=0)

    def predict(self, X_test):
        num_docs = X_test.shape[0]
        logp = np.zeros((num_docs,self.K))
        for doc in range(num_docs):
            for kk in range(self.K):
                logp[doc][kk] = np.log(self.pi[kk])
                for word in range(self.D):
                    if (X_test[doc][word] > 0):
                        logp[doc][kk] += np.log(self.theta[word][kk])
                    else:
                        logp[doc][kk] += np.log(1-self.theta[word][kk])
                    #end if
                #end for
            #end for
        #end for
        return np.argmax(logp, axis=1)

if __name__ == "__main__":
    # load data
    print("loading 20 newsgroups dataset...")
    tic = time()
    classes = ['sci.space', 'comp.graphics', 'rec.autos', 'rec.sport.hockey']
    dataset = fetch_20newsgroups(shuffle=True, random_state=0, remove=('headers','footers','quotes'), categories=classes)
    X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.5, random_state=0)
    toc = time()
    print("elapsed time: %.4f sec" %(toc - tic))
    print("number of training docs: ", len(X_train))
    print("number of test docs: ", len(X_test))

    print("vectorizing input data...")
    cnt_vec = CountVectorizer(tokenizer=tokenizer.tokenize, analyzer='word', ngram_range=(1,1), max_df=0.8, min_df=2, max_features=1000, stop_words=stop_words)
    cnt_vec.fit(X_train)
    toc = time()
    print("elapsed time: %.2f sec" %(toc - tic))
    vocab = cnt_vec.vocabulary_
    idx2word = {val: key for (key, val) in vocab.items()}
    print("vocab size: ", len(vocab))

    X_train_vec = cnt_vec.transform(X_train).toarray()
    X_test_vec = cnt_vec.transform(X_test).toarray()

    print("naive bayes model MLE inference...")
    K = len(set(y_train))  # number of classes
    D = len(vocab)  # dictionary size
    nb_clf = naive_bayes(K, D)
    nb_clf.fit(X_train_vec, y_train)

    print("naive bayes prediction...")
    y_pred = nb_clf.predict(X_test_vec)
    nb_clf_acc = accuracy_score(y_test, y_pred)
    print("test set accuracy: ", nb_clf_acc)
Test acc: ~82%
The CART decision tree is a greedy, recursive algorithm that finds the optimum feature splits by minimizing an objective function, such as the Gini index.
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

class TreeNode():
    def __init__(self, gini, num_samples, num_samples_class, class_label):
        self.gini = gini  # gini cost
        self.num_samples = num_samples  # size of node
        self.num_samples_class = num_samples_class  # number of node pts with label k
        self.class_label = class_label  # predicted class label
        self.feature_index = 0  # idx of feature to split on
        self.threshold = 0  # best threshold to split on
        self.left = None  # left subtree pointer
        self.right = None  # right subtree pointer

class DecisionTreeClassifier():
    def __init__(self, max_depth=None):
        self.max_depth = max_depth

    def best_split(self, X_train, y_train):
        m = y_train.size
        if (m <= 1):
            return None, None

        # number of points of class k
        mk = [np.sum(y_train == k) for k in range(self.num_classes)]

        # gini of current node
        best_gini = 1.0 - sum((n / m) ** 2 for n in mk)
        best_idx, best_thr = None, None

        # iterate over all features
        for idx in range(self.num_features):
            # sort the node's data along the selected feature
            thresholds, classes = zip(*sorted(zip(X_train[:, idx], y_train)))

            num_left = [0]*self.num_classes
            num_right = mk.copy()

            # iterate over all possible split positions
            for i in range(1, m):
                k = classes[i-1]
                num_left[k] += 1
                num_right[k] -= 1

                gini_left = 1.0 - sum(
                    (num_left[x] / i) ** 2 for x in range(self.num_classes)
                )
                gini_right = 1.0 - sum(
                    (num_right[x] / (m - i)) ** 2 for x in range(self.num_classes)
                )
                gini = (i * gini_left + (m - i) * gini_right) / m

                # check that we don't try to split two pts with identical values
                if thresholds[i] == thresholds[i - 1]:
                    continue

                if (gini < best_gini):
                    best_gini = gini
                    best_idx = idx
                    best_thr = (thresholds[i] + thresholds[i - 1]) / 2  # midpoint
                #end if
            #end for
        #end for
        return best_idx, best_thr

    def gini(self, y_train):
        m = y_train.size
        return 1.0 - sum((np.sum(y_train == k) / m) ** 2 for k in range(self.num_classes))

    def fit(self, X_train, y_train):
        self.num_classes = len(set(y_train))
        self.num_features = X_train.shape[1]
        self.tree = self.grow_tree(X_train, y_train)

    def grow_tree(self, X_train, y_train, depth=0):
        num_samples_class = [np.sum(y_train == k) for k in range(self.num_classes)]
        class_label = np.argmax(num_samples_class)

        node = TreeNode(
            gini=self.gini(y_train),
            num_samples=y_train.size,
            num_samples_class=num_samples_class,
            class_label=class_label,
        )

        # split recursively until maximum depth is reached
        if depth < self.max_depth:
            idx, thr = self.best_split(X_train, y_train)
            if idx is not None:
                indices_left = X_train[:, idx] < thr
                X_left, y_left = X_train[indices_left], y_train[indices_left]
                X_right, y_right = X_train[~indices_left], y_train[~indices_left]
                node.feature_index = idx
                node.threshold = thr
                node.left = self.grow_tree(X_left, y_left, depth + 1)
                node.right = self.grow_tree(X_right, y_right, depth + 1)
        return node

    def predict(self, X_test):
        return [self.predict_helper(x_test) for x_test in X_test]

    def predict_helper(self, x_test):
        node = self.tree
        while node.left:
            if x_test[node.feature_index] < node.threshold:
                node = node.left
            else:
                node = node.right
        return node.class_label

if __name__ == "__main__":
    # load data
    iris = load_iris()
    X = iris.data[:, [2,3]]
    y = iris.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    print("decision tree classifier...")
    tree_clf = DecisionTreeClassifier(max_depth=3)
    tree_clf.fit(X_train, y_train)

    print("prediction...")
    y_pred = tree_clf.predict(X_test)
    tree_clf_acc = accuracy_score(y_test, y_pred)
    print("test set accuracy: ", tree_clf_acc)
Test acc: ~80%
Machine Learning Algorithms in Depth Chapter 6: Regression
Bayesian Linear Regression
- In Bayesian linear regression, defined by y(x) = w^T x + ε, we assume the noise term ε is a zero-mean Gaussian random variable
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing

class ridge_reg():
    def __init__(self, n_iter=20, learning_rate=1e-3, lmbda=0.1):
        self.n_iter = n_iter
        self.learning_rate = learning_rate
        self.lmbda = lmbda

    def fit(self, X, y):
        # insert const 1 for bias term
        X = np.insert(X, 0, 1, axis=1)
        self.loss = []
        self.w = np.random.rand(X.shape[1])

        for i in range(self.n_iter):
            y_pred = X.dot(self.w)
            mse = np.mean(0.5*(y - y_pred)**2 + 0.5*self.lmbda*self.w.T.dot(self.w))
            self.loss.append(mse)
            print(" %d iter, mse: %.4f" %(i, mse))

            # compute gradient of NLL(w) wrt w
            grad_w = - (y - y_pred).dot(X) + self.lmbda*self.w
            # update the weights
            self.w -= self.learning_rate * grad_w

    def predict(self, X):
        # insert const 1 for bias term
        X = np.insert(X, 0, 1, axis=1)
        y_pred = X.dot(self.w)
        return y_pred

if __name__ == "__main__":
    X, y = fetch_california_housing(return_X_y=True)
    X_reg = X[:,2].reshape(-1,1)  # avg number of rooms
    X_std = (X_reg - X_reg.mean())/X.std()  # standard scaling
    y_std = (y - y.mean())/y.std()  # standard scaling

    X_std = X_std[:200,:]
    y_std = y_std[:200]

    rr = ridge_reg()
    rr.fit(X_std, y_std)
    y_pred = rr.predict(X_std)
    print(rr.w)

    plt.figure()
    plt.plot(rr.loss)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.tight_layout()
    plt.show()

    plt.figure()
    plt.scatter(X_std, y_std)
    plt.plot(np.linspace(-1,1), rr.w[1]*np.linspace(-1,1)+rr.w[0], c='red')
    plt.xlim([-0.01,0.01])
    plt.xlabel("scaled avg num of rooms")
    plt.ylabel("scaled house price")
    plt.show()
- The Bayesian linear regression successfully captures the trend of increasing house prices with the average number of rooms
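The code above finds a regularized point estimate with gradient descent. For the fully Bayesian treatment, the posterior over the weights is available in closed form; here is a small sketch of my own (not from the book), assuming a zero-mean Gaussian prior with precision alpha and Gaussian observation noise with precision beta (both values here are hypothetical):

import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=25.0):
    """Conjugate Bayesian linear regression: posterior mean and covariance of the weights."""
    Phi = np.column_stack([np.ones(len(X)), X])            # design matrix with a bias column
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)                           # posterior covariance
    m_N = beta * S_N @ Phi.T @ y                           # posterior mean
    return m_N, S_N

# toy 1D data: y = 0.5x + noise with std 0.2 (so beta = 1/0.2^2 = 25)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = 0.5 * x + rng.normal(0, 0.2, 20)
m_N, S_N = blr_posterior(x, y)
print("posterior mean of [bias, slope]:", m_N)

The posterior mean m_N coincides with the ridge solution, while S_N quantifies the remaining uncertainty in the weights.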
Hierarchical Bayesian regression
- Hierarchical models enable the sharing of features among groups. A hierarchical model assumes there is a common distribution from which individual parameters are sampled and, therefore, captures similarities between groups
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import pymc3 as pm

def main():
    # load data
    data = pd.read_csv('./data/radon.txt')
    county_names = data.county.unique()
    county_idx = data['county_code'].values

    with pm.Model() as hierarchical_model:
        # Hyperpriors
        mu_a = pm.Normal('mu_alpha', mu=0., sd=100**2)
        sigma_a = pm.Uniform('sigma_alpha', lower=0, upper=100)
        mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
        sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)

        # Intercept for each county, distributed around group mean mu_a
        a = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=len(data.county.unique()))
        # Slope for each county, distributed around group mean mu_b
        b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=len(data.county.unique()))

        # Model error
        eps = pm.Uniform('eps', lower=0, upper=100)

        # Expected value
        radon_est = a[county_idx] + b[county_idx] * data.floor.values

        # Data likelihood
        y_like = pm.Normal('y_like', mu=radon_est, sd=eps, observed=data.log_radon)

    with hierarchical_model:
        # Use ADVI for initialization
        mu, sds, elbo = pm.variational.advi(n=100000)
        step = pm.NUTS(scaling=hierarchical_model.dict_to_array(sds)**2, is_cov=True)
        hierarchical_trace = pm.sample(5000, step, start=mu)

    pm.traceplot(hierarchical_trace[500:])
    plt.show()

if __name__ == "__main__":
    main()
KNN regression
- K-nearest neighbors (KNN) regression is a nonparametric model in which, for a given query data point q, we find its K nearest neighbors in the training set and compute the average of their response variable y
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.model_selection import train_test_split

np.random.seed(42)

class KNN():
    def __init__(self, K):
        self.K = K

    def euclidean_distance(self, x1, x2):
        dist = 0
        for i in range(len(x1)):
            dist += np.power((x1[i] - x2[i]), 2)
        return np.sqrt(dist)

    def knn_search(self, X_train, y_train, Q):
        y_pred = np.empty(Q.shape[0])
        for i, query in enumerate(Q):
            # get K nearest neighbors to query point
            idx = np.argsort([self.euclidean_distance(query, x) for x in X_train])[:self.K]
            # extract the labels of the K nearest training points
            knn_labels = np.array([y_train[i] for i in idx])
            # label query sample as the average of knn_labels
            y_pred[i] = np.mean(knn_labels)
        return y_pred

if __name__ == "__main__":
    plt.close('all')

    # iris dataset
    iris = datasets.load_iris()
    X = iris.data[:,:2]
    y = iris.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    K = 4
    knn = KNN(K)
    y_pred = knn.knn_search(X_train, y_train, X_test)

    plt.figure(1)
    plt.scatter(X_train[:,0], X_train[:,1], s=100, marker='x', color='r', label='data')
    plt.scatter(X_test[:,0], X_test[:,1], s=100, marker='o', color='b', label='query')
    plt.title('K Nearest Neighbors (K=%d)'% K)
    plt.legend()
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.grid(True)
    plt.show()
Finding the exact nearest neighbors in high-dimensional space is often computationally intractable, so approximate methods are used instead. There are two classes of approximate methods: those that partition the space into regions, such as the k-d tree implemented in the Fast Library for Approximate Nearest Neighbors (FLANN), and hashing-based methods, such as locality-sensitive hashing (LSH).
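As a quick illustration of the space-partitioning family (my own sketch, not from the book), scipy's cKDTree can replace the brute-force neighbor search above; its eps parameter trades exactness for speed:

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=1000)  # toy response variable
X_query = rng.normal(size=(5, 2))

tree = cKDTree(X_train)
# eps > 0 allows approximate neighbors in exchange for a faster query
dist, idx = tree.query(X_query, k=4, eps=0.1)
y_pred = y_train[idx].mean(axis=1)  # KNN regression: average the neighbors' responses
print(y_pred)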
Gaussian process regression
- Gaussian processes (GPs) define a prior over functions that can be updated to a posterior once we have observed data. A GP assumes the function is defined at a finite and arbitrarily chosen set of points x_1, …, x_n, such that p(f(x_1), …, f(x_n)) is jointly Gaussian with mean μ(x) and covariance Σ(x), where Σ_ij = κ(x_i, x_j) and κ is a positive definite kernel function.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

np.random.seed(42)

class GPreg:
    def __init__(self, X_train, y_train, X_test):
        self.L = 1.0
        self.keps = 1e-8

        self.muFn = self.mean_func(X_test)
        self.Kfn = self.kernel_func(X_test, X_test) + 1e-15*np.eye(np.size(X_test))

        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test

    def mean_func(self, x):
        muFn = np.zeros(len(x)).reshape(-1,1)
        return muFn

    def kernel_func(self, x, z):
        sq_dist = cdist(x/self.L, z/self.L, 'euclidean')**2
        Kfn = 1.0 * np.exp(-sq_dist/2)
        return Kfn

    def compute_posterior(self):
        K = self.kernel_func(self.X_train, self.X_train)  # K
        Ks = self.kernel_func(self.X_train, self.X_test)  # K_*
        Kss = self.kernel_func(self.X_test, self.X_test) + self.keps*np.eye(np.size(self.X_test))  # K_**
        Ki = np.linalg.inv(K)  # O(Ntrain^3)

        postMu = self.mean_func(self.X_test) + np.dot(np.transpose(Ks), np.dot(Ki, (self.y_train - self.mean_func(self.X_train))))
        postCov = Kss - np.dot(np.transpose(Ks), np.dot(Ki, Ks))

        self.muFn = postMu
        self.Kfn = postCov

        return None

    def generate_plots(self, X, num_samples=3):
        plt.figure()
        for i in range(num_samples):
            fs = self.gauss_sample(1)
            plt.plot(X, fs, '-k')
            #plt.plot(self.X_train, self.y_train, 'xk')

        mu = self.muFn.ravel()
        S2 = np.diag(self.Kfn)
        plt.fill(np.concatenate([X, X[::-1]]), np.concatenate([mu - 2*np.sqrt(S2), (mu + 2*np.sqrt(S2))[::-1]]), alpha=0.2, fc='b')
        plt.show()

    def gauss_sample(self, n):
        # returns n samples from a multivariate Gaussian distribution
        # S = AZ + mu
        A = np.linalg.cholesky(self.Kfn)
        Z = np.random.normal(loc=0, scale=1, size=(len(self.muFn),n))
        S = np.dot(A,Z) + self.muFn
        return S

def main():
    # generate noise-less training data
    X_train = np.array([-4, -3, -2, -1, 1])
    X_train = X_train.reshape(-1,1)
    y_train = np.sin(X_train)

    # generate test data
    X_test = np.linspace(-5, 5, 50)
    X_test = X_test.reshape(-1,1)

    gp = GPreg(X_train, y_train, X_test)
    gp.generate_plots(X_test,3)  # samples from GP prior
    gp.compute_posterior()
    gp.generate_plots(X_test,3)  # samples from GP posterior

if __name__ == "__main__":
    main()
The above shows three functions drawn at random from a GP prior (first) and GP posterior (second) after observing five data points in the case of noise-free observations. The shaded area corresponds to two times the standard deviation around the mean (95% confidence region). We can see that the model perfectly interpolates the training data and that the predictive uncertainty increases as we move further away from the observed data.
Not Google Devs Society
Yep. I will try to start my own society related to all things data.
The name?
I know there are official Google Developer Student Clubs; however, the deadline for this year has passed, and applications will only open again for the 2025-26 academic year. By that time I will have graduated.
So the name describes the society members exactly: Not Google Devs. I even ‘designed’ a logo.
I also created flyers that I will put up around the uni. Below is the translated content.
2024-2nd semester looking for:
- Vice President
- Event planner
- General members
What?
- We are the Not Google Devs society
- Learn and practice Data Engineering, Data Science, AI, Cloud, and more
- Attend bi-weekly sessions
- Work on projects alone and with others
Who is this for:
- someone who loves new challenges
- someone who loves to code
- someone who can speak English or is studying English
Currently planned schedule
- MLOps
- Data engineering
- What is HuggingFace
- RAG apps
- Hackathon
- Collab with Uni of Essex’s CS society
Then some words from myself and contact info.
Why do I want to make a society?
Being part of a society was one of the most memorable experiences of my undergraduate years. I had fun, but I also got to learn soft skills that I could later add to my CV. With this society, I not only want to share knowledge from my work and learning experience, but also create an atmosphere of collaboration and learning. I hope that whoever joins the society can learn a lot, create projects, work in a team, and later add these experiences to their CV - just like I did. As for the meetings, I think online or in-person would both be fine (depending on the number of people). The idea is to have presentation sessions where a member or I present a project, a new tool, or something interesting that people want to share or might find beneficial.
I am also looking for help - a VP and an Event Planner, so hopefully I get some volunteers ^^
I also made a LinkedIn page for the society so that people can add it on LinkedIn. I need to print out the promotional material and hang it around the school in the coming days. I also created a GitHub repo that we can potentially use. For now, it just has the promotional info.
For completeness, here is the poster I have at the moment:
That is all for today!
See you tomorrow :)