Genetic Algorithms (GAs) attempt to computationally mimic
the processes by which natural selection operates, and apply them to solve
business and research problems. Developed by John Holland in the 1960s abd
1970s, GAs provide a framework for studying the effects of such biologically
inspired factors as mate selection, reproduction, mutation and crossover of
genetic information. In the natural world, the constraints and stresses of a particular
environment force the different species (and different individuals within
species) to compete to produce the fittest offspring. In the world of GAs, the
fittest potential solutions evolve to produce even more optimal solutions.
Not surprisingly, the field of GAs has borrowed heavily from
genomic terminology. Each cell in our body contains the same set of
chromosomes, string of DNA that function as a blueprint for making one of us.
Then, each chromosome can be partitioned into genes, which are block of DNA
designed to encode a particular trait such as en eye color. Mutation, the
altering of a single gene in a chromosome of the offspring. The offspring
fitness is then evaluated, either in terms of viability (living long enough to
reproduce).
Now, in the field of GAs, a chromosome refers to one of the
candidate solutions to the problem, a gene is a single bit or digit of the
candidate solution. Most GAs function by iteratively updating a collection of
potential solutions, called a population. Each member of the population is
evaluated for fitness on each cycle. A new population then replaces the old
population with the fittest members.
The fittest function f(x) is a real-valued function
operating on the chromosome (potential solution), not the gene, so that the x
in f(x) refers to the numeric value taken by the chromosome at the time of
fitness evaluation.
Goal oriented problem solving
Genetic algorithms are one of the tools we can use to apply
machine learning to finding good, sometimes even optimal, solutions to problems
that have billions of potential solutions. They use biological processes in
software to find answers to problems that have really large search spaces by
continuously generating candidate solutions, evaluating how well the solutions
fit the desire outcome, and refining the best solutions.
When solving a problem with a genetic algorithm, instead of
asking for a specific solution, you provide characteristics that the solution
must have or rules its solution must pass to be accepted. The more constraints
you add the more potential solutions are blocked. Genetic algorithms are commonly used to generate
high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.
Imagine you are given 10 chances to guess a number between 1
and 1000 and the only feedback you get is whether your guess is right or wrong.
Could you reliably guess the number? With only right or wrong as feedback, you
have no way to improve your guesses so you have at best a 1 in 100 chance of
guessing the number. A fundamental aspect of solving problems using genetic
algorithms is that they must provide feedback that helps the engine select the
better of two guesses. That feedback is called the fitness, for how closely
guess fits the desired result. More importantly it implies a general
progression.
You might be able to do better than random guessing if you
have problem-specific knowledge that helps you eliminate certain number
combinations. Using problem-specific knowledge to guide the genetic algorithm's
creation and modification of potential solutions can help them find a solution
orders of magnitude faster.
Genetic algorithms and genetic programming are very good at
finding solutions to very large problems. They do it by taking millions of
samples from the search space, making small changes, possibly recombining parts
of the best solutions, comparing the resultant fitness against that of the
current best solution, and keeping the better of the two. This process repeats
until a stop condition like one of the following occurs: the known solution is
found, a solution meeting all requirements is found, a certain number of
generations has passed, a specific amount of time has passed, etc.
Guess the password
Imagine you are asked to guess a password; what kind of
feedback would you want? These decisions are some of the ones you have to make
when planning to implement a genetic algorithm to find a solution to your
problem. Genetic algorithms are good at finding good solutions to problems with
large search spaces because they can quickly find the parts of the guesses that improve fitness
values or lead to better solutions.
This is an intuitive sample that you understand and can fall
back upon when learning to use other machine learning tools and techniques, or
applying genetic algorithms in your own field of expertise.
Genetic Algorithms use random exploration of the problem
space combined with evolutionary processes like mutation and crossover (exchange
of genetic information) to improve guesses. But also, because they have no
experience in the problem domain, they try things a human would never think to
try. Thus, a person using a genetic algorithm may learn more about the problem
space and potential solutions. This gives them the ability to make improvements
to the algorithm, in a virtous cycle.
The genetic algorithm should make informed guesses.
Genes
To begin with, genetic algorithm needs a gene set to use for
building guesses. For this sample will be a generic set of letters. It also
needs a target password to guess:
Java:
/**
Gene Set, we set a generic set of letters
and target/desired result
*/
private static final String geneSet = "
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.,";
private static final String target = "This is a test for genetics aldorithms";
Python:
geneset = "
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.,"
Generate a guess
Next the algorithm needs a way to generate a random string
from the gene set.
Java:
private static final Random random = new Random();
/**
* We need a way to generate random string (a
guest) from the geneset.
* @param length the number of random chars to be generated from a
geneSet.
*/
private static String generate_parent(int length) {
StringBuilder genes = new StringBuilder();
for (int i = 0;
i < length; i++) {
genes.append(geneSet.charAt(random.nextInt(geneSet.length())));
}
return genes.toString();
}
Python:
import random
def _generate_parent(length,
geneSet, get_fitness):
genes = []
while len(genes) < length:
sampleSize = min(length - len(genes), len(geneSet))
genes.extend(random.sample(geneSet, sampleSize))
return ''.join(genes)
Fitness
The fitness value the genetic algorithm provides is the only
feedback the engine gets to guide it toward a solution. In this simple the
fitness value is the total number of letters in the guess that can match the
letter in the same position of the password.
Java:
/**
* We need to define a fitness
* The fitness is the only feedback the engine
get t o guide it towards a solution.
* Fitness is the total number in the guess
that match the letter in the same position of the password.
*/
private static int
get_fitness(String guess){
char[] guess_array =
guess.toCharArray();
char[] target_array = target.toCharArray();
int contador = 0;
for(int i = 0; i <
target_array.length; i++) {
if (guess_array[i] ==
target_array[i]) {
contador++;
}
}
return contador;
}
Python:
def get_fitness(guess,
target):
return sum(1 for expected,
actual in zip(target, guess)
if
expected == actual)
Mutate
Next , the engine needs a way to produce a new guess by
mutating the current one. The following implementation converts the parent
string to an array, then replaces 1 letter in the array with a randomly
selected one from geneSet, and fnally recombines the result into a new string.
Java:
/**
* We need a new way to produce a new guess
mutating the current one. Covnerts the parent string to an array
* and replaces 1 letter in the array with a
randomly selected from the geneSet.
*
* @return
*/
private static String
mutate(String parent){
char[] guess_array =
parent.toCharArray();
char[] target_array = target.toCharArray();
for(int i = 0; i <
target_array.length; i++) {
if (guess_array[i] !=
target_array[i]) {
char gene = geneSet.charAt(random.nextInt(geneSet.length()));
//is the new gene the same it was at
that position?
//replace the selected gene if it is
the same as the one it is supposed to replace , which can prevent a significant
number of wasted guesses.
while (gene !=
guess_array[i]){
guess_array[i] = gene;
}
}
}
return String.valueOf(guess_array);
}
Python:
def _mutate(parent, geneSet, get_fitness):
index =
random.randrange(0, len(parent.Genes))
childGenes
= list(parent.Genes)
newGene,
alternate = random.sample(geneSet, 2)
childGenes[index] = alternate \
if
newGene == childGenes[index] \
else
newGene
return ''.join(childGenes)
Display
Next, it is imporant to monitor what is happening so that
the engine can be stopped if it gets stuck. Having a visual representation of
the gene sequence, which may not be the literal gene sequence, is often
critical to identifying what works and what does not so that the algorithm can
be improved.
Normally the display function also outputs the fitness value
and how much time has elapsed.
Java:
/**
* It is important to monitor
what is happening so that the engine can be stopped if * gets stucked. Having a visual representatiion
on the gene sequence , which may not be * the literal sequence, is often
critical to identifying what works and what does not so * that the algorithms
can be improved.
* Display function also show
output the fitness value and how much time has elapsed.
*/
private static void display(String
guess, Date startTime) {
Date time = new Date();
System.out.println(String.format("Guess:
" + guess+ ", Fitness: " + get_fitness(guess) + ", Time:
" + (time.getTime() - startTime.getTime()) + "
milisegundos" ));
}
Python:
def
display(candidate, startTime):
timeDiff = datetime.datetime.now() - startTime
print("{0}\t{1}\t{2}".format(
candidate.Genes, candidate.Fitness,
str(timeDiff)))
Main
The main program begins by initializing bestParent to a
random sequence of letters and calling the display function.
The final piece is the heart of the genetic engine. It is a
loop that:
- ·
generate a guess
- ·
requests the fitness for that guess, then
- ·
compares the fitness so that of the previous
best guess, and
- ·
keeps the guess with the better fitness.
Java:
public static void main(String[]
args) {
GuessTheNumber guessTheNumber = new GuessTheNumber();
//Generamos
random.setSeed(12345);
Date startTime = new Date();
String best_parent =
guessTheNumber.generate_parent(target.length());
int best_fitness =
guessTheNumber.get_fitness(best_parent);
guessTheNumber.display(best_parent,
startTime);
while (true) {
String child = guessTheNumber.mutate(best_parent);
int child_fitness =
guessTheNumber.get_fitness(child);
if (best_fitness >=
child_fitness) {
display(child, startTime);
}
if (child_fitness >=
best_parent.length()){
display(child, startTime);
break;
}
best_fitness = child_fitness;
best_parent = child;
}
}
This cycle repeats until a stop condition occurs, in this
case when all the letters in the guess match thise in the target.
Run the code and you'll see the output simlir to the
following. Success!
Guess:
spytwdwrmCCvnDBxIoBMGoVJlCiUMEaKmWEA w, Fitness: 0, Time: 0 milisegundos
.
.
.
.
Guess: IhX is a GlKt fnV txnewiesvaNdorithms, Fitness:
24, Time: 5 milisegundos
Guess: Vhi is a cAZt fu
gTnesiXsrasdorithms, Fitness: 26, Time: 5milisegundos
Guess: PhiI is a QKGt fhj
gVneyi,sfafdorithms, Fitness: 26, Time: 5milisegundos
Guess: ghi. is a fjGt fD.
gOneYissVaNdorithms, Fitness: 26, Time: 5 milisegundos
Guess: LhiV is a ZWEt fgK
gUnexinsHaOdorithms, Fitness: 26, Time: 5 milisegundos
.
.
.
.
Guess: Thiv is a .yst fo!
gtneIicspaldorithms, Fitness: 31, Time: 10 milisegundos
Guess: ThiR is a Dyst foa
geneeicsDaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: Thiu is a yJst fow
geneeicsTaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: ThiA is a oCst foY
genexicsJaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: Thil is a xEst fod
geneWicsYaldorithms, Fitness: 32, Time: 10 milisegundos
.
.
.
.
Guess: This is a Zest for genetics
aldorithms, Fitness: 37, Time: 15 milisegundos
Guess: This is a Vest for genetics aldorithms,
Fitness: 37, Time: 15 milisegundos
Guess: This is a test for genetics aldorithms,
Fitness: 38, Time: 15 milisegundos
Guess: This is a test for genetics algorithms,
Fitness: 38, Time: 15 milisegundos
Chromosome Object
Next, we must introduce a
Chromosome object that has Genes and Fitness attributes. This will make the
genetic engine more flexible by making it possible to pass those values around
as a unit.
public class Chromosome {
private int fitness;
private String genes;
public Chromosome(int fitness, String
genes) {
this.fitness = fitness;
this.genes = genes;
}
public int getFitness() {
return fitness;
}
public void setFitness(int fitness) {
this.fitness = fitness;
}
public String
getGenes() {
return genes;
}
public void setGenes(String
genes) {
this.genes = genes;
}
}
We have a working engine, but is
currently a tightly coupled class, so the next task is up to you,
- -
Extract
the genetic engine code from that specific to guessing the password so it can
be reused for other projects.
- -
Create a
package called genetic and include the Chromosome class there.
- -
Try to
use unit tests.
- -
Try the
genetic engine with a longer password.
- -
Add
support for benchmarking to genetic because it is useful to know how long
engine takes to find a solution on average and the standard deviation.
Write me to email
miguelurbin@gmail.com in order to send you the project in Python and Java to
make your own tests or improvements.