viernes, 7 de abril de 2017

Genetic Algorithm with Java and Python


Genetic Algorithms (GAs) attempt to computationally mimic the processes by which natural selection operates, and apply them to solve business and research problems. Developed by John Holland in the 1960s abd 1970s, GAs provide a framework for studying the effects of such biologically inspired factors as mate selection, reproduction, mutation and crossover of genetic information. In the natural world, the constraints and stresses of a particular environment force the different species (and different individuals within species) to compete to produce the fittest offspring. In the world of GAs, the fittest potential solutions evolve to produce even more optimal solutions.

Not surprisingly, the field of GAs has borrowed heavily from genomic terminology. Each cell in our body contains the same set of chromosomes, string of DNA that function as a blueprint for making one of us. Then, each chromosome can be partitioned into genes, which are block of DNA designed to encode a particular trait such as en eye color. Mutation, the altering of a single gene in a chromosome of the offspring. The offspring fitness is then evaluated, either in terms of viability (living long enough to reproduce).

Now, in the field of GAs, a chromosome refers to one of the candidate solutions to the problem, a gene is a single bit or digit of the candidate solution. Most GAs function by iteratively updating a collection of potential solutions, called a population. Each member of the population is evaluated for fitness on each cycle. A new population then replaces the old population with the fittest members.

The fittest function f(x) is a real-valued function operating on the chromosome (potential solution), not the gene, so that the x in f(x) refers to the numeric value taken by the chromosome at the time of fitness evaluation.

Goal oriented problem solving

Genetic algorithms are one of the tools we can use to apply machine learning to finding good, sometimes even optimal, solutions to problems that have billions of potential solutions. They use biological processes in software to find answers to problems that have really large search spaces by continuously generating candidate solutions, evaluating how well the solutions fit the desire outcome, and refining the best solutions.

When solving a problem with a genetic algorithm, instead of asking for a specific solution, you provide characteristics that the solution must have or rules its solution must pass to be accepted. The more constraints you add the more potential solutions are blocked. Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.

 Imagine you are given 10 chances to guess a number between 1 and 1000 and the only feedback you get is whether your guess is right or wrong. Could you reliably guess the number? With only right or wrong as feedback, you have no way to improve your guesses so you have at best a 1 in 100 chance of guessing the number. A fundamental aspect of solving problems using genetic algorithms is that they must provide feedback that helps the engine select the better of two guesses. That feedback is called the fitness, for how closely guess fits the desired result. More importantly it implies a general progression.

You might be able to do better than random guessing if you have problem-specific knowledge that helps you eliminate certain number combinations. Using problem-specific knowledge to guide the genetic algorithm's creation and modification of potential solutions can help them find a solution orders of magnitude faster.

Genetic algorithms and genetic programming are very good at finding solutions to very large problems. They do it by taking millions of samples from the search space, making small changes, possibly recombining parts of the best solutions, comparing the resultant fitness against that of the current best solution, and keeping the better of the two. This process repeats until a stop condition like one of the following occurs: the known solution is found, a solution meeting all requirements is found, a certain number of generations has passed, a specific amount of time has passed, etc.

Guess the password

Imagine you are asked to guess a password; what kind of feedback would you want? These decisions are some of the ones you have to make when planning to implement a genetic algorithm to find a solution to your problem. Genetic algorithms are good at finding good solutions to problems with large search spaces because they can quickly find  the parts of the guesses that improve fitness values or lead to better solutions.

This is an intuitive sample that you understand and can fall back upon when learning to use other machine learning tools and techniques, or applying genetic algorithms in your own field of expertise.

Genetic Algorithms use random exploration of the problem space combined with evolutionary processes like mutation and crossover (exchange of genetic information) to improve guesses. But also, because they have no experience in the problem domain, they try things a human would never think to try. Thus, a person using a genetic algorithm may learn more about the problem space and potential solutions. This gives them the ability to make improvements to the algorithm, in a virtous cycle.

The genetic algorithm should make informed guesses.

Genes
To begin with, genetic algorithm needs a gene set to use for building guesses. For this sample will be a generic set of letters. It also needs a target password to guess:

Java:
/**
      Gene Set, we set a generic set of letters and target/desired result  
*/
private static final String geneSet = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.,";
private static final String target = "This is a test for genetics aldorithms";

Python:
geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.,"



Generate a guess

Next the algorithm needs a way to generate a random string from the gene set.

Java:
private static final Random random = new Random();

/**
 * We need a way to generate random string (a guest) from the geneset.
 * @param length the number of random chars to be generated from a geneSet.
 */
private static String generate_parent(int length) {       
    StringBuilder genes = new StringBuilder();

    for (int i = 0; i < length; i++) {
        genes.append(geneSet.charAt(random.nextInt(geneSet.length())));
    }

    return genes.toString();
}


Python:
import random
     
def _generate_parent(length, geneSet, get_fitness):
    genes = []
    while len(genes) < length:
        sampleSize = min(length - len(genes), len(geneSet))
        genes.extend(random.sample(geneSet, sampleSize))
    return ''.join(genes)


Fitness
The fitness value the genetic algorithm provides is the only feedback the engine gets to guide it toward a solution. In this simple the fitness value is the total number of letters in the guess that can match the letter in the same position of the password.

Java:
/**
 * We need to define a fitness
 * The fitness is the only feedback the engine get t o guide it towards a solution.
 * Fitness is the total number in the guess that match the letter in the same position of the password.
 */
private static int get_fitness(String guess){
    char[] guess_array = guess.toCharArray();
    char[] target_array = target.toCharArray();
   
    int contador = 0;
   
    for(int i = 0; i < target_array.length; i++) {
        if (guess_array[i] == target_array[i]) {
            contador++;
        }
    }
   
    return contador;       
   
}

Python:
def get_fitness(guess, target):
    return sum(1 for expected, actual in zip(target, guess)

           if expected == actual)



Mutate
Next , the engine needs a way to produce a new guess by mutating the current one. The following implementation converts the parent string to an array, then replaces 1 letter in the array with a randomly selected one from geneSet, and fnally recombines the result into a new string.

Java:
/**
 * We need a new way to produce a new guess mutating the current one. Covnerts the parent string to an array
 * and replaces 1 letter in the array with a randomly selected from the geneSet.
 *
 * @return
 */
private static String mutate(String parent){       
    char[] guess_array = parent.toCharArray();
    char[] target_array = target.toCharArray();
           
    for(int i = 0; i < target_array.length; i++) {
        if (guess_array[i] != target_array[i]) {
            char gene = geneSet.charAt(random.nextInt(geneSet.length()));
            //is the new gene the same it was at that position?
            //replace the selected gene if it is the same as the one it is supposed to replace , which can prevent a significant number of wasted guesses.
            while (gene != guess_array[i]){
                guess_array[i] = gene;
            }                
        }
    }
   
    return String.valueOf(guess_array);
   
}

Python:
def _mutate(parent, geneSet, get_fitness):
    index = random.randrange(0, len(parent.Genes))
    childGenes = list(parent.Genes)
    newGene, alternate = random.sample(geneSet, 2)
    childGenes[index] = alternate \
        if newGene == childGenes[index] \
        else newGene
    return ''.join(childGenes)



Display

Next, it is imporant to monitor what is happening so that the engine can be stopped if it gets stuck. Having a visual representation of the gene sequence, which may not be the literal gene sequence, is often critical to identifying what works and what does not so that the algorithm can be improved.
Normally the display function also outputs the fitness value and how much time has elapsed.

Java:
/**
* It is important to monitor what is happening so that the engine can be stopped if      * gets stucked. Having a visual representatiion on the gene sequence , which may not be * the literal sequence, is often critical to identifying what works and what does not so * that the algorithms can be improved.
* Display function also show output the fitness value and how much time has elapsed.
*/
private static void display(String guess, Date startTime) {
    Date time = new Date();
    System.out.println(String.format("Guess: " + guess+ ", Fitness: " + get_fitness(guess) + ", Time: " + (time.getTime() - startTime.getTime()) + " milisegundos" ));       
}


Python:
   def display(candidate, startTime):
       timeDiff = datetime.datetime.now() - startTime
       print("{0}\t{1}\t{2}".format(

          candidate.Genes, candidate.Fitness, str(timeDiff)))


Main
The main program begins by initializing bestParent to a random sequence of letters and calling the display function.

The final piece is the heart of the genetic engine. It is a loop that:
  1. ·         generate a guess
  2. ·         requests the fitness for that guess, then
  3. ·         compares the fitness so that of the previous best guess, and
  4. ·         keeps the guess with the better fitness.

Java:
public static void main(String[] args) {   
     
    GuessTheNumber guessTheNumber = new GuessTheNumber();
   
    //Generamos
    random.setSeed(12345);
    Date startTime = new Date();
    String best_parent = guessTheNumber.generate_parent(target.length());
    int best_fitness = guessTheNumber.get_fitness(best_parent);
    guessTheNumber.display(best_parent, startTime);
   
    while (true) {
        String child = guessTheNumber.mutate(best_parent);
        int child_fitness = guessTheNumber.get_fitness(child);
        if (best_fitness >= child_fitness) {
            display(child, startTime);
        }
       
        if (child_fitness >= best_parent.length()){
            display(child, startTime);
            break;
        }
       
        best_fitness = child_fitness;
        best_parent = child;
    }
   

}


This cycle repeats until a stop condition occurs, in this case when all the letters in the guess match thise in the target.

Run the code and you'll see the output simlir to the following. Success!

Guess: spytwdwrmCCvnDBxIoBMGoVJlCiUMEaKmWEA w, Fitness: 0, Time: 0 milisegundos
.
.
.
.
Guess: IhX  is a GlKt fnV txnewiesvaNdorithms, Fitness: 24, Time: 5 milisegundos
Guess: Vhi  is a cAZt fu  gTnesiXsrasdorithms, Fitness: 26, Time: 5milisegundos
Guess: PhiI is a QKGt fhj gVneyi,sfafdorithms, Fitness: 26, Time: 5milisegundos
Guess: ghi. is a fjGt fD. gOneYissVaNdorithms, Fitness: 26, Time: 5 milisegundos
Guess: LhiV is a ZWEt fgK gUnexinsHaOdorithms, Fitness: 26, Time: 5 milisegundos
.
.
.
.
Guess: Thiv is a .yst fo! gtneIicspaldorithms, Fitness: 31, Time: 10 milisegundos
Guess: ThiR is a Dyst foa geneeicsDaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: Thiu is a yJst fow geneeicsTaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: ThiA is a oCst foY genexicsJaldorithms, Fitness: 32, Time: 10 milisegundos
Guess: Thil is a xEst fod geneWicsYaldorithms, Fitness: 32, Time: 10 milisegundos
.
.
.
.
Guess: This is a Zest for genetics aldorithms, Fitness: 37, Time: 15 milisegundos
Guess: This is a Vest for genetics aldorithms, Fitness: 37, Time: 15 milisegundos
Guess: This is a test for genetics aldorithms, Fitness: 38, Time: 15 milisegundos

Guess: This is a test for genetics algorithms, Fitness: 38, Time: 15 milisegundos


Chromosome Object
Next, we must introduce a Chromosome object that has Genes and Fitness attributes. This will make the genetic engine more flexible by making it possible to pass those values around as a unit.

public class Chromosome {
   
    private int fitness;
    private String genes;

    public Chromosome(int fitness, String genes) {
        this.fitness = fitness;
        this.genes = genes;
    }

    public int getFitness() {
        return fitness;
    }

    public void setFitness(int fitness) {
        this.fitness = fitness;
    }

    public String getGenes() {
        return genes;
    }

    public void setGenes(String genes) {
        this.genes = genes;
    }
   
}



We have a working engine, but is currently a tightly coupled class, so the next task is up to you,    

  1. -          Extract the genetic engine code from that specific to guessing the password so it can be reused for other projects.
  2. -          Create a package called genetic and include the Chromosome class there.
  3. -          Try to use unit tests.
  4. -          Try the genetic engine with a longer password.
  5. -          Add support for benchmarking to genetic because it is useful to know how long engine takes to find a solution on average and the standard deviation.


Write me to email miguelurbin@gmail.com in order to send you the project in Python and Java to make your own tests or improvements.

3 comentarios:

  1. Thanks a lot very much for the high quality and results-oriented help. I won’t think twice to endorse your blog post to anybody who wants and needs support about this area. data science training in Hyderabad

    ResponderEliminar
  2. I am truly getting a charge out of perusing your elegantly composed articles. It would seem that you burn through a ton of energy and time on your blog. I have bookmarked it and I am anticipating perusing new articles. Keep doing awesome.
    data science coaching in hyderabad

    ResponderEliminar
  3. Thanks for sharing this valuable information,we provide youtube shorts downloader which helps to get more
    ideas to write more quality content.

    ResponderEliminar