A Brief Introduction to Machine Learning Tree

Artificial intelligence (AI) is a branch of computer science that aims to build intelligent machines: machines that are trained and empowered to make decisions on their own. AI has many branches, but here we will narrow our focus to Machine Learning. Machine Learning (ML) is a branch of artificial intelligence in which we do not explicitly program the machines (computers); instead, the machines learn from data. Before going deeper into ML, we need to define the terms closely related to it. Machine Learning is divided into three parts: supervised learning, unsupervised learning and reinforcement learning. Before going into more depth, we will introduce some of these concepts here.

The image shows five objects: A, B, C, D and E. Each object has features such as Height, Weight and Age. The horizontal rows represent instances (for example, the one highlighted in green), and the vertical columns represent features, such as Height.

Machine Learning methods learn from examples: in machine learning, a model is trained on a labeled dataset. A dataset can be defined as a collection of instances that share the same features. The data can be audio, video or text, and it can be structured or unstructured. A dataset is divided into two main parts: training data and testing data. Typically, 80% of the dataset is used for training and the remaining 20% is used for testing (validation). The training dataset is the one we feed to our machine learning method (model) to train it. The testing dataset is used to measure the accuracy of the model, not to train it.
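As a rough illustration of this split (assuming Python with scikit-learn; the data below is synthetic and purely for demonstration), the 80/20 division might look like this:

```python
# A minimal sketch of the 80/20 split described above (assumes scikit-learn is installed).
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 instances, 3 features each (e.g. Height, Weight, Age).
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)  # hypothetical binary labels

# 80% of the instances train the model, the remaining 20% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```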

Working

ML is purely based on data, so the first step is data gathering. If you are lucky enough to already have a dataset, your work becomes much easier; if not, you need to gather the dataset yourself. Datasets can be video, audio or text. Before training the model, you need to decide where your data warehouse is, that is, where the data will come from. Data collection is the first and most important step. While collecting data, you should think of the different conditions the model can face in reality and cover as many of them as you can, so that the accuracy of your model improves as much as possible.

After collecting the required data, you first need to clean it up. This means removing dirty data, which is irrelevant or can decrease your accuracy. Once the data is completely clean, you should label it. Labeling means assigning a category to each object.
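A minimal sketch of this clean-up and labeling step, assuming Python with pandas; the file name, column names and label mapping here are hypothetical:

```python
# Rough sketch of basic clean-up and labeling with pandas (names are illustrative).
import pandas as pd

df = pd.read_csv("raw_data.csv")          # hypothetical raw dataset

# Remove "dirty" data: rows with missing values and exact duplicates.
df = df.dropna().drop_duplicates()

# Labeling: assign a category to each object (the mapping is purely illustrative).
label_map = {"cat": 0, "dog": 1}
df["label"] = df["animal"].map(label_map)
```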

Ground Truth is one of the ML services provided by AWS and is used to build highly accurate training datasets.

Build, Train, and Test

After labeling the cleaned dataset, it is time to build the machine learning algorithm. While building your algorithm, you need to watch out for overfitting and underfitting. Overfitting means the model performs very well on a particular dataset but poorly on general data. Underfitting, on the other hand, means the model is unable to capture the structure of the data, that is, the model does not fit the data. Your model should be the best fit: it should perform well on the particular dataset as well as on general data.
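One simple way to check for these conditions is to compare accuracy on the training data with accuracy on the held-out test data. A sketch, assuming scikit-learn and the X_train/X_test split from the earlier snippet:

```python
# Compare training vs. test accuracy to spot over/underfitting
# (sketch only; assumes X_train, X_test, y_train, y_test from the split above).
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier().fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# A large gap (high train_acc, low test_acc) suggests overfitting;
# low accuracy on both suggests underfitting.
print(f"train={train_acc:.2f}, test={test_acc:.2f}")
```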

The second step is to train the model you developed using the dataset you gathered. Training means feeding the algorithm with the dataset: the algorithm captures the structure of the input data and adjusts itself accordingly. Once the algorithm has been trained on some specific data, it is called a model.

As discussed earlier, 80% of the overall dataset is used for training and the remaining 20% for testing. The performance of the model is measured on this held-out data, and based on the results of this test, the model is further improved to make it more accurate and efficient.

Amazon provides one of the best tools for this, named SageMaker, which enables developers to build, train and deploy models easily.

Supervised Learning

Supervised learning is the type in which the model (the machine learning algorithm) is trained on labeled data. Labeled data means there is a known output for each specific input. Once trained on this labeled data, the model is in a position to predict the output for a new input. Regression problems and classification problems are discussed under this umbrella.

Unsupervised Learning

The second type, on the other hand, is unsupervised learning. In this type of learning, the model learns from unlabeled data. Cluster analysis, hidden Markov models and association analysis are the methods discussed under this branch of ML.
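As a small illustration of learning from unlabeled data, here is a sketch of cluster analysis with k-means (assuming scikit-learn; the data is synthetic):

```python
# Unsupervised learning sketch: cluster analysis on unlabeled data with k-means.
import numpy as np
from sklearn.cluster import KMeans

X_unlabeled = np.random.rand(200, 2)        # no labels, just raw feature vectors
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_unlabeled)
print(kmeans.labels_[:10])                  # cluster assignments discovered from the data itself
```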

Reinforcement Learning

The third branch of machine learning is reinforcement learning. In reinforcement learning, the model learns from its mistakes. The model performs some action based on its current state, and based on this action the environment gives the model a reward. A reward can be positive or negative; a negative reward is also known as a penalty. If the reward is positive, the model reinforces that behavior, and if the reward is negative, the model learns to avoid it.

Reinforcement Learning (RL)

Machine learning is divided into three types: supervised, unsupervised and reinforcement learning. Some writers also mention semi-supervised learning as a fourth type, but here we will stick to three. In supervised learning, labeled data (input, output pairs) is given, and on the basis of this labeled data the model gives you outputs for the unknown inputs you feed it. Classification and regression are examples.

The second type is unsupervised learning, where unlabeled data is used to train the model. The third type, reinforcement learning, is the one in which the learning most closely resembles human learning. Before going into the details of reinforcement learning, we will define some related terms.

  • Policy: The behavior of an agent in a specific state.
  • Agent: The component that decides which action to take. Usually this is the system.
  • Reward: On the basis of the action, the environment sends a single number (good or bad) to the agent, called the reward. The objective in reinforcement learning is to maximize the reward.
  • Value Function: The total reward an agent can expect to gather starting from a given state.
  • Model of the Environment: A component that mimics the behavior of the environment. In RL, the environment is often the user.

The working mechanism of reinforcement learning is that, given some specific state s(t), the agent performs an action according to its policy and is rewarded with a number (negative or positive). On the basis of this reward from the environment, the agent trains itself and the state is updated to s(t+1). The goal of the agent is to maximize the reward in the long run.
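A minimal sketch of this loop using tabular Q-learning, one common reinforcement learning method (not necessarily the only one the text has in mind); the state/action sizes and hyperparameters are illustrative:

```python
# Tabular Q-learning sketch of the state -> action -> reward -> next-state loop.
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # learning rate, discount factor, exploration rate

def choose_action(state):
    # Policy: mostly exploit the current value estimates, occasionally explore.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Positive rewards reinforce the action; negative rewards (penalties) discourage it.
    # reward and next_state are assumed to come from a hypothetical environment.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```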

There are some concerns about this type of learning. Take the example of a football game, and suppose a goal is scored and the agent (model) receives a positive reward. It is not a single step that matters: a whole sequence of steps finally produced the score. Some of those steps were fruitful and some may not have been, but if the reward is positive the model reinforces all of them, and if the outcome is negative all of them are discouraged. This is called the credit assignment problem: the problem of determining which actions led to a certain outcome. The learning speed of the model may also vary with its parameters, and too much reinforcement can lead to an overload of states, which can diminish the results.

In order to maximize the long-term outcome, the agent should know which steps are profitable, i.e. which lead to a positive outcome, and which lead to a negative one.

To address this kind of problem, backpropagation in artificial neural networks (ANNs) can be one technique. The drawback of this approach is that the (time-based) performance decreases as the number of artificial neurons needed to attain a certain convergence rate increases.

One application of reinforcement learning is the recommender system, used by Google for YouTube videos, Google Maps searches, Google Search results, etc. Earlier recommendation systems were built with supervised learning, which brought limitations such as bias (rewarding only items that were already seen) and myopic recommendations (showing only catchy, familiar videos). To minimize these limitations, recommendation systems are now being developed using reinforcement learning, which trains itself over time on the basis of feedback from the environment.

Intelligent User Interface Designing for Better Communication

The age we live in is called the era of the smartphone, and the number of social media users is increasing every second. People all over the globe communicate with each other for various reasons with the help of instant messages. Statistics show that mobile use is growing at a very rapid rate: about 1 million new active mobile social users are added every day, 2.1 billion people in the world have social media accounts, and 1.7 billion use social networks from a mobile device. A business of 100 employees spends on average 17 hours a week clarifying bad communication.

There is a problem here that can result in huge losses: miscommunication, where the receiver is unable to understand what the sender wants to say.

When it comes to the work environment, 57% of projects fail due to communication breakdown and $37 billion is lost yearly due to employee misunderstandings. The main cause of this miscommunication in text messaging is the absence of visual cues.

Nonverbal behavior, i.e. gestures, facial displays, body posture and movement, plays an important role in face-to-face communication. It is well understood that the lack of aural and visual cues leads to misinterpretation of words. The rest of the damage is done by assumptions, where the sender is so sure that the other person will perceive a message exactly as intended. Not only senders but receivers also play a full part in miscommunication: the meaning of a message is shaped by their emotions and mood at that moment, by their relationship with the sender and the image of the sender in their mind, and by good old stereotypes.

So, to solve this issue of miscommunication, we came up with a solution known as an intelligent user interface. We are addressing this problem with an Android application whose operation is based on ML. The application has three components: the user interface, cloud storage and a machine learning model.

The cloud storage keeps the messages, along with their emotions, in a cloud database. The user interface renders color-coded messages, where each color represents a specific emotion. The machine learning model performs inference locally to predict the user's emotion from the camera.

We are incorporating color theory with emotion: a specific color is assigned to each emotion. The mobile (Android) application acquires frames from the camera, then performs face detection and cropping to remove background clutter. The cropped image is sent to the machine learning model, which runs locally on the user's device, to perform inference. Inference is run over 5 consecutive frames and the results are averaged to avoid misclassifications that may occur due to transitioning frames and other factors. The final emotion is sent along with the message to the cloud database, and from there it is delivered to the receiver. The receiver knows which colors are assigned to which emotions, so they receive the verbal information from the words and the facial-emotion information from the background color of the message.
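A rough sketch of the averaging step described above, shown in Python for brevity; `predict_emotion` and the emotion label list are hypothetical stand-ins for the on-device model:

```python
# Run inference on 5 consecutive frames and average the class probabilities
# before picking the final emotion, to smooth out transitioning frames.
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]   # illustrative label set

def classify_message_emotion(frames, predict_emotion):
    # `frames` is a list of 5 cropped face images; `predict_emotion(frame)` is
    # assumed to return a probability vector over EMOTIONS.
    probs = np.mean([predict_emotion(f) for f in frames], axis=0)
    return EMOTIONS[int(np.argmax(probs))]
```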

Incorporating color theory to show conversational context, with a color assigned to every emotion, puts the receiver in a much better position to interpret the message from the sender. There are many other uses of this idea; for example, it could be applied to automatic feedback systems, security systems and more.

Improving Deep Learning Performance

Deep learning is a branch of machine learning that deals with artificial neural networks. A model (algorithm) is trained for a problem and predicts the solution. The question is: how do we improve the performance of these predictions? We will discuss some of the steps that are very important for improving the performance of deep learning models.

Improve Data

It is said that the larger the dataset, the more information it contains. Data is very important: the more data you have, the better the performance of your model will generally be. A larger dataset covers more features, and this makes the model more general than one trained on a small dataset.

When you are preparing your dataset, the first step is collecting data. This data is not ready to use at this stage; we need to remove the dirty data and clean it so that we can use it to train the model. Cleaning means removing data that is irrelevant or that can decrease performance. Labeling, the process of tagging objects with specific labels, is just as important. Another point is data shuffling during training: without shuffling, training introduces bias, learning is slow and stops quickly. Feature selection also matters a great deal; the features on which the output actually depends are the important ones, and overlapping features can decrease the performance of the model. Last but not least, you should have a balanced dataset, meaning the number of objects (samples) in each class should be roughly equal and the classes should contain comparable data; otherwise, the model will be biased toward the class with the larger number of samples. Finally, while working on the dataset from collection to labeling, you come to understand the problem you are trying to solve much better, so at this stage you can reframe the problem more effectively.
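A small sketch of the shuffling and class-balance checks mentioned above, assuming pandas; the file and column names are hypothetical:

```python
# Shuffle the training data and check the class balance (names are illustrative).
import pandas as pd

df = pd.read_csv("labeled_data.csv")

# Shuffle so the model does not see the classes in long, biased runs.
df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)

# A heavily skewed distribution here means the model will be biased
# toward the majority class unless the dataset is rebalanced.
print(df["label"].value_counts())
```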

Improve Model

When we compare machine learning with human learning, one property humans have is that we are continuous learners. A machine learning model, on the other hand, stops learning at some point: with further training, its performance decreases rather than increases. So it is recommended to stop training before the performance starts to degrade.

Batch size is also important. If the batch size is very small, the model learns very slowly and the loss oscillates; gradient descent will not be smooth. Conversely, if the batch size is very large, each training iteration takes longer. Typical batch sizes are 8, 16, 32 or 64. When selecting a batch size, make sure it is neither too large nor too small; you need to manage this trade-off by adjusting the batch size.
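A minimal sketch of mini-batching in plain NumPy, showing the knob being tuned (the batch size value is just an example):

```python
# Yield shuffled mini-batches; batch_size trades off noisy updates (too small)
# against slow, memory-heavy iterations (too large).
import numpy as np

def iterate_minibatches(X, y, batch_size=32):
    indices = np.random.permutation(len(X))          # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]
```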

You may be familiar with the terms overfitting and underfitting in machine learning. Underfitting means the model gives poor performance on the training data as well as on general data. Overfitting means the model gives the best performance on the training data but poor performance on general data. The best model is the one that performs well on general data, so try to avoid both underfitting and overfitting and aim for the best fit through regularization.
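As one hedged example of a regularization knob, here is an L2 penalty via scikit-learn's ridge regression; the alpha value is purely illustrative:

```python
# A larger alpha shrinks the weights more, pulling an overfit model back toward the best fit.
from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)    # alpha is the L2 regularization strength
```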

No one can claim an exact number of layers that will give the best performance in an artificial neural network, and the same goes for the number of neurons. The practical approach is to keep adding layers until the performance starts to decrease, do the same for the number of neurons, and try combinations of both. Also, at some point during training, learning stops, and beyond that point further training can decrease performance, so it is recommended to stop training before the performance starts to degrade.
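A sketch of this stop-before-it-degrades idea as an early-stopping loop; `train_one_epoch` and `evaluate` are hypothetical callables supplied by the caller, not part of any particular library:

```python
# Stop training once validation performance stops improving for `patience` epochs.
def train_with_early_stopping(model, train_one_epoch, evaluate, max_epochs=100, patience=3):
    best_val, bad_epochs = float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_score = evaluate(model)
        if val_score > best_val:
            best_val, bad_epochs = val_score, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # performance stopped improving; more training would likely hurt
    return model
```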

It is always difficult for a programmer to squeeze more performance out of a deep learning model. The good news is that Accentedge is solving this problem for you: Accentedge has deep learning experts as well as state-of-the-art technology to help you maximize your model's performance. So, feel free to contact Accentedge.

Problems in Distributed Systems

When we talk of a distributed system, we mean a system with more than one component, running on different machines, that communicate with each other by sending messages. Distributed systems are of great importance: they are the systems that move data from one machine to another to fulfill a required function.

On a single machine, if an internal fault occurs, the machine crashes and reports an error. In a distributed system this is not the case: some nodes may be down due to faults while others keep working. We call this partial failure. In distributed systems, the network through which messages travel is very important; delays, dropped messages and other network problems are issues we need to solve to have smoothly functioning systems. These are the problems that make distributed systems difficult to work with. Although there are several such problems, here we will discuss two main ones.

Unreliable Network

Distributed systems coordinate and communicate through a network, which is unreliable. Suppose a machine sends a request. Several things can go wrong:

  • The request is lost or delayed
  • The request is queued
  • The remote node is absent or disconnected
  • The response to the request is delayed
  • The response is lost

To address these problems, the concept of a timeout was introduced in distributed systems. The timeout does not completely solve the problem of the unreliable network; it creates another problem, namely deciding how long the timeout should be. If the timeout is too long, the system remains unreliable during that window. If it is too short and the node is merely busy with some other work, the node will be replaced by another node and the same action can happen twice. When a node is declared dead, all of its responsibilities are transferred to other nodes. Most network delays are caused by queues when traffic is high, because all the traffic shares the same resources (CPU, network links, etc.). When data reaches the destination where it is to be processed and all the CPUs are busy, the operating system queues the data until the machine becomes free and can handle it.
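A minimal sketch of a request timeout with a simple retry, assuming Python's `requests` library; the URL, timeout and retry count are illustrative:

```python
import requests

def call_node(url, timeout_s=2.0, retries=3):
    for attempt in range(retries):
        try:
            return requests.get(url, timeout=timeout_s)
        except requests.Timeout:
            # The node may be slow, dead, or the request/response may be lost;
            # a timeout alone cannot tell these cases apart.
            continue
    return None   # caller decides whether to declare the node dead
```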

Unreliable clock

In a distributed system there is more than one machine, and the machines communicate through a network, so every pair of communicating machines needs a network path. With multiple machines there are multiple network delays and message delays, as already discussed. Message delay is not uniform across networks; it varies from network to network, which makes handling time tricky.

Every machine has a hardware clock. There are minor differences in calibration, due to which some clocks run slightly slower and some slightly faster than others. These clocks are synchronized, for example against GPS time, with the help of the Network Time Protocol (NTP).

When we talk about computer systems, they have two kinds of clock. The first is the time-of-day clock, which gives the current wall-clock time. This type of clock is synchronized using NTP.

The second type is the monotonic clock, whose value always moves forward. Each CPU has its own clock, so systems with more than one CPU have multiple clocks. Rather than being reset, this type of clock is adjusted by slowing it down if it runs fast and vice versa. Several causes create synchronization issues. In virtual machines especially, the clock is also virtual: if you pause the virtual machine, its clock is paused too, and when you resume the machine the clock jumps.
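A small illustration of the difference between the two clocks in Python: `time.time()` reads the time-of-day clock (which NTP may adjust), while `time.monotonic()` only ever moves forward, so it is the safer choice for measuring durations:

```python
import time

start = time.monotonic()        # monotonic clock: immune to wall-clock jumps
time.sleep(0.1)                 # stand-in for some workload
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.3f}s")
```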

Today our work is mostly distributed and we rely on distributed systems such as the cloud. In this age of AI, how can we minimize these kinds of problems in distributed systems?

Data Science in Agriculture

In agricultural countries, agriculture provides employment to a large number of people, yet it suffers from many setbacks such as floods, climate change, the migration of farmers toward cities and many other factors. Improving the agriculture sector means improving the standard of living of many people. As in many other fields, data science is working to improve this sector. Data is what industries need, and data science has already revolutionized industries by introducing many applications. In agriculture, data science helps with weather prediction, fertilizer recommendation, disease detection, digital soil and crop mapping, and many other processes; we will discuss several of these below. Good weather plays a vital role in the growth and development of plants, while bad weather causes soil erosion and damage to plants.

Data science analyzes huge amounts of data and predicts the weather on that basis. A data scientist identifies patterns and relationships with the help of specific tools. The elements of a weather forecast include rain, snow, dew, temperature, humidity, cloud coverage, wind speed and direction, fog, etc.

Digital soil and crop mapping is a technique introduced to the agriculture sector with the help of data science. It works by taking satellite images of farms and combining them with other relevant data, such as data from different weather stations. With this method it is much easier to inspect areas quickly, after which a specific area is selected for a specific crop.

Fertilizer recommendation is another area where data science applications are very successful. The first step is knowing the exact fertilizer rate, which is a science in itself and requires analysis of many factors. Misuse of fertilizer is harmful to the crops, to animals and to the humans who consume these crops directly or indirectly. Even today, most farmers still rely on conventional methods and guesswork, which are not accurate in determining the fertilizer rate.

Climate change has a large effect on the growth as well as the development of plants. To cope with the situations caused by climate change, data scientists are working hard to find ways to compensate for it. One ongoing project provides IoT sensors to rice producers in Taiwan; these are used to collect information that is important for their crops. Due to climatic changes, the traditional calendar used by farmers in the past is no longer sufficient.
As technology progresses, data science will continue to move forward and bring more applications to help the agriculture sector.

Challenges in Machine Learning

This is the era of artificial intelligence and machine learning. Without explicit instructions, a machine (computer) performs a specific task with the help of a model; the study of these kinds of models and algorithms is called Machine Learning (ML).
Machine learning algorithms have performed well at extracting patterns from images, detecting fraud and many similar tasks. Yet even though machine learning has solved many problems, there is still a large gap between machine learning and human learning. The availability of sufficient training data is one of the biggest challenges facing ML.

You need enough examples matching the case in order to train a model, and in general you need a huge amount of data. Unlike ML, a human learns from just a few examples, whereas in ML you need a large dataset to train the model. Humans learn through adaptation and pick up the relationships between a wide variety of information.
The third challenge is selecting an appropriate set of features from the data you are using as input. The performance of your algorithm depends on the input data: the better and more appropriate the input data, the better the performance. Features whose distributions overlap across different classes make the task harder, and the performance of the machine learning algorithm drops. There are different approaches for selecting the features that are actually related to the output.
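One hedged sketch of such an approach, using univariate feature selection from scikit-learn on synthetic data (the number of features kept is illustrative):

```python
# Keep only the features most related to the output.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.rand(200, 20)                          # 20 candidate features
y = np.random.randint(0, 2, size=200)                # binary target
selector = SelectKBest(score_func=f_classif, k=5)    # keep the 5 most informative features
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                              # (200, 5)
```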
Humans can detect context, but that is not the case with machine learning. Machine learning models perform best when your case closely matches the training dataset; performance is best when the test data comes from the same distribution as the training data.
The concept of continuous learning is also lacking in ML. Training happens in batches: you train the model and test its performance, and a point comes after which further training no longer improves performance. Learning stops at that point. This is not the case with humans, who are continuous learners.
All in all, we can say that there is a large gap between human learning and ML. Narrowing that gap would solve many of these challenges.


Blog Main

Artificial Intelligence (AI)

Artificial Intelligence is the future of cybersecurity. It can model network behavior and improve threat detection.

Data Science

Data is an expensive asset. Use scientific methods and algorithms to extract knowledge and insights from the data.

Blockchain

In the near future, blockchain technology looks set to forever change the way businesses do transactions and manage their supply chains.

Internet of Things (IoT)

IoT allows us to use affordable wireless technology and transmit data to the cloud at the component level.
