Adaline (adaptive linear element) was proposed by Widrow in the 1960s and has been widely used to construct neural networks for classification, noise cancellation, system identification, and signal prediction. An adaline is composed of a receptive field and a threshold function with bipolar output. In this work, we generalize the bipolar threshold function to a multi-state transfer function and prove that the adaline and the perceptron are special cases of it. The supervised learning process is modeled within a mathematical framework that mixes integer and linear programming, and is solved by a hybrid of mean field annealing and gradient descent under the criteria of minimizing design cost and maximizing the utilization of Gaussian units, subject to minimal model size. Numerical simulations show that the learning process is able to generate essential internal representations of the mapping underlying the training samples.
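As an illustration of the structure described above, the following minimal sketch computes a receptive field (a weighted sum of the inputs) and passes it through either a bipolar threshold or a staircase-shaped multi-state function. The staircase form, the function names, and the `thresholds` and `states` parameters are assumptions made for illustration only; the multi-state transfer function actually defined in this work may differ.

```python
from bisect import bisect_right
from typing import Sequence


def receptive_field(weights: Sequence[float], x: Sequence[float], bias: float = 0.0) -> float:
    """Weighted sum of the inputs plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def bipolar_threshold(h: float) -> int:
    """Classical adaline output: +1 if h >= 0, otherwise -1."""
    return 1 if h >= 0 else -1


def multi_state_transfer(h: float, thresholds: Sequence[float], states: Sequence[float]) -> float:
    """Hypothetical staircase transfer function (an assumption, not the paper's definition).

    `thresholds` must be sorted in ascending order and `states` must hold
    len(thresholds) + 1 output levels. The output is the level of the
    interval that h falls into; a value equal to a threshold maps to the
    level above it.
    """
    if len(states) != len(thresholds) + 1:
        raise ValueError("states must have exactly one more entry than thresholds")
    return states[bisect_right(thresholds, h)]


# Example: one unit evaluated with both transfer functions.
h = receptive_field([0.5, -1.0], [2.0, 0.5], bias=0.25)  # 0.75
two_state = bipolar_threshold(h)
four_state = multi_state_transfer(h, [-1.0, 0.0, 1.0], [-2, -1, 1, 2])
```

Under this assumed form, a single threshold at 0 with states `[-1, 1]` reproduces the bipolar adaline output, and states `[0, 1]` give a binary perceptron-style output, which is one simple way to see how both can arise as special cases of a multi-state function.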