Date: April 3, 2024

How to cite: Barata, R. (2024). Reinforcement: A Comprehensive View. Human-Animal Science.

 

Definition

Reinforcement is an operant procedure in which anything presented (+) or removed (-) concurrently with or immediately after a behavior increases an aspect of that behavior: its frequency, magnitude, topography, and/or duration.

 

Introduction

Reinforcement is a widely accepted and used term, especially in animal training, because of the pleasant connotations it creates. Skinner’s analysis of behavior indicates that several reinforcement procedures are efficient tools for modifying the behavior of humans and nonhuman species (Skinner, 1966, 1969, 1987).

‘Positive/negative’ does not describe the consequence’s nature; it indicates a stimulus’s addition/removal. ‘Reinforcement’ indicates a strengthening of an aspect of the behavior.

In operant conditioning, reinforcers (and punishers) must have the right quality and intensity to work. Abrantes (2013) refers to this as their ‘window of opportunity.’ Reinforcers also depend on some state of deprivation, as expanded below. For example, food may work as a reinforcer if the individual is hungry, and water if it is thirsty. The quality of the reinforcer is therefore essential for it to function as envisaged. The intensity of the reinforcer is equally fundamental: too low an intensity does not affect the behavior, and too high an intensity produces a different behavior.

 

Neuroscience Perspective

In the context of neuroscience, the reinforcement mechanism involves a complex interplay of neural circuits, neurotransmitters, and brain regions. One of the key players in this process is the mesolimbic pathway, also known as the reward pathway, which includes structures such as the ventral tegmental area (VTA), nucleus accumbens, and parts of the prefrontal cortex. The neurotransmitter dopamine is particularly important in mediating reinforcement. Here’s a simplified overview of how the mechanism works:

  1. Initiation of a Behavior: A behavior is initiated, which can be anything from a simple action like pressing a lever to complex behaviors like solving a problem.

  2. Reinforcement Evaluation: The brain evaluates the outcome of the behavior. If the outcome is perceived as reinforcing (which can include primary reinforcers like food or water, social reinforcers, or even abstract reinforcers like monetary gain or a sense of achievement), the brain’s reward system registers it.

     Note: It is important to distinguish the concept of “reward” in neuroscience. Rewards are outcomes that the brain perceives as beneficial, leading to a feeling of pleasure or satisfaction. They are often considered similar to “reinforcers” but are specifically associated with the intrinsic value or enjoyment they bring, not with the behavioristic perspective. The brain’s reward system is tied to the idea of reinforcement but focuses on the personal experience of pleasure or contentment. Rewards are typically seen as inherently valuable or enjoyable and drive actions through their perceived worth, often involving conscious anticipation and goal-seeking behavior.

  3. Dopamine Release: When a behavior is deemed rewarding, the VTA releases dopamine, particularly targeting the nucleus accumbens and parts of the prefrontal cortex. This release of dopamine signals that the behavior has produced a positive outcome.

  4. Reinforcement of the Behavior: The release of dopamine strengthens the neural connections associated with the behavior, making it more likely that the behavior will be repeated when similar circumstances arise. The brain has learned that this behavior is associated with a beneficial outcome.

  5. Adaptation and Habit Formation: With repeated reinforcement, behaviors can become habits. The brain processes these habitual actions efficiently, sometimes even outside of conscious awareness, making them more automatic and less reliant on immediate reinforcers.

 

Behavioristic Perspective

The reinforcement process involves several key steps and principles that are foundational to understanding how behaviors are learned and maintained over time, particularly within the framework of behaviorist psychology. Here’s a breakdown of this process:

  1. Identification of a Target Behavior: The first step is to clearly identify the behavior that needs to be increased or decreased. This behavior must be observable and measurable.

  2. Determination of a Reinforcer: A reinforcer is any stimulus that increases the likelihood of a behavior occurring again in the future. Reinforcers can be positive (adding something) or negative (removing something). The effectiveness of a reinforcer can vary from individual to individual; therefore, it may be necessary to determine what specifically motivates or is of value to the subject.

  3. Baseline Measurement: Before applying reinforcement, it is often helpful to measure the baseline level of the behavior. This involves observing and recording the frequency, duration, or intensity without any intervention. This data provides a comparison point for assessing the effectiveness of the reinforcement.

  4. Application of Reinforcement: Reinforcement must be applied following the occurrence of the target behavior. The timing is crucial; for reinforcement to be most effective, it should be presented immediately or very soon after the desired behavior occurs. This immediacy helps the subject associate the behavior with the reinforcement. Some nuances of the immediacy of reinforcement are discussed in a later section.

  5. Consistency: The reinforcement should be applied consistently following the occurrence of the desired behavior to strengthen the association between the behavior and the reinforcement.

  6. Scheduling of Reinforcement: Reinforcers can be delivered according to different schedules, such as fixed or variable intervals (based on time) or fixed or variable ratios (based on the number of responses). The reinforcement schedule can affect the learning process’s speed and strength.

  7. Observation and Adjustment: The behavior should be continuously monitored to assess the effectiveness of the reinforcement strategy. If the desired change in behavior is not occurring, it might be necessary to adjust the type or schedule of reinforcement, or even the entire session plan.

  8. Gradual Reduction of Reinforcement: Once the desired behavior is consistently demonstrated, it may be necessary to gradually reduce the frequency of reinforcement to prevent dependency on the reinforcer. This process, usually called schedule thinning (and sometimes loosely referred to as fading), helps maintain the behavior’s reliability over time with less frequent reinforcement.
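The schedules named in step 6 differ only in the rule that decides whether a given response earns the reinforcer. The following Python sketch illustrates those rules under stated assumptions: the class names, the parameter values, and the uniform random draws used for the variable schedules are illustrative choices, not part of the text above.

```python
import random


class FixedRatio:
    """FR n: every n-th response is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        # t (time of the response) is unused here; kept for a common interface.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio(FixedRatio):
    """VR n: the required number of responses varies around a mean of n."""

    def __init__(self, n, rng):
        super().__init__(n)
        self.mean = n
        self.rng = rng
        self.n = self._draw()

    def _draw(self):
        # Assumption: uniform draw from 1..2*mean-1, which averages to mean.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:
            self.n = self._draw()  # new requirement for the next reinforcer
        return reinforced


class FixedInterval:
    """FI s: the first response at least s seconds after the last reinforcer is reinforced."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.last = 0.0

    def respond(self, t):
        if t - self.last >= self.seconds:
            self.last = t
            return True
        return False


class VariableInterval(FixedInterval):
    """VI s: like FI, but the required interval varies around a mean of s seconds."""

    def __init__(self, seconds, rng):
        super().__init__(seconds)
        self.mean = seconds
        self.rng = rng
        self.seconds = self._draw()

    def _draw(self):
        # Assumption: uniform draw between 0 and 2*mean seconds.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:
            self.seconds = self._draw()
        return reinforced


fr = FixedRatio(3)
fr_outcomes = [fr.respond(t) for t in range(1, 10)]     # nine responses
fi = FixedInterval(5)
fi_times = [t for t in range(1, 11) if fi.respond(t)]   # one response per second

rng = random.Random(0)  # seeded only to make the sketch repeatable
vr = VariableRatio(5, rng)
vr_total = sum(vr.respond(t) for t in range(1000))      # reinforcers in 1000 responses
```

Under FR 3 every third response is reinforced, and under FI 5 with one response per second only the responses at seconds 5 and 10 are reinforced. The variable schedules deliver about the same number of reinforcers on average, but the subject cannot predict which response will pay off.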

 

Reinforcer Versus Rewards

While rewards are generally seen as positive stimuli intended to promote a particular behavior, they do not always act as reinforcers. A reinforcer is defined by its effect on behavior—it must increase the likelihood that the behavior will occur again in the future. In contrast, a reward may not always have this effect; it might be something that is given with the intention of reinforcement, but if it does not actually lead to an increase in the targeted behavior, it does not function as a reinforcer.

Positive reinforcement is sometimes called rewarding or reward learning. Skinner rejected this term: “The strengthening effect is missed when reinforcers are called rewards, […] people are rewarded, but behavior is reinforced” (1987, p. 19).

Although we find the term “reward learning” in the learning, biology, and neuroscience literature, the events labeled as rewards often fail to increase or strengthen the behavior they follow. A stimulus may reinforce a particular behavior in one specific situation but not in another (Sy et al., 2010); “[. . .] a reinforcer can by definition do nothing but increase an aspect of a behavior” (Abrantes, 2013, p. 20).

 

Positive Reinforcement Versus Negative Reinforcement

Negative reinforcement strengthens a behavior by the removal of a stimulus or a decrease in its intensity. Positive and negative reinforcement increase a behavior’s frequency, intensity, or duration. The difference is that while positive reinforcement adds, negative reinforcement removes something.

Identifying consequences that strengthen behaviors is a useful strategy for behavior modification. Still, it is not always easy to identify whether it is a positive or a negative reinforcer that is working in a particular situation. Iwata (1987, 2006) gives the example of a person in a cold room who turns up the heat. Is the reinforcer an increase (positive reinforcer) or a decrease (negative reinforcer) in the temperature?

Types of Reinforcers

Unconditioned Reinforcers:

Conditioned Reinforcers:

Semi-Conditioned Reinforcers (Proposed by Abrantes, 2013):

Application and Impact:

 

The Immediacy Of A Reinforcement

The principle of the immediacy of reinforcement pertains to the proximal effects of reinforcement, encapsulating the “temporal relationships between behavior and consequent outcomes, which are typically within the span of a few seconds” (Michael, 2004, p. 161). Some literature posits that a lapse of up to 30 seconds may not significantly diminish the efficacy of reinforcement (e.g., Bradley & Poling, 2010; Byrne, LeSage, & Poling, 1997; Critchfield & Lattal, 1993; Lattal & Gleeson, 1990; Wilkenfeld, Nickel, Blakely, & Poling, 1992).

Most of the literature holds that even a minimal response-to-reinforcer delay of just 1 second can reduce effectiveness compared with immediate reinforcement. This reduction is attributed to non-target behaviors that may occur during the delay, so that the behavior closest in time to the reinforcer’s presentation is inadvertently strengthened (Sidman, 1960, p. 371).

There exists a prevalent misunderstanding that delayed outcomes can act as reinforcers, even when these outcomes are separated from the initial responses by extensive periods ranging from days to years. Michael (2004, p. 36) elucidates that while human behavior may appear to be influenced by protracted delayed consequences, such modifications are achieved through the individual’s complex socio-verbal history rather than the direct strengthening of behavior via reinforcement.

To illustrate, consider a dog being trained in a new skill and reinforced with a treat immediately upon successful execution. If the treat is instead postponed, the direct connection between the skill and the treat becomes obscured, and unintended behaviors that occurred closer in time to the treat’s delivery may be reinforced instead.
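The point that the behavior closest in time to the reinforcer is the one most likely to be strengthened can be pictured with a small Python sketch. The session log, the behavior names, and the function name are hypothetical; the code only finds which recorded behavior most closely preceded the reinforcer's delivery.

```python
def behavior_nearest_reinforcer(events, reinforcer_time):
    """Return (behavior, delay) for the last behavior at or before the reinforcer.

    events: list of (time_in_seconds, behavior_name) tuples.
    Returns None if no behavior preceded the reinforcer.
    """
    preceding = [(t, name) for t, name in events if t <= reinforcer_time]
    if not preceding:
        return None
    t, name = max(preceding, key=lambda event: event[0])
    return name, reinforcer_time - t


# Hypothetical log: the dog sits at 2.0 s, then stands up and barks.
log = [(2.0, "sit"), (3.5, "stand up"), (4.2, "bark")]

immediate = behavior_nearest_reinforcer(log, 2.3)  # treat 0.3 s after the sit
delayed = behavior_nearest_reinforcer(log, 5.0)    # treat 3 s after the sit
```

With the immediate treat, the sit is the behavior nearest to the reinforcer. With the delayed treat, the bark is, which is the inadvertent strengthening described above.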

Applying verbal signals can significantly enhance the dog’s understanding and motivation. For instance, a trainer might use a session plan like “If you sit each time I give a specific sound signal, you’ll get a treat,” thereby establishing a verbal contingency that influences the dog’s behavior. 

 

Factors Influencing Reinforcement

 

Reinforcement as a Non-Circular Concept

A prevalent misunderstanding asserts that the concept of reinforcement within the behavioral sciences is entangled in circular reasoning, thereby diminishing its explanatory value in understanding behavior. This viewpoint mistakenly equates circular reasoning with the explanatory mechanism of reinforcement. Circular reasoning is a logical fallacy where the terminology employed to delineate an observed effect is erroneously presumed to cause that effect, leading to a logical impasse where the effect itself is the sole criterion for inferring the cause.

Contrary to this misunderstanding, the concept of reinforcement is distinctively non-circular, as it involves a clearly separable cause-and-effect relationship within the response-consequence paradigm. This delineation permits the experimental manipulation of consequences to ascertain their effect on the frequency of the subsequent behavior. Epstein (1982) elucidates this distinction, stating, “If we can demonstrate that a response augments in frequency solely because it is succeeded by a specific stimulus, we designate that stimulus as a reinforcer, and the act of presenting it, reinforcement.” This clarification underlines the empirical basis of reinforcement, devoid of circular reasoning, by emphasizing the observation of specific event relationships.

However, misinterpretations arise when the term ‘reinforcer’ is employed in a manner that suggests circularity. Such a misuse occurs when a stimulus is deemed a reinforcer simply because it enhances a behavior, thereby employing the term ‘reinforcer’ to explain its own efficacy in strengthening behavior. Epstein (1982) further distinguishes between applying reinforcement as an empirically substantiated principle within theoretical behavior analysis and its misuse in circular arguments.

Epstein also reflects on Skinner’s speculative or interpretive use of reinforcement in accounting for behavior, such as verbal actions, which may have developed through reinforcement processes. Skinner’s speculation that certain behaviors persist due to past reinforcement is not an instance of circular reasoning but rather a conjectural interpretation supported by extensive empirical data and established behavioral principles under controlled conditions. Thus, when utilized appropriately, reinforcement signifies an empirically validated (or theoretically speculative) functional relationship between an immediate stimulus change (consequence) following a response and a consequent increase in similar future responses. This conceptual clarity reinforces the validity of reinforcement as a foundational construct in behavioral science, transcending unfounded allegations of circular reasoning.

In simple words, imagine you’re trying to teach your dog a new behavior, like sitting on signal. Every time your dog sits when you say “sit,” you give it a treat. The treat is what we call a “reinforcer” because it makes your dog more likely to sit the next time you ask. This isn’t a circle of reasoning; it’s a straightforward cause (the treat follows the sit) and effect (the dog sits more often on signal) relationship.

Some people think talking about reinforcers is like saying, “The dog sits because it gets a treat,” and then saying, “It gets a treat because it sits,” which sounds like we’re going in circles without explaining anything. But actually, we’re not just going in circles. We’re saying we can clearly see and test that giving a treat (the cause) makes the sitting happen more often (the effect).

So, when scientists or animal trainers use the word “reinforcement,” they mean (or should mean) that they have seen that something (like a treat) actually does make a behavior (like sitting) happen more often. It’s not just a guess or a circular argument; it’s based on real observations and experiments.
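The empirical test described in this section can be sketched in Python: a stimulus is labeled a reinforcer only after the response rate measured under the contingency is compared with the rate measured without it. The session counts, the function names, and the 20% threshold are illustrative assumptions, not criteria taken from the literature cited here.

```python
from statistics import mean


def rate_per_minute(counts, session_minutes):
    """Convert per-session response counts into responses per minute."""
    return [count / session_minutes for count in counts]


def functions_as_reinforcer(baseline_rates, contingent_rates, min_increase=0.2):
    """True if the mean rate under the contingency exceeds baseline by min_increase.

    min_increase is a hypothetical decision threshold (20% here).
    """
    return mean(contingent_rates) > mean(baseline_rates) * (1 + min_increase)


# Hypothetical counts of "sit" responses in three 10-minute sessions per condition.
baseline = rate_per_minute([4, 5, 3], 10)       # no programmed consequence
with_treat = rate_per_minute([9, 11, 12], 10)   # a treat follows each sit
with_praise = rate_per_minute([4, 4, 5], 10)    # praise follows each sit

treat_is_reinforcer = functions_as_reinforcer(baseline, with_treat)
praise_is_reinforcer = functions_as_reinforcer(baseline, with_praise)
```

In this made-up data set the treat would be called a reinforcer and the praise would not, even though both were delivered with the same intention. The label follows from the measured change in behavior, not from the stimulus itself.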

 

Schedules of Reinforcement 

Continuous Reinforcement Schedule:

Partial (Intermittent) Reinforcement Schedules:

Types:

Fixed-Ratio Schedule:

Fixed-Interval Schedule:

Variable-Ratio Schedule:

Variable-Interval Schedule:

Effectiveness of Reinforcement Schedules:

 

Reinforcement and Motivation

McFarland (2006) defines motivation as “[…] A reversible aspect of the animal’s state that plays a causal role in behaviour. Changes in behaviour in an unchanging environment may be due to irreversible processes such as learning, maturation, or injury, or reversible motivational processes. [. . .] An animal’s motivational state changes continually as a result of both external and internal changes. [. . .] The dominant motivational tendency is the one that controls the ongoing behaviour. The dominant tendency inhibits the tendencies for other aspects of behaviour.”

Hull (1952) proposed that humans and nonhumans act because of motivational states called drives, which are caused by a period of deprivation (of food, for example). His drive-reduction theory attributes the effectiveness of a reinforcer to the reduction of the drive. It applies mainly to primary reinforcers because they alter the individual’s physiological state.

Other theories of motivation also appear to support the efficiency of working with motivational processes, provided that the conditions are appropriate and the procedures are applied correctly.

Deprivation requires a nuanced approach to prevent negative impacts on an animal’s health and drive. At the biological level, animals operate under homeostatic mechanisms that manage essential needs such as hunger, thirst, and social engagement. Excessive deprivation can upset these mechanisms, resulting in stress and behavioral complications.

Effective training methods adopt a balanced strategy, employing moderate deprivation in a controlled environment to amplify motivation without undermining the animal’s physical or mental state. For instance, marginally decreasing an animal’s food supply while training it to perform a specific task can make food a more compelling incentive, thus improving learning efficiency.

Grasping the biological underpinnings of motivation—where an animal is impelled to act to satisfy a need—underscores the significance of striking a balance between deprivation and enrichment in training settings. When finely adjusted, this equilibrium fosters successful training outcomes that are in harmony with the animal’s innate behaviors and inclinations, steering clear of the detrimental effects of excessive deprivation.

 

Practical Applications of Reinforcement

Behavior modification depends on several conditions. Both reinforcers and punishers (‘inhibitors,’ as suggested by Abrantes, 2013, p. 16) always depend on the individual, the behavior, and the situation. Behavioral therapies apply these techniques to human and non-human behavior and to behavior management in various environments and situations.

Some learning procedures to change or strengthen behaviors, e.g., shaping, chaining, and differential reinforcement, apply positive reinforcers.

Shaping in Animal Training

Shaping in animal training involves gradually reinforcing behaviors that are successively closer to the target behavior. This method is particularly useful when training animals to perform complex tasks or behaviors that they would not naturally exhibit. In shaping, a trainer starts by reinforcing any behavior that is even remotely close to the desired outcome. Gradually, the criteria for reinforcement become stricter, so that only behaviors that more closely resemble the target are reinforced. For instance, when training a dog to ring a bell, the trainer might initially reinforce the dog for merely looking at the bell, then for touching it, and, finally, only for ringing it.
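Successive approximation can be pictured as a criterion that tightens once the current step is reliable. The sketch below is a hedged Python illustration using the bell example; the three step names, the rule of advancing after three consecutive reinforced trials, and the trial log are assumptions made for the example.

```python
# Successive approximations toward the target behavior (hypothetical labels).
STEPS = ["look at bell", "touch bell", "ring bell"]


def shape(trials, steps=STEPS, advance_after=3):
    """Replay observed behaviors and return (reinforced_flags, final_step_index).

    A trial is reinforced only if the behavior meets or exceeds the current
    criterion. After `advance_after` consecutive reinforced trials, the
    criterion moves to the next step.
    """
    level = 0    # index of the current criterion in `steps`
    streak = 0   # consecutive reinforced trials at this criterion
    reinforced = []
    for behavior in trials:
        meets = behavior in steps and steps.index(behavior) >= level
        reinforced.append(meets)
        streak = streak + 1 if meets else 0
        if streak >= advance_after and level < len(steps) - 1:
            level += 1
            streak = 0
    return reinforced, level


# Hypothetical session: looking stops paying off once touching is required.
session = (["look at bell"] * 4) + (["touch bell"] * 4) + ["ring bell"]
flags, final_level = shape(session)
```

In this log the fourth look and the fourth touch go unreinforced because the criterion has already moved on, and the session ends with only ringing the bell being reinforced.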

Chaining Techniques

Chaining breaks down a complex behavior into smaller, more manageable components, teaching each step within the sequence of the overall behavior. Two primary types of chaining are used in animal training: forward chaining and backward chaining.
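The difference between the two types lies in the order in which the links are taught: forward chaining builds the sequence from the first link onward, and backward chaining builds it from the last link backward. A minimal Python sketch follows; the retrieve sequence and the function name are hypothetical examples.

```python
def chaining_stages(links, direction="forward"):
    """Return the list of training stages; each stage is the sub-sequence practiced.

    forward:  [first], [first, second], ... up to the whole chain
    backward: [last], [second-to-last, last], ... up to the whole chain
    """
    if direction == "forward":
        return [links[:i] for i in range(1, len(links) + 1)]
    if direction == "backward":
        return [links[-i:] for i in range(1, len(links) + 1)]
    raise ValueError("direction must be 'forward' or 'backward'")


# Hypothetical behavior chain for a retrieve.
retrieve = ["go to toy", "pick up toy", "return to trainer", "drop toy in hand"]

forward = chaining_stages(retrieve, "forward")
backward = chaining_stages(retrieve, "backward")
```

Both plans end with the full chain. In the backward plan every stage ends with the final link, so each practice run finishes at the point where the reinforcer is delivered.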

Differential Reinforcement 

Differential reinforcement involves reinforcing only the desired behavior while withholding reinforcement for other behaviors. This technique is adapted to animal training to encourage specific behaviors without the use of punishment.

Types of Differential Reinforcement

  1. Differential Reinforcement of Alternative Behavior (DRA): Reinforces a specific, desirable behavior as an alternative to the undesirable behavior.

  2. Differential Reinforcement of Incompatible Behavior (DRI): Focuses on reinforcing a behavior that is physically incompatible with the undesired behavior.

  3. Differential Reinforcement of Other Behavior (DRO): Reinforces the absence of the undesired behavior over a specified period.

  4. Differential Reinforcement of Lower Rates of Behavior (DRL): Used to reduce, but not eliminate, the frequency of a behavior by reinforcing lower rates of the behavior.

  5. Differential Reinforcement of Higher Rates of Behavior (DRH): Aims to increase the frequency of a desired behavior by reinforcing higher rates of that behavior.
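Two of the procedures listed above reduce to simple timing or counting rules, sketched here in Python. The interval length, the rate limit, and the timestamps are hypothetical, and the fixed, non-resetting intervals in the DRO function are a simplifying assumption.

```python
def dro_reinforced_intervals(problem_times, session_length, interval):
    """DRO: an interval earns a reinforcer only if no problem behavior occurred in it.

    problem_times: times (in seconds) at which the undesired behavior occurred.
    Returns one boolean per complete interval in the session.
    """
    results = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        clean = not any(start <= t < end for t in problem_times)
        results.append(clean)
        start = end
    return results


def drl_reinforced(response_count, max_responses):
    """DRL: reinforce when the count in the observation period is at or below the limit."""
    return response_count <= max_responses


# Hypothetical 60-second session with barking at 12 s and 47 s, checked every 15 s.
dro = dro_reinforced_intervals([12, 47], session_length=60, interval=15)

# Hypothetical DRL limit of 3 barks per session.
drl_ok = drl_reinforced(response_count=2, max_responses=3)
drl_too_many = drl_reinforced(response_count=5, max_responses=3)
```

In the DRO example only the two middle intervals, in which no barking occurred, earn a reinforcer.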

Note: Skinner advocated for using differential reinforcement as an adjunct to, rather than a replacement for, punishment. 

In my practical experience with clients, some trainers who claim to rely solely on differential reinforcement do not fully grasp or implement it as intended and inadvertently use other strategies. They presume they are applying Differential Reinforcement of Incompatible Behavior (DRI) or of Alternative Behavior (DRA). In reality, their methods more closely resemble forward chaining or the Premack Principle (a technique that uses preferred activities as reinforcers for less preferred behaviors) than genuine differential reinforcement. Moreover, their training strategies sometimes fail to account for crucial contingencies, potentially undermining the effectiveness of the intervention.

 

The “Dark Side” of Reinforcement

Reinforcement is a powerful tool in behavior modification and management, and it is perhaps also the main cause of common “problem behaviors” in pets. It is therefore necessary to acknowledge its limitations, disadvantages, and potential dangers, including inadvertent reinforcement. Its application must be thoughtful, strategic, and adaptable to the needs and contexts of individuals.

Limitations of Reinforcement

Disadvantages of Reinforcement

Dangers of Reinforcement

 

References

Abrantes, R. (2013). The 20 principles all animal trainers must know. Wakan Tanka Publishers.

Barata, R. (2020). Positive Reinforcement. In: Vonk, J., Shackelford, T. (eds) Encyclopedia of Animal Cognition and Behavior. Springer, Cham. https://doi.org/10.1007/978-3-319-47829-6_761-3

Bradley, K. P., & Poling, A. (2010). Defining delayed consequences as reinforcers: Some do, some don’t, and nothing changes. The Analysis of Verbal Behavior, 26, 41–49.

Critchfield, T. S., & Lattal, K. A. (1993). Acquisition of a spatially defined operant with delayed reinforcement. Journal of the Experimental Analysis of Behavior, 59, 373–387.

Epstein, R. (1982). Skinner for the classroom. Champaign, IL: Research Press.

Hull, C. L. (1952). A behavior system. New Haven: Yale University Press.

Iwata, B. A. (1987). Negative reinforcement in applied behavior analysis: An emerging technology. Journal of Applied Behavior Analysis, 20(4), 361–378.

Iwata, B. A. (2006). On the distinction between positive and negative reinforcement. The Behavior Analyst, 29(1), 121–123.

Lattal, K. A., & Gleeson, S. (1990). Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 16, 27–39.

McFarland, D. (2006). A dictionary of animal behaviour. Oxford: Oxford University Press.

Michael, J. (2004). Concepts and principles of behavior analysis (rev. ed.). Kalamazoo, MI: Society for the Advancement of Behavior Analysis.

Olson, M. & Hergenhahn, B. R. (2016). Theories of Learning, Ninth edition. Psychology Press.

Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York: Basic Books.

Skinner, B. F. (1966). The behavior of organisms: An experimental analysis. New York: Appleton-Century-Crofts.

Skinner, B. F. (1969). Contingencies of reinforcement: A theoretical analysis. New York: Meredith Corporation.

Skinner, B. F. (1987). Antecedents. Journal of the Experimental Analysis of Behavior, 48, 447–448.

Sy, J. R., Borrero, J. C., & Borrero, C. S. W. (2010). Characterizing response-reinforcer relations in the natural environment: Exploratory matching analyses. The Psychological Record, 60, 609–626.

Wilkenfeld, J., Nickel, M., Blakely, E., & Poling, A. (1992). Acquisition of lever-press responding in rats with delayed reinforcement: A comparison of three procedures. Journal of the Experimental Analysis of Behavior, 58, 431–443.


Copyright © 2009-2024 by etologi.dk